Traffic Controls
Shadow Mode And Canary
Arc has two distinct traffic controls: shadow mode for background comparison and canary rollout for controlled user-facing exposure.
Shadow Evaluation Loop
Shadow mode duplicates a percentage of traffic to a second model in the background. The primary response still goes back to the user, while Arc stores comparison results for later analysis.
Live Request
The route receives normal production traffic.
Primary Response
The configured route model answers the user-facing request.
Caller
The user only sees the primary response.
Same Request
A sampled request is copied in the background.
Shadow Response
Arc asks the shadow model to answer the same prompt without affecting production output.
Dashboard Results
Arc stores each comparison and surfaces it as accuracy, conciseness, completeness, reasoning, and win-rate views.
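The loop above can be sketched in a few lines. This is a minimal illustration, not Arc's implementation: `ShadowRoute`, `sample_percent`, `comparisons`, and `drain` are hypothetical names, the models are plain callables, and a thread pool stands in for whatever background mechanism Arc actually uses.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

Model = Callable[[str], str]


class ShadowRoute:
    """Hypothetical route that answers with the primary model and shadows a sample."""

    def __init__(self, primary: Model, shadow: Model, sample_percent: float,
                 rng: Optional[random.Random] = None) -> None:
        self.primary = primary
        self.shadow = shadow
        self.sample_percent = sample_percent
        self.rng = rng or random.Random()
        # Stored (prompt, primary_response, shadow_response) tuples for later analysis.
        self.comparisons: List[Tuple[str, str, str]] = []
        self._executor = ThreadPoolExecutor(max_workers=2)

    def handle(self, prompt: str) -> str:
        # The caller always receives the primary response.
        primary_response = self.primary(prompt)
        # Copy a sampled share of requests to the shadow model in the background.
        if self.rng.random() * 100 < self.sample_percent:
            self._executor.submit(self._run_shadow, prompt, primary_response)
        return primary_response

    def _run_shadow(self, prompt: str, primary_response: str) -> None:
        try:
            shadow_response = self.shadow(prompt)
        except Exception:
            # A failing shadow model must never affect production output.
            return
        self.comparisons.append((prompt, primary_response, shadow_response))

    def drain(self) -> None:
        # Wait for pending shadow calls (useful in tests and at shutdown).
        self._executor.shutdown(wait=True)
```

The point of the design is that the shadow call happens after the primary response is already determined, so a slow or failing shadow model can only affect the stored comparisons, never what the user sees.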
Best use
Privacy posture
How A User Configures It
1. Enable
Open a route, go to the Shadow tab, and turn shadow mode on for that route.
2. Sample
Choose the percentage of requests that should be shadow tested in the background.
3. Compare
Pick a candidate shadow model, save the route, and then send normal production traffic through it.
The route UI checks the prerequisites along the way: if no provider keys are configured, Arc prompts the operator to add them, because cross-model testing cannot run without keys.
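The three settings and the provider-key prerequisite can be pictured as a small validation step. The sketch below is illustrative only: `ShadowConfig`, its field names, and `validate_shadow_config` are assumptions made for this example and do not describe Arc's actual configuration schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShadowConfig:
    """Hypothetical shape of the settings chosen on a route's Shadow tab."""
    enabled: bool
    sample_percent: float  # share of requests to shadow test, 0-100
    shadow_model: str      # candidate model to compare against the primary


def validate_shadow_config(config: ShadowConfig,
                           provider_keys: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the route can be saved."""
    problems: List[str] = []
    if not config.enabled:
        return problems  # nothing to check while shadow mode is off
    if not 0 < config.sample_percent <= 100:
        problems.append("sample_percent must be greater than 0 and at most 100")
    if not config.shadow_model:
        problems.append("choose a candidate shadow model")
    if not provider_keys:
        problems.append("add provider keys before cross-model testing can work")
    return problems
```

Returning every problem at once, rather than stopping at the first, mirrors a UI that can show all missing prerequisites in a single pass.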
Canary interaction
What Results Look Like
Route shadow tab
Shows aggregate win-rate cards, an overall primary-vs-shadow bar, and an individual test table with expandable evaluator reasoning.
Shadow overview
Groups active tests by route, showing sample rate, primary-versus-shadow pairing, route-level win rates, and recent evaluations across the workspace.
Logs drawer
A single request can expose its linked shadow test, including per-dimension scores, winner labels, reasoning, and a model legend.
Shadow mode therefore surfaces in three places in the product: route configuration (the route Shadow tab), workspace-level evaluation monitoring (the shadow overview), and request-level drill-down (the logs drawer).
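The aggregate win-rate cards can be thought of as a simple tally over individual test outcomes. The following is a rough sketch that assumes each test has already been labeled `"primary"`, `"shadow"`, or `"tie"`; the function name and label strings are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def win_rates(winners: Iterable[str]) -> Dict[str, float]:
    """Turn per-test winner labels into primary/shadow/tie shares."""
    labels = list(winners)
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        # No evaluations yet: report zeros instead of dividing by zero.
        return {"primary": 0.0, "shadow": 0.0, "tie": 0.0}
    return {label: counts[label] / total for label in ("primary", "shadow", "tie")}
```

Applying the same function to one route's tests or to every test in a workspace would give the route-level and workspace-level views respectively.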
What Arc Measures
accuracyScore
Evaluator judgment on which response answered the request more accurately.
concisenessScore
Evaluator judgment on which response was more concise without dropping useful content.
completenessScore
Evaluator judgment on which response covered the user request more fully.
reasoning
Natural-language explanation of the comparison, surfaced in the route table and request log drawer.
winner
Arc converts randomized A/B evaluator output back into primary-versus-shadow language so the operator sees model winners instead of raw A/B labels.
Under the hood, Arc randomizes whether the primary or shadow response is shown to the evaluator as Response A or Response B. The raw score is therefore an A/B comparison, not a fixed "primary score." The dashboard remaps that into primary win, shadow win, or tie.
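The randomize-then-remap step can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the raw evaluator verdict is assumed to be one of `"A"`, `"B"`, or `"tie"`, which the source does not specify.

```python
import random
from typing import Dict, Tuple


def assign_positions(primary_response: str, shadow_response: str,
                     rng: random.Random) -> Tuple[Dict[str, str], bool]:
    """Randomly place the two responses in slots A and B.

    Returns the slot mapping shown to the evaluator and a flag recording
    whether the primary response landed in slot A.
    """
    primary_is_a = rng.random() < 0.5
    if primary_is_a:
        return {"A": primary_response, "B": shadow_response}, True
    return {"A": shadow_response, "B": primary_response}, False


def remap_winner(raw_winner: str, primary_is_a: bool) -> str:
    """Convert a raw A/B verdict back into primary-versus-shadow language."""
    if raw_winner == "tie":
        return "tie"
    if raw_winner not in ("A", "B"):
        raise ValueError(f"unexpected evaluator verdict: {raw_winner!r}")
    # The primary wins when the winning slot is the slot it was placed in.
    return "primary" if (raw_winner == "A") == primary_is_a else "shadow"
```

Keeping the `primary_is_a` flag alongside each test is what makes the remap possible: without it, a raw "A wins" verdict cannot be attributed to either model.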
Interpretation guidance
Canary Rollout
Canary rollout sends a controlled percentage of live traffic to a candidate model. Unlike shadow mode, the canary response is user-facing for the traffic slice that matches the rollout percentage.
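One common way to split traffic by percentage is deterministic hash bucketing, sketched below. Whether Arc selects canary traffic per request, per user, or by some other rule is not stated here, so the `request_id` key and both function names are assumptions made for illustration.

```python
import hashlib


def canary_bucket(request_id: str) -> int:
    """Map an identifier to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def pick_model(request_id: str, primary_model: str, canary_model: str,
               canary_percent: float) -> str:
    """Send roughly canary_percent of identifiers to the canary model."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Buckets below the threshold form the user-facing canary slice.
    if canary_bucket(request_id) < canary_percent:
        return canary_model
    return primary_model
```

Because the bucket depends only on the identifier, raising the percentage extends the canary slice without reshuffling identifiers that were already in it.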
Simple distinction
Autotune
Autotune uses background candidate evaluation and quality scoring to surface recommended changes. It is a suggestion loop, not an autonomous model switcher.
In practice, shadow mode is the evidence layer and canary is the rollout layer. Autotune sits above them as a recommendation layer.
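The difference between a suggestion loop and an autonomous switcher can be made concrete with a small sketch. Everything here is hypothetical, including the `Suggestion` type, the function name, and the `min_tests` and `min_win_rate` thresholds; the source does not describe how Autotune scores candidates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Suggestion:
    """A recommendation for the operator to review; nothing is applied."""
    route: str
    current_model: str
    suggested_model: str
    reason: str


def suggest_change(route: str, current_model: str, candidate_model: str,
                   candidate_wins: int, total_tests: int,
                   min_tests: int = 50,
                   min_win_rate: float = 0.6) -> Optional[Suggestion]:
    """Return a suggestion when the candidate has enough evidence, else None."""
    if total_tests <= 0 or total_tests < min_tests:
        return None  # not enough evaluations to say anything
    win_rate = candidate_wins / total_tests
    if win_rate < min_win_rate:
        return None  # candidate is not clearly better
    return Suggestion(
        route=route,
        current_model=current_model,
        suggested_model=candidate_model,
        reason=f"candidate won {win_rate:.0%} of {total_tests} evaluations",
    )
```

The function only returns data. Changing the route's model remains a separate, operator-initiated action, which is the property that makes this a recommendation layer.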