Traffic Controls

Shadow Mode And Canary

Arc has two distinct rollout mechanisms: shadow traffic for background comparison and canary rollout for controlled user-facing exposure.

Shadow Evaluation Loop

Shadow mode duplicates a percentage of traffic to a second model in the background. The primary response still goes back to the user, while Arc stores comparison results for later analysis.

Diagram: Shadow Evaluation Loop

Primary path: Live Request -> (primary path) -> Primary Response -> (returned to user) -> Caller

Live Request

The route receives normal production traffic.

Primary Response

The configured route model answers the user-facing request.

Caller

The user only sees the primary response.

Shadow path: Same Request -> (candidate model) -> Shadow Response -> (evaluator scores) -> Dashboard Results

Same Request

A sampled request is copied in the background.

Shadow Response

Arc asks the shadow model to answer the same prompt without affecting production output.

Dashboard Results

Accuracy, conciseness, completeness, reasoning, and win-rate views.

Best use

Use shadow mode when you want signal before rollout. It is evaluation infrastructure, not a traffic shaping feature.

Privacy posture

Arc stores scores and evaluator reasoning for the comparison, but the shadow evaluation service does not persist the prompt or model outputs as shadow-test records.
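One way to picture this posture is a record type that has no place for the prompt or the model outputs. The sketch below is an assumption, not Arc's storage schema: the score and reasoning field names are borrowed from the measurement fields this page lists, while `request_id`, `ShadowTestRecord`, and `build_record` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping

SCORE_FIELDS = ("accuracyScore", "concisenessScore", "completenessScore")


@dataclass(frozen=True)
class ShadowTestRecord:
    request_id: str  # hypothetical link back to the request log
    accuracyScore: float
    concisenessScore: float
    completenessScore: float
    reasoning: str


def build_record(request_id: str, evaluation: Mapping[str, Any]) -> ShadowTestRecord:
    """Keep only scores and reasoning; any other keys in `evaluation` are dropped."""
    scores = {name: float(evaluation[name]) for name in SCORE_FIELDS}
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within 0..1, got {value}")
    return ShadowTestRecord(
        request_id=request_id,
        reasoning=str(evaluation["reasoning"]),
        **scores,
    )
```

Because the record is built from an explicit list of fields, a prompt or response that happens to be present in the evaluator payload cannot leak into what is stored.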

How A User Configures It

1. Enable

Open a route, go to the Shadow tab, and turn shadow mode on for that route.

2. Sample

Choose the percentage of requests that should be shadow tested in the background.

3. Compare

Pick a candidate shadow model, save the route, and then send normal production traffic through it.

The route UI walks the operator through the prerequisites: if no provider keys are configured, Arc prompts the operator to add keys before cross-model testing can work.

Canary interaction

If a canary deployment is active on the route, the Shadow tab pauses shadow mode until the canary finishes or is reverted.
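The configuration steps and the canary rule can be summarized as a small state check. This is a sketch under stated assumptions: Arc exposes these settings through the route UI, so the `ShadowSettings` type, the status strings, and the order in which conditions are checked are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowSettings:
    enabled: bool          # step 1: shadow mode turned on for the route
    sample_percent: float  # step 2: percentage of requests to shadow test
    shadow_model: str      # step 3: candidate model to compare against

    def __post_init__(self) -> None:
        if not 0.0 <= self.sample_percent <= 100.0:
            raise ValueError("sample_percent must be between 0 and 100")


def shadow_status(
    settings: ShadowSettings,
    provider_keys_configured: bool,
    canary_active: bool,
) -> str:
    """Return a hypothetical status label for a route's shadow mode."""
    if not settings.enabled:
        return "disabled"
    if not provider_keys_configured:
        # Cross-model testing cannot work until keys are added.
        return "needs_provider_keys"
    if canary_active:
        # Shadow mode stays paused until the canary finishes or is reverted.
        return "paused_for_canary"
    return "active"
```

For example, a route with shadow mode enabled at 10 percent reports `"paused_for_canary"` for as long as `canary_active` is true, and returns to `"active"` afterwards without any change to its settings.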

What Results Look Like

Route shadow tab

Shows aggregate win-rate cards, an overall primary-vs-shadow bar, and an individual test table with expandable evaluator reasoning.

Shadow overview

Groups active tests by route, showing sample rate, primary-versus-shadow pairing, route-level win rates, and recent evaluations across the workspace.

Logs drawer

A single request can expose its linked shadow test, including per-dimension scores, winner labels, reasoning, and a model legend.

Shadow mode therefore appears in three places in the product: route configuration, workspace-level evaluation monitoring, and request-level drill-down.
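The win-rate views described above amount to counting winners per route. The sketch below assumes each evaluation has already been labeled `"primary"`, `"shadow"`, or `"tie"`, and that ties count toward the total; both the function names and that treatment of ties are assumptions rather than Arc's documented behavior.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

LABELS = ("primary", "shadow", "tie")


def win_rates(winners: Iterable[str]) -> Dict[str, float]:
    """Share of evaluations won by each side, with ties counted in the total."""
    counts = Counter(winners)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


def win_rates_by_route(tests: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Group (route, winner) pairs by route and compute win rates for each."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for route, winner in tests:
        grouped[route].append(winner)
    return {route: win_rates(winners) for route, winners in grouped.items()}
```

The same helper covers both the per-route cards and a workspace-level summary, depending on whether the input is one route's winners or the grouped pairs.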

What Arc Measures

accuracyScore (0..1)

Evaluator judgment on which response better answered the request correctly.

concisenessScore (0..1)

Evaluator judgment on which response was more concise without losing too much value.

completenessScore (0..1)

Evaluator judgment on which response covered the user request more fully.

reasoning (string)

Natural-language explanation of the comparison, surfaced in the route table and request log drawer.

winner (derived)

Arc converts randomized A/B evaluator output back into primary-versus-shadow language so the operator sees model winners instead of raw A/B labels.

Under the hood, Arc randomizes whether the primary or shadow response is shown to the evaluator as Response A or Response B. The raw score is therefore an A/B comparison, not a fixed "primary score." The dashboard remaps that into primary win, shadow win, or tie.
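The randomize-then-remap step can be sketched as follows. The evaluator here is a hypothetical callable that returns `"A"`, `"B"`, or `"tie"`; how Arc's evaluator actually encodes its verdict, including how the 0..1 scores relate to it, is not specified on this page, so treat the label format as an assumption.

```python
import random
from typing import Callable

# Hypothetical evaluator: (prompt, response_a, response_b) -> "A" | "B" | "tie"
Evaluator = Callable[[str, str, str], str]


def remap_winner(raw_label: str, primary_is_a: bool) -> str:
    """Translate a raw A/B verdict into primary-versus-shadow language."""
    if raw_label == "tie":
        return "tie"
    if raw_label not in ("A", "B"):
        raise ValueError(f"unexpected evaluator label: {raw_label!r}")
    a_won = raw_label == "A"
    # Primary wins when the winning slot is the one the primary occupied.
    return "primary" if a_won == primary_is_a else "shadow"


def compare(
    prompt: str,
    primary_response: str,
    shadow_response: str,
    evaluate: Evaluator,
    rng: random.Random,
) -> str:
    """Show the two responses in random order, then undo the shuffle."""
    primary_is_a = rng.random() < 0.5
    if primary_is_a:
        response_a, response_b = primary_response, shadow_response
    else:
        response_a, response_b = shadow_response, primary_response
    raw_label = evaluate(prompt, response_a, response_b)
    return remap_winner(raw_label, primary_is_a)
```

Randomizing the slot keeps any position preference in the evaluator from systematically favoring one model, and remembering `primary_is_a` is what allows the verdict to be reported as a model winner afterwards.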

Interpretation guidance

Shadow mode tells you which model tends to produce better answers on sampled live prompts. It is strongest for quality comparison. Cost and latency still need to be read alongside logs and route analytics.

Canary Rollout

Canary rollout sends a controlled percentage of live traffic to a candidate model. Unlike shadow mode, the canary response is user-facing for the traffic slice that matches the rollout percentage.
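A common way to carve out a stable traffic slice is to hash a request or user key into a bucket and compare it with the rollout percentage. The sketch below uses that technique as an illustration only; this page does not say how Arc selects the canary slice, and `canary_bucket` and `pick_model` are hypothetical names.

```python
import hashlib


def canary_bucket(request_key: str) -> float:
    """Map a key to a stable position in the range [0, 100)."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Four bytes give an integer below 2**32, so the result stays below 100.
    value = int.from_bytes(digest[:4], "big")
    return value / 2**32 * 100.0


def pick_model(
    request_key: str,
    primary_model: str,
    canary_model: str,
    canary_percent: float,
) -> str:
    """Route the key to the canary model when its bucket falls inside the slice."""
    if canary_bucket(request_key) < canary_percent:
        return canary_model
    return primary_model
```

Hashing instead of drawing a random number means the same key lands on the same side every time, so a given caller is not flipped between models from one request to the next while the percentage is unchanged.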

Simple distinction

Shadow mode asks, "Would the candidate have done better?" Canary asks, "What happens if users actually receive the candidate?"

Autotune

Autotune uses background candidate evaluation and quality scoring to surface recommended changes. It is a suggestion loop, not an autonomous model switcher.

In practice, shadow mode is the evidence layer and canary is the rollout layer. Autotune sits above them as a recommendation layer.