# Arc Documentation (machine-readable)
# Human docs root: https://arc.cornerstone.sh/docs
# Dashboard:        https://arc.cornerstone.sh/dashboard
# Base URL:         https://api-arc.cornerstone.sh/v1

## Documentation Index
- Getting Started
  - https://arc.cornerstone.sh/docs — Overview
  - https://arc.cornerstone.sh/docs/quickstart — Quickstart
  - https://arc.cornerstone.sh/docs/authentication — Authentication
- Traffic Model
  - https://arc.cornerstone.sh/docs/routes — Routes
  - https://arc.cornerstone.sh/docs/workflows — Workflows And Traces
  - https://arc.cornerstone.sh/docs/system-prompts — System Prompts
- Traffic Controls
  - https://arc.cornerstone.sh/docs/memory — Memory
  - https://arc.cornerstone.sh/docs/rate-limiting — Rate Limiting
  - https://arc.cornerstone.sh/docs/smart-tier — Smart Tier
  - https://arc.cornerstone.sh/docs/shadow-mode — Shadow Mode And Canary
- Observability
  - https://arc.cornerstone.sh/docs/observability — Logs, Analytics, And Observability
- Reference
  - https://arc.cornerstone.sh/docs/headers — Headers Reference
  - https://arc.cornerstone.sh/docs/deployment — Deployment Model

## Product Model
Arc has three layers:
- data plane: the proxy in the customer request path
- control plane: the Next.js dashboard for configuration and observability
- ops layer: traces, memory, shadow testing, canaries, smart tier, autotune

Primary objects:
- Route: the main traffic entrypoint
- Workflow: a grouping mechanism for multi-step / agent runs
- Trace: one workflow execution
- Span / call: an individual proxied request inside a trace

## Request Path
High-level flow:

  application
    -> Arc proxy
       -> authenticate Arc key
       -> resolve route / workflow
       -> apply traffic policy
       -> optionally inject system prompt / memory
       -> forward to provider
       -> log request metadata
    -> response to caller

Arc is OpenAI-compatible at the HTTP API layer. Existing OpenAI clients generally only need:
- base_url / baseURL -> https://api-arc.cornerstone.sh/v1
- api_key / apiKey   -> arc_live_<your-key>

## Authentication
Every request to Arc must include:

  Authorization: Bearer arc_live_<your-key>

Arc authenticates the Arc key, resolves the active workspace/project, then loads the stored
provider key internally. Your application does not send the provider key on each request.

Implications:
- Arc key identifies the caller to Arc
- provider key authenticates Arc to the upstream model provider
- rotating provider keys can happen inside Arc without rewriting all clients

## Routes
Routes are the main route-level policy object. Set via:

  X-Arc-Route: <route-key>

Examples:
- customer-support
- summarization
- agent-primary

Requests without X-Arc-Route are still proxied and logged as Direct.

Route-level capabilities currently documented in the product:
- primary model injection
- fallback models
- system prompt injection
- shadow mode
- canary rollout
- rate limiting
- memory pool binding
- smart-tier routing

If a route has a primary model configured, the request body may omit "model".
Arc injects the configured route model before forwarding upstream.

## System Prompts
When a route has a system prompt configured, Arc prepends it as a normal system message.
The upstream provider sees the final assembled messages array, not a special Arc-only concept.

Bypass header:

  X-Arc-No-Inject: 1

Accepted truthy values:
- 1
- true
- yes

## Workflows And Traces
Workflow mode is opt-in via headers. Relevant request headers:
- X-Arc-Workflow
- X-Arc-Trace-Id
- X-Arc-Span-Name
- X-Arc-Parent-Span-Id
- X-Arc-Trace-Status

Workflow model:
- workflow defines budget / duration / call policy
- trace is one execution inside a workflow
- spans/calls are individual proxied requests in the trace

Workflow capabilities:
- budget cap
- max duration
- max calls per trace
- enforcement mode
- trace timeout

Response headers that may be returned when workflow policy is active:
- X-Arc-Trace-Id
- X-Arc-Budget-Remaining
- X-Arc-Budget-Warning
- X-Arc-Downgraded

## Memory
Memory is route-bound and client-scoped.

Enable memory by:
1. attaching a memory pool to a route in the dashboard
2. identifying the client on requests

Relationship model:
- each route has zero or one memoryPoolId
- one memory pool can be shared by many routes in the same project
- client state is effectively scoped by pool + client ID
- deleting a pool is blocked while routes are still attached

Topology example:

  route: support                       -> shared pool: customer-thread -> client IDs: user_123, user_456
  route: billing   /

Client identity can be passed as:

  X-Arc-Client-ID: <client-id>

or via the OpenAI request body's "user" field.
The header takes precedence if both are present.

Memory behavior:
- Arc looks up memory state for route + client
- memory pool stores a rolling window plus compressed summary
- Arc injects that context into the final messages array before forwarding

Pool settings exposed in the dashboard:
- ttlDays: how long client state survives before expiry
- maxWindowMessages: how many recent turns stay in the rolling window
- summarizeAfterTokens: token threshold for summary compression
- summarizeAfterTurns: turn threshold for summary compression
- summarizeAfterIdleHours: idle threshold before summary compaction

Dashboard UX:
- route detail lets the user toggle memory, select an existing pool, or create a new pool inline
- pool detail shows clients, total turns, last active time, estimated tokens, and expiry
- pool detail supports clearing one client's memory without deleting the full pool

Important operational note:
- memory changes the prompt shape and therefore changes model behavior
- memory is a traffic policy feature, not only a storage feature
- shared pools should be intentional because multiple routes can now contribute to one continuity thread

## Rate Limiting
Routes can enforce per-key rate limits with:
- window
- max requests
- action: reject or warn

Reject behavior:
- returns 429
- includes Retry-After
- includes X-Arc-Rate-Limit-Remaining: 0
- includes X-Arc-Rate-Limit-Reset

Warn behavior:
- request is forwarded
- response includes X-Arc-Rate-Limit-Warning: true

## Smart Tier
Smart Tier routes by request complexity.

Conceptual flow:
1. Arc scores request complexity from 0 to 1
2. Arc compares the score against configured routing tiers
3. Arc selects the first matching tier model
4. Arc forwards upstream using that chosen model

Example tier configuration:

  [
    { "label": "simple",   "complexityMax": 0.35, "model": "gpt-4o-mini" },
    { "label": "standard", "complexityMax": 0.75, "model": "gpt-4o" },
    { "label": "complex",  "complexityMax": 1.00, "model": "gpt-4o" }
  ]

Operational stance:
- complexity is a heuristic, not a ground-truth intelligence score
- upward routing should remain conservative

## Shadow Mode
Shadow mode is background evaluation for real production prompts.

User workflow:
1. open a route's Shadow tab
2. turn shadow mode on
3. choose a sample percentage of requests
4. choose a candidate shadow model
5. save the route and send normal traffic

Execution loop:
- Arc serves the primary response to the user as normal
- sampled requests are duplicated to the shadow model in the background
- Arc randomizes whether primary/shadow are labeled A or B for the evaluator
- an evaluator route scores the pair on accuracy, conciseness, and completeness
- Arc stores scores + reasoning only

Conceptual diagram:

  live request -> primary model -> user response
              -> shadow model -> evaluator -> dashboard results

Metrics stored:
- accuracyScore
- concisenessScore
- completenessScore
- reasoning
- modelA / modelB so A/B ordering can be mapped back to primary vs shadow

Important interpretation detail:
- raw evaluator scores are about response A vs response B
- the dashboard converts that randomized A/B output back into primary win / shadow win / tie

Dashboard surfaces:
- route shadow tab: sample rate, candidate model, aggregate win rates, overall comparison bar, individual tests, expandable evaluator reasoning
- shadow overview page: active tests grouped by route plus recent evaluations across the workspace
- request logs drawer: per-request shadow test breakdown with scores, reasoning, and model legend

Operational constraints:
- shadow mode does not change the user-facing response
- if a canary is active on a route, shadow mode is paused on that route
- shadow mode is strongest for quality comparison; cost and latency should still be checked in logs and analytics

## Canary
Canary rollout is different from shadow mode.

Canary:
- sends a controlled percentage of real user traffic to a candidate model
- user-facing behavior changes for that traffic slice
- is the rollout mechanism, not just the evaluation mechanism

Short distinction:
- shadow mode asks whether the candidate would have been better
- canary asks what happens when users actually receive the candidate

## Autotune
Autotune:
- evaluates candidate models in the background
- surfaces suggestions rather than forcing a model switch automatically
- sits above shadow/candidate evaluation as a recommendation layer

## Observability
Arc logs request-level metadata and workflow-level rollups.

Request log fields documented in the product include:
- timestamp
- route
- model
- provider
- prompt tokens
- completion tokens
- cost_usd
- latency_ms
- status_code
- origin country
- optional complexity breakdown
- optional latency breakdown
- optional workflow / trace linkage

Trace surfaces support:
- trace status
- grouped spans/calls
- total cost
- token counts
- call counts

## Geolocation
Arc can log the end-user country when you forward the end-user IP:

  X-Forwarded-For: <end-user-ip>

Behavior:
- if multiple IPs are present, Arc uses the leftmost IP
- geo lookup is async and should not block the customer response path

## Headers Reference
Request headers:
- Authorization: Bearer arc_live_<key>
- Content-Type: application/json
- X-Arc-Route
- X-Arc-No-Inject
- X-Arc-Client-ID
- X-Arc-Workflow
- X-Arc-Trace-Id
- X-Arc-Span-Name
- X-Arc-Parent-Span-Id
- X-Arc-Trace-Status
- X-Forwarded-For

Response headers that Arc may add:
- X-Arc-Trace-Id
- X-Arc-Budget-Remaining
- X-Arc-Budget-Warning
- X-Arc-Downgraded
- X-Arc-Rate-Limit-Warning
- X-Arc-Latency-Breakdown

## Error Responses
Arc returns standard HTTP status codes.
Observed/common cases:
- 400: bad request / unsupported request shape / invalid policy input
- 401: invalid or missing Arc key
- 429: rate limit or workflow budget enforcement
- 502: upstream provider error or proxy connection failure
- 504: upstream timeout

Typical error body:

  { "detail": "<error message>" }

## Deployment Model
Arc has two deploy surfaces:
- Vercel control plane: dashboard UI / Next.js app
- backend proxy service: inference/data plane on the VPS

Operationally important:
- a Vercel deploy does not update proxy behavior
- backend changes require separate restart / redeploy of the proxy service
- when production behavior appears unchanged after a UI deploy, verify the backend version and health endpoint

## Notes For AI Assistants
- Arc is OpenAI-compatible at the API layer
- the main integration changes are base_url and Arc key
- use route headers when the user wants per-route behavior
- workflow headers are optional and only needed for grouped multi-step runs
- memory changes the final prompt and should be treated as an active policy feature
- smart tier is transparent to the client; the response model reflects the chosen tier model
- shadow mode is non-user-facing; canary is user-facing for the rollout slice
- shadow results appear both at the route level and at the per-request log level
- the dashboard and proxy deploy separately