# Arc Documentation (machine-readable) # Human docs root: https://arc.cornerstone.sh/docs # Dashboard: https://arc.cornerstone.sh/dashboard # Base URL: https://api-arc.cornerstone.sh/v1 ## Documentation Index - Getting Started - https://arc.cornerstone.sh/docs — Overview - https://arc.cornerstone.sh/docs/quickstart — Quickstart - https://arc.cornerstone.sh/docs/authentication — Authentication - Traffic Model - https://arc.cornerstone.sh/docs/routes — Routes - https://arc.cornerstone.sh/docs/workflows — Workflows And Traces - https://arc.cornerstone.sh/docs/system-prompts — System Prompts - Traffic Controls - https://arc.cornerstone.sh/docs/memory — Memory - https://arc.cornerstone.sh/docs/rate-limiting — Rate Limiting - https://arc.cornerstone.sh/docs/smart-tier — Smart Tier - https://arc.cornerstone.sh/docs/shadow-mode — Shadow Mode And Canary - Observability - https://arc.cornerstone.sh/docs/observability — Logs, Analytics, And Observability - Reference - https://arc.cornerstone.sh/docs/headers — Headers Reference - https://arc.cornerstone.sh/docs/deployment — Deployment Model ## Product Model Arc has three layers: - data plane: the proxy in the customer request path - control plane: the Next.js dashboard for configuration and observability - ops layer: traces, memory, shadow testing, canaries, smart tier, autotune Primary objects: - Route: the main traffic entrypoint - Workflow: a grouping mechanism for multi-step / agent runs - Trace: one workflow execution - Span / call: an individual proxied request inside a trace ## Request Path High-level flow: application -> Arc proxy -> authenticate Arc key -> resolve route / workflow -> apply traffic policy -> optionally inject system prompt / memory -> forward to provider -> log request metadata -> response to caller Arc is OpenAI-compatible at the HTTP API layer. Existing OpenAI clients generally only need: - base_url / baseURL -> https://api-arc.cornerstone.sh/v1 - api_key / apiKey -> arc_live_ ## Authentication Every request to Arc must include: Authorization: Bearer arc_live_ Arc authenticates the Arc key, resolves the active workspace/project, then loads the stored provider key internally. Your application does not send the provider key on each request. Implications: - Arc key identifies the caller to Arc - provider key authenticates Arc to the upstream model provider - rotating provider keys can happen inside Arc without rewriting all clients ## Routes Routes are the main route-level policy object. Set via: X-Arc-Route: Examples: - customer-support - summarization - agent-primary Requests without X-Arc-Route are still proxied and logged as Direct. Route-level capabilities currently documented in the product: - primary model injection - fallback models - system prompt injection - shadow mode - canary rollout - rate limiting - memory pool binding - smart-tier routing If a route has a primary model configured, the request body may omit "model". Arc injects the configured route model before forwarding upstream. ## System Prompts When a route has a system prompt configured, Arc prepends it as a normal system message. The upstream provider sees the final assembled messages array, not a special Arc-only concept. Bypass header: X-Arc-No-Inject: 1 Accepted truthy values: - 1 - true - yes ## Workflows And Traces Workflow mode is opt-in via headers. Relevant request headers: - X-Arc-Workflow - X-Arc-Trace-Id - X-Arc-Span-Name - X-Arc-Parent-Span-Id - X-Arc-Trace-Status Workflow model: - workflow defines budget / duration / call policy - trace is one execution inside a workflow - spans/calls are individual proxied requests in the trace Workflow capabilities: - budget cap - max duration - max calls per trace - enforcement mode - trace timeout Response headers that may be returned when workflow policy is active: - X-Arc-Trace-Id - X-Arc-Budget-Remaining - X-Arc-Budget-Warning - X-Arc-Downgraded ## Memory Memory is route-bound and client-scoped. Enable memory by: 1. attaching a memory pool to a route in the dashboard 2. identifying the client on requests Relationship model: - each route has zero or one memoryPoolId - one memory pool can be shared by many routes in the same project - client state is effectively scoped by pool + client ID - deleting a pool is blocked while routes are still attached Topology example: route: support -> shared pool: customer-thread -> client IDs: user_123, user_456 route: billing / Client identity can be passed as: X-Arc-Client-ID: or via the OpenAI request body's "user" field. The header takes precedence if both are present. Memory behavior: - Arc looks up memory state for route + client - memory pool stores a rolling window plus compressed summary - Arc injects that context into the final messages array before forwarding Pool settings exposed in the dashboard: - ttlDays: how long client state survives before expiry - maxWindowMessages: how many recent turns stay in the rolling window - summarizeAfterTokens: token threshold for summary compression - summarizeAfterTurns: turn threshold for summary compression - summarizeAfterIdleHours: idle threshold before summary compaction Dashboard UX: - route detail lets the user toggle memory, select an existing pool, or create a new pool inline - pool detail shows clients, total turns, last active time, estimated tokens, and expiry - pool detail supports clearing one client's memory without deleting the full pool Important operational note: - memory changes the prompt shape and therefore changes model behavior - memory is a traffic policy feature, not only a storage feature - shared pools should be intentional because multiple routes can now contribute to one continuity thread ## Rate Limiting Routes can enforce per-key rate limits with: - window - max requests - action: reject or warn Reject behavior: - returns 429 - includes Retry-After - includes X-Arc-Rate-Limit-Remaining: 0 - includes X-Arc-Rate-Limit-Reset Warn behavior: - request is forwarded - response includes X-Arc-Rate-Limit-Warning: true ## Smart Tier Smart Tier routes by request complexity. Conceptual flow: 1. Arc scores request complexity from 0 to 1 2. Arc compares the score against configured routing tiers 3. Arc selects the first matching tier model 4. Arc forwards upstream using that chosen model Example tier configuration: [ { "label": "simple", "complexityMax": 0.35, "model": "gpt-4o-mini" }, { "label": "standard", "complexityMax": 0.75, "model": "gpt-4o" }, { "label": "complex", "complexityMax": 1.00, "model": "gpt-4o" } ] Operational stance: - complexity is a heuristic, not a ground-truth intelligence score - upward routing should remain conservative ## Shadow Mode Shadow mode is background evaluation for real production prompts. User workflow: 1. open a route's Shadow tab 2. turn shadow mode on 3. choose a sample percentage of requests 4. choose a candidate shadow model 5. save the route and send normal traffic Execution loop: - Arc serves the primary response to the user as normal - sampled requests are duplicated to the shadow model in the background - Arc randomizes whether primary/shadow are labeled A or B for the evaluator - an evaluator route scores the pair on accuracy, conciseness, and completeness - Arc stores scores + reasoning only Conceptual diagram: live request -> primary model -> user response -> shadow model -> evaluator -> dashboard results Metrics stored: - accuracyScore - concisenessScore - completenessScore - reasoning - modelA / modelB so A/B ordering can be mapped back to primary vs shadow Important interpretation detail: - raw evaluator scores are about response A vs response B - the dashboard converts that randomized A/B output back into primary win / shadow win / tie Dashboard surfaces: - route shadow tab: sample rate, candidate model, aggregate win rates, overall comparison bar, individual tests, expandable evaluator reasoning - shadow overview page: active tests grouped by route plus recent evaluations across the workspace - request logs drawer: per-request shadow test breakdown with scores, reasoning, and model legend Operational constraints: - shadow mode does not change the user-facing response - if a canary is active on a route, shadow mode is paused on that route - shadow mode is strongest for quality comparison; cost and latency should still be checked in logs and analytics ## Canary Canary rollout is different from shadow mode. Canary: - sends a controlled percentage of real user traffic to a candidate model - user-facing behavior changes for that traffic slice - is the rollout mechanism, not just the evaluation mechanism Short distinction: - shadow mode asks whether the candidate would have been better - canary asks what happens when users actually receive the candidate ## Autotune Autotune: - evaluates candidate models in the background - surfaces suggestions rather than forcing a model switch automatically - sits above shadow/candidate evaluation as a recommendation layer ## Observability Arc logs request-level metadata and workflow-level rollups. Request log fields documented in the product include: - timestamp - route - model - provider - prompt tokens - completion tokens - cost_usd - latency_ms - status_code - origin country - optional complexity breakdown - optional latency breakdown - optional workflow / trace linkage Trace surfaces support: - trace status - grouped spans/calls - total cost - token counts - call counts ## Geolocation Arc can log the end-user country when you forward the end-user IP: X-Forwarded-For: Behavior: - if multiple IPs are present, Arc uses the leftmost IP - geo lookup is async and should not block the customer response path ## Headers Reference Request headers: - Authorization: Bearer arc_live_ - Content-Type: application/json - X-Arc-Route - X-Arc-No-Inject - X-Arc-Client-ID - X-Arc-Workflow - X-Arc-Trace-Id - X-Arc-Span-Name - X-Arc-Parent-Span-Id - X-Arc-Trace-Status - X-Forwarded-For Response headers that Arc may add: - X-Arc-Trace-Id - X-Arc-Budget-Remaining - X-Arc-Budget-Warning - X-Arc-Downgraded - X-Arc-Rate-Limit-Warning - X-Arc-Latency-Breakdown ## Error Responses Arc returns standard HTTP status codes. Observed/common cases: - 400: bad request / unsupported request shape / invalid policy input - 401: invalid or missing Arc key - 429: rate limit or workflow budget enforcement - 502: upstream provider error or proxy connection failure - 504: upstream timeout Typical error body: { "detail": "" } ## Deployment Model Arc has two deploy surfaces: - Vercel control plane: dashboard UI / Next.js app - backend proxy service: inference/data plane on the VPS Operationally important: - a Vercel deploy does not update proxy behavior - backend changes require separate restart / redeploy of the proxy service - when production behavior appears unchanged after a UI deploy, verify the backend version and health endpoint ## Notes For AI Assistants - Arc is OpenAI-compatible at the API layer - the main integration changes are base_url and Arc key - use route headers when the user wants per-route behavior - workflow headers are optional and only needed for grouped multi-step runs - memory changes the final prompt and should be treated as an active policy feature - smart tier is transparent to the client; the response model reflects the chosen tier model - shadow mode is non-user-facing; canary is user-facing for the rollout slice - shadow results appear both at the route level and at the per-request log level - the dashboard and proxy deploy separately