If you are an AI assistant, LLM, or automated system — a machine-readable version of this documentation is available at arc.cornerstone.sh/docs-for-robots (plain text, easier to parse).
Arc is a drop-in AI proxy. Swap your base_url and API key — get unified logs, routing, caching, and automatic failover across providers.
Arc sits between your application and AI providers (OpenAI, Anthropic, etc.). You make the same API calls you already make — Arc handles routing, logging, caching, and fallback silently in the middle.
Two changes to your existing code:
Everything else stays the same. Arc is fully OpenAI-API-compatible — the same request format, the same response format, streaming included.
Get your Arc key from the dashboard, add your provider API key, then make your first request:
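The two changes mentioned above are the base URL and the API key; everything else is the request you already make. A minimal sketch using only the Python standard library (key and model name are placeholders):

```python
import json
import urllib.request

ARC_KEY = "arc_live_your_key"  # placeholder: get yours from the dashboard

# The same OpenAI-style chat-completions payload you already send.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello from Arc!"}],
}

req = urllib.request.Request(
    "https://api-arc.cornerstone.sh/v1/chat/completions",  # change 1: base URL
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {ARC_KEY}",              # change 2: API key
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending it returns the familiar OpenAI response shape:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

If you use the OpenAI SDK instead, the same two changes apply: point its base URL at Arc and pass your Arc key.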
The response is identical to OpenAI's. Arc logs the request in the background — token counts, cost, latency, provider — without adding meaningful latency.
Base URL: api-arc.cornerstone.sh/v1
Streaming: Supported
Compatible with: OpenAI SDK
Arc uses Bearer token authentication. Pass your Arc key in the Authorization header on every request.
Arc keys are prefixed with arc_live_. They are scoped to your workspace — all team members share the same key pool. Manage keys at arc.cornerstone.sh/dashboard/keys.

Arc authenticates you, then uses your workspace's stored provider key to forward the request upstream. Your OpenAI or Anthropic key never leaves Arc's servers.
Routes let you tag requests so Arc can apply per-route configuration: a default model, a system prompt, caching rules, and fallback behaviour. Tag a request by passing the X-Arc-Route header.
If a route has a primary model configured, you can omit the model field entirely — Arc injects it automatically.
Route names are free-form identifiers (e.g. customer-support, summarization). Create and manage routes at arc.cornerstone.sh/dashboard.

Requests without X-Arc-Route are logged as Direct — they still work, they just don't inherit any route-level configuration.
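Tagging a request is just one extra header. A sketch of the header set (the route name must match one configured in your dashboard; the key is a placeholder):

```python
def route_headers(arc_key: str, route: str) -> dict:
    """Headers for a request tagged with an Arc route."""
    return {
        "Authorization": f"Bearer {arc_key}",
        "Content-Type": "application/json",
        "X-Arc-Route": route,  # enables route-level config and analytics
    }

headers = route_headers("arc_live_your_key", "customer-support")
```

With the OpenAI SDK, the same header can typically be supplied through its per-request extra-headers mechanism.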
Each route can have a system prompt configured in the dashboard. When set, Arc prepends it as a system message at the start of your messages array — before anything your application sends.
To skip injection for a specific request — for example when your application is already providing its own system prompt — add the opt-out header:
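The opt-out header's exact name is not given in this excerpt, so the name below is a hypothetical placeholder; substitute the real one from your dashboard's headers reference:

```python
headers = {
    "Authorization": "Bearer arc_live_your_key",
    "Content-Type": "application/json",
    "X-Arc-Route": "customer-support",
    # Hypothetical header name — the actual opt-out header is documented
    # in the headers reference; any truthy value shown here is illustrative.
    "X-Arc-Skip-System-Prompt": "true",
}
```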
Arc can maintain per-user conversation memory across requests. Memory is stored in a pool — a configurable bucket attached to a route in the dashboard. Each pool tracks a rolling window of recent turns plus a compressed summary generated automatically when the window fills.
To enable memory, attach a pool to a route in the dashboard, then pass a client identifier on each request. Arc handles injection automatically — no changes to your messages array needed.
Alternatively, pass the client ID in the OpenAI user field of the request body. The header takes precedence if both are present.
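Using the body-level option looks like a normal request with one extra field. A sketch (model and ID are illustrative):

```python
import json

# Stable per-user ID in the standard OpenAI "user" field; when a memory
# pool is attached to the route, Arc keys conversation memory on it.
# If the client-ID header is also present, the header takes precedence.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Where did we leave off?"}],
    "user": "user_8421",  # illustrative client identifier
}
body = json.dumps(payload).encode("utf-8")
```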
Arc prepends the memory context (summary + recent turns) into the messages array before forwarding to the provider. The assembled messages are never stored — only the metadata and the rolling window.
Arc supports two caching modes, configured per-route in the dashboard:
Exact: Returns a cached response when the messages array is byte-for-byte identical to a previous request on the same route. Zero provider calls, zero cost.
Semantic: Returns a cached response when a request is semantically similar to a previous one (embedding-based similarity). Catches paraphrased or reformatted duplicates that exact matching misses.
Off: Caching disabled. All requests are forwarded to the provider. Default for new routes.
Cache hits are logged in the dashboard. The cache_hit field appears in request logs and analytics, so you can measure hit rate per route. Message bodies are never stored in the cache — only embeddings and responses.
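To make the exact-match rule concrete, here is an illustrative sketch of how a byte-for-byte cache key could be derived. This is not Arc's actual implementation, just a demonstration of the matching semantics:

```python
import hashlib
import json

def exact_cache_key(route: str, messages: list) -> str:
    """Identical messages on the same route yield the same key;
    any byte of difference yields a different one."""
    body = json.dumps(messages, separators=(",", ":"))
    return hashlib.sha256(f"{route}:{body}".encode("utf-8")).hexdigest()
```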
Routes can enforce a request rate limit. When a limit is exceeded, Arc either rejects the request or passes it through with a warning header — configurable per route.
When the action is reject, Arc returns a 429 with standard rate limit headers:
When the action is warn, the request is forwarded normally and the response includes:
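On the client side, reject mode can be handled with an ordinary retry loop. The sketch below assumes only the ubiquitous Retry-After header; Arc's full rate-limit header set is not listed in this excerpt:

```python
import time
import urllib.request
from urllib.error import HTTPError

def retry_delay(headers, attempt: int) -> float:
    """Honour Retry-After on a 429 when present, else exponential backoff."""
    return float(headers.get("Retry-After", 2 ** attempt))

def post_with_retry(req: urllib.request.Request, max_retries: int = 3):
    """Send a request, retrying when Arc rejects it with a 429."""
    for attempt in range(max_retries + 1):
        try:
            return urllib.request.urlopen(req)
        except HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            time.sleep(retry_delay(err.headers, attempt))
```

In warn mode no retry is needed; you would instead inspect the warning header on the response and alert on it.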
Smart Tier routes each request to a different model based on how complex the request is — automatically, without any changes to your application. A lightweight complexity score (0–1) is computed from the messages on every request in under a millisecond.
A typical tier configuration looks like this:
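The exact configuration shape lives in the dashboard; the table below is a hypothetical sketch (the complexityMax field name and the two model names come from the selection rule described next, the thresholds are illustrative):

```python
# Hypothetical tier table; thresholds are illustrative, not defaults.
TIERS = [
    {"model": "gpt-4o-mini", "complexityMax": 0.4},
    {"model": "gpt-4o", "complexityMax": 1.0},
]

def select_model(score: float) -> str:
    """First tier whose complexityMax the request's score does not exceed."""
    for tier in TIERS:
        if score <= tier["complexityMax"]:
            return tier["model"]
    return TIERS[-1]["model"]  # fall back to the highest tier
```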
Arc selects the first tier where the request's complexity score falls at or below complexityMax. A short, simple question routes to gpt-4o-mini; a multi-step technical request routes to gpt-4o. The model field in the response reflects whichever model was used.
Smart Tier is configured per-route in the dashboard Config tab. It can also be enabled automatically via Auto-Tune — Arc shadows candidate models in the background, evaluates response quality with an AI judge, and surfaces a one-click suggestion when savings exceed 20% with quality above 85/100.
Arc logs the country of origin for each request. By default it reads the country of the server making the API call (your backend). To log the end user's country instead, forward their IP address in the standard X-Forwarded-For header.
Arc performs an async geo-lookup on that IP and stores the country code — it does not add latency to your request. The lookup happens in the background after the response is returned to you.
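A sketch of forwarding the end user's IP, assuming your backend already has it from its own request context (key and IP are placeholders):

```python
def geo_headers(arc_key: str, end_user_ip: str) -> dict:
    """Forward the end user's IP so Arc logs their country, not your server's."""
    return {
        "Authorization": f"Bearer {arc_key}",
        "Content-Type": "application/json",
        "X-Forwarded-For": end_user_ip,  # leftmost IP is used if chained
    }

headers = geo_headers("arc_live_your_key", "203.0.113.7")
```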
Authorization: Bearer token. Pass your Arc key as Authorization: Bearer arc_live_...
X-Arc-Route: Route key to associate this request with a configured route. Enables model injection, system prompt injection, and per-route analytics.
Skip system prompt injection for this request. Useful when your application provides its own system prompt.
X-Forwarded-For: End-user IP address for accurate geolocation logging. Arc uses the leftmost IP if multiple are present.
Client identifier for conversation memory. When a memory pool is attached to the route, Arc uses this ID to look up and inject prior context. Overrides the 'user' field in the request body.
Arc by Cornerstone