If you are an AI assistant, LLM, or automated system — a machine-readable version of this documentation is available at arc.cornerstone.sh/docs-for-robots (plain text, easier to parse).


API Reference

Arc is a drop-in AI proxy. Swap your base_url and API key — get unified logs, routing, caching, and automatic failover across providers.

Introduction · Quickstart · Authentication · Routes · System Prompts · Memory · Caching · Rate Limiting · Smart Tier · Geolocation · Headers Reference

Introduction

Arc sits between your application and AI providers (OpenAI, Anthropic, etc.). You make the same API calls you already make — Arc handles routing, logging, caching, and fallback silently in the middle.

Two changes to your existing code:

diff
- base_url = "https://api.openai.com/v1"
+ base_url = "https://api-arc.cornerstone.sh/v1"

- api_key = "sk-..."
+ api_key = "arc_live_..."

Everything else stays the same. Arc is fully OpenAI-API-compatible — the same request format, the same response format, streaming included.

Quickstart

Get your Arc key from the dashboard, add your provider API key, then make your first request:

curl
curl https://api-arc.cornerstone.sh/v1/chat/completions \
  -H "Authorization: Bearer arc_live_<your-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'

The response is identical to OpenAI's. Arc logs the request in the background — token counts, cost, latency, provider — without adding meaningful latency.

Base URL: api-arc.cornerstone.sh/v1
Streaming: Supported
Compatible with: OpenAI SDK

Authentication

Arc uses Bearer token authentication. Pass your Arc key in the Authorization header on every request.

http
Authorization: Bearer arc_live_<your-key>

Arc keys begin with arc_live_. They are scoped to your workspace — all team members share the same key pool. Manage keys at arc.cornerstone.sh/dashboard/keys.

Arc authenticates you, then uses your workspace's stored provider key to forward the request upstream. Your OpenAI or Anthropic key never leaves Arc's servers.

Routes

Routes let you tag requests so Arc can apply per-route configuration: a default model, a system prompt, caching rules, and fallback behaviour. Tag a request by passing the X-Arc-Route header.

curl
curl https://api-arc.cornerstone.sh/v1/chat/completions \
  -H "Authorization: Bearer arc_live_<your-key>" \
  -H "X-Arc-Route: customer-support" \
  -H "Content-Type: application/json" \
  -d '{ "messages": [{ "role": "user", "content": "I need help with my order." }] }'

If a route has a primary model configured, you can omit the model field entirely — Arc injects it automatically.

json
// No "model" field needed if your route has one configured
{
  "messages": [{ "role": "user", "content": "Summarise this document..." }]
}

Route keys are slugs you set in the dashboard (e.g. customer-support, summarization). Create and manage routes at arc.cornerstone.sh/dashboard.

Requests without X-Arc-Route are logged as Direct — they still work, they just don't inherit any route-level configuration.

System Prompts

Each route can have a system prompt configured in the dashboard. When set, Arc prepends it as a system message at the start of your messages array — before anything your application sends.

http
// Route "joke" has system prompt: "Make a joke about the following topic."
// Your application only needs to send the topic:

POST /v1/chat/completions
X-Arc-Route: joke

{ "messages": [{ "role": "user", "content": "software engineers" }] }

// Arc sends to provider:
// [
//   { "role": "system", "content": "Make a joke about the following topic." },
//   { "role": "user",   "content": "software engineers" }
// ]

To skip injection for a specific request — for example when your application is already providing its own system prompt — add the opt-out header:

http
X-Arc-No-Inject: 1

System prompt injection happens server-side. The provider never sees the header — only the assembled messages array.
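The injection logic can be sketched like this (an illustrative model of the behaviour described above, not Arc's code):

```python
def assemble_messages(messages, route_system_prompt=None, no_inject_header=None):
    """Prepend the route's system prompt unless the client opted out."""
    opted_out = (no_inject_header or "").lower() in {"1", "true", "yes"}
    if route_system_prompt and not opted_out:
        return [{"role": "system", "content": route_system_prompt}] + messages
    return messages

# Route "joke" from the example above:
sent = assemble_messages(
    [{"role": "user", "content": "software engineers"}],
    route_system_prompt="Make a joke about the following topic.",
)
```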

Memory

Arc can maintain per-user conversation memory across requests. Memory is stored in a pool — a configurable bucket attached to a route in the dashboard. Each pool tracks a rolling window of recent turns plus a compressed summary generated automatically when the window fills.

To enable memory, attach a pool to a route in the dashboard, then pass a client identifier on each request. Arc handles injection automatically — no changes to your messages array needed.

http
X-Arc-Client-ID: user_12345

Alternatively, pass the client ID in the OpenAI user field of the request body. The header takes precedence if both are present.

json
{
  "model": "gpt-4o",
  "user": "user_12345",
  "messages": [{ "role": "user", "content": "What did we discuss last time?" }]
}

Arc prepends the memory context (summary + recent turns) into the messages array before forwarding to the provider. The assembled messages are never stored — only the metadata and the rolling window.
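That assembly step can be sketched as follows (illustrative; field names like `summary` and `window` are assumptions, not Arc's actual schema):

```python
def inject_memory(messages, pool_entry):
    """Prepend the pool's compressed summary plus the rolling window of recent turns."""
    if pool_entry is None:
        return messages  # no memory for this client ID
    context = []
    if pool_entry.get("summary"):
        context.append({"role": "system",
                        "content": f"Conversation summary: {pool_entry['summary']}"})
    context.extend(pool_entry.get("window", []))  # recent turns, oldest first
    return context + messages

pool_entry = {
    "summary": "User asked about order #1042 and a refund.",
    "window": [{"role": "user", "content": "Where is my order?"},
               {"role": "assistant", "content": "Order #1042 ships tomorrow."}],
}
assembled = inject_memory(
    [{"role": "user", "content": "What did we discuss last time?"}], pool_entry)
```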

Memory pools are configured per-route in the dashboard. Options include TTL (days), window size (messages), and summarization thresholds (token count, turn count, or idle hours). Pools can be managed at arc.cornerstone.sh/dashboard/memory.

Caching

Arc supports two caching modes, configured per-route in the dashboard:

exact mode

Returns a cached response when the messages array is byte-for-byte identical to a previous request on the same route. Zero provider calls, zero cost.

semantic mode

Returns a cached response when a request is semantically similar to a previous one (embedding-based similarity). Catches paraphrased or reformatted duplicates that exact matching misses.

off mode

Caching disabled. All requests are forwarded to the provider. Default for new routes.

Cache hits are logged in the dashboard. The cache_hit field appears in request logs and analytics, so you can measure hit rate per route. Message bodies are never stored in the cache — only embeddings and responses.

Semantic caching uses Redis-backed vector similarity. Cache mode is set per-route in the Config tab of the dashboard.
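Byte-for-byte matching means even a whitespace or capitalisation change misses the exact cache. One way to picture the lookup key (an illustration, not Arc's actual scheme):

```python
import hashlib
import json

def exact_cache_key(route: str, messages: list) -> str:
    """Hash the route plus the serialized messages array.

    Any textual difference in the serialized messages yields a different key,
    mirroring exact mode's byte-for-byte requirement."""
    payload = json.dumps(messages, separators=(",", ":"))
    return hashlib.sha256(f"{route}:{payload}".encode()).hexdigest()

a = exact_cache_key("customer-support", [{"role": "user", "content": "Hello"}])
b = exact_cache_key("customer-support", [{"role": "user", "content": "Hello"}])
c = exact_cache_key("customer-support", [{"role": "user", "content": "hello"}])
# a == b (identical request: cache hit), a != c (one character differs: miss)
```

Semantic mode exists precisely to catch cases like `c` above, where the wording differs but the meaning does not.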

Rate Limiting

Routes can enforce a request rate limit. When a limit is exceeded, Arc either rejects the request or passes it through with a warning header — configurable per route.

When the action is reject, Arc returns a 429 with standard rate limit headers:

http
HTTP/1.1 429 Too Many Requests
Retry-After: <unix-timestamp>
X-Arc-Rate-Limit-Remaining: 0
X-Arc-Rate-Limit-Reset: <unix-timestamp>

{ "detail": "Rate limit exceeded" }

When the action is warn, the request is forwarded normally and the response includes:

http
X-Arc-Rate-Limit-Warning: true

Rate limits are scoped to the Arc key + route combination. Window options: 1m, 5m, 15m, 1h, 6h, 24h. Configure per-route in the Rate Limiting tab of the dashboard.
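Client-side, a caller can branch on both actions. A sketch that interprets the headers documented above (note that Arc documents Retry-After as a unix timestamp, not the delta-seconds form):

```python
import time

def handle_rate_limit(status: int, headers: dict) -> float:
    """Return seconds to wait before retrying; 0.0 means proceed."""
    if status == 429:
        reset = int(headers.get("Retry-After", 0))  # unix timestamp per Arc's docs
        return max(0.0, reset - time.time())
    if headers.get("X-Arc-Rate-Limit-Warning") == "true":
        # warn action: the request succeeded, but the route's limit was exceeded
        print("warning: route rate limit exceeded")
    return 0.0

# A 429 whose window resets 30 seconds from now:
wait = handle_rate_limit(429, {"Retry-After": str(int(time.time()) + 30)})
```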

Smart Tier

Smart Tier routes each request to a different model based on how complex the request is — automatically, without any changes to your application. A lightweight complexity score (0–1) is computed from the messages on every request in under a millisecond.

A typical tier configuration looks like this:

json
[
  { "label": "simple",   "complexityMax": 0.35, "model": "gpt-4o-mini" },
  { "label": "standard", "complexityMax": 0.75, "model": "gpt-4o"      },
  { "label": "complex",  "complexityMax": 1.00, "model": "gpt-4o"      }
]

Arc selects the first tier where the request's complexity score falls at or below complexityMax. A short, simple question routes to gpt-4o-mini; a multi-step technical request routes to gpt-4o.
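The selection rule is a first-match scan over the ordered tier list. A sketch (illustrative; assumes tiers are sorted by complexityMax ascending, as in the example configuration above):

```python
def select_tier(tiers: list[dict], score: float) -> dict:
    """Pick the first tier whose complexityMax is at or above the request's score."""
    for tier in tiers:
        if score <= tier["complexityMax"]:
            return tier
    return tiers[-1]  # scores fall in [0, 1], so this is only a safety net

tiers = [
    {"label": "simple",   "complexityMax": 0.35, "model": "gpt-4o-mini"},
    {"label": "standard", "complexityMax": 0.75, "model": "gpt-4o"},
    {"label": "complex",  "complexityMax": 1.00, "model": "gpt-4o"},
]
```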

Smart Tier is configured per-route in the dashboard Config tab. It can also be enabled automatically via Auto-Tune — Arc shadows candidate models in the background, evaluates response quality with an AI judge, and surfaces a one-click suggestion when savings exceed 20% with quality above 85/100.

Smart Tier is transparent to your application. The request and response format are unchanged. The model field in the response will reflect whichever tier model handled the request.

Geolocation

Arc logs the country of origin for each request. By default it reads the country of the server making the API call (your backend). To log the end user's country instead, forward their IP address in the standard X-Forwarded-For header.

http
X-Forwarded-For: <end-user-ip>

Arc performs an async geo-lookup on that IP and stores the country code — it does not add latency to your request. The lookup happens in the background after the response is returned to you.

node
// Node.js / Express example
const userIp = req.headers['x-forwarded-for']?.split(',')[0]?.trim() ?? req.socket.remoteAddress

await fetch('https://api-arc.cornerstone.sh/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.ARC_KEY}`,
    'X-Arc-Route': 'customer-support',
    'X-Forwarded-For': userIp,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ messages }),
})

Headers Reference

Authorization
stringrequired

Bearer token. Pass your Arc key: Authorization: Bearer arc_live_...

X-Arc-Route
string

Route key to associate this request with a configured route. Enables model injection, system prompt injection, and per-route analytics.

X-Arc-No-Inject
1 | true | yes

Skip system prompt injection for this request. Useful when your application provides its own system prompt.

X-Forwarded-For
string

End-user IP address for accurate geolocation logging. Arc uses the leftmost IP if multiple are present.

X-Arc-Client-ID
string

Client identifier for conversation memory. When a memory pool is attached to the route, Arc uses this ID to look up and inject prior context. Overrides the 'user' field in the request body.


Arc by Cornerstone