Operator runbook: adding a new LLM provider
Two paths, depending on how much the new provider looks like OpenAI:
- OpenAI-compatible provider (e.g. Fireworks, Together, Groq, DeepSeek, Z.ai): no code change needed. Add keys via the dashboard, optionally set the baseURL env, done.
- Native-API provider (e.g. new Anthropic-shape endpoint, new streaming protocol): requires a new adapter under packages/gateway/src/providers/. Estimate: 1–2 hours.
Path A — OpenAI-compatible
1. Add the API key via /dashboard/api-keys on the affected tenant. Name it after the provider (e.g. FIREWORKS_API_KEY). The key is AES-256-GCM encrypted at rest with PROVARA_MASTER_KEY.
2. Set the base URL via env if it's not api.openai.com. Example for Fireworks:

       # Railway env var on provara-gateway
       FIREWORKS_BASE_URL=https://api.fireworks.ai/inference/v1

3. Register the provider by name. If it's not already in the autoregister list in packages/gateway/src/providers/index.ts, add an entry:

       registerOpenAICompatible({
         name: "fireworks",
         apiKeyEnv: "FIREWORKS_API_KEY",
         baseUrlEnv: "FIREWORKS_BASE_URL",
         defaultBaseUrl: "https://api.fireworks.ai/inference/v1",
       });

4. Add pricing in packages/gateway/src/cost/pricing.ts (MODEL_PRICING record). Use per-1M-token [input, output] in USD. Missing entries default to $0 cost, which will break analytics rollups, so don't skip this.
5. Deploy. The provider's models are discovered at startup via the registry's refreshModels hook.
6. Verify. Send a completion pinned to the new provider:

       curl -X POST https://gateway.provara.xyz/v1/chat/completions \
         -H "Authorization: Bearer <your-token>" \
         -H "Content-Type: application/json" \
         -d '{"model":"accounts/fireworks/models/llama-v3p3-70b","messages":[{"role":"user","content":"hi"}]}'
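
For the pricing step, the per-1M-token arithmetic can be sketched as follows. This is a minimal illustration, not the gateway's actual code: the `estimateCostUsd` helper, the `CostResult` shape, and the price values are all hypothetical, and only the `[input, output]` per-1M-token convention and the $0 default for missing entries come from this runbook.

```typescript
// Hypothetical entry shape: [inputUsdPer1M, outputUsdPer1M], as described in the pricing step.
type PriceEntry = readonly [input: number, output: number];

// Placeholder prices for illustration only; these are NOT real provider rates.
const MODEL_PRICING: Record<string, PriceEntry> = {
  "accounts/fireworks/models/llama-v3p3-70b": [0.9, 0.9],
};

interface CostResult {
  usd: number;
  priced: boolean; // false when the model has no pricing entry
}

// Hypothetical helper: converts token counts to USD using per-1M-token prices.
function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
  pricing: Record<string, PriceEntry> = MODEL_PRICING,
): CostResult {
  const entry = pricing[model];
  if (entry === undefined) {
    // Mirrors the $0 default for missing entries, but flags it so the gap is visible.
    return { usd: 0, priced: false };
  }
  const [inputPer1M, outputPer1M] = entry;
  const usd = (inputTokens * inputPer1M + outputTokens * outputPer1M) / 1_000_000;
  return { usd, priced: true };
}
```

The `priced` flag is one possible way to tell "free" apart from "no pricing entry", which is the failure mode that skews analytics rollups when a model is missing from the record.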
Path B — Native-API provider
Use packages/gateway/src/providers/anthropic.ts as the cleanest template; it's a native adapter that also handles streaming + token counting.
- Create packages/gateway/src/providers/<provider>.ts implementing the Provider interface from ./types.ts (complete, completeStream, optional discoverModels).
- Handle streaming. This is the trickiest part. Watch for first-chunk-fallback (empty chunks from the upstream API shouldn't be forwarded as SSE). The adapter must synthesize the _provara meta event at the end so the gateway knows the stream finished cleanly.
- Add to the registry in packages/gateway/src/providers/index.ts under the same register(...) pattern other native adapters use.
- Pricing in cost/pricing.ts as with Path A.
- Tests under packages/gateway/tests/providers/<provider>.test.ts. At minimum, assert the adapter handles the 200 OK, 4xx error, and stream-aborted cases.
- Deploy and verify as in Path A steps 5 and 6.
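
The streaming requirements above (drop empty upstream chunks, emit a final _provara event only on a clean finish) can be sketched as an async generator. The `UpstreamChunk` and `StreamEvent` shapes, the `toGatewayEvents` and `toSseFrame` names, and the payload fields of the meta event are assumptions made for illustration; the real types live in ./types.ts and may differ.

```typescript
// Hypothetical shapes for illustration; check ./types.ts for the real ones.
interface UpstreamChunk {
  text: string;
}

type StreamEvent =
  | { type: "delta"; content: string }
  | { type: "_provara"; finished: true; chunks: number };

// Wraps an upstream stream: skips empty chunks and appends a closing meta event.
async function* toGatewayEvents(
  upstream: AsyncIterable<UpstreamChunk>,
): AsyncGenerator<StreamEvent> {
  let forwarded = 0;
  for await (const chunk of upstream) {
    if (chunk.text.length === 0) continue; // never forward empty chunks
    forwarded += 1;
    yield { type: "delta", content: chunk.text };
  }
  // Reached only if the upstream ended without throwing, so an aborted
  // stream never produces the meta event.
  yield { type: "_provara", finished: true, chunks: forwarded };
}

// Serializes one event as a server-sent-events frame (a "data:" line plus a blank line).
function toSseFrame(event: StreamEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}
```

Because the meta event is yielded after the loop, its absence is what signals a stream that was cut off, which lines up with the stream-aborted test case listed above.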
Common gotchas
- Dashboard shows the provider but routing never picks it. The adaptive router requires MIN_SAMPLES (default 5) of feedback before a cell routes to a model. Force traffic via a pinned model/provider for a few completions + judge sample to bootstrap. Set PROVARA_MIN_SAMPLES=2 temporarily for faster cold-start.
- Tokens reported wrong. The OpenAI-compatible adapter trusts the usage block in the upstream response. If a provider returns null for input_tokens, Provara falls back to counting characters / 4, which is approximate. Patch the provider-specific adapter to do better counting if accurate cost attribution matters.
- Streaming cuts off early. Often a keepalive timeout at Railway's edge. Either shorten PROVARA_STREAM_TIMEOUT_MS so client-side retry kicks in earlier, or disable streaming for that provider until it's debugged.
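
The token fallback in the second gotcha can be illustrated with a short sketch. The `Usage` shape, the `resolveInputTokens` helper, and the choice to round up are assumptions; the runbook only states that a null input_tokens falls back to characters / 4.

```typescript
// Hypothetical usage shape; only the nullable input_tokens field is taken from the runbook.
interface Usage {
  input_tokens: number | null;
  output_tokens: number | null;
}

// Rough estimate: about four characters per token. Rounding up is an assumption.
// Note that String.length counts UTF-16 code units, not user-visible characters.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Prefers the upstream-reported count and falls back to the estimate when it is missing.
function resolveInputTokens(
  usage: Usage | undefined,
  promptText: string,
): { tokens: number; estimated: boolean } {
  const reported = usage?.input_tokens;
  if (typeof reported === "number") {
    return { tokens: reported, estimated: false };
  }
  return { tokens: estimateTokens(promptText), estimated: true };
}
```

Carrying an `estimated` flag alongside the count is one way to let cost reports mark which figures came from the approximation.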