One OpenAI-compatible endpoint, two lanes — cloud (Crazyrouter, 247 models) and local (this machine's GPU). No API key needed; auth is handled server-side. Docs also at docs.llm.hostbun.cc.
| Method | Path | Lane | Purpose |
|---|---|---|---|
| POST | /v1/chat/completions | cloud | Chat — streaming, tools, structured outputs, vision (any of 247 models) |
| POST | /v1/completions · /v1/embeddings · /v1/images/generations · /v1/audio/* · /v1/rerank | cloud | Full Crazyrouter OpenAI-compatible surface (key injected) |
| GET | /v1/models | cloud | List cloud models |
| POST | /local/v1/chat/completions | local | Chat with the local machine (gemma on the GPU) |
| GET | /local/v1/models | local | List local models |
| GET | /prices.json · /prices | | Computed actual prices (discount applied); refreshes every 6h |
| GET | /dashboard/billing/subscription | | Credit limit (hard_limit_usd) |
| GET | /dashboard/billing/usage | | Usage to date |
| GET | /api/pricing | cloud | Raw Crazyrouter pricing (ratios/discount) |
| GET | /docs | | This page |
Identical OpenAI API on both lanes — the only difference is the base URL path: /v1 for cloud, /local/v1 to talk to the local machine.
| Lane | Base URL | Models |
|---|---|---|
| cloud | https://llm.hostbun.cc/v1 | 247 (Claude, GPT, Gemini, DeepSeek, Qwen, Doubao, Grok…) |
| local | https://llm.hostbun.cc/local/v1 | google/gemma-4-26b-a4b (free, on-prem, vision) |
# cloud
curl https://llm.hostbun.cc/v1/chat/completions -H "Content-Type: application/json" \
-d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"hi"}]}'
# local machine
curl https://llm.hostbun.cc/local/v1/chat/completions -H "Content-Type: application/json" \
-d '{"model":"google/gemma-4-26b-a4b","messages":[{"role":"user","content":"hi"}]}'
from openai import OpenAI # api_key is any placeholder (gateway injects it)
cloud = OpenAI(base_url="https://llm.hostbun.cc/v1", api_key="x")
local = OpenAI(base_url="https://llm.hostbun.cc/local/v1", api_key="x")
cloud.chat.completions.create(model="gpt-4o", messages=[{"role":"user","content":"hi"}])
local.chat.completions.create(model="google/gemma-4-26b-a4b", messages=[{"role":"user","content":"hi"}])
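Since the two lanes differ only in the base URL path, lane selection can be reduced to a lookup on the model id. The following is a minimal sketch, not part of the gateway itself: the helper names `base_url_for` and `chat` are hypothetical, and it assumes that the pinned local model id is the only one that should go to the local lane.

```python
GATEWAY = "https://llm.hostbun.cc"
LOCAL_MODEL = "google/gemma-4-26b-a4b"  # the pinned local model from the lane table


def base_url_for(model: str) -> str:
    """Return the lane base URL for a model id (hypothetical helper).

    Assumption: only the pinned local model is sent to /local/v1;
    every other id goes to the cloud lane at /v1.
    """
    lane_path = "/local/v1" if model == LOCAL_MODEL else "/v1"
    return GATEWAY + lane_path


def chat(model: str, prompt: str) -> str:
    """Send one user message on the matching lane and return the reply text."""
    # Imported lazily so base_url_for() stays usable without the SDK installed.
    from openai import OpenAI

    # api_key is a placeholder; the gateway injects the real key.
    client = OpenAI(base_url=base_url_for(model), api_key="x")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

With this in place, `chat("gpt-4o", "hi")` would use the cloud lane and `chat("google/gemma-4-26b-a4b", "hi")` the local one.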
The cloud lane is an OpenAI-compatible aggregator (upstream api.crazyrouter.com/v1). Everything not under /local, /docs or /prices is proxied to it with the key injected. Capabilities: chat, streaming ("stream":true), tool/function calling, structured outputs, vision, image/video/audio generation, embeddings, rerank. Full upstream reference: crazyrouter.com/docs.
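As an illustration of the tool/function calling capability, the sketch below builds a request body in the standard OpenAI `tools` format and reads tool calls back out of a response. The `get_weather` tool and both helper names are hypothetical, and whether a given upstream model honours tools is an assumption to verify per model.

```python
import json


def tool_call_request(model: str, prompt: str) -> dict:
    """Build a chat request body that offers one (hypothetical) tool."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
        "tool_choice": "auto",
    }


def extract_tool_calls(response: dict) -> list:
    """Return (name, arguments) pairs from an OpenAI-style response dict."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    # The arguments field arrives as a JSON-encoded string, so decode it.
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]
```

The body returned by `tool_call_request` can be POSTed to /v1/chat/completions as JSON; `extract_tool_calls` then operates on the decoded response.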
# streaming
curl -N https://llm.hostbun.cc/v1/chat/completions -H "Content-Type: application/json" \
-d '{"model":"gemini-2.5-pro","stream":true,"messages":[{"role":"user","content":"count to 5"}]}'
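The same streaming request can be consumed from Python with only the standard library. This is a sketch under the assumption that the gateway passes through the usual OpenAI server-sent-event framing (`data: {...}` lines terminated by `data: [DONE]`); the function names `iter_deltas` and `stream_chat` are illustrative.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(model: str, prompt: str,
                base_url: str = "https://llm.hostbun.cc/v1") -> Iterator[str]:
    """POST a streaming chat request and yield text as it arrives."""
    body = json.dumps({
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the response yields one bytes line at a time.
        yield from iter_deltas(line.decode("utf-8") for line in resp)
```

A caller might print fragments as they arrive: `for piece in stream_chat("gemini-2.5-pro", "count to 5"): print(piece, end="", flush=True)`.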
The local lane routes to LM Studio on the pbox box (RTX 4090) via llm.bofrid.dev. It is private (data stays on-prem), free, and fast. The pinned model is google/gemma-4-26b-a4b (26B MoE, vision + tools, about 8K context); requests that name any other model id are still served by it. Vision works (send image_url parts, including base64 data). The Anthropic-style /local/v1/messages and the native /local/api/v1/chat endpoints are reachable too.
curl https://llm.hostbun.cc/local/v1/models
curl https://llm.hostbun.cc/local/v1/chat/completions -H "Content-Type: application/json" \
-d '{"model":"google/gemma-4-26b-a4b","messages":[{"role":"user","content":"describe"}]}'
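For vision on the local lane, the image travels as an `image_url` content part, and a base64 data URL works in place of a public link. The sketch below builds such a message in the standard OpenAI content-part format; the helper names and the default `image/png` MIME type are assumptions for illustration.

```python
import base64


def image_message(prompt: str, image_bytes: bytes,
                  mime: str = "image/png") -> dict:
    """Build a user message with a text part and a base64 image part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Data URL: the image is embedded inline instead of linked.
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }


def vision_request(prompt: str, image_bytes: bytes,
                   model: str = "google/gemma-4-26b-a4b") -> dict:
    """Wrap the message in a chat request body for /local/v1/chat/completions."""
    return {"model": model, "messages": [image_message(prompt, image_bytes)]}
```

The resulting dict can be sent as the JSON body of a POST to /local/v1/chat/completions, or its `messages` list passed to the OpenAI client shown earlier. Given the roughly 8K context, small images are likely the safer choice.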
Actual prices (per-model discount applied, default group). Fetch programmatically:
curl https://llm.hostbun.cc/prices.json # or /prices
# { "generated_at":"…","group_ratio":1,"count":247,
# "models":[{"model":"claude-sonnet-4-6","type":"token",
# "input_per_1m":1.65,"output_per_1m":8.25,"discount":0.55}, …] }
Token models are priced in USD per 1M tokens (input/output). Image, video and audio models are priced in USD per call. The local lane (google/gemma-4-26b-a4b) costs $0. The price list refreshes itself every 6h.
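The per-1M-token prices translate into a request cost by scaling each token count by its rate. A small sketch, using the field names from the /prices.json sample above; the function name is hypothetical, and per-call models are rejected because their price field is not shown in this document.

```python
def estimate_cost_usd(entry: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request for a token-priced model.

    `entry` is one element of the "models" list in /prices.json.
    """
    if entry.get("type") != "token":
        # Per-call pricing fields are not documented here, so do not guess.
        raise ValueError("only token-priced models are supported in this sketch")
    total = (input_tokens * entry["input_per_1m"]
             + output_tokens * entry["output_per_1m"])
    return total / 1_000_000


# Sample entry copied from the /prices.json excerpt above.
sonnet = {"model": "claude-sonnet-4-6", "type": "token",
          "input_per_1m": 1.65, "output_per_1m": 8.25, "discount": 0.55}
cost = estimate_cost_usd(sonnet, input_tokens=2000, output_tokens=500)
```

For the sample entry, 2,000 input tokens and 500 output tokens come to (2000 × 1.65 + 500 × 8.25) / 1,000,000 = $0.007425.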
curl https://llm.hostbun.cc/dashboard/billing/subscription # {"hard_limit_usd":13,…}
curl https://llm.hostbun.cc/dashboard/billing/usage # {"total_usage":…}
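The two billing responses can be combined into a remaining-credit figure. This is a sketch only: the unit of `total_usage` is not stated in this document, so the code assumes the legacy OpenAI dashboard convention of hundredths of a dollar and exposes that as a parameter. Check it against a real response before relying on the result. The function name `remaining_usd` is hypothetical.

```python
def remaining_usd(subscription: dict, usage: dict,
                  usage_unit_usd: float = 0.01) -> float:
    """Return hard_limit_usd minus usage to date, in USD.

    ASSUMPTION: total_usage is reported in cents (usage_unit_usd=0.01),
    as in the legacy OpenAI dashboard API. Pass usage_unit_usd=1.0 if the
    gateway turns out to report plain dollars.
    """
    spent = usage["total_usage"] * usage_unit_usd
    return subscription["hard_limit_usd"] - spent
```

Here `subscription` and `usage` are the decoded JSON bodies of /dashboard/billing/subscription and /dashboard/billing/usage respectively.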
The /v1 cloud lane is open: the key is injected, so anyone with this URL spends the Crazyrouter balance. Keep it private, or ask for an IP allowlist or a token gate.
Caddy on hostbun (Coolify). Source: devdashco/llm-hostbun-proxy.