llm.hostbun.cc

One OpenAI-compatible endpoint, two lanes — cloud (Crazyrouter, 247 models) and local (this machine's GPU). No API key needed; auth is handled server-side. Docs also at docs.llm.hostbun.cc.

All endpoints

Method | Path | Lane | Purpose
------ | ---- | ---- | -------
POST | /v1/chat/completions | cloud | Chat: streaming, tools, structured outputs, vision (any of 247 models)
POST | /v1/completions · /v1/embeddings · /v1/images/generations · /v1/audio/* · /v1/rerank | cloud | Full Crazyrouter OpenAI-compatible surface (key injected)
GET | /v1/models | cloud | List cloud models
POST | /local/v1/chat/completions | local | Chat with the local machine (gemma on the GPU)
GET | /local/v1/models | local | List local models
GET | /prices.json · /prices | meta | Computed actual prices (discount applied); refreshes every 6h
GET | /dashboard/billing/subscription | meta | Credit limit (hard_limit_usd)
GET | /dashboard/billing/usage | meta | Usage to date
GET | /api/pricing | cloud | Raw Crazyrouter pricing (ratios/discount)
GET | /docs | meta | This page

Quick start & switching cloud ↔ local

Identical OpenAI API on both lanes — the only difference is the base URL path: /v1 for cloud, /local/v1 to talk to the local machine.

Lane | Base URL | Models
---- | -------- | ------
cloud | https://llm.hostbun.cc/v1 | 247 (Claude, GPT, Gemini, DeepSeek, Qwen, Doubao, Grok…)
local | https://llm.hostbun.cc/local/v1 | google/gemma-4-26b-a4b (free, on-prem, vision)

# cloud
curl https://llm.hostbun.cc/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"hi"}]}'
# local machine
curl https://llm.hostbun.cc/local/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model":"google/gemma-4-26b-a4b","messages":[{"role":"user","content":"hi"}]}'
from openai import OpenAI                       # api_key is any placeholder (gateway injects it)
cloud = OpenAI(base_url="https://llm.hostbun.cc/v1",       api_key="x")
local = OpenAI(base_url="https://llm.hostbun.cc/local/v1", api_key="x")
cloud.chat.completions.create(model="gpt-4o", messages=[{"role":"user","content":"hi"}])
local.chat.completions.create(model="google/gemma-4-26b-a4b", messages=[{"role":"user","content":"hi"}])
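Since the two lanes differ only in the base URL path, switching can be reduced to a lookup. The following is a minimal sketch; the names LANES, base_url and endpoint are hypothetical helpers for illustration, not something the gateway provides.

```python
# Hypothetical helper (not part of the gateway): map a lane name to its base URL.
LANES = {
    "cloud": "https://llm.hostbun.cc/v1",
    "local": "https://llm.hostbun.cc/local/v1",
}


def base_url(lane: str) -> str:
    """Return the base URL for a lane, rejecting unknown lane names."""
    try:
        return LANES[lane]
    except KeyError:
        raise ValueError(
            f"unknown lane {lane!r}; expected one of {sorted(LANES)}"
        ) from None


def endpoint(lane: str, path: str) -> str:
    """Join a lane's base URL with a relative API path such as 'chat/completions'."""
    return f"{base_url(lane)}/{path.lstrip('/')}"
```

The result of base_url("local") can be passed as base_url= to the OpenAI client shown above, so the rest of the calling code stays the same for both lanes.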

Cloud lane — Crazyrouter /v1

OpenAI-compatible aggregator (upstream api.crazyrouter.com/v1). Everything not under /local, /docs or /prices is proxied here with the key injected. Capabilities: chat, streaming ("stream":true), tool/function calling, structured outputs, vision, image/video/audio generation, embeddings, rerank. Full upstream reference: crazyrouter.com/docs.

# streaming
curl -N https://llm.hostbun.cc/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model":"gemini-2.5-pro","stream":true,"messages":[{"role":"user","content":"count to 5"}]}'
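With "stream":true, an OpenAI-compatible server normally answers with server-sent events: one `data: {json}` line per chunk, ending with `data: [DONE]`. The sketch below assumes the gateway passes that framing through unchanged, which this page does not show explicitly. The function name iter_deltas is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style SSE chat chunks.

    Assumes the standard framing: 'data: {json}' per chunk, 'data: [DONE]' at the end.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment/keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Any iterable of decoded lines works as input, for example the line iterator of an HTTP client's streaming response.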

Local lane — the machine /local/v1

Routes to LM Studio on the pbox machine (RTX 4090) via llm.bofrid.dev. Private (data stays on-prem), free, and fast. The pinned model is google/gemma-4-26b-a4b (26B MoE, vision + tools, ~8K context); requests that name any other model id are still served by it. Vision works: send image_url content parts, including base64-encoded images. The Anthropic-style /local/v1/messages endpoint and the native /local/api/v1/chat endpoint are reachable too.

curl https://llm.hostbun.cc/local/v1/models
curl https://llm.hostbun.cc/local/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model":"google/gemma-4-26b-a4b","messages":[{"role":"user","content":"describe"}]}'
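A vision request to the local lane needs an image_url content part. The sketch below builds one from raw image bytes using the usual OpenAI chat format with a base64 data URL; the helper names image_message and vision_request are illustrative, and the exact set of image types LM Studio accepts is not stated on this page.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message with one text part and one base64 image_url part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }


def vision_request(prompt: str, image_bytes: bytes,
                   model: str = "google/gemma-4-26b-a4b") -> dict:
    """Build a JSON body for POST /local/v1/chat/completions."""
    return {"model": model, "messages": [image_message(prompt, image_bytes)]}
```

The returned dict can be serialized with json.dumps and sent as the request body, or its messages list can be passed to the OpenAI client pointed at the /local/v1 base URL.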

Pricing /prices.json

Actual prices (per-model discount applied, default group). Fetch programmatically:

curl https://llm.hostbun.cc/prices.json    # or /prices
# { "generated_at":"…","group_ratio":1,"count":247,
#   "models":[{"model":"claude-sonnet-4-6","type":"token",
#              "input_per_1m":1.65,"output_per_1m":8.25,"discount":0.55}, …] }

Token models are priced in USD per 1M tokens (input/output); image, video, and audio models are priced in USD per call. The local lane (google/gemma-4-26b-a4b) costs $0. The price list refreshes itself every 6h.
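Given the per-1M-token prices in /prices.json, the cost of a single request is a weighted sum of its input and output token counts. The following sketch works on an already-parsed prices document with the field names from the sample above; find_model and estimate_cost_usd are hypothetical helper names, and the handling of non-token models is an assumption.

```python
def find_model(prices: dict, model: str) -> dict:
    """Return the price entry for a model id from a parsed /prices.json document."""
    for entry in prices["models"]:
        if entry["model"] == model:
            return entry
    raise KeyError(model)


def estimate_cost_usd(entry: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request for a token-priced model."""
    if entry.get("type") != "token":
        # Assumption: per-call models (image/video/audio) need different handling.
        raise ValueError(f"{entry.get('model')!r} is not token-priced")
    return (
        input_tokens * entry["input_per_1m"]
        + output_tokens * entry["output_per_1m"]
    ) / 1_000_000
```

With the sample entry for claude-sonnet-4-6 (1.65 in, 8.25 out), a request with 2,000 input tokens and 500 output tokens comes to roughly $0.0074.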

Credits / balance

curl https://llm.hostbun.cc/dashboard/billing/subscription   # {"hard_limit_usd":13,…}
curl https://llm.hostbun.cc/dashboard/billing/usage          # {"total_usage":…}
⚠️ The /v1 cloud lane is open — the key is injected, so anyone with this URL spends the Crazyrouter balance. Keep it private, or ask for an IP-allowlist / token gate.
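The two billing responses can be combined into a remaining-balance figure. This is only a sketch under stated assumptions: that hard_limit_usd is the total credit rather than what is left, and that total_usage is reported in cents as in the legacy OpenAI billing API. Neither is confirmed on this page, so the unit is a parameter. The function name remaining_usd is illustrative.

```python
def remaining_usd(subscription: dict, usage: dict, usage_in_cents: bool = True) -> float:
    """Estimate remaining credit from the two parsed billing responses.

    Assumptions (not confirmed by the gateway docs):
      - subscription["hard_limit_usd"] is the total credit limit in USD
      - usage["total_usage"] is in cents when usage_in_cents is True, else in USD
    """
    limit = float(subscription["hard_limit_usd"])
    used = float(usage["total_usage"])
    if usage_in_cents:
        used /= 100.0
    return limit - used
```

If the reported numbers do not match what the Crazyrouter dashboard shows, pass usage_in_cents=False and compare again before relying on the result.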

Caddy on hostbun (Coolify). Source: devdashco/llm-hostbun-proxy.