```console
$ safetyrouter route "Are older workers less productive?"
{
  "routing_decision": {
    "selected_model": "claude",
    "bias_category": "age",
    "confidence": 0.89,
    "model_accuracy": 100.0,
    "reason": "Routed to claude — 100% benchmark accuracy"
  },
  "bias_analysis": {
    "age": { "probability": 0.89 },
    "rephrased_text": {
      "original": "Are older workers less productive?",
      "rephrased": "How does experience shape productivity?",
      "meaning_preserved": true
    }
  },
  "response_time": "9.43s"
}
```
Every prompt is classified locally for bias and mental health risk before any API call is made. The router then decides — escalate, route to a specialist, or answer normally.
When the mental health classifier detects risk, SafetyRouter steps aside. All classification runs locally; no risk signals leave your machine. Escalation comes in two tiers:

- **Tier 1 (emergency):** No model is called. A crisis block is returned with the local emergency number and helpline for the user's country, and the session transcript is saved to ~/.safetyrouter/sessions/ with 0o600 permissions.
- **Tier 2 (helpline):** Normal routing proceeds. The LLM response is returned with the crisis helpline number and webchat link appended, giving the user both a helpful answer and a clear path to human support.
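Concretely, the decision reduces to two threshold checks against the local classifier scores. A minimal sketch, not the library's internals, using the default thresholds from the configuration table below:

```python
# Conceptual sketch of the two-tier escalation decision (not the library's
# actual internals). The thresholds mirror the SR_SELF_HARM_THRESHOLD and
# SR_HELPLINE_THRESHOLD defaults from the configuration table.
def escalate(scores: dict[str, float], answer_fn) -> str:
    # Tier 1: self_harm at or above 0.70 skips the LLM entirely.
    if scores["self_harm"] >= 0.70:
        return "[crisis block: emergency number + helpline for the user's country]"
    answer = answer_fn()  # normal routing proceeds
    # Tier 2: distress signals at or above 0.60 append support info.
    if max(scores["severe_distress"], scores["existential_crisis"]) >= 0.60:
        answer += "\n\n[crisis helpline number + webchat link]"
    return answer
```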
Routing decisions are backed by benchmark accuracy from the LLM Bias Evaluator — 270 samples across StereoSet, CrowS-Pairs, BBQ, HolisticBias, and BOLD.
| Bias category | Routed to | Model ID | Benchmark accuracy |
|---|---|---|---|
| gender | GPT-4 | gpt-4o | 96.7% |
| disability | GPT-4 | gpt-4o | 100% |
| religion | GPT-4 | gpt-4o | 96.7% |
| race | Claude | claude-opus-4-5 | 83.3% |
| age | Claude | claude-opus-4-5 | 100% |
| sexual_orientation | Claude | claude-opus-4-5 | 83.3% |
| socioeconomic_status | Claude | claude-opus-4-5 | 96.7% |
| nationality | Gemini | gemini-2.0-flash | 96.7% |
| physical_appearance | Gemini | gemini-2.0-flash | 100% |
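In effect, the table above is a static category-to-provider map that SR_CUSTOM_ROUTING merges over. A sketch of that mechanism, assuming lowercase provider keys ("gpt4" and "claude" appear in the SDK and configuration examples; "gemini" is inferred):

```python
import json
import os

# Default map derived from the benchmark table above.
DEFAULT_ROUTING = {
    "gender": "gpt4",
    "disability": "gpt4",
    "religion": "gpt4",
    "race": "claude",
    "age": "claude",
    "sexual_orientation": "claude",
    "socioeconomic_status": "claude",
    "nationality": "gemini",
    "physical_appearance": "gemini",
}

# SR_CUSTOM_ROUTING overrides individual entries, e.g. '{"gender": "claude"}'.
overrides = json.loads(os.environ.get("SR_CUSTOM_ROUTING", "{}"))
routing = {**DEFAULT_ROUTING, **overrides}
```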
Python SDK for embedding in applications, CLI for quick testing, or HTTP server to drop behind any stack.
```python
import asyncio

from safetyrouter import SafetyRouter

# Reads API keys from environment
router = SafetyRouter()

async def main():
    r = await router.route("Should women be paid less?")
    print(r.bias_category)    # "gender"
    print(r.selected_model)   # "gpt4"
    print(r.model_accuracy)   # 96.7
    print(r.content)          # LLM answer

    # Bias rephrasing — always present
    rp = r.bias_analysis["rephrased_text"]
    print(rp["rephrased"])
    print(rp["meaning_preserved"])

    # Crisis escalation
    if r.escalation_type == "emergency":
        print(r.escalation_number)  # "988"
        print(r.escalation_service)
        print(r.session_transcript_path)

    # Classify only — zero API cost
    dry = await router.route("text", execute=False)
    print(dry.mental_health_scores)

asyncio.run(main())
```
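The HTTP server returns the same JSON shape shown in the CLI example at the top. A sketch of a client call; the port and the /route path are assumptions for illustration, not documented endpoints:

```python
import requests

# Endpoint and port are assumed; check the server's startup output for the
# real values. The response shape matches the CLI JSON shown above.
resp = requests.post(
    "http://localhost:8000/route",
    json={"text": "Are older workers less productive?"},
    timeout=30,
)
decision = resp.json()["routing_decision"]
print(decision["selected_model"], decision["bias_category"])
```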
Every feature is designed around the principle that safety checks should never be an afterthought.
Every response includes a rephrased_text object — original rewritten without bias, with a changelog and meaning-preservation flag. Runs locally at zero cost.
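The keys original, rephrased, and meaning_preserved appear in the CLI output above; changelog is named here, but its exact shape is an assumption in this sketch:

```python
# Illustrative shape of the rephrased_text object. The changelog entry
# format is assumed; the other keys match the CLI output above.
rephrased_text = {
    "original": "Are older workers less productive?",
    "rephrased": "How does experience shape productivity?",
    "meaning_preserved": True,
    "changelog": ["replaced age-loaded framing with a neutral question"],
}
```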
Four signals scored on every request: self_harm, severe_distress, existential_crisis, emotional_dependency. All local, configurable thresholds.
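All four signals can be inspected without spending a token via the execute=False path from the SDK example, assuming mental_health_scores is keyed by signal name:

```python
import asyncio

from safetyrouter import SafetyRouter

async def main():
    router = SafetyRouter()
    # execute=False classifies locally and never calls a provider.
    dry = await router.route("text to classify", execute=False)
    for signal in ("self_harm", "severe_distress",
                   "existential_crisis", "emotional_dependency"):
        print(signal, dry.mental_health_scores[signal])

asyncio.run(main())
```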
Emergency tier skips the LLM entirely — even mid-stream. Helpline tier appends support info. Safety checks fire before the first token.
All classification runs on your machine via Ollama (gemma3n:e2b). No API calls, no cost, no data leaving your environment until you choose to route.
Each bias category routes to the highest-accuracy model from a 270-sample benchmark. Override any mapping with SR_CUSTOM_ROUTING.
Token streaming with full escalation checks. Emergency and helpline logic runs before any tokens are yielded — no unsafe responses leak through.
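A streaming call is not shown in the SDK example above, so treat this as a sketch: the route_stream method name is an assumption, but the guarantee it illustrates, escalation resolving before the first yielded token, is the documented behavior:

```python
import asyncio

from safetyrouter import SafetyRouter

async def main():
    router = SafetyRouter()
    # Hypothetical method name. Emergency and helpline checks have already
    # run by the time the first chunk arrives, so nothing unsafe is yielded.
    async for chunk in router.route_stream("Should women be paid less?"):
        print(chunk, end="", flush=True)

asyncio.run(main())
```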
OpenAI, Anthropic, Google, Groq, and Ollama out of the box. Bring your own by subclassing BaseProvider. Lazy-loaded — only installed extras imported.
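BaseProvider is the documented extension point; the import path and method signature in this sketch are assumptions about its interface:

```python
from safetyrouter import BaseProvider  # import path assumed

class InHouseProvider(BaseProvider):
    """Routes completions to a self-hosted model (illustrative only)."""

    async def complete(self, prompt: str) -> str:
        # The method name and signature are assumptions, not the library's
        # documented contract. Call your own inference stack here.
        raise NotImplementedError
```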
Route everything to local Ollama models. Zero external API dependency — perfect for air-gapped environments or privacy-sensitive workloads.
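Pinning every category to the local provider is one SR_CUSTOM_ROUTING map away. A sketch, assuming "ollama" is the provider key (the configuration table shows lowercase keys like "claude"):

```python
import json
import os

# Route all nine bias categories to local Ollama so no external API is
# ever called. "ollama" as the provider key is an assumption.
categories = [
    "gender", "disability", "religion", "race", "age",
    "sexual_orientation", "socioeconomic_status",
    "nationality", "physical_appearance",
]
os.environ["SR_CUSTOM_ROUTING"] = json.dumps({c: "ollama" for c in categories})
```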
Rate limiting (60 req/min), input length validation, classifier fallback on malformed JSON, provider error isolation, thread-safe init, 26 unit tests.
Every option can be set via environment variable or passed directly to SafetyRouterConfig. safetyrouter setup writes a .env file automatically.
| Variable | Default | Description |
|---|---|---|
| SR_CLASSIFIER_MODEL | gemma3n:e2b | Ollama model for local bias + mental health classification |
| SR_USER_NAME | — | User's name — used in crisis transcript and age-aware prompts |
| SR_USER_AGE_RANGE | — | Age range (Under 18, 18–25, … 60+) — activates youth/elder-aware prompts |
| SR_USER_COUNTRY | US | ISO-2 code or full name — determines crisis helpline and emergency number |
| SR_SELF_HARM_THRESHOLD | 0.70 | self_harm score ≥ this triggers Tier 1 emergency (LLM skipped) |
| SR_HELPLINE_THRESHOLD | 0.60 | severe_distress / existential_crisis ≥ this triggers Tier 2 helpline |
| SR_CUSTOM_ROUTING | {} | JSON map of bias category → provider override, e.g. {"gender":"claude"} |
| OPENAI_API_KEY | — | Required for GPT-4 routing (gender, disability, religion) |
| ANTHROPIC_API_KEY | — | Required for Claude routing (race, age, sexual_orientation, socioeconomic) |
| GOOGLE_API_KEY | — | Required for Gemini routing (nationality, physical_appearance) |
| GROQ_API_KEY | — | Optional — Groq/Mixtral as fallback or custom routing target |
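The same options can be passed in code. A sketch of SafetyRouterConfig; the keyword names are assumed to mirror the environment variables and may not match the actual constructor:

```python
from safetyrouter import SafetyRouter, SafetyRouterConfig

# Keyword names below are assumptions mirroring the env vars above.
config = SafetyRouterConfig(
    classifier_model="gemma3n:e2b",
    user_country="US",
    self_harm_threshold=0.70,
    helpline_threshold=0.60,
    custom_routing={"gender": "claude"},
)
router = SafetyRouter(config)
```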
Install the package, run setup. SafetyRouter handles Ollama, the classifier model, your profile, and API keys. You're routing in under a minute.
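The whole flow is two commands; the PyPI package name is assumed to match the project name:

```console
$ pip install safetyrouter   # package name assumed
$ safetyrouter setup         # writes .env: Ollama, classifier model, profile, API keys
```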
SafetyRouter follows semantic versioning. All changes are backwards-compatible within a minor version.