v0.2.3  ·  Open Source  ·  Apache 2.0

Route every prompt to the
safest possible answer.

SafetyRouter classifies bias and mental health risk locally at zero API cost, routes to the best specialist model, and escalates to crisis services when a human is the right answer.

Running locally · 0 API cost
9 bias categories scored
2-tier crisis escalation
96.7% routing accuracy
~/safety-router · zsh
$ safetyrouter route "Are older workers less productive?"

{
  "routing_decision": {
    "selected_model": "claude",
    "bias_category":  "age",
    "confidence":     0.89,
    "model_accuracy": 100.0,
    "reason": "Routed to claude — 100% benchmark accuracy"
  },
  "bias_analysis": {
    "age": { "probability": 0.89 },
    "rephrased_text": {
      "original":  "Are older workers less productive?",
      "rephrased": "How does experience shape productivity?",
      "meaning_preserved": true
    }
  },
  "response_time": "9.43s"
}
SafetyRouter
Routes across
OpenAI
Anthropic
Google
Groq
Ollama
5 providers · BYO welcome
How it works

A safer path
to a fairer answer.

Every prompt is classified locally for bias and mental health risk before any API call is made. The router then decides — escalate, route to a specialist, or answer normally.

01
Receive prompt
Any text sent via SDK, CLI, or HTTP. Input is length-validated up to 10,000 characters before classification.
02
Classify locally
gemma3n:e2b scores 9 bias categories and 4 mental health signals on-device via Ollama. Zero API cost, zero data egress.
03
Route or escalate
Emergency → skip LLM, return crisis line. Helpline → LLM + appended support info. Otherwise → route to bias-specialist model.
04
Safe response
Fair answer plus a rephrased bias-free version of your prompt — or crisis resources if a human is the right answer.
Crisis safety

When a human is
the right answer.

When the mental health classifier detects risk, SafetyRouter steps aside. All classification runs locally — no risk signals leave your machine.

Tier 1 — Emergency
LLM is skipped
entirely.
self_harm ≥ 0.70

No model is called. A crisis block is returned with the local emergency number and helpline for the user's country. The session transcript is saved to ~/.safetyrouter/sessions/ with 0o600 permissions.

Tier 2 — Helpline
LLM responds
+ helpline.
severe_distress ≥ 0.60

Normal routing proceeds. The LLM response is returned with the crisis helpline number and webchat link appended below — the user gets both a helpful answer and a clear path to human support.

15 countries supported out of the box
🇺🇸 United States · 988
🇬🇧 United Kingdom · 116 123
🇨🇦 Canada · 1-833-456
🇦🇺 Australia · 13 11 14
🇮🇳 India · 9152987821
🇳🇿 New Zealand · 1737
🇩🇪 Germany · 0800 111
🇫🇷 France · 3114
🇯🇵 Japan · 0120-783
🇧🇷 Brazil · 188
🇲🇽 Mexico · 800 290
🇿🇦 South Africa · 0800 567
🇸🇬 Singapore · 1800 221
🇮🇪 Ireland · 116 123
🇲🇾 Malaysia · 015-4882
Routing table

Every bias type
has a specialist.

Routing decisions are backed by benchmark accuracy from the LLM Bias Evaluator — 270 samples across StereoSet, CrowS-Pairs, BBQ, HolisticBias, and BOLD.

Bias category · Routed to · Model ID · Benchmark accuracy
gender · GPT-4 · gpt-4o · 96.7%
disability · GPT-4 · gpt-4o · 100%
religion · GPT-4 · gpt-4o · 96.7%
race · Claude · claude-opus-4-5 · 83.3%
age · Claude · claude-opus-4-5 · 100%
sexual_orientation · Claude · claude-opus-4-5 · 83.3%
socioeconomic_status · Claude · claude-opus-4-5 · 96.7%
nationality · Gemini · gemini-2.0-flash · 96.7%
physical_appearance · Gemini · gemini-2.0-flash · 100%
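In code, this routing table is just a lookup from bias category to specialist. The data below mirrors the table; the structure itself (a plain dict of tuples) is illustrative, not how the package stores it internally:

```python
# Bias category -> (provider, model id, benchmark accuracy %).
ROUTING_TABLE = {
    "gender":               ("gpt4",   "gpt-4o",           96.7),
    "disability":           ("gpt4",   "gpt-4o",           100.0),
    "religion":             ("gpt4",   "gpt-4o",           96.7),
    "race":                 ("claude", "claude-opus-4-5",  83.3),
    "age":                  ("claude", "claude-opus-4-5",  100.0),
    "sexual_orientation":   ("claude", "claude-opus-4-5",  83.3),
    "socioeconomic_status": ("claude", "claude-opus-4-5",  96.7),
    "nationality":          ("gemini", "gemini-2.0-flash", 96.7),
    "physical_appearance":  ("gemini", "gemini-2.0-flash", 100.0),
}

provider, model_id, accuracy = ROUTING_TABLE["age"]
print(provider, accuracy)  # claude 100.0
```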
Documentation

Three ways
to integrate.

Python SDK for embedding in applications, CLI for quick testing, or HTTP server to drop behind any stack.

main.py
import asyncio
from safetyrouter import SafetyRouter

# Reads API keys from environment
router = SafetyRouter()

async def main():
    r = await router.route("Should women be paid less?")

    print(r.bias_category)   # "gender"
    print(r.selected_model)  # "gpt4"
    print(r.model_accuracy)  # 96.7
    print(r.content)         # LLM answer

    # Bias rephrasing — always present
    rp = r.bias_analysis["rephrased_text"]
    print(rp["rephrased"])
    print(rp["meaning_preserved"])

    # Crisis escalation
    if r.escalation_type == "emergency":
        print(r.escalation_number)  # "988"
        print(r.escalation_service)
        print(r.session_transcript_path)

    # Classify only — zero API cost
    dry = await router.route("text", execute=False)
    print(dry.mental_health_scores)

asyncio.run(main())
Features

Built for
responsible AI.

Every feature is designed around the principle that safety checks should never be an afterthought.

01

Bias rephrasing

Every response includes a rephrased_text object — original rewritten without bias, with a changelog and meaning-preservation flag. Runs locally at zero cost.
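The shape of that object, as a plain dict: original and rephrased come from the example at the top of this page, and the remaining field names are those listed in the v0.2.2 changelog; the values shown here are illustrative.

```python
# Illustrative shape of the rephrased_text object.
rephrased_text = {
    "original":  "Are older workers less productive?",
    "rephrased": "How does experience shape productivity?",
    "changes_made": ["removed age-based framing"],   # value illustrative
    "meaning_preserved": True,
    "meaning_change_risk": "low",                    # value illustrative
}

print(rephrased_text["rephrased"])
```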

02

Mental health
detection

Four signals scored on every request: self_harm, severe_distress, existential_crisis, emotional_dependency. All local, configurable thresholds.

03

Two-tier crisis
escalation

Emergency tier skips the LLM entirely — even mid-stream. Helpline tier appends support info. Safety checks fire before the first token.

04

Zero-cost
classification

All classification runs on your machine via Ollama (gemma3n:e2b). No API calls, no cost, no data leaving your environment until you choose to route.

05

Benchmark-backed
routing

Each bias category routes to the highest-accuracy model from a 270-sample benchmark. Override any mapping with SR_CUSTOM_ROUTING.
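An SR_CUSTOM_ROUTING override is a JSON object merged over the defaults. A sketch of how such a merge could work, assuming overrides win per category (the merge semantics here are an assumption, not the package's documented behavior, and only a subset of the default table is shown):

```python
import json
import os

# Subset of the benchmark-backed defaults, for brevity.
DEFAULT_ROUTING = {"gender": "gpt4", "race": "claude", "nationality": "gemini"}

def effective_routing() -> dict:
    """Overlay SR_CUSTOM_ROUTING (JSON) on the defaults; assumed semantics."""
    overrides = json.loads(os.environ.get("SR_CUSTOM_ROUTING", "{}"))
    return {**DEFAULT_ROUTING, **overrides}

os.environ["SR_CUSTOM_ROUTING"] = '{"gender": "claude"}'
print(effective_routing()["gender"])  # claude
```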

06

Safe streaming

Token streaming with full escalation checks. Emergency and helpline logic runs before any tokens are yielded — no unsafe responses leak through.
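The key property is that the escalation check runs before the first token is yielded. A minimal async-generator sketch of that ordering, with a stubbed token stream standing in for the real LLM call (names and score shape illustrative, not the SDK's stream() API):

```python
import asyncio

async def fake_llm_stream():
    # Stand-in for the real provider's token stream.
    for tok in ("Fair ", "answer."):
        yield tok

async def safe_stream(scores: dict):
    """Escalation fires BEFORE any token is yielded (sketch)."""
    if scores.get("self_harm", 0.0) >= 0.70:
        yield "[crisis resources]"   # emergency: the LLM is never called
        return
    async for token in fake_llm_stream():
        yield token
    if scores.get("severe_distress", 0.0) >= 0.60:
        yield "\n[helpline appended]"

async def main():
    out = [t async for t in safe_stream({"self_harm": 0.9})]
    print(out)  # ['[crisis resources]']

asyncio.run(main())
```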

07

Pluggable
providers

OpenAI, Anthropic, Google, Groq, and Ollama out of the box. Bring your own by subclassing BaseProvider. Lazy-loaded — only installed extras imported.
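The BaseProvider contract isn't spelled out on this page, so the sketch below assumes a single async completion method; check the package source for the real interface before subclassing.

```python
import asyncio

class BaseProvider:
    # Minimal stand-in for safetyrouter's BaseProvider; the actual
    # interface may differ.
    async def complete(self, prompt: str) -> str:
        raise NotImplementedError

class EchoProvider(BaseProvider):
    """A custom provider: here it just echoes, but the same shape
    would wrap any third-party API client."""
    async def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

print(asyncio.run(EchoProvider().complete("hi")))  # echo: hi
```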

08

Fully local
mode

Route everything to local Ollama models. Zero external API dependency — perfect for air-gapped environments or privacy-sensitive workloads.

09

Production
hardened

Rate limiting (60 req/min), input length validation, classifier fallback on malformed JSON, provider error isolation, thread-safe init, 26 unit tests.
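A 60 req/min per-IP limit is typically a sliding window. This is one way to implement it, not necessarily how the package does:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Per-key sliding-window rate limiter (illustrative)."""
    def __init__(self, limit: int = 60, window: float = 60.0):
        self.limit, self.window = limit, window
        self.hits: dict[str, deque] = {}

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        q = self.hits.setdefault(key, deque())
        while q and now - q[0] > self.window:
            q.popleft()              # drop hits outside the window
        if len(q) >= self.limit:
            return False             # over limit: reject
        q.append(now)
        return True

limiter = SlidingWindowLimiter()
print(all(limiter.allow("1.2.3.4") for _ in range(60)))  # True
print(limiter.allow("1.2.3.4"))                          # False
```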

Configuration

All environment
variables.

Every option can be set via environment variable or passed directly to SafetyRouterConfig. safetyrouter setup writes a .env file automatically.

Variable · Default · Description
SR_CLASSIFIER_MODEL · gemma3n:e2b · Ollama model for local bias + mental health classification
SR_USER_NAME · (unset) · User's name — used in crisis transcript and age-aware prompts
SR_USER_AGE_RANGE · (unset) · Age range (Under 18, 18–25, … 60+) — activates youth/elder-aware prompts
SR_USER_COUNTRY · US · ISO-2 code or full name — determines crisis helpline and emergency number
SR_SELF_HARM_THRESHOLD · 0.70 · self_harm score ≥ this triggers Tier 1 emergency (LLM skipped)
SR_HELPLINE_THRESHOLD · 0.60 · severe_distress / existential_crisis ≥ this triggers Tier 2 helpline
SR_CUSTOM_ROUTING · {} · JSON map of bias category → provider override, e.g. {"gender":"claude"}
OPENAI_API_KEY · (unset) · Required for GPT-4 routing (gender, disability, religion)
ANTHROPIC_API_KEY · (unset) · Required for Claude routing (race, age, sexual_orientation, socioeconomic)
GOOGLE_API_KEY · (unset) · Required for Gemini routing (nationality, physical_appearance)
GROQ_API_KEY · (unset) · Optional — Groq/Mixtral as fallback or custom routing target

Two commands
to safer AI.

Install the package, run setup. SafetyRouter handles Ollama, the classifier model, your profile, and API keys. You're routing in under a minute.

Python 3.10+
Apache 2.0
v0.2.3
Changelog

What's new.

SafetyRouter follows semantic versioning. All changes are backwards-compatible within a minor version.

v0.2.3
April 2026

Production hardening

fix · improve
  • Critical: Stream escalation gap closed — stream() now checks self_harm and crisis scores before yielding any tokens; emergency stops the stream entirely.
  • High: Async classifier — wrapped in asyncio.to_thread(); no longer blocks the event loop on Ollama calls.
  • High: Safe fallback on malformed JSON — classifier degrades gracefully instead of crashing the request.
  • Medium: SR_CUSTOM_ROUTING env var — JSON-encoded per-category routing overrides without code changes.
  • Medium: Rate limiting + input length — 60 req/min per IP; 10,000-char input cap enforced at router and server.
  • Low: Demographic skip + 0o600 transcripts — catch-all category no longer wins routing; transcripts owner-only.
v0.2.2
April 2026

Bias rephrasing + structured CLI output

new · improve
  • New: Bias rephrasing — classifier returns rephrased_text with original, rephrased, changes_made, meaning_preserved, and meaning_change_risk.
  • New: JSON CLI output — safetyrouter route returns structured JSON by default. The --json-output flag is removed.
  • Improve: Server hardening — thread-safe double-checked locking for router init and streaming escalation in HTTP /route.
v0.2.0
March 2026

Mental health risk + crisis escalation

new
  • New: 4 mental health signals — self_harm, severe_distress, existential_crisis, emotional_dependency, all scored locally.
  • New: Two-tier escalation — EMERGENCY skips LLM entirely; HELPLINE appends crisis line to LLM response.
  • New: 15-country crisis database — emergency numbers, helplines, and webchat links. Session transcripts saved to ~/.safetyrouter/sessions/.
  • New: FastAPI server — /route, /classify, /health, /routing-table endpoints.
v0.1.0
February 2026

Initial release

new
  • New: 9 bias categories classified locally with gemma3n:e2b via Ollama.
  • New: Routing table backed by LLM Bias Evaluator benchmark (270 samples).
  • New: 5 providers — OpenAI, Anthropic, Google, Groq, Ollama. Python SDK + CLI + pip package.