v0.2.3  ·  Open Source  ·  Apache 2.0

Route every prompt to the
safest possible answer.

SafetyRouter classifies bias and mental health risk locally at zero API cost, routes to the best specialist model, and escalates to crisis services when a human is the right answer.

Running locally · 0 API cost
9 bias categories scored
2-tier crisis escalation
96.7% routing accuracy
~/safety-router · zsh
$ safetyrouter route "Are older workers less productive?"

{
  "routing_decision": {
    "selected_model": "claude",
    "bias_category":  "age",
    "confidence":     0.89,
    "model_accuracy": 100.0,
    "reason": "Routed to claude — 100% benchmark accuracy"
  },
  "bias_analysis": {
    "age": { "probability": 0.89 },
    "rephrased_text": {
      "original":  "Are older workers less productive?",
      "rephrased": "How does experience shape productivity?",
      "meaning_preserved": true
    }
  },
  "response_time": "9.43s"
}
SafetyRouter
Routes across
OpenAI
Anthropic
Google
Groq
Ollama
5 providers · BYO welcome
How it works

A safer path
to a fairer answer.

Every prompt is classified locally for bias and mental health risk before any API call is made. The router then decides — escalate, route to a specialist, or answer normally.

01
Receive prompt
Any text sent via SDK, CLI, or HTTP. Input is length-validated up to 10,000 characters before classification.
02
Classify locally
gemma3n:e2b scores 9 bias categories and 4 mental health signals on-device via Ollama. Zero API cost, zero data egress.
03
Route or escalate
Emergency → skip LLM, return crisis line. Helpline → LLM + appended support info. Otherwise → route to bias-specialist model.
04
Safe response
Fair answer plus a rephrased bias-free version of your prompt — or crisis resources if a human is the right answer.
Crisis safety

When a human is
the right answer.

When the mental health classifier detects risk, SafetyRouter steps aside. All classification runs locally — no risk signals leave your machine.

Tier 1 — Emergency
LLM is skipped
entirely.
self_harm ≥ 0.70

No model is called. A crisis block is returned with the local emergency number and helpline for the user's country. The session transcript is saved to ~/.safetyrouter/sessions/ with 0o600 permissions.

Tier 2 — Helpline
LLM responds
+ helpline.
severe_distress ≥ 0.60

Normal routing proceeds. The LLM response is returned with the crisis helpline number and webchat link appended below — the user gets both a helpful answer and a clear path to human support.

15 countries supported out of the box
🇺🇸 United States · 988
🇬🇧 United Kingdom · 116 123
🇨🇦 Canada · 1-833-456
🇦🇺 Australia · 13 11 14
🇮🇳 India · 9152987821
🇳🇿 New Zealand · 1737
🇩🇪 Germany · 0800 111
🇫🇷 France · 3114
🇯🇵 Japan · 0120-783
🇧🇷 Brazil · 188
🇲🇽 Mexico · 800 290
🇿🇦 South Africa · 0800 567
🇸🇬 Singapore · 1800 221
🇮🇪 Ireland · 116 123
🇲🇾 Malaysia · 015-4882
Routing table

Every bias type
has a specialist.

Routing decisions are backed by benchmark accuracy from the LLM Bias Evaluator — 270 samples across StereoSet, CrowS-Pairs, BBQ, HolisticBias, and BOLD.

Bias category · Routed to · Model ID · Benchmark accuracy
gender · GPT-4 · gpt-4o · 96.7%
disability · GPT-4 · gpt-4o · 100%
religion · GPT-4 · gpt-4o · 96.7%
race · Claude · claude-opus-4-5 · 83.3%
age · Claude · claude-opus-4-5 · 100%
sexual_orientation · Claude · claude-opus-4-5 · 83.3%
socioeconomic_status · Claude · claude-opus-4-5 · 96.7%
nationality · Gemini · gemini-2.0-flash · 96.7%
physical_appearance · Gemini · gemini-2.0-flash · 100%
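In code, this routing table is just a lookup from bias category to specialist. The data below mirrors the table; the structure itself (a plain dict of tuples) is illustrative, not how the package stores it internally:

```python
# Bias category -> (provider, model id, benchmark accuracy %).
ROUTING_TABLE = {
    "gender":               ("gpt4",   "gpt-4o",           96.7),
    "disability":           ("gpt4",   "gpt-4o",           100.0),
    "religion":             ("gpt4",   "gpt-4o",           96.7),
    "race":                 ("claude", "claude-opus-4-5",  83.3),
    "age":                  ("claude", "claude-opus-4-5",  100.0),
    "sexual_orientation":   ("claude", "claude-opus-4-5",  83.3),
    "socioeconomic_status": ("claude", "claude-opus-4-5",  96.7),
    "nationality":          ("gemini", "gemini-2.0-flash", 96.7),
    "physical_appearance":  ("gemini", "gemini-2.0-flash", 100.0),
}

provider, model_id, accuracy = ROUTING_TABLE["age"]
print(provider, accuracy)  # claude 100.0
```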
Documentation

Three ways
to integrate.

Python SDK for embedding in applications, CLI for quick testing, or HTTP server to drop behind any stack.

main.py
import asyncio
from safetyrouter import SafetyRouter

# Reads API keys from environment
router = SafetyRouter()

async def main():
    r = await router.route("Should women be paid less?")

    print(r.bias_category)   # "gender"
    print(r.selected_model)  # "gpt4"
    print(r.model_accuracy)  # 96.7
    print(r.content)         # LLM answer

    # Bias rephrasing — always present
    rp = r.bias_analysis["rephrased_text"]
    print(rp["rephrased"])
    print(rp["meaning_preserved"])

    # Crisis escalation
    if r.escalation_type == "emergency":
        print(r.escalation_number)  # "988"
        print(r.escalation_service)
        print(r.session_transcript_path)

    # Classify only — zero API cost
    dry = await router.route("text", execute=False)
    print(dry.mental_health_scores)

asyncio.run(main())
Features

Built for
responsible AI.

Every feature is designed around the principle that safety checks should never be an afterthought.

01

Bias rephrasing

Every response includes a rephrased_text object — original rewritten without bias, with a changelog and meaning-preservation flag. Runs locally at zero cost.
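The shape of that object, as a plain dict: original and rephrased come from the example at the top of this page, and the remaining field names are those listed in the v0.2.2 changelog; the values shown here are illustrative.

```python
# Illustrative shape of the rephrased_text object.
rephrased_text = {
    "original":  "Are older workers less productive?",
    "rephrased": "How does experience shape productivity?",
    "changes_made": ["removed age-based framing"],   # value illustrative
    "meaning_preserved": True,
    "meaning_change_risk": "low",                    # value illustrative
}

print(rephrased_text["rephrased"])
```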

02

Mental health
detection

Four signals scored on every request: self_harm, severe_distress, existential_crisis, emotional_dependency. All local, configurable thresholds.

03

Two-tier crisis
escalation

Emergency tier skips the LLM entirely — even mid-stream. Helpline tier appends support info. Safety checks fire before the first token.

04

Zero-cost
classification

All classification runs on your machine via Ollama (gemma3n:e2b). No API calls, no cost, no data leaving your environment until you choose to route.

05

Benchmark-backed
routing

Each bias category routes to the highest-accuracy model from a 270-sample benchmark. Override any mapping with SR_CUSTOM_ROUTING.
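An SR_CUSTOM_ROUTING override is a JSON object merged over the defaults. A sketch of how such a merge could work, assuming overrides win per category (the merge semantics here are an assumption, not the package's documented behavior, and only a subset of the default table is shown):

```python
import json
import os

# Subset of the benchmark-backed defaults, for brevity.
DEFAULT_ROUTING = {"gender": "gpt4", "race": "claude", "nationality": "gemini"}

def effective_routing() -> dict:
    """Overlay SR_CUSTOM_ROUTING (JSON) on the defaults; assumed semantics."""
    overrides = json.loads(os.environ.get("SR_CUSTOM_ROUTING", "{}"))
    return {**DEFAULT_ROUTING, **overrides}

os.environ["SR_CUSTOM_ROUTING"] = '{"gender": "claude"}'
print(effective_routing()["gender"])  # claude
```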

06

Safe streaming

Token streaming with full escalation checks. Emergency and helpline logic runs before any tokens are yielded — no unsafe responses leak through.
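The key property is that the escalation check runs before the first token is yielded. A minimal async-generator sketch of that ordering, with a stubbed token stream standing in for the real LLM call (names and score shape illustrative, not the SDK's stream() API):

```python
import asyncio

async def fake_llm_stream():
    # Stand-in for the real provider's token stream.
    for tok in ("Fair ", "answer."):
        yield tok

async def safe_stream(scores: dict):
    """Escalation fires BEFORE any token is yielded (sketch)."""
    if scores.get("self_harm", 0.0) >= 0.70:
        yield "[crisis resources]"   # emergency: the LLM is never called
        return
    async for token in fake_llm_stream():
        yield token
    if scores.get("severe_distress", 0.0) >= 0.60:
        yield "\n[helpline appended]"

async def main():
    out = [t async for t in safe_stream({"self_harm": 0.9})]
    print(out)  # ['[crisis resources]']

asyncio.run(main())
```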

07

Pluggable
providers

OpenAI, Anthropic, Google, Groq, and Ollama out of the box. Bring your own by subclassing BaseProvider. Lazy-loaded — only installed extras imported.
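The BaseProvider contract isn't spelled out on this page, so the sketch below assumes a single async completion method; check the package source for the real interface before subclassing.

```python
import asyncio

class BaseProvider:
    # Minimal stand-in for safetyrouter's BaseProvider; the actual
    # interface may differ.
    async def complete(self, prompt: str) -> str:
        raise NotImplementedError

class EchoProvider(BaseProvider):
    """A custom provider: here it just echoes, but the same shape
    would wrap any third-party API client."""
    async def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

print(asyncio.run(EchoProvider().complete("hi")))  # echo: hi
```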

08

Fully local
mode

Route everything to local Ollama models. Zero external API dependency — perfect for air-gapped environments or privacy-sensitive workloads.

09

Production
hardened

Rate limiting (60 req/min), input length validation, classifier fallback on malformed JSON, provider error isolation, thread-safe init, 26 unit tests.
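A 60 req/min per-IP limit is typically a sliding window. This is one way to implement it, not necessarily how the package does:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Per-key sliding-window rate limiter (illustrative)."""
    def __init__(self, limit: int = 60, window: float = 60.0):
        self.limit, self.window = limit, window
        self.hits: dict[str, deque] = {}

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        q = self.hits.setdefault(key, deque())
        while q and now - q[0] > self.window:
            q.popleft()              # drop hits outside the window
        if len(q) >= self.limit:
            return False             # over limit: reject
        q.append(now)
        return True

limiter = SlidingWindowLimiter()
print(all(limiter.allow("1.2.3.4") for _ in range(60)))  # True
print(limiter.allow("1.2.3.4"))                          # False
```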

Configuration

All environment
variables.

Every option can be set via environment variable or passed directly to SafetyRouterConfig. safetyrouter setup writes a .env file automatically.

Variable · Default · Description
SR_CLASSIFIER_MODEL · gemma3n:e2b · Ollama model for local bias + mental health classification
SR_USER_NAME · (unset) · User's name — used in crisis transcript and age-aware prompts
SR_USER_AGE_RANGE · (unset) · Age range (Under 18, 18–25, … 60+) — activates youth/elder-aware prompts
SR_USER_COUNTRY · US · ISO-2 code or full name — determines crisis helpline and emergency number
SR_SELF_HARM_THRESHOLD · 0.70 · self_harm score ≥ this triggers Tier 1 emergency (LLM skipped)
SR_HELPLINE_THRESHOLD · 0.60 · severe_distress / existential_crisis ≥ this triggers Tier 2 helpline
SR_CUSTOM_ROUTING · {} · JSON map of bias category → provider override, e.g. {"gender":"claude"}
OPENAI_API_KEY · (unset) · Required for GPT-4 routing (gender, disability, religion)
ANTHROPIC_API_KEY · (unset) · Required for Claude routing (race, age, sexual_orientation, socioeconomic)
GOOGLE_API_KEY · (unset) · Required for Gemini routing (nationality, physical_appearance)
GROQ_API_KEY · (unset) · Optional — Groq/Mixtral as fallback or custom routing target

Two commands
to safer AI.

Install the package, run setup. SafetyRouter handles Ollama, the classifier model, your profile, and API keys. You're routing in under a minute.

Python 3.10+
Apache 2.0
v0.2.3
Changelog

What's new.

SafetyRouter follows semantic versioning. All changes are backwards-compatible within a minor version.

v0.2.3
April 2026

Production hardening

fix · improve
  • Critical: Stream escalation gap closed — stream() now checks self_harm and crisis scores before yielding any tokens; emergency stops the stream entirely.
  • High: Async classifier — wrapped in asyncio.to_thread(); no longer blocks the event loop on Ollama calls.
  • High: Safe fallback on malformed JSON — classifier degrades gracefully instead of crashing the request.
  • Medium: SR_CUSTOM_ROUTING env var — JSON-encoded per-category routing overrides without code changes.
  • Medium: Rate limiting + input length — 60 req/min per IP; 10,000-char input cap enforced at router and server.
  • Low: Demographic skip + 0o600 transcripts — catch-all category no longer wins routing; transcripts owner-only.
v0.2.2
April 2026

Bias rephrasing + structured CLI output

new · improve
  • New: Bias rephrasing — classifier returns rephrased_text with original, rephrased, changes_made, meaning_preserved, and meaning_change_risk.
  • New: JSON CLI output — safetyrouter route returns structured JSON by default. The --json-output flag is removed.
  • Improve: Server hardening — thread-safe double-checked locking for router init and streaming escalation in HTTP /route.
v0.2.0
March 2026

Mental health risk + crisis escalation

new
  • New: 4 mental health signals — self_harm, severe_distress, existential_crisis, emotional_dependency, all scored locally.
  • New: Two-tier escalation — EMERGENCY skips LLM entirely; HELPLINE appends crisis line to LLM response.
  • New: 15-country crisis database — emergency numbers, helplines, and webchat links. Session transcripts saved to ~/.safetyrouter/sessions/.
  • New: FastAPI server — /route, /classify, /health, /routing-table endpoints.
v0.1.0
February 2026

Initial release

new
  • New: 9 bias categories classified locally with gemma3n:e2b via Ollama.
  • New: Routing table backed by LLM Bias Evaluator benchmark (270 samples).
  • New: 5 providers — OpenAI, Anthropic, Google, Groq, Ollama. Python SDK + CLI + pip package.