LLM Guardrails Compared (2026)
/ Arvid Andersson
Orchestration gets an agent answering questions in a day. Then it meets real traffic, a wrong or unsafe answer reaches a user, and the real work starts: stopping the bad calls before they get out. Guardrails are the runtime layer that does that. This post compares the main options as of June 2026, organized by what each one actually checks and where it runs.
Looking for a side-by-side table with filters? See the Observability & Analytics comparison.
Three places a guardrail runs
Each position catches a different class of failure. Production systems often use all three.
- Input rails. Screen the user message and retrieved context before the model sees them. Target: prompt injection, jailbreaks, PII leaking in, off-policy topics.
- Output rails. Screen the model's response before it reaches the user. Target: toxicity, data leaks, hallucination, off-topic drift.
- Tool-execution rails. Sit between the agent and its tools, deciding whether a tool call is allowed to run. Target: unsafe or unauthorized actions.
Why this is a separate decision
A guardrail is not observability and it is not evaluation. Observability records what happened. Evaluation scores quality offline. A guardrail makes a decision in the request path, right now, about whether content or an action is allowed through. That puts it directly in your trust path, which is why the open-source question matters more here than in most categories: you may want to read, fork, and run the checks in your own infrastructure with your data staying on your side.
Prompt injection is the anchor threat. It ranks first (LLM01) on the OWASP Top 10 for LLM Applications, where it has held the top spot since 2023, including indirect injection hidden inside retrieved documents. Guardrails reduce that risk by detecting known patterns at the input stage, but no detector catches everything, so the standard guidance pairs input screening with least-privilege agent design.
Open-source libraries
Guardrails AI (Apache 2.0) wraps LLM calls with validators pulled from the Guardrails Hub, a library of community and first-party checks for hallucination, PII, toxicity, and format compliance. Validators compose into guards in code, and it can re-ask or fix outputs that fail. It runs as a Python library or a server with an OpenAI-compatible endpoint.
NeMo Guardrails (Apache 2.0, NVIDIA) defines programmable rails in Colang, a Python-like language for modeling dialogue flows. It covers five rail types, input, dialog, retrieval, execution, and output, so it reaches the tool-execution layer that scanner-only libraries do not. It installs via pip and can run as a guardrails server. The trade-off is Colang itself: the dialogue-flow modeling is more to learn than dropping in a scanner, which is the cost of the broader rail coverage.
LLM Guard (MIT, maintained by Protect AI, which Palo Alto Networks acquired in 2025) provides input scanners (prompt injection, toxicity, secrets, code, banned topics) and output scanners (toxicity, bias, sensitive data, relevance). It is a focused scanner toolkit you drop in front of and behind the model.
Presidio (MIT, Microsoft) is narrower on purpose: it detects and anonymizes PII in text, images, and structured data, using NLP, regex, rule-based logic, and checksums. It is often the PII layer inside a larger guardrails stack rather than a standalone safety system. Microsoft is explicit that it offers no guarantee of finding all sensitive data, so it is best treated as one layer, not the whole defense.
Hosted security services
Lakera is an API-first runtime security service (acquired by Check Point in 2025) focused on prompt injection and data-leak defense for apps and agents. It deploys without model or prompt changes and adds low latency. It also runs Gandalf, a gamified red-teaming platform used to source adversarial data. The trade-off common to hosted guardrails applies: prompts and responses route through a third party, which is a data-residency question worth checking against your requirements.
Cleanlab validates agent responses in real time, detecting hallucinations, retrieval errors, and policy violations, and pairs detection with human-in-the-loop remediation. It runs as an independent layer that scores responses without changing the underlying stack, available as SaaS or private VPC.
Adjacent: evaluation and observability platforms with guardrails
Several broader platforms bundle guardrails alongside tracing and evaluation. Future AGI includes built-in scanners and vendor adapters next to its eval and tracing layers. Patronus AI centers on hallucination and correctness scoring. Giskard scans models for vulnerabilities before deployment. These are a good fit when you want safety, evaluation, and monitoring under one roof rather than a dedicated guardrail library.
Comparison
| Tool | Type | Focus | Open source |
|---|---|---|---|
| Library | Input/output validators from a hub | Yes (Apache 2.0) | |
| Library | Programmable rails (incl. tool-execution) | Yes (Apache 2.0) | |
| Library | Input/output scanners | Yes (MIT) | |
| Library | PII detection and anonymization | Yes (MIT) | |
| Hosted API | Prompt injection, data-leak defense | No | |
| Hosted | Real-time hallucination/policy validation | No |
Licenses and ownership as of June 2026. Protect AI (LLM Guard) is part of Palo Alto Networks; Lakera is part of Check Point.
How to choose
- PII is the main concern: Presidio, often combined with a broader scanner.
- You want a validator library you compose in code: Guardrails AI or LLM Guard.
- You need tool-execution rails for agents: NeMo Guardrails reaches the action layer.
- Prompt injection is the priority and you want managed detection: Lakera.
- Real-time answer validation with remediation: Cleanlab.
- Safety plus evaluation and tracing in one platform: Future AGI, Patronus, or Giskard.
Most of the open-source options are free to run, so the practical move is to wire one input and one output check into a real path and measure false positives before expanding coverage.
Related reading
Guardrails are one layer of the stack around an agent. These cover the others.
Frequently asked questions
What are LLM guardrails?
Guardrails are the runtime layer between an LLM or agent and the outside world. Every input going in and every response coming out is checked against policies: prompt injection, jailbreak, PII, toxicity, off-topic drift, hallucination, and any custom domain rules. A guardrail can block, redact, or rewrite content before it reaches a user or before a tool call fires. This is distinct from observability (which records what happened) and evaluation (which scores quality offline).
What is the difference between input, output, and tool-execution guardrails?
Input guardrails screen the user message and any retrieved context before the model sees them, catching prompt injection, jailbreaks, and PII leaking in. Output guardrails screen the model's response before it reaches the user, catching toxicity, data leaks, and unfaithful answers. Tool-execution (or action) guardrails sit between the agent and its tools, deciding whether a specific tool call is allowed to run. Many production systems need all three, since each catches a different class of failure.
Which open-source LLM guardrail libraries are available?
Guardrails AI (Apache 2.0) wraps calls with validators from its Guardrails Hub. NeMo Guardrails (Apache 2.0, NVIDIA) defines programmable rails in Colang across five rail types. LLM Guard (MIT, Protect AI) provides input and output scanners. Presidio (MIT, Microsoft) handles PII detection and anonymization specifically. All four are Python libraries you self-host and run in your own infrastructure, which matters because the guardrail sits in your trust path.
How do guardrails help with prompt injection?
Prompt injection sits at the top of the OWASP Top 10 for LLM Applications, so it is the threat most guardrail tools target first. Input guardrails detect and block known injection and jailbreak patterns before the model processes them, including indirect injection hidden in retrieved documents. No detector catches everything, so the common guidance is to layer input screening with least-privilege agent design (limit which tools an agent can call) rather than relying on detection alone.
Should I use an open-source guardrail library or a hosted security API?
Open-source libraries (Guardrails AI, NeMo Guardrails, LLM Guard, Presidio) keep data in your infrastructure and let you read and fork the checks, which matters when the guardrail decides what is safe. Hosted APIs (Lakera, Cleanlab) trade that for managed, continuously-updated detection and lower setup effort, at the cost of routing prompts through a third party. Many teams combine them: an open-source PII and format layer plus a hosted service for fast-moving threats like injection.
Browse all Observability & Analytics tools on Infrabase.ai
Is your product missing?