≫ Home / Observability & Analytics / LLM Guardrails Compared

An abstract mesh of connected nodes, representing a guardrail layer that filters traffic

LLM Guardrails Compared (2026)

June 11, 2026 / Arvid Andersson

Orchestration gets an agent answering questions in a day. Then it meets real traffic, a wrong or unsafe answer reaches a user, and the real work starts: stopping the bad calls before they get out. Guardrails are the runtime layer that does that. This post compares the main options as of June 2026, organized by what each one actually checks and where it runs.

Looking for a side-by-side table with filters? See the Observability & Analytics comparison.

Three places a guardrail runs

Each position catches a different class of failure. Production systems often use all three.

Input rails. Screen the user message and retrieved context before the model sees them. Target: prompt injection, jailbreaks, PII leaking in, off-policy topics.
Output rails. Screen the model's response before it reaches the user. Target: toxicity, data leaks, hallucination, off-topic drift.
Tool-execution rails. Sit between the agent and its tools, deciding whether a tool call is allowed to run. Target: unsafe or unauthorized actions.

Why this is a separate decision

A guardrail is not observability and it is not evaluation. Observability records what happened. Evaluation scores quality offline. A guardrail makes a decision in the request path, right now, about whether content or an action is allowed through. That puts it directly in your trust path, which is why the open-source question matters more here than in most categories: you may want to read, fork, and run the checks in your own infrastructure with your data staying on your side.

Prompt injection is the anchor threat. It ranks first (LLM01) on the OWASP Top 10 for LLM Applications, where it has held the top spot since 2023, including indirect injection hidden inside retrieved documents. Guardrails reduce that risk by detecting known patterns at the input stage, but no detector catches everything, so the standard guidance pairs input screening with least-privilege agent design.

Open-source libraries

Guardrails AI (Apache 2.0) wraps LLM calls with validators pulled from the Guardrails Hub, a library of community and first-party checks for hallucination, PII, toxicity, and format compliance. Validators compose into guards in code, and it can re-ask or fix outputs that fail. It runs as a Python library or a server with an OpenAI-compatible endpoint.

NeMo Guardrails (Apache 2.0, NVIDIA) defines programmable rails in Colang, a Python-like language for modeling dialogue flows. It covers five rail types, input, dialog, retrieval, execution, and output, so it reaches the tool-execution layer that scanner-only libraries do not. It installs via pip and can run as a guardrails server. The trade-off is Colang itself: the dialogue-flow modeling is more to learn than dropping in a scanner, which is the cost of the broader rail coverage.

LLM Guard (MIT, built by Protect AI, which Palo Alto Networks acquired in 2025) provides input scanners (prompt injection, toxicity, secrets, code, banned topics) and output scanners (toxicity, bias, sensitive data, relevance). One important caveat: the repository was archived in July 2026 and is no longer maintained. The code still works and the MIT license still applies, but there will be no updates against new attack patterns, which is a real limitation for a security component.

Presidio (MIT, Microsoft) is narrower on purpose: it detects and anonymizes PII in text, images, and structured data, using NLP, regex, rule-based logic, and checksums. It is often the PII layer inside a larger guardrails stack rather than a standalone safety system. Microsoft is explicit that it offers no guarantee of finding all sensitive data, so it is best treated as one layer, not the whole defense.

Hosted security services

Lakera is an API-first runtime security service (acquired by Check Point in 2025) focused on prompt injection and data-leak defense for apps and agents. It deploys without model or prompt changes and adds low latency. It also runs Gandalf, a gamified red-teaming platform used to source adversarial data. The trade-off common to hosted guardrails applies: prompts and responses route through a third party, which is a data-residency question worth checking against your requirements.

Cleanlab validates agent responses in real time, detecting hallucinations, retrieval errors, and policy violations, and pairs detection with human-in-the-loop remediation. It runs as an independent layer that scores responses without changing the underlying stack, available as SaaS or private VPC.

Adjacent: evaluation and observability platforms with guardrails

Several broader platforms bundle guardrails alongside tracing and evaluation. Future AGI includes built-in scanners and vendor adapters next to its eval and tracing layers. Patronus AI centers on hallucination and correctness scoring. Giskard scans models for vulnerabilities before deployment. These are a good fit when you want safety, evaluation, and monitoring under one roof rather than a dedicated guardrail library.

Comparison

Tool	Type	Focus	Open source
Guardrails AI	Library	Input/output validators from a hub	Yes (Apache 2.0)
NeMo Guardrails	Library	Programmable rails (incl. tool-execution)	Yes (Apache 2.0)
LLM Guard	Library	Input/output scanners (archived July 2026)	Yes (MIT)
Presidio	Library	PII detection and anonymization	Yes (MIT)
Lakera	Hosted API	Prompt injection, data-leak defense	No
Cleanlab	Hosted	Real-time hallucination/policy validation	No

Licenses and ownership as of July 2026. Protect AI (LLM Guard) is part of Palo Alto Networks and archived LLM Guard in July 2026; Lakera is part of Check Point.

How to choose

PII is the main concern: Presidio, often combined with a broader scanner.
You want a validator library you compose in code: Guardrails AI. (LLM Guard used to fit here too, but its repository was archived in July 2026, so treat it as frozen.)
You need tool-execution rails for agents: NeMo Guardrails reaches the action layer.
Prompt injection is the priority and you want managed detection: Lakera.
Real-time answer validation with remediation: Cleanlab.
Safety plus evaluation and tracing in one platform: Future AGI, Patronus, or Giskard.

Most of the open-source options are free to run, so the practical move is to wire one input and one output check into a real path and measure false positives before expanding coverage.

Frequently asked questions

What are LLM guardrails?

Guardrails are the runtime layer between an LLM or agent and the outside world. Every input going in and every response coming out is checked against policies: prompt injection, jailbreak, PII, toxicity, off-topic drift, hallucination, and any custom domain rules. A guardrail can block, redact, or rewrite content before it reaches a user or before a tool call fires. This is distinct from observability (which records what happened) and evaluation (which scores quality offline).

What is the difference between input, output, and tool-execution guardrails?

Input guardrails screen the user message and any retrieved context before the model sees them, catching prompt injection, jailbreaks, and PII leaking in. Output guardrails screen the model's response before it reaches the user, catching toxicity, data leaks, and unfaithful answers. Tool-execution (or action) guardrails sit between the agent and its tools, deciding whether a specific tool call is allowed to run. Many production systems need all three, since each catches a different class of failure.

Which open-source LLM guardrail libraries are available?

Guardrails AI (Apache 2.0) wraps calls with validators from its Guardrails Hub. NeMo Guardrails (Apache 2.0, NVIDIA) defines programmable rails in Colang across five rail types. LLM Guard (MIT, Protect AI) provides input and output scanners, though its repository was archived in July 2026 and no longer receives updates. Presidio (MIT, Microsoft) handles PII detection and anonymization specifically. All four are Python libraries you self-host and run in your own infrastructure, which matters because the guardrail sits in your trust path.

How do guardrails help with prompt injection?

Prompt injection sits at the top of the OWASP Top 10 for LLM Applications, so it is the threat most guardrail tools target first. Input guardrails detect and block known injection and jailbreak patterns before the model processes them, including indirect injection hidden in retrieved documents. No detector catches everything, so the common guidance is to layer input screening with least-privilege agent design (limit which tools an agent can call) rather than relying on detection alone.

Should I use an open-source guardrail library or a hosted security API?

Open-source libraries (Guardrails AI, NeMo Guardrails, LLM Guard, Presidio) keep data in your infrastructure and let you read and fork the checks, which matters when the guardrail decides what is safe. Hosted APIs (Lakera, Cleanlab) trade that for managed, continuously-updated detection and lower setup effort, at the cost of routing prompts through a third party. Many teams combine them: an open-source PII and format layer plus a hosted service for fast-moving threats like injection.

Browse all Observability & Analytics tools on Infrabase.ai

Is your product missing?

Add it here →