DeepEval
Open-source LLM evaluation framework with 50+ metrics for testing agents, RAG, and chatbots
DeepEval is an open-source evaluation framework for LLM applications. It works like Pytest but is specialized for unit testing LLM outputs. It provides 50+ research-backed evaluation metrics, including G-Eval, relevance, factual consistency, bias, and toxicity detection. It covers AI agents, RAG pipelines, and chatbots, with support for synthetic dataset generation, red teaming, and CI/CD integration. Confident AI is the commercial platform layer, adding collaboration, visualization, production tracing, and observability. DeepEval has 3M+ monthly downloads.
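To make the "Pytest for LLM outputs" idea concrete, here is a minimal sketch of the general pattern: wrap one LLM interaction in a test case, score it with a metric, and fail the test when the score falls below a threshold. This is an illustration only and does not use DeepEval's API. The names `EvalCase`, `keyword_coverage`, and `assert_metric` are hypothetical, and the keyword-matching metric is a toy stand-in for the research-backed metrics described above.

```python
# Illustrative sketch only: every name here is hypothetical, not DeepEval's API.
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One LLM interaction to evaluate (hypothetical structure)."""
    prompt: str
    actual_output: str
    expected_keywords: list[str]


def keyword_coverage(case: EvalCase) -> float:
    """Toy metric: fraction of expected keywords found in the output."""
    if not case.expected_keywords:
        return 1.0
    text = case.actual_output.lower()
    hits = sum(1 for kw in case.expected_keywords if kw.lower() in text)
    return hits / len(case.expected_keywords)


def assert_metric(case: EvalCase, threshold: float = 0.5) -> None:
    """Fail like a unit test when the score is below the threshold."""
    score = keyword_coverage(case)
    assert score >= threshold, f"score {score:.2f} is below threshold {threshold}"


def test_refund_answer() -> None:
    # In a real suite, actual_output would come from the LLM application under test.
    case = EvalCase(
        prompt="What is your refund policy?",
        actual_output="You can request a full refund within 30 days.",
        expected_keywords=["refund", "30 days"],
    )
    assert_metric(case, threshold=1.0)
```

Because `test_refund_answer` is an ordinary test function, a runner such as Pytest can collect it, which is what makes this style of evaluation straightforward to run in CI/CD.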
Pricing: Free / monthly subscriptions
DeepEval Alternatives
Explore 41 products in the Observability & Analytics category.
Helicone
Open-source LLM observability platform for monitoring, debugging, and improving AI applications.
Langfuse
Traces, evals, prompt management and metrics to debug and improve your LLM application.