DeepEval

Open-source LLM evaluation framework with 50+ metrics for testing agents, RAG, and chatbots

Open Source · Free Trial

DeepEval is an open-source evaluation framework for LLM applications that works like Pytest but is specialized for unit testing LLM outputs. It provides 50+ research-backed evaluation metrics, including G-Eval, answer relevancy, factual consistency, bias, and toxicity detection. It covers AI agents, RAG pipelines, and chatbots, with support for synthetic dataset generation, red teaming, and CI/CD integration. Confident AI is the commercial platform layer that adds collaboration, visualization, production tracing, and observability. The framework sees 3M+ monthly downloads.

Pricing: Free / monthly subscriptions

Screenshot of DeepEval webpage
