
#1 on Every Major Benchmark

Evaluated on DRACO, DeepSearchQA, and DeepResearch Bench — PhD-level research tasks graded by domain experts.

DRACO: 78.6%
DeepSearchQA: 84.5%
DeepResearch Bench: 56.27
DRACO — Perplexity + Harvard

DRACO Benchmark

100 open-ended research questions across 10 domains, judged by Gemini-2.5-Pro against 3,934 weighted rubric criteria. Grep leads all four evaluation axes and wins 9 of 10 domains.

Overall DRACO scores (chart): Grep leads Perplexity DR (Opus 4.6), Claude Opus 4.6, Gemini Deep Research, and OpenAI Deep Research (o3), winning 9 of 10 domains.

Factual Accuracy: 75.4% (+7.5pp vs Perplexity)
Breadth & Depth: 80.3% (+7.2pp)
Presentation: 93.3% (+3.0pp)
Citation: 79.1% (+14.5pp)
DeepSearchQA — Google
Overall DeepSearchQA scores (chart): Grep vs Perplexity Deep Research, Moonshot K2.5, Anthropic Opus 4.5, and Parallel Ultra2x.

DeepSearchQA

896 multi-step research questions across 17 subject domains. Judge: Gemini 2.5 Flash. Grep achieves 84.5% FC with perfect scores in Linguistics, Biology, and Arts & Entertainment.

14 of 17 categories exceed 80% FC

DeepResearch Bench — RACE Framework

DeepResearch Bench

100 PhD-level research questions (50 Chinese, 50 English), judged by Gemini-2.5-Pro. A score above 50 means the system outperformed the human expert. Grep leads the field of 34 systems.

Overall RACE scores (chart): Grep leads Cellcog Max, nvidia-aiq, Cellcog, and CMCC-DeepInsight.

Insight: 58.98
Comprehensiveness: 56.79
Instruction Following: 53.49
Readability: 53.50

Methodology

Multi-Agent Architecture

Grep orchestrates specialised sub-agents for search, synthesis, verification, and citation, then merges their outputs into a single, coherent research report.
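As a rough illustration only, a pipeline of this shape can be sketched as below. The agent names, interfaces, and merge logic are assumptions for the sake of the sketch, not Grep's actual implementation.

```python
"""Illustrative multi-agent research pipeline (hypothetical sketch)."""
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Finding:
    claim: str
    source: str
    verified: bool = False


def search_agent(question: str) -> list[Finding]:
    # Stand-in for web retrieval: emits stubbed findings with sources.
    return [
        Finding("finding A for " + question, "https://example.com/a"),
        Finding("finding B for " + question, "ftp://example.com/b"),
    ]


def verification_agent(findings: list[Finding]) -> list[Finding]:
    # Stubbed fact check: keep only findings whose source looks resolvable.
    return [replace(f, verified=True) for f in findings
            if f.source.startswith("https://")]


def synthesis_agent(findings: list[Finding]) -> str:
    # Merge verified claims into report prose.
    return " ".join(f.claim for f in findings)


def citation_agent(findings: list[Finding]) -> list[str]:
    # Collect one citation per verified source, preserving order.
    return [f.source for f in findings]


def orchestrate(question: str) -> dict:
    # The orchestrator chains the sub-agents and merges their
    # outputs into a single report object.
    verified = verification_agent(search_agent(question))
    return {
        "report": synthesis_agent(verified),
        "citations": citation_agent(verified),
    }


result = orchestrate("benchmark question")
print(result["citations"])  # only the verified https source survives
```

The point of the shape, rather than the stub logic, is that verification sits between retrieval and synthesis, so only checked findings reach the report and its citation list.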

Claude Opus 4.6 Backbone

All reasoning and synthesis steps are powered by Claude Opus 4.6, giving Grep best-in-class analytical depth, nuanced judgement, and instruction following.

Experience #1 Ranked Research

See why Grep outperforms OpenAI, Google, Perplexity, and every specialised research platform on PhD-level tasks.