How it works
Superhuman exploratory data analysis.
01
Upload
Drop in a tabular dataset and select your target variable. That's it – we do the rest.
02
Analyse
Discovery Engine fits neural networks to your data, then applies interpretability methods to extract the patterns they learned. All findings are validated on hold-out data, and contextualised with existing literature.
03
Discover
You get a ranked list of statistically significant patterns, with p-values, effect sizes, evidence, and context.
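As a rough illustration of what checking a pattern on hold-out data can involve, the sketch below tests one candidate subgroup pattern on held-out rows and reports an effect size (Cohen's d) and a p-value. It is not Discovery Engine's method, which this page does not detail. The function names, the synthetic data, and the choice of a permutation test are all assumptions made for the example.

```python
import random
import statistics


def cohens_d(a, b):
    """Standardised mean difference between two samples (pooled SD)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(
            statistics.fmean(pooled[: len(a)]) - statistics.fmean(pooled[len(a):])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


def validate_on_holdout(rows, in_subgroup, target):
    """Compare the target inside vs. outside a candidate subgroup on held-out rows."""
    inside = [r[target] for r in rows if in_subgroup(r)]
    outside = [r[target] for r in rows if not in_subgroup(r)]
    return {
        "effect_size": cohens_d(inside, outside),
        "p_value": permutation_p_value(inside, outside),
    }


# Synthetic hold-out set (hypothetical): the outcome is higher when dose > 5.
rng = random.Random(42)
holdout = []
for _ in range(200):
    dose = rng.uniform(0, 10)
    outcome = rng.gauss(1.0 if dose > 5 else 0.0, 1.0)
    holdout.append({"dose": dose, "outcome": outcome})

report = validate_on_holdout(holdout, lambda r: r["dose"] > 5, "outcome")
print(report)
```

The point of the hold-out step is that the rows used to test a pattern were never seen when the pattern was found, so a small p-value here is not an artefact of the search itself.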
Publications
bioRxiv · 2025
Growth Cost and Transport Efficiency Tradeoffs Define Root System Optimization Across Varying Developmental Stages and Environments in Arabidopsis
Faizi, Mehta, Maida, Humphreys, Berrigan, McKee Reid, McCorkell, Tagade, Rumbelow, Showalter, Brent, Coroenne, Rigaud, Chandrasekhar, Navlakha, Martin, Pradal, Lee, Busch, Platre
bioRxiv · 2025
Automated Discovery of Patterns in T-Cell Receptor Physicochemical Signatures
Shams, Bishop, Mckee-Reid, Rumbelow
arXiv · 2025
Explaining Surface Layer Theory Departures in Marine Flux Profiles with Data-Driven Discovery
Foxabbott, Mckee-Reid, Cusick, McCorkell, Patel, Rumbelow, Rumbelow, Shams, Tagade, Hawbecker, Haupt
arXiv · 2025
Open Problems in Mechanistic Interpretability
Sharkey, Chughtai, Batson, Lindsey, Wu, Bushnaq, Goldowsky-Dill, Heimersheim, Ortega, Bloom, Biderman, Garriga-Alonso, Conmy, Nanda, Rumbelow, Wattenberg, Schoots, Miller, Michaud, Casper, Tegmark, Saunders, Bau, Todd, Geiger, Geva, Hoogland, Murfet, McGrath
AI 4 X Conference · 2025
Towards Data-Driven Scientific Discovery
Tagade, Mckee-Reid, McCorkell, Cusick, Sosa, Platre, Rumbelow, Shams
medRxiv · 2026
The Decline in Influenza Antibody Titers and Modifiers of Vaccine Immunity from over Ten Years of Serological Data
Fenoy, Plant, Xie, Ye, Tagade, Rumbelow, Einav
Pricing
Free for public data. Flexible for everything else.
Public analyses are free. For private data and deeper analysis, choose a plan that suits you.
Explorer
/month
For open science.
10 credits/mo
Unlimited public analyses (data and reports published)
10 credits/month for private analyses (no roll over)
Additional credits available to purchase
Standard processing queue
Researcher
/month
For individual researchers with proprietary data.
50 credits/mo (which roll over)
Unlimited public analyses (data and reports published)
50 credits/month for private analysis (which roll over)
Additional credits available to purchase
Deep analysis for more comprehensive pattern search
Priority processing queue
Email support
Most popular
Team
/month
For research teams with proprietary data.
200 credits/mo (which roll over)
Unlimited public analyses (data and reports published)
200 credits/month for private analysis (which roll over)
Additional credits available to purchase
Deep analysis for more comprehensive pattern search
Highest priority processing
Priority email support
Up to 5 seats
Enterprise
For discovery at scale, dedicated compute, and custom integrations.
Unlimited credits
Everything in Team, plus:
Dedicated compute
Unlimited seats
Dedicated support
API
Built for agents and developers.
Faster and cheaper than prompting for data analysis — and finds patterns that your agent would miss. Run Discovery Engine via API, Python SDK, or MCP. Skills included.
Python SDK
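import asyncio
from discovery import Engine

async def main():
    engine = Engine(api_key="disco_...")
    # Upload the file and run discovery against the chosen target column.
    result = await engine.discover(
        file="data.csv",
        target_column="outcome",
    )
    # Print only the patterns flagged as novel.
    for p in result.patterns:
        if p.novelty_type == "novel":
            print(p.description)

asyncio.run(main())
Get started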
Your data has more to tell you.
Upload a dataset and get ranked, validated discoveries in minutes. Free for public analyses — no credit card required.
Try Discovery Engine
Why not just use an LLM?
Language models inherit our assumptions.
Discovery Engine is systematic and data-first.
Like humans, LLMs only find patterns they can hypothesise in the first place – and the literature that informs those hypotheses is full of biases, errors, and unreplicable findings. This means that most of the space of possible discoveries remains unexplored. By contrast, Discovery Engine finds patterns systematically, without assumptions – and so surfaces insights that would otherwise remain hidden.
Language is lossy.
Language is a lossy abstraction over data, and valuable nuance is lost in aggregation. Scientific papers are an incomplete representation of the underlying observations. Discovery Engine finds patterns directly in the data, disregarding scientific narrative and the pressure to publish. It finds raw patterns in the numbers, not the story in the paper.
The pattern discovery API that agents call.
Discovery Engine gives AI agents a capability they can't replicate with prompting and pandas: validated, novel pattern discovery — interactions, thresholds, and subgroup effects — without requiring prior hypotheses. One API call, structured results, citations included.
FAQ
Common questions.
What's the difference between standard and deep analysis?
Standard analysis finds most patterns — and is powerful enough for novel discoveries. Deep analysis (available on paid plans) runs a more exhaustive process, finding more patterns and often surfacing further novel relationships.
What's the difference between public and private?
Public datasets and their results are visible to all users — great for open science and academic work. Private datasets and reports are only visible to you and your team, ideal for proprietary or pre-publication data.
What's a credit?
Credits are used for private analyses. Cost scales with dataset size — a typical 10K-row dataset uses 1–3 credits, while larger datasets use more. Public analyses do not require credits.
Can I buy more credits?
Yes. All users can purchase additional credits for private analyses at $1 per credit. Purchased credits never expire.
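To make the credit arithmetic concrete, here is a small sketch that estimates what a month of private analyses would cost in additional credits. The monthly allowances and the $1 price come from this page. The function name, the per-analysis credit estimate, and rounding up to whole credits are assumptions, and rollover or previously purchased credits are ignored for simplicity.

```python
import math

PRICE_PER_CREDIT_USD = 1.0  # from the FAQ: additional credits cost $1 each

# Monthly credit allowances listed on the pricing cards.
PLAN_ALLOWANCE = {"Explorer": 10, "Researcher": 50, "Team": 200}


def top_up_cost(plan, analyses, credits_per_analysis):
    """Cost in USD of the credits needed beyond the plan's monthly allowance.

    credits_per_analysis is the caller's own estimate; the FAQ only says a
    typical 10K-row dataset uses 1-3 credits.
    """
    # Assumption: credits are consumed in whole units, so round up.
    needed = math.ceil(analyses * credits_per_analysis)
    extra = max(0, needed - PLAN_ALLOWANCE[plan])
    return extra * PRICE_PER_CREDIT_USD


print(top_up_cost("Explorer", analyses=8, credits_per_analysis=3))    # 14.0
print(top_up_cost("Researcher", analyses=8, credits_per_analysis=3))  # 0.0
```

Eight private analyses at the upper end of the typical range need 24 credits, which is 14 more than the Explorer allowance but well inside the Researcher allowance.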
What kind of data is supported?
We currently support tabular data up to 1GB, in CSV, TSV, Excel (.xlsx), JSON, Parquet, ARFF, and Feather formats, with timeseries and image support coming soon. For larger datasets or other modalities, please contact us.
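A quick local check before uploading can catch unsupported files early. The sketch below is a hypothetical helper, not part of the SDK. The extension list is inferred from the format names above (only .xlsx is stated explicitly), and reading "1GB" as 10**9 bytes is an assumption.

```python
from pathlib import Path

# Extensions assumed for the formats named in the FAQ.
SUPPORTED_EXTENSIONS = {
    ".csv", ".tsv", ".xlsx", ".json", ".parquet", ".arff", ".feather",
}
MAX_BYTES = 1_000_000_000  # "1GB" read as 10**9 bytes (assumption)


def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 1GB")
    return problems


print(check_upload("trial_results.parquet", 250_000_000))  # []
print(check_upload("scans.tiff", 2_000_000_000))
```

For a file on disk, the size can be read with `Path(filename).stat().st_size`.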
How long does an analysis take?
Most analyses complete in minutes to hours, depending on dataset size. Public analyses and free plans have lower priority in the queue, which may result in long wait times to begin processing when the engine is busy. Our paid plans offer priority processing with no wait time.
Can AI agents use Discovery Engine?
Yes. Discovery Engine is available as a Python SDK, MCP server, and REST API. Agents can sign up, manage billing, and run analyses entirely programmatically. The SDK returns structured results that agents can reason over directly, plus a shareable report URL for the human.
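To show how an agent might reason over structured results, the sketch below shortlists patterns from a result set. It uses a local stand-in class rather than the SDK. `description` and `novelty_type` appear in the SDK snippet on this page, while `p_value`, `effect_size`, the "confirmatory" label, and the `shortlist` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    # Stand-in for an SDK pattern object. `p_value` and `effect_size`
    # are hypothetical attribute names, not confirmed SDK fields.
    description: str
    novelty_type: str
    p_value: float
    effect_size: float


def shortlist(patterns, alpha=0.05, top_k=3):
    """Keep novel, significant patterns and rank them by absolute effect size."""
    keep = [
        p for p in patterns
        if p.novelty_type == "novel" and p.p_value < alpha
    ]
    return sorted(keep, key=lambda p: abs(p.effect_size), reverse=True)[:top_k]


# Made-up patterns standing in for `result.patterns`.
patterns = [
    Pattern("A and B interact above a threshold", "novel", 0.003, 0.8),
    Pattern("C tracks the outcome linearly", "confirmatory", 0.001, 1.2),
    Pattern("Subgroup D shows a reversed effect", "novel", 0.020, -1.1),
    Pattern("E has a weak association", "novel", 0.300, 0.1),
]

for p in shortlist(patterns):
    print(f"{p.description} (p={p.p_value}, effect={p.effect_size})")
```

Ranking by absolute effect size keeps strong negative effects near the top instead of letting them sink below weak positive ones.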
Talk to us
Have a dataset in mind? Let's find what's hiding in it.
Whether you're exploring public data or running enterprise-scale discovery, we'd love to hear from you.
Contact
Get in touch with our team.