How it works
From raw data to ranked insights, 100x faster.
01
Upload
Drop in a tabular dataset and select your target variable. That's it – we do the rest.
02
Analyse
Discovery Engine fits neural networks to your data, then applies interpretability methods to extract the patterns they learned. All findings are validated on hold-out data, and contextualised with existing literature.
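To make "validated on hold-out data" concrete, here is a generic, stdlib-only sketch of the idea. It is not Discovery Engine's pipeline. The neural-network and interpretability steps are replaced by a plain Pearson correlation, and the permutation test, the 30% hold-out fraction and all function names are illustrative choices of ours. In this sketch, a pattern counts as validated if its direction on the training split matches the hold-out split and the hold-out permutation p-value is small.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value: how often shuffled ys correlate as strongly."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one so p is never exactly 0


def validate_on_holdout(xs, ys, holdout_frac=0.3, seed=0):
    """Illustrative only (not the product's method): find a pattern on the
    training rows, then re-test it on rows the pattern never saw."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    cut = int(len(idx) * (1 - holdout_frac))
    train, test = idx[:cut], idx[cut:]
    r_train = pearson([xs[i] for i in train], [ys[i] for i in train])
    hx, hy = [xs[i] for i in test], [ys[i] for i in test]
    r_test = pearson(hx, hy)
    p = permutation_p_value(hx, hy, seed=seed)
    same_sign = (r_train > 0) == (r_test > 0)  # did the direction replicate?
    return r_test, p, same_sign
```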
03
Discover
You get a ranked list of statistically significant patterns, with p-values, effect sizes, evidence, and context.
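A simplified version of such a ranked list can be produced by filtering on a significance threshold and sorting by effect size. The sketch assumes a hypothetical `Pattern` record and a ranking rule of our own (largest absolute effect first, ties broken by smaller p-value). The page does not say how Discovery Engine actually orders its results, and this version applies no multiple-comparison correction.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    # Hypothetical record; field names are illustrative, not the product's schema.
    description: str
    p_value: float
    effect_size: float  # sign gives direction, magnitude gives strength


def rank_patterns(patterns, alpha=0.05):
    """Keep patterns with p < alpha, strongest absolute effect first."""
    significant = [p for p in patterns if p.p_value < alpha]
    return sorted(significant, key=lambda p: (-abs(p.effect_size), p.p_value))
```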
Publications
bioRxiv · 2025
Growth Cost and Transport Efficiency Tradeoffs Define Root System Optimization Across Varying Developmental Stages and Environments in Arabidopsis
Faizi, Mehta, Maida, Humphreys, Berrigan, McKee Reid, McCorkell, Tagade, Rumbelow, Showalter, Brent, Coroenne, Rigaud, Chandrasekhar, Navlakha, Martin, Pradal, Lee, Busch, Platre
bioRxiv · 2025
Automated Discovery of Patterns in T-Cell Receptor Physicochemical Signatures
Shams, Bishop, Mckee-Reid, Rumbelow
arXiv · 2025
Explaining Surface Layer Theory Departures in Marine Flux Profiles with Data-Driven Discovery
Foxabbott, Mckee-Reid, Cusick, McCorkell, Patel, Rumbelow, Rumbelow, Shams, Tagade, Hawbecker, Haupt
arXiv · 2025
Open Problems in Mechanistic Interpretability
Sharkey, Chughtai, Batson, Lindsey, Wu, Bushnaq, Goldowsky-Dill, Heimersheim, Ortega, Bloom, Biderman, Garriga-Alonso, Conmy, Nanda, Rumbelow, Wattenberg, Schoots, Miller, Michaud, Casper, Tegmark, Saunders, Bau, Todd, Geiger, Geva, Hoogland, Murfet, McGrath
AI 4 X Conference · 2025
Towards Data-Driven Scientific Discovery
Tagade, Mckee-Reid, McCorkell, Cusick, Sosa, Platre, Rumbelow, Shams
medRxiv · 2026
The Decline in Influenza Antibody Titers and Modifiers of Vaccine Immunity from over Ten Years of Serological Data
Fenoy, Plant, Xie, Ye, Tagade, Rumbelow, Einav
Pricing
Free for public data. Flexible for everything else.
Public analyses are free. For private data and deeper analysis, choose a plan that suits you.
Explorer
/month
For open science.
10 credits/mo
- Unlimited public analyses (data and reports published)
- 10 credits/month for private analyses
- Additional credits available to purchase
- Standard processing queue
Researcher
/month
For individual researchers with proprietary data.
50 credits/mo (rollover)
- Unlimited public analyses (data and reports published)
- 50 credits/month for private analyses (rollover)
- Additional credits available to purchase
- Deep analysis for more comprehensive pattern search
- Priority processing queue
- Email support
Most popular
Team
/month
For research teams with proprietary data.
200 credits/mo (rollover)
- Unlimited public analyses (data and reports published)
- 200 credits/month for private analyses (rollover)
- Additional credits available to purchase
- Deep analysis for more comprehensive pattern search
- Highest priority processing
- Priority email support
- Up to 5 seats
Enterprise
For discovery at scale, dedicated compute, and custom integrations.
Unlimited credits
Everything in Team, plus:
- Dedicated compute
- Unlimited seats
- Dedicated support
Python SDK
Built for developers and agents.
Install our Python package, point it at your dataset, and get results programmatically. Everything in the dashboard is available via the API — ideal for pipelines and batch analysis.
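For batch analysis, the `engine.run(file=..., target_column=...)` call from the snippet that follows can be wrapped in a loop. This sketch assumes that `run` blocks until the analysis finishes and raises an exception on failure. Neither behaviour is documented here, and `run_batch` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, List, Tuple


def run_batch(engine: Any, jobs: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Hypothetical helper: run one analysis per (file, target_column) pair.

    `engine` is anything with a run(file=..., target_column=...) method, such
    as the Engine object from the SDK snippet. A failure is stored in place
    of the result, so one bad file does not abort the whole batch.
    """
    results: Dict[str, Any] = {}
    for path, target in jobs:
        try:
            results[path] = engine.run(file=path, target_column=target)
        except Exception as exc:  # keep going; inspect failures afterwards
            results[path] = exc
    return results
```

With the SDK installed, usage would look like `run_batch(Engine(api_key="your-key"), [("cohort_a.csv", "outcome"), ("cohort_b.csv", "outcome")])`. The file names are hypothetical.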
from discovery import Engine
engine = Engine(api_key="your-key")
result = engine.run(
    file="data.csv",
    target_column="outcome",
)
Get started
Your data has more to tell you.
Upload a dataset and get ranked, validated discoveries in minutes. Free for public analyses — no credit card required.
Try Discovery Engine
Why not just use an LLM?
Language models inherit our assumptions.
Discovery Engine is systematic and data-first.
Like humans, LLMs only find patterns they can hypothesise in the first place – and the literature that informs those hypotheses is full of biases, errors, and unreplicable findings. This means that most of the space of possible discoveries remains unexplored. By contrast, Discovery Engine finds patterns systematically, without assumptions – and so surfaces insights that would otherwise remain hidden.
Language is lossy.
Language is a lossy abstraction over data, and valuable nuance is lost in aggregation. Scientific papers are an incomplete representation of the underlying observations. Discovery Engine works directly on the data, independent of scientific narrative and the pressure to publish: it finds the raw patterns in the numbers, not the story in the paper.
A powerful tool for scientific agents.
Discovery Engine finds patterns in your data that LLMs alone would miss, far more efficiently than iterative, hypothesis-driven exploration – so tell your scientific agent about our API!
FAQ
Common questions.
What's the difference between standard and deep analysis?
Standard analysis finds most patterns — and is powerful enough for novel discoveries. Deep analysis (available on paid plans) runs a more exhaustive process, finding more patterns and often surfacing further novel relationships.
What's the difference between public and private?
Public datasets and their results are visible to all users — great for open science and academic work. Private datasets and reports are only visible to you and your team, ideal for proprietary or pre-publication data.
What's a credit?
Credits are used for private analyses. Cost scales with dataset size — a typical 10K-row dataset uses 1–3 credits, while larger datasets use more. Public analyses do not require credits.
Can I buy more credits?
Yes. All users can purchase additional credits for private analyses at $1 per credit. Purchased credits never expire.
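As a rough budgeting aid, the credit arithmetic can be written out. The sketch uses the monthly allowances from the plan cards and the $1 top-up price. The per-analysis credit cost is an input you must estimate yourself (the FAQ gives 1–3 credits for a typical 10K-row dataset), and rollover from previous months is ignored. `top_up_cost` is a hypothetical helper.

```python
# Monthly private-analysis allowances from the plan cards.
MONTHLY_CREDITS = {"Explorer": 10, "Researcher": 50, "Team": 200}
TOP_UP_PRICE_USD = 1.0  # $1 per purchased credit


def top_up_cost(plan: str, analyses: int, credits_per_analysis: int) -> float:
    """Hypothetical helper: USD cost of credits needed beyond the monthly allowance.

    credits_per_analysis is your own estimate; actual cost scales with dataset
    size. Rolled-over credits are not counted.
    """
    needed = analyses * credits_per_analysis
    shortfall = max(0, needed - MONTHLY_CREDITS[plan])
    return shortfall * TOP_UP_PRICE_USD
```

For instance, eight private analyses at 3 credits each on Explorer need 24 credits, 14 more than the 10 included, so about $14 in top-ups.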
What kind of data is supported?
We currently support tabular data up to 1 GB, in CSV, TSV, Excel (.xlsx), JSON, Parquet, ARFF, and Feather formats, with time-series and image support coming soon. For larger datasets or other modalities, please contact us.
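Before uploading, a quick client-side check against the listed formats and size limit can catch problems early. The extension set and the reading of "1 GB" as 10^9 bytes are assumptions (the service may accept other extensions or use 2^30 bytes), and `precheck` is a hypothetical helper.

```python
from pathlib import Path

# Assumed file extensions for the formats listed above.
SUPPORTED_EXTENSIONS = {".csv", ".tsv", ".xlsx", ".json", ".parquet", ".arff", ".feather"}
# Assumption: "1 GB" read as 10**9 bytes; the service may use 2**30 instead.
MAX_BYTES = 10**9


def precheck(path: str, size_bytes: int) -> list:
    """Hypothetical helper: return a list of problems (empty means it looks OK)."""
    problems = []
    suffix = Path(path).suffix.lower()  # case-insensitive extension match
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems
```

For a file on disk, call `precheck(p, Path(p).stat().st_size)`.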
How long does an analysis take?
Most analyses complete in minutes to hours, depending on dataset size. Public analyses and free plans have lower queue priority, so when the engine is busy there can be a long wait before processing begins. Our paid plans offer priority processing with no wait time.
Talk to us
Have a dataset in mind? Let's find what's hiding in it.
Whether you're exploring public data or running enterprise-scale discovery, we'd love to hear from you.
Contact
Get in touch with our team.