Data
The dataset
A neutral, entity-resolved graph of the wine trade. Each wine carries field-level detail with the source and confidence behind every fact. The public slice below is free; the full graph, realized-pricing time-series, and the resolution API sit behind a free, role-validated tier.
Public coverage
Entities by type
| Entity type | Count |
|---|---|
| restaurant | 20,292 |
| sku | 10,753 |
| retailer | 3,456 |
| producer | 2,872 |
| term | 107 |
| region | 107 |
| grape | 35 |
| place | 22 |
| importer | 3 |
| award | 3 |
| total | 37,650 |
Methodology
How it's built
Resolution
Sources are resolved through a deterministic cascade (normalize → block → score), with an LLM adjudicator for borderline decisions. Every field records its source tier, confidence, observation date, and provenance reference.
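As a rough illustration of the cascade, the sketch below normalizes names, blocks them by a cheap key, and scores pairs within each block. The thresholds, the first-token blocking key, and the `difflib` similarity measure are all assumptions made for the example; the source does not describe the actual scoring function or cut-offs. Borderline pairs are only flagged here, standing in for the hand-off to the LLM adjudicator.

```python
import re
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical thresholds; the real cut-offs are not stated in the source.
MATCH, REVIEW = 0.92, 0.75


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def block_key(normalized: str) -> str:
    """Assumed blocking key: the first token, so only plausible pairs are scored."""
    return normalized.split(" ", 1)[0] if normalized else ""


def resolve(names: list[str]) -> list[tuple[str, str, float, str]]:
    """Return (a, b, score, decision) for every candidate pair inside a block."""
    blocks = defaultdict(list)
    for name in names:
        blocks[block_key(normalize(name))].append(name)

    decisions = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= MATCH:
                decision = "match"
            elif score >= REVIEW:
                decision = "adjudicate"  # borderline: would go to the LLM adjudicator
            else:
                decision = "distinct"
            decisions.append((a, b, round(score, 3), decision))
    return decisions
```

Blocking keeps the pairwise comparison cheap: records in different blocks are never scored against each other, so the adjudicator only sees pairs that already look similar.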
Privacy boundary
Public facts (producers, wines, grapes, menu listings) cross freely. Tenant-private commercial data never does. Cross-tenant pricing is published only as backward-looking aggregates, never below a k-anonymity floor — and never forward-looking.
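The k-anonymity floor can be pictured as a suppression rule: an aggregate is released only when enough distinct tenants contributed to it. The sketch below is a minimal version of that idea. The value of `K_FLOOR`, the tuple layout, and the choice of median are assumptions for illustration, and the backward-looking date filter is left out.

```python
from collections import defaultdict
from statistics import median

K_FLOOR = 5  # hypothetical; the source does not state the actual k


def publishable_aggregates(observations, k=K_FLOOR):
    """Aggregate (sku_id, tenant_id, price) rows per SKU.

    A SKU is included only if at least k distinct tenants reported it;
    everything else is suppressed rather than published.
    """
    by_sku = defaultdict(lambda: defaultdict(list))
    for sku_id, tenant_id, price in observations:
        by_sku[sku_id][tenant_id].append(price)

    out = {}
    for sku_id, by_tenant in by_sku.items():
        if len(by_tenant) < k:
            continue  # below the floor: publish nothing for this SKU
        # One value per tenant, so a single heavy reporter cannot dominate.
        per_tenant = [median(prices) for prices in by_tenant.values()]
        out[sku_id] = {
            "tenants": len(by_tenant),
            "median_price": median(per_tenant),
        }
    return out
```

Counting distinct tenants rather than raw observations is the important detail: many price points from one tenant still identify that tenant.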
Provenance at field level
Every Sourced<T> value carries a tier, confidence score, and observation timestamp. The API exposes the full provenance object on every field.
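One way to picture a `Sourced<T>` value is as a small generic wrapper around the fact itself. The Python sketch below is an assumption about its shape, not the actual definition: the attribute names (`tier`, `confidence`, `observed_on`, `ref`) are chosen for the example, and `prefer` is a hypothetical merge rule that is not described in the source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Sourced(Generic[T]):
    """A value plus the provenance behind it (field names are assumed)."""
    value: T
    tier: str                  # source tier, e.g. "producer_site"
    confidence: float          # expected to lie in [0, 1]
    observed_on: date          # when the fact was observed
    ref: Optional[str] = None  # provenance reference, if any

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


def prefer(a: Sourced[T], b: Sourced[T]) -> Sourced[T]:
    """Hypothetical merge rule: higher confidence wins, then the newer observation."""
    return max(a, b, key=lambda s: (s.confidence, s.observed_on))
```

For example, a first-party record with confidence 1.0 would win over a scraped value at 0.91 even if the scraped value was observed more recently.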
Provenance system
Source tiers
Every field is tagged with one of seven source tiers. Each tier is shown in a color drawn from a perceptually even categorical scale, so no single source visually dominates. Color is always paired with a label; it is never the sole signal.
Provenance tiers
- Verified (first-party): Confirmed directly by the tenant; the highest-confidence, first-party record.
- Tech sheet: Producer-published technical document (PDF or web). High confidence, directly from source.
- Producer site: Scraped from the producer's public website. Reliable but not formally verified.
- Importer site: Scraped from the importer's catalog or website. One step removed from the producer.
- COLA / government record: TTB Certificate of Label Approval or other official registry. Authoritative but limited in scope.
- Inferred (LLM research): Filled by language-model inference from multiple public sources. Useful, but verify before acting.
- Manual / operator: Entered or curated by a WineGraph operator. Authoritative within its scope; check the ref.
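The seven tiers map naturally onto an enumeration that keeps each machine-readable key together with its display label. In the sketch below, only `producer_site` and `tenant` are keys that appear in the API sample; the other five keys, and the pairing of `tenant` with the "Verified (first-party)" label, are assumptions made for illustration.

```python
from enum import Enum


class SourceTier(Enum):
    # (key, label). Keys other than "tenant" and "producer_site" are hypothetical.
    TENANT = ("tenant", "Verified (first-party)")
    TECH_SHEET = ("tech_sheet", "Tech sheet")
    PRODUCER_SITE = ("producer_site", "Producer site")
    IMPORTER_SITE = ("importer_site", "Importer site")
    GOVERNMENT = ("government", "COLA / government record")
    INFERRED = ("inferred", "Inferred (LLM research)")
    MANUAL = ("manual", "Manual / operator")

    def __init__(self, key: str, label: str):
        self.key = key
        self.label = label

    @classmethod
    def from_key(cls, key: str) -> "SourceTier":
        """Look up a tier by its machine-readable key."""
        for tier in cls:
            if tier.key == key:
                return tier
        raise ValueError(f"unknown source tier: {key!r}")


def display_label(key: str) -> str:
    """Always return a text label, so color is never the only signal."""
    try:
        return SourceTier.from_key(key).label
    except ValueError:
        return "Unknown source"
```

Falling back to a generic label for an unrecognized key keeps the rule that every tier badge carries text, even if a new tier is added before a client is updated.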
Access
Provenance via API
Every entity response includes a provenance map that gives the source tier and confidence for each field. Field-level detail is available to API keys on the free, role-validated tier.
GET /v1/entities/:id
Authorization: Bearer wg_live_…

{
  "id": "…",
  "entity_type": "sku",
  "display_name": "Overnoy Ploussard 2022",
  "provenance": {
    "grapes": { "source": "producer_site", "confidence": 0.91 },
    "farming": { "source": "tenant", "confidence": 1.00 }
  }
}
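A client might use the provenance map to flag fields worth double-checking before acting on them. The sketch below builds the request with the standard library and filters the sample response by confidence. `BASE_URL` is a placeholder because the real host is not given in the source, and the 0.95 threshold and helper names are assumptions.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder; the real host is not stated


def build_request(entity_id: str, api_key: str) -> urllib.request.Request:
    """Prepare GET /v1/entities/:id with a bearer token (nothing is sent here)."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/entities/{entity_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def low_confidence_fields(entity: dict, threshold: float = 0.95) -> list[str]:
    """Names of fields whose provenance confidence falls below the threshold."""
    provenance = entity.get("provenance", {})
    return sorted(
        field for field, meta in provenance.items()
        if meta["confidence"] < threshold
    )


# The sample response from above, parsed as JSON.
SAMPLE = json.loads("""
{
  "id": "…",
  "entity_type": "sku",
  "display_name": "Overnoy Ploussard 2022",
  "provenance": {
    "grapes": { "source": "producer_site", "confidence": 0.91 },
    "farming": { "source": "tenant", "confidence": 1.00 }
  }
}
""")

fields_to_check = low_confidence_fields(SAMPLE)
```

With the sample payload, `fields_to_check` contains only `grapes`, since the tenant-confirmed `farming` field sits at full confidence. Sending the request would be a matter of passing the result of `build_request` to `urllib.request.urlopen` with a real host and key.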