Our Methodology
Etymology is part science, part detective work. Here is how we approach it: what our entries contain, how they are made, and where our limits lie.
How Entries Are Created
Each etymology entry is generated with the assistance of large language models, then structured into a consistent format: definition, language of origin, etymological journey, cognates, proto-root reconstruction, and, for many words, a long-form article exploring the word's full history.
Generation follows a multi-pass pipeline. Early passes establish core data (definition, origin, journey). Later passes add depth: cognates across language families, proto-root connections, cultural context, and scholarly notes. Each pass is tracked so entries can be revisited and improved.
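As a rough illustration of the entry format and pass tracking described above, here is a minimal sketch in Python. The class name, field names, pass names, and the `apply_pass` helper are all hypothetical, chosen for this example only; they do not describe our actual schema or pipeline code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EtymologyEntry:
    """Hypothetical shape of one entry; field names are illustrative."""
    word: str
    definition: str = ""
    origin_language: str = ""
    journey: List[str] = field(default_factory=list)
    cognates: List[str] = field(default_factory=list)
    proto_root: Optional[str] = None
    article: Optional[str] = None
    completed_passes: List[str] = field(default_factory=list)


# Assumed split of fields between early and later passes.
CORE_FIELDS: Tuple[str, ...] = ("definition", "origin_language", "journey")
DEPTH_FIELDS: Tuple[str, ...] = ("cognates", "proto_root", "article")


def apply_pass(entry: EtymologyEntry, pass_name: str,
               allowed_fields: Tuple[str, ...], data: Dict) -> EtymologyEntry:
    """Copy only the fields this pass is allowed to set, then record the pass."""
    for name in allowed_fields:
        if name in data:
            setattr(entry, name, data[name])
    entry.completed_passes.append(pass_name)
    return entry


entry = EtymologyEntry(word="democracy")
apply_pass(entry, "core", CORE_FIELDS, {
    "definition": "government by the people",
    "origin_language": "Greek",
    "journey": ["Greek demokratia (demos + kratos)"],
    "article": "ignored: not a core field",
})
apply_pass(entry, "depth", DEPTH_FIELDS, {
    "article": "(long-form article text would go here)",
})
```

Recording each pass by name on the entry is one simple way to see what an entry has already received before it is revisited.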
Sources Consulted
Our entries draw on the established body of etymological scholarship. The language models used in generation have been trained on, and we cross-reference against, sources including:
- Proto-Indo-European reconstructions from comparative linguistics (Pokorny, Rix, LIV)
- Douglas Harper's Online Etymology Dictionary (etymonline.com)
- Wiktionary's crowd-sourced etymological annotations
- The Oxford English Dictionary, where attestation dates and historical citations are available
- Calvert Watkins' American Heritage Dictionary of Indo-European Roots
- Language-family-specific references for Semitic, Sino-Tibetan, and other non-IE origins
We do not claim to replace these references. We aim to make their insights more accessible and interconnected.
Confidence Levels
Not all etymologies carry the same degree of certainty. We assign a confidence level to each entry to help you gauge how settled the scholarship is.
Established scholarly consensus. The etymological path is well-attested in historical records, with broad agreement among major references.
e.g., "democracy" from Greek demokratia (demos + kratos)
Plausible reconstruction with some scholarly debate. The general trajectory is accepted, but specific details or intermediate forms may be contested.
e.g., "dog": Old English docga, ultimate origin debated
Multiple competing theories exist. We present the main hypotheses without endorsing one over another, and flag the entry as disputed.
e.g., "assassin": Arabic hashishiyyin theory vs. alternatives
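The three levels above could be modeled as a small enumeration. This is only a sketch: the member names `ESTABLISHED`, `PLAUSIBLE`, and `DISPUTED` and the helper `needs_dispute_flag` are assumptions made for illustration, not the labels or code used on the site.

```python
from enum import Enum


class Confidence(Enum):
    """Hypothetical names for the three confidence levels described above."""
    ESTABLISHED = "established scholarly consensus"
    PLAUSIBLE = "plausible reconstruction with some scholarly debate"
    DISPUTED = "multiple competing theories exist"


# The example words given for each level in the text above.
EXAMPLES = {
    "democracy": Confidence.ESTABLISHED,
    "dog": Confidence.PLAUSIBLE,
    "assassin": Confidence.DISPUTED,
}


def needs_dispute_flag(level: Confidence) -> bool:
    """Only the lowest level is flagged as disputed."""
    return level is Confidence.DISPUTED


flagged = [word for word, level in EXAMPLES.items() if needs_dispute_flag(level)]
```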
What AI Does (and Does Not Do)
AI generates structured etymological data
- Definitions and summaries drawn from its training data
- Etymological journeys tracing a word through historical languages
- Cognate lists across related language families
- Long-form articles exploring cultural and linguistic context
- Proto-root reconstructions based on established comparative methods
AI does not
- Invent proto-roots or fabricate reconstructions not found in scholarship
- Manufacture attestation dates or historical citations
- Present speculation as fact; uncertain claims are flagged
- Replace peer-reviewed etymological research
Versioning & Continuous Improvement
Entries are not static. Each word carries an internal version field (etv) that tracks which generation pass it has received. As our models, prompts, and reference data improve, entries are revisited and upgraded.
This means an entry you read today may be richer tomorrow, with additional cognates, a more nuanced journey, or a newly written article. Version history ensures nothing is silently lost.
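One way to picture the versioning described above is sketched below. Only the field name `etv` comes from our entries; the dictionary layout, the `history` key, the `CURRENT_ETV` value, and both helper functions are assumptions made for this example.

```python
import copy
from typing import Dict, List

CURRENT_ETV = 3  # hypothetical number of the latest generation pass


def entries_to_revisit(entries: List[Dict], target_etv: int = CURRENT_ETV) -> List[str]:
    """Return the words whose etv is behind the target pass."""
    return [e["word"] for e in entries if e.get("etv", 0) < target_etv]


def upgrade(entry: Dict, updates: Dict, new_etv: int) -> Dict:
    """Return an upgraded copy, keeping a snapshot of the previous version."""
    snapshot = {k: copy.deepcopy(v) for k, v in entry.items() if k != "history"}
    upgraded = copy.deepcopy(entry)
    upgraded.update(updates)
    upgraded["etv"] = new_etv
    upgraded["history"] = list(entry.get("history", [])) + [snapshot]
    return upgraded


old = {"word": "democracy", "etv": 1, "article": None}
new = upgrade(old, {"article": "(newly written article text)"}, new_etv=2)
behind = entries_to_revisit([old, new], target_etv=2)
```

Appending a snapshot on every upgrade, rather than overwriting in place, is what keeps earlier content recoverable in this sketch.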
Limitations & Honest Caveats
Proto-language reconstructions are hypotheses, not facts. Forms marked with an asterisk (e.g., *werdho-) are scholarly reconstructions: educated inferences from comparative evidence, not attested words found in written records.
Where scholars disagree, we aim to represent the debate rather than pick a side. Some etymologies are genuinely uncertain, and we believe acknowledging that uncertainty is more honest than presenting a false consensus.
AI-generated content can contain errors. We review and improve entries continuously, but we encourage you to cross-reference with primary sources, especially for academic or professional use.
Report an Error
Found something wrong? We take corrections seriously. Etymology is a living field, and community input makes entries better.
Send corrections or suggestions to [email protected]
Please include the word in question, what you believe is incorrect, and, if possible, a reference supporting the correction.