Our Methodology

Etymology is part science, part detective work. Here is how we approach it: what our entries contain, how they are made, and where our limits lie.

How Entries Are Created

Each etymology entry is generated with the assistance of large language models, then structured into a consistent format: definition, language of origin, etymological journey, cognates, proto-root reconstruction, and, for many words, a long-form article exploring the word's full history.

Generation follows a multi-pass pipeline. Early passes establish core data (definition, origin, journey). Later passes add depth: cognates across language families, proto-root connections, cultural context, and scholarly notes. Each pass is tracked so entries can be revisited and improved.
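The multi-pass idea can be sketched in Python as follows. This is a minimal illustration under assumptions, not our actual pipeline: the fields mirror the entry format described above, but the names Entry, core_pass, depth_pass, run_pipeline, and passes_done are hypothetical, and the pass bodies are placeholders standing in for model output.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Entry:
    """Hypothetical entry record; fields mirror the format described above."""
    word: str
    definition: Optional[str] = None
    origin_language: Optional[str] = None
    journey: List[str] = field(default_factory=list)
    cognates: List[str] = field(default_factory=list)
    proto_root: Optional[str] = None
    article: Optional[str] = None
    # Names of the passes already applied, so an entry can be revisited later.
    passes_done: List[str] = field(default_factory=list)


def core_pass(entry: Entry) -> Entry:
    # Placeholder for an early pass: a real one would fill in
    # definition, origin, and journey.
    if entry.definition is None:
        entry.definition = "(pending)"
    return entry


def depth_pass(entry: Entry) -> Entry:
    # Placeholder for a later pass: a real one would add cognates,
    # proto-root connections, and cultural context.
    if entry.article is None:
        entry.article = "(pending)"
    return entry


# Assumed ordering: core data first, depth afterwards.
PIPELINE: List[Callable[[Entry], Entry]] = [core_pass, depth_pass]


def run_pipeline(entry: Entry) -> Entry:
    """Apply, in order, every pass not yet recorded on the entry."""
    for step in PIPELINE:
        if step.__name__ not in entry.passes_done:
            entry = step(entry)
            entry.passes_done.append(step.__name__)
    return entry
```

Calling run_pipeline(Entry(word="democracy")) applies both placeholder passes and records them; running it again on the same entry is a no-op, which is the property that makes revisiting entries safe in this sketch.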

Sources Consulted

Our entries draw on the established body of etymological scholarship. The language models used in generation have been trained on, and we cross-reference against, sources including:

  • Proto-Indo-European reconstructions from comparative linguistics (Pokorny, Rix, LIV)
  • Douglas Harper's Online Etymology Dictionary (etymonline.com)
  • Wiktionary's crowd-sourced etymological annotations
  • The Oxford English Dictionary, where attestation dates and historical citations are available
  • Calvert Watkins' American Heritage Dictionary of Indo-European Roots
  • Language-family-specific references for Semitic, Sino-Tibetan, and other non-IE origins

We do not claim to replace these references. We aim to make their insights more accessible and interconnected.

Confidence Levels

Not all etymologies carry the same degree of certainty. We assign a confidence level to each entry to help you gauge how settled the scholarship is.

High

Established scholarly consensus. The etymological path is well-attested in historical records, with broad agreement among major references.

e.g., "democracy" from Greek demokratia (demos + kratos)

Medium

Plausible reconstruction with some scholarly debate. The general trajectory is accepted, but specific details or intermediate forms may be contested.

e.g., "dog": Old English docga, ultimate origin debated

Low / Disputed

Multiple competing theories exist. We present the main hypotheses without endorsing one over another, and flag the entry as disputed.

e.g., "assassin": Arabic hashishiyyin theory vs. alternatives
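As a rough illustration, the three levels could be modelled like this in Python. The Confidence enum and the assign_confidence rule are assumptions made for this sketch (including its two inputs); in practice the level reflects a judgment about the scholarship, not a mechanical formula.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "high"            # established scholarly consensus
    MEDIUM = "medium"        # accepted trajectory, contested details
    LOW = "low / disputed"   # multiple competing theories


def assign_confidence(competing_theories: int, details_contested: bool) -> Confidence:
    """Hypothetical rule of thumb mapping the descriptions above to a level."""
    if competing_theories > 1:
        # Several main hypotheses: flag as disputed, endorse none.
        return Confidence.LOW
    if details_contested:
        # One accepted path, but intermediate forms or details are debated.
        return Confidence.MEDIUM
    return Confidence.HIGH
```

Under this rule, a word with one well-attested path and no contested details comes out as HIGH, while any word with more than one live hypothesis comes out as LOW regardless of the other input.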

What AI Does (and Does Not Do)

AI generates structured etymological data

  • ✓ Definitions and summaries drawn from its training data
  • ✓ Etymological journeys tracing a word through historical languages
  • ✓ Cognate lists across related language families
  • ✓ Long-form articles exploring cultural and linguistic context
  • ✓ Proto-root reconstructions based on established comparative methods

AI does not

  • ✗ Invent proto-roots or fabricate reconstructions not found in scholarship
  • ✗ Manufacture attestation dates or historical citations
  • ✗ Present speculation as fact; uncertain claims are flagged
  • ✗ Replace peer-reviewed etymological research

Versioning & Continuous Improvement

Entries are not static. Each word carries an internal version field (etv) that tracks which generation pass it has received. As our models, prompts, and reference data improve, entries are revisited and upgraded.

This means an entry you read today may be richer tomorrow, with additional cognates, a more nuanced journey, or a newly written article. Version history ensures nothing is silently lost.
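One way to picture versioning with history is the Python sketch below. Only the field name etv comes from the description above; treating it as an integer counter, and the VersionedEntry class with its data, history, and upgrade members, are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class VersionedEntry:
    word: str
    etv: int = 0  # assumed here to be a simple pass counter
    data: Dict[str, Any] = field(default_factory=dict)
    history: List[Dict[str, Any]] = field(default_factory=list)

    def upgrade(self, new_data: Dict[str, Any]) -> None:
        """Snapshot the current state, then merge new data and bump etv."""
        # Append-only history: earlier versions are copied, never overwritten.
        self.history.append({"etv": self.etv, "data": dict(self.data)})
        self.data.update(new_data)
        self.etv += 1
```

Each call to upgrade stores a copy of the previous state before changing anything, so an earlier version of an entry can always be recovered from history.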

Limitations & Honest Caveats

Proto-language reconstructions are hypotheses, not facts. Forms marked with an asterisk (e.g., *werdho-) are scholarly reconstructions: educated inferences from comparative evidence, not attested words found in written records.

Where scholars disagree, we aim to represent the debate rather than pick a side. Some etymologies are genuinely uncertain, and we believe acknowledging that uncertainty is more honest than presenting a false consensus.

AI-generated content can contain errors. We review and improve entries continuously, but we encourage you to cross-reference with primary sources, especially for academic or professional use.

Report an Error

Found something wrong? We take corrections seriously. Etymology is a living field, and community input makes entries better.

Send corrections or suggestions to [email protected]

Please include the word in question, what you believe is incorrect, and, if possible, a reference supporting the correction.