The sentence decomposition logic is almost done.

The design:

  • A sentence is entered into the database, often (but not always) in English.
  • It is translated by LLM operation 1 into the full set of languages: English, Chinese, French, Spanish, Lithuanian, Ukrainian, Kannada, Bengali. ⚙️ All Indo-European, except Chinese (Sino-Tibetan) and Kannada (Dravidian). This is unintentional but welcome: it stays mostly within the areas of linguistics where I am comfortable.
  • We check the database for word matches in each translation and use them to build a list of candidate lemmas. This way we avoid using LLMs for the word-meaning resolution problem: the "correct" meaning will be present in most or all languages. 💡 Not all sentence translations will have the same lemmas. "I like bread" and "Man patinka duona" (⚙️ Lithuanian, literally "Bread is pleasing to me") are the same sentence without a 1-1 lemma match.
  • Once we get the list of candidate lemmas, we ask the LLM to do the sentence decomposition.
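
A minimal sketch of the candidate-lemma step, assuming lemma ids name senses that are shared across languages. The in-memory WORD_INDEX stands in for the database, and the lemma ids, function names, and whitespace tokenization are all hypothetical, not the real schema. It only shows the idea that a sense supported by more translations is more likely the intended one.

```python
from collections import defaultdict

# Hypothetical stand-in for the database: (language, word form) -> lemma ids.
# The lemma ids and the tiny vocabulary are made up for illustration.
WORD_INDEX = {
    ("en", "like"): ["like.enjoy", "like.similar-to"],
    ("en", "bread"): ["bread.food"],
    ("lt", "patinka"): ["like.enjoy"],
    ("lt", "duona"): ["bread.food"],
}


def candidate_lemmas(translations):
    """Map each candidate lemma to the languages whose translation matched it."""
    support = defaultdict(set)
    for lang, sentence in translations.items():
        for token in sentence.lower().split():
            word = token.strip(".,!?")
            # Words with no database match (e.g. "i", "man") are simply skipped.
            for lemma in WORD_INDEX.get((lang, word), []):
                support[lemma].add(lang)
    return {lemma: sorted(langs) for lemma, langs in support.items()}


def rank_candidates(candidates):
    """Order lemmas by how many languages support them, most first."""
    return sorted(candidates, key=lambda lemma: (-len(candidates[lemma]), lemma))


translations = {"en": "I like bread", "lt": "Man patinka duona"}
candidates = candidate_lemmas(translations)
ranked = rank_candidates(candidates)
```

Here "like.similar-to" is matched only by the English sentence, so it ranks below "like.enjoy", which both translations support. The ranked list is what would be handed to the LLM for the decomposition step.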

For a sentence, the decomposition identifies:

  • What specific lemma / sense-of-meaning of a word is in use. Is it to lose a race or to lose one's keys? A river-bank or a financial bank?
  • What tense is the word in? Present/past? Is it a conjugated form? This should align with the grammatical-form information stored on the lemma, but there is no requirement for it to do so.
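
One possible shape for a per-word decomposition record, as a sketch only. The field names, lemma ids, and form labels are assumptions made for illustration. Since alignment with the lemma's grammatical form is expected but not required, the check below reports mismatches instead of rejecting them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordDecomposition:
    surface: str   # the word as it appears in the sentence
    lemma_id: str  # the chosen lemma / sense (hypothetical id scheme)
    form: str      # grammatical form reported by the LLM, e.g. "past"


def form_warnings(words, lemma_forms):
    """Return the words whose reported form is not stored for their lemma.

    Mismatches are returned for review rather than raised as errors,
    because alignment is expected but not enforced.
    """
    return [w for w in words if w.form not in lemma_forms.get(w.lemma_id, set())]


# Hypothetical lemma data: lemma id -> grammatical forms known for it.
lemma_forms = {
    "lose.misplace": {"present", "past"},
    "key.object": {"singular", "plural"},
}
decomposition = [
    WordDecomposition("lost", "lose.misplace", "past"),
    WordDecomposition("keys", "key.object", "dual"),  # deliberately misaligned
]
warnings = form_warnings(decomposition, lemma_forms)
```

With this data, only the "keys" entry is flagged, and it is still kept in the decomposition.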

We do not yet build a full "sentence diagram" from the words. The traditional Reed-Kellogg sentence diagram is uninteresting -- it only applies to English, is outdated, and remains confusing. However, the data we would need in order to generate one is still missing, and that data will be relevant later on.