{"channel":"barsukas","content":"The << sentence decomposition >> logic is almost done.\r\n\r\nThe design:\r\n\r\n# A sentence is entered into the database, often (but not always) in English.\r\n# It is translated by << LLM operation 1 >> into other languages: English, Chinese, French, Spanish, Lithuanian, Ukrainian, Kannada, Bengali. (<green> all PIE, except Chinese and Kannada.  This is unintentional but welcome.  It remains within the areas of linguistics where I am comfortable.)\r\n# We check the database for word matches.  This is used to create a list of << candidate lemmas >>.  We avoid the need to use LLMs to solve the word-meaning resolution problem this way.  The \"correct\" meaning will be present in most/all languages (<red> not all sentence translations will have the same lemmas.  << I like bread >> and << Man patinka duona (<green> Lithuanian - literally translated as \"Bread is pleasing to me\") >> are the same sentence without a 1-1 lemma match)\r\n# Once we get the list of << candidate lemmas >>, we ask the LLM to do the << sentence decomposition >>.\r\n\r\nFor a sentence, the decomposition identifies:\r\n# What specific lemma / sense-of-meaning of a word is in use.  Is it << to lose a race >> or << to lose one's keys >>?  A river-bank or a financial bank?\r\n# What tense is the word in?  Present/past?  Is it a conjugated form?  This should align with the lemma's information for the grammatical form, but there is no requirement for it to do so.\r\n\r\n----\r\n\r\nWe do not yet have a full \"sentence diagram\" out of the words.  The traditional [[Reed-Kellogg sentence diagram]] is uninteresting -- it only applies to English, is outdated, and remains confusing.  
However, the data needed to generate such a diagram is still missing, and it will be relevant later on.","created_at":"2026-05-18T19:05:28.768910","id":800,"llm_annotations":{},"parent_id":null,"processed_content":"<p>The <span class=\"literal-text\">sentence decomposition</span> logic is almost done.\r</p>\n<p>The design:\r</p>\n<ul>\n<li class=\"number-list\"> A sentence is entered into the database, often (but not always) in English.\r</li>\n<li class=\"number-list\"> It is translated by <span class=\"literal-text\">LLM operation 1</span> into other languages: English, Chinese, French, Spanish, Lithuanian, Ukrainian, Kannada, Bengali. <span class=\"colorblock color-green\"><span class=\"sigil\">\u2699\ufe0f</span><span class=\"colortext-content\"> all PIE, except Chinese and Kannada.  This is unintentional but welcome.  It remains within the areas of linguistics where I am comfortable.</span></span>\r</li>\n<li class=\"number-list\"> We check the database for word matches.  This is used to create a list of <span class=\"literal-text\">candidate lemmas</span>.  We avoid the need to use LLMs to solve the word-meaning resolution problem this way.  The \"correct\" meaning will be present in most/all languages <span class=\"colorblock color-red\"><span class=\"sigil\">\ud83d\udca1</span><span class=\"colortext-content\"> not all sentence translations will have the same lemmas.  
<span class=\"literal-text\">I like bread</span> and <span class=\"literal-text\">Man patinka duona <span class=\"colorblock color-green\"><span class=\"sigil\">\u2699\ufe0f</span><span class=\"colortext-content\"> Lithuanian - literally translated as \"Bread is pleasing to me\"</span></span></span> are the same sentence without a 1-1 lemma match</span></span>\r</li>\n<li class=\"number-list\"> Once we get the list of <span class=\"literal-text\">candidate lemmas</span>, we ask the LLM to do the <span class=\"literal-text\">sentence decomposition</span>.\r</li>\n</ul>\n<p>For a sentence, the decomposition identifies:\r</p>\n<ul>\n<li class=\"number-list\"> What specific lemma / sense-of-meaning of a word is in use.  Is it <span class=\"literal-text\">to lose a race</span> or <span class=\"literal-text\">to lose one's keys</span>?  A river-bank or a financial bank?\r</li>\n<li class=\"number-list\"> What tense is the word in?  Present/past?  Is it a conjugated form?  This should align with the lemma's information for the grammatical form, but there is no requirement for it to do so.\r</li>\n</ul>\n<hr class=\"section-break\" />\n<p>We do not yet have a full \"sentence diagram\" out of the words.  The traditional <a href=\"https://en.wikipedia.org/wiki/Reed-Kellogg_sentence_diagram\" class=\"wikilink\" target=\"_blank\">Reed-Kellogg sentence diagram</a> is uninteresting -- it only applies to English, is outdated, and remains confusing.  However, the data needed to generate such a diagram is still missing, and it will be relevant later on.</p>","quotes":[],"subject":"Sentence Decomposition"}
