{"chain":[{"channel":"barsukas","content":"The << sentence decomposition >> logic is almost done.\r\n\r\nThe design:\r\n\r\n# A sentence is entered into the database, often (but not always) in English.\r\n# It is translated by << LLM operation 1 >> into other languages: English, Chinese, French, Spanish, Lithuanian, Ukrainian, Kannada, Bengali. (<green> all Indo-European, except Chinese and Kannada (which is Dravidian).  This is unintentional but welcome.  It remains within the areas of linguistics where I am comfortable.)\r\n# We check the database for word matches.  This is used to create a list of << candidate lemmas >>.  We avoid the need to use LLMs to do the word-meaning resolution problem this way.  The \"correct\" meaning will be present in most/all languages (<red> not all sentence translations will have the same lemmas.  << I like bread >> and << Man patinka duona (<green> Lithuanian - literally translated as \"Bread is pleasing to me\") >> are the same sentence without a 1-1 lemma match)\r\n# Once we get the list of << candidate lemmas >>, we ask the LLM to do the << sentence decomposition >>.\r\n\r\nFor a sentence, the decomposition identifies:\r\n# What specific lemma / sense-of-meaning of a word is in use.  Is it << to lose a race >> or << to lose one's keys >>?  A river-bank or a financial bank?\r\n# What tense is the word in?  Present/past?  Is it a conjugated form?  This should align with the lemma's information for the grammatical form, but there is no requirement for it to do so.\r\n\r\n----\r\n\r\nWe do not yet have a full \"sentence diagram\" out of the words.  The traditional [[Reed-Kellogg sentence diagram]] is uninteresting -- it only applies to English, is outdated, and remains confusing.  
However, the data missing to be able to generate this will be relevant later on.","created_at":"2026-05-18T19:05:28.768910","id":800,"is_target":false,"parent_id":null,"processed_content":"<p>The <span class=\"literal-text\">sentence decomposition</span> logic is almost done.\r</p>\n<p>The design:\r</p>\n<ul>\n<li class=\"number-list\"> A sentence is entered into the database, often (but not always) in English.\r</li>\n<li class=\"number-list\"> It is translated by <span class=\"literal-text\">LLM operation 1</span> into other languages: English, Chinese, French, Spanish, Lithuanian, Ukrainian, Kannada, Bengali. <span class=\"colorblock color-green\"><span class=\"sigil\">\u2699\ufe0f</span><span class=\"colortext-content\"> all Indo-European, except Chinese and Kannada (which is Dravidian).  This is unintentional but welcome.  It remains within the areas of linguistics where I am comfortable.</span></span>\r</li>\n<li class=\"number-list\"> We check the database for word matches.  This is used to create a list of <span class=\"literal-text\">candidate lemmas</span>.  We avoid the need to use LLMs to do the word-meaning resolution problem this way.  The \"correct\" meaning will be present in most/all languages <span class=\"colorblock color-red\"><span class=\"sigil\">\ud83d\udca1</span><span class=\"colortext-content\"> not all sentence translations will have the same lemmas.  
<span class=\"literal-text\">I like bread</span> and <span class=\"literal-text\">Man patinka duona <span class=\"colorblock color-green\"><span class=\"sigil\">\u2699\ufe0f</span><span class=\"colortext-content\"> Lithuanian - literally translated as \"Bread is pleasing to me\"</span></span></span> are the same sentence without a 1-1 lemma match</span></span>\r</li>\n<li class=\"number-list\"> Once we get the list of <span class=\"literal-text\">candidate lemmas</span>, we ask the LLM to do the <span class=\"literal-text\">sentence decomposition</span>.\r</li>\n</ul>\n<p>For a sentence, the decomposition identifies:\r</p>\n<ul>\n<li class=\"number-list\"> What specific lemma / sense-of-meaning of a word is in use.  Is it <span class=\"literal-text\">to lose a race</span> or <span class=\"literal-text\">to lose one's keys</span>?  A river-bank or a financial bank?\r</li>\n<li class=\"number-list\"> What tense is the word in?  Present/past?  Is it a conjugated form?  This should align with the lemma's information for the grammatical form, but there is no requirement for it to do so.\r</li>\n</ul>\n<hr class=\"section-break\" />\n<p>We do not yet have a full \"sentence diagram\" out of the words.  The traditional <a href=\"https://en.wikipedia.org/wiki/Reed-Kellogg_sentence_diagram\" class=\"wikilink\" target=\"_blank\">Reed-Kellogg sentence diagram</a> is uninteresting -- it only applies to English, is outdated, and remains confusing.  However, the data missing to be able to generate this will be relevant later on.</p>","subject":"Sentence Decomposition"},{"channel":"barsukas","content":"These tasks are one end of the spectrum of \"language model\" tasks.\r\n\r\nA \"language model\" takes language and does operations on it.  Not *thought*.  The cross-language translation and \"sentence decomposition\" rely on a simpler understanding of knowledge. 
(<blue> it is much easier to say \"I can think of a castle\" than it is to build one)\r\n\r\n----\r\n\r\nWe do need some element of the second type of model.\r\n\r\nWhich is going to be an LLM/software *centaur* version.\r\n\r\nThe LLM here does need to be able to do *thought* tasks, not just language-model tasks.\r\n\r\n<xantham> there is a reason we don't put a dictionary in charge of the country.","created_at":"2026-05-18T19:45:32.027658","id":801,"is_target":true,"parent_id":800,"processed_content":"<p>These tasks are one end of the spectrum of \"language model\" tasks.\r</p>\n<p>A \"language model\" takes language and does operations on it.  Not <em>thought</em>.  The cross-language translation and \"sentence decomposition\" rely on a simpler understanding of knowledge. <span class=\"colorblock color-blue\"><span class=\"sigil\">\u2728</span><span class=\"colortext-content\"> it is much easier to say \"I can think of a castle\" than it is to build one</span></span>\r</p>\n<hr class=\"section-break\" />\n<p>We do need some element of the second type of model.\r</p>\n<p>Which is going to be an LLM/software <em>centaur</em> version.\r</p>\n<p>The LLM here does need to be able to do <em>thought</em> tasks, not just language-model tasks.\r</p>\n<p><span class=\"colorblock color-xantham\"><span class=\"sigil\">\ud83d\udd25</span><span class=\"colortext-content\"> there is a reason we don't put a dictionary in charge of the country.</span></span></p>","subject":"Sentence Decomposition (part 2)"}]}