Language Models
This folder contains information about languages and, in turn, about existing language models.
Modelling the formal knowledge of 'natural language' in a comprehensive manner is a fairly complex endeavour. The complexities are being explored in detail in response to a hypothesis: that getting stuck into the detail will have meaningful consequences for the qualities of any AI that depends upon some form of language construct. The challenge becomes producing an informatics ecosystem that is reliably able to interpret the meaning and provenance of words.
See also: the folder on Alphabets.
There are some placeholders and a whole lot of room for improvement (it's entirely incomplete). The language models folder also includes some notes about computer-based LargeLanguageModels, some thoughts about StemLanguageModels and, perhaps importantly, Alphabets (although I'm not entirely sure where to put Heraldry).
Whilst these resources are indeed incomplete, the practice is opening up my view of the difference between a language model and a knowledge model. I'm starting to see the problem with seeking to restrict a model to simply the words: it is hard to do so without ending up with a far more comprehensive process that, in effect, starts the PermissiveCommons tech process more expansively than was considered earlier. Whilst the goals remain the same, perhaps the method needs to be adapted so that the focus remains, making best efforts.
Large Language Models
Some of the large language models that have been identified include: BabelNet, CYC, FrameNet, Framester, Lemon, Linguistlist, SUMO and Wordnet.
High-Level Considerations
The amount of time and effort that goes into these and other related resources is non-trivial. Whilst the derivative output of 'sense' is going to be different, it is entirely likely that it will still need to be maintained, and that some of that work is best done by subject matter experts / professionals. Yet this is potentially an area that could lead to a power grab for centralised control, which is absolutely unwanted. Protections need to be defined, with mechanisms / methodologies engineered to ensure, as far as possible, that no such attack vector is available, without unnecessarily exposing the informatics system to other unwanted forms of exploitation / attack.
There is a close relationship between language and history, which is a very complex (political) field; what a person learns from locals with local knowledge is often different to what is taught in books or told via various forms of media.
It is common for written works to feature different spellings for the same name or word. This is particularly so for names, which often relate in turn to words from different places. There are also different interpretations, which may relate to historical debates or conflicts.
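As a minimal sketch of how variant spellings might be tied back to one entry whilst keeping track of where each spelling was seen, something like the following could work. The class name `VariantIndex`, the `normalise` helper and the source labels are all assumptions made for illustration; they are not part of any of the resources listed above, and the normalisation only folds case and strips diacritics, so it does not resolve genuinely different spellings unless they are recorded explicitly.

```python
from __future__ import annotations

import unicodedata
from collections import defaultdict


def normalise(spelling: str) -> str:
    """Fold case and strip combining marks so trivially different spellings match."""
    decomposed = unicodedata.normalize("NFKD", spelling)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold().strip()


class VariantIndex:
    """Hypothetical index from attested spellings to one canonical entry."""

    def __init__(self) -> None:
        self._canonical_by_key: dict[str, str] = {}
        self._attestations: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, canonical: str, spelling: str, source: str) -> None:
        """Record that `spelling` was seen in `source` and refers to `canonical`."""
        self._canonical_by_key[normalise(spelling)] = canonical
        self._attestations[canonical].append((spelling, source))

    def lookup(self, spelling: str) -> str | None:
        """Return the canonical entry for a spelling, or None if it is unknown."""
        return self._canonical_by_key.get(normalise(spelling))

    def attestations(self, canonical: str) -> list[tuple[str, str]]:
        """Return every (spelling, source) pair recorded for an entry."""
        return list(self._attestations.get(canonical, []))


# The source labels below are placeholders, not real references.
index = VariantIndex()
index.add("Æthelred", "Æthelred", "hypothetical source A")
index.add("Æthelred", "Ethelred", "hypothetical source B")
index.add("Æthelred", "Aethelred", "hypothetical source C")

print(index.lookup("ETHELRED"))
print(index.attestations("Æthelred"))
```

Keeping the source next to each spelling is the point of the design: differing interpretations can then be traced to where they came from, rather than being silently merged.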
Whilst the burden of implementation concerns history more than the present or future, the systems are intended to be used to support 'living' languages. In effect, as people work with the model (privately) through the activities they share (socially), the models are expected to evolve in many different ways, ranging from a form of biometric language-use analytics capacity through to improved training and various other, far broader implications. The computational energy required to develop the model to a degree of stability is thought likely to be non-trivial.
The use of language is fundamentally personal.
Techniques need to be formed to 'shard' / 'chunk' or otherwise enable streaming of the resources required by ML / GANN / etc. programs running locally, and then both to dimension the resources available to the agent and to manage the use of those resources accordingly.
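A rough sketch of the chunking idea, under the assumption that a resource can be read as a byte stream and that the agent knows its own memory budget. The function names `plan_chunk_size` and `stream_chunks`, and the budget figures used in the example, are hypothetical; a real system would need to measure what is actually available.

```python
from __future__ import annotations

import io
from typing import BinaryIO, Iterator


def plan_chunk_size(budget_bytes: int, reserved_bytes: int, max_chunk_bytes: int) -> int:
    """Dimension the chunk size from what is left after the agent's own needs."""
    available = budget_bytes - reserved_bytes
    if available <= 0:
        raise MemoryError("no budget left for streaming")
    return min(available, max_chunk_bytes)


def stream_chunks(resource: BinaryIO, chunk_bytes: int) -> Iterator[bytes]:
    """Yield the resource in pieces of at most chunk_bytes, never loading it whole."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    while True:
        chunk = resource.read(chunk_bytes)
        if not chunk:
            return
        yield chunk


# Illustrative numbers only: a 10-byte resource and a very small budget.
resource = io.BytesIO(b"x" * 10)
chunk_size = plan_chunk_size(budget_bytes=8, reserved_bytes=4, max_chunk_bytes=3)
for piece in stream_chunks(resource, chunk_size):
    print(len(piece))
```

Separating the planning step from the streaming step means the budget can be recalculated between resources without touching the code that reads them.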
There are also a series of interface-related considerations that are important, particularly with respect to the development and maintenance of personal ontologies, whereby people may define their elected definition of a word, which is in effect their own word or definition. Trademarks often also relate to the creation of words, and perhaps these should be able to be added to the broader 'graph' in a manner that would have the computational effect of registering them to some degree. Perhaps there should also be rules about how / where that is allowed. This might present an opportunity to resource a revenue stream that could in turn help to fund the cost of the language system as a whole; or perhaps that's a bad idea, as the system may then be considered to be financially motivated by brands, not humans.
Then the question becomes where to put any micropayments associated with its use, so as to support the maintenance costs (the ability to pay people who usefully maintain / improve it). This in turn invokes another issue: any incentivisation model should ideally ensure that it doesn't discourage work to add / support / improve a topic that is far less popular than another, and that false information is also discouraged.
But some of that is more about GLAM (Galleries, Libraries, Archives & Museums) issues than vocabulary specifically, notwithstanding the intrinsic link between the two fields.
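To illustrate the personal ontology idea raised above, here is a minimal sketch in which a person's elected definitions sit as an overlay on top of a shared vocabulary. It assumes both can be treated as simple word-to-definition mappings, which is a large simplification; the function names and the example definitions are hypothetical.

```python
from collections import ChainMap


def build_vocabulary(shared: dict) -> ChainMap:
    """Layer an empty personal mapping over the shared one; lookups try personal first."""
    return ChainMap({}, shared)


def define_personally(vocabulary: ChainMap, word: str, definition: str) -> None:
    """Store a person's elected definition; ChainMap writes only touch the first layer."""
    vocabulary[word] = definition


def revert_to_shared(vocabulary: ChainMap, word: str) -> None:
    """Drop the personal definition, if any, so the shared one shows through again."""
    vocabulary.maps[0].pop(word, None)


# Placeholder definitions for illustration only.
shared = {"commons": "shared definition (placeholder)"}
vocabulary = build_vocabulary(shared)

define_personally(vocabulary, "commons", "my own definition (placeholder)")
print(vocabulary["commons"])   # the personal definition takes precedence
print(shared["commons"])       # the shared vocabulary is left untouched

revert_to_shared(vocabulary, "commons")
print(vocabulary["commons"])   # back to the shared definition
```

The overlay keeps a person's definitions private to their own layer, so the question of whether and how something gets registered in the broader 'graph' remains a separate, explicit step.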
Civilisations
With respect to mapping, https://timemaps.com/ is a rudimentary example; others are needed. I hope it provides some sense of the requirement to figure out the movements of civilisations, which are in turn associated with stem-languages and various meanings. A word may be spoken by or associated with one cultural group / civilisation but actually be associated with another, in relation to a place that they visited and a name for something that they were given by the 'locals'.
Whilst unconfirmed, it appears that I've found an example of these sorts of events / situations, from the UK around 900 AD. I imagine that there are common nuances, and this in turn relates to the studies involved in learning about such things, which relate to history and, in turn, to the complexities associated with history / heritage.
Consequence
It's going to have to be somewhat generative, as the ability to define the agent logic (including but not limited to human ontology, socio-sphere ontology, etc.) is going to be required in order to support the means through which the rest can be managed somehow; yet equally, it doesn't make the underlying considerations / questions / problems go away...