
Links

List of languages: https://www.ethnologue.com/ states that there are 7,168 living languages. The maintainers of https://iso639-3.sil.org/ have a GitHub organization, https://github.com/orgs/sillsdev/repositories, which seems to hold a lot of material that may be useful for producing new language resources, including a number of NLP resources.

https://worldgeodatasets.com/language/index.html

https://aclanthology.org/P19-1310/ https://github.com/flairNLP/flair https://data.linguistik.de/en/ https://github.com/CLLKazan/MathSearch https://nlp.stanford.edu/projects/glove/ https://github.com/meitinger/GraphSPARQL https://github.com/uduvudu/uduvudu

https://en.wikipedia.org/wiki/ASCII

https://en.wikipedia.org/wiki/List_of_Unicode_characters#

https://www.visualcapitalist.com/a-world-of-languages/

https://languageplayer.io/language-map/ is interesting, but it did get stuck.

https://web.archive.org/web/20080111142718/https://www2.ling.su.se/staff/ljuba/maps.html https://activehistory.ca/2016/07/visualizing-the-past-mapping-gis-and-teaching-historical-consciousness/

UTF-32 https://en.wikipedia.org/wiki/UTF-32 https://book.huihoo.com/creating-applications-with-mozilla/mozilla-chp-11-sect-6.html https://javarevisited.blogspot.com/2015/02/difference-between-utf-8-utf-16-and-utf.html

Put simply, UTF-8, UTF-16, and UTF-32 can each encode every Unicode code point; UTF-32 is the only one of the three that does so at a fixed width.

https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings RDF serializations such as Turtle and N-Triples are encoded in UTF-8.

UTF-8 uses 1 to 4 bytes per code point, UTF-16 uses 2 or 4, and UTF-32 uses a fixed 4 bytes.
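
To make the byte counts concrete, here is a minimal Python sketch that encodes a few characters in each of the three encodings and reports the lengths. The sample characters and the helper name `encoded_lengths` are chosen purely for illustration.

```python
def encoded_lengths(ch: str) -> dict:
    """Return the byte length of one character in UTF-8, UTF-16, and UTF-32."""
    # The "-le" codec variants are used so that no byte order mark is prepended,
    # which would otherwise inflate the UTF-16 and UTF-32 counts.
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


# Illustrative samples: ASCII, Latin-1 range, rest of the BMP, outside the BMP.
for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

Only the UTF-32 column stays constant; that fixed width is what it offers over the other two, at the cost of space for mostly-ASCII text.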

https://pkg.go.dev/golang.org/x/text/encoding/unicode/utf32 notes: "Please note that support for UTF-32 is discouraged as it is a rare and inefficient encoding, unfit for use as an interchange format. For use on the web, the W3C strongly discourages its use (https://www.w3.org/TR/html5/document-metadata.html#charset) while WHATWG directly prohibits supporting it (https://html.spec.whatwg.org/multipage/syntax.html#character-encodings)."

UTF-16 https://en.wikipedia.org/wiki/UTF-16

https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane
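
The Basic Multilingual Plane covers U+0000 to U+FFFF, which UTF-16 stores in a single 16-bit unit; code points above that range are split into a surrogate pair. The sketch below works through that arithmetic by hand and compares the result with Python's built-in encoder. The function name `to_surrogate_pair` is made up for this note.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above the BMP into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a surrogate pair")
    offset = code_point - 0x10000          # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits go in the low surrogate
    return high, low


high, low = to_surrogate_pair(ord("😀"))
# Cross-check against the standard library's big-endian UTF-16 encoder.
expected = "😀".encode("utf-16-be")
print(hex(high), hex(low), expected.hex())
```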

Jupyter Notebooks are UTF-8 only: https://github.com/jupyterlab/jupyterlab/issues/5451

JSON: partly. RFC 7159 allows JSON text to be encoded in UTF-8, UTF-16, or UTF-32, with UTF-8 as the default: https://www.rfc-editor.org/rfc/rfc7159.html
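
As a rough check of the JSON case, the sketch below round-trips one document through UTF-8, UTF-16, and UTF-32 using Python's standard `json` module, whose `loads` accepts bytes in any of those three encodings. The sample document and the helper name `round_trip` are made up for illustration.

```python
import json


def round_trip(doc, codec: str):
    """Serialize doc to JSON, encode it with the given codec, and parse the bytes back."""
    # ensure_ascii=False keeps non-ASCII characters literal instead of \u escapes,
    # so the chosen codec actually has to carry them.
    text = json.dumps(doc, ensure_ascii=False)
    raw = text.encode(codec)
    # json.loads detects the encoding of bytes input on its own.
    return json.loads(raw)


doc = {"greeting": "héllo 😀"}
for codec in ("utf-8", "utf-16", "utf-32"):
    print(codec, round_trip(doc, codec) == doc)
```

This only shows what one parser tolerates; other JSON consumers may accept UTF-8 only.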

https://github.com/simdutf/simdutf

Last updated on 3/9/2023