
Linguist List

The LINGUIST List site holds an enormous amount of information, including a LinguistList index code for most, if not all, languages.

  • https://linguistlist.org/
  • https://linguistlist.org/subject/
  • https://linguistlist.org/software/
  • https://geoling.linguistlist.org/ (select 'languages' in the side menu)

Selected information below has been copied from the Wikipedia article about the site.

The LINGUIST List is a major online resource for the academic field of linguistics. It was founded by Anthony Aristar in early 1990 at the University of Western Australia, and is used as a reference by the National Science Foundation in the United States. Its main and oldest feature is the premoderated electronic mailing list, now with thousands of subscribers all over the world, where queries and their summarised results, discussions, journal tables of contents, dissertation abstracts, calls for papers, book and conference announcements, software notices and other useful pieces of linguistic information are posted.

Projects

The LINGUIST List has been one of the resources for the creation of the new ISO 639-3 language identification standard (aiming to classify all known languages with an alpha-3 language code). While the Ethnologue was used as the resource for natural languages currently in use, the LINGUIST List has provided the information on historic varieties, ancient languages, international auxiliary languages and constructed languages.

The LINGUIST List has also received grants for:

  • the Catalogue of Endangered Languages project, a joint effort with the University of Hawai'i at Manoa to build the most reliable, up-to-date source of information on the world's endangered languages
  • the EMELD Project, designed to build infrastructure to facilitate the preservation of endangered languages data
  • the DATA project, designed to digitise data for the Dena'ina language
  • the LL-MAP project (defunct), designed to produce a comprehensive GIS site for language
  • the MultiTree project (defunct), designed to produce a complete database and tree-viewing facility to study language relationships
  • the AARDVARC project, designed to address the problem of untranscribed, and therefore unavailable, documentation of understudied languages by building an interdisciplinary community of linguists, anthropologists, and computer scientists to share knowledge and collaborate on the specification of a repository and suite of tools to facilitate automatic or semi-automatic transcription and analysis of audio and visual information

The EMELD project was the instigator of the GOLD ontology, the furthest advanced of the current attempts to build an ontology for the morphosyntax of linguistic data. It has also produced a phonetics ontology, based upon Peter Ladefoged's and Ian Maddieson's The Sounds of the World's Languages.

Some projects emerged from funded or internal activities at LINGUIST List:

  • GeoLing, a GIS-based information service that places events, jobs, institutions, conferences, and other geo-located announcements from LINGUIST List on a global map.
  • AskALing, a discussion forum and question-and-answer platform for linguistically relevant questions and issues.
  • GORILLA, a platform for archiving language data, recordings, word lists, corpora, and technologies, and for developing and converting language data into corpora and resources that bridge language documentation of low-resourced and endangered languages with Human Language Technology (HLT) and Natural Language Processing (NLP).

Source: https://en.wikipedia.org/wiki/Linguist_List

The site also hosted the following project.

Global Open Resources and Information for Language and Linguistic Analysis (GORILLA)

  • https://gorilla.linguistlist.org/
  • https://bitbucket.org/dcavar/fle/src/master/

Content below is sourced from the project website.

An archive, repository, and assembly line for language documentation data, corpora, computational linguistics, and speech and language resources.

The project brings together an interdisciplinary community of linguists, anthropologists, and computer scientists to collaborate on creating the tools for automatic or semi-automatic transcription and analysis of audio and visual information.

Software Repository

Contents

  • ELAN2split:
    Splitting ELAN annotation intervals into individual files for training Forced Aligners.
  • Espeak Language Models: Text-to-speech models for Burmese, Yiddish, ... for Forced Alignment using Praat.
  • The Free Linguistic Environment (FLE) Project. Grammar engineering and syntactic and semantic parsing for language documentation, NLP, and HLT.
  • TreebankParser SA, extract rules from treebanks.
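
As a rough illustration of the interval splitting that ELAN2split is described as performing, the Python sketch below reads the time-aligned annotations of one tier from an ELAN-style XML document and builds one SoX command per interval. It is not the ELAN2split code (which is written in C++11). The sample document, the tier name `utterance`, the function names `tier_segments` and `sox_trim_command`, and the choice of SoX's `trim` effect for cutting audio are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the style of an ELAN annotation file.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="3200"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello world</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>second segment</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def tier_segments(eaf_xml, tier_id):
    """Return (start_ms, end_ms, text) for each aligned annotation on a tier."""
    root = ET.fromstring(eaf_xml)
    # Map time slot ids to millisecond values; slots without a value are skipped.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            if start is None or end is None or not text:
                continue  # ignore unaligned or empty annotations
            segments.append((start, end, text))
    return segments


def sox_trim_command(wav_in, wav_out, start_ms, end_ms):
    """Build (but do not run) a SoX command that cuts one interval."""
    start = start_ms / 1000
    length = (end_ms - start_ms) / 1000
    return ["sox", wav_in, wav_out, "trim", f"{start:.3f}", f"{length:.3f}"]


if __name__ == "__main__":
    for i, (start, end, text) in enumerate(tier_segments(SAMPLE_EAF, "utterance")):
        # Each interval yields an audio/transcription pair: seg_N.wav + text.
        print(sox_trim_command("recording.wav", f"seg_{i}.wav", start, end), text)
```

Each returned tuple corresponds to one audio and transcription pair; writing the text to a file next to the cut audio would give the kind of paired input that forced aligners are typically trained on.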

Software tools

  • ELAN2split, (C) 2015-2016 by Damir Cavar:
    This is a free and open source, platform-independent implementation of a segmenter for ELAN files, written in C++11. Using this tool, one can select any tier in an ELAN Annotation File and generate file pairs of audio segment and corresponding transcription for all annotated time intervals. These segments of audio and transcription can be used to train common Forced Alignment tools, e.g. the Penn Forced Aligner, the Prosodylab-Aligner, or the MAUS Segmenter. A set of binaries is available on the Bitbucket project page. The code of ELAN2split is free and open source. SoX is a prerequisite.
    Please contact us if you need a binary for your project or system.
  • Espeak language models for Text-to-Speech, (C) 2015 by Lwin Moe, Andrew Lamont, Damir Cavar, Malgorzata E. Cavar:
    This is a collection of Espeak language models for Text-to-Speech. These models were developed to serve as plugins in the Praat-based Forced Alignment implementation, which uses Espeak to generate audio from text and force-align transcriptions. We develop models for different low-resourced and endangered languages to serve in Praat-based forced alignment.
  • TreebankParser SA is a small tool to extract rules from treebanks that use the Penn Treebank notation. It generates relative or absolute frequency profiles for the extracted rules. It is part of the Free Linguistic Environment (FLE) project.
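
To make the idea of rule extraction with frequency profiles concrete, here is a minimal Python sketch that reads a tree in Penn Treebank bracket notation, collects its phrase-structure rules, and reports absolute and relative frequencies. It is not the TreebankParser SA implementation. The function names (`parse_tree`, `extract_rules`, `relative_frequencies`), the example sentence, and the choice to compute relative frequency as a share of all extracted rules are assumptions for illustration. The sketch also assumes every bracket carries a label, so trees wrapped in an extra unlabeled outer bracket would need that wrapper removed first.

```python
import re
from collections import Counter


def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                    # consume "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                    # consume ")"
            return (label, children)
        word = tokens[pos]              # a terminal (word) token
        pos += 1
        return word

    return read()


def extract_rules(tree, counts):
    """Count one rule (lhs, rhs) per node, including lexical rules."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            extract_rules(child, counts)


def relative_frequencies(counts):
    """Relative frequency as a share of all extracted rules (an assumption)."""
    total = sum(counts.values())
    return {rule: n / total for rule, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical example tree, not taken from any treebank.
    example = "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT the) (NN cat))))"
    counts = Counter()
    extract_rules(parse_tree(example), counts)
    rel = relative_frequencies(counts)
    for (lhs, rhs), n in counts.most_common():
        print(f"{lhs} -> {' '.join(rhs)}\t{n}\t{rel[(lhs, rhs)]:.3f}")
```

Normalising per left-hand side instead of over all rules would give probabilities of the kind used in probabilistic context-free grammars; which normalisation the tool itself uses is not stated in the source.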
Last updated on 2/19/2023