BabelNet
BabelNet®, the largest multilingual encyclopedic dictionary
https://www.babelnet.org/downloads https://github.com/topics/babelnet
Information about BabelNet
Description from: https://www.babelnet.org/about
BabelNet is an innovative multilingual encyclopedic dictionary, with wide lexicographic and encyclopedic coverage of terms, and a semantic network/ontology which connects concepts and named entities in a very large network of semantic relations, made up of about 22 million entries. Conceived within the Sapienza NLP Group, engineered and maintained by Babelscape, BabelNet follows the WordNet model based on the notion of synset (for synonym set), but extends it to contain multilingual lexicalizations. Each BabelNet synset represents a given meaning and contains all the synonyms which express that meaning in a range of different languages.
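The synset idea described above can be illustrated with a small sketch. This is a toy model only, not BabelNet's actual data format or API: the class name `Synset`, the helper `build_index`, and the `toy:` identifiers are all hypothetical, chosen just to show one meaning holding synonyms in several languages.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class Synset:
    """Toy multilingual synset: one meaning, lexicalized in many languages."""
    synset_id: str  # hypothetical identifier, NOT a real BabelNet ID
    gloss: str      # short description of the meaning
    # language code -> synonyms that express the meaning in that language
    lemmas: Dict[str, Set[str]] = field(default_factory=dict)

    def add_lemma(self, lang: str, lemma: str) -> None:
        self.lemmas.setdefault(lang, set()).add(lemma)

    def synonyms(self, lang: str) -> Set[str]:
        return self.lemmas.get(lang, set())


def build_index(synsets: Iterable[Synset]) -> Dict[Tuple[str, str], Set[str]]:
    """Map (language, lemma) to the ids of every synset containing that lemma."""
    index: Dict[Tuple[str, str], Set[str]] = {}
    for synset in synsets:
        for lang, lemmas in synset.lemmas.items():
            for lemma in lemmas:
                index.setdefault((lang, lemma), set()).add(synset.synset_id)
    return index


# Illustrative data: the English word "bank" belongs to two different meanings.
finance = Synset("toy:bank-finance", "a financial institution")
for lang, lemma in [("EN", "bank"), ("IT", "banca"), ("DE", "Bank")]:
    finance.add_lemma(lang, lemma)

river = Synset("toy:bank-river", "sloping land beside a body of water")
for lang, lemma in [("EN", "bank"), ("IT", "riva"), ("DE", "Ufer")]:
    river.add_lemma(lang, lemma)

index = build_index([finance, river])
print(sorted(index[("EN", "bank")]))  # ['toy:bank-finance', 'toy:bank-river']
```

Keeping the lemmas per language inside each synset, and deriving the word-to-synset index from it, mirrors the description above: the meaning is the primary unit, and an ambiguous word simply appears in more than one synset.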
BabelNet 5.2 covers 520 languages and is obtained from the automatic integration of:
- WordNet, the most popular computational lexicon of English (version 3.0).
- Open English WordNet, a fork of the Princeton WordNet developed under an open source methodology (November 2021 release).
- Wikipedia, the largest collaborative multilingual Web encyclopedia (October 2022 dump).
- OmegaWiki, a large collaborative multilingual dictionary (January 2017 dump).
- Wiktionary, a collaborative project to produce a free-content multilingual dictionary (October 2022 dump).
- Wikidata, a free knowledge base that can be read and edited by humans and machines alike (October 2022 dump).
- GeoNames, a free geographical database covering all countries and containing over eight million placenames (October 2020 dump).
- ImageNet, an image database organized according to the WordNet hierarchy (2011 release).
- Open Multilingual WordNet, a collection of wordnets available in different languages (January 2021): Albanet, Arabic WordNet (AWN v2), BulTreeBank WordNet (BTB-WN), Chinese Open WordNet, Chinese WordNet (Taiwan), Croatian WordNet, DanNet, Greek WordNet, FinnWordNet, Hebrew WordNet, IceWordNet, ItalWordNet, Japanese WordNet, Lithuanian WordNet, Multilingual Central Repository, MultiWordNet, Norwegian WordNet, Open Dutch WordNet, OpenWN-PT, Princeton WordNet, Persian WordNet, plWordNet, Romanian WordNet, Slovak WordNet, sloWNet, Swedish (SALDO), Thai WordNet, WOLF (WordNet Libre du Français), WoNeF, WordNet Bahasa.
- BabelPic, a large collection of non-concrete pictures.
- VerbAtlas, the largest language-independent verb predicate and role resource.
- HeTOP Q-Codes, a large multilingual health-related lexicon.
- Translations obtained from sense-annotated sentences.
BabelNet is linked to different resources and applications from the Sapienza NLP group:
- VerbAtlas: a large multilingual verb predicate and role repository.
- InVeRo: intelligible verbs and roles produced by a state-of-the-art neural Semantic Role Labeling system.
- Train-O-Matic: the first large scale silver data creation approach to multilingual Word Sense Disambiguation.
- MuLaN: silver data creation for Word Sense Disambiguation by means of multilingual label propagation.
- OneSeC, SensEmBERT and ARES: latent Transformer-based sense representations which achieve state-of-the-art performance in multilingual Word Sense Disambiguation.
- Conception: human-intelligible multilingual representations of BabelNet synsets.
- SyntagNet: a large collection of disambiguated free word associations and collocations.
- SyntagRank: a SyntagNet- and BabelNet-based multilingual word sense disambiguation system.
- Babelfy: a multilingual disambiguation and entity linking system.
- Wikipedia Bitaxonomy: a state-of-the-art taxonomy of Wikipedia pages aligned to a taxonomy of Wikipedia categories.
BabelNet is provided as a stand-alone resource with its Java and Python APIs, a SPARQL endpoint and a Linked Data interface as part of the Linguistic Linked Open Data Cloud (LLOD Cloud).
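As a rough illustration of what talking to a SPARQL endpoint involves, the sketch below only builds a GET request URL following the standard SPARQL protocol (`query` parameter) and sends nothing. Several things here are assumptions to verify against BabelNet's own documentation: the endpoint address in `SPARQL_ENDPOINT`, and the idea that entries carry language-tagged `rdfs:label` values. The function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Assumption: the endpoint address is not given in the text above.
SPARQL_ENDPOINT = "https://babelnet.org/sparql/"

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def build_label_query(lemma: str, lang: str = "en", limit: int = 10) -> str:
    """Build a query for resources whose rdfs:label equals `lemma` in `lang`.

    Assumes (unverified) that the dataset labels its entries with rdfs:label.
    """
    if not re.fullmatch(r"[A-Za-z]+(-[A-Za-z0-9]+)*", lang):
        raise ValueError(f"invalid language tag: {lang!r}")
    # Escape backslashes and double quotes so the lemma stays one literal.
    literal = lemma.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?entry WHERE { "
        f'?entry {RDFS_LABEL} "{literal}"@{lang} '
        f"}} LIMIT {int(limit)}"
    )


def build_request_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode the query as a SPARQL-protocol GET URL (no request is sent)."""
    return endpoint + "?" + urlencode({"query": query})


query = build_label_query("home", lang="en", limit=5)
print(query)
print(build_request_url(query))
```

The resulting URL could be fetched with any HTTP client, typically with an `Accept` header requesting a SPARQL results format; access conditions and the actual vocabulary should be checked with the endpoint's documentation first.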
Description from: http://babelnet.org/rdf/page/
BabelNet is both a multilingual encyclopedic dictionary, with lexicographic and encyclopedic coverage of terms, and an ontology which connects concepts and named entities in a very large network of semantic relations, made up of about 22 million nodes, called Babel synsets. Each Babel synset represents a given meaning and contains all the synonyms which express that meaning in different languages.
BabelNet is made available under the BabelNet Non-Commercial license. The different resources from which BabelNet originates are made available under different licenses, as follows:
- WordNet: https://wordnet.princeton.edu/license-and-commercial-use
- Open English WordNet: http://creativecommons.org/licenses/by/4.0/
- Wikipedia: https://creativecommons.org/licenses/by-sa/3.0/
- Wiktionary: https://creativecommons.org/licenses/by-sa/3.0/
- OmegaWiki: https://creativecommons.org/publicdomain/zero/1.0/deed.en
- Open Multilingual WordNet, a collection of wordnets available in different languages (Albanet, Arabic WordNet (AWN v2), BulTreeBank WordNet (BTB-WN), Chinese Open WordNet, Chinese WordNet (Taiwan), Croatian WordNet, DanNet, Greek WordNet, FinnWordNet, Hebrew WordNet, IceWordNet, ItalWordNet, Japanese WordNet, Lithuanian WordNet, Multilingual Central Repository, MultiWordNet, Norwegian WordNet, Open Dutch WordNet, OpenWN-PT, Princeton WordNet, Persian WordNet, plWordNet, Romanian WordNet, Slovak WordNet, sloWNet, Swedish (SALDO), Thai WordNet, WOLF (WordNet Libre du Français), WoNeF, WordNet Bahasa): released under different licenses according to the language, as indicated at https://babelnet.org/licenses/RESOURCE_LICENSES.txt.
When applicable, specific license rights are specified through the property dcterms:license. Please make sure to use the data in compliance with their respective licenses.
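The licensing rule above (a record-level dcterms:license takes precedence where present, otherwise the license of the originating resource applies) can be sketched as a simple lookup. The URLs are copied from the list above; the names `SOURCE_LICENSES`, `license_for` and `sources_by_license` are hypothetical, and reading the rule as "record-level first, source-level as fallback" is an interpretation rather than something BabelNet specifies in these words.

```python
from typing import Dict, List, Optional

# Source resource -> license URL, as listed in the description above.
SOURCE_LICENSES: Dict[str, str] = {
    "WordNet": "https://wordnet.princeton.edu/license-and-commercial-use",
    "Open English WordNet": "http://creativecommons.org/licenses/by/4.0/",
    "Wikipedia": "https://creativecommons.org/licenses/by-sa/3.0/",
    "Wiktionary": "https://creativecommons.org/licenses/by-sa/3.0/",
    "OmegaWiki": "https://creativecommons.org/publicdomain/zero/1.0/deed.en",
    # Per-language licenses: this file is an index, not a single license.
    "Open Multilingual WordNet": "https://babelnet.org/licenses/RESOURCE_LICENSES.txt",
}


def license_for(source: str, record_license: Optional[str] = None) -> str:
    """Return the license to honor for a piece of data.

    `record_license` stands for a dcterms:license value attached to the
    record itself; when given, it is preferred over the source-level license.
    """
    if record_license:
        return record_license
    try:
        return SOURCE_LICENSES[source]
    except KeyError:
        raise KeyError(f"no license known for source {source!r}") from None


def sources_by_license() -> Dict[str, List[str]]:
    """Group sources that share the same license URL."""
    grouped: Dict[str, List[str]] = {}
    for source, url in SOURCE_LICENSES.items():
        grouped.setdefault(url, []).append(source)
    return grouped


print(license_for("Wikipedia"))
print(sources_by_license()["https://creativecommons.org/licenses/by-sa/3.0/"])
```

Raising an error for an unknown source, rather than returning a default, keeps unlicensed data from being silently treated as usable.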