Python package for Natural Language Processing (NLP), focused on low-resource languages spoken in Mexico.
This is a project of Comunidad Elotl.
Developed by:

- Paul Aguilar @penserbjorne, [email protected]
- Robert Pugh @Lguyogiro, [email protected]
- Diego Barriga @umoqnier, [email protected]
Requires python>=3.11

- Development status: Beta (see the package classifiers)
- pip package: elotl
- GitHub repository: ElotlMX/py-elotl
 
Install from PyPI:

```bash
pip install elotl
```

Or install from source for development:

```bash
git clone https://github.com/ElotlMX/py-elotl.git
cd py-elotl
pip install -e .
```

List the available corpora:

```python
import elotl.corpus

print("Name\t\tDescription")
list_of_corpus = elotl.corpus.list_of_corpus()
for row in list_of_corpus:
    print(row)
```

Output:
```
Name		Description
['axolotl', 'Is a Spanish-Nahuatl parallel corpus']
['tsunkua', 'Is a Spanish-Otomí parallel corpus']
['kolo', 'Is a Spanish-Mixteco parallel corpus']
```

If a non-existent corpus is requested, a value of 0 is returned.
```python
axolotl = elotl.corpus.load('axolotlr')
if axolotl == 0:
    print("The name entered does not correspond to any corpus")
```

If an existing corpus is entered, a list is returned.
```python
axolotl = elotl.corpus.load('axolotl')
print(axolotl[0])
```

Output:

```
[
    'Y así, cuando hizo su ofrenda de fuego, se sienta delante de los demás y una persona se queda junto a él.',
    'Auh in ye yuhqui in on tlenamacac niman ye ic teixpan on motlalia ce tlacatl itech mocaua.',
    'Classical Nahuatl',
    'Vida económica de Tenochtitlan',
    'nci'
]
```

Each element of the list contains the following fields (the last one is optional):

- non_original_language (l1)
- original_language (l2)
- variant
- document_name
- iso lang (optional)
 
```python
tsunkua = elotl.corpus.load('tsunkua')
for row in tsunkua:
    print(row[0])  # language 1
    print(row[1])  # language 2
    print(row[2])  # variant
    print(row[3])  # document
```

Output:

```
Una vez una señora se emborrachó
nándi na ra t'u̱xú bintí
Otomí del Estado de México (ots)
El otomí de toluca, Yolanda Lastra
```
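When the ISO language code is present, it appears as a fifth element in each row. A minimal sketch for reading it defensively (the index guard is ours, since the field is optional):

```python
import elotl.corpus

axolotl = elotl.corpus.load('axolotl')
for row in axolotl:
    # The ISO code is optional, so guard the index before reading it.
    iso_code = row[4] if len(row) > 4 else None
    print(row[3], iso_code)  # document name and ISO code (or None)
```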
The following directory structure is a reference; it will be documented in more detail as the package grows.
```
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── dist
├── docs
├── elotl                           Top-level package
│   ├── corpora                     Corpus data files
│   ├── corpus                      Subpackage to load corpora
│   ├── huave                       Huave language subpackage
│   │   └── orthography.py          Module to normalize Huave orthography and phonemes
│   ├── __init__.py                 Initializes the package
│   ├── nahuatl                     Nahuatl language subpackage
│   │   └── orthography.py          Module to normalize Nahuatl orthography and phonemes
│   ├── otomi                       Otomi language subpackage
│   │   └── orthography.py          Module to normalize Otomi orthography and phonemes
│   ├── __pycache__
│   └── utils                       Subpackage with common functions and files
│       └── fst                     Finite-state transducer functions
│           └── att                 Static .att files
├── LICENSE
├── Makefile
├── MANIFEST.in
├── pyproject.toml
├── README.md
└── tests
```
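The orthography modules listed above provide normalization for each language. The snippet below is only a sketch: the `Normalizer` class name, the `"sep"` scheme, and the `normalize()` method are assumptions based on recent releases; check `elotl/nahuatl/orthography.py` for the actual API.

```python
from elotl.nahuatl import orthography

# Assumed API: a normalizer configured with a target orthographic scheme ("sep" here).
normalizer = orthography.Normalizer("sep")
print(normalizer.normalize("amo timomachtia"))  # normalize a Nahuatl sentence
```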
Set up a development environment and build everything:

```bash
poetry env use 3.x
poetry shell
make all
```

Where 3.x is your local Python version. Check managing environments with poetry.

Build the FSTs with make:

```bash
make fst
```

Build the package:

```bash
poetry env use 3.x
poetry shell
python -m pip install --upgrade pip
poetry build
```

Install the local build:

```bash
python -m pip install -e .
```

Publish:

```bash
poetry publish
```

Remember to configure your PyPI credentials.