Title: Geonames Toponym Extractor Utility
Brief: Simple script for extracting ASCII toponym fields from geonames datasets
Date: 1713683410
Tags: Python, Script, Programming
CSS: /style.css

[Link to code](https://codeberg.org/veclavtalica/geonames-extractor)

Small script I used for extracting data for machine learning endeavors.

Usage:
```
dataset feature_class [feature_code] [--dirty] [--filter=mask]
```

From this invocation ...
```
./extractor.py datasets/UA.txt P PPL --filter=0123456789\"\'-\` > UA-prep.txt
```

... it produces a newline-separated list of toponyms of the requested kind, such as:
```
Katerynivka
Vaniushkyne
Svistuny
Sopych
Shilova Balka
```

The `--filter=` option is there so that the alphabet size can be reduced for learning purposes. Datasets usually contain quite a lot of symbols that occur only a few times, which leaves the character distribution poorly balanced.

The `--dirty` option reduces cases such as `Maydan (Ispas)` and `CHAYKA-Transmitter, Ring Mast 4` to `Maydan` and `CHAYKA-Transmitter`.

Duplicates are also removed.
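
To make the options above more concrete, here is a minimal sketch of how such an extraction could work. It is not the code from the linked repository. The function names `clean_name` and `extract_toponyms` are hypothetical, and two behaviors are assumptions on my part: that the filter mask drops any name containing one of the mask characters, and that deduplication keeps the first occurrence in file order. The column positions follow the tab-separated geonames dump layout, where the ASCII name, feature class and feature code are the third, seventh and eighth fields.

```python
# Zero-based column positions in the tab-separated geonames dump format.
ASCIINAME, FEATURE_CLASS, FEATURE_CODE = 2, 6, 7


def clean_name(name):
    """Cut a name at the first " (" or "," to drop trailing qualifiers."""
    for separator in (" (", ","):
        name = name.split(separator, 1)[0]
    return name.strip()


def extract_toponyms(lines, feature_class, feature_code=None, dirty=False, mask=""):
    """Return unique ASCII toponyms matching the given feature class and code."""
    seen = set()
    result = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= FEATURE_CODE:
            continue  # skip malformed or truncated rows
        if fields[FEATURE_CLASS] != feature_class:
            continue
        if feature_code is not None and fields[FEATURE_CODE] != feature_code:
            continue
        name = fields[ASCIINAME]
        if dirty:
            name = clean_name(name)
        # Assumption: a name containing any mask character is dropped entirely.
        if not name or any(ch in mask for ch in name):
            continue
        # Assumption: keep the first occurrence, preserving file order.
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result
```

With a dataset opened as a text file, something like `extract_toponyms(open("datasets/UA.txt", encoding="utf-8"), "P", "PPL", dirty=True, mask="0123456789")` would then yield the list to print one name per line.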