From 4f9b7101c6173b4d8a45e5e9abb8f557d2964bc1 Mon Sep 17 00:00:00 2001
From: veclav talica
Date: Sun, 21 Apr 2024 12:24:46 +0500
Subject: [PATCH] add tags to old articles

---
 articles/neocities-cache/page.mmd   |  2 +-
 articles/tiny-elf/page.mmd          |  2 +-
 articles/toponym-extractor/page.mmd | 36 +++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 articles/toponym-extractor/page.mmd

diff --git a/articles/neocities-cache/page.mmd b/articles/neocities-cache/page.mmd
index e15c9a1..fa929b5 100644
--- a/articles/neocities-cache/page.mmd
+++ b/articles/neocities-cache/page.mmd
@@ -1,7 +1,7 @@
 Title: Cached Neocities Uploads
 Brief: Making uploading of directories to Neocities less painful.
 Date: 1707585916
-Tags: Programming, Bash
+Tags: Programming, Bash, Script
 CSS: /style.css
 
 Quick and dirty Bash-based sha256sum checksum solution to create stamps for later checking and rejection.
diff --git a/articles/tiny-elf/page.mmd b/articles/tiny-elf/page.mmd
index 6925701..e3a67c7 100644
--- a/articles/tiny-elf/page.mmd
+++ b/articles/tiny-elf/page.mmd
@@ -1,7 +1,7 @@
 Title: Slim Summer Elf
 Brief: Making of minimal x86 (Linux) ELF executable.
 Date: 1684666702
-Tags: Programming, Linux, C
+Tags: Programming, Linux, C, Bash, Linker, Low-level
 CSS: /style.css
 
 Code below was composed for [4mb-jam](https://itch.io/jam/4mb-jam-2023) which I didn't finish.
diff --git a/articles/toponym-extractor/page.mmd b/articles/toponym-extractor/page.mmd
new file mode 100644
index 0000000..ae135f9
--- /dev/null
+++ b/articles/toponym-extractor/page.mmd
@@ -0,0 +1,36 @@
+Title: Geonames Toponym Extractor Utility
+Brief: Simple script for extracting ASCII toponym fields from geonames datasets
+Date: 1713683410
+Tags: Python, Script, Programming
+CSS: /style.css
+
+[Link to code](https://codeberg.org/veclavtalica/geonames-extractor)
+
+Small script I used for extracting data for machine learning endeavors.
+
+Usage:
+```
+dataset feature_class [feature_code] [--dirty] [--filter=mask]
+```
+
+From this invocation ...
+```
+./extractor.py datasets/UA.txt P PPL --filter=0123456789\"\'-\` > UA-prep.txt
+```
+
+... it produces a newline-separated list of relevant toponyms of a particular kind, such as:
+```
+Katerynivka
+Vaniushkyne
+Svistuny
+Sopych
+Shilova Balka
+```
+
+The `--filter=` option is there so that the alphabet size can be reduced for learning purposes,
+as there are usually quite a lot of symbols that are found only a few times,
+which produces poor balancing.
+
+The `--dirty` option reduces cases such as `Maydan (Ispas)` and `CHAYKA-Transmitter, Ring Mast 4` to `Maydan` and `CHAYKA-Transmitter`.
+
+Duplicates are also removed.
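
The clean-up the new article describes (`--dirty` trimming, `--filter=` masking, duplicate removal) can be sketched roughly as below. This is not the code from the linked repository: `clean_name` and `extract` are hypothetical names, and two behaviours are assumptions. The sketch assumes `--dirty` cuts a name at the first `(` or `,`, and that a name containing any masked character is dropped whole rather than having the character stripped.

```python
def clean_name(name: str) -> str:
    """Cut a toponym at the first '(' or ',' and trim whitespace (assumed --dirty behaviour)."""
    for separator in ("(", ","):
        name = name.split(separator, 1)[0]
    return name.strip()


def extract(names, mask="", dirty=False):
    """Return unique toponyms in first-seen order.

    Assumption: a name containing any character from `mask` is dropped entirely.
    """
    banned = set(mask)
    seen = set()
    result = []
    for name in names:
        if dirty:
            name = clean_name(name)
        # Skip empty names and names that use a masked character.
        if not name or banned & set(name):
            continue
        # Keep only the first occurrence of each name.
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


if __name__ == "__main__":
    sample = ["Maydan (Ispas)", "CHAYKA-Transmitter, Ring Mast 4", "Maydan", "Sopych"]
    for toponym in extract(sample, mask="0123456789\"'-`", dirty=True):
        print(toponym)
```

Tracking seen names in a set keeps duplicate removal linear in the input size while preserving the order of the dataset, which may or may not matter for the training data.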