benchmark explorer

This explorer shows how the bundled token counters handle each language. It is curated, not exhaustive; listed model names are examples, not a complete compatibility map.

Columns: language, script, chars/token, fertility, efficiency (rtc)
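
The column names above are not defined on this page, so the following is only a sketch of how such metrics are commonly computed, not this tool's implementation. It assumes chars/token means characters divided by tokens, fertility means tokens per whitespace-separated word, and rtc is a token count relative to a baseline language on parallel text. The helper names and the toy tokenizer are hypothetical.

```python
def toy_tokenize(text: str, max_len: int = 4) -> list[str]:
    """Hypothetical stand-in tokenizer: cut each word into chunks of max_len chars."""
    tokens = []
    for word in text.split():
        tokens.extend(word[i:i + max_len] for i in range(0, len(word), max_len))
    return tokens


def tokenizer_metrics(text: str, tokens: list[str]) -> dict[str, float]:
    """Compute two common tokenizer statistics for one text sample."""
    # Whitespace splitting is a rough notion of "word"; it does not apply to
    # scripts written without spaces, such as Chinese, Japanese, or Thai.
    words = text.split()
    return {
        "chars_per_token": len(text) / len(tokens),
        "fertility": len(tokens) / len(words),
    }


def relative_token_cost(n_tokens: int, n_baseline_tokens: int) -> float:
    """Assumed reading of 'rtc': tokens needed relative to a baseline language."""
    return n_tokens / n_baseline_tokens


sample = "tokenizers split text"
metrics = tokenizer_metrics(sample, toy_tokenize(sample))
print(metrics)
```

Under these assumptions, a higher chars/token and a lower fertility both indicate that the tokenizer represents the language more compactly.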
benchmark command

The benchmark shown here was generated with the following command:

$ uv run mothertoken benchmark run \
--languages eng_Latn,fra_Latn,spa_Latn,por_Latn,deu_Latn,arb_Arab,cmn_Hans,jpn_Jpan,tha_Thai,hin_Deva,kor_Hang,tur_Latn,ukr_Cyrl,vie_Latn,swh_Latn \
--models gpt-4o,gpt-4,qwen3,mistral,qwen2.5,deepseek-v3,gpt-oss,gpt2,gpt-3,codex,codex-edit,opt,tinyllama,pythia,bert-base-uncased,roberta-base,xlm-roberta-base,distilbert-base-uncased \
--output src/mothertoken/data/default_benchmark.json