Your AI tokens are feeding something.
Greedy tiktoken-like tokenizer with embedded vocabulary (cl100k_base approximator).
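As a rough illustration of what "greedy" can mean here, the sketch below does longest-match tokenization against a vocabulary embedded directly in the code. The function name `greedy_tokenize` and the toy vocabulary `TOY_VOCAB` are hypothetical and are not taken from the project. The real cl100k_base vocabulary is far larger and is applied with byte-level BPE merges, so greedy matching of this kind can only approximate its output.

```python
# Minimal sketch of greedy longest-match tokenization.
# TOY_VOCAB is a made-up vocabulary for illustration only; it is NOT cl100k_base.
TOY_VOCAB = {
    "token": 0,
    "izer": 1,
    " ": 2,
    "s": 3,
    "to": 4,
    "ken": 5,
}


def greedy_tokenize(text, vocab):
    """At each position, emit the id of the longest vocabulary entry that matches."""
    max_len = max(len(piece) for piece in vocab)
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            # No vocabulary entry starts at this position.
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return ids


print(greedy_tokenize("tokenizer tokens", TOY_VOCAB))
```

Because the longest entry always wins, "token" is chosen over "to" followed by "ken" even though both segmentations are available in the vocabulary.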
TokenGeeX is an efficient tokenizer for code based on UnigramLM and TokenMonster.