GPT-style BPE tokenizer trainer - WASM build for Node.js
turbotoken monorepo scripts and JS wrapper
A BPE (Byte Pair Encoding) tokenizer written in Rust with Python bindings
A pattern matching library for BPE tokenization, intended to replace regex-based approaches.
Nanochat in Rust
HPC Rust LLM Tokenizer Library