Convert Chinese text to a list of Chinese words
Tokenize CSS
A promise-based streaming tokenizer
Tokenized zip support
TypeScript definition for strtok3 token
Algorithms to help you parse CSS from an array of tokens.
The character data used by Hanzi Writer for Japanese. This data is derived from Make Me a Hanzi and animCJK.
Parses and stringifies CSS selectors
The character data used by Hanzi Writer. This data is derived from the Make Me a Hanzi project.
Solve CSS math expressions
A pure JavaScript implementation of a BPE tokenizer (Encoder/Decoder) for GPT-2 / GPT-3 / GPT-4 and other OpenAI models
A tokenizer for Sass' SCSS syntax
Simple HTML Tokenizer is a lightweight JavaScript library that can be used to tokenize the kind of HTML normally found in templates.
ProseMirror Markdown integration
Hanzi Writer is a free and open-source JavaScript library for both animating simplified Chinese characters and quizzing users on character stroke order.
Source code tokenizer for jscpd
Common token types for decoding and encoding numeric and string values
Read/write stream of GLSL tokens
Claude tokenizer
Tokenizes a string that represents a regular expression.
Parse CSS media query lists.
Copy/paste detector for files
Tokenize a shell string into an argv array
A faster-than-tiktoken tokenizer with first-class support for Vercel's AI SDK.