Tokenized zip support
Tokenize CSS
A promise-based streaming tokenizer
TypeScript definition for strtok3 token
Algorithms to help you parse CSS from an array of tokens.
A tokenizer for Sass' SCSS syntax
Parses and stringifies CSS selectors
A pure JavaScript implementation of a BPE tokenizer (Encoder/Decoder) for GPT-2 / GPT-3 / GPT-4 and other OpenAI models
Solve CSS math expressions
Simple HTML Tokenizer is a lightweight JavaScript library that can be used to tokenize the kind of HTML normally found in templates.
ProseMirror Markdown integration
A JavaScript library for escaping CSS strings and identifiers while generating the shortest possible ASCII-only output.
Common token types for decoding and encoding numeric and string values
Tokenizer of source code for jscpd
Read/write stream of GLSL tokens
For Ruby and Ruby on Rails
Tokenizes a string that represents a regular expression.
Claude tokenizer
Parse CSS media query lists.
Detector of copy/paste in files
Tokenize a shell string into argv array
Tiny JavaScript tokenizer.
A faster-than-tiktoken tokenizer with first-class support for Vercel's AI SDK.
Gemma 3 tokenizer for Node.js and the browser
RubyTokenizer is a simple language processing command-line tool. It performs low-level tokenization and returns the top 10 most frequent words in a body of text. At the moment it is only available for English texts, and it segments words by filtering out whitespace, punctuation marks, parentheses and other special characters.
Fast tokenization for Ruby using HuggingFace's Rust-powered tokenizers library. Supports GPT, BERT, LLaMA, Claude, and any HuggingFace tokenizer.
A pure Ruby implementation of the RFC 7519 OAuth JSON Web Token (JWT) standard.
High-speed text tokenization for Ruby
Platform Agnostic SEcurity TOkens are a specification for secure stateless tokens. This is an implementation of PASETO tokens, and the PASERK key management extensions, in Ruby, with runtime static type checking provided by Sorbet.
Fast state-of-the-art tokenizers for Ruby
Akamai-EdgeAuth is Akamai Edge Authorization Token for Ruby 2.0+
Ruby port of TinySegmenter.js for tokenizing Japanese text. Uses a Naive Bayes model that has been trained using the RWCP corpus and optimized using L1-norm regularization. The resultant model is quite compact, yet has a 95% accuracy rate.
Ruby implementation of the JSON Web Token (JWT) standard, RFC 7519
Vonage JWT Generator for Ruby
Nexmo JWT Generator for Ruby
TactfulTokenizer uses a naive Bayes model trained on the Brown and WSJ corpora to provide high-quality sentence tokenization.