Build your own vocabulary from an application-specific corpus using the Byte Pair Encoding (BPE) algorithm.
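As a rough illustration of what learning a BPE vocabulary from a corpus involves, the TypeScript sketch below repeatedly merges the most frequent adjacent symbol pair. It is not taken from any package listed here. Whitespace pre-tokenization, characters (rather than bytes) as the starting symbols, and the function name `trainBpe` are assumptions made for the example.

```typescript
// Minimal BPE trainer sketch (hypothetical; not any listed package's API).
// Assumes whitespace pre-tokenization and single characters as initial symbols.

type Merge = [string, string];

function trainBpe(corpus: string, numMerges: number): Merge[] {
  // Count how often each whitespace-separated word occurs.
  const wordFreq = new Map<string, number>();
  for (const word of corpus.split(/\s+/).filter((w) => w.length > 0)) {
    wordFreq.set(word, (wordFreq.get(word) ?? 0) + 1);
  }
  // Represent each distinct word as a list of symbols, initially characters.
  const words: { symbols: string[]; freq: number }[] = [];
  for (const [word, freq] of wordFreq) {
    words.push({ symbols: Array.from(word), freq });
  }

  const merges: Merge[] = [];
  for (let step = 0; step < numMerges; step++) {
    // Count adjacent symbol pairs, weighted by word frequency.
    const pairCounts = new Map<string, { pair: Merge; count: number }>();
    for (const { symbols, freq } of words) {
      for (let i = 0; i < symbols.length - 1; i++) {
        const key = symbols[i] + "\u0000" + symbols[i + 1];
        const entry = pairCounts.get(key);
        if (entry) entry.count += freq;
        else pairCounts.set(key, { pair: [symbols[i], symbols[i + 1]], count: freq });
      }
    }
    // Pick the most frequent pair; ties go to the pair counted first.
    let best: { pair: Merge; count: number } | undefined;
    for (const entry of pairCounts.values()) {
      if (!best || entry.count > best.count) best = entry;
    }
    if (!best) break; // every word is a single symbol; nothing left to merge
    const [a, b] = best.pair;
    merges.push([a, b]);
    // Replace every occurrence of (a, b) with the merged symbol a + b.
    for (const w of words) {
      const out: string[] = [];
      let i = 0;
      while (i < w.symbols.length) {
        if (i < w.symbols.length - 1 && w.symbols[i] === a && w.symbols[i + 1] === b) {
          out.push(a + b);
          i += 2;
        } else {
          out.push(w.symbols[i]);
          i += 1;
        }
      }
      w.symbols = out;
    }
  }
  return merges;
}

const merges = trainBpe("low low low lower lowest newer wider", 3);
console.log(merges); // [["l","o"],["lo","w"],["e","r"]]
```

The learned merge list, in order, is the vocabulary-building output; an encoder later replays these merges by priority.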
A pure JavaScript implementation of a BPE tokenizer (Encoder/Decoder) for GPT-2 / GPT-3 / GPT-4 and other OpenAI models
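Encoders of this kind turn text into token IDs by applying learned merges in priority order, and decoders map IDs back to strings. The sketch below shows only that rank-based merge loop, using a tiny hypothetical merge list and vocabulary (`MERGES`, `VOCAB`). It treats the input as a single pre-token and omits GPT-2's byte-to-character mapping and regex pre-splitting, so it is not a drop-in replacement for any listed package.

```typescript
// Rank-based BPE encode/decode sketch over a hypothetical toy vocabulary.

// Hypothetical merge list: earlier entries have higher priority (lower rank).
const MERGES: [string, string][] = [["l", "o"], ["lo", "w"], ["e", "r"]];
const RANKS = new Map(MERGES.map(([a, b], i) => [a + " " + b, i] as [string, number]));

// Hypothetical vocabulary: each base character plus each merge result.
const VOCAB = ["l", "o", "w", "e", "r", "lo", "low", "er"];
const TOKEN_TO_ID = new Map(VOCAB.map((t, i) => [t, i] as [string, number]));

// Repeatedly merge the adjacent pair with the lowest rank until none applies.
function bpeWord(word: string): string[] {
  let parts = Array.from(word);
  while (parts.length > 1) {
    let bestIdx = -1;
    let bestRank = Infinity;
    for (let i = 0; i < parts.length - 1; i++) {
      const rank = RANKS.get(parts[i] + " " + parts[i + 1]);
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        bestIdx = i;
      }
    }
    if (bestIdx === -1) break; // no applicable merge left
    parts = [...parts.slice(0, bestIdx), parts[bestIdx] + parts[bestIdx + 1], ...parts.slice(bestIdx + 2)];
  }
  return parts;
}

// Encode one pre-token (no whitespace handling) into vocabulary IDs.
function encode(text: string): number[] {
  return bpeWord(text).map((token) => {
    const id = TOKEN_TO_ID.get(token);
    if (id === undefined) throw new Error(`token not in vocabulary: ${token}`);
    return id;
  });
}

// Decoding is a lookup plus concatenation.
function decode(ids: number[]): string {
  return ids.map((id) => VOCAB[id]).join("");
}

const ids = encode("lower");
console.log(ids, decode(ids)); // [ 6, 7 ] lower
```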
BPE tokenizer used for the NovelAI frontend.
WASM bindings for the tiktoken BPE tokenizer
BPE tokenizer data files for LLMs (o200k_base, cl100k_base, DeepSeek V3/V2)
Offline BPE tokenizer for OpenAI, Anthropic, and Gemini — zero dependencies
A pure JavaScript implementation of a BPE tokenizer (Encoder/Decoder) for GPT-2 / GPT-3 / GPT-4 / Claude Instant / Claude 2
TypeScript + native Rust machine learning library: matrix ops, layers (Dense, Embedding, RNN, LSTM, GRU, MultiHeadAttention, etc.), models (Sequential, Transformers), and a BPE tokenizer.
A simple Byte-Pair Encoding (BPE) tokenizer built from scratch.
Tokenize CSS
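To give a sense of what tokenizing CSS means, the sketch below splits a declaration into whitespace, identifier, number (with an optional unit), string, and single-character delimiter tokens. It is a deliberately simplified, hypothetical tokenizer, not the CSS Syntax Module Level 3 algorithm: comments, escapes, `url()`, hash tokens, signed numbers, and exponents are not handled.

```typescript
// Simplified CSS tokenizer sketch (hypothetical, not spec-compliant).

type CssToken =
  | { type: "whitespace" }
  | { type: "ident"; value: string }
  | { type: "number"; value: number; unit: string } // unit is "" for bare numbers, "%" for percentages
  | { type: "string"; value: string }
  | { type: "delim"; value: string };

// Sticky (`y`) regexes only match starting exactly at `lastIndex`.
const WS = /\s+/y;
const NUM = /(\d*\.\d+|\d+)(%|[a-zA-Z]+)?/y;
const IDENT = /-?[a-zA-Z_][\w-]*/y;
const STR = /"([^"]*)"|'([^']*)'/y;

function matchAt(re: RegExp, input: string, pos: number): RegExpExecArray | null {
  re.lastIndex = pos;
  return re.exec(input);
}

function tokenizeCss(input: string): CssToken[] {
  const tokens: CssToken[] = [];
  let i = 0;
  while (i < input.length) {
    let m: RegExpExecArray | null;
    if ((m = matchAt(WS, input, i))) {
      tokens.push({ type: "whitespace" }); // a whole whitespace run becomes one token
    } else if ((m = matchAt(NUM, input, i))) {
      tokens.push({ type: "number", value: parseFloat(m[1]), unit: m[2] ?? "" });
    } else if ((m = matchAt(IDENT, input, i))) {
      tokens.push({ type: "ident", value: m[0] });
    } else if ((m = matchAt(STR, input, i))) {
      tokens.push({ type: "string", value: m[1] ?? m[2] });
    } else {
      // Anything else (":", ";", "{", "#", ...) becomes a one-character delimiter.
      tokens.push({ type: "delim", value: input[i] });
      i += 1;
      continue;
    }
    i += m[0].length;
  }
  return tokens;
}

const cssTokens = tokenizeCss("margin: 0 auto 1.5em;");
console.log(cssTokens);
```

Trying numbers before identifiers is what lets `1.5em` come out as one number token with unit `em` instead of a number followed by an identifier.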
TypeScript version of PGN Tokenizer, a Byte Pair Encoding (BPE) tokenizer for Chess Portable Game Notation (PGN).
A promise-based streaming tokenizer
Tokenized zip support
Tokenizer for OpenAI large language models.
TypeScript definition for strtok3 token
Algorithms to help you parse CSS from an array of tokens.
JS tokenizer for LLaMA-based LLMs
Parses and stringifies CSS selectors
Solve CSS math expressions
Multi-arch builds of HuggingFace tokenizers
A tokenizer for Sass's SCSS syntax
Simple HTML Tokenizer is a lightweight JavaScript library that can be used to tokenize the kind of HTML normally found in templates.
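For template-style markup, tokenizing mostly means recognizing start tags (with attributes), end tags, comments, and the text between them. The sketch below does this with one alternation regex. The token shape and the names `tokenizeHtml`, `TOKEN_RE`, and `ATTR_RE` are hypothetical; this is not the output format of Simple HTML Tokenizer, and it is not HTML5-spec compliant (no entity decoding, doctype, CDATA, or raw-text elements such as `<script>`).

```typescript
// Minimal regex-based HTML tokenizer sketch (hypothetical token shape).

type HtmlToken =
  | { type: "startTag"; name: string; attributes: [string, string][]; selfClosing: boolean }
  | { type: "endTag"; name: string }
  | { type: "text"; text: string }
  | { type: "comment"; text: string };

// Alternatives, in order: comment, end tag, start tag (with attributes), text.
const TOKEN_RE =
  /<!--([\s\S]*?)-->|<\/([a-zA-Z][\w-]*)\s*>|<([a-zA-Z][\w-]*)((?:\s+[^\s=\/>]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?)*)\s*(\/?)>|([^<]+|<)/g;
// Splits the raw attribute string captured above into name/value pairs.
const ATTR_RE = /([^\s=\/>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?/g;

function tokenizeHtml(html: string): HtmlToken[] {
  const tokens: HtmlToken[] = [];
  TOKEN_RE.lastIndex = 0; // the global regex is shared, so reset its position
  let m: RegExpExecArray | null;
  while ((m = TOKEN_RE.exec(html)) !== null) {
    if (m[1] !== undefined) {
      tokens.push({ type: "comment", text: m[1] });
    } else if (m[2] !== undefined) {
      tokens.push({ type: "endTag", name: m[2].toLowerCase() });
    } else if (m[3] !== undefined) {
      const attributes: [string, string][] = [];
      for (const a of m[4].matchAll(ATTR_RE)) {
        // Valueless attributes (e.g. `disabled`) get an empty string.
        attributes.push([a[1].toLowerCase(), a[2] ?? a[3] ?? a[4] ?? ""]);
      }
      tokens.push({ type: "startTag", name: m[3].toLowerCase(), attributes, selfClosing: m[5] === "/" });
    } else {
      // Text, or a stray "<" that did not start a tag; merge adjacent text runs.
      const last = tokens[tokens.length - 1];
      if (last !== undefined && last.type === "text") last.text += m[6];
      else tokens.push({ type: "text", text: m[6] });
    }
  }
  return tokens;
}

const htmlTokens = tokenizeHtml('<p class="note">Hi <b>there</b><br/></p><!-- done -->');
console.log(htmlTokens);
```

A single regex keeps the sketch short; a spec-following tokenizer would instead run a character-by-character state machine, which is what allows it to recover from malformed input.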
A BPE Tokenizer library.
BBA Chain Program Executor Token
A simple BPE tokenizer for Rust
An unofficial Ruby wrapper for Tiktoken, a BPE tokenizer written by and used by OpenAI. It can be used to count the number of tokens in text before sending it to OpenAI APIs.
A pure Ruby implementation of OpenAI's tiktoken library for BPE tokenization
An unofficial Ruby wrapper for Tiktoken, a BPE tokenizer written by and used by OpenAI. It can be used to count the number of tokens in text before sending it to OpenAI APIs. This is a fork of tiktoken_ruby by IAPark that has been cross-compiled for multiple platforms, so the Rust extension does not need to be compiled wherever you deploy it.
No description provided.