Build your own vocabulary from application-specific corpus using Byte pair encoding (BPE) algorithm.
Isomorphic edge tokenizer + lazy detokenizer for the Codec binary transport protocol. Works in browsers, Node 18+, and edge runtimes.
Codec-aware browser LLM runtime. Wraps the patched wdunn001/web-llm fork (`stream_format: "raw"`) so a local WebGPU engine emits raw token-ID CodecFrames byte-identically to what vLLM / sglang / llama.cpp produce over HTTP. Host does no tokenize/detokeniz
JS bindings for the ggml library.
A blazing fast Byte Pair Encoding (BPE) tokenizer library with Python bindings
Text processing module for SciRS2 (scirs2-text)
Symbolic projection and embeddings
a minimal byte-pair encoding tokenizer implementation
Isomorphic tokenizer + detokenizer for the Codec binary transport protocol — for Rust. Decodes streaming token IDs from Codec-compliant servers (vLLM, SGLang) and encodes text into IDs for the bidirectional path.
Large language model inference primitives for OxiCUDA: BPE tokenizer, transformer layers with KV cache, GPT-2 and LLaMA architectures — pure Rust, zero CUDA SDK dependency.
Scivex — Tokenization, embeddings, and text processing
Natural language processing utilities for ToRSh deep learning framework