JavaScript text tokenizer that is easy to use and compose
Efficiently modify strings containing ANSI escape codes
Transform stream that tokenizes CSS
Simple HTML Tokenizer is a lightweight JavaScript library that can be used to tokenize the kind of HTML normally found in templates.
Tokenize a string into an array of string parts and format identifier objects.
A tokenizer for Sass' SCSS syntax
Library to tokenize text to paragraphs, sentences, subsentences and words
Transform stream to tokenize HTML
Provide a high level wrapper for kuromoji.js
Tokenize a string.
Estimate the number of tokens for Gemini models
Parse CSS color values
Tokenize CSS
Tokenize a string that includes ANSI escape codes
Tokenize a text into sentences
Small library that provides functions to tokenize a string into an array of words with or without punctuation
ProseMirror Markdown integration
Uses snapdragon to tokenize a single JavaScript block comment into an object, with description, tags, and code example sections that can be passed to any other comment parsers for further parsing.
Tokenize Excel formulas
Tokenize a string into words and whitespace tokens
The lexer for Materialize's SQL dialect, with wasm build targets.
Multilingual tokenizer that automatically tags each token with its type
Skyflow SDK for Node.js
High speed text tokenization for Ruby
A simple multilingual tokenizer for NLP tasks. This tool provides a CLI and a library for linguistic tokenization, which is an unavoidable preprocessing step for many HLT (human language technology) tasks before further syntactic, semantic and other higher-level processing. Use it for tokenization of German, English and French texts.
Ruby port of TinySegmenter.js for tokenizing Japanese text. Uses a Naive Bayes model that has been trained using the RWCP corpus and optimized using L1-norm regularization. The resultant model is quite compact, yet has a 95% accuracy rate.
Textoken is a Ruby library for text tokenization. This gem extracts words from text with many customizations. It can be used in many fields like Web Crawling and Natural Language Processing.
Tokkens makes it easy to apply a vector space model to text documents, targeted towards machine learning. It provides a mapping between numbers and tokens (strings).
High performance unsupervised text tokenization for Ruby
Separa splits chunks of text into tokens to be indexed by Busca, the simple redis search
An unofficial Ruby wrapper for Tiktoken, a BPE tokenizer written by and used by OpenAI. It can be used to count the number of tokens in text before sending it to OpenAI APIs.
🪙 Token::Resolver provides configurable PEG-based (parslet) parsing and resolution of structured tokens (e.g., {KJ|GEM_NAME}) in arbitrary text. Useful for template ETL pipelines where tokens in template files must be resolved before format-specific merging.
No description provided.