Break text down into an array of words.
Efficiently modify strings containing ANSI escape codes
Transform stream that tokenizes CSS
Tokenize a string into an array of string parts and format identifier objects.
Small library that provides functions to tokenize a string into an array of words with or without punctuation
A tokenizer for Sass' SCSS syntax
Transform stream to tokenize HTML
Ensure that no reserved words are used.
Multilingual tokenizer that automatically tags each token with its type
Simple HTML Tokenizer is a lightweight JavaScript library that can be used to tokenize the kind of HTML normally found in templates.
Tokenize a string.
Parse CSS color values
Tokenize CSS
Transform a string between `camelCase`, `PascalCase`, `Capital Case`, `snake_case`, `kebab-case`, `CONSTANT_CASE` and others
Tokenize a string into words and whitespace tokens
Encode and decode quoted printable and base64 strings
Tiny Casing utils
Provide a high level wrapper for kuromoji.js
Uses snapdragon to tokenize a single JavaScript block comment into an object, with description, tags, and code example sections that can be passed to any other comment parsers for further parsing.
Tokenize Excel formulas
Convert a string of words to a JavaScript identifier
The lexer for Materialize's SQL dialect, with wasm build targets.
Split a double-precision floating-point number into a higher order word and a lower order word.
Create a double-precision floating-point number from a higher order word and a lower order word.
Generates tokens consisting of readable words from your system dictionary
Currently this only converts word frequencies between fpmw (frequency per million words), Zipf, and other units.
TokenKit provides lightweight, Unicode-aware word-level tokenization with pattern preservation, backed by Rust for performance.
Thai language tools for Ruby, i.e. a word tokenizer, a character-level identifier, and a romanization tool
Filtra filters an array of tokens or words so they can be indexed by Busca, the simple Redis search
Textoken is a Ruby library for text tokenization. This gem extracts words from text with many customizations. It can be used in many fields like Web Crawling and Natural Language Processing.
RubyTokenizer is a simple language processing command-line tool. It performs low-level tokenization and returns the top 10 most frequent words in a body of text. At the moment it is only available for English texts, and it segments words by filtering out whitespace, punctuation marks, parentheses, and other special characters.
Generate random names from themed word lists (Gundam, Star Trek, Star Wars, Transformers, and more) with configurable patterns and token formats.
Proper related posts plugin for Jekyll that uses a document correlation matrix on TF-IDF (optionally with Latent Semantic Indexing). Each document is tokenized and stemmed, and every word found is treated as a keyword for analysis (except for some stop words). The TF-IDF matrix for the whole site is calculated (including any extra provided weights); then, if the given accuracy is lower than 1.0, the LSI algorithm is used to compute a new, simplified vector space. The document correlation matrix is created as the dot product of the matrix and its transpose. For each post, related documents are inserted into a priority queue (sorted by score from the document correlation matrix), provided the score is greater than the minimum required score. The few best related posts are then retrieved from the queue. The Liquid template for each post is rendered, and <related-posts /> is replaced with the outcome of the algorithm.
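
The related-posts pipeline described in the last entry can be sketched roughly as follows. This is a minimal illustration in Python, not the plugin's actual code: the function names (`tokenize`, `tfidf_vectors`, `related`), the stop-word list, and the `min_score` and `limit` parameters are all assumptions made for the example, and the stemming and optional LSI steps are left out.

```python
import heapq
import math
import re
from collections import Counter

# Illustrative stop-word list only; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words (no stemming here)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document, L2-normalized."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        vec = {t: (k / total) * math.log(n / df[t]) for t, k in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def related(docs, min_score=0.0, limit=2):
    """For each document, return indices of its best-scoring related documents."""
    vectors = tfidf_vectors(docs)
    result = []
    for i, vi in enumerate(vectors):
        heap = []  # max-heap emulated by pushing negated scores
        for j, vj in enumerate(vectors):
            if i == j:
                continue
            # One entry of (matrix x transpose): dot product of rows i and j.
            score = sum(w * vj.get(t, 0.0) for t, w in vi.items())
            if score > min_score:
                heapq.heappush(heap, (-score, j))
        result.append([heapq.heappop(heap)[1] for _ in range(min(limit, len(heap)))])
    return result
```

Calling `related(["tokenize css stream parser", "tokenize html stream parser", "thai romanization tool"])` pairs the first two documents with each other and leaves the third with no related posts, since it shares no terms with the others. Because the vectors are normalized, each score is a cosine similarity, which keeps a single `min_score` threshold comparable across documents of different lengths.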