text-segmentation
==============
A lightweight implementation of Unicode Text Segmentation (UAX #29)
CLDR text segmentation for JavaScript
text segmentation into sentences
Text Segmentation data
Text Segmentation data (modern only: deprecated)
A library for multilingual word, phrase and sentence segmentation.
PAT tree construction for Chinese documents, keyword extraction and text segmentation
Implements a Chinese text segmentation algorithm
Native Chinese text segmentation for React Native, powered by cppjieba.
Khmer text segmentation, normalization, cluster and typing-game utilities for JavaScript and TypeScript.
A specialized tokenizer library for Japanese lyrics analysis that provides intelligent text segmentation using the Kuromoji morphological analyzer.
Efficient WebAssembly text segmentation; supports English, Chinese, Japanese, and other languages.
A high-performance wrapper around `Intl.Segmenter` for efficient text segmentation. This class resolves memory handling issues seen with large strings and can enhance performance by 50-500x. Only ~70 lines of code (with comments) and no dependencies.
A JavaScript library to convert a string into an array of graphemes, following the Unicode 12 Text Segmentation spec
A FastMCP server with text segmentation tool - split text by natural paragraphs
Pretrained body segmentation model
Pretrained BodyPix model in TensorFlow.js
Polymorphic Segmentation utility for Cornerstone3D
Split text into semantic or structural chunks using purely algorithmic strategies. Supports mixed Japanese/English text.
JavaScript version of GPAC's MP4Box tool
Library of utilities to instrument A/B tests and segmentation in plasmic.
Complete TypeScript port of rivo/uniseg with 100% API compatibility. Unicode text segmentation for grapheme clusters, word boundaries, and text width calculation.
Segments text into sentences
A reimplementation of text_sentencer, originally written in Ruby, using a C extension for better performance. It is a preliminary version and may not be fully functional.
A Rails 3 Engine for managing and rendering text segments in your Rails web application. Segments are short-to-medium blocks of text or HTML that you wish to use throughout your application. It includes a web interface for managing segments and is automatically compatible with Internationalization (I18n).
Segments text according to word frequency using the Viterbi algorithm.
Pragmatic Segmenter is a sentence segmentation tool for Ruby. It allows you to split a text into an array of sentences. This gem provides two main benefits over other segmentation gems: 1) it works well even with ill-formatted text; 2) it works for multiple languages.
A reimplementation of text_sentencer, originally written in Ruby, using a C extension for better performance.
TextSentencer is a simple rule-based system for segmenting text into sentences.
Converts HTML to plain text, preserving as much legibility and functionality as possible. Ideal for providing a plaintext multipart segment of email messages.
Scalpel is a sentence segmentation tool for Ruby. It allows you to split a text into an array of sentences. It is simple, lightweight, blazing fast and does not require any domain-specific training. It works well even in the face of ill-formatted texts.
Library to build random text strings from rules defined as Ruby hashes. Rules consist of Builders, which have Items, which are composed of Segments. Generated text can be modified with filters.
Parses URIs and tokenizes URI segments. Scans input text and extracts URIs into an array. Based on the Ragel FSM compiler.
RubyTokenizer is a simple language processing command-line tool. It performs low-level tokenization and returns the top 10 most frequent words in a body of text. At the moment it is only available for English texts, and it segments words by filtering out whitespace, punctuation marks, parentheses, and other special characters.