Unicode line breaking and text segmentation algorithms for text boundary analysis
Data for the icu_segmenter crate
Deprecated crate!
ICU4X-backed CLDR segmentation and locale-aware collation for OxiText
The `i18n-message` crate of the Internationalisation project.
Text tokenizer for linguistic purposes, such as text matching. Supports more than 40 languages, including English, French, Russian, Japanese, and Thai.
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
Rialight internationalization module.
Correct Unicode text handling for every script: bidi, line breaking, segmentation, normalization
Generate data for ICU4X DataProvider
ICU collation extension
Data for the icu_locid_transform crate