Detects which of the Japanese legacy encodings (Shift_JIS, EUC-JP, ISO-2022-JP) a byte stream uses
C API for shift_or_euc
A character encoding detector for legacy Web content
RFC 2047 MIME encoded-word decoder. Decodes =?charset?(B|Q)?text?= encoded-words in subjects and display names to UTF-8. Supports the full set of WHATWG Encoding Standard charsets via encoding_rs (UTF-8, ISO-8859-*, Windows-*, ISO-2022-JP, Shift_JIS, EUC-KR, GB18030, …).
CLI for LiTA tokenizers.
A line-map and character-encoding-aware red-green tree for structured, lossless, incrementally-editable text
A Gecko-oriented implementation of the Encoding Standard
MeCab-based furigana and romaji annotation for Japanese text — no Python, no kakasi
Space-efficient std::io::{Read, Write} wrappers for encoding_rs
ripgrep is a line-oriented search tool that recursively searches the current directory for a regex pattern while respecting gitignore rules. ripgrep has first-class support on Windows, macOS, and Linux.
Parser for arena-backed, lightweight representations of Prolog-like terms