JavaScript Unicode 8.0 Normalization - NFC, NFD, NFKC, NFKD. Read <http://unicode.org/reports/tr15/> UAX #15 Unicode Normalization Forms.
A JavaScript library that breaks strings into their individual user-perceived characters (including emojis!)
Fast and lightweight Node.js based Unicode Normalization CLI tool and library
ESLint plugin for detecting and fixing whitespace issues - zero-width characters, non-breaking spaces, and Unicode normalization
High-performance, strictly typed transliteration library with support for Unicode normalization, smart casing, and multiple scripts.
Unicode Trie data structure for fast character metadata lookup, ported from ICU
Ethereum Name Service (ENS) Name Normalizer
Unicode normalization layer — strips invisible characters and neutralizes bidirectional text attacks
Powerful Japanese text processor for Node.js — romaji conversion, hiragana, katakana, furigana with prepasses, mixed-language segmentation, traditional to modern kanji conversion, unicode normalization and kanji fallback
iLib is a cross-engine library of internationalization (i18n) classes written in pure JS
Detect whether the terminal supports Unicode
Compile ES2015 Unicode escapes to ES5
Compile Unicode property escapes in Unicode regular expressions to ES5.
Match a Unicode property or property alias to its canonical property name per the algorithm used for RegExp Unicode property escapes in ECMAScript.
Regenerate sets for Unicode properties and values.
Compile ES2015 Unicode regex to ES5
Parse regular expressions' unicodeSets (v) flag.
Match a Unicode property or property alias to its canonical property name per the algorithm used for RegExp Unicode property escapes in ECMAScript.
Compile regular expressions' unicodeSets (v) flag.
The set of canonical Unicode property names supported in ECMAScript RegExp property escapes.
A rehype plugin that automatically adds unique id attributes to heading elements (h1–h6), supporting explicit slug notation, Unicode normalization, duplicate and invalid slug handling, flexible id assignment strategies, and compatibility with MDX. Powered
Unicode property alias mappings in JavaScript format for property names that are supported in ECMAScript RegExp property escapes.
Normalize unicode-range descriptors, and can convert to wildcard ranges.
Provides fast access to unicode character properties
This crate provides functions for normalization of Unicode strings, including Canonical and Compatible Decomposition and Recomposition, as described in Unicode Standard Annex #15.
Rust reference implementation of matrix256v1 — a SHA-256 fingerprint over the (path, size) records of a rooted filesystem tree.
日本語テキスト正規化ライブラリ - 文字幅、かな、Unicode、句読点、旧字体の正規化
No Diacr!
SIMD-accelerated Unicode normalization (NFC, NFD, NFKC, NFKD)
This crate provides functions for normalization of Unicode strings, including Canonical and Compatible Decomposition and Recomposition, as described in Unicode Standard Annex #15.
A Rust crate to remove accents from strings, inspired by PostgreSQL's unaccent extension.
Strip invisible / steganographic Unicode code points from strings before they reach a knowledge store, a journal, or a prompt.
Rust bindings to the utf8proc library
Unsafe rust bindings to the utf8proc library
A simple Trie implementation in Rust
Brazilian utilities for Rust - validation and formatting for CPF, CNPJ, RENAVAM, voter ID, and more
A Unicode Normalization Form validation plugin for Rails
Unicode normalization library.
Unicode Normalization Form support library for CRuby
Efficient pure Ruby Unicode normalization.
This is a wrapper library to bring Unicode Normalization Form support to Ruby/JRuby.
Unicode normalize string value. see http://site.icu-project.org/
Unicode normalization library using utf8proc
InfraRuby Unicode Normalization Form functions
[Unicode 17.0.0] Compares two strings if they are visually confusable as described in Unicode® Technical Standard #39: Both strings get transformed into a skeleton format before comparing them. The skeleton is generated by normalizing the string, replacing confusable characters, and then normalizing the string again.
Zaru takes a given filename (a string) and normalizes, filters and truncates it, so it can be safely used as a filename in modern operating systems. Zaru doesn't remove Unicode characters when not necessary.
Toolkit for security research manipulating Unicode: confusables, homoglyphs, hexdump, code point, UTF-8, UTF-16, UTF-32, properties, regexp search, size, grapheme, surrogates, version, ICU, CLDR, UCD, BiDi, normalization
U U extends Ruby’s Unicode support. It provides a string class called U::String with an interface that mimics that of the String class in Ruby 2.0, but that can also be used from both Ruby 1.8. This interface also has more complete Unicode support and never modifies the receiver. Thus, a U::String is an immutable value object. U comes with complete and very accurate documentation¹. The documentation can realistically also be used as a reference to the Ruby String API and may actually be preferable, as it’s a lot more explicit and complete than the documentation that comes with Ruby. ¹ See http://disu.se/software/u-1.0/api/ § Installation Install u with % gem install u § Usage Usage is basically the following: require 'u-1.0' a = 'äbc' a.upcase # ⇒ 'äBC' a.u.upcase # ⇒ 'ÄBC' That is, you require the library, then you invoke #u on a String. This’ll give you a U::String that has much better Unicode support than a normal String. It’s important to note that U only uses UTF-8, which means that #u will try to #encode the String as such. This shouldn’t be an issue in most cases, as UTF-8 is now more or less the universal encoding – and rightfully so. As U::Strings¹ are immutable value objects, there’s also a U::Buffer² available for building U::Strings efficiently. See the API³ for more complete usage information. The following sections will only cover the extensions and differences that U::String exhibit from Ruby’s built-in String class. ¹ See http://disu.se/software/u-1.0/api/U/String/ ² See http://disu.se/software/u-1.0/api/U/Buffer/ ³ See http://disu.se/software/u-1.0/api/ § Unicode Properties There are quite a few property-checking interrogators that let you check if all characters in a U::String have the given Unicode property: • #alnum?¹ • #alpha?² • #assigned?³ • #case_ignorable?⁴ • #cased?⁵ • #cntrl?⁶ • #defined?⁷ • #digit?⁸ • #graph?⁹ • #newline?¹⁰ • #print?¹¹ • #punct?¹² • #soft_dotted?¹³ • #space?¹⁴ • #title?¹⁵ • #valid?¹⁶ • #wide?¹⁷ • #wide_cjk?¹⁸ • #xdigit?¹⁹ • #zero_width?²⁰ ¹ See http://disu.se/software/u-1.0/api/U/String/#alnum-p-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#alpha-p-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#assigned-p-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#case_ignorable-p-instance-method ⁵ See http://disu.se/software/u-1.0/api/U/String/#cased-p-instance-method ⁶ See http://disu.se/software/u-1.0/api/U/String/#cntrl-p-instance-method ⁷ See http://disu.se/software/u-1.0/api/U/String/#defined-p-instance-method ⁸ See http://disu.se/software/u-1.0/api/U/String/#digit-p-instance-method ⁹ See http://disu.se/software/u-1.0/api/U/String/#graph-p-instance-method ¹⁰ See http://disu.se/software/u-1.0/api/U/String/#newline-p-instance-method ¹¹ See http://disu.se/software/u-1.0/api/U/String/#print-p-instance-method ¹² See http://disu.se/software/u-1.0/api/U/String/#punct-p-instance-method ¹³ See http://disu.se/software/u-1.0/api/U/String/#soft_dotted-p-instance-method ¹⁴ See http://disu.se/software/u-1.0/api/U/String/#space-p-instance-method ¹⁵ See http://disu.se/software/u-1.0/api/U/String/#title-p-instance-method ¹⁶ See http://disu.se/software/u-1.0/api/U/String/#valid-p-instance-method ¹⁷ See http://disu.se/software/u-1.0/api/U/String/#wide-p-instance-method ¹⁸ See http://disu.se/software/u-1.0/api/U/String/#wide_cjk-p-instance-method ¹⁹ See http://disu.se/software/u-1.0/api/U/String/#xdigit-p-instance-method ²⁰ See http://disu.se/software/u-1.0/api/U/String/#zero_width-p-instance-method Similar to these methods are • #folded?¹ • #lower?² • #upper?³ which check whether a ‹U::String› has been cased in a given manner. ¹ See http://disu.se/software/u-1.0/api/U/String/#folded-p-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#lower-p-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#upper-p-instance-method There’s also a #normalized?¹ method that checks whether a ‹U::String› has been normalized on a given form. ¹ See http://disu.se/software/u-1.0/api/U/String/#normalized-p-instance-method You can also access certain Unicode properties of the characters of a U::String: • #canonical_combining_class¹ • #general_category² • #grapheme_break³ • #line_break⁴ • #script⁵ • #word_break⁶ ¹ See http://disu.se/software/u-1.0/api/U/String/#canonical_combining_class-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#general_category-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#grapheme_break-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#line_break-instance-method ⁵ See http://disu.se/software/u-1.0/api/U/String/#script-instance-method ⁶ See http://disu.se/software/u-1.0/api/U/String/#word_break-instance-method § Locale-specific Comparisons Comparisons of U::Strings respect the current locale (and also allow you to specify a locale to use): ‹#<=>›¹, #casecmp², and #collation_key³. ¹ See http://disu.se/software/u-1.0/api/U/String/#comparison-operator ² See http://disu.se/software/u-1.0/api/U/String/#casecmp-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#collation_key-instance-method § Additional Enumerators There are a couple of additional enumerators in #each_grapheme_cluster¹ and #each_word² (along with aliases #grapheme_clusters³ and #words⁴). ¹ See http://disu.se/software/u-1.0/api/U/String/#each_grapheme_cluster-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#each_word-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#grapheme_clusters-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#words-instance-method § Unicode-aware Sub-sequence Removal #Chomp¹, #chop², #lstrip³, #rstrip⁴, and #strip⁵ all look for Unicode newline and space characters, rather than only ASCII ones. ¹ See http://disu.se/software/u-1.0/api/U/String/#chomp-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#chop-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#lstrip-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#rstrip-instance-method ⁵ See http://disu.se/software/u-1.0/api/U/String/#strip-instance-method § Unicode-aware Conversions Case-shifting methods #downcase¹ and #upcase² do proper Unicode casing and the interface is further augmented by #foldcase³ and #titlecase⁴. #Mirror⁵ and #normalize⁶ do conversions similar in nature to the case-shifting methods. ¹ See http://disu.se/software/u-1.0/api/U/String/#downcase-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#upcase-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#foldcase-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#titlecase-instance-method ⁵ See http://disu.se/software/u-1.0/api/U/String/#mirror-instance-method ⁶ See http://disu.se/software/u-1.0/api/U/String/#normalize-instance-method § Width Calculations #Width¹ will return the number of cells on a terminal that a U::String will occupy. #Center², #ljust³, and #rjust⁴ deal in width rather than length, making them much more useful for generating terminal output. #%⁵ (and its alias #format⁶) similarly deal in width. ¹ See http://disu.se/software/u-1.0/api/U/String/#width-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#center-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#ljust-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#rjust-instance-method ⁵ See http://disu.se/software/u-1.0/api/U/String/#modulo-operator ⁶ See http://disu.se/software/u-1.0/api/U/String/#format-instance-method § Extended Type Conversions Finally, #hex¹, #oct², and #to_i³ use Unicode alpha-numerics for their respective conversions. ¹ See http://disu.se/software/u-1.0/api/U/String/#hex-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#oct-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#to_i-instance-method § News § 1.0.0 Initial public release! § Financing Currently, most of my time is spent at my day job and in my rather busy private life. Please motivate me to spend time on this piece of software by donating some of your money to this project. Yeah, I realize that requesting money to develop software is a bit, well, capitalistic of me. But please realize that I live in a capitalistic society and I need money to have other people give me the things that I need to continue living under the rules of said society. So, if you feel that this piece of software has helped you out enough to warrant a reward, please PayPal a donation to now@disu.se¹. Thanks! Your support won’t go unnoticed! ¹ Send a donation: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=now@disu.se&item_name=U § Reporting Bugs Please report any bugs that you encounter to the {issue tracker}¹. ¹ See https://github.com/now/u/issues § Authors Nikolai Weibull wrote the code, the tests, the documentation, and this README. § Licensing U is free software: you may redistribute it and/or modify it under the terms of the {GNU Lesser General Public License, version 3}¹ or later², as published by the {Free Software Foundation}³. ¹ See http://disu.se/licenses/lgpl-3.0/ ² See http://gnu.org/licenses/ ³ See http://fsf.org/
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.