No description provided.
A JavaScript module to convert legacy Hanyang PUA characters to Unicode.
The semantic version parser used by npm.
Match balanced character pairs, like "{" and "}"
Lexes CommonJS modules, returning their named exports metadata
A very strict and proper argument parser.
Transforms css values and at-rule params into the tree
Brace expansion as known from sh/bash
Option parsing for Node, supporting types, shorthands, etc. Used by npm.
detect and normalize encodings of text
Resolve package.json exports & imports maps
Tiny millisecond conversion utility
Extract the non-magic parent path from a glob string.
Converts a source-map from/to different formats and allows adding/changing properties.
A tiny (952b), correct, general-purpose, and configurable "exports" and "imports" resolver without file-system reliance
A Babel plugin to inject imports to core-js@3 polyfills
TypeScript definitions for methods
The default Vite plugin for React projects
Create an array of unique values, in order, from the input arrays
Parse String to Number based on configuration
🌈Easily set your terminal text color & styles.
A string tag that strips indentation from multi-line strings. ⬅️
Tokenize CSS
Minimal module to check if a file is executable.
[Unicode 17.0.0] Retrieve the Unicode script(s) a string belongs to. Can also return the Script_Extension property which is defined as characters which are 'commonly used with more than one script, but with a limited number of scripts'.
Some [hopefully] useful extensions to Ruby’s String class. Stringex is made up of three libraries: ActsAsUrl [permalink solution with better character translation], Unidecoder [Unicode to Ascii transliteration], and StringExtensions [miscellaneous helper methods for the String class].
== ICU4R - ICU Unicode bindings for Ruby

ICU4R is an attempt to provide better Unicode support for Ruby, which it has lacked for a long time.

The current code is mostly a rewrite of string.c from Ruby 1.8.3.

ICU4R is a Ruby C-extension binding for the ICU library[1] and provides the following classes and functionality:

* UString:
  - String-like class with internal UTF16 storage;
  - UCA rules for UString comparisons (<=>, casecmp);
  - encoding (codepage) conversion;
  - Unicode normalization;
  - transliteration, also rule-based;

  A bunch of locale-sensitive functions:
  - upcase/downcase;
  - string collation;
  - string search;
  - iterators over text line/word/char/sentence breaks;
  - message formatting (number/currency/string/time);
  - date and number parsing.

* URegexp - Unicode regular expressions.
* UResourceBundle - access to resource bundles, including ICU locale data.
* UCalendar - date manipulation and timezone info.
* UConverter - codepage conversions API.
* UCollator - locale-sensitive string comparison.

== Install and usage

  > ruby extconf.rb
  > make && make check
  > make install

Now, in your scripts just require 'icu4r'.

To create RDoc, run

  > sh tools/doc.sh

== Requirements

To build and use ICU4R you will need GCC and ICU v3.4 libraries[2].

== Differences from Ruby String and Regexp classes

=== UString vs String

1. UString substring/index methods use UTF16 codeunit indexes, not code points.

2. UString supports most methods from the String class. Missing methods are:

     capitalize, capitalize!, swapcase, swapcase!
     %, center, ljust, rjust
     chomp, chomp!, chop, chop!
     count, delete, delete!, squeeze, squeeze!, tr, tr!, tr_s, tr_s!
     crypt, intern, sum, unpack
     dump, each_byte, each_line
     hex, oct, to_i, to_sym
     reverse, reverse!
     succ, succ!, next, next!, upto

3. Instead of the String#% method, UString#format is provided. See FORMATTING for a short reference.

4. UStrings can be created via String.to_u(encoding='utf8') or global u(str,[encoding='utf8']) calls. Note that the +encoding+ parameter must be a value of the String class.

5. There is a difference between a character (grapheme), a codepoint and a codeunit. See the UNICODE reports for gory details, but in short: the locale-dependent notion of a character can be represented by more than one codepoint (a base letter plus one or more combining marks such as accents), and each codepoint can require more than one codeunit to store (for UTF8 the codeunit size is 8 bits, though some codepoints require up to 4 bytes). So, UString has normalization and locale-dependent break iterators.

6. Currently UString doesn't include the Enumerable module.

7. UString index/[] methods which accept a URegexp throw an exception if a Regexp is passed.

8. UString#<=> and UString#casecmp use UCA rules.

=== URegexp

UString uses the ICU regexp library. Pattern syntax is described in [./docs/UNICODE_REGEXPS] and the ICU docs.

There are some differences between processing in Ruby Regexp and URegexp:

1. When UString#sub or UString#gsub is called with a block, the special vars ($~, $&, $1, ...) aren't set, as their values are processed through deep Ruby core code. Instead, the block receives a UMatch object, which is essentially an immutable array of matching groups:

     "test".u.gsub(ure("(e)(.)")) do |match|
       puts match[0]  # => 'es'  <--> $&
       puts match[1]  # => 'e'   <--> $1
       puts match[2]  # => 's'   <--> $2
     end

2. In a URegexp search pattern, backreferences take the form \n (\1, \2, ...); in a replacement string they take the form $1, $2, ...

   NOTE: URegexp considers a char to be a digit NOT ONLY if it is ASCII (0x0030-0x0039), but also if it is any Unicode char with the property Decimal digit number (Nd), e.g.:

     a = [?$, 0x1D7D9].pack("U*").u * 2
     puts a.inspect_names
     <U000024>DOLLAR SIGN
     <U01D7D9>MATHEMATICAL DOUBLE-STRUCK DIGIT ONE
     <U000024>DOLLAR SIGN
     <U01D7D9>MATHEMATICAL DOUBLE-STRUCK DIGIT ONE
     puts "abracadabra".u.gsub(/(b)/.U, a)
     abbracadabbra

3. One can create a URegexp using the global Kernel#ure function, Regexp#U, Regexp#to_u, or from a UString using URegexp.new, e.g.:

     /pattern/.U =~ "string".u

4. Regexp and URegexp differ in their multiline matching options:

     t = "text\ntest"
     # ^,$ handling : URegexp multiline <-> Ruby default
     t.u =~ ure('^\w+$', URegexp::MULTILINE)
     => #<UMatch:0xf6f7de04 @ranges=[0..3], @cg=[\u0074\u0065\u0078\u0074]>
     t =~ /^\w+$/
     => 0
     # . matches \n : URegexp DOTALL <-> /m
     t.u =~ ure('.+test', URegexp::DOTALL)
     => #<UMatch:0xf6fa4d88 ...
     t.u =~ /.+test/m

5. UMatch.range(idx) returns the range for capturing group idx. This range is in codeunits.

=== References

1. ICU Official Homepage http://ibm.com/software/globalization/icu/
2. ICU downloads http://ibm.com/software/globalization/icu/downloads.jsp
3. ICU Home Page http://icu.sf.net
4. Unicode Home Page http://www.unicode.org

==== BUGS, DOCS, TO DO

The code is still slow, inefficient and highly experimental, so it may have security holes, memory leaks, bugs, inconsistent documentation and an incomplete test suite. Use it at your own risk.

Bug reports and feature requests are welcome :)

=== Copying

This extension module is copyrighted free software by Nikolai Lugovoi.

You can redistribute it and/or modify it under the terms of MIT License.

Nikolai Lugovoi <meadow.nnick@gmail.com>
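The distinction the ICU4R notes draw between a grapheme, a codepoint and a codeunit (point 5 under "UString vs String") can be illustrated with a minimal sketch in plain modern Ruby. This is not ICU4R code: it assumes Ruby 2.5 or later and uses only the built-in String methods; the helper name unit_counts is hypothetical and exists only for this illustration.

```ruby
# Counts the same text at four levels: user-perceived characters
# (grapheme clusters), codepoints, UTF-16 codeunits and UTF-8 codeunits.
# unit_counts is a hypothetical helper, not part of ICU4R.
def unit_counts(str)
  {
    graphemes:   str.grapheme_clusters.length,
    codepoints:  str.codepoints.length,
    utf16_units: str.encode("UTF-16LE").bytesize / 2, # 2 bytes per UTF-16 codeunit
    utf8_units:  str.bytesize                          # 1 byte per UTF-8 codeunit
  }
end

# "e" + U+0301 COMBINING ACUTE ACCENT, then U+1D7D9
# MATHEMATICAL DOUBLE-STRUCK DIGIT ONE (outside the BMP).
decomposed = "e\u0301\u{1D7D9}"

# NFC normalization merges the base letter and the accent into U+00E9.
composed = decomposed.unicode_normalize(:nfc)

p unit_counts(decomposed)
# {:graphemes=>2, :codepoints=>3, :utf16_units=>4, :utf8_units=>7}
p unit_counts(composed)
# {:graphemes=>2, :codepoints=>2, :utf16_units=>3, :utf8_units=>6}
```

Both strings show two graphemes, yet every other count differs. This is why an index expressed in UTF16 codeunits, as UString uses, does not in general line up with a codepoint or grapheme index.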
Replaces accented and extended Latin characters in a string with their plain equivalents, for example à becomes a.

Usage: LatinToNormalCharacter.transform('ThÏs ís  strìng wÌth Lãtîn úñîcÔdë.') returns the string "ThIs is A string wIth Latin unicOde."

List of supported Latin characters:
A: ['À', 'Á', 'Â', 'Ã', 'Ä', 'Å']
a: ['à', 'á', 'â', 'ã', 'ä', 'å']
B: ['Ɓ', 'Ƃ', 'Ƅ', 'ʙ']
b: ['ƀ', 'ƃ', 'ƅ']
C: ['Ç', 'Č', 'Ɔ', 'Ƈ']
c: ['ç', 'č', 'ƈ']
D: ['Ð', 'Ƌ', 'Ɗ']
d: ['ð', 'ƌ', 'ƍ']
E: ['È', 'É', 'Ê', 'Ë', 'Ĕ', 'Ǝ', 'Ɛ']
e: ['è', 'é', 'ê', 'ë', 'ĕ', 'Ə', 'ʚ']
F: ['Ƒ']
f: ['ƒ']
G: ['Ğ', 'Ģ', 'Ĝ', 'Ġ', 'Ɠ', 'ʛ']
g: ['ğ', 'ģ', 'ĝ', 'ġ']
H: ['Ĥ', 'Ħ', 'ʜ']
h: ['ĥ', 'ħ', 'ʰ', 'ʯ', 'ʮ']
I: ['Ì', 'Í', 'Î', 'Ï', 'Ĩ', 'Ī', 'Ĭ', 'Į', 'İ', 'Ɨ']
i: ['ì', 'í', 'î', 'ï', 'ĩ', 'ī', 'ĭ', 'į', 'ı']
J: ['Ĵ']
j: ['ĵ', 'ʝ']
K: ['Ķ', 'Ƙ']
k: ['ķ', 'ĸ', 'ƙ', 'ʞ']
L: ['Ĺ', 'Ļ', 'Ľ', 'Ŀ', 'Ł', 'ʟ']
l: ['ĺ', 'ļ', 'ľ', 'ŀ', 'ł', 'ƚ']
M: ['Ɯ']
m: ['ɯ', 'ɰ', 'ɱ']
N: ['Ñ', 'Ń', 'Ņ', 'Ň', 'Ŋ', 'Ɲ']
n: ['ñ', 'ń', 'ņ', 'ň', 'ŋ', 'ʼn', 'ɲ', 'ɳ', 'ƞ', 'ɴ']
O: ['Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ō', 'Ŏ', 'Ő', 'Ɵ', 'Ơ']
o: ['ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ō', 'ŏ', 'ő', 'ơ', 'ɵ']
P: ['Ƥ']
p: ['ƥ']
q: ['ʠ']
R: ['Ŕ', 'Ŗ', 'Ř']
r: ['ŕ', 'ŗ', 'ř', 'ɹ', 'ɺ', 'ɻ', 'ɼ', 'ɽ', 'ɾ', 'ɿ', 'ʀ', 'ʁ']
S: ['Ŝ', 'Ş', 'Š', 'Ś']
s: ['ŝ', 'ş', 'š', 'ś', 'ſ', 'ʂ']
T: ['Ţ', 'Ť', 'Ŧ', 'Ƭ', 'Ʈ']
t: ['ţ', 'ť', 'ŧ', 'ƭ', 'ƫ', 'ʇ', 'ʈ']
U: ['Ù', 'Ú', 'Û', 'Ü', 'Ū', 'Ũ', 'Ŭ', 'Ů', 'Ű', 'Ų', 'Ư']
u: ['ù', 'ú', 'û', 'ü', 'ū', 'ũ', 'ŭ', 'ů', 'ű', 'ų', 'ư', 'ʉ']
V: ['Ʋ']
v: ['ʋ', 'ʌ']
W: ['Ŵ']
w: ['ŵ', 'ʍ']
Y: ['Ý', 'Ÿ', 'Ŷ', 'Ƴ']
y: ['ý', 'ŷ', 'ƴ', 'ʎ', 'ʏ']
Z: ['Ž', 'Ź', 'Ż', 'Ƶ']
z: ['ž', 'ź', 'ż', 'ƶ', 'ʐ', 'ʑ']
AE: ['Æ']
ae: ['æ']
IJ: ['IJ']
ij: ['ij']
OE: ['Œ']
oe: ['œ', 'ɶ']
th: ['Þ']
SS: ['ß']
YR: ['Ʀ']
ESH: ['Ʃ']
esh: ['ƪ']
EZH: ['Ʒ', 'Ƹ']
ezh: ['ƹ', 'ƺ']
dz: ['ƻ']
Q: ['Ƽ']
q: ['ƽ']
ts: ['ƾ']
Wynn: ['ƿ']

Updates:
0.0.4 & 0.0.5 - update the coverage of Latin string support.
0.0.6 - fix issue on non-string value.
0.0.7 - fix issue on non-string value.
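The table-driven replacement described above can be sketched roughly as follows. This is not the gem's actual source: the module name LatinFoldSketch is hypothetical, the table is only a small subset of the published list, and returning non-string values unchanged is an assumption (the changelog only says that an issue with non-string values was fixed).

```ruby
# Hypothetical sketch of a table-driven Latin character replacement.
# Not the LatinToNormalCharacter implementation; subset of the table only.
module LatinFoldSketch
  # Plain replacement => accented or extended variants that map to it.
  TABLE = {
    "A"  => %w[À Á Â Ã Ä Å],
    "a"  => %w[à á â ã ä å],
    "E"  => %w[È É Ê Ë],
    "e"  => %w[è é ê ë],
    "I"  => %w[Ì Í Î Ï],
    "i"  => %w[ì í î ï],
    "n"  => %w[ñ ń],
    "O"  => %w[Ò Ó Ô Õ Ö Ø],
    "o"  => %w[ò ó ô õ ö ø],
    "u"  => %w[ù ú û ü],
    "AE" => %w[Æ],
    "SS" => %w[ß]
  }.freeze

  # Invert the table once so each input character is a single hash lookup.
  LOOKUP = TABLE.each_with_object({}) do |(plain, variants), map|
    variants.each { |variant| map[variant] = plain }
  end.freeze

  # Assumption: non-string values are handed back untouched.
  def self.transform(value)
    return value unless value.is_a?(String)

    # Characters missing from the table pass through unchanged.
    value.each_char.map { |ch| LOOKUP.fetch(ch, ch) }.join
  end
end

puts LatinFoldSketch.transform("Lãtîn úñîcÔdë") # prints "Latin unicOde"
```

Inverting the table up front keeps the per-character work to one hash lookup, and it allows a replacement to be longer than the character it replaces (ß becoming SS, for instance).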