A minimal UTF8 implementation for number arrays.
Get utf8 byte length of string
A browser UTF-8 string <-> UInt8Array converter
A Node.JS UTF-8 string <-> UInt8Array converter
Truncate string to given length in bytes
Detect if a buffer is utf8 encoded.
Turn a string into an ArrayBuffer by using the UTF8 encoding.
A well-tested UTF-8 encoder/decoder written in JavaScript.
single-function form of node's Buffer.toString(utf8)
buffer operations
Babel compiler core.
A JavaScript parser
The Babel Traverse module maintains the overall tree state, and is responsible for replacing, removing, and adding nodes
Babel Types is a Lodash-esque utility library for AST nodes
pvtsutils is a set of common utility functions used in various Peculiar Ventures TypeScript based projects.
Generate an AST from a string template.
Turns an AST into code.
utf8 to/from bytes codec (esm/cjs)
render domhandler DOM nodes to a string
A Babel preset for each environment.
Babel preset for TypeScript.
TypeScript definitions for utf8
Babel preset for all React plugins.
Standalone build of Babel for use in non-Node.js environments.
UTF-8 validation and truncation helpers for RustUse
Feature-gated facade crate for RustUse encoding helpers
Convert a string's encoding to utf8, whithout caring which encoding used before converting.
A lightweight UTF8-aware String class meant for use with Ruby 1.8
Rename folderful of filenames to Mac OS X supported format (UTF8-MAC). Mac OS X use only.
Unichars is a wrapper around Glib2 UTF8 functions.
Poetry is so XIX century, nowadays we express ourselves using emojis. This gem adds a task to migrate mysql utf8 tables to utf8mb4 in rails projects.
This gem attempts to convert the received text to UTF8. It works by trying to convert the given text with a list of possible common encodings. This is useful if the developer knows the most common encodings that the application is going to be receiving, leaving the guessing work to this gem and by safely converting (without crash) the received text.
Use Mysql AUTO_INCREMENT to support key value cache, which should be combined by an integer and string. It means to reduce the database storage size, and improve query performance. All cache will store in process memory, and will never be expired, until the process dies, so the less kvs you use, the better performance you will get. BTW, 100,000 general strings use 10MB memory. Some relatived articles: http://en.wikipedia.org/wiki/Correlation_database Usage ------------------------------------------ ## setup ```ruby create_table :kv_browser_names, :options => 'ENGINE=MyISAM DEFAULT CHARSET=utf8' do |t| t.string :name t.timestamps end class KvBrowserName < ActiveRecord::Base include IdNameCache end ``` or ```ruby create_table :common_tag, :options => 'ENGINE=MyISAM DEFAULT CHARSET=utf8' do |t| t.integer :tagid t.string :tagname end class CommonTag < ActiveRecord::Base self.table_name = :common_tag self.primary_key = :tagid include IdNameCache; set_key_value :tagid, :tagname # include IdNameCache; set_key_value_without_create :tagid, :tagname # if you dont want create it automately end ``` ### use cases ```text ruby-1.9.3-rc1 :001 > QuizTag[1] QuizTag Load (0.3ms) SELECT `common_tag`.* FROM `common_tag` WHERE `common_tag`.`tagid` = 1 LIMIT 1 => "Android" ruby-1.9.3-rc1 :002 > QuizTag[1] => "Android" ruby-1.9.3-rc1 :003 > QuizTag['Android'] QuizTag Load (0.5ms) SELECT `common_tag`.* FROM `common_tag` WHERE `common_tag`.`tagname` = 'Android' LIMIT 1 => 1 ruby-1.9.3-rc1 :004 > QuizTag['Android'] => 1 ``` == Copyright MIT, David Chen at eoe.cn
Diff and patch tables
== ICU4R - ICU Unicode bindings for Ruby ICU4R is an attempt to provide better Unicode support for Ruby, where it lacks for a long time. Current code is mostly rewritten string.c from Ruby 1.8.3. ICU4R is Ruby C-extension binding for ICU library[1] and provides following classes and functionality: * UString: - String-like class with internal UTF16 storage; - UCA rules for UString comparisons (<=>, casecmp); - encoding(codepage) conversion; \ - Unicode normalization; - transliteration, also rule-based; Bunch of locale-sensitive functions: - upcase/downcase; - string collation; \ - string search; - iterators over text line/word/char/sentence breaks; \ - message formatting (number/currency/string/time); - date and number parsing. * URegexp - unicode regular expressions. * UResourceBundle - access to resource bundles, including ICU locale data. * UCalendar - date manipulation and timezone info. * UConverter - codepage conversions API * UCollator - locale-sensitive string comparison == Install and usage > ruby extconf.rb > make && make check > make install Now, in your scripts just require 'icu4r'. To create RDoc, run > sh tools/doc.sh == Requirements To build and use ICU4R you will need GCC and ICU v3.4 libraries[2]. == Differences from Ruby String and Regexp classes === UString vs String 1. UString substring/index methods use UTF16 codeunit indexes, not code points. 2. UString supports most methods from String class. Missing methods are: capitalize, capitalize!, swapcase, swapcase! %, center, ljust, rjust chomp, chomp!, chop, chop! \ count, delete, delete!, squeeze, squeeze!, tr, tr!, tr_s, tr_s! crypt, intern, sum, unpack dump, each_byte, each_line hex, oct, to_i, to_sym reverse, reverse! succ, succ!, next, next!, upto 3. Instead of String#% method, UString#format is provided. See FORMATTING for short reference. 4. UStrings can be created via String.to_u(encoding='utf8') or global u(str,[encoding='utf8']) calls. Note that +encoding+ parameter must be value of String class. 5. There's difference between character grapheme, codepoint and codeunit. See UNICODE reports for gory details, but in short: locale dependent notion of character can be presented using more than one codepoint - base letter and combining (accents) (also possible more than one!), and each codepoint can require more than one codeunit to store (for UTF8 codeunit size is 8bit, though \ some codepoints require up to 4bytes). So, UString has normalization and locale dependent break iterators. 6. Currently UString doesn't include Enumerable module. 7. UString index/[] methods which accept URegexp, throw exception if Regexp passed. 8. UString#<=>, UString#casecmp use UCA rules. === URegexp UString uses ICU regexp library. Pattern syntax is described in [./docs/UNICODE_REGEXPS] and ICU docs. There are some differences between processing in Ruby Regexp and URegexp: 1. When UString#sub, UString#gsub are called with block, special vars ($~, $&, $1, ...) aren't set, as their values are processed through deep ruby core code. Instead, block receives UMatch object, which is essentially immutable array of matching groups: "test".u.gsub(ure("(e)(.)")) do |match| \ puts match[0] # => 'es' <--> $& puts match[1] # => 'e' \ <--> $1 puts match[2] # => 's' <--> $2 end 2. In URegexp search pattern backreferences are in form \n (\1, \2, ...), in replacement string - in form $1, $2, ... NOTE: URegexp considers char to be a digit NOT ONLY ASCII (0x0030-0x0039), but any Unicode char, which has property Decimal digit number (Nd), e.g.: a = [?$, 0x1D7D9].pack("U*").u * 2 puts a.inspect_names <U000024>DOLLAR SIGN <U01D7D9>MATHEMATICAL DOUBLE-STRUCK DIGIT ONE <U000024>DOLLAR SIGN <U01D7D9>MATHEMATICAL DOUBLE-STRUCK DIGIT ONE puts "abracadabra".u.gsub(/(b)/.U, a) abbracadabbra \ 3. One can create URegexp using global Kernel#ure function, Regexp#U, Regexp#to_u, or from UString using URegexp.new, e.g: /pattern/.U =~ "string".u 4. There are differences about Regexp and URegexp multiline matching options: t = "text\ntest" # ^,$ handling : URegexp multiline <-> Ruby default t.u =~ ure('^\w+$', URegexp::MULTILINE) => #<UMatch:0xf6f7de04 @ranges=[0..3], @cg=[\u0074\u0065\u0078\u0074]> t =~ /^\w+$/ => 0 # . matches \n : URegexp DOTALL <-> /m t.u =~ ure('.+test', URegexp::DOTALL) \ => #<UMatch:0xf6fa4d88 ... t.u =~ /.+test/m 5. UMatch.range(idx) returns range for capturing group idx. This range is in codeunits. === References 1. ICU Official Homepage http://ibm.com/software/globalization/icu/ 2. ICU downloads \ http://ibm.com/software/globalization/icu/downloads.jsp 3. ICU Home Page http://icu.sf.net 4. Unicode Home Page http://www.unicode.org ==== BUGS, DOCS, TO DO The code is slow and inefficient yet, is still highly experimental, so can have many security and memory leaks, bugs, inconsistent documentation, incomplete test suite. Use it at your own risk. Bug reports and feature requests are welcome :) === Copying This extension module is copyrighted free software by Nikolai Lugovoi. You can redistribute it and/or modify it under the terms of MIT License. Nikolai Lugovoi <meadow.nnick@gmail.com>
Diff and patch tables
Assigns a case-insensitive unique three-letter code to each record in a scope, based loosely on some other attribute of the record
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.
No description provided.