Microdata to JSON and JSON-LD parser
Microdata parser. Extract and parse microdata from any website
A microdata rdf-parse-html actor
A fast and lightweight streaming Microdata to RDF parser
A JSON-LD Processor and API implementation in JavaScript.
Parses RDF from any serialization
Parse HTML character references
Small footprint URL parser that works seamlessly across Node.js and browser environments
JSON.parse with context information on error
Node.js path.parse() ponyfill
Utility library for parsing ASN.1 files for use with browserify-sign.
JavaScript parser and stringifier for YAML
Parse HTTP Content-Type header according to RFC 7231
hast utility to create an element from a simple CSS selector
Parse the Forwarded header (RFC 7239) into an array of objects
CSV parsing implementing the Node.js `stream.Transform` API
Quote and parse shell commands
Parse JSON with more helpful errors
An Esprima-compatible JavaScript parser built on Acorn
Parse a passwd file into a list of users.
Parse milliseconds into an object
JSON parse with prototype poisoning protection
This gem removes the surplus “clutter” (boilerplate, templates) around the main textual content of a web page (pure Ruby implementation). BoilerpipeArticle can also be used to parse Open Graph metadata and microdata. Check GitHub for usage examples.
Scrapetor is a Ruby HTML parsing + scraping toolkit. The parser is a native C arena DOM with structural indexes built at parse time and NEON SIMD scanners in the SAX hot loop. A streaming extraction engine compiles the schema DSL into a single forward pass — no DOM materialised, one Ruby boundary crossing per document. On builds where libcurl is available, Scrapetor::Fetcher adds an HTTP/2-capable fetch layer with per-thread connection cache, shared DNS + TLS session pool, in-process gzip / deflate / brotli / zstd decoding, iconv charset transcoding, retry + exponential backoff, ETag / Last-Modified disk cache with bulk revalidation, per-host throttle, cookie jar, basic + bearer auth, proxy, and three bulk concurrency models (parallel_fetch / multi_fetch / streaming multi_each). Scrapetor::Session ties the cookie / auth / throttle / retry policies together. Also ships robots.txt + sitemap.xml parsers, a bounded-memory streaming HTML parser, and structured-data extractors (JSON-LD, OpenGraph, Schema.org, Microdata, RDFa, Twitter Cards). The Net::HTTP-based Scrapetor.fetch is preserved as the no-libcurl fallback.