Scrape JSON from messy input streams
Scrape documentation frameworks to Mintlify docs
JavaScript SDK for Firecrawl API
Search from DuckDuckGo and use its spice APIs.
JavaScript SDK for Firecrawl API
Promise queue with concurrency control
A Node.js scraper for humans.
The core scraping functionality of scrape-it.
Official Firecrawl nodes for n8n - scrape, crawl, map, search, and extract data from websites. Supports AI Agent tool usage.
Scrape Window Metadata
Headless CLI client for the Agent Client Protocol (ACP) — talk to coding agents from the command line
MCP server for Firecrawl — search, scrape, and interact with the web. Supports both cloud and self-hosted instances. Features include web search, scraping, page interaction, batch processing, and LLM-powered content analysis.
Scrape and parse search engine results using SerpApi.
Traverse JSON Schema passing each schema object to callback
Serper MCP Server supporting search and webpage scraping
A Convex component for scraping web pages using the Firecrawl API with durable caching and reactive queries.
Scrape From primbon.com
A lightning-fast package to scrape YouTube search results. This was made for Discord bots.
Parse JSON with more helpful errors
Another JSON Schema Validator
Allow parsing of the U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR in JS strings
A slim module for scraping Facebook event data in milliseconds.
Strip comments from JSON. Lets you use comments in your JSON files!
A scraper library for WhatsApp bots or RESTful APIs
Gem to scrape a webpage and render it into JSON.
Given a Wikipedia article, generate a tree of linked articles from the summary of the first.
Scrape TravelMob locations and convert to JSON.
Scrape Airbnb data and convert to JSON format.
SourceMonitor is a mountable Rails 8 engine that ingests RSS, Atom, and JSON feeds, scrapes full article content, and surfaces Solid Queue powered dashboards for monitoring and remediation.
Strigil is a gem for easily scraping a Reddit user's comment history into a JSON file.
mtg-db is a Ruby gem containing data for all Magic: The Gathering cards and sets, in JSON format. The linked repository contains rake scripts to scrape new data and update the JSON files.
The BeerAdvocate gem contains several brittle, lightly-tested methods for turning a beer name into its Beer Advocate URL (by scraping Google), and turning a Beer Advocate beer page URL into a hash of attributes about the beer (by scraping Beer Advocate). The gem also installs a `beer` executable, which returns JSON-formatted beer information for the beer named on the command line.
Random Poetry Scraper is a command line gem which returns a configurable number of poems scraped from poemhunter.com. The gem allows you to consume the poems either through a JSON dump or through a command line "pleasure reading" interface. ~Most~ poems on poemhunter.com are in English, but not all are. If you plan to use this gem to build a corpus of poetry, you should do additional language validation.
Mount SolidQueueWeb in any Rails app using Solid Queue to get a full-featured job dashboard: inspect jobs by status (ready, scheduled, running, blocked, failed), retry or discard failed jobs, reschedule or run scheduled jobs immediately, manage recurring tasks, filter by queue/priority/period, export to CSV, detect slow jobs, view queue depth sparklines, track job performance (p50/p95), and scrape a /metrics JSON endpoint for external monitoring — all without leaving your app.
Scrapetor is a Ruby HTML parsing + scraping toolkit. The parser is a native C arena DOM with structural indexes built at parse time and NEON SIMD scanners in the SAX hot loop. A streaming extraction engine compiles the schema DSL into a single forward pass — no DOM materialised, one Ruby boundary crossing per document. On builds where libcurl is available, Scrapetor::Fetcher adds an HTTP/2-capable fetch layer with per-thread connection cache, shared DNS + TLS session pool, in-process gzip / deflate / brotli / zstd decoding, iconv charset transcoding, retry + exponential backoff, ETag / Last-Modified disk cache with bulk revalidation, per-host throttle, cookie jar, basic + bearer auth, proxy, and three bulk concurrency models (parallel_fetch / multi_fetch / streaming multi_each). Scrapetor::Session ties the cookie / auth / throttle / retry policies together. Also ships robots.txt + sitemap.xml parsers, a bounded-memory streaming HTML parser, and structured-data extractors (JSON-LD, OpenGraph, Schema.org, Microdata, RDFa, Twitter Cards). The Net::HTTP-based Scrapetor.fetch is preserved as the no-libcurl fallback.
No description provided.