Crawled - the smartest crawler ever made.
A library to test whether a URL (request) has already been crawled, usually used in a web crawler. Compatible with `request` and `node-crawler`.
Generate Relume-compatible sitemap clipboard payloads from XML sitemaps and crawled pages.
Beautiful-dom is a lightweight library that mirrors the capabilities of the HTML DOM API needed for parsing crawled HTML/XML pages. It models the methods and properties of HTML nodes that are relevant for extracting data from HTML nodes. It is written in
A storage interface to store crawled content in Elasticsearch
Audit websites by comparing sitemap.xml entries against crawled page links
Client for the AIACTA Crawl Manifest API. Query AI providers to see which of your pages were crawled, when, and for what purpose (AIACTA Proposal 1, §2.2)
Crawl your CSS/SCSS or HTML files for img URLs and store the crawled image URLs in a local JSON file.
🤖 Protect your email address from being crawled by spam bots.
SiteGazer crawls all of your pages and finds errors in the crawled pages
Agent Registry - crawled directory of AgentDoor-enabled services
Alpdonloader is a batch downloader for music crawled from YouTube
A crawler and/or scraper based on jsdom, with a dynamic module loader that depends on the crawled host
Recursively resolve JSON pointers and remote authorities.
A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises.
Categorized data on third party entities on the web.
Fetch an entire site and save it as a text file
Documentation tool for Avro schemas. Forked from https://github.com/leosilvadev/avrodoc-plus.
A crawling file watcher.
Scraper for oEmbed, Twitter Cards and Open Graph metadata - fast and Promise-based
Generate a robots.txt for Astro
The 134,000+ words and their pronunciations in the CMU pronouncing dictionary
W3C/WHATWG spec dependencies exploration companion. Features a short set of tools to study spec references as well as WebIDL term definitions and references found in W3C specifications.
Node SDK for Scrapeless AI
Rust SDK for Cloudflare Browser Rendering crawl jobs
CLI for Cloudflare Browser Rendering crawl jobs
Directory crawler for batch Markdown file processing
A rock-solid cryptocurrency crawler.
Fast, accurate code search for large Rush monorepos
Async BFS web crawler with rate limiting and robots.txt support for CRW
Rust SDK for Firecrawl API.
KODEGEN.ᴀɪ: Memory-efficient, Blazing-Fast, MCP tools for code generation agents.
A URL Crawler tool and library for crawling web targets, discovering links, and detecting secrets with configurable regex rules.
Discover and export documentation links from docs sites
Official Rust SDK for Firecrawl API v2.
Universal web scraper and code extractor CLI - crawl websites, analyze repositories, build knowledge graphs
Ruby-based client for the ProxyCrawl API that helps developers crawl or scrape thousands of web pages anonymously
Crawls Twitter
Crawl websites
Crawling framework
Ruby utilities for web crawling.
Fassbinder crawls book offers on Amazon.
Easily crawl a website
Crawls public LinkedIn profiles via Google
Crawls Indeed resumes
The SimpleCrawler module is a library for crawling web sites. The crawler provides comprehensive data about each crawled page, which can be used for page analysis, indexing, accessibility checks, etc. Restrictions can be specified to limit crawling of binary files.
Web crawling framework based on ActiveJob
Vessel is a high-level web crawling framework, used to crawl websites and extract structured data from their pages
No description provided.
No description provided.