CLI tool for crawling web pages and extracting links
The fastest directory crawler & globbing alternative to glob, fast-glob, & tiny-glob. Crawls 1m files in < 1s
A triple-linked-list-based DOM implementation
This repository contains a list of HTTP user-agents used by robots, crawlers, and spiders, in a single JSON file.
A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.
This is an ES6 adaptation of the original PHP library CrawlerDetect. This library will help you detect bots/crawlers/spiders via the user agent.
Analyzes license information for multiple Node.js modules (package.json files) as part of your software project.
Used to run a web crawler that checks for errors on specified pages.
Very straightforward, event-driven web crawler. Features a flexible queue interface and a basic cache mechanism with extensible backend.
Inspecting Node.js's Network with Chrome DevTools
Device detection module for Nuxt
A CLI tool to crawl documentation sites and create a search index for Upstash Search.
Crawler is a ready-to-use web spider with support for proxies, asynchrony, rate limiting, configurable request pools, jQuery, and HTTP/2.
AI-native dual-graph knowledge representation — build, query, and persist typed knowledge graphs in-process.
Create XML sitemaps from the command line.
HTTP request module customized for crawlers.
Express middleware for serving prerendered JavaScript-rendered pages for SEO
A manifest of Apify actor templates.
A web crawler that works with prember to discover URLs in your app
A CLI for accessibility testing using axe-core
Names of each food in local languages, including scientific name.
Curated, sourced list of AI crawler / training bot user agents, plus a small CLI to test whether a URL is reachable to each bot.
Crawl and download Snap Lenses from *lens.snapchat.com* with ease.
Find broken links, missing images, etc. in your HTML. Scurry around your site and find all those broken links.