Robust web spider for Node.js
Next.js robots.tsx generator - Automatically create and serve robots.txt for Next.js applications
MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.
Web scraping application with HTTP API (incl. Dashboard), CLI, and MCP Server interfaces.
Gracefully handle timeouts and network errors with automatic retries.
Walsh-Research/1.2 spec-compliant fetcher — reference implementation in TypeScript
Clado Model Context Protocol Server
Content protection engine — obfuscate text, trap scrapers, defend against AI bots
Hyperbrowser Model Context Protocol Server
A sophisticated website comparison tool with intelligent content analysis and offset-aware difference detection
Enhanced request library with encoding support, including special cases such as win1251.
Scrape publicly available jobs on LinkedIn using a headless browser
Boilerpipe Modification for Node.js
web-spider is a simple and fast web spider written in Node.js!
Run headless Chrome (via Puppeteer) as a service for web crawling, remote control, and so on.
`form-reader` is part of a bigger project aiming to support Unofficial Facebook Graph. It allows you to submit your custom data to any Facebook form.
Spider with a command-line interface and a programmatic API
Create an XML sitemap for your website.
A clean and powerful Twitter/X scraping library with CycleTLS support, proxy configuration, and full TypeScript type definitions
Automated llms.txt generator SDK + CLI
TESTPAL v7.1 - AI-Powered Testing Agent with Hardened Security, Git History Verification, Multi-Factor Confidence, 30+ Secret Patterns (HuggingFace, Anthropic, AWS, etc.), Framework-Aware Analysis, 95%+ accuracy. Created by Akash S
This module tracks the licenses of Bower packages used in the application and lists them as JSON.
A tool for getting public website content using a browser engine or an HTTP GET request.
The web data layer for AI agents — fetch, search, crawl, extract, screenshot, and monitor the web with 55+ domain extractors and MCP.