This is a personal project on web crawling and scraping. It covers a few ways to crawl data, mainly using [Node.js](https://nodejs.org/en/), with packages such as:
The fastest directory crawler and globbing alternative to glob, fast-glob, and tiny-glob. Crawls 1 million files in under 1 second.
Inspecting Node.js's Network with Chrome DevTools
Very straightforward, event driven web crawler. Features a flexible queue interface and a basic cache mechanism with extensible backend.
Used to run a web crawler that checks for errors on specified pages.
Get a stream as a string, Buffer, ArrayBuffer or array
A DOM implementation based on triple-linked lists
Google APIs Authentication Client Library for Node.js
This repository contains a list of HTTP user agents used by robots, crawlers, and spiders, as a single JSON file.
Web API compatible fetch implementation
Twilio SendGrid NodeJS API client
Web API compatible Blob implementation
Browser compatibility data provided by MDN Web Docs
Find broken links, missing images, etc. in your HTML. Scurry around your site and find all those broken links.
A mutex for guarding async workflows
Twilio SendGrid NodeJS mail service
A module for crawling thredds catalogs
A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.
Crawler is a ready-to-use web spider that supports proxies, asynchrony, rate limiting, configurable request pools, jQuery, and HTTP/2.
Crawl and download Snap Lenses from *lens.snapchat.com* with ease.
This is an ES6 adaptation of the original PHP library CrawlerDetect. It helps you detect bots, crawlers, and spiders via the user agent.
A library for obtaining browser versions with their maximum supported Baseline feature set and Widely Available status.
CSV parsing implementing the Node.js `stream.Transform` API
Analyzes license information for multiple Node.js modules (package.json files) as part of your software project.
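
Several of the entries above deal with recognizing bots, crawlers, and spiders from their user agent. As a rough illustration of that idea, here is a minimal sketch in plain Node.js. It does not use the API of any package listed here; the `BOT_PATTERNS` list and the `isCrawler` function are hypothetical names invented for this example, and the pattern list is far shorter than what a real user-agent list would contain.

```javascript
// Hypothetical, deliberately tiny pattern list (assumption for illustration).
// Real-world lists of crawler user agents contain hundreds of entries.
const BOT_PATTERNS = [/googlebot/i, /bingbot/i, /crawler/i, /spider/i];

// Returns true when the user-agent string matches any pattern in the list.
// Non-string or empty input is treated as "not a crawler".
function isCrawler(userAgent) {
  if (typeof userAgent !== 'string' || userAgent.length === 0) {
    return false;
  }
  return BOT_PATTERNS.some((pattern) => pattern.test(userAgent));
}
```

A request handler could call `isCrawler(req.headers['user-agent'])` to decide whether to skip analytics or apply a stricter rate limit. Matching on substrings like this is only a heuristic, since a client can send any user agent it likes.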