This is a personal project on web crawling and scraping. It includes a few ways to crawl data, mainly using [Node.js](https://nodejs.org/en/), such as:
The fastest directory crawler & globbing alternative to glob, fast-glob, & tiny-glob. Crawls 1m files in < 1s
Used to run a web crawler that checks for errors on specified pages.
Very straightforward, event driven web crawler. Features a flexible queue interface and a basic cache mechanism with extensible backend.
Inspecting Node.js's Network with Chrome DevTools
A DOM implementation based on triple-linked lists
A mutex for guarding async workflows
This repository contains a list of HTTP user agents used by robots, crawlers, and spiders, in a single JSON file.
Crawler is a ready-to-use web spider that supports proxies, asynchrony, rate limiting, configurable request pools, jQuery, and HTTP/2.
This is an ES6 adaptation of the original PHP library CrawlerDetect. It helps you detect bots, crawlers, and spiders via the user agent.
Crawl and download Snap Lenses from *lens.snapchat.com* with ease.
A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.
Express middleware for serving prerendered JavaScript-rendered pages for SEO
Analyzes license information for multiple Node.js modules (package.json files) as part of your software project.
Device detection module for Nuxt
MCP server for advanced web search using Tavily
A module for crawling THREDDS catalogs
Web crawler for Node.js
Crawl the web as easily as possible
x-ray's crawler
A lightweight JS library to check if a user agent is a web crawler.
Web crawler
Web Streams, based on the WHATWG spec reference implementation
Array manipulation, ordering, searching, summarizing, etc.
A web crawler that helps you parse and collect data from the web
Generic Web crawler with a DSL that parses event-related data from web pages
Wgit was primarily designed to crawl static HTML websites to index and search their content - providing the basis of any search engine; but Wgit is suitable for many application domains including: URL parsing, data mining and statistical analysis.
Generic Web crawler with a DSL that parses structured data from web pages
Crawler Guru provides the basic functionality needed to extract data from web pages
The SimpleCrawler module is a library for crawling web sites. The crawler provides comprehensive data from the page crawled which can be used for page analysis, indexing, accessibility checks etc. Restrictions can be specified to limit crawling of binary files.
An easy-to-use DSL for scraping data from websites that makes writing web crawlers fast and intuitive. You can traverse HTML nodes and fetch any of their attributes. As in jQuery, you will find methods like parent, children, first, find, and siblings. You can also download images and web pages and store all content in a database. Please visit my GitHub account for more details.
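
One entry above describes a mutex for guarding async workflows. As a rough illustration of the idea only, and not the API of that package, a promise-chain mutex could look like the sketch below. The `Mutex` class, its `runExclusive` method, and the `record` helper are hypothetical names chosen for this example.

```javascript
// Minimal promise-based mutex (illustrative sketch, not a real package's API).
class Mutex {
  constructor() {
    // Tail of the chain of queued critical sections.
    this._tail = Promise.resolve();
  }

  // Runs `task` only after every previously queued task has settled.
  runExclusive(task) {
    const result = this._tail.then(() => task());
    // Swallow rejections on the chain so one failure does not block later tasks.
    this._tail = result.catch(() => {});
    return result;
  }
}

// Hypothetical usage: serialize writes to a shared crawl log.
const mutex = new Mutex();
const log = [];

function record(url, delayMs) {
  return mutex.runExclusive(async () => {
    log.push(`start ${url}`);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    log.push(`end ${url}`);
  });
}

// Both calls start together, but the second waits for the first to finish.
const done = Promise.all([record("a", 20), record("b", 5)]);
```

Without the mutex, the shorter delay of `"b"` would let its entries interleave with those of `"a"`; with it, each start/end pair stays contiguous.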
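
Several entries above detect crawlers from the user agent string. A minimal sketch of that technique, assuming a tiny hand-written pattern list, is shown below. The `CRAWLER_PATTERNS` array and the `isCrawler` function are hypothetical and are not taken from any of the listed libraries, which rely on much larger curated lists.

```javascript
// Illustrative patterns only; real detection lists are far more extensive.
const CRAWLER_PATTERNS = [/bot\b/i, /crawler/i, /spider/i, /slurp/i];

// Returns true when the user agent matches any known crawler pattern.
function isCrawler(userAgent) {
  if (typeof userAgent !== "string" || userAgent.length === 0) {
    return false;
  }
  return CRAWLER_PATTERNS.some((pattern) => pattern.test(userAgent));
}

// Example inputs: one crawler user agent and one desktop browser user agent.
const googlebot =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
const chrome =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

console.log(isCrawler(googlebot), isCrawler(chrome));
```

Pattern matching of this kind only catches crawlers that identify themselves, so it is best treated as a heuristic rather than a security control.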