
One search box and one honest, consistent read on every open-source library — across every ecosystem.

npm · PyPI · crates.io · RubyGems · Go · Maven · NuGet


© 2026 Modules · A precision instrument for picking dependencies. Data refreshed continuously from public registries, deps.dev & OSV.

cross-ecosystem search · live

Results for url-crawler

Found in 6 of 7 ecosystems. npm: 1–24 of 169,977 · 19 matches across other registries

npm 169,977 · PyPI 1 · crates.io 2 · RubyGems 11 · Maven 1 · NuGet 4

How we search: free-text search on npm, crates.io, RubyGems, NuGet, and Maven; exact-name lookup only on PyPI and Go.
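As a rough illustration of the note above, the two lookup modes could be sketched as follows. This is a minimal sketch under stated assumptions, not the site's actual implementation: the `Package` type, the `matches` function, and the any-term substring rule used for free-text search are all hypothetical.

```python
from dataclasses import dataclass

# Ecosystem groupings taken from the "How we search" note.
FREE_TEXT = {"npm", "crates.io", "RubyGems", "NuGet", "Maven"}
EXACT_NAME = {"PyPI", "Go"}


@dataclass(frozen=True)
class Package:
    """Hypothetical record for one registry entry."""
    ecosystem: str
    name: str
    description: str


def matches(pkg: Package, query: str) -> bool:
    """Return True if pkg would be listed for query under the assumed rules."""
    q = query.lower()
    if pkg.ecosystem in EXACT_NAME:
        # Exact-name lookup: only a package whose name equals the query.
        return pkg.name.lower() == q
    if pkg.ecosystem in FREE_TEXT:
        # Assumed free-text rule: any query term appears in name or description.
        terms = q.replace("-", " ").split()
        haystack = f"{pkg.name} {pkg.description}".lower()
        return any(term in haystack for term in terms)
    # Ecosystems outside both groups are not searched in this sketch.
    return False
```

Under this assumed rule, a free-text query such as url-crawler would also list a package like whatwg-url, whereas an exact-name lookup would list only a package named url-crawler.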


npm matches

Showing 24 of 169,977 · JavaScript


url-crawler v1.2.0

A library to crawl and extract cleaned HTML content from URLs.

Maintenance: Healthy

Popularity: Unknown

Actively maintained.

@headwall/url-crawler v0.2.7

URL crawler for analysing web content

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 2 years ago.

The fastest directory crawler & globbing alternative to glob, fast-glob, & tiny-glob. Crawls 1m files in < 1s

Maintenance: Aging

Popularity: Unknown

Aging: last published 10 months ago; check before adopting.

@tarzandb/source-crawler v0.1.0

TarzanDB URL crawler core package

Maintenance: Healthy

Popularity: Unknown

Actively maintained.

crawler-user-agents v1.50.0

This repository contains a list of HTTP user-agents used by robots, crawlers, and spiders in a single JSON file.

Maintenance: Healthy

Popularity: Unknown

Actively maintained.

linkedom v0.18.12

A triple-linked lists based DOM implementation

Maintenance: Aging

Popularity: Unknown

Aging: last published 10 months ago; check before adopting.

@ckeditor/ckeditor5-dev-web-crawler v56.1.0

Used to run a web crawler that checks for errors on specified pages.

Maintenance: Healthy

Popularity: Unknown

Actively maintained.

prerender-node v3.8.3

Express middleware for serving prerendered JavaScript-rendered pages for SEO

Maintenance: Aging

Popularity: Unknown

Aging: last published 10 months ago; check before adopting.

notion-md-crawler v1.0.2

A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.

Maintenance: Aging

Popularity: Unknown

Aging: last published 11 months ago; check before adopting.

thredds-catalog-crawler v0.0.7

A module for crawling thredds catalogs

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 2 years ago.

npm-license-crawler v0.2.1

Analyzes license information for multiple node.js modules (package.json files) as part of your software project.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 7 years ago.

es6-crawler-detect v4.0.2

An ES6 adaptation of the original PHP library CrawlerDetect. This library helps you detect bots, crawlers, and spiders via the user agent.

Maintenance: Aging

Popularity: Unknown

Aging: last published over a year ago; check before adopting.

simplecrawler v1.1.9

Very straightforward, event driven web crawler. Features a flexible queue interface and a basic cache mechanism with extensible backend.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 6 years ago.

node-network-devtools v1.0.30

Inspecting Node.js's Network with Chrome DevTools

Maintenance: Healthy

Popularity: Unknown

Actively maintained.

@upstash/search-crawler v0.2.1

A CLI tool to crawl documentation sites and create a search index for Upstash Search.

Maintenance: Aging

Popularity: Unknown

Aging: last published 9 months ago; check before adopting.

@nuxtjs/device v4.0.0

Device detection module for Nuxt

Maintenance: Healthy

Popularity: Unknown

Actively maintained.

Crawler is a ready-to-use web spider that works with proxies, asynchrony, rate limit, configurable request pools, jQuery, and HTTP/2 support.

Maintenance: Healthy

Popularity: Unknown

Actively maintained.

x-ray-crawler v2.0.5

x-ray's crawler

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 5 years ago.

whatwg-url v16.0.1

An implementation of the WHATWG URL Standard's URL API and parsing machinery

Maintenance: Healthy

Popularity: Unknown

Actively maintained.

sitemap-generator-cli v7.5.0

Create XML sitemaps from the command line.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 6 years ago.

crawler-request v1.2.2

HTTP request module customized for crawlers.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 8 years ago.

prember-crawler v1.0.0

A web crawler that works with prember to discover URLs in your app

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 7 years ago.

encodeurl v2.0.0

Encode a URL to a percent-encoded form, excluding already-encoded sequences

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 2 years ago.

longcelot-seo-scanner v0.0.1

AST codebase scanner and URL crawler for longcelot-seo

Maintenance: Healthy

Popularity: Unknown

Actively maintained.


PyPI matches

Exact match · Python

url-crawler v1.0.0

A Python library to crawl the details of a URL.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 4 years ago.

crates.io matches

2 matches · Rust

url-crawler v0.3.1

Use crusty instead

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 4 years ago.

url-crawl v0.2.0

URL crawler for HTML code.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 2 years ago.

RubyGems matches

11 matches · Ruby

spidermech v0.0.2

Does things

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 12 years ago.

Traverses all HTML files from given directory and checks links found in them.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 3 years ago.

stupid_crawler v0.2.1

Stupid crawler that looks for URLs on a given site. The result is saved as two CSV files: one with found URLs and another with failed URLs.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 8 years ago.

govuk_seed_crawler v3.2.1

Retrieves a list of URLs to seed the crawler by publishing them to a RabbitMQ exchange.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 3 years ago.

wayback_archiver v1.5.0

Post URLs to Wayback Machine (Internet Archive), using a crawler, from Sitemap(s) or a list of URLs.

Maintenance: Aging

Popularity: Niche

Aging: last published over a year ago; check before adopting.

validate-website v1.12.0

validate-website is a web crawler that checks markup validity against XML Schema / DTD and reports URLs that are not found.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 3 years ago.

iron-crawler v1.2.1

A generic web crawler that doesn't crawl outside URLs.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 10 years ago.

Arachnid is a web crawler that relies on Bloom Filters to efficiently store visited urls and Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 12 years ago.

arachnidish v0.0.1

Arachnidish is a web crawler that relies on Bloom Filters to efficiently store visited urls and Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 10 years ago.

livedoor-feeddiscover v1.1.0

livedoor-feeddiscover performs feed autodiscovery using the livedoor Feed Discover API. The API finds Atom/RSS feeds in the livedoor Reader crawler database, so livedoor-feeddiscover does not access the target URL.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 16 years ago.

medusa-crawler v1.0.0

Medusa: a Ruby crawler framework

Medusa is a framework for the Ruby language to crawl and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized tasks quickly and easily.

Features

* Choose the links to follow on each page with focus_crawl
* Multi-threaded design for high performance
* Tracks 301 HTTP redirects
* Allows exclusion of URLs based on regular expressions
* Records response time for each page
* Obeys robots.txt directives (optional, but recommended)
* In-memory or persistent storage of pages during crawl, provided by Moneta (https://github.com/moneta-rb/moneta)
* Inherits OpenURI behavior (redirects, automatic charset and encoding detection, proxy configuration options)

Do you have an idea or a suggestion? Open an issue and talk about it: https://github.com/brutuscat/medusa-crawler/issues/new

Examples

Medusa is versatile and meant to be used programmatically. You can start with one or multiple URIs:

    require 'medusa'

    Medusa.crawl('https://www.example.com', depth_limit: 2)

Or you can pass a block and it will yield the crawler back, to manage configuration or drive its crawling focus:

    require 'medusa'

    Medusa.crawl('https://www.example.com', depth_limit: 2) do |crawler|
      crawler.discard_page_bodies = some_flag

      # Persist all the pages state across crawl-runs.
      crawler.clear_on_startup = false
      crawler.storage = Medusa::Storage.Moneta(:Redis, 'redis://redis.host.name:6379/0')

      crawler.skip_links_like(/private/)

      crawler.on_pages_like(/public/) do |page|
        logger.debug "[public page] #{page.url} took #{page.response_time} found #{page.links.count}"
      end

      # Use arbitrary logic, page by page, to further customize the crawl.
      crawler.focus_crawl(/public/) do |page|
        page.links.first
      end
    end

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 5 years ago.

Maven matches

1 match · Java

com.github.fancyerii:url_crawler v1.1.4

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 8 years ago.

NuGet matches

4 matches · .NET

webreaper.cdp v11.3.0

No description provided.

Maintenance: Healthy

Popularity: Unknown

Actively maintained.

crawldnews v1.0.0

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 6 years ago.

umbraco.community.aivisibility v1.1.2

No description provided.

Maintenance: Healthy

Popularity: Unknown

Actively maintained.

conceptnetdotnet v1.1.4

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 8 years ago.