Results for crawled

Crawled - the smartest crawler ever made.

seenreq v3.0.0

Abandoned. Last published 7 years ago.

A library to test whether a URL (request) has already been crawled, usually used in a web crawler. Compatible with `request` and `node-crawler`.

@kvalifik/relume-sitemap v0.1.1

Generate Relume-compatible sitemap clipboard payloads from XML sitemaps and crawled pages.

beautiful-dom v1.0.9

Abandoned. Last published 5 years ago.

Beautiful-dom is a lightweight library that mirrors the capabilities of the HTML DOM API needed for parsing crawled HTML/XML pages. It models the methods and properties of HTML nodes that are relevant for extracting data from HTML nodes. It is written in

alexandria v0.1.7

Abandoned. Last published 11 years ago.

A storage interface to store crawled content in Elasticsearch

@mraichelson/sitemap-scrapper v1.0.2

Audit websites by comparing sitemap.xml entries against crawled page links

@aiacta-org/crawl-manifest-client v1.0.14

Client for the AIACTA Crawl Manifest API. Query AI providers to see which of your pages were crawled, when, and for what purpose (AIACTA Proposal 1, §2.2)

grunt-url-image-crawler v1.3.1

Abandoned. Last published 12 years ago.

Crawl your CSS/SCSS or HTML files for img URLs and store the crawled image URLs in a local JSON file.

spamguard.js v4.4.1

Abandoned. Last published 3 years ago.

🤖 Protect your email address from being crawled by spam bots.

sitegazer v0.0.3

Abandoned. Last published 6 years ago.

SiteGazer crawls all of your pages and finds errors in the crawled pages.

@agentdoor/registry v0.1.0

Agent Registry - crawled directory of AgentDoor-enabled services

alpdon v1.0.1

Abandoned. Last published 7 years ago.

Alpdonloader is a batch downloader for music crawled from YouTube.

modcrawler v0.1.2

Abandoned. Last published 10 years ago.

A crawler and/or scraper based on jsdom, with a dynamic module loader that depends on the crawled host.

@stoplight/json-ref-resolver v3.1.6

Recursively resolve JSON pointers and remote authorities.

robots-txt-parser v2.0.3

Abandoned. Last published 4 years ago.

A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises.

third-party-web v0.29.2

Categorized data on third party entities on the web.

@alloc/sitefetch v0.2.7

Fetch an entire site and save it as a text file

@mikaello/avrodoc-plus v1.5.0

Documentation tool for Avro schemas. Forked from https://github.com/leosilvadev/avrodoc-plus.

fireworm v0.7.2

Abandoned. Last published 4 years ago.

A crawling file watcher.

unfurl.js v6.4.0

Scraper for oEmbed, Twitter Cards and Open Graph metadata - fast and Promise-based

astro-robots-txt v1.0.0

Generate a robots.txt for Astro

cmu-pronouncing-dictionary v3.0.0

Abandoned. Last published 4 years ago.

The 134,000+ words and their pronunciations in the CMU pronouncing dictionary

reffy v21.0.1

W3C/WHATWG spec dependencies exploration companion. Features a short set of tools to study spec references as well as WebIDL term definitions and references found in W3C specifications.

@scrapeless-ai/sdk v1.11.0