One search box and one honest, consistent read on every open-source library — across every ecosystem.

npm · PyPI · crates.io · RubyGems · Go · Maven · NuGet

© 2026 Modules · A precision instrument for picking dependencies. Data refreshed continuously from public registries, deps.dev & OSV.

cross-ecosystem search · live

Results for language-tokenizer

Found in 4 of 7 ecosystems. npm: 1–24 of 87,510 · 29 matches across other registries

npm 87,510 · crates.io 1 · RubyGems 12 · NuGet 16

How we search: free-text search on npm, crates.io, RubyGems, NuGet and Maven. PyPI and Go support exact-name lookup only.
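The split between free-text and exact-name lookup can be pictured as a per-ecosystem routing table. The following is a minimal sketch, assuming a hypothetical `plan_search` helper and illustrative mode labels (`free_text`, `exact_name`) that are not taken from the site's actual implementation; only the assignment of ecosystems to modes comes from the note above.

```python
from typing import Dict, List, Tuple

# Search mode per ecosystem, as described in the "How we search" note.
# The mode labels themselves are illustrative names, not the site's own.
SEARCH_MODES: Dict[str, str] = {
    "npm": "free_text",
    "crates.io": "free_text",
    "RubyGems": "free_text",
    "NuGet": "free_text",
    "Maven": "free_text",
    "PyPI": "exact_name",
    "Go": "exact_name",
}


def plan_search(query: str) -> List[Tuple[str, str, str]]:
    """Return (ecosystem, mode, term) for each registry (hypothetical helper)."""
    term = query.strip()
    return [(eco, mode, term) for eco, mode in SEARCH_MODES.items()]


if __name__ == "__main__":
    for eco, mode, term in plan_search("language-tokenizer"):
        print(f"{eco:10s} {mode:10s} {term}")
```

Under this reading, an exact-name registry can only return a hit when a package is literally named `language-tokenizer`, while a free-text registry can also match on descriptions and keywords.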

npm matches

Showing 24 of 87,510 · JavaScript

@strav/view v1.0.3

Strav view engine — .strav template language. Tokenizer + compiler + ViewEngine, Vue 3 hydration islands + buildIslands, pages auto-router, console commands, disk cache, asset versioning.

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

@bizone-ai/monaco-json-transform v1.16.2

JSON Transform language tokenizer (and syntax highlight), hover provider and more

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

@jscpd/tokenizer v4.2.4

tokenizer of source code for jscpd

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

@nlighten/monaco-json-transform v1.9.0

JSON Transform language tokenizer (and syntax highlight), hover provider and more

Maintenance: Aging

Popularity: Unknown

Aging. Last published over a year ago; check before adopting.

@csstools/css-tokenizer v4.0.0

Tokenize CSS

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

A promise based streaming tokenizer

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

@tokenizer/inflate v0.4.1

Tokenized zip support

Maintenance: Aging

Popularity: Unknown

Aging. Last published 6 months ago; check before adopting.

@tokenizer/token v0.3.0

TypeScript definition for strtok3 token

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 4 years ago.

@csstools/css-parser-algorithms v4.0.0

Algorithms to help you parse CSS from an array of tokens.

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

Fast token estimation at 96% accuracy of a full tokenizer in a 2kB bundle

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

css-selector-tokenizer v0.8.0

Parses and stringifies CSS selectors

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 5 years ago.

gpt-tokenizer v3.4.0

A pure JavaScript implementation of a BPE tokenizer (Encoder/Decoder) for GPT-2 / GPT-3 / GPT-4 and other OpenAI models

Maintenance: Aging

Popularity: Unknown

Aging. Last published 7 months ago; check before adopting.

@csstools/css-calc v3.2.1

Solve CSS math expressions

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

scss-tokenizer v0.4.3

A tokenizer for Sass' SCSS syntax

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 3 years ago.

simple-html-tokenizer v0.5.11

Simple HTML Tokenizer is a lightweight JavaScript library that can be used to tokenize the kind of HTML normally found in templates.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 5 years ago.

prosemirror-markdown v1.13.4

ProseMirror Markdown integration

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

wink-tokenizer v5.3.0

Multilingual tokenizer that automatically tags each token with its type

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 4 years ago.

token-types v6.1.2

Common token types for decoding and encoding numeric and string values

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

@anthropic-ai/tokenizer v0.0.4

Claude tokenizer

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 2 years ago.

glsl-tokenizer v2.1.5

r/w stream of glsl tokens

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 7 years ago.

Tokenizes a string that represents a regular expression.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 3 years ago.

ai-tokenizer v1.0.6

A faster than tiktoken tokenizer with first-class support for Vercel's AI SDK.

Maintenance: Aging

Popularity: Unknown

Aging. Last published 6 months ago; check before adopting.

args-tokenizer v0.3.0

Tokenize a shell string into argv array

Maintenance: Aging

Popularity: Unknown

Aging. Last published over a year ago; check before adopting.

@csstools/media-query-list-parser v5.0.0

Parse CSS media query lists.

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

crates.io matches

1 match · Rust

language-tokenizer v0.3.0

Text tokenizer for linguistic purposes, such as text matching. Supports more than 40 languages, including English, French, Russian, Japanese, Thai etc.

Maintenance: Healthy

Popularity: Niche

Maintained. Niche but actively maintained.

RubyGems matches

Exact match · Ruby

Tools for processing the Polish language. Tokenization, scanning, categorization...

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 14 years ago.

pratt_parser v0.1.2

A Pratt parser. Create token objects to define your language. Create a lexer to return tokens. Call the parser to grok the language.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 11 years ago.

tokenizer v0.3.0

A simple multilingual tokenizer for NLP tasks. This tool provides a CLI and a library for linguistic tokenization, which is an unavoidable step for many HLT (human language technology) tasks in the preprocessing phase for further syntactic, semantic and other higher level processing goals. Use it for tokenization of German, English and French texts.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 10 years ago.

stanford-core-nlp v0.5.3

High-level Ruby bindings to the Stanford CoreNLP package, a set of natural language processing tools that provides tokenization, part-of-speech tagging and parsing for several languages, as well as named entity recognition and coreference resolution for English, German, French and other languages.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 9 years ago.

thailang4r v0.1.0

Thai language tools for Ruby, i.e. a word tokenizer, a character-level identifier, and a romanization tool

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 5 years ago.

toon-ruby v0.1.1

TOON is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage.

Maintenance: Aging

Popularity: Niche

Aging. Last published 7 months ago; check before adopting.

Source code lexer, configurable for any programming language, that can tokenize and abstract a given source file

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 3 years ago.

Textoken is a Ruby library for text tokenization. This gem extracts words from text with many customizations. It can be used in many fields like Web Crawling and Natural Language Processing.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 7 years ago.

auth0_rs256_jwt_verifier v0.0.2

Auth0 (https://auth0.com) is a web service for handling user identities that can be easily plugged into your application. It provides SDKs for many languages that let you sign users up or in and return an access token (JWT) in exchange. The access token can then be used to access your web service. This gem helps you verify (https://auth0.com/docs/api-auth/tutorials/verify-access-token#verify-the-signature) an access token that has been signed using the RS256 algorithm.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 7 years ago.

sorbet-baml v0.5.1

A Ruby gem that converts T::Struct and T::Enum to BAML (Boundary AI Markup Language) type definitions. BAML uses 60% fewer tokens than JSON Schema while maintaining type safety.

Maintenance: Healthy

Popularity: Niche

Maintained. Niche but actively maintained.

TokenizerProjectUT v0.0.1

A simple multilingual tokenizer for NLP tasks. This tool provides a CLI and a library for linguistic tokenization, which is an unavoidable step for many HLT (human language technology) tasks in the preprocessing phase for further syntactic, semantic and other higher level processing goals. Use it for tokenization of German, English and French texts.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 14 years ago.

tokenizer_project_uni-trier_j-v v0.0.1

A simple multilingual tokenizer for NLP tasks. This tool provides a CLI and a library for linguistic tokenization, which is an unavoidable step for many HLT (human language technology) tasks in the preprocessing phase for further syntactic, semantic and other higher level processing goals. Use it for tokenization of German, English and French texts.

Maintenance: Abandoned

Popularity: Niche

Abandoned. Last published 14 years ago.

NuGet matches

Showing 12 of 16 · .NET

microsoft.deepdev.tokenizerlib v1.3.3

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 2 years ago.

fastberttokenizer v1.0.28

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 2 years ago.

thaistringtokenizer v1.0.1

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 5 years ago.

sentencepiecetokenizer v0.1.6

No description provided.

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

opennlp.net v1.9.4.1

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 3 years ago.

nreco.nlquery v1.2.1

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published over a year ago.

zemberekdotnet.tokenization v0.19.5

No description provided.

Maintenance: Healthy

Popularity: Unknown

Maintained. Actively maintained.

virastyar v2.0.0

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 14 years ago.

virastyar.lib v2.0.0

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 14 years ago.

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 5 years ago.

virastyar.data v2.0.0

No description provided.

Maintenance: Abandoned

Popularity: Unknown

Abandoned. Last published 14 years ago.

sharpparser.core v1.3.0

No description provided.

Maintenance: Aging

Popularity: Unknown

Aging. Last published 8 months ago; check before adopting.
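The maintenance labels in these listings appear to follow the time since a package was last published. The following is a minimal sketch of such a rule, assuming hypothetical thresholds (under 6 months is Healthy, under 18 months is Aging, anything older is Abandoned) and a hypothetical `maintenance_label` function. The cut-offs are guesses chosen to be consistent with the labels shown on this page, not the site's published methodology.

```python
from datetime import date

# Assumed cut-offs in months; illustrative only, not the site's real thresholds.
AGING_AFTER_MONTHS = 6
ABANDONED_AFTER_MONTHS = 18


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return max(months, 0)


def maintenance_label(last_published: date, today: date) -> str:
    """Map time since last publish to a label (hypothetical rule)."""
    age = months_between(last_published, today)
    if age < AGING_AFTER_MONTHS:
        return "Healthy"
    if age < ABANDONED_AFTER_MONTHS:
        return "Aging"
    return "Abandoned"


if __name__ == "__main__":
    today = date(2026, 1, 15)
    print(maintenance_label(date(2025, 11, 1), today))  # Healthy
    print(maintenance_label(date(2025, 6, 15), today))  # Aging
    print(maintenance_label(date(2021, 3, 1), today))   # Abandoned
```

A rule of this shape would explain why "last published over a year ago" shows up under both Aging and Abandoned: that phrase covers a range of ages that could straddle the second cut-off.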