A WebAssembly library for efficient text deduplication and similarity detection using shingling and MinHash.
MinHash and Shingles
Detect reusable/duplicate React Native code (components, hooks, styles, utils) and suggest refactors. Ships as a CLI + Node API.
Find structurally similar JS/TS functions, even after renaming/refactoring
Module that places query strings into LSH batches and compares them against a list of LSH candidates
Command-line tool that compares two text files using SimHash
JSON-Hashify is a library for hashing JSON objects and arrays into compact signatures (sketches) that can be used to compare the similarity of JSON objects.
Data Leak Prevention for Node.js
Simple near-duplicate detection using the shingling method
Shingles implementation in Rust
Rust min-shingle hashing
Algorithmic texture generator for Bevy.
Modified version of https://github.com/aws/random-cut-forest-by-aws
Pure Rust library for zoned block device management (SMR/ZNS)
A library for hash-based text similarity analysis
Character-oriented ngram generator and fuzzy matching library.
A port of the python-ngram project that provides fuzzy search using N-grams.
Random Cut Forest implementation in Rust
UCFP perceptual fingerprinting (text shingling, winnowing, MinHash) crate
Text fingerprinting: MinHash + LSH, SimHash, and ONNX semantic embeddings
Universal Content Fingerprinting (UCFP) core library
Shingling
Shingle
Powerful string similarity determination using the Shingles method.