Determines the MIME type of a file using the Apache Tika mime database.
Extractous provides a fast and efficient way to extract content from all kinds of file formats, including PDF, Word, Excel, CSV, and email. Internally, it uses a natively compiled Apache Tika for formats that are not supported natively by the Rust core.
The fastest Office document processing library — DOCX, XLSX, PPTX, DOC, XLS, PPT
A Rust toolkit for detecting and extracting metadata, text, and content from various file formats
Core library for semantic document graph processing — parse PDFs into structured, queryable graphs
Four formats, one engine. PDF, DOCX, XLSX, HTML → Markdown and typed JSON. 15–40× faster than equivalent-quality OSS tools, with pipeline pre-flight and element-level provenance.
Fast, zero-dependency Rust CLI for scanning Scala/Play/SBT projects for vulnerable, outdated, and unused dependencies
Sinatra-based service around the Apache Tika content extraction project
Wrapper around the tika-app jar
Ruby Tika app bindings
Provides a Ruby wrapper around the Tika command-line tools
Tika service wrapper
rTika is a JRuby wrapper around the Apache Tika content extraction library
JRuby Tika connector that reads text and metadata from files using Apache Tika 1.1. Usage: Jrtika.read(full_file_path)
Ruby bindings for Apache Tika Server REST API
Wrapper around the tika-app jar
A simple Apache Tika binding for Ruby using rjb.
Wrapper around the tika-app jar