TypeScript implementation of the Parquet file format, based on parquet.js
Read and Write Parquet files from Node.js
HEDL to/from Apache Parquet conversion
This crate provides a complete implementation of the SQL-on-FHIR specification for Rust, enabling the transformation of FHIR resources into tabular data using declarative ViewDefinitions. It supports all major FHIR versions (R4, R4B, R5, R6) through a version-agnostic abstraction layer.
Elusion is a modern DataFrame / Data Engineering / Data Analysis library that combines the familiarity of DataFrame operations (like those in PySpark, Pandas, and Polars) with the power of SQL query building. It provides flexible query construction without enforcing strict operation ordering, enabling developers to write intuitive and maintainable data transformations.
datu - a data file utility
A simple command-line interface and Python API for Parquet
A Rust crate that converts Parquet file(s) to an Excel or CSV file using constant memory
The jq of Parquet. Inspect, transform, and operate on Parquet files from your terminal. S3, GCS, Azure support. CLI tool.
High-performance Rust library for moving structs to/from disk using Parquet format. Abstracts complex Arrow/Parquet usage while providing batch writing and parallel reading capabilities for maximum performance.
Parquet read/write for CityJSON 2.0 city models via cityjson-arrow
AES-GCM primitives, AAD construction, and key-retrieval trait for Parquet Modular Encryption support in ematix-parquet.
Hand-rolled Apache Thrift compact-protocol decoder and Parquet format types (FileMetaData, PageHeader, encodings) for ematix-parquet.
File handles, byte-range reads, and page-header iteration over Apache Parquet — the I/O layer that sits between ematix-parquet-format (pure decoding) and the page-body codecs.
Apache Parquet is a columnar storage format.
Parquet is a high-performance Parquet library for Ruby, written in Rust. It wraps the official Apache Rust implementation to provide fast, correct Parquet parsing.
Parquet output plugin is an Embulk plugin that writes records read by any input plugin to Parquet. Search for input plugins with the "embulk-input" keyword.
Dumps records to S3 Parquet.
Loads records from Parquet files via Hadoop FileSystem.
Convert data to Parquet
High-throughput, resumable snapshots of MongoDB collections with partitioning, multi-threaded readers, and size-based sharded outputs.
Extracts transactional data, archives it in a Data Lake (S3/local) in Parquet format using Hive partitioning, and safely purges the source.
This gem provides bindings for DuckDB, which is an in-process SQL database optimized for analytical queries on structured data. It's lightweight, embeddable, and works directly with files like Parquet and CSV, making it popular for data analysis tasks.
Scalable Wisconsin Benchmark dataset generator for Arrow/Parquet.
For Ruby to be widely used for data processing, Apache Arrow support is important. With Apache Arrow we can do the following: exchange and process large data very quickly; read and write data in several well-known formats such as CSV and Apache Parquet; and read and write large partitioned data on cloud storage such as Amazon S3. This talk covers the following: what Apache Arrow is; how to use Apache Arrow with Ruby; and how to integrate it with Ruby 3.0 features such as MemoryView and Ractor.