Elementwise serialization for pointwise structured sorting
Numerical methods for the web
A TypeScript ML framework with Rust native backends (CPU, CUDA, WebGPU) — autograd, tensors, and neural network training at GPU speed.
Multidimensional arrays for JavaScript.
High performance matrix manipulation in JavaScript
Strided array special math functions.
Strided array math functions.
Apply a function to elements in two input arrays and assign the results to an output array.
Special math functions.
Create a function for applying a unary function to each element in a provided array.
Strided array math operations.
Apply a function to each element in an array and assign the result to an element in an output array.
Compute the absolute value for each element in an ndarray.
Base strided.
Pseudorandom number generator array creation function tools.
Compute the absolute value for each element in an input array.
Pseudorandom number generator array creation functions.
Math array function tools.
Special math functions applied to arrays.
Pseudorandom number generator strided array functions.
Pseudorandom number generator strided array function tools.
Apply a function to each element in an array and assign the result to an element in an output array, iterating from right to left.
Apply a function to elements in two input arrays while iterating from right to left and assign the results to an output array.
WGSL kernel catalog for browser-side machine learning: matmul, softmax, layernorm, attention, and friends. Each kernel ships with a JS reference for conformance + fallback. Independent of any tensor framework.
Elementwise operations implemented for standard Rust containers
Kernel fusion optimization for the Axonml ML framework
Pluggable GPU acceleration layer for RunMat (CUDA, ROCm, Metal, Vulkan/SPIR-V)
A tensor library with GPU support
Idiomatic Rust wrappers for the NVIDIA CUDA stack (Driver API, Runtime API, NVRTC, cuBLAS, cuDNN, NCCL, NVML, ...). Umbrella crate.
GPU/Vulkan matrix and tensor operations for the mumu/lava language
A simple Rust ML library with GPU-accelerated gradient descent. Supports tensors, complex numbers, linear/logistic regression, and CUDA optimization.
MetalTile kernel standard library — benchmark metadata and type definitions
High-performance MATLAB/Octave syntax mathematical runtime
Compiled bespoke .cu kernel template instantiations for the baracuda ML kernel facade plus C-ABI FFI facades for the library-backed plans (cuDNN conv/pool, cuSOLVER linalg, cuFFT/cuRAND, CUTLASS GEMM re-export). Hosts curated CUDA kernel sources (int8/FP8/int4/bin GEMM RRR, elementwise, reduce, norm, attention, …), builds them via baracuda-forge, exposes extern "C" entry points for the safe baracuda-kernels crate. CUTLASS template kernels live in the sibling baracuda-cutlass-kernels-sys crate and are re-exported here under the unified baracuda_kernels_gemm_* namespace.
High-performance neural network inference library with instruction-based execution
Portable mixed-precision math, linear-algebra, & retrieval library with 2000+ SIMD kernels for x86, Arm, RISC-V, LoongArch, Power, & WebAssembly
NumRuby module: Central ufunc registry and core elementwise ops
RMatrix is a lightning-fast library for Ruby. It provides numerous enhancements over the Matrix class in the standard library. Features include the ability to calculate matrix inverse, transpose, determinant, minor, adjoint, cofactor_matrix, Hadamard product and other elementwise operations, slicing, masking and more. RMatrix makes use of backing instances of NArray to allow for great performance.
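
Several entries above describe applying a function to each element of a strided array and assigning the result to an output array, optionally iterating from right to left. The sketch below illustrates that idea in TypeScript. The name mapStrided and its parameter order are hypothetical, chosen for illustration only; this is not the API of any package listed here.

```typescript
// Hypothetical helper (an assumption for illustration, not a listed package's API).
// Applies `fn` to `n` elements of `x`, starting at `offsetX` and stepping by
// `strideX`, and writes each result into `y` at `offsetY` stepping by `strideY`.
function mapStrided(
  n: number,
  x: Float64Array,
  strideX: number,
  offsetX: number,
  y: Float64Array,
  strideY: number,
  offsetY: number,
  fn: (value: number) => number,
): Float64Array {
  let ix = offsetX;
  let iy = offsetY;
  for (let i = 0; i < n; i++) {
    y[iy] = fn(x[ix]);
    ix += strideX; // a negative stride walks the input from right to left
    iy += strideY;
  }
  return y;
}

const x = new Float64Array([-1, -2, -3, -4, -5, -6]);

// Forward: visit every other element of x (indices 0, 2, 4).
const y = new Float64Array(3);
mapStrided(3, x, 2, 0, y, 1, 0, Math.abs);

// Right to left: start at index 4 and step backwards (indices 4, 2, 0).
const yRev = new Float64Array(3);
mapStrided(3, x, -2, 4, yRev, 1, 0, Math.abs);
```

With these inputs, y holds 1, 3, 5 and yRev holds 5, 3, 1. Passing the stride and offset explicitly lets one loop serve both contiguous and non-contiguous views of the same buffer without copying.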