Half-precision 16-bit floating point numbers
Super-Memory MCP server with local embeddings and vector search
High-quality text-to-speech for the web
Unified CLI dispatcher: schema-pop-import <file> picks tree-sitter (Rust) or system clang (C / C++) and writes an arktype scope.
A collection of utilities for working with GPUs, especially for WebGPU.
Browser offline background removal using WebGPU - powered by rembg.com technology
LLM entropy-aware token compression for prompts
React Native binding of whisper.cpp
Browser-side cellular segmentation via Cellpose-SAM, running on WebGPU.
Encode JavaScript values as canonical CBOR
Fastest SIMD-Accelerated Vector Similarity Functions for x86 and Arm
Compiles TypeScript code into Build Engine CON
Node.js translation library powered by Transformers.js with ONNX Runtime
Runtime-agnostic translation core powered by Transformers.js
Run LLMs locally in your terminal. Supports custom .pt GPT-2 and LLaMA decoder checkpoints (auto-handles compact key naming, tied embeddings, and split-halves RoPE) plus GGUF models, with a live dashboard of throughput, VRAM, and chat.
CLI for discovering, inspecting, and running ComfyUI workflows with tag-based overrides
`@c4a/server-cli`: C4A server-side management tool for service configuration, installation, operations, and data management.
Half-precision floating-point format (bfloat16)
Renderer-agnostic core SDK for Omote AI Characters
GGML image classification addon for QVAC (MobileNetV3-Small CPU inference)
Intelligent CLI tool with AI-powered model selection that analyzes your hardware and recommends optimal LLM models for your system
A Node.js embedding tool with optional GPU acceleration
Run Hugging Face onnx-community models locally inside pi: registers a chat provider for ONNX text-generation models and a set of tools (embeddings, classification, ASR) backed by @huggingface/transformers and onnxruntime-node.
Flower Intelligence: Open-Source On-Device AI with optional Confidential Remote Compute.
A Rust library integrated with ONNXRuntime, providing a collection of ML models.
A3S Power — Privacy-preserving LLM inference for TEE environments
LLM serving runtime with Ruvector integration - Paged attention, KV cache, and SONA learning
Subquadratic O(N log N) sparse attention kernel for Rust LLM inference on edge devices, with optional FastGRNN salience gating for near-linear O(N) scaling
Arbitrary-precision floating point library
Bridge layer for OxiGAF: convert checkpoints between GAF, ToRSh, and PyTorch formats
Unified inference runtime for high-performance LLM execution
High-performance key-value cache for LLM inference
Advanced quantization engine for efficient LLM inference
High-performance LLM inference engine with advanced quantization and salience-based optimization
Salience analysis engine for intelligent token prioritization in LLM inference
Shared utilities and types for Zeta Reticula components
No description provided.