xInfer — a high-performance LLM inference engine in Rust with CUDA/Metal acceleration
Unified cache adapters for notion-headless-cms — memory, Cloudflare (KV/R2), and Next.js
A simple TypeScript Keyv cache
vig-cache
Cloudflare KV StorageAdapter implementation for the notion-headless-cms document cache layer
Automatic KV-Cache Optimization for HuggingFace Transformers - Find the optimal cache strategy, attention backend, and configuration for your model and hardware.
An integration to use Cloudflare Workers as a hosting service with Apollo Server v4
🔥 Lightning-fast, globally distributed Apollo GraphQL server, deployed at the edge using Cloudflare Workers
kvcache
A key-value capability provider for wasmCloud that replicates data changes over NATS
Subquadratic O(N log N) sparse attention kernel for Rust LLM inference on edge devices, with optional FastGRNN salience gating for near-linear O(N) scaling
From-scratch LLM inference engine for Apple Silicon — 233 tok/sec, 85+ Metal GPU kernels
CUDA Virtual Memory Management bindings for elastic KV cache allocation in Candle
Transformer-as-rules: Self-attention and FFN layers as einsum expressions
LLM inference in Rust
Large Language Model architectures for the Axonml ML framework
Cognitora shared library: config, errors, hashing, prefix-trie
Cognitora gRPC stubs (tonic-generated)
Cognitora: rustls helpers and mTLS bootstrap
Neural network modules for ferrotorch — layers, losses, initialization