Talk to a local llama.cpp server via the chat completion API (plain text, per-user memory).
Extension of @node-llama-cpp/linux-x64-cuda - prebuilt binary for node-llama-cpp for Linux x64 with CUDA support
Prebuilt binary for node-llama-cpp for Linux x64
Prebuilt binary for node-llama-cpp for Linux x64 with CUDA support
Prebuilt binary for node-llama-cpp for Linux armv7l
Prebuilt binary for node-llama-cpp for Linux x64 with Vulkan support
Prebuilt binary for node-llama-cpp for Linux arm64
Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
Talk to a local llama.cpp server via the chat completion API (plain text, per-user memory).
Prebuilt binary for node-llama-cpp for Windows x64
Prebuilt binary for node-llama-cpp for macOS arm64 with Metal support
Extension of @node-llama-cpp/win-x64-cuda - prebuilt binary for node-llama-cpp for Windows x64 with CUDA support
Prebuilt binary for node-llama-cpp for Windows x64 with CUDA support
Prebuilt binary for node-llama-cpp for Windows arm64
Prebuilt binary for node-llama-cpp for Windows x64 with Vulkan support
Prebuilt binary for node-llama-cpp for macOS x64
A native Capacitor plugin that embeds llama.cpp directly into mobile apps, enabling offline AI inference with chat-first API design. Complete iOS and Android support: text generation, chat, multimodal, TTS, LoRA, embeddings, and more.
Prebuilt binary for node-llama-cpp for Linux armv7l
Extension of @realtimex/win-x64-cuda - prebuilt binary for node-llama-cpp for Windows x64 with CUDA support
Prebuilt binary for node-llama-cpp for Linux arm64
Prebuilt binary for node-llama-cpp for Windows arm64
Prebuilt binary for node-llama-cpp for Linux x64 with Vulkan support
Prebuilt binary for node-llama-cpp for Linux x64
Prebuilt binary for node-llama-cpp for Windows x64