Fast, unopinionated, minimalist web framework
MCP server for Fish Audio Text-to-Speech integration
A framework-agnostic document viewer SDK supporting PDF, DOCX, PPTX, Excel, CSV, OFD, images, video, audio, text, and markdown.
Audio & text player object, based on Weston Ruter's HTML5-Audio-Read-Along
Independent, unofficial CLI to create and edit CapCut projects — build drafts from scratch, add video/audio/text, subtitles, timing, speed, volume, templates, cut long-form to shorts. No API needed. Not affiliated with ByteDance.
A custom element for the Spotify player with an API that aims to match the `<audio>` API
Database to mime-format based on content-type header and content
WebAssembly streaming Opus decoder with Machine Learning enhancements
Async audio text-to-speech interface
WebAssembly streaming Ogg Vorbis decoder
node-edge-tts is a module that uses Microsoft Edge's online TTS (Text-to-Speech) service on Node.js
WebAssembly streaming FLAC decoder
Pure, deterministic transformation pipeline for audio, text, and image processing
A cross-browser wrapper for the Web Audio API which aims to closely follow the standard.
Decode audio data in node or browser
JavaScript audio library for the modern web.
React wrapper around the @rive-app/canvas-lite library
n8n node for integrating the Palatine Speech API into workflows
WebAssembly streaming Opus decoder
Official Lara SDK for JavaScript and Node.js
The AudioWorkletProcessor which is used by the recorder-audio-worklet package.
Command-line interface for the ElevenLabs API
Comprehensive async Rust SDK for ElevenLabs API with TTS, STT, voice management, and WebSocket streaming
Google Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes more than 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google's machine learning technology. Note that google-cloud-speech-v1 is a version-specific client library. For most uses, we recommend installing the main client library google-cloud-speech instead. See the readme for more details.
Google Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes more than 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google's machine learning technology. Note that google-cloud-speech-v1p1beta1 is a version-specific client library. For most uses, we recommend installing the main client library google-cloud-speech instead. See the readme for more details.
Google Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes more than 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google's machine learning technology. Note that google-cloud-speech-v2 is a version-specific client library. For most uses, we recommend installing the main client library google-cloud-speech instead. See the readme for more details.
AWS Polly wrapper that converts text into speech audio.
Flexible text to speech using Google Translator
Text-to-Speech converts text or Speech Synthesis Markup Language (SSML) input into audio data of natural human speech.
Google Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes more than 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google's machine learning technology.
Anki_auto_lookup scrapes Google Translate. Whether you type in an English or Chinese word, it will find the definition, pinyin, and a recording of the Chinese word being spoken. Anki_auto_lookup stores all this information in a CSV text database for portability before creating a text file that can be uploaded to your Anki decks. The Anki deck's notes will have the English word on one side and the Chinese characters, pinyin, and recording on the other.
This provides helpers that manage internationalized audio prompts, both file-based and text-based
Convert media like images, video, audio, and text that are in commonly used formats into other commonly used formats.
This gem facilitates the creation of chatbots with ChatGPT, Telegram bots, Discord bots, audio transcription, and IBM Cloud Text to Speech or AWS Polly in Docker containers. Documentation: https://github.com/JesusGautamah/chatgpt_assistant
Using the Mac system text-to-speech, Voice Chapters creates an audio file with bookmarked chapters. It takes a regex capture group to define the chapter markers, and the gem creates an m4a/AAC file.
No description provided.