VoiceBox

A Unified AI Services Framework

By Tim_Shaw

https://voicebox.timshaw.dev/

Date uploaded	2 weeks ago
Version	0.3.7
Download link	Tim_Shaw-VoiceBox-0.3.7.zip
Downloads	7015
Dependency string	Tim_Shaw-VoiceBox-0.3.7

This mod requires the following mods to function

dotnet_lethal_company-Newtonsoft_Json

NuGet Newtonsoft.Json package re-bundled for convenient consumption and dependency management.

Preferred version: 13.0.400

Bobbie-NAudio

Audio and MIDI library for .NET

Preferred version: 2.2.2

README

VoiceBox: A Unified AI Services Framework for Unity

VoiceBox is a flexible and extensible framework for integrating various AI services into your Unity projects. It provides a unified interface for interacting with different AI APIs, allowing you to easily switch between services and add new ones.

Features

Unified Service Interfaces: Common interfaces for Chat, Speech-to-Text (STT), and Text-to-Speech (TTS) services.
Service Agnostic: Easily switch between different AI service providers without changing your core application logic.
Configuration via ScriptableObjects: Easily configure model selections, service endpoints, audio devices, and other settings in the Unity Editor.
Audio Streaming Support: Built-in support for streaming audio for TTS, reducing latency and improving user experience.
Extensible: Designed to be easily extended with new AI services and functionalities.
Modding Support: Built with game modding in mind, with utilities that make building a mod as seamless as working inside the Unity Editor.
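
To illustrate the service-agnostic idea from the feature list above, here is a minimal C# sketch. It is not VoiceBox's actual API: the names IChatService, EchoChatService, UppercaseChatService, and NpcDialogue are hypothetical and exist only for this example. See the docs for the real interfaces. The point is that application logic depends on one interface, so the provider behind it can be swapped without touching that logic.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interface for illustration only; VoiceBox's real interfaces may differ.
public interface IChatService
{
    Task<string> SendAsync(string prompt, CancellationToken token = default);
}

// Stand-in "provider" A: echoes the prompt back.
public sealed class EchoChatService : IChatService
{
    public Task<string> SendAsync(string prompt, CancellationToken token = default)
    {
        token.ThrowIfCancellationRequested();
        return Task.FromResult("echo: " + prompt);
    }
}

// Stand-in "provider" B: returns the prompt in upper case.
public sealed class UppercaseChatService : IChatService
{
    public Task<string> SendAsync(string prompt, CancellationToken token = default)
    {
        token.ThrowIfCancellationRequested();
        return Task.FromResult(prompt.ToUpperInvariant());
    }
}

// Application logic: knows only the interface, never a concrete provider.
public sealed class NpcDialogue
{
    private readonly IChatService _chat;

    public NpcDialogue(IChatService chat)
    {
        _chat = chat ?? throw new ArgumentNullException(nameof(chat));
    }

    public Task<string> GreetAsync(string playerName, CancellationToken token = default)
    {
        return _chat.SendAsync("Hello, " + playerName, token);
    }
}

public static class Demo
{
    public static async Task Main()
    {
        // Swapping the provider is a one-line change at construction time.
        var echo = new NpcDialogue(new EchoChatService());
        var upper = new NpcDialogue(new UppercaseChatService());

        Console.WriteLine(await echo.GreetAsync("Tim"));  // prints "echo: Hello, Tim"
        Console.WriteLine(await upper.GreetAsync("Tim")); // prints "HELLO, TIM"
    }
}
```

In a real project the concrete service would wrap a network call to the chosen provider, and the choice of implementation would typically come from configuration rather than being hard-coded.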

Natively Supported Services

Chat
- Google Gemini
- ChatGPT
- Anthropic
- Ollama
- DeepSeek (via Ollama)
Speech to Text
- Azure Speech services
- ElevenLabs Scribe
- Whisper via WhisperLive
Text to Speech
- ElevenLabs

Docs

You can find the docs here: https://voicebox.timshaw.dev/

CHANGELOG

v0.3.7

Fix a WebSocket issue and remove verbose logging in ElevenLabs STT

v0.3.6

Fix an issue where ElevenLabs STT query parameters would serialize incorrectly in some locales (Thanks @xCore!)

v0.3.5

Fix an issue with NAudio not detecting devices with names over 31 characters long

v0.3.4

Fix a WebSocket error when attempting to reconnect to a TTS service

v0.3.2

STT

Update AzureSTTServiceManager to provide VoiceBoxResultReason.RecognizedSpeechWithTimestamps when requestWordLevelTimestamps = true.
Update AzureSTTServiceManager to save derived config.

TTS

Update ElevenlabsTTSServiceManager to suppress "event is never used" warnings.

v0.3.0

This update contains breaking changes!

TTS

Rework interface
- Add voice cloning API call support
Implement more ElevenLabs TTS API config options
Correctly set model ID in ElevenLabs requests

STT

Add ElevenLabs STT support
Implement local voice activity detection (VAD) for STT to limit the silent chunks sent to the STT API
Add support for word-level timestamps

Misc

Resolve several issues with service cancellations
Rework audio decoder class to support wave audio
Rework audio streaming to stream through an AudioClip, allowing filters to be added to the output

v0.2.0

Initial Thunderstore release
