VoiceBox

A Unified AI Services Framework

Date uploaded: a week ago
Version: 0.3.2
Download link: Tim_Shaw-VoiceBox-0.3.2.zip
Downloads: 2214
Dependency string: Tim_Shaw-VoiceBox-0.3.2

This mod requires the following mods to function

dotnet_lethal_company-Newtonsoft_Json

NuGet Newtonsoft.Json package re-bundled for convenient consumption and dependency management.

Preferred version: 13.0.400
Bobbie-NAudio

Audio and MIDI library for .NET

Preferred version: 2.2.2

README

VoiceBox: A Unified AI Services Framework for Unity

VoiceBox is a flexible and extensible framework for integrating various AI services into your Unity projects. It provides a unified interface for interacting with different AI APIs, allowing you to easily switch between services and add new ones.

Features

  • Unified Service Interfaces: Common interfaces for Chat, Speech-to-Text (STT), and Text-to-Speech (TTS) services.
  • Service Agnostic: Switch between AI service providers without changing your core application logic.
  • Configuration via ScriptableObjects: Configure model selections, service endpoints, audio devices, and other settings in the Unity Editor.
  • Audio Streaming Support: Built-in support for streaming audio for TTS, reducing latency and improving user experience.
  • Extensible: Designed to be easily extended with new AI services and functionalities.
  • Modding Support: Built with game modding in mind, with utilities that make building a mod as seamless as working inside the Unity Editor.
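
The "Unified Service Interfaces" and "Service Agnostic" points above can be illustrated with a minimal, language-agnostic sketch. It is written in Python for brevity and is not VoiceBox's actual API: the names `ChatService`, `EchoChat`, `ShoutChat`, and `greet` are all hypothetical, and the two providers are stand-ins that make no network calls. See the docs for the real C# interfaces.

```python
from typing import Protocol


class ChatService(Protocol):
    """Hypothetical common chat interface (an assumption, not VoiceBox's API)."""

    def send(self, prompt: str) -> str: ...


class EchoChat:
    """Stand-in provider that echoes the prompt back."""

    def send(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ShoutChat:
    """Second stand-in provider with different behaviour."""

    def send(self, prompt: str) -> str:
        return prompt.upper()


def greet(chat: ChatService, name: str) -> str:
    # Application logic depends only on the interface, so the provider
    # can be swapped without changing this function.
    return chat.send(f"hello {name}")
```

Calling `greet(EchoChat(), "bob")` and `greet(ShoutChat(), "bob")` runs the same application code against two different providers, which is the kind of provider swap a unified interface is meant to allow.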

Natively Supported Services

  • Chat

    • Google Gemini
    • ChatGPT
    • Anthropic
    • Ollama
    • DeepSeek (via Ollama)
  • Speech to Text

    • Azure Speech services
    • ElevenLabs Scribe
    • Whisper via WhisperLive
  • Text to Speech

    • ElevenLabs

Docs

You can find the docs here: https://voicebox.timshaw.dev/

CHANGELOG

v0.3.2

STT

  • Update AzureSTTServiceManager to provide VoiceBoxResultReason.RecognizedSpeechWithTimestamps when requestWordLevelTimestamps = true.
  • Update AzureSTTServiceManager to save derived config.

TTS

  • Update ElevenlabsTTSServiceManager to suppress unused-event warnings.

v0.3.0

This update contains breaking changes!

TTS

  • Rework interface
    • Add voice cloning API call support
  • Implement more ElevenLabs TTS API config options
  • Correctly set model ID in ElevenLabs requests

STT

  • Add ElevenLabs STT support
  • Implement local voice activity detection (VAD) for STT to limit the silent chunks sent to the STT API
  • Add support for word-level timestamps

Misc

  • Resolve several issues with service cancellations
  • Rework audio decoder class to support WAV audio
  • Rework audio streaming to stream through an AudioClip, allowing filters to be added to the output

v0.2.0

Initial Thunderstore release