🎙️ TTS Multi-Mode Interface

This interface provides four different modes for text-to-speech and audio processing:

  • Mode 1: Text + Features → Audio (with predefined examples)
  • Mode 2: Text → Features + Audio
  • Mode 3: Audio → Text Features
  • Mode 4: Text + Instruction → Features (using OpenRouter Gemini)

Mode 1: Text + Features to Audio

Input text along with prosodic features to generate speech audio. Use the example buttons below to load predefined test cases.

📋 Predefined Examples


📝 Usage Notes:

  • Mode 1: Best for precise control over prosodic features
  • Mode 2: Best for quick text-to-speech with automatic feature generation
  • Mode 3: Best for analyzing existing audio files
  • Mode 4: Best for generating features with specific emotional characteristics

🔧 Technical Requirements:

  • CUDA-compatible GPU recommended for optimal performance
  • Sufficient GPU memory for model loading
  • Valid OpenRouter API key for Mode 4
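The requirements above can be checked before launching the interface. The following is a minimal sketch, not the interface's own startup code. It assumes the models run on PyTorch and that the key is supplied through an environment variable named OPENROUTER_API_KEY; neither is stated on this page, and the function names are hypothetical. It only tests that a key is present, not that OpenRouter accepts it.

```python
import os


def check_requirements(env=None):
    """Report which of the listed requirements appear to be met."""
    env = os.environ if env is None else env
    report = {
        # Assumed variable name; the interface may read the key differently.
        "openrouter_key_set": bool(env.get("OPENROUTER_API_KEY", "").strip()),
        "cuda_available": False,
    }
    try:
        # Assumption: the TTS models are PyTorch-based.
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    except ImportError:
        # Without PyTorch we cannot probe the GPU; leave the flag False.
        pass
    return report


def available_modes(report):
    """Modes 1-3 need no API key; Mode 4 needs the OpenRouter key."""
    modes = [1, 2, 3]
    if report["openrouter_key_set"]:
        modes.append(4)
    return modes


if __name__ == "__main__":
    status = check_requirements()
    print("CUDA GPU detected:", status["cuda_available"])
    print("Modes usable with this setup:", available_modes(status))
```

A missing GPU is reported rather than treated as an error, since the list above only recommends CUDA.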