TTS Multi-Mode Interface

Input text along with prosodic features to generate speech audio. Use the example buttons below to load predefined test cases.

Text to Synthesize

Prosodic Features (JSON)

Generated Audio

Status

Input only text to generate both prosodic features and speech audio. The model will automatically generate appropriate features internally.

Text to Synthesize

Generated Audio

Generated Features

Status

Upload an audio file to extract transcribed text and word-level features. The system will perform speech recognition and feature extraction.

Upload Audio File

Extracted Features (JSON)

Status

Generate prosodic features from text and emotional/stylistic instructions using OpenRouter Gemini API.

⚠️ Note about Prompt Templates:

Template 1: Standard template for reliable feature generation
Template 2: Experimental template that may be more expressive but could generate additional words not in the original text

OpenRouter API Key

Text to Synthesize

Emotional/Stylistic Instruction

Generated Features (JSON)

Status

🎙️ TTS Multi-Mode Interface