Text-to-Speech
Convert text to speech using AI voices
Generate natural-sounding speech from text using state-of-the-art AI voices from OpenAI, Deepgram, ElevenLabs, Cartesia, Google Cloud, Azure, and PlayHT. Supports multiple voices, languages, and audio formats.

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text content to convert to speech (e.g., "Hello, welcome to our service!") |
| apiKey | string | Yes | No description |
| model | string | No | OpenAI TTS model identifier (e.g., "tts-1", "tts-1-hd", "gpt-4o-mini-tts") |
| voice | string | No | OpenAI voice identifier (e.g., "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer") |
| responseFormat | string | No | No description |
| speed | number | No | Speech speed multiplier from 0.25 to 4.0 (e.g., 0.5 for slower, 1.0 for normal, 2.0 for faster) |

| Parameter | Type | Description |
|---|---|---|
| audioUrl | string | URL to the generated audio file |
| audioFile | file | Generated audio file object |
| duration | number | Audio duration in seconds |
| characterCount | number | Number of characters processed |
| format | string | Audio format |
| provider | string | TTS provider used |
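
As a minimal sketch of the OpenAI inputs above, the following Python helper assembles a parameter dictionary and enforces the documented `speed` range of 0.25 to 4.0. The helper name `build_openai_tts_params` is hypothetical, the API key is a placeholder, and how the dictionary is submitted to the tool is not described here, so that step is left out.

```python
from typing import Any, Optional

# Documented bounds for the OpenAI `speed` multiplier.
SPEED_MIN, SPEED_MAX = 0.25, 4.0


def build_openai_tts_params(
    text: str,
    api_key: str,
    model: Optional[str] = None,
    voice: Optional[str] = None,
    speed: Optional[float] = None,
) -> dict[str, Any]:
    """Hypothetical helper: collect the documented inputs into one dict."""
    if not text:
        raise ValueError("text is required")
    if not api_key:
        raise ValueError("apiKey is required")
    if speed is not None and not (SPEED_MIN <= speed <= SPEED_MAX):
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}")

    params: dict[str, Any] = {"text": text, "apiKey": api_key}
    # Optional inputs are included only when the caller set them.
    for key, value in (("model", model), ("voice", voice), ("speed", speed)):
        if value is not None:
            params[key] = value
    return params


params = build_openai_tts_params(
    "Hello, welcome to our service!",
    api_key="YOUR_API_KEY",  # placeholder, not a real key
    model="tts-1",
    voice="alloy",
    speed=1.0,
)
```

Leaving unset optional inputs out of the dictionary, rather than sending `None`, is a design choice of this sketch; it assumes the tool applies its own defaults for omitted parameters.
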

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text content to convert to speech (e.g., "Hello, welcome to our service!") |
| apiKey | string | Yes | No description |
| model | string | No | Deepgram model/voice identifier (e.g., "aura-asteria-en", "aura-luna-en", "aura-2-luna-en") |
| voice | string | No | Deepgram voice identifier, alternative to the model parameter (e.g., "aura-asteria-en", "aura-orion-en") |
| encoding | string | No | No description |
| sampleRate | number | No | No description |
| bitRate | number | No | No description |
| container | string | No | No description |

| Parameter | Type | Description |
|---|---|---|
| audioUrl | string | URL to the generated audio file |
| audioFile | file | Generated audio file object |
| duration | number | Audio duration in seconds |
| characterCount | number | Number of characters processed |
| format | string | Audio format |
| provider | string | TTS provider used |

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text content to convert to speech (e.g., "Hello, welcome to our service!") |
| voiceId | string | Yes | ElevenLabs voice identifier (e.g., "21m00Tcm4TlvDq8ikWAM", "AZnzlk1XvdvUeBnXmlld") |
| apiKey | string | Yes | No description |
| modelId | string | No | ElevenLabs model identifier (e.g., "eleven_turbo_v2_5", "eleven_flash_v2_5", "eleven_multilingual_v2") |
| stability | number | No | No description |
| similarityBoost | number | No | No description |
| style | number | No | No description |
| useSpeakerBoost | boolean | No | No description |

| Parameter | Type | Description |
|---|---|---|
| audioUrl | string | URL to the generated audio file |
| audioFile | file | Generated audio file object |
| duration | number | Audio duration in seconds |
| characterCount | number | Number of characters processed |
| format | string | Audio format |
| provider | string | TTS provider used |

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text content to convert to speech (e.g., "Hello, welcome to our service!") |
| apiKey | string | Yes | No description |
| modelId | string | No | Cartesia model identifier (e.g., "sonic", "sonic-2", "sonic-3", "sonic-multilingual") |
| voice | string | No | Cartesia voice identifier or embedding (e.g., "a0e99841-438c-4a64-b679-ae501e7d6091") |
| language | string | No | Language code for speech synthesis (e.g., "en", "es", "fr", "de", "it", "pt") |
| outputFormat | json | No | No description |
| speed | number | No | No description |
| emotion | array | No | Emotion tags for Sonic-3 (e.g., ['positivity:high']) |

| Parameter | Type | Description |
|---|---|---|
| audioUrl | string | URL to the generated audio file |
| audioFile | file | Generated audio file object |
| duration | number | Audio duration in seconds |
| characterCount | number | Number of characters processed |
| format | string | Audio format |
| provider | string | TTS provider used |

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text content to convert to speech (e.g., "Hello, welcome to our service!") |
| apiKey | string | Yes | No description |
| voiceId | string | No | Google Cloud voice identifier (e.g., "en-US-Neural2-A", "en-US-Wavenet-D", "en-GB-Neural2-B") |
| languageCode | string | Yes | BCP-47 language code for speech synthesis (e.g., "en-US", "es-ES", "fr-FR", "de-DE") |
| gender | string | No | No description |
| audioEncoding | string | No | No description |
| speakingRate | number | No | Speaking rate multiplier from 0.25 to 2.0 (e.g., 0.5 for slower, 1.0 for normal, 1.5 for faster) |
| pitch | number | No | No description |
| volumeGainDb | number | No | No description |
| sampleRateHertz | number | No | No description |
| effectsProfileId | array | No | Effects profile (e.g., ['headphone-class-device']) |

| Parameter | Type | Description |
|---|---|---|
| audioUrl | string | URL to the generated audio file |
| audioFile | file | Generated audio file object |
| duration | number | Audio duration in seconds |
| characterCount | number | Number of characters processed |
| format | string | Audio format |
| provider | string | TTS provider used |
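
The Google Cloud inputs above differ from the other providers in that `languageCode` is required while `voiceId` is optional. A small pre-flight check, sketched below in Python, can catch a missing required input or an out-of-range `speakingRate` before a request is made. The helper name `check_google_tts_params` is hypothetical and the API key is a placeholder; the sketch only inspects a dictionary and does not call any service.

```python
from typing import Any

# Inputs marked "Yes" in the Required column of the Google Cloud table.
GOOGLE_REQUIRED = ("text", "apiKey", "languageCode")
# Documented bounds for `speakingRate`.
RATE_MIN, RATE_MAX = 0.25, 2.0


def check_google_tts_params(params: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems, empty if none found."""
    problems = [
        f"missing required input: {name}"
        for name in GOOGLE_REQUIRED
        if not params.get(name)
    ]
    rate = params.get("speakingRate")
    if rate is not None and not (RATE_MIN <= rate <= RATE_MAX):
        problems.append(f"speakingRate {rate} is outside {RATE_MIN} to {RATE_MAX}")
    return problems


issues = check_google_tts_params({
    "text": "Hello, welcome to our service!",
    "apiKey": "YOUR_API_KEY",  # placeholder, not a real key
    "voiceId": "en-US-Neural2-A",
    "speakingRate": 2.5,
})
print(issues)
```

Here the check reports two problems: `languageCode` is missing, and a `speakingRate` of 2.5 exceeds the documented maximum of 2.0.
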

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text content to convert to speech (e.g., "Hello, welcome to our service!") |
| apiKey | string | Yes | No description |
| voiceId | string | No | Azure voice identifier (e.g., "en-US-JennyNeural", "en-US-GuyNeural", "en-GB-SoniaNeural") |
| region | string | No | No description |
| outputFormat | string | No | No description |
| rate | string | No | No description |
| pitch | string | No | No description |
| style | string | No | No description |
| styleDegree | number | No | No description |
| role | string | No | No description |

| Parameter | Type | Description |
|---|---|---|
| audioUrl | string | URL to the generated audio file |
| audioFile | file | Generated audio file object |
| duration | number | Audio duration in seconds |
| characterCount | number | Number of characters processed |
| format | string | Audio format |
| provider | string | TTS provider used |

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text content to convert to speech (e.g., "Hello, welcome to our service!") |
| apiKey | string | Yes | No description |
| userId | string | Yes | No description |
| voice | string | No | PlayHT voice identifier or manifest URL (e.g., "s3://voice-cloning-zero-shot/...") |
| quality | string | No | No description |
| outputFormat | string | No | No description |
| speed | number | No | Speech speed multiplier from 0.5 to 2.0 (e.g., 0.5 for slower, 1.0 for normal, 1.5 for faster) |
| temperature | number | No | No description |
| voiceGuidance | number | No | No description |
| textGuidance | number | No | No description |
| sampleRate | number | No | No description |

| Parameter | Type | Description |
|---|---|---|
| audioUrl | string | URL to the generated audio file |
| audioFile | file | Generated audio file object |
| duration | number | Audio duration in seconds |
| characterCount | number | Number of characters processed |
| format | string | Audio format |
| provider | string | TTS provider used |
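
Every provider returns the same six output fields, so downstream code can treat the result uniformly. The Python sketch below maps such a result onto a typed container. The names `TTSResult` and `parse_tts_result` are hypothetical, the sample values (including the `"openai"` provider string and the example URL) are made up for illustration, and the shape of `audioFile` is not documented here, so it is passed through untouched.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class TTSResult:
    """Hypothetical container mirroring the documented output fields."""
    audio_url: str
    duration: float  # seconds
    character_count: int
    format: str
    provider: str
    audio_file: Optional[Any] = None  # file object; its shape is not documented


def parse_tts_result(raw: Mapping[str, Any]) -> TTSResult:
    """Copy the documented output fields into a TTSResult."""
    return TTSResult(
        audio_url=raw["audioUrl"],
        duration=float(raw["duration"]),
        character_count=int(raw["characterCount"]),
        format=raw["format"],
        provider=raw["provider"],
        audio_file=raw.get("audioFile"),
    )


# Made-up sample values, for illustration only.
result = parse_tts_result({
    "audioUrl": "https://example.com/audio.mp3",
    "duration": 2.4,
    "characterCount": 30,
    "format": "mp3",
    "provider": "openai",
})
```

Because `provider` travels with every result, a workflow that switches between providers can log or branch on it without tracking which input table was used.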