Text-to-Speech

Usage Instructions

Generate natural-sounding speech from text with AI voices from OpenAI, Deepgram, ElevenLabs, Cartesia, Google Cloud, Azure, and PlayHT. The tools support multiple voices, languages, and audio formats.

Tools

`tts_openai`

Input

Parameter	Type	Required	Description
`text`	string	Yes	The text content to convert to speech (e.g., "Hello, welcome to our service!")
`apiKey`	string	Yes	OpenAI API key
`model`	string	No	OpenAI TTS model identifier (e.g., "tts-1", "tts-1-hd", "gpt-4o-mini-tts")
`voice`	string	No	OpenAI voice identifier (e.g., "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer")
`responseFormat`	string	No	Audio format of the generated file (e.g., "mp3", "opus", "aac", "flac", "wav")
`speed`	number	No	Speech speed multiplier from 0.25 to 4.0 (e.g., 0.5 for slower, 1.0 for normal, 2.0 for faster)
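
As an illustration of the input table above, the following minimal sketch assembles a `tts_openai` input object in Python. The helper name `build_tts_openai_input` is hypothetical and not part of the tool; it only checks the two required parameters and leaves out optional ones that were not set. How the resulting object is handed to the tool depends on your environment.

```python
from typing import Any, Optional


def build_tts_openai_input(
    text: str,
    api_key: str,
    model: Optional[str] = None,
    voice: Optional[str] = None,
    response_format: Optional[str] = None,
    speed: Optional[float] = None,
) -> dict[str, Any]:
    """Assemble a tts_openai input object (hypothetical helper, not part of the tool)."""
    # `text` and `apiKey` are the only parameters marked as required.
    if not text.strip():
        raise ValueError("text is required")
    if not api_key:
        raise ValueError("apiKey is required")

    optional = {
        "model": model,
        "voice": voice,
        "responseFormat": response_format,
        "speed": speed,
    }
    params: dict[str, Any] = {"text": text, "apiKey": api_key}
    # Leave out optional parameters that were not set.
    params.update({key: value for key, value in optional.items() if value is not None})
    return params


params = build_tts_openai_input(
    "Hello, welcome to our service!",
    api_key="YOUR_OPENAI_API_KEY",  # placeholder, not a real key
    model="tts-1",
    voice="alloy",
)
print(sorted(params))
```

Omitting unset optional parameters keeps the payload small and avoids sending explicit nulls; whether the tool accepts nulls is not stated on this page.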

Output

Parameter	Type	Description
`audioUrl`	string	URL to the generated audio file
`audioFile`	file	Generated audio file object
`duration`	number	Audio duration in seconds
`characterCount`	number	Number of characters processed
`format`	string	Audio format
`provider`	string	TTS provider used
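
The output fields above can be read into a small typed structure. This is a sketch only: the `TtsResult` class is a hypothetical name, the sample values are made up, and `audioFile` is skipped because the shape of the file object is not described on this page.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class TtsResult:
    """Scalar fields listed in the Output table (hypothetical wrapper)."""

    audio_url: str
    duration: float
    character_count: int
    format: str
    provider: str

    @classmethod
    def from_output(cls, output: Mapping[str, Any]) -> "TtsResult":
        # `audioFile` is a file object whose shape is not documented here, so it is left out.
        return cls(
            audio_url=output["audioUrl"],
            duration=float(output["duration"]),
            character_count=int(output["characterCount"]),
            format=output["format"],
            provider=output["provider"],
        )


# Made-up sample values, for illustration only.
sample_output = {
    "audioUrl": "https://example.com/audio/hello.mp3",
    "duration": 2.4,
    "characterCount": 30,
    "format": "mp3",
    "provider": "openai",
}
result = TtsResult.from_output(sample_output)
print(f"{result.provider}: {result.character_count} chars, {result.duration:.1f}s of {result.format}")
```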

`tts_deepgram`

Input

Parameter	Type	Required	Description
`text`	string	Yes	The text content to convert to speech (e.g., "Hello, welcome to our service!")
`apiKey`	string	Yes	Deepgram API key
`model`	string	No	Deepgram model/voice identifier (e.g., "aura-asteria-en", "aura-luna-en", "aura-2-luna-en")
`voice`	string	No	Deepgram voice identifier, alternative to model param (e.g., "aura-asteria-en", "aura-orion-en")
`encoding`	string	No	Audio encoding (e.g., "linear16", "mp3", "opus")
`sampleRate`	number	No	Audio sample rate in Hz
`bitRate`	number	No	Audio bit rate in bits per second
`container`	string	No	Audio container format (e.g., "wav", "ogg")
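
Because `voice` is described as an alternative to `model`, a caller may want to settle on a single identifier before invoking the tool. The sketch below shows one way to do that. The helper `resolve_deepgram_voice` and the precedence it applies (`model` first, then `voice`) are assumptions, not documented behavior of `tts_deepgram`.

```python
from typing import Optional


def resolve_deepgram_voice(
    model: Optional[str] = None,
    voice: Optional[str] = None,
) -> Optional[str]:
    """Pick a single identifier from `model` and `voice` (hypothetical helper)."""
    # Assumed precedence: `model` wins when both are given.
    # None means neither was set, so the caller sends no identifier at all.
    return model or voice


both = resolve_deepgram_voice(model="aura-2-luna-en", voice="aura-orion-en")
voice_only = resolve_deepgram_voice(voice="aura-orion-en")
neither = resolve_deepgram_voice()
print(both, voice_only, neither)
```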

Output

Parameter	Type	Description
`audioUrl`	string	URL to the generated audio file
`audioFile`	file	Generated audio file object
`duration`	number	Audio duration in seconds
`characterCount`	number	Number of characters processed
`format`	string	Audio format
`provider`	string	TTS provider used

`tts_elevenlabs`

Input

Parameter	Type	Required	Description
`text`	string	Yes	The text content to convert to speech (e.g., "Hello, welcome to our service!")
`voiceId`	string	Yes	ElevenLabs voice identifier (e.g., "21m00Tcm4TlvDq8ikWAM", "AZnzlk1XvdvUeBnXmlld")
`apiKey`	string	Yes	ElevenLabs API key
`modelId`	string	No	ElevenLabs model identifier (e.g., "eleven_turbo_v2_5", "eleven_flash_v2_5", "eleven_multilingual_v2")
`stability`	number	No	Voice stability setting from 0.0 to 1.0
`similarityBoost`	number	No	Similarity boost setting from 0.0 to 1.0
`style`	number	No	Style exaggeration setting from 0.0 to 1.0
`useSpeakerBoost`	boolean	No	Whether to enable speaker boost

Output

Parameter	Type	Description
`audioUrl`	string	URL to the generated audio file
`audioFile`	file	Generated audio file object
`duration`	number	Audio duration in seconds
`characterCount`	number	Number of characters processed
`format`	string	Audio format
`provider`	string	TTS provider used

`tts_cartesia`

Input

Parameter	Type	Required	Description
`text`	string	Yes	The text content to convert to speech (e.g., "Hello, welcome to our service!")
`apiKey`	string	Yes	Cartesia API key
`modelId`	string	No	Cartesia model identifier (e.g., "sonic", "sonic-2", "sonic-3", "sonic-multilingual")
`voice`	string	No	Cartesia voice identifier or embedding (e.g., "a0e99841-438c-4a64-b679-ae501e7d6091")
`language`	string	No	Language code for speech synthesis (e.g., "en", "es", "fr", "de", "it", "pt")
`outputFormat`	json	No	Output format configuration object
`speed`	number	No	Speech speed adjustment
`emotion`	array	No	Emotion tags for Sonic-3 (e.g., ['positivity:high'])
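
The `emotion` parameter takes tags in a `name:level` form, as in the `positivity:high` example above. The sketch below formats such tags from a dictionary and places them in a `tts_cartesia` input object. The helper `emotion_tags` is hypothetical, and no other emotion names or levels are listed on this page, so the code does not validate them.

```python
def emotion_tags(levels: dict[str, str]) -> list[str]:
    """Format {"positivity": "high"} as ["positivity:high"] (hypothetical helper)."""
    return [f"{name}:{level}" for name, level in levels.items()]


cartesia_input = {
    "text": "Hello, welcome to our service!",
    "apiKey": "YOUR_CARTESIA_API_KEY",  # placeholder, not a real key
    "modelId": "sonic-3",  # the table ties emotion tags to Sonic-3
    "language": "en",
    "emotion": emotion_tags({"positivity": "high"}),
}
print(cartesia_input["emotion"])
```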

Output

Parameter	Type	Description
`audioUrl`	string	URL to the generated audio file
`audioFile`	file	Generated audio file object
`duration`	number	Audio duration in seconds
`characterCount`	number	Number of characters processed
`format`	string	Audio format
`provider`	string	TTS provider used

`tts_google`

Input

Parameter	Type	Required	Description
`text`	string	Yes	The text content to convert to speech (e.g., "Hello, welcome to our service!")
`apiKey`	string	Yes	Google Cloud API key
`voiceId`	string	No	Google Cloud voice identifier (e.g., "en-US-Neural2-A", "en-US-Wavenet-D", "en-GB-Neural2-B")
`languageCode`	string	Yes	BCP-47 language code for speech synthesis (e.g., "en-US", "es-ES", "fr-FR", "de-DE")
`gender`	string	No	Preferred voice gender (e.g., "MALE", "FEMALE", "NEUTRAL")
`audioEncoding`	string	No	Audio encoding of the output (e.g., "MP3", "LINEAR16", "OGG_OPUS")
`speakingRate`	number	No	Speaking rate multiplier from 0.25 to 2.0 (e.g., 0.5 for slower, 1.0 for normal, 1.5 for faster)
`pitch`	number	No	Speaking pitch adjustment in semitones
`volumeGainDb`	number	No	Volume gain in dB
`sampleRateHertz`	number	No	Audio sample rate in Hz
`effectsProfileId`	array	No	Effects profile (e.g., ['headphone-class-device'])
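
Since `languageCode` is required while `voiceId` is optional, it can help to derive the language code from a voice identifier so the two stay consistent. The sketch below assumes that voice identifiers begin with a language and region, as in the examples above ("en-US-Neural2-A" starts with "en-US"). The helper `language_code_from_voice` is hypothetical, and voices that do not follow this naming pattern would need a different approach.

```python
def language_code_from_voice(voice_id: str) -> str:
    """Return the leading language-region part of a voice identifier (hypothetical helper)."""
    parts = voice_id.split("-")
    # Expect at least language, region, and one more segment naming the voice.
    if len(parts) < 3:
        raise ValueError(f"unexpected voice identifier: {voice_id!r}")
    return "-".join(parts[:2])


google_input = {
    "text": "Hello, welcome to our service!",
    "apiKey": "YOUR_GOOGLE_CLOUD_API_KEY",  # placeholder, not a real key
    "voiceId": "en-GB-Neural2-B",
    "languageCode": language_code_from_voice("en-GB-Neural2-B"),
}
print(google_input["languageCode"])
```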

Output

Parameter	Type	Description
`audioUrl`	string	URL to the generated audio file
`audioFile`	file	Generated audio file object
`duration`	number	Audio duration in seconds
`characterCount`	number	Number of characters processed
`format`	string	Audio format
`provider`	string	TTS provider used

`tts_azure`

Input

Parameter	Type	Required	Description
`text`	string	Yes	The text content to convert to speech (e.g., "Hello, welcome to our service!")
`apiKey`	string	Yes	Azure Speech service API key
`voiceId`	string	No	Azure voice identifier (e.g., "en-US-JennyNeural", "en-US-GuyNeural", "en-GB-SoniaNeural")
`region`	string	No	Azure region of the Speech resource (e.g., "eastus")
`outputFormat`	string	No	Audio output format
`rate`	string	No	Speaking rate adjustment
`pitch`	string	No	Pitch adjustment
`style`	string	No	Speaking style (e.g., "cheerful")
`styleDegree`	number	No	Intensity of the speaking style
`role`	string	No	Role for the voice to imitate

Output

Parameter	Type	Description
`audioUrl`	string	URL to the generated audio file
`audioFile`	file	Generated audio file object
`duration`	number	Audio duration in seconds
`characterCount`	number	Number of characters processed
`format`	string	Audio format
`provider`	string	TTS provider used

`tts_playht`

Input

Parameter	Type	Required	Description
`text`	string	Yes	The text content to convert to speech (e.g., "Hello, welcome to our service!")
`apiKey`	string	Yes	PlayHT API key
`userId`	string	Yes	PlayHT user ID
`voice`	string	No	PlayHT voice identifier or manifest URL (e.g., "s3://voice-cloning-zero-shot/...")
`quality`	string	No	Audio quality level
`outputFormat`	string	No	Audio output format (e.g., "mp3", "wav")
`speed`	number	No	Speech speed multiplier from 0.5 to 2.0 (e.g., 0.5 for slower, 1.0 for normal, 1.5 for faster)
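`temperature`	number	No	Amount of variation in the generated speech
`voiceGuidance`	number	No	How closely the output follows the selected voice
`textGuidance`	number	No	How closely the output follows the input text
`sampleRate`	number	No	Audio sample rate in Hz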
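
The speed ranges documented on this page differ by tool: 0.25 to 4.0 for `tts_openai` (`speed`), 0.25 to 2.0 for `tts_google` (`speakingRate`), and 0.5 to 2.0 for `tts_playht` (`speed`). The sketch below keeps a requested value inside the documented range before it is sent. The names `SPEED_RANGES` and `clamp_speed` are hypothetical, and clamping is a design choice made here; the tools themselves may reject out-of-range values instead. Tools with no documented range, such as `tts_cartesia`, are left out.

```python
# (parameter name, minimum, maximum) as documented in the input tables.
SPEED_RANGES: dict[str, tuple[str, float, float]] = {
    "tts_openai": ("speed", 0.25, 4.0),
    "tts_google": ("speakingRate", 0.25, 2.0),
    "tts_playht": ("speed", 0.5, 2.0),
}


def clamp_speed(tool: str, value: float) -> float:
    """Clamp a speed multiplier to the documented range for `tool` (hypothetical helper)."""
    if tool not in SPEED_RANGES:
        raise KeyError(f"no documented speed range for {tool!r}")
    _, low, high = SPEED_RANGES[tool]
    return max(low, min(high, value))


for tool in SPEED_RANGES:
    param, _, _ = SPEED_RANGES[tool]
    print(tool, param, clamp_speed(tool, 3.0))
```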

Output

Parameter	Type	Description
`audioUrl`	string	URL to the generated audio file
`audioFile`	file	Generated audio file object
`duration`	number	Audio duration in seconds
`characterCount`	number	Number of characters processed
`format`	string	Audio format
`provider`	string	TTS provider used
