Usage Instructions
Transcribe audio and video files to text with Whisper, Deepgram, ElevenLabs, AssemblyAI, or Gemini. Every tool accepts a language code and a timestamps setting; speaker diarization is available in stt_deepgram and stt_assemblyai.
Tools
stt_whisper
Input
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | No description |
| `apiKey` | string | Yes | No description |
| `model` | string | No | No description |
| `audioFile` | file | No | No description |
| `audioFileReference` | file | No | No description |
| `audioUrl` | string | No | No description |
| `language` | string | No | Language code (e.g., "en", "es", "fr") or "auto" for auto-detection |
| `timestamps` | string | No | No description |
| `translateToEnglish` | boolean | No | No description |
| `prompt` | string | No | Optional text to guide the model's style or continue a previous audio segment. Helps with proper nouns and context. |
| `temperature` | number | No | Sampling temperature between 0 and 1. Higher values make output more random, lower values more focused and deterministic. |
| `responseFormat` | string | No | Output format for the transcription (e.g., "json", "text", "srt", "verbose_json", "vtt") |

Output
This tool does not produce any outputs.
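As a rough illustration of how the `stt_whisper` inputs fit together, the sketch below assembles a parameter dictionary using the names from the table above and checks the two constraints the table states: `provider` and `apiKey` are required, and `temperature` must lie between 0 and 1. The helper name `build_whisper_input`, the `"whisper"` provider value, and the placeholder URL are assumptions made for the example, not part of the documented interface. How the dictionary is handed to the tool is outside this sketch.

```python
from typing import Any, Dict, Optional


def build_whisper_input(
    api_key: str,
    audio_url: str,
    provider: str = "whisper",  # assumed value; the table does not list accepted providers
    language: str = "auto",
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
    response_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble stt_whisper inputs keyed by the documented parameter names."""
    if not provider or not api_key:
        raise ValueError("provider and apiKey are required")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    params: Dict[str, Any] = {
        "provider": provider,
        "apiKey": api_key,
        "audioUrl": audio_url,
        "language": language,
    }
    # Optional parameters are only included when the caller sets them.
    if prompt is not None:
        params["prompt"] = prompt
    if temperature is not None:
        params["temperature"] = temperature
    if response_format is not None:
        params["responseFormat"] = response_format
    return params


params = build_whisper_input(
    api_key="YOUR_API_KEY",
    audio_url="https://example.com/meeting.mp3",
    prompt="Attendees: Priya, Joao",
    temperature=0.2,
    response_format="srt",
)
```

Leaving unset options out of the dictionary, rather than sending `None`, keeps the request limited to what the caller actually chose.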
stt_deepgram
Input
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | No description |
| `apiKey` | string | Yes | No description |
| `model` | string | No | No description |
| `audioFile` | file | No | No description |
| `audioFileReference` | file | No | No description |
| `audioUrl` | string | No | No description |
| `language` | string | No | Language code (e.g., "en", "es", "fr") or "auto" for auto-detection |
| `timestamps` | string | No | No description |
| `diarization` | boolean | No | No description |

Output
This tool does not produce any outputs.
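Because the table gives a type and a required flag for each parameter, an input can be checked before it is sent. The following is a minimal sketch of such a check, assuming the inputs are passed as a plain dictionary. `DEEPGRAM_SCHEMA` and `validate_input` are hypothetical names, the `"deepgram"` provider value is a guess, and the two `file` parameters are left out because their runtime representation is not documented here.

```python
from typing import Any, Dict, List

# Names, types, and required flags copied from the stt_deepgram table.
# The "file" parameters are omitted: their runtime type is not documented.
DEEPGRAM_SCHEMA = {
    "provider": (str, True),
    "apiKey": (str, True),
    "model": (str, False),
    "audioUrl": (str, False),
    "language": (str, False),
    "timestamps": (str, False),
    "diarization": (bool, False),
}


def validate_input(params: Dict[str, Any], schema=DEEPGRAM_SCHEMA) -> List[str]:
    """Return a list of problems; an empty list means the input matches the table."""
    problems = []
    for name, (expected_type, required) in schema.items():
        if name not in params:
            if required:
                problems.append(f"missing required parameter: {name}")
            continue
        if not isinstance(params[name], expected_type):
            problems.append(f"{name} should be of type {expected_type.__name__}")
    return problems


request = {
    "provider": "deepgram",  # assumed value
    "apiKey": "YOUR_API_KEY",
    "audioUrl": "https://example.com/interview.wav",
    "diarization": "true",  # wrong on purpose: a string instead of a boolean
}
print(validate_input(request))  # ['diarization should be of type bool']
```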
stt_elevenlabs
Input
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | No description |
| `apiKey` | string | Yes | No description |
| `model` | string | No | No description |
| `audioFile` | file | No | No description |
| `audioFileReference` | file | No | No description |
| `audioUrl` | string | No | No description |
| `language` | string | No | Language code (e.g., "en", "es", "fr") or "auto" for auto-detection |
| `timestamps` | string | No | No description |

Output
This tool does not produce any outputs.
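The `language` parameter takes either a language code or `"auto"`. One way to keep that value tidy on the caller's side is sketched below. `normalize_language` is a hypothetical helper, and falling back to `"auto"` when no language is given is an assumption of this example, not documented default behavior. The `"elevenlabs"` provider value is likewise a guess.

```python
from typing import Optional


def normalize_language(value: Optional[str]) -> str:
    """Return a trimmed, lowercase language code, or "auto" when none is given."""
    if value is None or not value.strip():
        return "auto"  # assumption: fall back to auto-detection when unset
    return value.strip().lower()


params = {
    "provider": "elevenlabs",  # assumed value
    "apiKey": "YOUR_API_KEY",
    "audioUrl": "https://example.com/podcast.mp3",
    "language": normalize_language(" ES "),
}
print(params["language"])  # es
```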
stt_assemblyai
Input
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | No description |
| `apiKey` | string | Yes | No description |
| `model` | string | No | No description |
| `audioFile` | file | No | No description |
| `audioFileReference` | file | No | No description |
| `audioUrl` | string | No | No description |
| `language` | string | No | Language code (e.g., "en", "es", "fr") or "auto" for auto-detection |
| `timestamps` | string | No | No description |
| `diarization` | boolean | No | No description |
| `sentiment` | boolean | No | No description |
| `entityDetection` | boolean | No | No description |
| `piiRedaction` | boolean | No | No description |
| `summarization` | boolean | No | No description |

Output
This tool does not produce any outputs.
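`stt_assemblyai` adds five boolean flags on top of the shared inputs. The sketch below shows one way to switch on a chosen subset while rejecting misspelled flag names. `with_features` and `ASSEMBLYAI_FLAGS` are names invented for this example, and the `"assemblyai"` provider value is an assumption.

```python
from typing import Any, Dict, Iterable

# Boolean analysis flags listed in the stt_assemblyai table.
ASSEMBLYAI_FLAGS = (
    "diarization",
    "sentiment",
    "entityDetection",
    "piiRedaction",
    "summarization",
)


def with_features(base: Dict[str, Any], features: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of base with each requested flag set to True."""
    requested = set(features)
    unknown = requested - set(ASSEMBLYAI_FLAGS)
    if unknown:
        raise ValueError(f"unknown feature flag(s): {sorted(unknown)}")
    params = dict(base)  # copy so the caller's dictionary is left untouched
    for flag in ASSEMBLYAI_FLAGS:
        if flag in requested:
            params[flag] = True
    return params


base = {
    "provider": "assemblyai",  # assumed value
    "apiKey": "YOUR_API_KEY",
    "audioUrl": "https://example.com/call.mp3",
}
params = with_features(base, ["diarization", "piiRedaction"])
```

Flags that are not requested are omitted rather than set to `False`, so the tool's own defaults, whatever they are, still apply.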
stt_gemini
Input
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | No description |
| `apiKey` | string | Yes | No description |
| `model` | string | No | No description |
| `audioFile` | file | No | No description |
| `audioFileReference` | file | No | No description |
| `audioUrl` | string | No | No description |
| `language` | string | No | Language code (e.g., "en", "es", "fr") or "auto" for auto-detection |
| `timestamps` | string | No | No description |

Output
This tool does not produce any outputs.
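All three audio inputs (`audioFile`, `audioFileReference`, `audioUrl`) are marked optional, and the table does not say how they interact. The sketch below takes the cautious reading that a request should carry exactly one of them. That rule is an assumption of this example, as are the helper name `pick_audio_source` and the `"gemini"` provider value.

```python
from typing import Any, Dict, Optional


def pick_audio_source(
    audio_file: Optional[Any] = None,
    audio_file_reference: Optional[Any] = None,
    audio_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the single audio input that was supplied, keyed by its documented name."""
    candidates = {
        "audioFile": audio_file,
        "audioFileReference": audio_file_reference,
        "audioUrl": audio_url,
    }
    supplied = {name: value for name, value in candidates.items() if value is not None}
    if len(supplied) != 1:
        # Assumption: exactly one source per request; the table does not state this.
        raise ValueError(f"expected exactly one audio source, got {len(supplied)}")
    return supplied


params = {
    "provider": "gemini",  # assumed value
    "apiKey": "YOUR_API_KEY",
    **pick_audio_source(audio_url="https://example.com/lecture.mp4"),
}
```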

