AI Voices - API Documentation
Full reference for all commands, modes, and webhook endpoints.
In all URLs below, replace XXX with your AI Apps Account ID, found in your account settings.
How Endpoints Work
Two ways to call each endpoint - read this first
App Command (API Call)
POST your request to https://api.aiappsapi.com with your API key and account ID in the POST body. The fields described in each section below go inside the jsonData POST field as a JSON-encoded string. Use this method when your server is making the call and you have your API credentials available.
Mode / Webhook (Direct URL)
POST directly to the endpoint URL shown in each section. The fields go in the POST body as standard form fields (the same fields you would include in an HTML form POST). Use this method for provider webhooks, website forms, or any third-party system that sends form data. No API key is required - the account ID is part of the URL.
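As a rough illustration of the App Command method, the sketch below assembles a POST body with the command fields JSON-encoded inside jsonData. The field names apiKey and accountID are assumptions (this reference does not name them), as is the use of app and command as form field names and form encoding for the body. Check your account documentation for the exact names before relying on this.

```python
import json
from urllib.parse import urlencode, parse_qs

API_URL = "https://api.aiappsapi.com"


def build_app_command_body(api_key, account_id, app, command, fields):
    """Return a form-encoded POST body for an App Command call.

    The command-specific fields are serialized to a JSON string and
    placed in the single jsonData form field.
    """
    form = {
        "apiKey": api_key,        # assumed field name, not confirmed by this reference
        "accountID": account_id,  # assumed field name, not confirmed by this reference
        "app": app,               # assumed to be sent as a form field
        "command": command,       # assumed to be sent as a form field
        "jsonData": json.dumps(fields),
    }
    return urlencode(form)


# Example: a body for the voices app's getaudio command.
body = build_app_command_body(
    "YOUR_API_KEY", "XXX", "voices", "getaudio", {"text": "Hello there"}
)
```

The resulting string can be sent to API_URL with any HTTP client as an application/x-www-form-urlencoded POST.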
Character presets: Most voice settings can be configured once in the admin panel as a named character and then referenced by ID in your API calls. This lets you manage voices centrally without hardcoding settings in every request. Pass the charID field to use a character's saved settings, or override any individual field per request.
Get Audio (Text to Speech)
command: getaudio
Converts text into spoken audio using AWS Polly. Returns the audio file as a base64-encoded string ready for playback or storage. Optionally returns lip-sync tween data containing frame-level viseme and word timing for driving avatar mouth animations alongside the audio.
You can pass voice settings directly in each request, or reference a saved character preset by its ID. Per-request fields override the character defaults, so you can use a preset as a baseline and change just what you need.
POST App Command URL
https://api.aiappsapi.com app: voices command: getaudio
This command is only available as an App Command. It is not available as a direct mode/webhook URL.
Required Fields
| Field Name | Field Key | Notes |
|---|---|---|
| Text | text | The text to convert to speech. Can be plain text or SSML markup if textType is set to ssml. |
Optional Fields
| Field Name | Field Key | Notes |
|---|---|---|
| Character ID | charID | Index of a saved character preset (e.g. 0, 1). Loads that character's voice, engine, format, language, and tween settings as defaults. Any other fields you send will override the character's values. |
| Voice | voice | Polly voice ID. Defaults to Joanna. Use the listvoices command to see all available voices. Common examples: Matthew, Salli, Amy, Brian. |
| Engine | engine | Polly engine type. Values: neural (default), standard, long-form, generative. Neural and generative produce the most natural speech. Not all voices support all engines. |
| Output Format | outputFormat | Audio format. Values: mp3 (default), ogg_vorbis. |
| Language Code | languageCode | Language/region code (e.g. en-US, es-ES). Only required for bilingual voices or to override a voice's default language. |
| Text Type | textType | Set to ssml to pass SSML markup in the text field for advanced speech control (pauses, emphasis, pronunciation). Defaults to text. |
| Tween Data | getTween | Set to 1 to include lip-sync animation data alongside the audio. Returns a tween array of viseme and word timing objects. Only works with neural, long-form, and generative engines. Doubles billing because it requires a second synthesis call. |
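The required and optional fields above could be assembled on the client roughly as follows. This is a minimal sketch: the helper name build_getaudio_fields and its parameters are hypothetical, and the validation only mirrors the values listed in the tables. The result is the dictionary that would be JSON-encoded into jsonData.

```python
ENGINES = {"neural", "standard", "long-form", "generative"}
OUTPUT_FORMATS = {"mp3", "ogg_vorbis"}


def build_getaudio_fields(text, char_id=None, voice=None, engine=None,
                          output_format=None, language_code=None,
                          ssml=False, get_tween=False):
    """Build the getaudio field dictionary (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    if engine is not None and engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    if output_format is not None and output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format}")
    if get_tween and engine == "standard":
        # Tween data is documented for neural, long-form, and generative only.
        raise ValueError("getTween is not supported with the standard engine")

    fields = {"text": text}
    optional = {
        "charID": char_id,
        "voice": voice,
        "engine": engine,
        "outputFormat": output_format,
        "languageCode": language_code,
    }
    # Leave unset fields out so character or platform defaults apply.
    fields.update({k: v for k, v in optional.items() if v is not None})
    if ssml:
        fields["textType"] = "ssml"
    if get_tween:
        fields["getTween"] = 1
    return fields


# Use character 0 as a baseline, override only the voice, and ask for tween data.
fields = build_getaudio_fields("Hi", char_id=0, voice="Matthew", get_tween=True)
```

Omitting fields that were not set, rather than sending empty values, is a design choice here so that a referenced character's saved settings are not accidentally overridden.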
Response Fields
| Field Name | Field Key | Notes |
|---|---|---|
| Audio Content | filecontent | Base64-encoded audio file. Decode and save or play directly in a browser using a data URL. |
| Tween Data | tween | Array of timing objects (only present when getTween=1). Each object has time (milliseconds), type (viseme or word), and value (the viseme ID or word text). Use these to animate avatar mouth shapes in sync with the audio. |
| Output Format | outputFormat | The format of the returned audio file. |
| Voice Used | voice | The Polly voice ID that was used. |
| Character Count | characters | Number of characters in the input text (used for billing calculation). |
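A getaudio response might be handled along these lines. The sketch assumes the response has already been parsed into a dictionary whose keys match the Field Key column above. The sample response is made up for illustration and is not real API output.

```python
import base64

EXTENSIONS = {"mp3": "mp3", "ogg_vorbis": "ogg"}
MIME_TYPES = {"mp3": "audio/mpeg", "ogg_vorbis": "audio/ogg"}


def decode_audio(response):
    """Return (raw audio bytes, suggested file extension)."""
    audio = base64.b64decode(response["filecontent"])
    fmt = response.get("outputFormat", "mp3")
    return audio, EXTENSIONS.get(fmt, "bin")


def to_data_url(response):
    """Build a data URL a browser audio element can play directly."""
    fmt = response.get("outputFormat", "mp3")
    mime = MIME_TYPES.get(fmt, "application/octet-stream")
    return f"data:{mime};base64,{response['filecontent']}"


def split_tween(response):
    """Separate tween entries into viseme and word (time, value) lists."""
    tween = response.get("tween", [])  # absent unless getTween=1 was sent
    visemes = [(e["time"], e["value"]) for e in tween if e["type"] == "viseme"]
    words = [(e["time"], e["value"]) for e in tween if e["type"] == "word"]
    return visemes, words


# Illustrative sample only; the bytes are not a real audio file.
sample = {
    "filecontent": base64.b64encode(b"fake-audio").decode("ascii"),
    "outputFormat": "mp3",
    "tween": [
        {"time": 0, "type": "word", "value": "Hi"},
        {"time": 5, "type": "viseme", "value": "k"},
    ],
}
audio_bytes, extension = decode_audio(sample)
visemes, words = split_tween(sample)
```

The viseme list can then drive mouth-shape changes at the given millisecond offsets while the decoded audio plays.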
Get Transcription (Speech to Text)
command: gettranscription
Converts audio into text using OpenAI Whisper. Send a base64-encoded audio file and receive a text transcript with language detection and duration. Supports common audio formats including WebM, MP3, WAV, and others accepted by the Whisper API.
This is the same transcription the AI Chatbot uses when voice input is enabled on a chat window. The chatbot automatically sends recorded audio through this command to convert spoken messages into text before processing them.
POST App Command URL
https://api.aiappsapi.com app: voices command: gettranscription
This command is only available as an App Command. It is not available as a direct mode/webhook URL.
Required Fields
| Field Name | Field Key | Notes |
|---|---|---|
| Audio Content | filecontent | Base64-encoded audio file. Record audio in a browser using MediaRecorder, convert the blob to base64, and send it in this field. |
Response Fields
| Field Name | Field Key | Notes |
|---|---|---|
| Transcript | transcript | The transcribed text from the audio. |
| Language | lang | Detected language code of the audio. |
| Duration | duration | Length of the audio in seconds. |
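On a server, preparing the gettranscription fields and reading the result could look like the sketch below. The helper names are hypothetical, and the response is assumed to arrive as a parsed dictionary keyed by the Field Key column.

```python
import base64


def build_transcription_fields(audio_bytes):
    """Base64-encode raw audio bytes into the filecontent field."""
    if not audio_bytes:
        raise ValueError("audio content is required")
    return {"filecontent": base64.b64encode(audio_bytes).decode("ascii")}


def summarize_transcript(response):
    """Format the transcript with its detected language and duration."""
    duration = float(response["duration"])
    return f"[{response['lang']}, {duration:.1f}s] {response['transcript']}"


# Illustrative input and response values, not real API output.
fields = build_transcription_fields(b"abc")
summary = summarize_transcript(
    {"transcript": "hello", "lang": "en", "duration": 2.0}
)
```

In practice audio_bytes would come from a recorded file, for example the contents of a WebM or MP3 upload.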
List Available Voices
command: listvoices
Returns a list of all available AWS Polly voices. Each voice includes its ID, display name, gender, language, and supported engines. Use this to find valid voice IDs for the getaudio command or to build a voice selector in your application. You can filter results by engine type or language code.
POST App Command URL
https://api.aiappsapi.com app: voices command: listvoices
This command is only available as an App Command. It is not available as a direct mode/webhook URL.
Optional Fields
| Field Name | Field Key | Notes |
|---|---|---|
| Engine | engine | Filter by engine type: neural, standard, long-form, or generative. Only voices that support this engine will be returned. |
| Language Code | languageCode | Filter by language (e.g. en-US, fr-FR, ja-JP). Only voices for this language will be returned. |
Response Fields
| Field Name | Field Key | Notes |
|---|---|---|
| Voices | voices | Array of voice objects. Each contains: id (the voice ID to use in getaudio), name (display name), gender (Male/Female), languageCode, languageName, and engines (array of supported engine types). |
| Count | count | Total number of voices returned. |
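One way to turn a listvoices response into options for a voice selector is sketched below. It assumes the response is a parsed dictionary with the voices array described above. The sample entries, including their engine lists, are invented for illustration and do not reflect actual voice capabilities.

```python
def voices_for_engine(response, engine):
    """Keep only voices whose engines array includes the given engine."""
    return [v for v in response["voices"] if engine in v["engines"]]


def selector_options(voices):
    """Return (value, label) pairs sorted by display name."""
    ordered = sorted(voices, key=lambda v: v["name"])
    return [
        (v["id"], f"{v['name']} ({v['languageName']}, {v['gender']})")
        for v in ordered
    ]


# Illustrative sample only; not real listvoices output.
sample = {
    "voices": [
        {"id": "VoiceB", "name": "Voice B", "gender": "Male",
         "languageCode": "en-US", "languageName": "US English",
         "engines": ["standard"]},
        {"id": "VoiceA", "name": "Voice A", "gender": "Female",
         "languageCode": "en-US", "languageName": "US English",
         "engines": ["neural", "standard"]},
    ],
    "count": 2,
}
neural_options = selector_options(voices_for_engine(sample, "neural"))
all_options = selector_options(sample["voices"])
```

Client-side filtering like this is only needed when a single unfiltered response is reused for several selectors; otherwise the engine and languageCode request fields do the same job on the server.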
Chatbot Voice Integration
How AI Voices connects to the AI Chatbot
AI Voices integrates directly with the AI Chatbot through the platform's AppBridge system. When a chatbot has a voice character assigned, every response the chatbot generates is automatically converted to spoken audio and returned alongside the text. This works across all chatbot types and delivery methods including the embeddable chat widget, webhooks, and direct API calls.
How to enable voice on a chatbot: In the AI Chatbot admin panel, edit a chatbot and set its Voice Character field to the index of a character you created in the AI Voices app. The chatbot will then route every response through the getaudio command using that character's voice settings. The audio and optional tween data are included in the chatbot's response automatically.
Voice input: The embeddable chat widget includes a microphone button when voice is enabled. Users hold the button to record audio, which is sent to the gettranscription command to convert speech to text before the chatbot processes it. This gives you full two-way voice conversations through the chat window.
Voice output: When the chatbot generates a response, it reads the character's saved settings (voice, engine, format, language, tween preference) and calls getaudio internally. The base64 audio and any tween animation data are added to the chatbot response so your front end can play the audio and animate an avatar without any extra API calls.
Admin: Voice Characters
Managing voice presets in the admin panel
The Voice Characters page in the admin panel lets you create and manage named voice presets. Each character saves a complete set of voice settings: Polly voice ID, engine, output format, language code, text type, and whether to include tween data. Once saved, reference a character by its index number in any getaudio API call or assign it to a chatbot for automatic voice output.
Character Settings
| Setting | Description |
|---|---|
| Character Name | A display name for this voice preset (e.g. "Customer Support Voice", "Spanish Narrator"). |
| Polly Voice ID | The AWS Polly voice to use. Examples: Joanna, Matthew, Salli, Amy, Brian. Use the listvoices API command to see all options. |
| Engine | Speech engine: standard, neural, long-form, or generative. Neural and generative produce the most natural speech. Not all voices support all engines. |
| Output Format | mp3 (recommended for web) or ogg_vorbis. |
| Language Code | Optional. Only needed for bilingual voices or to override the voice's default language (e.g. en-US, es-ES). |
| Text Type | text (default) or ssml for advanced speech control with SSML markup. |
| Tween Data | Set to 1 to always include lip-sync animation data with this character's audio. Doubles billing per request. |
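The override behavior described in this reference (platform defaults, then character settings, then per-request fields) can be modeled as below. This is only a client-side mental model under stated assumptions, not the platform's actual implementation. The function names are hypothetical, and character settings are assumed to use the same keys as the getaudio fields.

```python
# Defaults as documented for getaudio: Joanna, neural, mp3, plain text.
PLATFORM_DEFAULTS = {
    "voice": "Joanna",
    "engine": "neural",
    "outputFormat": "mp3",
    "textType": "text",
}
SETTING_KEYS = ("voice", "engine", "outputFormat", "languageCode",
                "textType", "getTween")


def effective_settings(character, request_fields):
    """Layer request fields over character settings over platform defaults."""
    settings = dict(PLATFORM_DEFAULTS)
    settings.update({k: v for k, v in character.items() if k in SETTING_KEYS})
    settings.update({k: v for k, v in request_fields.items() if k in SETTING_KEYS})
    return settings


def billing_multiplier(settings):
    """Tween data needs a second synthesis call, so billing doubles."""
    return 2 if str(settings.get("getTween", "0")) == "1" else 1


# Illustrative character: a Spanish narrator preset with tween enabled.
character = {"voice": "Lucia", "languageCode": "es-ES", "getTween": 1}
settings = effective_settings(character, {"text": "Hola", "outputFormat": "ogg_vorbis"})
multiplier = billing_multiplier(settings)
```

Here the request changes only the output format, so the character's voice, language, and tween preference still apply and the request is billed at twice the character count.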