AI Voices - API Documentation

Full reference for all commands, modes, and webhook endpoints.

In all URLs below, replace XXX with your AI Apps Account ID, found in your account settings.

How Endpoints Work

Two ways to call each endpoint - read this first
App Command (API Call)
POST your request to https://api.aiappsapi.com with your API key and account ID in the POST body. The fields described in each section below go inside the jsonData POST field as a JSON-encoded string. Use this method when your server is making the call and you have your API credentials available.

Mode / Webhook (Direct URL)
POST directly to the endpoint URL shown in each section. The fields go in the POST body as standard form fields (the same fields you would include in an HTML form POST). Use this method for provider webhooks, website forms, or any third-party system that sends form data. No API key is required - the account ID is part of the URL.
Character presets: Most voice settings can be configured once in the admin panel as a named character and then referenced by ID in your API calls. This lets you manage voices centrally without hardcoding settings in every request. Pass the charID field to use a character's saved settings, or override any individual field per request.
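As a sketch of the App Command shape in Python: the command fields are JSON-encoded into the jsonData POST field, while the credentials ride alongside as plain form fields. The apiKey and accountID field names below are assumptions for illustration; check your account settings for the exact names your account expects.

```python
import json

def build_app_command(api_key: str, account_id: str, app: str,
                      command: str, fields: dict) -> dict:
    """Build the POST body for an App Command call.

    Per the docs, the command fields travel inside the jsonData
    POST field as a JSON-encoded string. The apiKey/accountID
    keys are hypothetical names for the credential fields.
    """
    return {
        "apiKey": api_key,        # hypothetical field name
        "accountID": account_id,  # hypothetical field name
        "app": app,
        "command": command,
        "jsonData": json.dumps(fields),
    }

body = build_app_command("MY-KEY", "XXX", "voices", "getaudio",
                         {"text": "Hello!", "voice": "Matthew"})
```

The resulting dict can then be sent as standard form fields, e.g. `requests.post("https://api.aiappsapi.com", data=body)`.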

Get Audio (Text to Speech)

command: getaudio
Converts text into spoken audio using AWS Polly. Returns the audio file as a base64-encoded string ready for playback or storage. Optionally returns lip-sync tween data containing frame-level viseme and word timing for driving avatar mouth animations alongside the audio.

You can pass voice settings directly in each request, or reference a saved character preset by its ID. Per-request fields override the character defaults, so you can use a preset as a baseline and change just what you need.
POST App Command URL
https://api.aiappsapi.com     app: voices     command: getaudio
This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Required Fields
Field Name | Field Key | Notes
Text | text | The text to convert to speech. Can be plain text or SSML markup if textType is set to ssml.
Optional Fields
Field Name | Field Key | Notes
Character ID | charID | Index of a saved character preset (e.g. 0, 1). Loads that character's voice, engine, format, language, and tween settings as defaults. Any other fields you send will override the character's values.
Voice | voice | Polly voice ID. Defaults to Joanna. Use the listvoices command to see all available voices. Common examples: Matthew, Salli, Amy, Brian.
Engine | engine | Polly engine type. Values: neural (default), standard, long-form, generative. Neural and generative produce the most natural speech. Not all voices support all engines.
Output Format | outputFormat | Audio format. Values: mp3 (default), ogg_vorbis.
Language Code | languageCode | Language/region code (e.g. en-US, es-ES). Only required for bilingual voices or to override a voice's default language.
Text Type | textType | Set to ssml to pass SSML markup in the text field for advanced speech control (pauses, emphasis, pronunciation). Defaults to text.
Tween Data | getTween | Set to 1 to include lip-sync animation data alongside the audio. Returns a tween array of viseme and word timing objects. Only works with neural, long-form, and generative engines. Doubles billing because it requires a second synthesis call.

Response Fields
Field Name | Field Key | Notes
Audio Content | filecontent | Base64-encoded audio file. Decode and save, or play directly in a browser using a data URL.
Tween Data | tween | Array of timing objects (only present when getTween=1). Each object has time (milliseconds), type (viseme or word), and value (the viseme ID or word text). Use these to animate avatar mouth shapes in sync with the audio.
Output Format | outputFormat | The format of the returned audio file.
Voice Used | voice | The Polly voice ID that was used.
Character Count | characters | Number of characters in the input text (used for billing calculation).
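Once the response arrives, the audio is just a base64 string. A minimal Python sketch of handling it, using a small fake payload in place of a real response:

```python
import base64

# getaudio returns the audio as a base64 string in the filecontent
# field. For illustration, fake a tiny payload instead of calling
# the API.
fake_audio_bytes = b"ID3fake-mp3-data"
response = {
    "filecontent": base64.b64encode(fake_audio_bytes).decode("ascii"),
    "outputFormat": "mp3",
    "voice": "Joanna",
    "characters": 6,
}

# Decode and save to disk...
audio = base64.b64decode(response["filecontent"])
# with open("speech." + response["outputFormat"], "wb") as f:
#     f.write(audio)

# ...or build a data URL for direct playback in a browser <audio> tag.
data_url = "data:audio/mpeg;base64," + response["filecontent"]
```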

Get Transcription (Speech to Text)

command: gettranscription
Converts audio into text using OpenAI Whisper. Send a base64-encoded audio file and receive a text transcript with language detection and duration. Supports common audio formats including WebM, MP3, WAV, and others accepted by the Whisper API.

This is the same transcription the AI Chatbot uses when voice input is enabled on a chat window. The chatbot automatically sends recorded audio through this command to convert spoken messages into text before processing them.
POST App Command URL
https://api.aiappsapi.com     app: voices     command: gettranscription
This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Required Fields
Field Name | Field Key | Notes
Audio Content | filecontent | Base64-encoded audio file. Record audio in a browser using MediaRecorder, convert the blob to base64, and send it in this field.

Response Fields
Field Name | Field Key | Notes
Transcript | transcript | The transcribed text from the audio.
Language | lang | Detected language code of the audio.
Duration | duration | Length of the audio in seconds.
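Building the request side in Python might look like this; the recorded bytes are an in-memory stand-in for a real audio file (e.g. a WebM blob from MediaRecorder saved to disk):

```python
import base64
import io
import json

# Stand-in for open("clip.webm", "rb") -- a few fake WebM header bytes.
recorded = io.BytesIO(b"\x1aE\xdf\xa3webm-bytes")

# Base64-encode the audio for the filecontent field.
filecontent = base64.b64encode(recorded.read()).decode("ascii")

# The command fields go inside the jsonData POST field as a
# JSON-encoded string.
fields = {"filecontent": filecontent}
json_data = json.dumps(fields)
```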

List Available Voices

command: listvoices
Returns a list of all available AWS Polly voices. Each voice includes its ID, display name, gender, language, and supported engines. Use this to find valid voice IDs for the getaudio command or to build a voice selector in your application. You can filter results by engine type or language code.
POST App Command URL
https://api.aiappsapi.com     app: voices     command: listvoices
This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Optional Fields
Field Name | Field Key | Notes
Engine | engine | Filter by engine type: neural, standard, long-form, or generative. Only voices that support this engine will be returned.
Language Code | languageCode | Filter by language (e.g. en-US, fr-FR, ja-JP). Only voices for this language will be returned.

Response Fields
Field Name | Field Key | Notes
Voices | voices | Array of voice objects. Each contains: id (the voice ID to use in getaudio), name (display name), gender (Male/Female), languageCode, languageName, and engines (array of supported engine types).
Count | count | Total number of voices returned.
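A client-side sketch of working with the response, assuming the voice objects look as described above (the sample entries are illustrative, not real output):

```python
# Filter a listvoices-style response to build a voice selector.
# The sample voices below are illustrative placeholders.
sample = {
    "voices": [
        {"id": "Joanna", "name": "Joanna", "gender": "Female",
         "languageCode": "en-US", "languageName": "US English",
         "engines": ["neural", "standard"]},
        {"id": "Lupe", "name": "Lupe", "gender": "Female",
         "languageCode": "es-US", "languageName": "US Spanish",
         "engines": ["neural", "standard", "generative"]},
    ],
    "count": 2,
}

def voices_for_engine(resp: dict, engine: str) -> list:
    """Client-side equivalent of the engine filter field."""
    return [v["id"] for v in resp["voices"] if engine in v["engines"]]

generative_ids = voices_for_engine(sample, "generative")
```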

Chatbot Voice Integration

How AI Voices connects to the AI Chatbot
AI Voices integrates directly with the AI Chatbot through the platform's AppBridge system. When a chatbot has a voice character assigned, every response the chatbot generates is automatically converted to spoken audio and returned alongside the text. This works across all chatbot types and delivery methods including the embeddable chat widget, webhooks, and direct API calls.
How to enable voice on a chatbot: In the AI Chatbot admin panel, edit a chatbot and set its Voice Character field to the index of a character you created in the AI Voices app. The chatbot will then route every response through the getaudio command using that character's voice settings. The audio and optional tween data are included in the chatbot's response automatically.
Voice input: The embeddable chat widget includes a microphone button when voice is enabled. Users hold the button to record audio, which is sent to the gettranscription command to convert speech to text before the chatbot processes it. This gives you full two-way voice conversations through the chat window.

Voice output: When the chatbot generates a response, it reads the character's saved settings (voice, engine, format, language, tween preference) and calls getaudio internally. The base64 audio and any tween animation data are added to the chatbot response so your front end can play the audio and animate an avatar without any extra API calls.
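The tween array described above can drive an avatar by looking up the viseme active at the current playback position. A minimal Python sketch with illustrative timing data:

```python
# A tween array as described in the getaudio response fields:
# each object has time (ms), type (viseme or word), and value.
# The timings below are made up for illustration.
tween = [
    {"time": 0,   "type": "viseme", "value": "p"},
    {"time": 120, "type": "word",   "value": "hello"},
    {"time": 180, "type": "viseme", "value": "E"},
    {"time": 340, "type": "viseme", "value": "sil"},
]

def viseme_at(tween: list, t_ms: int):
    """Return the most recent viseme at or before t_ms, or None."""
    current = None
    for item in tween:
        if item["time"] > t_ms:
            break
        if item["type"] == "viseme":
            current = item["value"]
    return current
```

A front end would call something like `viseme_at` on each animation frame, passing the audio element's current playback time, and swap the avatar's mouth shape accordingly.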

Admin: Voice Characters

Managing voice presets in the admin panel
The Voice Characters page in the admin panel lets you create and manage named voice presets. Each character saves a complete set of voice settings: Polly voice ID, engine, output format, language code, text type, and whether to include tween data. Once saved, reference a character by its index number in any getaudio API call or assign it to a chatbot for automatic voice output.
Character Settings
Setting | Description
Character Name | A display name for this voice preset (e.g. "Customer Support Voice", "Spanish Narrator").
Polly Voice ID | The AWS Polly voice to use. Examples: Joanna, Matthew, Salli, Amy, Brian. Use the listvoices API command to see all options.
Engine | Speech engine: standard, neural, long-form, or generative. Neural and generative produce the most natural speech. Not all voices support all engines.
Output Format | mp3 (recommended for web) or ogg_vorbis.
Language Code | Optional. Only needed for bilingual voices or to override the voice's default language (e.g. en-US, es-ES).
Text Type | text (default) or ssml for advanced speech control with SSML markup.
Tween Data | Enable to always include lip-sync animation data with this character's audio. Doubles billing per request.
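The override behaviour described for getaudio (character preset as baseline, per-request fields win) can be pictured as a simple merge. A Python sketch, with an illustrative preset:

```python
# Sketch of the documented override semantics: the character
# preset supplies defaults and any field sent in the request
# wins. Field names match the getaudio docs; the preset values
# are illustrative.
character = {
    "voice": "Lucia",
    "engine": "neural",
    "outputFormat": "mp3",
    "languageCode": "es-ES",
    "textType": "text",
    "getTween": 0,
}

# Use the preset, but ask for tween data on this one request.
request_fields = {"charID": 0, "getTween": 1}

effective = {
    **character,
    **{k: v for k, v in request_fields.items() if k != "charID"},
}
```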