AI Voices - API Documentation

Full reference for all commands, modes, and webhook endpoints.

In all URLs below, replace XXX with your AI Apps Account ID, found in your account settings.

How Endpoints Work

Two ways to call each endpoint - read this first

App Command (API Call)
POST your request to https://api.aiappsapi.com with your API key and account ID in the POST body. The fields described in each section below go inside the jsonData POST field as a JSON-encoded string. Use this method when your server is making the call and you have your API credentials available.

Mode / Webhook (Direct URL)
POST directly to the endpoint URL shown in each section. The fields go in the POST body as standard form fields (the same fields you would include in an HTML form POST). Use this method for provider webhooks, website forms, or any third-party system that sends form data. No API key is required - the account ID is part of the URL.

Character presets: Most voice settings can be configured once in the admin panel as a named character and then referenced by ID in your API calls. This lets you manage voices centrally without hardcoding settings in every request. Pass the charID field to use a character's saved settings, or override any individual field per request.
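
The two calling styles can be sketched roughly as follows. This is a minimal illustration, not the platform's client code: the credential field names `apiKey` and `accountID` are hypothetical (this reference does not name them), and treating `app` and `command` as top-level form fields in a form-encoded body is also an assumption. Only `jsonData` and the base URL come from the text above.

```python
import json
from urllib.parse import urlencode, parse_qs

API_URL = "https://api.aiappsapi.com"

def build_app_command_body(api_key, account_id, app, command, fields):
    """Form-encode an App Command request.

    The command's own fields travel inside jsonData as a JSON-encoded string.
    """
    return urlencode({
        "apiKey": api_key,        # hypothetical field name
        "accountID": account_id,  # hypothetical field name
        "app": app,               # assumed to be a top-level form field
        "command": command,       # assumed to be a top-level form field
        "jsonData": json.dumps(fields),
    })

def build_direct_body(fields):
    """Mode/webhook style: plain form fields, no credentials in the body."""
    return urlencode(fields)

body = build_app_command_body(
    "MY_KEY", "XXX", "voices", "getaudio", {"text": "Hello", "charID": 0}
)
direct = build_direct_body({"text": "Hello"})

# Round-trip the encoded body to see what the receiving side would parse.
parsed = parse_qs(body)
command_fields = json.loads(parsed["jsonData"][0])
```

To send the request, the encoded body could be passed as bytes to `urllib.request.Request(API_URL, data=body.encode())`, which issues a POST when `data` is set.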

Get Audio (Text to Speech)

command: getaudio

Converts text into spoken audio using AWS Polly. Returns the audio file as a base64-encoded string ready for playback or storage. Optionally returns lip-sync tween data containing frame-level viseme and word timing for driving avatar mouth animations alongside the audio.

You can pass voice settings directly in each request, or reference a saved character preset by its ID. Per-request fields override the character defaults, so you can use a preset as a baseline and change just what you need.

POST App Command URL

https://api.aiappsapi.com
app: voices
command: getaudio

This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Required Fields

Field Name	Field Key	Notes
Text	text	The text to convert to speech. Can be plain text or SSML markup if `textType` is set to `ssml`.

Optional Fields

Field Name	Field Key	Notes
Character ID	charID	Index of a saved character preset (e.g. `0`, `1`). Loads that character's voice, engine, format, language, and tween settings as defaults. Any other fields you send will override the character's values.
Voice	voice	Polly voice ID. Defaults to `Joanna`. Use the `listvoices` command to see all available voices. Common examples: `Matthew`, `Salli`, `Amy`, `Brian`.
Engine	engine	Polly engine type. Values: `neural` (default), `standard`, `long-form`, `generative`. Neural and generative produce the most natural speech. Not all voices support all engines.
Output Format	outputFormat	Audio format. Values: `mp3` (default), `ogg_vorbis`.
Language Code	languageCode	Language/region code (e.g. `en-US`, `es-ES`). Only required for bilingual voices or to override a voice's default language.
Text Type	textType	Set to `ssml` to pass SSML markup in the text field for advanced speech control (pauses, emphasis, pronunciation). Defaults to `text`.
Tween Data	getTween	Set to `1` to include lip-sync animation data alongside the audio. Returns a `tween` array of viseme and word timing objects. Only works with neural, long-form, and generative engines. Enabling this doubles billing for the request because it requires a second synthesis call.
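
As a sketch of how the fields in the two tables above might be assembled on the client, the helper below builds the object that goes into jsonData and rejects values the tables do not list. The helper name `build_getaudio_fields` is hypothetical, and sending `getTween` as the integer `1` rather than the string `"1"` is an assumption.

```python
import json

# Allowed values taken from the Optional Fields table.
ENGINES = {"neural", "standard", "long-form", "generative"}
FORMATS = {"mp3", "ogg_vorbis"}
TEXT_TYPES = {"text", "ssml"}

def build_getaudio_fields(text, char_id=None, **overrides):
    """Build the getaudio field object, validating the enumerated options."""
    if not text:
        raise ValueError("text is required")
    checks = (("engine", ENGINES), ("outputFormat", FORMATS), ("textType", TEXT_TYPES))
    for key, allowed in checks:
        value = overrides.get(key)
        if value is not None and value not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {value!r}")
    fields = {"text": text}
    if char_id is not None:
        fields["charID"] = char_id  # load a saved character as the baseline
    fields.update(overrides)        # per-request values override the character
    return fields

fields = build_getaudio_fields(
    '<speak>Hello <break time="300ms"/> world</speak>',
    char_id=1,
    textType="ssml",
    getTween=1,
)
json_data = json.dumps(fields)  # the string to place in the jsonData POST field
```

Validating on the client is optional; it only catches typos early, and the server remains the authority on which voice and engine combinations are accepted.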

Response Fields

Field Name	Field Key	Notes
Audio Content	filecontent	Base64-encoded audio file. Decode and save or play directly in a browser using a data URL.
Tween Data	tween	Array of timing objects (only present when `getTween=1`). Each object has `time` (milliseconds), `type` (`viseme` or `word`), and `value` (the viseme ID or word text). Use these to animate avatar mouth shapes in sync with the audio.
Output Format	outputFormat	The format of the returned audio file.
Voice Used	voice	The Polly voice ID that was used.
Character Count	characters	Number of characters in the input text (used for billing calculation).
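
A getaudio response could be handled along these lines. The sketch assumes the response fields arrive as keys of a single JSON object; the sample response is made up for illustration (its audio bytes are not real audio), and the MIME type mapping is an assumption based on the two documented formats.

```python
import base64

# Assumed MIME types for the two documented output formats.
MIME_TYPES = {"mp3": "audio/mpeg", "ogg_vorbis": "audio/ogg"}

def decode_audio(response):
    """Return (raw audio bytes, data URL) from a getaudio response dict."""
    encoded = response["filecontent"]
    audio = base64.b64decode(encoded)
    mime = MIME_TYPES.get(response.get("outputFormat", "mp3"), "application/octet-stream")
    return audio, f"data:{mime};base64,{encoded}"

def split_tween(response):
    """Separate viseme events from word events; tween is absent unless getTween=1."""
    tween = response.get("tween", [])
    visemes = [event for event in tween if event["type"] == "viseme"]
    words = [event for event in tween if event["type"] == "word"]
    return visemes, words

# Made-up sample response, shaped after the Response Fields table.
sample_response = {
    "filecontent": base64.b64encode(b"not real audio").decode("ascii"),
    "outputFormat": "mp3",
    "voice": "Joanna",
    "characters": 11,
    "tween": [
        {"time": 0, "type": "word", "value": "Hello"},
        {"time": 5, "type": "viseme", "value": "k"},
        {"time": 250, "type": "word", "value": "world"},
        {"time": 260, "type": "viseme", "value": "u"},
    ],
}

audio, data_url = decode_audio(sample_response)
visemes, words = split_tween(sample_response)
```

The raw bytes can be written to disk with `open(path, "wb")`, while the data URL can be assigned directly to an audio element's source in a browser.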

Get Transcription (Speech to Text)

command: gettranscription

Converts audio into text using OpenAI Whisper. Send a base64-encoded audio file and receive a text transcript with language detection and duration. Supports common audio formats including WebM, MP3, WAV, and others accepted by the Whisper API.

This is the same transcription the AI Chatbot uses when voice input is enabled on a chat window. The chatbot automatically sends recorded audio through this command to convert spoken messages into text before processing them.

POST App Command URL

https://api.aiappsapi.com
app: voices
command: gettranscription

This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Required Fields

Field Name	Field Key	Notes
Audio Content	filecontent	Base64-encoded audio file. Record audio in a browser using MediaRecorder, convert the blob to base64, and send it in this field.

Response Fields

Field Name	Field Key	Notes
Transcript	transcript	The transcribed text from the audio.
Language	lang	Detected language code of the audio.
Duration	duration	Length of the audio in seconds.
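
For server-side callers, preparing the request and reading the result might look like the sketch below. The helper names are hypothetical, the audio bytes are a placeholder rather than a real recording, and the sample response is invented to match the Response Fields table. Whether `duration` arrives as a number or a numeric string is not stated here, so the code converts it defensively.

```python
import base64

def build_transcription_fields(audio_bytes):
    """Base64-encode raw audio bytes into the single required field."""
    return {"filecontent": base64.b64encode(audio_bytes).decode("ascii")}

def summarize_transcription(response):
    """Format the three documented response fields as one line."""
    duration = float(response["duration"])  # tolerate "1.8" as well as 1.8
    return f'[{response["lang"]}, {duration:.1f}s] {response["transcript"]}'

placeholder_audio = b"placeholder bytes, not a real recording"
fields = build_transcription_fields(placeholder_audio)

# Invented sample response for illustration only.
sample_response = {"transcript": "Hello there", "lang": "en", "duration": 1.8}
summary = summarize_transcription(sample_response)
```

In practice the bytes would come from a file read in binary mode or from a browser recording uploaded to your server.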

List Available Voices

command: listvoices

Returns a list of all available AWS Polly voices. Each voice includes its ID, display name, gender, language, and supported engines. Use this to find valid voice IDs for the getaudio command or to build a voice selector in your application. You can filter results by engine type or language code.

POST App Command URL

https://api.aiappsapi.com
app: voices
command: listvoices

This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Optional Fields

Field Name	Field Key	Notes
Engine	engine	Filter by engine type: `neural`, `standard`, `long-form`, or `generative`. Only voices that support this engine will be returned.
Language Code	languageCode	Filter by language (e.g. `en-US`, `fr-FR`, `ja-JP`). Only voices for this language will be returned.

Response Fields

Field Name	Field Key	Notes
Voices	voices	Array of voice objects. Each contains: `id` (the voice ID to use in getaudio), `name` (display name), `gender` (Male/Female), `languageCode`, `languageName`, and `engines` (array of supported engine types).
Count	count	Total number of voices returned.
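
One way to turn a listvoices response into data for a voice selector is sketched below. The voice objects follow the keys listed in the table above, but the sample entries are illustrative values, not actual listvoices output, and the helper names are hypothetical.

```python
from collections import defaultdict

def voices_for_engine(voices, engine):
    """Return the sorted IDs of voices whose engines list includes `engine`."""
    return sorted(v["id"] for v in voices if engine in v["engines"])

def group_by_language(voices):
    """Map each languageCode to the sorted voice IDs available for it."""
    groups = defaultdict(list)
    for voice in voices:
        groups[voice["languageCode"]].append(voice["id"])
    return {code: sorted(ids) for code, ids in groups.items()}

# Illustrative sample only; engine support here is not authoritative.
sample_response = {
    "voices": [
        {"id": "Joanna", "name": "Joanna", "gender": "Female",
         "languageCode": "en-US", "languageName": "US English",
         "engines": ["neural", "standard"]},
        {"id": "Matthew", "name": "Matthew", "gender": "Male",
         "languageCode": "en-US", "languageName": "US English",
         "engines": ["neural"]},
        {"id": "Amy", "name": "Amy", "gender": "Female",
         "languageCode": "en-GB", "languageName": "British English",
         "engines": ["standard"]},
    ],
    "count": 3,
}

neural_ids = voices_for_engine(sample_response["voices"], "neural")
by_language = group_by_language(sample_response["voices"])
```

The same filtering can be done server-side by passing `engine` or `languageCode` in the request; client-side grouping is mainly useful when one response feeds several dropdowns.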

Chatbot Voice Integration

How AI Voices connects to the AI Chatbot

AI Voices integrates directly with the AI Chatbot through the platform's AppBridge system. When a chatbot has a voice character assigned, every response the chatbot generates is automatically converted to spoken audio and returned alongside the text. This works across all chatbot types and delivery methods including the embeddable chat widget, webhooks, and direct API calls.

How to enable voice on a chatbot: In the AI Chatbot admin panel, edit a chatbot and set its Voice Character field to the index of a character you created in the AI Voices app. The chatbot will then route every response through the getaudio command using that character's voice settings. The audio and optional tween data are included in the chatbot's response automatically.

Voice input: The embeddable chat widget includes a microphone button when voice is enabled. Users hold the button to record audio, which is sent to the gettranscription command to convert speech to text before the chatbot processes it. This gives you full two-way voice conversations through the chat window.

Voice output: When the chatbot generates a response, it reads the character's saved settings (voice, engine, format, language, tween preference) and calls getaudio internally. The base64 audio and any tween animation data are added to the chatbot response so your front end can play the audio and animate an avatar without any extra API calls.
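
To animate an avatar from the tween data, a front end needs to know which mouth shape is active at the current playback position. The sketch below shows one way to do that lookup with a binary search. It assumes the tween objects in the chatbot response have the same `time`, `type`, and `value` shape documented for getaudio; the sample events and viseme values are illustrative.

```python
from bisect import bisect_right

def build_viseme_track(tween):
    """Keep only viseme events and sort them by time for lookup."""
    visemes = sorted(
        (event for event in tween if event["type"] == "viseme"),
        key=lambda event: event["time"],
    )
    times = [event["time"] for event in visemes]
    values = [event["value"] for event in visemes]
    return times, values

def active_viseme(track, playback_ms):
    """Return the most recent viseme at playback_ms, or None before the first one."""
    times, values = track
    index = bisect_right(times, playback_ms) - 1
    return values[index] if index >= 0 else None

# Illustrative tween events, not real output.
sample_tween = [
    {"time": 10, "type": "viseme", "value": "p"},
    {"time": 0, "type": "word", "value": "Hello"},
    {"time": 120, "type": "viseme", "value": "E"},
    {"time": 300, "type": "viseme", "value": "sil"},
]

track = build_viseme_track(sample_tween)
shape_now = active_viseme(track, 130)
```

In a browser, the same lookup would typically run on each animation frame, with the audio element's current time converted to milliseconds.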

Admin: Voice Characters

Managing voice presets in the admin panel

The Voice Characters page in the admin panel lets you create and manage named voice presets. Each character saves a complete set of voice settings: Polly voice ID, engine, output format, language code, text type, and whether to include tween data. Once saved, reference a character by its index number in any getaudio API call or assign it to a chatbot for automatic voice output.

Character Settings

Setting	Description
Character Name	A display name for this voice preset (e.g. "Customer Support Voice", "Spanish Narrator").
Polly Voice ID	The AWS Polly voice to use. Examples: `Joanna`, `Matthew`, `Salli`, `Amy`, `Brian`. Use the `listvoices` API command to see all options.
Engine	Speech engine: `standard`, `neural`, `long-form`, or `generative`. Neural and generative produce the most natural speech. Not all voices support all engines.
Output Format	`mp3` (recommended for web) or `ogg_vorbis`.
Language Code	Optional. Only needed for bilingual voices or to override the voice's default language (e.g. `en-US`, `es-ES`).
Text Type	`text` (default) or `ssml` for advanced speech control with SSML markup.
Tween Data	Set to `1` to always include lip-sync animation data with this character's audio. Enabling this doubles billing for each request.
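
The relationship between a saved character and per-request fields can be modeled as a simple merge, sketched below. This illustrates the documented behavior (request fields override character defaults) and is not the platform's implementation. The class and function names are hypothetical; the default values mirror the defaults listed for getaudio.

```python
from dataclasses import dataclass, asdict

@dataclass
class VoiceCharacter:
    """Hypothetical model of one saved character preset."""
    name: str
    voice: str = "Joanna"
    engine: str = "neural"
    outputFormat: str = "mp3"
    languageCode: str = ""
    textType: str = "text"
    getTween: int = 0

def resolve_settings(character, request_fields):
    """Start from the character's saved settings, then apply request overrides."""
    settings = asdict(character)
    settings.pop("name")  # the display name is not a synthesis setting
    for key, value in request_fields.items():
        if key in settings:  # ignore non-voice fields such as text or charID
            settings[key] = value
    return settings

support = VoiceCharacter("Customer Support Voice", voice="Matthew", getTween=1)
resolved = resolve_settings(support, {"text": "Hi", "charID": 0, "engine": "standard"})
```

Here the request changes only the engine, so the voice, format, and tween preference still come from the saved character.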