Data Aggregator - API Documentation

Full reference for all commands, background jobs, and admin features.

In all URLs below, replace XXX with your AI Apps Account ID, found in your account settings.

How Endpoints Work

Two ways to call each endpoint - read this first
App Command (API Call)
POST your request to https://api.aiappsapi.com with your API key and account ID in the POST body. The fields described in each section below go inside the jsonData POST field as a JSON-encoded string. Use this method when your server is making the call and you have your API credentials available.

Mode / Webhook (Direct URL)
POST directly to the endpoint URL shown in each section. The fields go in the POST body as standard form fields (the same fields you would include in an HTML form POST). Use this method for provider webhooks, website forms, or any third-party system that sends form data. No API key is required - the account ID is part of the URL.
Two sides of this app: The AI analysis side (aggregatedata, summarizeconversation, summarizedataset) uses cloud AI models to analyze and summarize your data. The ML pipeline side (predict, trainmodel) runs local machine learning models with zero per-request AI costs after training. Both sides are fully accessible through the API and work with Chain Commands.
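As a concrete sketch of the App Command style, the snippet below builds the POST body in Python. The jsonData encoding is documented above; the credential field names (apiKey, accountID) are illustrative placeholders, so substitute whatever your account settings show.

```python
import json

API_URL = "https://api.aiappsapi.com"

def build_app_command_body(api_key, account_id, app, command, fields):
    """Assemble the POST body for an App Command call.

    Command-specific fields are JSON-encoded into the single jsonData
    field. The apiKey/accountID field names are placeholders, not
    confirmed by this documentation.
    """
    return {
        "apiKey": api_key,        # placeholder credential field name
        "accountID": account_id,  # placeholder credential field name
        "app": app,
        "command": command,
        "jsonData": json.dumps(fields),
    }

body = build_app_command_body(
    "MY_API_KEY", "XXX", "dataaggregator", "aggregatedata",
    {"data": ["Great product!", "Shipping was slow", "Love it"]},
)
# To send: requests.post(API_URL, data=body), or any HTTP client.
```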

Predict Through Pipeline

command: predict
Run input data through a trained ML pipeline and get a prediction. The pipeline processes each model step in sequence, passing each step's output as the next step's input, and returns the final result. If the pipeline has live training enabled, each model in the sequence also learns from the prediction automatically. No cloud AI costs - all predictions run locally.
POST App Command URL
https://api.aiappsapi.com     app: dataaggregator     command: predict
This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Required Fields
Field Name | Field Key | Notes
Pipeline ID | pipelineID | The index number of the pipeline to run (0 = first pipeline).
Input | input | The data to classify or predict on. For text models, pass a string. For numeric models, pass an array of numbers. Example text: "I want to cancel my order". Example numeric: [1200, 3, 2, 1995]
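A minimal sketch of the jsonData contents for predict, covering both documented input shapes (a string for text models, an array of numbers for numeric models). The helper name predict_payload and the pipeline index 0 are just for illustration.

```python
import json

def predict_payload(pipeline_id, input_value):
    # jsonData contents for command: predict
    return json.dumps({"pipelineID": pipeline_id, "input": input_value})

text_payload = predict_payload(0, "I want to cancel my order")  # text model
numeric_payload = predict_payload(0, [1200, 3, 2, 1995])        # numeric model
```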

Analyze Data with AI

command: aggregatedata
Send any data array to a cloud AI model for analysis. The AI produces a structured summary highlighting patterns, common groupings, outlier data, and key statistics. Results are also stored in your account reports for later retrieval. Use this for ad-hoc analysis of customer feedback, survey responses, transaction records, or any data collection.
POST App Command URL
https://api.aiappsapi.com     app: dataaggregator     command: aggregatedata
This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Required Fields
Field Name | Field Key | Notes
Data | data | An array of items to analyze. Can be strings, objects, or any structured data. Example: ["Great product!", "Shipping was slow", "Love it"]
Optional Fields
Field Name | Field Key | Notes
Label | label | A label for this analysis report. Defaults to dataaggregator. Used to identify the report in your stored results.
Custom Prompt | prompt | Override the default analysis prompt. Tell the AI exactly what kind of summary or analysis you want from the data.
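One way to build the aggregatedata payload, including the optional fields only when they are set. This is a sketch under the field names documented above; the helper itself is not part of the API.

```python
import json

def aggregatedata_payload(data, label=None, prompt=None):
    # Required data field, plus optional overrides omitted when unset
    fields = {"data": data}
    if label is not None:
        fields["label"] = label
    if prompt is not None:
        fields["prompt"] = prompt
    return json.dumps(fields)

payload = aggregatedata_payload(
    ["Great product!", "Shipping was slow", "Love it"],
    label="q4-feedback",
    prompt="Group the comments by sentiment and list the top complaints.",
)
```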

Summarize Conversation

command: summarizeconversation
Compress a long conversation history into a compact summary while keeping the most recent messages verbatim. The older portion is summarized into a narrative and merged with the recent messages so full context is preserved without the storage and processing overhead. Works with chatbot, SMS, email, and live operator conversations. You can pass messages inline or reference an existing conversation by app and conversation ID.
POST App Command URL
https://api.aiappsapi.com     app: dataaggregator     command: summarizeconversation
This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Provide messages one of two ways: pass them inline using filecontent, or pull them from an existing conversation by specifying app and convoID.
Fields
Field Name | Field Key | Notes
Inline Messages | filecontent | Messages as a JSON array or jsonL text (one JSON object per line). Each object should have role and content fields matching the conversation format.
App Name | app | The app that owns the conversation (e.g. chatbot, smsbroadcast). Used with convoID to load messages from the database.
Conversation ID | convoID | The ID of the conversation to consolidate. Used with app.
Max Length | maxLength | Maximum character length for the consolidated result. Default: 5000.
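The two ways of providing messages can be sketched as two payload builders: one that serializes an inline message list to jsonL for filecontent, and one that references a stored conversation by app and convoID. Both helpers are illustrative only.

```python
import json

def inline_messages_payload(messages, max_length=5000):
    # Pass messages inline: one JSON object per line (jsonL) in filecontent
    filecontent = "\n".join(json.dumps(m) for m in messages)
    return json.dumps({"filecontent": filecontent, "maxLength": max_length})

def stored_conversation_payload(app, convo_id, max_length=5000):
    # Or pull messages from an existing conversation instead
    return json.dumps({"app": app, "convoID": convo_id, "maxLength": max_length})

payload = inline_messages_payload([
    {"role": "user", "content": "Where is my order?"},
    {"role": "assistant", "content": "It shipped yesterday."},
])
```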

Summarize Dataset

job: summarizedataset
Run an AI-powered summary on a stored dataset using a named summary configuration. The summary config defines which dataset to analyze, which AI model to use, the maximum data size, and an optional custom prompt. Results are stored in your account reports keyed by date. This runs as a background job.
POST App Command URL (Background Job)
https://api.aiappsapi.com     app: dataaggregator     job: summarizedataset
This runs as a background job. Configure your summary settings in the admin area under the Summaries page.

Required Fields
Field Name | Field Key | Notes
Dataset ID | datasetID | The index number of the dataset to summarize (0 = first dataset).
Summary ID | summaryID | The index number of the summary configuration to use (0 = first summary config).
Optional Fields
Field Name | Field Key | Notes
Inline Data | data | Pass data inline instead of using the stored dataset. If provided, this overrides the dataset's stored entries.
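Background jobs are posted with a job field rather than a command field, matching the URL line above. A sketch of the full POST body for this job follows; as elsewhere, the apiKey/accountID field names are placeholders.

```python
import json

def summarizedataset_job_body(api_key, account_id, dataset_id, summary_id, data=None):
    """POST body for the summarizedataset background job.

    Uses job: summarizedataset instead of a command field.
    """
    fields = {"datasetID": dataset_id, "summaryID": summary_id}
    if data is not None:
        fields["data"] = data  # inline data overrides the stored dataset
    return {
        "apiKey": api_key,        # placeholder credential field name
        "accountID": account_id,  # placeholder credential field name
        "app": "dataaggregator",
        "job": "summarizedataset",
        "jsonData": json.dumps(fields),
    }
```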

Train Model

job: trainmodel
Train a specific model step within a pipeline. Models that support incremental training (KNN, Naive Bayes, DNN, Logistic Regression, KNN Regressor, MLP Regressor, K-Means) can learn one row at a time or from a batch without losing previous learning. All other models require a full dataset and retrain from scratch each time. Training data can be passed inline as a single row, as a jsonL batch, or pulled from a stored dataset file.
POST App Command URL (Background Job)
https://api.aiappsapi.com     app: dataaggregator     job: trainmodel
This runs as a background job. You can also train models directly from the Pipeline Editor in the admin area.

Required Fields
Field Name | Field Key | Notes
Pipeline ID | pipelineID | The index number of the pipeline containing the model.
Model Index | modelIndex | The index of the model step within the pipeline (0 = first step).
Training Data (provide at least one)
Field Name | Field Key | Notes
Input | input | A single training input. For text models: a string. For numeric models: an array of numbers.
Label | label | The target value for this input. Required for classifiers (string category) and regressors (number). Not used for clusterers or anomaly detectors.
Batch Data | filecontent | Multiple training rows as jsonL (one JSON object per line). Each row needs an input field and optionally a label field. Example: {"input": "cancel order", "label": "cancel"}
Retrain | retrain | Set to 1 to retrain a full-retrain model from its stored dataset file. No new data is needed - the model rebuilds from everything stored so far.
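Batch training hinges on serializing rows to jsonL for the filecontent field. The sketch below builds a trainmodel payload from a list of input/label rows; the helper name is illustrative, and the row format follows the Batch Data notes above.

```python
import json

def trainmodel_batch_payload(pipeline_id, model_index, rows):
    # rows: list of {"input": ..., "label": ...} dicts -> jsonL batch
    filecontent = "\n".join(json.dumps(r) for r in rows)
    return json.dumps({
        "pipelineID": pipeline_id,
        "modelIndex": model_index,
        "filecontent": filecontent,
    })

payload = trainmodel_batch_payload(0, 0, [
    {"input": "cancel my order", "label": "cancel"},
    {"input": "where is my package", "label": "status"},
])
```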

Admin: Datasets

Manage data collections for AI summarization
Datasets hold the data entries used for AI summarization jobs. Create named datasets, set their type (text or numeric), add entries one by one or paste bulk jsonL via the Upload page, and optionally add a description. Each dataset can be targeted by one or more summary configs.
Dataset Fields
Field | Description
Name | A descriptive name for the dataset (e.g. "Q4 Customer Feedback").
Data Type | Text (each entry is a string like a review or message) or Numeric (each entry is an array of numbers like measurements or scores).
Description | Optional notes about what data this dataset contains and where it comes from.
Data Entries | The actual data items. Each entry is a JSON object. Add them one at a time on the edit page, or bulk import via the Upload page.

Admin: Summary Configs

Configure AI-powered analysis jobs
Summary configs define how the AI analyzes a dataset. Each config targets a specific dataset and lets you choose the AI model, set a data size limit, and provide a custom analysis prompt. Run the summary through the API as a background job using the summarizedataset command.
Summary Config Fields
Field | Description
Name | A label for this summary config (e.g. "Q4 Feedback Summary").
Dataset Index | Which dataset to summarize (0 = first dataset in the Datasets list).
Max Data Length | Maximum characters of data to send to the AI. Larger values give more context but cost more. If the data exceeds this limit, the most recent rows are used. Default: 5000.
AI Model | Which AI model to use for the analysis. Options include GPT and Claude models.
Custom Prompt | Override the default analysis prompt. Leave blank to use the default, which highlights patterns, frequent values, and outliers.

Admin: Bulk Upload

Import data into datasets in bulk
The Bulk Upload page lets you paste a block of jsonL data (one JSON object per line) to import many entries into a dataset at once. Existing entries in the target dataset are preserved and new rows are appended. Each upload batch gets a label and status for tracking.
Upload Fields
Field | Description
Label | A label for this upload batch (e.g. "Q4 Survey Import").
Dataset Index | Which dataset to append entries to (0 = first dataset).
jsonL Data | One JSON object per line. Must match the format of the target dataset. Example: {"input": "I love this product", "label": "positive"}

Admin: Pipeline Editor

Build and train ML pipeline sequences
The Pipeline Editor is the main interface for building, configuring, and training your custom ML models. Each pipeline is a sequence of one or more model steps that run in order. A single-step pipeline works like a standalone model. Multi-step pipelines pass each step's prediction as input to the next step. For complex workflows with branching logic, build those in Chain Commands and call this pipeline as one step in that chain.
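The step-to-step hand-off described above amounts to a simple fold: each step's prediction becomes the next step's input, and the last output is the pipeline's result. The Step class below is purely illustrative, not the platform's internal API.

```python
class Step:
    """Illustrative stand-in for one model step in a pipeline."""
    def __init__(self, fn):
        self.fn = fn

    def predict(self, value):
        return self.fn(value)

def run_pipeline(steps, value):
    # Each step's output feeds the next step's input; a single-step
    # pipeline therefore behaves like a standalone model.
    for step in steps:
        value = step.predict(value)
    return value

# A toy two-step pipeline: normalize text, then classify by keyword.
pipeline = [
    Step(lambda text: text.lower().strip()),
    Step(lambda text: "cancel" if "cancel" in text else "other"),
]
print(run_pipeline(pipeline, "  Please CANCEL my order  "))  # -> cancel
```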
Pipeline List Page
Feature | Description
Create Pipeline | Give your pipeline a name and it is created immediately, ready for you to add model steps.
View Pipelines | See all your pipelines with the number of model steps in each one. Click any pipeline to edit it.
Pipeline Edit Page
Feature | Description
Live Training | When enabled, the pipeline trains on every live prediction request. Each model in the sequence learns from what it predicts. Best for data that does not have an exact right answer, or when you want models to learn on their own.
Add Model Steps | Add one or more model steps to the pipeline. Each step has its own data type, category, algorithm, and parameter settings.
Configure Each Step | For each model step, choose the data type (text or numeric), category (classifier, regressor, clusterer, anomaly), algorithm, and tuning parameters. Save each step's config individually.
Train Incremental Models | Models that support incremental training show an inline training form where you can enter one input/label pair at a time and train immediately.
Manage Full-Retrain Datasets | Models that require a full dataset have a separate dataset management page where you can add rows, view all training data, delete rows, clear the dataset, and retrain the model from the complete dataset.
Delete Pipeline | Permanently delete a pipeline and all its trained model files and dataset files.

Available ML Models

18 algorithms across 4 categories
All models run locally using Rubix ML with no cloud AI costs after training. Models marked with * support incremental training (can learn one row at a time without rebuilding). All other models require a full dataset and retrain from scratch.
Classifiers - predict a category label from input data
Algorithm | Key | Best For
K-Nearest Neighbors * | knn | Very small datasets (under 500 rows). Compares input to the closest training examples.
Gaussian Naive Bayes * | naiveBayes | Text classification. Fast and effective for spam detection, sentiment, and topic sorting.
Deep Neural Network * | dnn | Large datasets with complex patterns (1000+ rows). Configurable layers, dropout, and learning rate.
Random Forest | randomForest | Best general-purpose classifier. Ensemble of decision trees. Strong accuracy on most tabular data.
Decision Tree | tree | When you need interpretable results. A single tree that is easy to understand and explain.
AdaBoost | adaBoost | Class imbalance problems where one category is much rarer than others.
Logistic Regression * | logisticRegression | Fast linear baseline. Good first model to test before trying more complex algorithms.
Regressors - predict a numeric value from input data
Algorithm | Key | Best For
Ridge Regression | ridge | Fast linear baseline with regularization. First choice for linear relationships.
KNN Regressor * | knnRegressor | Simple non-linear predictions. Averages the values of the nearest neighbors.
Support Vector Regression | svr | Small-to-medium datasets (under 10,000 rows). Finds optimal regression boundaries.
MLP Regressor * | mlpRegressor | Complex non-linear patterns. Neural network regression with configurable architecture.
Gradient Boost | gradientBoost | Most powerful for tabular numeric data. Ensemble of boosted regression trees.
Clusterers - find natural groups with no labels needed
Algorithm | Key | Best For
K-Means * | kmeans | Fast clustering when you know roughly how many groups to expect. Specify k (the number of clusters).
DBSCAN | dbscan | Density-based clustering. No k required. Finds irregular-shaped groups and detects noise points.
Fuzzy C-Means | fuzzyCMeans | Soft clustering where items can belong to multiple groups with membership probabilities.
Anomaly Detectors - train on normal data, flag outliers
Algorithm | Key | Best For
Isolation Forest | isolationForest | Best general-purpose anomaly detection. Works well on most data types and sizes.
Local Outlier Factor | localOutlierFactor | Density-based detection. Best when normal data forms clear, tight clusters.
Robust Z-Score | robustZScore | Simplest and fastest option. Flags values that exceed a statistical threshold.
Models marked with * support incremental training and can be trained one row at a time through the admin panel or API without losing previous learning. All other models must be retrained from a complete dataset.