Data Aggregator - API Documentation

Full reference for all commands, background jobs, and admin features.

In all URLs below, replace XXX with your AI Apps Account ID, found in your account settings.

How Endpoints Work

Two ways to call each endpoint - read this first
App Command (API Call)
POST your request to https://api.aiappsapi.com with your API key and account ID in the POST body. The fields described in each section below go inside the jsonData POST field as a JSON-encoded string. Use this method when your server is making the call and you have your API credentials available.

Mode / Webhook (Direct URL)
POST directly to the endpoint URL shown in each section. The fields go in the POST body as standard form fields (the same fields you would include in an HTML form POST). Use this method for provider webhooks, website forms, or any third-party system that sends form data. No API key is required - the account ID is part of the URL.
Two sides of this app: The AI analysis side (aggregatedata, summarizeconversation, summarizedataset) uses cloud AI models to analyze and summarize your data. The ML pipeline side (predict, trainmodel) runs local machine learning models with zero per-request AI costs after training. Both sides are fully accessible through the API and work with Chain Commands.
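As a concrete sketch of the App Command style, the snippet below builds the POST body in Python. The jsonData encoding is documented above; the credential field names (apiKey, accountID) are illustrative placeholders, so substitute whatever your account settings show.

```python
import json

API_URL = "https://api.aiappsapi.com"

def build_app_command_body(api_key, account_id, app, command, fields):
    """Assemble the POST body for an App Command call.

    Command-specific fields are JSON-encoded into the single jsonData
    field. The apiKey/accountID field names are placeholders, not
    confirmed by this documentation.
    """
    return {
        "apiKey": api_key,        # placeholder credential field name
        "accountID": account_id,  # placeholder credential field name
        "app": app,
        "command": command,
        "jsonData": json.dumps(fields),
    }

body = build_app_command_body(
    "MY_API_KEY", "XXX", "dataaggregator", "aggregatedata",
    {"data": ["Great product!", "Shipping was slow", "Love it"]},
)
# To send: requests.post(API_URL, data=body), or any HTTP client.
```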

Predict Through Pipeline

command: predict
Run input data through a trained ML pipeline and get a prediction. The pipeline processes each model step in sequence, passing each step's output as the next step's input, and returns the final result. If the pipeline has live training enabled, each model in the sequence also learns from the prediction automatically. No cloud AI costs - all predictions run locally.
POST App Command URL
https://api.aiappsapi.com     app: dataaggregator     command: predict
This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Required Fields
Field Name | Field Key | Notes
Pipeline ID | pipelineID | The index number of the pipeline to run (0 = first pipeline).
Input | input | The data to classify or predict on. For text models, pass a string. For numeric models, pass an array of numbers. Example text: "I want to cancel my order". Example numeric: [1200, 3, 2, 1995]
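A minimal sketch of the jsonData contents for predict, covering both documented input shapes (a string for text models, an array of numbers for numeric models). The helper name predict_payload and the pipeline index 0 are just for illustration.

```python
import json

def predict_payload(pipeline_id, input_value):
    # jsonData contents for command: predict
    return json.dumps({"pipelineID": pipeline_id, "input": input_value})

text_payload = predict_payload(0, "I want to cancel my order")  # text model
numeric_payload = predict_payload(0, [1200, 3, 2, 1995])        # numeric model
```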

Analyze Data with AI

command: aggregatedata
Send any data array to a cloud AI model for analysis. The AI produces a structured summary highlighting patterns, common groupings, outlier data, and key statistics. Results are also stored in your account reports for later retrieval. Use this for ad-hoc analysis of customer feedback, survey responses, transaction records, or any data collection.
POST App Command URL
https://api.aiappsapi.com     app: dataaggregator     command: aggregatedata
This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Required Fields
Field Name | Field Key | Notes
Data | data | An array of items to analyze. Can be strings, objects, or any structured data. Example: ["Great product!", "Shipping was slow", "Love it"]
Optional Fields
Field Name | Field Key | Notes
Label | label | A label for this analysis report. Defaults to dataaggregator. Used to identify the report in your stored results.
Custom Prompt | prompt | Override the default analysis prompt. Tell the AI exactly what kind of summary or analysis you want from the data.
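One way to build the aggregatedata payload, including the optional fields only when they are set. This is a sketch under the field names documented above; the helper itself is not part of the API.

```python
import json

def aggregatedata_payload(data, label=None, prompt=None):
    # Required data field, plus optional overrides omitted when unset
    fields = {"data": data}
    if label is not None:
        fields["label"] = label
    if prompt is not None:
        fields["prompt"] = prompt
    return json.dumps(fields)

payload = aggregatedata_payload(
    ["Great product!", "Shipping was slow", "Love it"],
    label="q4-feedback",
    prompt="Group the comments by sentiment and list the top complaints.",
)
```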

Summarize Conversation

command: summarizeconversation
Compress a long conversation history into a compact summary while keeping the most recent messages verbatim. The older portion is summarized into a narrative and merged with the recent messages so full context is preserved without the storage and processing overhead. Works with chatbot, SMS, email, and live operator conversations. You can pass messages inline or reference an existing conversation by app and conversation ID.
POST App Command URL
https://api.aiappsapi.com     app: dataaggregator     command: summarizeconversation
This command is only available as an App Command. It is not available as a direct mode/webhook URL.

Provide messages one of two ways: pass them inline using filecontent, or pull them from an existing conversation by specifying app and convoID.
Fields
Field Name | Field Key | Notes
Inline Messages | filecontent | Messages as a JSON array or jsonL text (one JSON object per line). Each object should have role and content fields matching the conversation format.
App Name | app | The app that owns the conversation (e.g. chatbot, smsbroadcast). Used with convoID to load messages from the database.
Conversation ID | convoID | The ID of the conversation to consolidate. Used with app.
Max Length | maxLength | Maximum character length for the consolidated result. Default: 5000.
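The two ways of providing messages can be sketched as two payload builders: one that serializes an inline message list to jsonL for filecontent, and one that references a stored conversation by app and convoID. Both helpers are illustrative only.

```python
import json

def inline_messages_payload(messages, max_length=5000):
    # Pass messages inline: one JSON object per line (jsonL) in filecontent
    filecontent = "\n".join(json.dumps(m) for m in messages)
    return json.dumps({"filecontent": filecontent, "maxLength": max_length})

def stored_conversation_payload(app, convo_id, max_length=5000):
    # Or pull messages from an existing conversation instead
    return json.dumps({"app": app, "convoID": convo_id, "maxLength": max_length})

payload = inline_messages_payload([
    {"role": "user", "content": "Where is my order?"},
    {"role": "assistant", "content": "It shipped yesterday."},
])
```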

Summarize Dataset

job: summarizedataset
Run an AI-powered summary on a stored dataset using a named summary configuration. The summary config defines which dataset to analyze, which AI model to use, the maximum data size, and an optional custom prompt. Results are stored in your account reports keyed by date. This runs as a background job.
POST App Command URL (Background Job)
https://api.aiappsapi.com     app: dataaggregator     job: summarizedataset
This runs as a background job. Configure your summary settings in the admin area under the Summaries page.

Required Fields
Field Name | Field Key | Notes
Dataset ID | datasetID | The index number of the dataset to summarize (0 = first dataset).
Summary ID | summaryID | The index number of the summary configuration to use (0 = first summary config).
Optional Fields
Field Name | Field Key | Notes
Inline Data | data | Pass data inline instead of using the stored dataset. If provided, this overrides the dataset's stored entries.
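Background jobs are posted with a job field rather than a command field, matching the URL line above. A sketch of the full POST body for this job follows; as elsewhere, the apiKey/accountID field names are placeholders.

```python
import json

def summarizedataset_job_body(api_key, account_id, dataset_id, summary_id, data=None):
    """POST body for the summarizedataset background job.

    Uses job: summarizedataset instead of a command field.
    """
    fields = {"datasetID": dataset_id, "summaryID": summary_id}
    if data is not None:
        fields["data"] = data  # inline data overrides the stored dataset
    return {
        "apiKey": api_key,        # placeholder credential field name
        "accountID": account_id,  # placeholder credential field name
        "app": "dataaggregator",
        "job": "summarizedataset",
        "jsonData": json.dumps(fields),
    }
```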

Train Model

job: trainmodel
Train a specific model step within a pipeline. Models that support incremental training (KNN, Naive Bayes, DNN, Logistic Regression, KNN Regressor, MLP Regressor, K-Means) can learn one row at a time or from a batch without losing previous learning. All other models require a full dataset and retrain from scratch each time. Training data can be passed inline as a single row, as a jsonL batch, or pulled from a stored dataset file.
POST App Command URL (Background Job)
https://api.aiappsapi.com     app: dataaggregator     job: trainmodel
This runs as a background job. You can also train models directly from the Pipeline Editor in the admin area.

Required Fields
Field Name | Field Key | Notes
Pipeline ID | pipelineID | The index number of the pipeline containing the model.
Model Index | modelIndex | The index of the model step within the pipeline (0 = first step).
Training Data (provide at least one)
Field Name | Field Key | Notes
Input | input | A single training input. For text models: a string. For numeric models: an array of numbers.
Label | label | The target value for this input. Required for classifiers (string category) and regressors (number). Not used for clusterers or anomaly detectors.
Batch Data | filecontent | Multiple training rows as jsonL (one JSON object per line). Each row needs an input field and optionally a label field. Example: {"input": "cancel order", "label": "cancel"}
Retrain | retrain | Set to 1 to retrain a full-retrain model from its stored dataset file. No new data is needed - the model rebuilds from everything stored so far.
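Batch training hinges on serializing rows to jsonL for the filecontent field. The sketch below builds a trainmodel payload from a list of input/label rows; the helper name is illustrative, and the row format follows the Batch Data notes above.

```python
import json

def trainmodel_batch_payload(pipeline_id, model_index, rows):
    # rows: list of {"input": ..., "label": ...} dicts -> jsonL batch
    filecontent = "\n".join(json.dumps(r) for r in rows)
    return json.dumps({
        "pipelineID": pipeline_id,
        "modelIndex": model_index,
        "filecontent": filecontent,
    })

payload = trainmodel_batch_payload(0, 0, [
    {"input": "cancel my order", "label": "cancel"},
    {"input": "where is my package", "label": "status"},
])
```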

Admin: Datasets

Manage data collections for AI summarization
Datasets hold the data entries used for AI summarization jobs. Create named datasets, set their type (text or numeric), add entries one by one or paste bulk jsonL via the Upload page, and optionally add a description. Each dataset can be targeted by one or more summary configs.
Dataset Fields
Field | Description
Name | A descriptive name for the dataset (e.g. "Q4 Customer Feedback").
Data Type | Text (each entry is a string like a review or message) or Numeric (each entry is an array of numbers like measurements or scores).
Description | Optional notes about what data this dataset contains and where it comes from.
Data Entries | The actual data items. Each entry is a JSON object. Add them one at a time on the edit page, or bulk import via the Upload page.

Admin: Summary Configs

Configure AI-powered analysis jobs
Summary configs define how the AI analyzes a dataset. Each config targets a specific dataset and lets you choose the AI model, set a data size limit, and provide a custom analysis prompt. Run the summary through the API as a background job using the summarizedataset command.
Summary Config Fields
Field | Description
Name | A label for this summary config (e.g. "Q4 Feedback Summary").
Dataset Index | Which dataset to summarize (0 = first dataset in the Datasets list).
Max Data Length | Maximum characters of data to send to the AI. Larger values give more context but cost more. If the data exceeds this limit, the most recent rows are used. Default: 5000.
AI Model | Which AI model to use for the analysis. Options include GPT and Claude models.
Custom Prompt | Override the default analysis prompt. Leave blank to use the default, which highlights patterns, frequent values, and outliers.

Admin: Bulk Upload

Import data into datasets in bulk
The Bulk Upload page lets you paste a block of jsonL data (one JSON object per line) to import many entries into a dataset at once. Existing entries in the target dataset are preserved and new rows are appended. Each upload batch gets a label and status for tracking.
Upload Fields
Field | Description
Label | A label for this upload batch (e.g. "Q4 Survey Import").
Dataset Index | Which dataset to append entries to (0 = first dataset).
jsonL Data | One JSON object per line. Must match the format of the target dataset. Example: {"input": "I love this product", "label": "positive"}

Admin: Pipeline Editor

Build and train ML pipeline sequences
The Pipeline Editor is the main interface for building, configuring, and training your custom ML models. Each pipeline is a sequence of one or more model steps that run in order. A single-step pipeline works like a standalone model. Multi-step pipelines pass each step's prediction as input to the next step. For complex workflows with branching logic, build those in Chain Commands and call this pipeline as one step in that chain.
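The step-to-step hand-off described above amounts to a simple fold: each step's prediction becomes the next step's input, and the last output is the pipeline's result. The Step class below is purely illustrative, not the platform's internal API.

```python
class Step:
    """Illustrative stand-in for one model step in a pipeline."""
    def __init__(self, fn):
        self.fn = fn

    def predict(self, value):
        return self.fn(value)

def run_pipeline(steps, value):
    # Each step's output feeds the next step's input; a single-step
    # pipeline therefore behaves like a standalone model.
    for step in steps:
        value = step.predict(value)
    return value

# A toy two-step pipeline: normalize text, then classify by keyword.
pipeline = [
    Step(lambda text: text.lower().strip()),
    Step(lambda text: "cancel" if "cancel" in text else "other"),
]
print(run_pipeline(pipeline, "  Please CANCEL my order  "))  # -> cancel
```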
Pipeline List Page
Feature | Description
Create Pipeline | Give your pipeline a name and it is created immediately, ready for you to add model steps.
View Pipelines | See all your pipelines with the number of model steps in each one. Click any pipeline to edit it.
Pipeline Edit Page
Feature | Description
Live Training | When enabled, the pipeline trains on every live prediction request. Each model in the sequence learns from what it predicts. Best for data that does not have an exact right answer, or when you want models to learn on their own.
Add Model Steps | Add one or more model steps to the pipeline. Each step has its own data type, category, algorithm, and parameter settings.
Configure Each Step | For each model step, choose the data type (text or numeric), category (classifier, regressor, clusterer, anomaly), algorithm, and tuning parameters. Save each step's config individually.
Train Incremental Models | Models that support incremental training show an inline training form where you can enter one input/label pair at a time and train immediately.
Manage Full-Retrain Datasets | Models that require a full dataset have a separate dataset management page where you can add rows, view all training data, delete rows, clear the dataset, and retrain the model from the complete dataset.
Delete Pipeline | Permanently delete a pipeline and all its trained model files and dataset files.

Available ML Models

18 algorithms across 4 categories
All models run locally using Rubix ML with no cloud AI costs after training. Models marked with * support incremental training (can learn one row at a time without rebuilding). All other models require a full dataset and retrain from scratch.
Classifiers - predict a category label from input data
Algorithm | Key | Best For
K-Nearest Neighbors * | knn | Very small datasets (under 500 rows). Compares input to the closest training examples.
Gaussian Naive Bayes * | naiveBayes | Text classification. Fast and effective for spam detection, sentiment, and topic sorting.
Deep Neural Network * | dnn | Large datasets with complex patterns (1000+ rows). Configurable layers, dropout, and learning rate.
Random Forest | randomForest | Best general-purpose classifier. Ensemble of decision trees. Strong accuracy on most tabular data.
Decision Tree | tree | When you need interpretable results. A single tree that is easy to understand and explain.
AdaBoost | adaBoost | Class imbalance problems where one category is much rarer than others.
Logistic Regression * | logisticRegression | Fast linear baseline. Good first model to test before trying more complex algorithms.
Regressors - predict a numeric value from input data
Algorithm | Key | Best For
Ridge Regression | ridge | Fast linear baseline with regularization. First choice for linear relationships.
KNN Regressor * | knnRegressor | Simple non-linear predictions. Averages the values of the nearest neighbors.
Support Vector Regression | svr | Small-to-medium datasets (under 10,000 rows). Finds optimal regression boundaries.
MLP Regressor * | mlpRegressor | Complex non-linear patterns. Neural network regression with configurable architecture.
Gradient Boost | gradientBoost | Most powerful for tabular numeric data. Ensemble of boosted regression trees.
Clusterers - find natural groups with no labels needed
Algorithm | Key | Best For
K-Means * | kmeans | Fast clustering when you know roughly how many groups to expect. Specify k (the number of clusters).
DBSCAN | dbscan | Density-based clustering. No k required. Finds irregular-shaped groups and detects noise points.
Fuzzy C-Means | fuzzyCMeans | Soft clustering where items can belong to multiple groups with membership probabilities.
Anomaly Detectors - train on normal data, flag outliers
Algorithm | Key | Best For
Isolation Forest | isolationForest | Best general-purpose anomaly detection. Works well on most data types and sizes.
Local Outlier Factor | localOutlierFactor | Density-based detection. Best when normal data forms clear, tight clusters.
Robust Z-Score | robustZScore | Simplest and fastest option. Flags values that exceed a statistical threshold.
Models marked with * support incremental training and can be trained one row at a time through the admin panel or API without losing previous learning. All other models must be retrained from a complete dataset.