Tutorials · 12 min read

Getting Started with AutoML in CorePlexML

CorePlexML Team

Introduction

Machine learning has transformed nearly every industry, from healthcare diagnostics to financial fraud detection to supply chain optimization. Yet for most organizations, the barrier to entry remains frustratingly high. Traditional ML workflows demand expertise in algorithm selection, feature engineering, hyperparameter tuning, and cross-validation strategies. A single project can take weeks of iteration before a data scientist arrives at a model worth deploying.

Automated Machine Learning, or AutoML, changes the equation entirely. AutoML systems automate the most labor-intensive parts of the ML pipeline: they test dozens of algorithms, engineer features, tune hyperparameters, and even build ensemble models that combine the strengths of multiple learners. The result is production-quality models in a fraction of the time, accessible to both experienced data scientists who want to accelerate their workflow and domain experts who understand their data but lack deep ML expertise.

CorePlexML's AutoML engine is built on top of H2O.ai, one of the most battle-tested open-source ML frameworks available. It supports classification, regression, and time-series forecasting out of the box, and it automatically handles tasks like missing value imputation, categorical encoding, and feature interaction discovery. In this tutorial, we will walk through the complete journey from raw dataset to deployed model, covering every step in detail so you can be confident in each decision along the way.

Step 1: Upload Your Data

Every ML project begins with data. CorePlexML supports CSV, Excel (XLSX), JSON, and XML file formats, and can handle datasets ranging from a few hundred rows to several million. When you upload a file, the platform automatically infers column types, detects delimiters, and generates a schema that you can review and adjust before proceeding.

To upload through the UI, navigate to the Datasets section in your project sidebar and click New Dataset. Drag and drop your file or browse to select it. The upload process parses the file, samples the data to infer types (numeric, categorical, date, text), and presents you with a schema editor where you can rename columns, change inferred types, or exclude columns entirely.

If you prefer a programmatic approach, the SDK makes uploads straightforward:

from coreplexml import CorePlexClient

client = CorePlexClient(
    base_url="https://api.coreplexml.io",
    api_key="your-api-key"
)

# Upload a CSV file to your project
dataset = client.datasets.upload(
    project_id="proj_abc123",
    file_path="./data/customer_churn.csv",
    name="Customer Churn Dataset",
    description="Monthly churn data with 15 features, 50k rows"
)

print(f"Dataset ID: {dataset.id}")
print(f"Version: {dataset.version}")
print(f"Columns: {len(dataset.columns)}")
print(f"Rows: {dataset.row_count}")

Once uploaded, CorePlexML creates a dataset version, an immutable snapshot of your data at that point in time. This versioning is critical for reproducibility. Every experiment you run links back to a specific dataset version, so you can always trace a model's lineage back to the exact data it was trained on.

Before moving to the next step, take a moment to review the schema. Check that numeric columns were not accidentally inferred as categorical (this can happen with integer IDs or zip codes), and that date columns were parsed correctly. A clean schema at this stage prevents surprises later in the pipeline.
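
As a quick sanity check, you can scan the inferred schema for the ID and zip-code pitfall mentioned above. The sketch below is a standalone heuristic, not part of the SDK; the column names and type labels are illustrative:

```python
def flag_suspect_columns(schema):
    """Flag columns whose inferred type may be wrong.

    `schema` is a list of (name, inferred_type) pairs. The name-based
    heuristics here (id, zip) are illustrative, not platform rules.
    """
    suspects = []
    for name, inferred in schema:
        lowered = name.lower()
        # Integer IDs and zip codes are often numeric in the raw file
        # but should usually be treated as categorical identifiers.
        if inferred == "numeric" and ("id" in lowered or "zip" in lowered):
            suspects.append((name, "consider categorical"))
    return suspects

schema = [("customer_id", "numeric"), ("zip_code", "numeric"),
          ("monthly_charges", "numeric"), ("churn", "categorical")]
print(flag_suspect_columns(schema))
```

Running this against your reviewed schema takes seconds and catches the most common type-inference mistakes before they reach training.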

Step 2: Create an Experiment

With your data uploaded, it is time to create an experiment. An experiment in CorePlexML is a single AutoML training run: you specify what to predict, how long to train, and the platform handles everything else.

In the UI, navigate to Experiments and click New Experiment. You will be prompted to configure four settings. First, select the dataset version you just uploaded. Second, choose the target column, the column your model should learn to predict. Third, set the problem type. CorePlexML usually infers this automatically (classification for categorical targets, regression for numeric targets), but you can override it if needed. Fourth, configure the training budget by setting a maximum training time (in seconds) and optionally a maximum number of models to evaluate.

The SDK gives you full control over these parameters:

experiment = client.experiments.create(
    project_id="proj_abc123",
    dataset_version_id=dataset.version_id,
    target_column="churn",
    problem_type="classification",
    max_runtime_secs=600,       # 10 minutes of training
    max_models=30,              # Evaluate up to 30 models
    nfolds=5,                   # 5-fold cross-validation
    seed=42,                    # Reproducibility seed
    balance_classes=True,       # Handle class imbalance
    stopping_metric="AUC",      # Optimize for AUC
    exclude_algos=[]            # Include all algorithms
)

print(f"Experiment ID: {experiment.id}")
print(f"Status: {experiment.status}")

Behind the scenes, the AutoML engine kicks off a sophisticated search process. It begins with fast baseline models (a single GLM, a default Random Forest) to establish performance benchmarks, then progresses through increasingly complex algorithms. XGBoost and GBM models are trained with varied hyperparameters, Deep Learning models explore different network architectures, and finally, Stacked Ensembles are constructed that combine the best individual models into a single, more powerful predictor. All of this happens with automatic cross-validation, so every metric you see reflects genuine out-of-sample performance rather than overfitting to the training data.
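
To make the cross-validation guarantee concrete, here is a minimal sketch of how k-fold splitting keeps evaluation out-of-sample. This is generic illustration code, not the engine's internal implementation:

```python
def kfold_indices(n_rows, n_folds=5):
    """Yield (train_idx, valid_idx) splits for simple k-fold CV."""
    fold_size = n_rows // n_folds
    indices = list(range(n_rows))
    for k in range(n_folds):
        valid = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, valid

# Each validation fold is disjoint from its own training fold, so any
# metric computed on it reflects genuine out-of-sample performance.
splits = list(kfold_indices(100, n_folds=5))
assert all(set(tr).isdisjoint(va) for tr, va in splits)
```

With nfolds=5, every row is used for validation exactly once and for training four times, and the leaderboard metrics are aggregated across the five held-out folds.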

You can monitor progress in real time through the UI, which shows a live leaderboard updating as each model finishes training.
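
If you are scripting the workflow, you can block until training finishes with a simple polling loop. The status-check callable and the terminal state names below are assumptions for illustration; substitute the SDK's actual status accessor:

```python
import time

def wait_for_completion(get_status, poll_secs=5, timeout_secs=900):
    """Poll a status callable until it reaches a terminal state.

    `get_status` stands in for an experiment status check (hypothetical
    here); "completed" and "failed" are assumed terminal states.
    """
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_secs)
    raise TimeoutError("experiment did not finish in time")

# Simulated status sequence for illustration.
states = iter(["running", "running", "completed"])
print(wait_for_completion(lambda: next(states), poll_secs=0))
```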

Step 3: Review the Leaderboard

Once training completes, CorePlexML presents a model leaderboard ranked by your chosen stopping metric. This is where you evaluate which model best fits your needs. Each entry on the leaderboard includes the algorithm type, all relevant performance metrics, training time, and model size.

Understanding the metrics is essential for making good decisions:

  • AUC (Area Under the ROC Curve) measures a classification model's ability to distinguish between classes across all decision thresholds. Values range from 0.5 (random guessing) to 1.0 (perfect separation). AUC is particularly useful when you care about ranking predictions rather than a specific threshold.
  • Accuracy is the fraction of predictions that are correct. While intuitive, accuracy can be misleading with imbalanced datasets. A model predicting "no churn" for every customer achieves 95% accuracy if only 5% of customers churn, but it is completely useless.
  • Log Loss penalizes confident wrong predictions more heavily than uncertain ones. Lower is better. This metric is ideal when you need well-calibrated probability estimates, such as risk scoring or medical diagnosis.
  • RMSE (Root Mean Squared Error) applies to regression problems and measures the average magnitude of prediction errors in the same units as the target variable. Lower RMSE means more accurate predictions.
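
The definitions above are easy to verify by hand. The sketch below computes all three metrics from scratch on toy data (AUC via pairwise ranking, which is equivalent to the area under the ROC curve):

```python
import math

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_loss(labels, probs):
    """Mean negative log-likelihood; confident mistakes dominate."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs)) / len(labels)

def rmse(actual, predicted):
    """Root mean squared error, in the target's own units."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.7, 0.4]
print(auc(labels, probs))        # every positive outranks every negative -> 1.0
print(round(log_loss(labels, probs), 3))
```

Note that the example achieves a perfect AUC of 1.0 even though none of its probabilities are exactly 0 or 1: AUC only cares about ranking, while log loss still penalizes the residual uncertainty.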

The algorithms you will typically see on the leaderboard include:

  • XGBoost is a gradient boosting framework known for strong performance on tabular data with excellent handling of missing values and regularization to prevent overfitting.
  • GBM (Gradient Boosting Machine) is H2O's native gradient boosting implementation with slightly different defaults and optimizations compared to XGBoost.
  • Deep Learning refers to H2O's multi-layer neural network implementation, which can capture complex non-linear relationships but may require more data to generalize well.
  • GLM (Generalized Linear Model) provides a fast, interpretable baseline. It works well when relationships between features and target are approximately linear.
  • Stacked Ensemble combines predictions from all other models using a meta-learner. Ensembles frequently top the leaderboard because they leverage the diverse strengths of individual models.

As a general rule, if interpretability is paramount, consider the best GLM or single-tree GBM. If raw predictive power is the priority, the Stacked Ensemble or top XGBoost model is usually the best choice.

Step 4: Explore and Explain

Selecting a model from the leaderboard is only the beginning. Before deploying, you need to understand why the model makes the predictions it does. CorePlexML provides a suite of explainability tools that help you build trust in your model and satisfy regulatory requirements for transparency.

Feature Importance ranks the input features by their contribution to model predictions. This gives you a global view of what the model learned. You can retrieve feature importance programmatically:

curl -X GET "https://api.coreplexml.io/api/models/{model_id}/feature_importance" \
  -H "Authorization: Bearer your-api-key"

The response returns a ranked list of features with their relative importance scores. Use this to verify that the model is relying on sensible features. If an irrelevant column (like a row ID or timestamp) ranks highly, it may indicate data leakage that you should address before deployment.
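
A quick leakage check can be automated against that response. The sketch below assumes the ranked list has been parsed into (feature, importance) pairs; the red-flag substrings are illustrative heuristics, not an exhaustive leakage test:

```python
def leakage_suspects(importances, top_n=3,
                     red_flags=("id", "timestamp", "date")):
    """Flag suspicious names among the top-ranked features.

    `importances` mimics the parsed feature-importance response as a
    list of (feature, relative_importance) pairs.
    """
    ranked = sorted(importances, key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:top_n]
            if any(flag in name.lower() for flag in red_flags)]

importances = [("row_id", 0.41), ("tenure_months", 0.30),
               ("monthly_charges", 0.18), ("contract_type", 0.11)]
print(leakage_suspects(importances))  # row_id ranking first is a red flag
```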

SHAP (SHapley Additive exPlanations) Values provide instance-level explanations. While feature importance tells you what matters globally, SHAP values explain why the model made a specific prediction for a specific row. Each feature receives a positive or negative SHAP value indicating how much it pushed the prediction above or below the average. This is invaluable for regulated industries where you need to justify individual decisions, such as why a loan application was denied or why a patient was flagged as high-risk.
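
The "Additive" in the name is literal: the baseline value plus the per-feature SHAP contributions reconstructs the model's raw output for that row. The numbers below are invented for illustration, not from a real model:

```python
def shap_prediction(base_value, shap_values):
    """SHAP values are additive: baseline plus per-feature
    contributions equals the model's raw output for the row."""
    return base_value + sum(shap_values.values())

# Illustrative contributions for one customer (hypothetical values).
contribs = {"tenure_months": -0.8,     # long tenure pushes risk down
            "monthly_charges": +0.5,   # high charges push risk up
            "support_tickets": +1.1}   # many tickets push risk up most
raw = shap_prediction(base_value=-1.2, shap_values=contribs)
print(round(raw, 2))  # -0.4: above the -1.2 average, i.e. elevated churn risk
```

Reading the contributions off row by row is exactly how you would justify an individual decision: here, support ticket volume is the dominant driver of this customer's elevated score.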

Partial Dependence Plots (PDPs) reveal the marginal relationship between a single feature and the model's predictions, holding all other features constant. For example, a PDP for "account_age" might show that churn probability decreases sharply for the first 12 months and then levels off. These plots help you translate model behavior into actionable business insights.
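
Conceptually, a PDP is computed by forcing the feature of interest to each grid value across the whole dataset and averaging the predictions. A minimal sketch with a toy model (the model and rows below are invented for illustration):

```python
def partial_dependence(model, rows, feature, grid):
    """Average the model's prediction over all rows with `feature`
    forced to each grid value; other features keep their real values."""
    curve = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

# Toy model: churn risk falls with account age, rises with charges.
model = lambda r: max(0.0, 0.6 - 0.03 * r["account_age"]) + 0.001 * r["charges"]
rows = [{"account_age": 3, "charges": 50},
        {"account_age": 24, "charges": 80}]
print(partial_dependence(model, rows, "account_age", [0, 12, 24]))
```

The declining curve this produces is the shape you would read off a real account_age PDP: risk drops steeply early and flattens out.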

Note that Stacked Ensemble models do not support SHAP contributions or variable importance directly, due to their composite nature. If you need full explainability, consider deploying one of the top individual models (such as the best XGBoost) instead.

Step 5: Deploy to Production

Once you are satisfied with your model's performance and explanations, it is time to deploy it so applications can request real-time predictions. CorePlexML's MLOps module supports multiple deployment strategies to match your risk tolerance:

  • Direct deployment swaps the model in immediately. Best for development environments or low-risk use cases where speed matters more than caution.
  • Canary deployment routes a small percentage of traffic (typically 5-10%) to the new model while the existing model continues serving the majority. If the new model's metrics degrade, automatic rollback kicks in. If metrics hold steady, you gradually increase traffic until the new model serves 100%.
  • Blue-Green deployment runs two complete environments side by side and switches traffic atomically. This provides zero-downtime deployment with instant rollback capability, at the cost of temporarily doubled resource usage.
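
One common way to implement the canary split is deterministic hash-based routing, so a given caller always hits the same model version and the two populations stay comparable. This is a generic sketch of the idea, not CorePlexML's internal routing logic:

```python
import hashlib

def route_to_canary(request_id, traffic_percentage):
    """Deterministic split: hash the request ID into one of 100
    buckets and send the lowest `traffic_percentage` buckets to the
    canary. The same ID always routes the same way."""
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < traffic_percentage

ids = [f"req-{i}" for i in range(1000)]
share = sum(route_to_canary(i, 10) for i in ids) / len(ids)
print(f"canary share: {share:.1%}")  # close to the configured 10%
```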

Here is how to create a deployment with the SDK:

deployment = client.deployments.create(
    project_id="proj_abc123",
    model_id=experiment.best_model_id,
    name="Churn Predictor v1",
    strategy="canary",
    traffic_percentage=10,          # Start with 10% traffic
    auto_rollback=True,             # Roll back on metric degradation
    rollback_threshold=0.05         # Roll back if AUC drops by 5%
)

print(f"Deployment ID: {deployment.id}")
print(f"Endpoint: {deployment.endpoint}")

Once deployed, your model is accessible via a REST endpoint. Send a JSON payload with the input features, and the endpoint returns predictions along with optional probability scores and SHAP contributions. You can also configure monitoring alerts to notify your team via Slack, email, or webhook when performance degrades or data drift is detected.
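
The auto-rollback condition from the deployment example can be expressed as a simple comparison. The sketch below assumes rollback_threshold is a relative AUC drop (matching the "drops by 5%" comment above); the platform's exact semantics may differ:

```python
def should_roll_back(baseline_auc, canary_auc, rollback_threshold=0.05):
    """Trigger rollback when the canary's AUC falls more than
    `rollback_threshold` (relative) below the baseline model's AUC.
    The relative-drop interpretation is an assumption for illustration."""
    drop = (baseline_auc - canary_auc) / baseline_auc
    return drop > rollback_threshold

print(should_roll_back(0.90, 0.88))  # ~2.2% drop -> keep the canary
print(should_roll_back(0.90, 0.84))  # ~6.7% drop -> roll back
```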

Troubleshooting

Even with AutoML handling most of the complexity, there are a few common issues you may encounter:

Target column not found. This usually means the column name in your experiment configuration does not exactly match the column name in the dataset schema. Column names are case-sensitive. Double-check spelling and ensure there are no trailing whitespace characters.

Insufficient rows. AutoML needs enough data to train, validate, and test models reliably. With 5-fold cross-validation, each fold must have enough rows to learn meaningful patterns. As a rough guideline, aim for at least 100 rows per class for classification problems, and more for datasets with many features.

Training timeout. If your maximum runtime is too short relative to your dataset size and number of models, the engine may only evaluate a handful of algorithms. Increase max_runtime_secs or reduce max_models to ensure the most promising algorithms have enough time to complete.

Class imbalance. When one class vastly outnumbers another (such as 95% non-fraud vs 5% fraud), models may learn to predict the majority class exclusively. Enable balance_classes=True in your experiment configuration, which applies oversampling and class weighting to help the model learn from minority class examples. Also consider using AUC or log loss as your stopping metric instead of accuracy, since these metrics are less sensitive to class distribution.
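
To see what class weighting does numerically, here is one standard inverse-frequency scheme in which each class contributes equally to the loss regardless of its size. This illustrates the idea behind balance_classes; the engine's exact reweighting scheme may differ:

```python
def class_weights(labels):
    """Inverse-frequency weights: weight = total / (n_classes * count),
    so a class's total weighted mass is the same for every class."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    total = len(labels)
    n_classes = len(counts)
    return {y: total / (n_classes * c) for y, c in counts.items()}

# 95% / 5% split, like the fraud example above.
labels = ["no_churn"] * 95 + ["churn"] * 5
print(class_weights(labels))  # minority class weighted far more heavily
```

Each minority-class row ends up weighted about 19x more than each majority-class row here, which is what stops "always predict the majority" from being the cheapest solution.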

Memory errors during training. Very large datasets or many concurrent models can exhaust available memory. Consider reducing the dataset size by sampling, reducing max_models, or increasing your instance's memory allocation.

What's Next?

With your first model deployed, you have completed the core AutoML workflow. Here are the natural next steps to explore:

  • ML Studio provides What-If analysis, letting you create hypothetical scenarios and see how changing input features affects predictions. This is invaluable for business planning and sensitivity analysis.
  • Privacy Suite scans your datasets for personally identifiable information and applies transformations (masking, hashing, encryption) to achieve compliance with HIPAA, GDPR, PCI-DSS, and CCPA before training.
  • Auto-Retraining keeps your models fresh by automatically triggering new training runs when data drift is detected, when performance drops below a threshold, or on a fixed schedule.
  • SynthGen generates synthetic data using GANs when you need more training data or want to share realistic but non-sensitive datasets with partners.
  • A/B Testing lets you compare two model versions in production with statistical rigor, measuring which one performs better on your actual user population.

The entire workflow we covered in this tutorial, from data upload to production deployment, can be completed in under 10 minutes through the UI, or fully automated with the SDK for repeatable, CI/CD-integrated ML pipelines. The key insight behind AutoML is not that it replaces data science expertise, but that it amplifies it. By automating the repetitive search over algorithms and hyperparameters, AutoML frees you to focus on the decisions that truly require human judgment: defining the right problem, curating the right data, and interpreting the results in the context of your business.