Model Registry: Version, Stage, and Govern Your ML Models
Why Model Governance Matters
In the early days of a machine learning practice, governance feels like overhead. You have one model, one dataset, and one data scientist who knows everything about both. But as the practice grows, the complexity compounds rapidly. Multiple teams train models on overlapping datasets. Versions proliferate as experiments branch and merge. A model that was deployed six months ago starts drifting, and no one remembers which dataset version it was trained on, which hyperparameters produced it, or which experiment showed it was the best candidate.
Without governance, answering basic questions becomes difficult or impossible. Which model is currently serving production traffic? What data was it trained on? Who approved it for deployment? What changed between v2 and v3? If an auditor asks how a particular prediction was made, can you trace it back to the training data and the algorithm configuration?
Model governance answers these questions systematically. It provides a single source of truth for every model artifact, its lineage, its performance characteristics, and its lifecycle status. In regulated industries like finance and healthcare, governance is not optional. Regulations require audit trails, explainability, and the ability to reproduce any model that influenced a decision. But even in unregulated environments, governance prevents the organizational chaos that slows teams down and introduces hidden risks.
CorePlexML's model registry is the central component of its governance framework. Every model produced by an experiment or retraining job is automatically registered with complete metadata, and the registry tracks the model through its entire lifecycle from candidate to production to retirement.
Semantic Versioning for ML Models
Software engineers have long used semantic versioning (major.minor.patch) to communicate the nature of changes between releases. CorePlexML applies the same convention to ML models, adapted for the unique characteristics of model changes.
Major version (X.0.0) indicates a fundamental change to the model. This includes switching the algorithm family (for example, from gradient boosting to a neural network), changing the target variable, significantly altering the feature set, or retraining on a fundamentally different dataset. Major version changes carry the highest risk and typically require thorough validation before deployment.
Minor version (1.X.0) indicates an incremental improvement to the model. This includes retraining on updated data while keeping the same feature set and algorithm, adding a small number of new features, or tuning hyperparameters. Minor versions are expected to be backward-compatible in terms of input schema and output format.
Patch version (1.0.X) indicates a non-functional change. This includes updating model metadata, fixing documentation, or re-registering a model with corrected tags. Patch versions do not change the model artifact itself.
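The bump rules above can be sketched as a small helper. This is an illustrative utility, not part of the CorePlexML API; the change categories mirror the definitions in this section.

```python
def bump_version(current: str, change: str) -> str:
    """Return the next semantic version for a given kind of model change.

    change: "major" - new algorithm family, new target, or new feature set
            "minor" - retrain on updated data, small feature additions, tuning
            "patch" - metadata or documentation fixes only
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")
```

For example, retraining v1.4.2 on updated data with the same feature set yields `bump_version("1.4.2", "minor")`, i.e. v1.5.0.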
from coreplexml import CorePlexClient

client = CorePlexClient(
    base_url="https://api.coreplexml.io",
    api_key="your-api-key"
)

# Register a new model version
version = client.registry.create_version(
    project_id="proj_abc123",
    model_id="model_xgb_churn",
    version="2.1.0",
    description="Retrained on Q4 2025 data with 3 new behavioral features",
    experiment_id="exp_q4_retrain",
    dataset_version_id="dsv_q4_2025",
    metrics={
        "auc": 0.912,
        "accuracy": 0.867,
        "precision": 0.843,
        "recall": 0.791,
        "f1": 0.816,
        "log_loss": 0.298
    },
    parameters={
        "algorithm": "xgboost",
        "max_depth": 6,
        "learning_rate": 0.1,
        "n_estimators": 250,
        "nfolds": 5,
        "stopping_metric": "AUC"
    },
    tags=["candidate", "q4-retrain", "behavioral-features"]
)

print(f"Registered: {version.model_id} v{version.version}")
print(f"Stage: {version.stage}")
The registry enforces version uniqueness within a model. You cannot register two versions with the same version number, which prevents the confusion that arises when multiple artifacts claim to be the same version.
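The uniqueness guarantee can be modeled with a simple in-memory structure. This is a sketch of the behavior, not the registry's implementation; the class and exception names are illustrative.

```python
class VersionConflictError(Exception):
    """Raised when a (model_id, version) pair is already registered."""

class InMemoryRegistry:
    """Toy registry illustrating per-model version uniqueness."""

    def __init__(self):
        self._versions = {}  # (model_id, version) -> metadata dict

    def create_version(self, model_id, version, **metadata):
        key = (model_id, version)
        if key in self._versions:
            raise VersionConflictError(f"{model_id} v{version} already exists")
        self._versions[key] = metadata
        return metadata
```

Attempting to register the same version twice fails loudly, so two artifacts can never claim the same identity.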
Stage Management
Every model version in the registry has a lifecycle stage that reflects its current status. CorePlexML defines four stages with clear semantics and controlled transitions.
Development is the initial stage for newly registered models. The model has been trained and its metrics recorded, but it has not yet been evaluated for production readiness. Most models spend their entire lives in this stage, as only a fraction of experimental models progress further.
Staging indicates that the model has been selected as a candidate for production and is undergoing validation. This might include offline evaluation on held-out datasets, shadow deployment against the current production model, or review by a model governance committee. The transition from development to staging is a deliberate decision that signals intent to deploy.
Production means the model is actively serving predictions in a production environment. Only one version of a given model should be in the production stage at any time. Transitioning a new version to production automatically transitions the previous production version to archived. This enforces the invariant that production deployments have a clear, unambiguous model identity.
Archived is the terminal stage for models that are no longer serving traffic. Archived models remain in the registry with their full metadata and lineage intact, but they cannot be deployed without first transitioning back through staging. This preserves the audit trail while preventing accidental reactivation of retired models.
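The four stages and their controlled transitions can be expressed as a small state machine. This table is a sketch of the rules described above; the exact set of transitions the registry allows may differ.

```python
# Allowed stage transitions: development -> staging -> production -> archived,
# with archived models only able to re-enter service via staging.
ALLOWED_TRANSITIONS = {
    "development": {"staging", "archived"},
    "staging": {"production", "development", "archived"},
    "production": {"archived"},
    "archived": {"staging"},
}

def can_transition(current: str, target: str) -> bool:
    """Check whether a stage transition is permitted."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Note that `can_transition("archived", "production")` is false: a retired model must pass back through staging before it can serve traffic again.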
# Transition a model from development to staging
client.registry.transition_stage(
    model_id="model_xgb_churn",
    version="2.1.0",
    target_stage="staging",
    notes="Approved for shadow deployment by ML review board. "
          "AUC improvement of 2.3% over current production model.",
    approved_by="maria.chen@company.com"
)

# Later, promote from staging to production
client.registry.transition_stage(
    model_id="model_xgb_churn",
    version="2.1.0",
    target_stage="production",
    notes="Shadow deployment completed successfully. "
          "7-day comparison shows consistent improvement across all segments.",
    approved_by="james.wu@company.com"
)
Stage transitions are immutable events in the registry's audit log. Each transition records who initiated it, when it occurred, and the notes provided. This creates a complete approval chain that satisfies compliance requirements and provides organizational accountability.
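Conceptually, each transition appends an immutable record to the audit log. A minimal sketch of such a record (the field names are illustrative, not the registry's schema):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: records cannot be altered after creation
class StageTransition:
    model_id: str
    version: str
    from_stage: str
    to_stage: str
    approved_by: str
    notes: str
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

audit_log: list[StageTransition] = []
audit_log.append(StageTransition(
    model_id="model_xgb_churn",
    version="2.1.0",
    from_stage="development",
    to_stage="staging",
    approved_by="maria.chen@company.com",
    notes="Approved for shadow deployment by ML review board.",
))
```

Because the records are frozen and only ever appended, the log forms a complete, tamper-evident approval chain.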
Model Cards
A model card is a structured document that accompanies a model version and describes its purpose, capabilities, limitations, and appropriate use cases. CorePlexML generates model cards automatically from training metadata and allows teams to supplement them with human-written context.
# Create a detailed model card
client.registry.update_model_card(
    model_id="model_xgb_churn",
    version="2.1.0",
    model_card={
        "description": "Customer churn prediction model for subscription "
                       "products. Predicts probability of churn within "
                       "the next 30 days.",
        "intended_use": "Real-time scoring of active subscribers to "
                        "prioritize retention outreach.",
        "limitations": "Trained on US market data only. Performance may "
                       "degrade for international customers. Not validated "
                       "for enterprise tier accounts.",
        "ethical_considerations": "Model uses behavioral features only. "
                                  "No demographic or protected class "
                                  "features are included.",
        "training_data_summary": {
            "source": "Production subscription database",
            "date_range": "2025-01-01 to 2025-12-31",
            "total_rows": 247_500,
            "positive_class_ratio": 0.087,
            "feature_count": 18
        },
        "performance_by_segment": {
            "enterprise": {"auc": 0.89, "samples": 12_400},
            "professional": {"auc": 0.91, "samples": 85_300},
            "starter": {"auc": 0.93, "samples": 149_800}
        }
    }
)
Model cards serve multiple audiences. Data scientists use them to understand a model's strengths and weaknesses before building on top of it. Product managers use them to assess whether a model is appropriate for a new use case. Compliance officers use them to verify that the model meets regulatory requirements. By centralizing this information in the registry, CorePlexML ensures that everyone works from the same source of truth.
Lineage Tracking
Model lineage answers the question: where did this model come from? CorePlexML tracks lineage automatically, connecting each model version to the experiment that produced it, the dataset version it was trained on, and the parent model version it was derived from.
# Retrieve full lineage for a model version
lineage = client.registry.get_lineage(
    model_id="model_xgb_churn",
    version="2.1.0"
)

print(f"Model: {lineage.model_id} v{lineage.version}")
print(f"Experiment: {lineage.experiment_id}")
print(f"  Algorithm: {lineage.experiment.algorithm}")
print(f"  Training time: {lineage.experiment.training_duration_secs}s")
print(f"  Max models evaluated: {lineage.experiment.max_models}")
print(f"Dataset version: {lineage.dataset_version_id}")
print(f"  Dataset: {lineage.dataset.name}")
print(f"  Rows: {lineage.dataset.row_count}")
print(f"  Created: {lineage.dataset.created_at}")
print(f"Parent version: {lineage.parent_version or 'None (initial)'}")

print("\nVersion chain:")
for ancestor in lineage.version_chain:
    print(f"  v{ancestor.version} ({ancestor.stage}) - "
          f"{ancestor.created_at.strftime('%Y-%m-%d')}")
The version chain provides a complete history of model evolution. You can trace a production model back through every iteration to the original dataset and experiment. This is invaluable for debugging (when did a regression get introduced?), for compliance (prove that the training data did not contain prohibited information), and for organizational learning (which training configurations consistently produce the best results?).
Lineage tracking also enables automated impact analysis. When a dataset is found to contain errors or when a data source is deprecated, CorePlexML can identify every model that was trained on that data, including models currently in production. This turns a potentially difficult investigation into a simple query.
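At its core, that impact analysis is a reverse lookup over lineage edges. A minimal sketch over in-memory lineage records (the record shape and example IDs are illustrative; in CorePlexML this is a registry query rather than a local scan):

```python
# Each record links a model version to the dataset version it was trained on.
lineage_records = [
    {"model": "model_xgb_churn", "version": "2.0.0",
     "dataset": "dsv_q3_2025", "stage": "archived"},
    {"model": "model_xgb_churn", "version": "2.1.0",
     "dataset": "dsv_q4_2025", "stage": "production"},
    {"model": "model_ltv", "version": "1.2.0",
     "dataset": "dsv_q4_2025", "stage": "staging"},
]

def impacted_models(dataset_version: str, records):
    """Return every model version trained on the given dataset version."""
    return [(r["model"], r["version"], r["stage"])
            for r in records if r["dataset"] == dataset_version]

# Find everything trained on a flawed Q4 snapshot, including production models.
for model, version, stage in impacted_models("dsv_q4_2025", lineage_records):
    print(f"{model} v{version} ({stage})")
```

A single lookup surfaces both the production churn model and a staging model that share the suspect dataset.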
Searching and Organizing with Tags
As your model registry grows, finding the right model version becomes a search problem. CorePlexML's tagging system lets you organize models with arbitrary labels and filter by tag in queries.
# Search for models by tag
champion_models = client.registry.list_versions(
    project_id="proj_abc123",
    tags=["champion"],
    stage="production"
)

for model in champion_models:
    print(f"{model.model_id} v{model.version} - "
          f"AUC: {model.metrics['auc']:.4f} - "
          f"Tags: {', '.join(model.tags)}")

# Search across all projects with filters
candidates = client.registry.search(
    query="churn",
    stage="staging",
    min_metric={"auc": 0.85},
    created_after="2026-01-01",
    tags=["candidate"]
)
Common tagging conventions include lifecycle labels like "champion," "candidate," and "baseline"; organizational labels like team names or business units; and technical labels like "requires-gpu," "ensemble," or "lightweight." Tags are free-form, so teams can develop conventions that fit their workflow without being constrained by the platform.
Comparing Model Versions
Before promoting a model, you often need to compare it against the current production version or against other candidates. CorePlexML provides a dedicated comparison interface that highlights differences across metrics, parameters, and lineage:
comparison = client.registry.compare_versions(
    model_id="model_xgb_churn",
    versions=["2.0.0", "2.1.0"]
)

print(f"Comparing {comparison.versions[0]} vs {comparison.versions[1]}")
print(f"\n{'Metric':<20} {'v2.0.0':<12} {'v2.1.0':<12} {'Delta':<12}")
print("-" * 56)
for metric in comparison.metric_diffs:
    print(f"{metric.name:<20} {metric.values[0]:<12.4f} "
          f"{metric.values[1]:<12.4f} {metric.delta:<+12.4f}")

print("\nParameter changes:")
for param in comparison.parameter_diffs:
    print(f"  {param.name}: {param.values[0]} -> {param.values[1]}")

print("\nDataset changes:")
print(f"  v2.0.0 trained on: {comparison.datasets[0].name} "
      f"({comparison.datasets[0].row_count} rows)")
print(f"  v2.1.0 trained on: {comparison.datasets[1].name} "
      f"({comparison.datasets[1].row_count} rows)")
The comparison output makes it clear what changed and what the impact was. This is particularly useful in review meetings where a team is deciding whether to promote a candidate to production. Instead of comparing spreadsheets, everyone looks at the same structured comparison from the registry.
Integration with Deployments
The model registry is tightly integrated with CorePlexML's deployment system. When you create a deployment, you reference a specific registered model version. The deployment system validates that the version exists, records the link in the registry, and updates the version's deployment history.
# Deploy a registered model version
deployment = client.deployments.create(
    project_id="proj_abc123",
    registry_version={
        "model_id": "model_xgb_churn",
        "version": "2.1.0"
    },
    name="Churn Predictor Production",
    strategy="canary",
    canary_config={
        "stages": [
            {"traffic_percentage": 5, "duration_minutes": 60},
            {"traffic_percentage": 25, "duration_minutes": 120},
            {"traffic_percentage": 50, "duration_minutes": 180},
            {"traffic_percentage": 100, "duration_minutes": 0}
        ],
        "auto_advance": True,
        "auto_rollback": True
    }
)

# The registry now tracks this deployment
version_info = client.registry.get_version(
    model_id="model_xgb_churn",
    version="2.1.0"
)

print(f"Active deployments: {len(version_info.deployments)}")
for dep in version_info.deployments:
    print(f"  {dep.name} ({dep.strategy}) - {dep.status}")
This bidirectional link ensures consistency. The registry knows which versions are deployed and where, and the deployment system knows the full provenance of the model it is serving. When a deployment is retired, the registry is updated automatically.
Best Practices
Register every model, even failed experiments. The registry is not just for production models. Registering failed experiments with their metrics and notes creates institutional knowledge about what does not work, which is often as valuable as knowing what does. A junior data scientist who is about to try an approach that was already tested and abandoned can find that information in the registry.
Bump the major version for risky changes. When you change the algorithm, the feature set, or the training data source, increment the major version. This signals to downstream consumers that the model's behavior may have changed in ways that require validation. Minor versions should be safe to deploy with standard canary procedures.
Write meaningful model cards. Auto-generated metadata is a starting point, not a substitute for human context. The most valuable parts of a model card are the limitations section (what the model should not be used for) and the ethical considerations section (what biases might be present). Take the time to write these thoughtfully.
Use approval workflows for production transitions. Do not allow any model to reach the production stage without a recorded approval from a designated reviewer. CorePlexML's stage transition system supports this by requiring an approved_by field and persisting the approval in the audit log. This is a lightweight governance control that prevents accidental or unauthorized deployments.
Clean up old versions periodically. While archived versions should be retained for compliance and audit purposes, development-stage versions from old experiments can accumulate rapidly. Establish a retention policy (for example, archive development versions older than 90 days) and apply it consistently.
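A retention sweep like the one suggested above amounts to a filter over version records. A sketch under the 90-day example policy (the record shape is illustrative; in practice you would feed the selected versions to the registry's stage-transition call):

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)

def versions_to_archive(versions, now=None):
    """Select development-stage versions older than the retention window."""
    now = now or datetime.now(timezone.utc)
    return [v for v in versions
            if v["stage"] == "development" and now - v["created_at"] > RETENTION]

versions = [
    {"model": "model_xgb_churn", "version": "1.3.0", "stage": "development",
     "created_at": datetime(2025, 6, 1, tzinfo=timezone.utc)},
    {"model": "model_xgb_churn", "version": "2.1.0", "stage": "production",
     "created_at": datetime(2025, 12, 20, tzinfo=timezone.utc)},
]
stale = versions_to_archive(
    versions, now=datetime(2026, 1, 15, tzinfo=timezone.utc)
)
```

Production and archived versions are never selected, so the sweep cannot touch anything needed for compliance.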
Standardize tags across teams. Tags are only useful for search if teams use them consistently. Establish a shared vocabulary for common tags and document it. The difference between tagging a model "prod-ready" versus "production-ready" versus "approved" seems trivial until you are searching for all production-approved models and miss half of them.
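One lightweight way to enforce a shared vocabulary is to normalize tags against a canonical map before registration. The alias table below is illustrative, not a CorePlexML feature:

```python
# Canonical tag for each known alias; unknown tags pass through lowercased.
CANONICAL_TAGS = {
    "prod-ready": "production-ready",
    "approved": "production-ready",
    "champ": "champion",
}

def normalize_tags(tags):
    """Map aliases to canonical tags, preserving order and de-duplicating."""
    seen, out = set(), []
    for tag in tags:
        canonical = CANONICAL_TAGS.get(tag.lower(), tag.lower())
        if canonical not in seen:
            seen.add(canonical)
            out.append(canonical)
    return out
```

Running every tag list through such a helper before calling `create_version` keeps searches like `tags=["production-ready"]` complete.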
A model registry is the backbone of a mature ML practice. It transforms model management from an ad-hoc process that relies on individual knowledge into a systematic practice that scales with your team and satisfies the governance requirements of enterprise ML. Every model that touches production data should have a registered version with full lineage, metrics, and lifecycle tracking.
For more about CorePlexML's model governance and MLOps capabilities, visit the MLOps features page.