The Complete Guide to the CorePlexML Python SDK
Introduction
CorePlexML is an API-first platform. Every operation available in the web interface is backed by a REST API. The Python SDK wraps that API in an idiomatic, strongly typed client library that handles authentication, serialization, error handling, and pagination so that you can focus on your ML workflows.
Whether you are automating training pipelines in CI/CD, building custom dashboards, or integrating CorePlexML into a larger data platform, the SDK is your primary integration point.
Installation
Install the SDK from PyPI:
pip install coreplexml
The SDK requires Python 3.9 or later. It has minimal dependencies: requests for HTTP, pydantic for response models, and no heavy ML frameworks.
Authentication
from coreplexml import CorePlexMLClient

client = CorePlexMLClient(
    base_url="https://api.coreplexml.io",
    api_key="sk_your_api_key"
)
For production use, store the API key in an environment variable:
import os

client = CorePlexMLClient(
    base_url=os.environ["COREPLEXML_URL"],
    api_key=os.environ["COREPLEXML_API_KEY"]
)
The Six SDK Modules
1. Projects
Projects are the top-level organizational unit:
project = client.projects.create(name="Customer Churn Analysis")
projects = client.projects.list()
for p in projects["items"]:
    print(f"{p['id']}: {p['name']}")
2. Datasets
Datasets support versioned uploads with automatic schema detection:
dataset = client.datasets.upload(
    project_id="proj_abc123",
    file_path="/data/customers.csv",
    name="Customer Data"
)
print(f"Dataset: {dataset['id']}, Version: {dataset['version_id']}")
schema = client.datasets.get_schema(version_id=dataset["version_id"])
for col in schema["columns"]:
    print(f" {col['name']}: {col['type']} "
          f"(missing: {col['missing_pct']}%)")
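The schema output above can drive a quick data-quality check before training. The following is a minimal sketch, not part of the SDK: `columns_over_missing_threshold` is a hypothetical helper, and it assumes only the `columns`, `name`, and `missing_pct` fields used in the example above. The sample values are made up.

```python
from typing import Any, Dict, List


def columns_over_missing_threshold(schema: Dict[str, Any],
                                   max_missing_pct: float) -> List[str]:
    """Return the names of columns whose missing percentage exceeds the threshold."""
    return [
        col["name"]
        for col in schema["columns"]
        if col["missing_pct"] > max_missing_pct
    ]


# Sample payload shaped like the fields printed above (values and types are made up).
sample_schema = {
    "columns": [
        {"name": "tenure", "type": "int", "missing_pct": 0.0},
        {"name": "TotalCharges", "type": "float", "missing_pct": 12.5},
    ]
}
print(columns_over_missing_threshold(sample_schema, max_missing_pct=5.0))
# ['TotalCharges']
```

In a pipeline, a non-empty result could be used to stop early or to trigger imputation before an experiment is created.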
3. Experiments
Experiments run AutoML training jobs:
experiment = client.experiments.create(
    project_id="proj_abc123",
    dataset_version_id="dv_xyz789",
    target_column="Churn",
    problem_type="classification",
    max_runtime_secs=600,
    max_models=20
)
print(f"Training job: {experiment['job_id']}")
client.jobs.wait(experiment["job_id"], timeout=1200)
leaderboard = client.experiments.get_leaderboard(
    experiment_id=experiment["experiment_id"]
)
for rank, model in enumerate(leaderboard["models"], 1):
    print(f" #{rank} {model['algorithm']}: "
          f"AUC={model['metrics']['auc']:.4f}")
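The examples in this guide treat the first leaderboard entry as the best model. If you prefer to select by an explicit metric rather than rely on ordering, a small helper can do it. This is a sketch under assumptions: `best_model` is hypothetical, it relies only on the `models` and `metrics` fields shown above, and the `logloss` metric name and all sample values are invented for illustration.

```python
from typing import Any, Dict


def best_model(leaderboard: Dict[str, Any], metric: str = "auc",
               higher_is_better: bool = True) -> Dict[str, Any]:
    """Return the leaderboard entry with the best value for `metric`."""
    candidates = [m for m in leaderboard["models"] if metric in m.get("metrics", {})]
    if not candidates:
        raise ValueError(f"no model reports metric {metric!r}")
    choose = max if higher_is_better else min
    return choose(candidates, key=lambda m: m["metrics"][metric])


# Made-up entries shaped like the fields used above.
sample_leaderboard = {
    "models": [
        {"id": "m1", "algorithm": "GBM", "metrics": {"auc": 0.91, "logloss": 0.31}},
        {"id": "m2", "algorithm": "XGBoost", "metrics": {"auc": 0.93, "logloss": 0.35}},
    ]
}
print(best_model(sample_leaderboard)["id"])                                     # m2
print(best_model(sample_leaderboard, "logloss", higher_is_better=False)["id"])  # m1
```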
4. Deployments
Deployments make trained models available for real-time predictions:
deployment = client.deployments.create(
    project_id="proj_abc123",
    model_id=leaderboard["models"][0]["id"],
    name="Churn Predictor v1",
    strategy="canary",
    canary_percent=10
)
prediction = client.deployments.predict(
    deployment_id=deployment["id"],
    features={
        "tenure": 24,
        "MonthlyCharges": 70.0,
        "Contract": "Month-to-month",
        "InternetService": "Fiber optic"
    }
)
print(f"Prediction: {prediction['result']}")
print(f"Confidence: {prediction['probability']:.2f}")
The four strategies are direct, canary, blue_green, and shadow.
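To make the canary strategy concrete: a canary percentage of 10 means roughly one request in ten reaches the new model. How CorePlexML routes traffic internally is not described here, so the following is only an illustration of one common technique, deterministic hash bucketing. `routes_to_canary` and the request key are hypothetical.

```python
import hashlib


def routes_to_canary(request_key: str, canary_percent: int) -> bool:
    """Deterministically assign a request key to the canary share of traffic."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in the range 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < canary_percent


# The same key always lands on the same side of the split.
print(routes_to_canary("customer-42", 10) == routes_to_canary("customer-42", 10))  # True
```

Hashing a stable key, such as a customer ID, keeps each caller on one model for the duration of the rollout, which makes canary metrics easier to interpret than a per-request random split.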
5. Privacy
The Privacy module scans datasets for PII and applies anonymization transformations:
scan = client.privacy.scan(
    dataset_version_id="dv_xyz789"
)
print(f"PII columns found: {scan['pii_count']}")
transform = client.privacy.transform(
    dataset_version_id="dv_xyz789",
    profile="HIPAA",
    rules=[
        {"column": "email", "action": "mask"},
        {"column": "ssn", "action": "redact"},
        {"column": "zip_code", "action": "generalize", "level": 3}
    ]
)
print(f"Anonymized version: {transform['output_version_id']}")
The Privacy module supports more than 72 PII types across four compliance profiles: HIPAA, GDPR, PCI-DSS, and CCPA.
6. SynthGen
SynthGen generates synthetic tabular data:
model = client.synthgen.create_model(
    dataset_version_id="dv_xyz789",
    engine="CTGAN",
    epochs=300
)
client.jobs.wait(model["job_id"])
synthetic = client.synthgen.generate(
    model_id=model["model_id"],
    num_rows=10000
)
print(f"Generated: {synthetic['row_count']} rows")
Three engines are available: CTGAN, CopulaGAN, and TVAE.
Error Handling
The SDK raises typed exceptions:
from coreplexml.exceptions import (
    AuthenticationError,
    NotFoundError,
    ValidationError,
    RateLimitError,
    ServerError
)

try:
    prediction = client.deployments.predict(
        deployment_id="dep_nonexistent",
        features={"tenure": 24}
    )
except AuthenticationError:
    print("Invalid or expired API key")
except NotFoundError as e:
    print(f"Resource not found: {e.resource_id}")
except ValidationError as e:
    print(f"Invalid input: {e.details}")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after} seconds")
except ServerError:
    print("Platform error. Retry or contact support")
For transient errors, implement a retry strategy:
import time

from coreplexml.exceptions import RateLimitError, ServerError


def predict_with_retry(client, deployment_id, features, max_retries=3):
    """Call predict, retrying on rate limits and server errors."""
    for attempt in range(max_retries):
        try:
            return client.deployments.predict(
                deployment_id=deployment_id,
                features=features
            )
        except RateLimitError as e:
            if attempt < max_retries - 1:
                # Wait for as long as the API asks before trying again.
                time.sleep(e.retry_after)
            else:
                raise
        except ServerError:
            if attempt < max_retries - 1:
                # Exponential backoff: 1 s, 2 s, 4 s, ...
                time.sleep(2 ** attempt)
            else:
                raise
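When many workers retry on the same schedule, their retries can arrive in synchronized bursts. A common refinement is capped exponential backoff with random jitter. The sketch below is generic and does not depend on the SDK: `call_with_backoff` and `backoff_delay` are hypothetical helpers, and the sleep and random functions are injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Upper bound of the delay for a zero-based attempt: base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[float, float], float] = random.uniform,
) -> T:
    """Call func, retrying on the given exception types with jittered backoff."""
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a random time between 0 and the capped bound.
            sleep(rng(0.0, backoff_delay(attempt)))
    raise ValueError("max_retries must be at least 1")
```

With the SDK, usage might look like `call_with_backoff(lambda: client.deployments.predict(deployment_id=..., features=...), (RateLimitError, ServerError))`. This sketch does not read `retry_after`, so for rate-limit errors the explicit wait in `predict_with_retry` remains the better choice.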
CI/CD Integration
The SDK is designed for pipeline automation. Here is a GitHub Actions workflow:
name: ML Pipeline
on:
  push:
    branches: [main]
    paths: ["data/**", "config/**"]
jobs:
  train-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install SDK
        run: pip install coreplexml
      - name: Train Model
        env:
          COREPLEXML_URL: ${{ secrets.COREPLEXML_URL }}
          COREPLEXML_API_KEY: ${{ secrets.COREPLEXML_API_KEY }}
        run: python scripts/train.py
      - name: Validate Model
        run: python scripts/validate.py
      - name: Deploy Model
        if: success()
        run: python scripts/deploy.py
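The workflow references scripts/validate.py without showing it. One plausible shape for such a script is a metric gate that exits non-zero when a model misses its targets, which fails the step and prevents the deploy step from running. The sketch below is an assumption about what that script could contain: `failed_checks`, `run_gate`, and the 0.85 threshold mentioned afterwards are all hypothetical.

```python
import sys
from typing import Dict, List


def failed_checks(metrics: Dict[str, float], minimums: Dict[str, float]) -> List[str]:
    """Describe every metric that is missing or below its required minimum."""
    problems = []
    for name, minimum in minimums.items():
        value = metrics.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif value < minimum:
            problems.append(f"{name}: {value:.4f} < {minimum:.4f}")
    return problems


def run_gate(metrics: Dict[str, float], minimums: Dict[str, float]) -> int:
    """Return a process exit code: 0 to continue the pipeline, 1 to stop it."""
    problems = failed_checks(metrics, minimums)
    for problem in problems:
        print(f"validation failed: {problem}", file=sys.stderr)
    return 1 if problems else 0
```

A script would end with something like `sys.exit(run_gate(model["metrics"], {"auc": 0.85}))`, where `model` is the leaderboard entry being considered for deployment.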
Environment-Based Configuration
import os

from coreplexml import CorePlexMLClient

# Select the target environment; defaults to "dev" when DEPLOY_ENV is unset.
ENV = os.environ.get("DEPLOY_ENV", "dev")
config = {
    "dev": {
        "url": "https://dev.coreplexml.io",
        "key": os.environ.get("COREPLEXML_DEV_KEY")
    },
    "staging": {
        "url": "https://staging.coreplexml.io",
        "key": os.environ.get("COREPLEXML_STAGING_KEY")
    },
    "prod": {
        "url": "https://api.coreplexml.io",
        "key": os.environ.get("COREPLEXML_PROD_KEY")
    }
}
client = CorePlexMLClient(
    base_url=config[ENV]["url"],
    api_key=config[ENV]["key"]
)
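In the configuration above, an unknown DEPLOY_ENV value raises a bare KeyError, and an unset key variable silently passes None as the API key. A fail-fast variant can report both problems clearly. This is a sketch, not SDK functionality: `resolve_environment` is a hypothetical helper that reuses the URLs and variable names from the example above.

```python
import os
from typing import Mapping, Tuple

# Environment name -> (base URL, name of the variable holding its API key).
ENVIRONMENTS = {
    "dev": ("https://dev.coreplexml.io", "COREPLEXML_DEV_KEY"),
    "staging": ("https://staging.coreplexml.io", "COREPLEXML_STAGING_KEY"),
    "prod": ("https://api.coreplexml.io", "COREPLEXML_PROD_KEY"),
}


def resolve_environment(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (base_url, api_key) for DEPLOY_ENV, or raise a descriptive error."""
    name = env.get("DEPLOY_ENV", "dev")
    if name not in ENVIRONMENTS:
        known = ", ".join(sorted(ENVIRONMENTS))
        raise ValueError(f"unknown DEPLOY_ENV {name!r}; expected one of: {known}")
    url, key_var = ENVIRONMENTS[name]
    key = env.get(key_var)
    if not key:
        raise RuntimeError(f"{key_var} is not set for environment {name!r}")
    return url, key
```

The result can be passed straight to the client: `base_url, api_key = resolve_environment()`.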
New SDK Modules (v2.4+)
Model Registry
version = client.registry.create_version(
    project_id="proj_abc",
    model_id="mod_xgb_v2",
    version="1.2.0",
    model_card={"description": "Improved feature set", "metrics": {"auc": 0.94}}
)
client.registry.transition_stage(version["id"], stage="production")
A/B Testing
test = client.ab_tests.create(
    project_id="proj_abc",
    model_a_id="mod_v1",
    model_b_id="mod_v2",
    traffic_split_a=50,
    primary_metric="accuracy",
    min_sample_size=1000
)
results = client.ab_tests.get_results(test["id"])
if results["is_significant"]:
    client.ab_tests.declare_winner(test["id"], variant="B")
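The example above relies on an `is_significant` flag. The statistical test behind that flag is not documented here. For intuition, when the primary metric is accuracy (a proportion of correct predictions), one standard choice is a two-proportion z-test. The sketch below implements that textbook test; `two_proportion_p_value` is a hypothetical helper and may differ from what the platform computes.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(successes_a: int, n_a: int,
                           successes_b: int, n_b: int) -> float:
    """Two-sided p-value for the hypothesis that both variants share one success rate."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # all successes or all failures in both arms: no evidence of a difference
    z = (p_b - p_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up counts: 800 vs 900 correct predictions out of 1000 each.
print(two_proportion_p_value(800, 1000, 900, 1000) < 0.05)  # True
```

A p-value below a chosen level, commonly 0.05, would correspond to declaring the difference significant. Checking the result repeatedly while the test is still collecting data inflates the false-positive rate, which is one reason a minimum sample size is set up front.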
Alerts and Monitoring
channel = client.alerts.create_channel(
    name="Slack Ops",
    channel_type="slack",
    config={"webhook_url": "https://hooks.slack.com/..."}
)
rule = client.alerts.create_rule(
    deployment_id="dep_prod",
    name="Drift Alert",
    metric="drift_psi",
    operator="gt",
    threshold=0.2,
    severity="critical",
    channel_ids=[channel["id"]]
)
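The rule above fires when `drift_psi` exceeds 0.2. PSI, the population stability index, compares a baseline distribution with a current one across bins: the sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the expected and actual proportions in bin i. How CorePlexML bins features and computes `drift_psi` is not specified here, so the following is a sketch of the standard formula only. `population_stability_index` is a hypothetical helper.

```python
from math import log
from typing import Sequence


def population_stability_index(expected: Sequence[float],
                               actual: Sequence[float],
                               eps: float = 1e-6) -> float:
    """PSI between two binned distributions given as per-bin counts or proportions."""
    if len(expected) != len(actual) or len(expected) == 0:
        raise ValueError("both inputs need the same non-zero number of bins")
    expected_total = sum(expected)
    actual_total = sum(actual)
    if expected_total <= 0 or actual_total <= 0:
        raise ValueError("each distribution must have a positive total")
    psi = 0.0
    for e, a in zip(expected, actual):
        # Normalize to proportions; floor at eps so empty bins do not break the log.
        e_pct = max(e / expected_total, eps)
        a_pct = max(a / actual_total, eps)
        psi += (a_pct - e_pct) * log(a_pct / e_pct)
    return psi


baseline = [25, 25, 25, 25]  # made-up training-time bin counts
current = [10, 20, 30, 40]   # made-up production bin counts
print(population_stability_index(baseline, current) > 0.2)  # True
```

A value of 0.2 is a widely used rule-of-thumb cutoff for a shift large enough to investigate, which matches the threshold in the rule above.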
Batch and Streaming Predictions
# Batch: submit a file, wait for the job, then download the results.
job = client.predictions.create(
    deployment_id="dep_prod",
    file_path="batch_input.csv"
)
result = client.predictions.wait(job["id"])
client.predictions.download(job["id"], "predictions.csv")

# Streaming: `records` and `process` are placeholders for your input rows and handler.
for row in client.streaming.predict(deployment_id="dep_prod", data=records):
    process(row["prediction"])
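The batch example assumes that batch_input.csv already exists. If your records live in memory as dictionaries, the standard library can write them out. This sketch assumes the platform expects a header row with one column per feature, which is not confirmed here. `write_batch_csv` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, List


def write_batch_csv(records: Iterable[Dict[str, object]], path: str,
                    columns: List[str]) -> int:
    """Write records to a CSV file with a header row; return the number of data rows."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="raise" rejects records with keys outside `columns`.
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="raise")
        writer.writeheader()
        for record in records:
            writer.writerow(record)
            count += 1
    return count
```

Passing an explicit `columns` list fixes the column order and surfaces unexpected keys early, before a batch job is submitted.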
Tips
Use environment variables for API keys. Never commit API keys to version control.
Use batch predictions for higher throughput. If you need to score many records, use the batch prediction job endpoint instead of calling the single-prediction endpoint in a loop.
Poll job status for long-running operations. Use client.jobs.wait() for simple scripts, or implement your own polling loop with client.jobs.get().
Pin the SDK version. Use coreplexml==X.Y.Z in your requirements file instead of an unpinned install.
Respect rate limits. The API enforces rate limits per key. In production pipelines, always implement retry logic.
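The custom polling loop mentioned in the tips above can be sketched as follows. Several details are assumptions: that a job lookup takes a job ID and returns a dictionary, that it has a `status` field, and that the terminal values are named as in the two sets below. `poll_job` is a hypothetical helper; the lookup function, sleep, and clock are injected so the loop can be tested without the API.

```python
import time
from typing import Any, Callable, Dict

# Assumed status values; check the API documentation for the real ones.
SUCCESS_STATES = {"succeeded", "completed"}
FAILURE_STATES = {"failed", "cancelled"}


def poll_job(
    get_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    timeout: float = 1200.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal state or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        job = get_job(job_id)
        status = job.get("status")
        if status in SUCCESS_STATES:
            return job
        if status in FAILURE_STATES:
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout} s")
        sleep(interval)
```

With the SDK this might be called as `poll_job(client.jobs.get, experiment["job_id"])`, assuming `client.jobs.get` accepts the job ID as its first argument.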
The SDK puts the full power of the CorePlexML platform into your Python scripts, notebooks, and CI/CD pipelines. For the complete reference, see our Python SDK page and the API documentation.