Tutorials · 5 min read

The Complete Guide to the CorePlexML Python SDK

CorePlexML Team

Introduction

CorePlexML is an API-first platform: every operation available in the web interface is backed by a REST API. The Python SDK wraps that API in an idiomatic, strongly typed client library that handles authentication, serialization, error handling, and pagination, so you can focus on your ML workflows.

Whether you are automating training pipelines in CI/CD, building custom dashboards, or integrating CorePlexML into a larger data platform, the SDK is your primary integration point.

Installation

Install the SDK from PyPI:

pip install coreplexml

The SDK requires Python 3.9 or later and has minimal dependencies: requests for HTTP, pydantic for response models, and no heavyweight ML frameworks.

Authentication

from coreplexml import CorePlexMLClient

client = CorePlexMLClient(
    base_url="https://api.coreplexml.io",
    api_key="sk_your_api_key"
)

For production use, store the API key in an environment variable:

import os

client = CorePlexMLClient(
    base_url=os.environ["COREPLEXML_URL"],
    api_key=os.environ["COREPLEXML_API_KEY"]
)

The Six SDK Modules

1. Projects

Projects are the top-level organizational unit:

project = client.projects.create(name="Customer Churn Analysis")

projects = client.projects.list()
for p in projects["items"]:
    print(f"{p['id']}: {p['name']}")

2. Datasets

Datasets support versioned uploads with automatic schema detection:

dataset = client.datasets.upload(
    project_id="proj_abc123",
    file_path="/data/customers.csv",
    name="Customer Data"
)
print(f"Dataset: {dataset['id']}, Version: {dataset['version_id']}")

schema = client.datasets.get_schema(version_id=dataset["version_id"])
for col in schema["columns"]:
    print(f"  {col['name']}: {col['type']} "
          f"(missing: {col['missing_pct']}%)")

3. Experiments

Experiments run AutoML training jobs:

experiment = client.experiments.create(
    project_id="proj_abc123",
    dataset_version_id="dv_xyz789",
    target_column="Churn",
    problem_type="classification",
    max_runtime_secs=600,
    max_models=20
)
print(f"Training job: {experiment['job_id']}")

client.jobs.wait(experiment["job_id"], timeout=1200)

leaderboard = client.experiments.get_leaderboard(
    experiment_id=experiment["experiment_id"]
)
for rank, model in enumerate(leaderboard["models"], 1):
    print(f"  #{rank} {model['algorithm']}: "
          f"AUC={model['metrics']['auc']:.4f}")

4. Deployments

Deployments make trained models available for real-time prediction:

deployment = client.deployments.create(
    project_id="proj_abc123",
    model_id=leaderboard["models"][0]["id"],
    name="Churn Predictor v1",
    strategy="canary",
    canary_percent=10
)

prediction = client.deployments.predict(
    deployment_id=deployment["id"],
    features={
        "tenure": 24,
        "MonthlyCharges": 70.0,
        "Contract": "Month-to-month",
        "InternetService": "Fiber optic"
    }
)
print(f"Prediction: {prediction['result']}")
print(f"Confidence: {prediction['probability']:.2f}")

The four deployment strategies are direct, canary, blue_green, and shadow.
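
For intuition, a canary split like the canary_percent=10 above can be pictured as hashing a stable request identifier into a bucket and routing a fixed fraction to the new model. This is a hypothetical client-side sketch of the idea, not CorePlexML's actual routing logic:

```python
import hashlib

def route_canary(request_id: str, canary_percent: int) -> str:
    """Deterministically send canary_percent of traffic to the new model."""
    # Hash the request ID into a stable bucket in [0, 100)
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"
```

Because the bucket is derived from the request ID, the same caller consistently hits the same variant, which keeps canary metrics comparable across requests.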

5. Privacy

The Privacy module scans datasets for PII and applies anonymization transforms:

scan = client.privacy.scan(
    dataset_version_id="dv_xyz789"
)
print(f"PII columns found: {scan['pii_count']}")

transform = client.privacy.transform(
    dataset_version_id="dv_xyz789",
    profile="HIPAA",
    rules=[
        {"column": "email", "action": "mask"},
        {"column": "ssn", "action": "redact"},
        {"column": "zip_code", "action": "generalize", "level": 3}
    ]
)
print(f"Anonymized version: {transform['output_version_id']}")

The Privacy module supports more than 72 PII types across four compliance profiles: HIPAA, GDPR, PCI-DSS, and CCPA.
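
As a rough illustration of what the mask, redact, and generalize actions typically mean, here is a simple character-level interpretation. These helpers are assumptions for illustration only; the platform defines the exact semantics:

```python
def mask(value: str, keep: int = 2) -> str:
    """Replace all but the last `keep` characters with '*'."""
    return "*" * max(len(value) - keep, 0) + value[-keep:]

def redact(value: str) -> str:
    """Remove the value entirely, leaving only a marker."""
    return "[REDACTED]"

def generalize_zip(value: str, level: int) -> str:
    """Coarsen a ZIP code by blanking its last `level` digits."""
    return value[:-level] + "x" * level if level else value
```

Under this reading, a level-3 generalization of a 5-digit ZIP keeps only the broad region, which preserves some analytical value while reducing re-identification risk.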

6. SynthGen

SynthGen generates synthetic tabular data:

model = client.synthgen.create_model(
    dataset_version_id="dv_xyz789",
    engine="CTGAN",
    epochs=300
)
client.jobs.wait(model["job_id"])

synthetic = client.synthgen.generate(
    model_id=model["model_id"],
    num_rows=10000
)
print(f"Generated: {synthetic['row_count']} rows")

Three engines are available: CTGAN, CopulaGAN, and TVAE.

Error Handling

The SDK raises typed exceptions:

from coreplexml.exceptions import (
    AuthenticationError,
    NotFoundError,
    ValidationError,
    RateLimitError,
    ServerError
)

try:
    prediction = client.deployments.predict(
        deployment_id="dep_nonexistent",
        features={"tenure": 24}
    )
except AuthenticationError:
    print("Invalid or expired API key")
except NotFoundError as e:
    print(f"Resource not found: {e.resource_id}")
except ValidationError as e:
    print(f"Invalid input: {e.details}")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after} seconds")
except ServerError:
    print("Platform error. Retry or contact support")

For transient errors, implement a retry strategy:

import time

def predict_with_retry(client, deployment_id, features, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.deployments.predict(
                deployment_id=deployment_id,
                features=features
            )
        except RateLimitError as e:
            if attempt < max_retries - 1:
                time.sleep(e.retry_after)
            else:
                raise
        except ServerError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
            else:
                raise

CI/CD Integration

The SDK is designed for pipeline automation. Here is a GitHub Actions workflow:

name: ML Pipeline
on:
  push:
    branches: [main]
    paths: ["data/**", "config/**"]

jobs:
  train-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install SDK
        run: pip install coreplexml

      - name: Train Model
        env:
          COREPLEXML_URL: ${{ secrets.COREPLEXML_URL }}
          COREPLEXML_API_KEY: ${{ secrets.COREPLEXML_API_KEY }}
        run: python scripts/train.py

      - name: Validate Model
        run: python scripts/validate.py

      - name: Deploy Model
        if: success()
        run: python scripts/deploy.py

Environment-Based Configuration

import os

ENV = os.environ.get("DEPLOY_ENV", "dev")

config = {
    "dev": {
        "url": "https://dev.coreplexml.io",
        "key": os.environ.get("COREPLEXML_DEV_KEY")
    },
    "staging": {
        "url": "https://staging.coreplexml.io",
        "key": os.environ.get("COREPLEXML_STAGING_KEY")
    },
    "prod": {
        "url": "https://api.coreplexml.io",
        "key": os.environ.get("COREPLEXML_PROD_KEY")
    }
}

client = CorePlexMLClient(
    base_url=config[ENV]["url"],
    api_key=config[ENV]["key"]
)

New SDK Modules (v2.4+)

Model Registry

version = client.registry.create_version(
    project_id="proj_abc",
    model_id="mod_xgb_v2",
    version="1.2.0",
    model_card={"description": "Improved feature set", "metrics": {"auc": 0.94}}
)

client.registry.transition_stage(version["id"], stage="production")

A/B Testing

test = client.ab_tests.create(
    project_id="proj_abc",
    model_a_id="mod_v1",
    model_b_id="mod_v2",
    traffic_split_a=50,
    primary_metric="accuracy",
    min_sample_size=1000
)

results = client.ab_tests.get_results(test["id"])
if results["is_significant"]:
    client.ab_tests.declare_winner(test["id"], variant="B")
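
The is_significant flag implies a statistical test runs behind the scenes. For intuition, one plausible choice (the platform's actual method is not documented here) is a two-sided two-proportion z-test on the variants' success rates:

```python
from math import sqrt, erf

def two_proportion_pvalue(success_a: int, n_a: int,
                          success_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled proportion under the null hypothesis of no difference
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Convert |z| to a two-sided p-value via the standard normal CDF
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
```

A min_sample_size like the 1000 above guards against declaring a winner before the test has enough data for a result like this to be trustworthy.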

Alerts and Monitoring

channel = client.alerts.create_channel(
    name="Slack Ops",
    channel_type="slack",
    config={"webhook_url": "https://hooks.slack.com/..."}
)

rule = client.alerts.create_rule(
    deployment_id="dep_prod",
    name="Drift Alert",
    metric="drift_psi",
    operator="gt",
    threshold=0.2,
    severity="critical",
    channel_ids=[channel["id"]]
)

Batch and Streaming Predictions

job = client.predictions.create(
    deployment_id="dep_prod",
    file_path="batch_input.csv"
)
result = client.predictions.wait(job["id"])
client.predictions.download(job["id"], "predictions.csv")

for row in client.streaming.predict(deployment_id="dep_prod", data=records):
    process(row["prediction"])

Tips

Use environment variables for API keys. Never commit API keys to version control.

Batch predictions for higher throughput. If you need to score many records, use the batch prediction job endpoint instead of calling the single-prediction endpoint in a loop.
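
When you have in-memory records rather than a file, splitting the input into fixed-size chunks keeps each request payload bounded. A generic helper, independent of the SDK:

```python
def chunked(records: list, size: int):
    """Yield successive fixed-size batches from a list of records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]
```

Each chunk can then be submitted as one batch job instead of thousands of single-row calls.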

Poll job status for long-running operations. Use client.jobs.wait() for simple scripts, or implement your own polling loop with client.jobs.get().
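
A custom polling loop can be written generically. Here get_status is any zero-argument callable returning a job dict with a "status" key, e.g. lambda: client.jobs.get(job_id); the terminal status names are assumptions for illustration:

```python
import time

def wait_for_job(get_status, timeout: float = 600, interval: float = 5) -> dict:
    """Poll get_status() until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = get_status()
        if job["status"] in ("completed", "failed", "cancelled"):
            return job
        time.sleep(interval)
    raise TimeoutError("job did not finish within the timeout")
```

Rolling your own loop lets you log progress, emit metrics, or cancel early, which client.jobs.wait() may not expose.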

Pin the SDK version. Use coreplexml==X.Y.Z in your requirements file rather than an unpinned install.

Watch the rate limits. The API enforces per-key rate limits. In production pipelines, always implement retry logic.

The SDK puts the full power of the CorePlexML platform into your Python scripts, notebooks, and CI/CD pipelines. For the complete reference, see our Python SDK page and the API documentation.