Documentation
Everything you need to integrate Watchtower AI into your ML pipeline.
Installation
Install the Watchtower SDK from PyPI using pip:
pip install watchtower
This installs the SDK along with its dependencies: requests, pandas, and numpy.
Configuration
The SDK uses environment variables for zero-config setup in production. Set these before running your application:
# Required: Your project's API key from the Watchtower dashboard
export WATCHTOWER_API_KEY="your_project_api_key"
# Required for cloud: Your deployed backend URL
export WATCHTOWER_API_URL="https://watchtower-ai-production-604f.up.railway.app"
If WATCHTOWER_API_URL is not set, the SDK defaults to http://localhost:8000.
You can also pass api_key and endpoint directly to any monitor constructor. Environment variables are simply the recommended approach for production.
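The precedence described above (explicit argument, then environment variable, then localhost default) can be sketched in plain Python. This is an illustration only; resolve_endpoint and resolve_api_key are hypothetical helpers, not part of the SDK:

```python
import os

DEFAULT_ENDPOINT = "http://localhost:8000"

def resolve_endpoint(endpoint=None):
    """Illustrative precedence: explicit argument > WATCHTOWER_API_URL > localhost."""
    return endpoint or os.environ.get("WATCHTOWER_API_URL") or DEFAULT_ENDPOINT

def resolve_api_key(api_key=None):
    """Illustrative precedence: explicit argument > WATCHTOWER_API_KEY env var."""
    return api_key or os.environ.get("WATCHTOWER_API_KEY")
```

With neither an argument nor the environment variable set, resolve_endpoint() falls back to http://localhost:8000, matching the default noted above.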
Quick Start
Here's the fastest way to start logging data:
import pandas as pd
from watchtower.monitor import WatchtowerInputMonitor
# Initialize (reads WATCHTOWER_API_KEY and WATCHTOWER_API_URL from env)
monitor = WatchtowerInputMonitor(project_name="My ML Project")
# Load and send your data
df = pd.read_csv("production_data.csv")
response = monitor.log(df)
print(response)
That's it! Your data is now being monitored for drift and quality issues on the Watchtower dashboard.
SDK 1: Feature Monitoring — WatchtowerInputMonitor
This is the primary SDK for monitoring tabular/structured data. Use it to log feature vectors (model inputs) so Watchtower can detect data drift, validate data quality, and alert you when your production data deviates from training data.
Constructor
| Parameter | Type | Required | Description |
|---|---|---|---|
| project_name | str | Yes | The name of your project (must match the project created on the dashboard). |
| api_key | str | No | API key. Falls back to the WATCHTOWER_API_KEY env var. |
| endpoint | str | No | Backend URL. Falls back to the WATCHTOWER_API_URL env var. |
Usage & Examples
Logging a Pandas DataFrame
import pandas as pd
from watchtower.monitor import WatchtowerInputMonitor
monitor = WatchtowerInputMonitor(
    project_name="Credit Scoring v2",
    api_key="your_project_api_key",
    endpoint="https://watchtower-ai-production-604f.up.railway.app"
)
df = pd.DataFrame({
"age": [25, 34, 45, 52, 61],
"income": [45000, 78000, 92000, 55000, 110000],
"credit_score": [680, 720, 750, 630, 800],
"loan_amount": [15000, 25000, 35000, 10000, 50000]
})
response = monitor.log(df, stage="model_input")
print(response)
Logging with Custom Metadata
from datetime import datetime
response = monitor.log(
features=df,
stage="model_input",
event_time=datetime(2026, 2, 13, 12, 0, 0),
metadata={"batch_id": "batch_042", "environment": "production"}
)
The log() method accepts Pandas DataFrames, Python dictionaries, lists of dicts, and NumPy arrays. All are automatically serialized.
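As an illustration of the accepted input shapes, each of the following normalizes to the same tabular form. This mirrors what the serialization does conceptually; the SDK's internals may differ:

```python
import numpy as np
import pandas as pd

# A dict of column -> values
as_dict = {"age": [25, 34], "income": [45000, 78000]}

# A list of row dicts
as_records = [{"age": 25, "income": 45000}, {"age": 34, "income": 78000}]

# A NumPy array (column names are positional in this case)
as_array = np.array([[25, 45000], [34, 78000]])

df_from_dict = pd.DataFrame(as_dict)
df_from_records = pd.DataFrame(as_records)
df_from_array = pd.DataFrame(as_array, columns=["age", "income"])

# All three produce an equivalent 2x2 table
assert df_from_dict.equals(df_from_records)
```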
Drift Detection Tests
Once enough data is ingested, Watchtower automatically runs the following statistical tests to detect drift between your baseline (training) data and current (production) data:
Mean Shift [Statistical]
Measures the relative change in the mean value of each feature. A large shift indicates the central tendency of your data has changed.
Median Shift [Statistical]
Measures the relative change in the median. More robust to outliers than the mean, useful for skewed distributions.
Variance Shift [Statistical]
Detects changes in the spread/dispersion of your data. A widening or narrowing variance often signals upstream data pipeline issues.
Kolmogorov-Smirnov Test [Distribution]
A non-parametric test that compares the entire cumulative distribution. If the p-value falls below the threshold, the distributions are statistically different.
Population Stability Index (PSI) [Distribution]
Quantifies how much the distribution has shifted. PSI < 0.1 = no drift, 0.1–0.25 = moderate drift, > 0.25 = significant drift.
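The PSI calculation can be sketched in a few lines of NumPy. This is a simplified illustration using equal-width bins derived from the baseline; Watchtower's own binning strategy may differ:

```python
import numpy as np

def population_stability_index(baseline, current, bins=10):
    """PSI = sum((p_i - q_i) * ln(p_i / q_i)) over histogram bins."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    p, _ = np.histogram(baseline, bins=edges)
    q, _ = np.histogram(current, bins=edges)
    # Convert counts to proportions; a small epsilon avoids division by zero
    eps = 1e-6
    p = np.clip(p / p.sum(), eps, None)
    q = np.clip(q / q.sum(), eps, None)
    return float(np.sum((p - q) * np.log(p / q)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 1000)
identical = rng.normal(0, 1, 1000)   # same distribution, fresh sample
shifted = rng.normal(1.5, 1, 1000)   # mean shifted by 1.5 standard deviations

print(population_stability_index(baseline, identical))  # below 0.1 (no drift)
print(population_stability_index(baseline, shifted))    # above 0.25 (significant)
```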
Model-Based Drift [ML-Based]
Trains a RandomForest classifier to distinguish between baseline and current data. If the classifier's accuracy exceeds the 0.50 threshold, drift is detected.
Threshold Configuration
Watchtower uses sensible defaults for all drift thresholds. You can customize them per-project via the dashboard or the API.
| Threshold | Default Value | Description |
|---|---|---|
| mean_threshold | 0.10 (10%) | Maximum allowed relative change in mean before flagging drift. |
| median_threshold | 0.10 (10%) | Maximum allowed relative change in median. |
| variance_threshold | 0.20 (20%) | Maximum allowed relative change in variance. |
| ks_pvalue_threshold | 0.05 | If the p-value is below this, the KS test flags drift. |
| psi_thresholds | [0.1, 0.25] | PSI severity bands: < 0.1 = None, 0.1–0.25 = Moderate, > 0.25 = High. |
| psi_bins | 10 | Number of histogram bins used for PSI calculation. |
| min_samples | 50 | Minimum data points required for valid statistical tests. |
| alert_threshold | 2 | Number of individual test failures needed to trigger an overall drift alert. |
| model_based_drift_threshold | 0.50 | RandomForest accuracy above this value indicates drift. |
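How the relative-change thresholds combine into an overall alert can be sketched as follows. This is a simplified illustration of the documented defaults, not the backend's actual code:

```python
import numpy as np

def run_drift_checks(baseline, current,
                     mean_threshold=0.10,
                     median_threshold=0.10,
                     variance_threshold=0.20,
                     alert_threshold=2):
    """Flag overall drift once alert_threshold individual tests fail."""
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)

    def rel_change(before, after):
        return abs(after - before) / abs(before) if before != 0 else float(after != before)

    failures = {
        "mean": bool(rel_change(baseline.mean(), current.mean()) > mean_threshold),
        "median": bool(rel_change(np.median(baseline), np.median(current)) > median_threshold),
        "variance": bool(rel_change(baseline.var(), current.var()) > variance_threshold),
    }
    n_failed = sum(failures.values())
    return {"failures": failures, "n_failed": n_failed,
            "drift_alert": n_failed >= alert_threshold}

baseline = [10.0, 12.0, 11.0, 13.0, 12.0]
drifted = [15.0, 17.0, 16.0, 18.0, 17.0]  # same spread, shifted center
print(run_drift_checks(baseline, drifted))
```

Here the mean and median tests both fail while variance passes, so the two failures meet the default alert_threshold and trigger an alert.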
Data Quality Checks
Beyond drift, Watchtower automatically performs quality checks on every batch of data you log:
- Missing Values: Identifies columns with null/NaN values and reports the percentage per column.
- Duplicate Rows: Detects and counts duplicate records in the batch.
- Schema Validation: Verifies that the number of columns and their data types match the expected schema from the first batch.
- LLM Interpretation: An AI-powered summary of drift results, explaining what changed and why it matters.
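The first three checks can be reproduced locally with pandas. This is an illustration of what the backend reports, not its actual implementation; quality_report is a hypothetical helper:

```python
import numpy as np
import pandas as pd

def quality_report(df, expected_dtypes=None):
    """Missing-value percentages, duplicate count, and an optional schema check."""
    report = {
        "missing_pct": (df.isna().mean() * 100).round(2).to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }
    if expected_dtypes is not None:
        # Schema check: same columns in the same order, with matching dtypes
        report["schema_ok"] = (
            list(df.columns) == list(expected_dtypes)
            and all(str(df[c].dtype) == expected_dtypes[c] for c in expected_dtypes)
        )
    return report

batch = pd.DataFrame({
    "age": [25, 34, np.nan, 25],
    "income": [45000, 78000, 92000, 45000],
})
print(quality_report(batch, {"age": "float64", "income": "int64"}))
```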
SDK 2: Prediction Monitoring — WatchtowerModelMonitor
Use this SDK to monitor your model outputs and performance metrics over time. It supports both classification and regression models.
Constructor
| Parameter | Type | Required | Description |
|---|---|---|---|
| project_name | str | Yes | The name of your project. |
| api_key | str | No | API key. Falls back to env var. |
| endpoint | str | No | Backend URL. Falls back to env var. |
| model_type | str | No | "classification" or "regression". |
Usage & Examples
Logging Predictions with Metrics (Classification)
from watchtower.monitor import WatchtowerModelMonitor
model_monitor = WatchtowerModelMonitor(
project_name="Fraud Detector",
model_type="classification",
api_key="your_project_api_key",
endpoint="https://watchtower-ai-production-604f.up.railway.app"
)
# Log predictions along with current performance metrics
predictions = [0, 1, 0, 0, 1, 1, 0, 1]
response = model_monitor.log(
predictions=predictions,
accuracy=0.92,
precision=0.89,
recall=0.95,
f1_score=0.91,
roc_auc=0.96,
metadata={"batch_id": "eval_batch_7"}
)
print(response)
Logging Predictions with Metrics (Regression)
model_monitor = WatchtowerModelMonitor(
project_name="House Price Predictor",
model_type="regression"
)
predictions = [250000, 180000, 320000, 410000]
response = model_monitor.log(
predictions=predictions,
mae=12500.0,
mse=225000000.0,
rmse=15000.0,
r2_score=0.87
)
Supported Metrics
Classification
- accuracy — Overall correctness (0–1)
- precision — True positives / predicted positives
- recall — True positives / actual positives
- f1_score — Harmonic mean of precision & recall
- roc_auc — Area under the ROC curve
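If you are not using scikit-learn, the threshold-based classification metrics can be computed by hand before logging. A minimal sketch for binary 0/1 labels; roc_auc needs probability scores and is omitted here:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }

metrics = classification_metrics([0, 1, 0, 1, 1], [0, 1, 0, 0, 1])
print(metrics)
```

The resulting dict can be unpacked straight into model_monitor.log(predictions=..., **metrics).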
Regression
- mae — Mean Absolute Error
- mse — Mean Squared Error
- rmse — Root Mean Squared Error
- r2_score — R-squared coefficient
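Similarly, the regression metrics can be computed by hand. A minimal sketch; in practice you would likely use sklearn.metrics:

```python
def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and R-squared for numeric targets."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = mse ** 0.5
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 = 1 - SS_res / SS_tot; SS_res is mse * n
    r2 = 1 - (mse * n) / ss_tot if ss_tot else 0.0
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2_score": r2}

metrics = regression_metrics([250000, 180000, 320000], [240000, 190000, 325000])
print(metrics)
```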
SDK 3: LLM Monitoring — WatchtowerLLMMonitor
Designed for Generative AI / LLM applications. Log every prompt-response pair and get automated analysis for toxicity, response quality, token usage, and semantic drift.
Constructor
| Parameter | Type | Required | Description |
|---|---|---|---|
| api_key | str | Yes | API key for authentication. |
| project_name | str | Yes | The name of your LLM project. |
| endpoint | str | No | Backend URL. Defaults to http://localhost:8000. |
| timeout | int | No | Request timeout in seconds. Default: 60. |
Usage & Examples
Logging an LLM Interaction
from watchtower.llm_monitor import WatchtowerLLMMonitor
llm_monitor = WatchtowerLLMMonitor(
api_key="your_api_key",
project_name="Customer Support Bot",
endpoint="https://watchtower-ai-production-604f.up.railway.app"
)
response = llm_monitor.log_interaction(
input_text="How do I reset my password?",
response_text="Navigate to Settings > Security > Reset Password. You will receive a confirmation email.",
metadata={
"model": "gpt-4",
"latency_ms": 320,
"user_id": "user_abc123",
"session_id": "sess_789"
}
)
print(response)
Batch Logging in a Loop
# Log multiple interactions from a conversation
conversations = [
{"input": "What are your hours?", "output": "We are open 9 AM - 5 PM, Mon-Fri."},
{"input": "Can I speak to a manager?", "output": "I'll transfer you to our management team."},
]
for conv in conversations:
llm_monitor.log_interaction(
input_text=conv["input"],
response_text=conv["output"]
)
Evaluation Features
When you log LLM interactions, Watchtower automatically evaluates them on the backend:
Toxicity Detection [Safety]
Each response is scanned using the Detoxify library. Scores above the configurable threshold (default: 0.5) are flagged as toxic.
Token Length Tracking [Performance]
Response token lengths are tracked over time. Sudden increases or decreases in verbosity can signal model behavior changes.
Token Length Drift [Distribution]
Compares average token lengths between baseline and monitoring windows. Drift threshold default: 15% change.
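The token-length drift check can be sketched as a relative change in average length. This illustration uses simple whitespace tokenization; the backend's tokenizer and windowing may differ:

```python
def token_length_drift(baseline_responses, current_responses, threshold=0.15):
    """Compare average token counts between two windows of responses."""
    def avg_tokens(responses):
        return sum(len(r.split()) for r in responses) / len(responses)

    base_avg = avg_tokens(baseline_responses)
    curr_avg = avg_tokens(current_responses)
    change = abs(curr_avg - base_avg) / base_avg
    return {"baseline_avg": base_avg, "current_avg": curr_avg,
            "relative_change": change, "drift": change > threshold}

baseline = ["Navigate to Settings to reset it.", "We are open nine to five."]
current = ["Yes.", "No."]  # responses suddenly became terse
print(token_length_drift(baseline, current))
```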
LLM Judge Evaluation [AI-Powered]
Uses a secondary LLM to evaluate response quality, relevance, and hallucination risk with configurable thresholds.