Calibration Pipeline

Platt scaling and Gaussian CDF for well-calibrated bracket probabilities

Why Calibration Matters

Raw model outputs are not calibrated probabilities. For Kelly sizing to work, an event the model assigns 70% probability should actually occur about 70% of the time. Without calibration, the Kelly criterion systematically over- or under-sizes positions, destroying expected value even when the model has genuine edge.

Gaussian CDF Bracket Mapping

Given the predicted mean (mu) and standard deviation (sigma) from the LightGBM dual model, we compute the probability of temperature falling in each bracket using the Gaussian CDF. For a bracket [a, b], the probability is Phi((b - mu) / sigma) - Phi((a - mu) / sigma), where Phi is the standard normal CDF from scipy.stats.norm.

Uses scipy.stats.norm.cdf for precise computation
Handles edge brackets: (-inf, a] and [b, +inf)
Probabilities sum to 1.0 across all brackets
Typical markets have 9-11 temperature brackets
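The CDF mapping above can be sketched as follows. This is a minimal illustration, not the production code: the function name `bracket_probabilities` and the example edges are assumptions, but the Phi-difference formula and the open-ended edge brackets match the description.

```python
import numpy as np
from scipy.stats import norm

def bracket_probabilities(mu: float, sigma: float, edges) -> np.ndarray:
    """Map a Gaussian forecast (mu, sigma) to bracket probabilities.

    `edges` are the interior bracket boundaries; the first and last
    brackets are open-ended: (-inf, edges[0]] and [edges[-1], +inf).
    """
    # Standard normal CDF at each interior edge...
    cdf = norm.cdf((np.asarray(edges, dtype=float) - mu) / sigma)
    # ...padded with 0 and 1 to cover the two open-ended brackets
    cdf = np.concatenate(([0.0], cdf, [1.0]))
    # Each bracket's probability is the difference of adjacent CDF values,
    # so the probabilities sum to exactly 1.0 by construction
    return np.diff(cdf)

# Hypothetical market: whole-degree edges 70..78 -> 10 brackets
edges = list(range(70, 79))
probs = bracket_probabilities(mu=74.2, sigma=1.8, edges=edges)
```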

Platt Scaling Calibrator

After CDF mapping, a Platt scaling layer (logistic regression on the log-odds of the raw probabilities) corrects for systematic biases. The calibrator is fit on a held-out validation set, using a procedure equivalent to sklearn's CalibratedClassifierCV. This brings the Brier score from ~0.08 uncalibrated down to ~0.05 calibrated, a meaningful improvement for trading.

Logistic map: calibrated_prob = 1 / (1 + exp(-(a * z + b))), where z = log(raw / (1 - raw))
Fit on held-out temporal validation data
Brier score improvement: 0.08 -> 0.05
Saved as calibrator_v2.pkl (4.2 KB)
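A minimal sketch of the Platt scaling step, under stated assumptions: the helper names (`fit_platt`, `calibrate`) and the synthetic held-out data are illustrative, not the actual pipeline. It fits a near-unregularized logistic regression on the log-odds of raw probabilities, which implements the calibrated_prob = sigmoid(a * z + b) map above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

EPS = 1e-6

def to_log_odds(p) -> np.ndarray:
    """Convert probabilities to a log-odds feature column, clipping away 0/1."""
    p = np.clip(np.asarray(p, dtype=float), EPS, 1 - EPS)
    return np.log(p / (1 - p)).reshape(-1, 1)

def fit_platt(raw_probs, outcomes) -> LogisticRegression:
    """Fit the Platt scaler: logistic regression on log-odds of raw probs."""
    # Large C ~ effectively no regularization, as in classic Platt scaling
    model = LogisticRegression(C=1e6, max_iter=1000)
    model.fit(to_log_odds(raw_probs), outcomes)
    return model

def calibrate(model: LogisticRegression, raw_probs) -> np.ndarray:
    """Apply the fitted scaler to raw probabilities."""
    return model.predict_proba(to_log_odds(raw_probs))[:, 1]

# Synthetic held-out set: an overconfident model whose raw probabilities
# are pushed toward the extremes relative to the true event frequencies
rng = np.random.default_rng(42)
true_p = rng.uniform(0.1, 0.9, size=2000)
raw = np.clip(true_p + 0.6 * (true_p - 0.5), 0.01, 0.99)
y = (rng.uniform(size=2000) < true_p).astype(int)

scaler = fit_platt(raw, y)
cal = calibrate(scaler, raw)
brier_raw = np.mean((raw - y) ** 2)
brier_cal = np.mean((cal - y) ** 2)
```

The fitted scaler can then be persisted (e.g. via pickle) and applied to the CDF-mapped bracket probabilities at inference time.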