Why I'm Interested in Causal ML
Overview
My interest in Causal Machine Learning (Causal ML) arises from a long-standing divide in how different scientific and data communities think about data.
Statistics and econometrics have traditionally aimed to explain the world, while machine learning has focused on predicting it. Causal ML is where these worlds meet — combining the interpretability and rigor of econometrics with the flexibility and scalability of machine learning.
The Two Worlds (at a Glance)
In his influential essay Statistical Modeling: The Two Cultures (2001), Leo Breiman described two very different traditions.
| Lens | What you assume | What you optimize | Typical outputs |
|---|---|---|---|
| Data Modeling (Statistics/Econometrics) | A stochastic data-generating process (DGP) | Valid inference under assumptions | Parameters, confidence intervals, p-values |
| Algorithmic Modeling (Machine Learning) | DGP is unknown; data are i.i.d. | Predictive accuracy & generalization | Predictions, cross-validation scores, error curves |
Both cultures ask what we can learn from data, but their answers differ profoundly.
World 1 — The Statistics and Econometrics Tradition
(Model → Inference)
Did you know that when you run a regression, you are implicitly making a strong claim about how the world works?
In this tradition, everything begins with the data-generating process (DGP) — what Aris Spanos in Probability Theory and Statistical Inference calls a stochastic mechanism. Imagine it as a hidden machine tossing probabilistic dice, producing the data we observe.
A statistical model is not just a tool; it’s a story about that hidden mechanism — a simplified mathematical narrative about how outcomes arise.
Examples (each simulated in the short sketch after this list):
- A linear regression assumes wages, test scores, or prices are generated as a linear combination of inputs plus random noise.
- A Poisson model assumes counts follow a specific probabilistic law.
- A time-series AR(1) assumes today’s value depends on yesterday’s, plus a random shock.
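To make the "hidden machine" concrete, here is a minimal simulation sketch of these three DGPs in Python. The coefficients, rates, and sample size are invented purely for illustration, not taken from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Linear DGP: outcome = intercept + slope * input + Gaussian noise
education = rng.uniform(8, 20, size=n)
wage = 5.0 + 2.0 * education + rng.normal(scale=3.0, size=n)

# Poisson DGP: counts follow a Poisson law with a fixed rate
visits = rng.poisson(lam=4.0, size=n)

# AR(1) DGP: today's value = 0.8 * yesterday's value + a random shock
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.8 * series[t - 1] + rng.normal()
```

Each block is a claim about how nature produced the numbers; estimation then tries to recover the hidden constants (5.0, 2.0, 4.0, 0.8) from the data alone.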
If the DGP is hidden, what do we do? — We try to uncover its truth.
That’s why introductory statistics begins with the idea of a Population and then introduces Samples. Since we can’t observe the whole population, we rely on probabilistic laws like the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) to argue that sample averages approximate population truths.
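A quick simulation makes both laws tangible; the exponential population below is an arbitrary stand-in for any unknown distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean = 4.0  # the population "truth" we pretend not to know

# LLN: the sample average drifts toward the population mean as n grows
for n in (10, 1_000, 100_000):
    print(n, rng.exponential(scale=true_mean, size=n).mean())

# CLT: over many repeated samples of size 50, the sample mean is roughly
# normal, centered near 4.0 with spread about 4 / sqrt(50)
means = rng.exponential(scale=true_mean, size=(10_000, 50)).mean(axis=1)
print(means.mean(), means.std())
```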
From here arises Inference:
> If I trust my assumptions about the DGP, I can use my sample to make representative claims about the population.
This is the heart of classical statistics and econometrics:
1. Model the hidden process.
2. Estimate its parameters.
3. Test whether your story holds up against data.
But remember: the DGP is always an assumption. If it’s wrong, inference can mislead.
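As a toy illustration of this three-step loop (with hypothetical numbers, and with us playing the role of nature so the "truth" is known), the steps might look like this with statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500

# Step 1: posit (and here, simulate) a linear DGP with slope 0.8
x = rng.normal(size=n)
y = 1.5 + 0.8 * x + rng.normal(scale=2.0, size=n)

# Step 2: estimate the parameters of the assumed model
fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.params)      # point estimates of the intercept and slope

# Step 3: test whether the story holds up against the data
print(fit.pvalues)     # t-tests of "is each coefficient zero?"
print(fit.conf_int())  # 95% confidence intervals, valid under the assumptions
```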
World 2 — The Machine Learning Tradition
(Data → Prediction)
Now comes the other world — the algorithmic modeling culture.
Here, the hidden mechanism is ignored. No one asks whether wages are “truly linear” or if prices follow a Poisson law. The question is brutally pragmatic:
Can I predict well?
The central challenge here is generalization — ensuring that a model trained on one dataset (the training set) performs well on unseen data (the test set).
Key trade-offs:
- Underfitting: model too simple → high bias, low accuracy.
- Overfitting: model too complex → captures noise, poor generalization.
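A small scikit-learn sketch shows the trade-off, using polynomial degree as the complexity knob; the sine-shaped ground truth and the specific degrees are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)  # nonlinear truth + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          round(mean_squared_error(y_train, model.predict(X_train)), 3),  # train MSE
          round(mean_squared_error(y_test, model.predict(X_test)), 3))    # test MSE
```

The degree-1 model underfits (high error everywhere), while the degree-15 model tends to fit the training data better than it generalizes to the test set.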
Success is measured by predictive accuracy, not by unbiasedness or efficiency.
Interpretability is optional; performance is everything.
But this comes with risks — strong predictions may fail when the underlying data distribution shifts.
When Each Culture Wins — and Loses
Let’s make this distinction concrete.
Example 1 — Estimating the causal effect of education on wages
In econometrics, I’d model wages as a function of education, experience, and other covariates to isolate the causal effect of education.
A pure ML model might predict wages well, but it wouldn’t reveal whether education causes higher wages — maybe it’s just correlated with family wealth or job networks.
ML excels at prediction, but fails at interpretation and identification.
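A simulated example (with invented numbers for wealth, education, and wages) makes the gap concrete: a prediction-style regression on education alone overstates the effect, while adjusting for the confounder recovers it.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5_000

# Hypothetical DGP: family wealth drives both schooling and wages,
# and the true causal effect of one extra year of education is 1.0
wealth = rng.normal(size=n)
education = 12 + 2.0 * wealth + rng.normal(size=n)
wage = 20 + 1.0 * education + 5.0 * wealth + rng.normal(scale=2.0, size=n)

# Naive fit on education alone: the slope is inflated (about 3, not 1),
# because education is standing in for the unobserved wealth
print(sm.OLS(wage, sm.add_constant(education)).fit().params)

# Adjusting for the confounder recovers a slope close to the true 1.0
X = sm.add_constant(np.column_stack([education, wealth]))
print(sm.OLS(wage, X).fit().params)
```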
Example 2 — Predicting house prices
An econometrician might specify a linear model with a few variables and strong assumptions — too simplistic to capture reality.
ML models like gradient boosting can handle thousands of variables and complex nonlinearities, achieving far higher predictive accuracy — even if they can’t “explain” why.
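Here is a rough sketch of that comparison on synthetic "housing" data; the features and the price formula are made up, chosen only to include curvature, an interaction, and a threshold effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
n = 3_000
X = rng.normal(size=(n, 10))  # ten stand-in housing features
# A nonlinear "price" that a straight line cannot capture
y = (200 + 50 * X[:, 0] ** 2 + 30 * X[:, 1] * X[:, 2]
     + 10 * np.maximum(X[:, 3], 0) + rng.normal(scale=5, size=n))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("linear regression", LinearRegression()),
                    ("gradient boosting", GradientBoostingRegressor(random_state=0))]:
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))  # out-of-sample R^2
```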
So:
- Statistics shines when we seek mechanistic understanding.
- ML shines when we need accurate prediction.
- Each fails where the other succeeds.
Why Causal ML Is the Synthesis
✨ This is why Causal Machine Learning excites me.
It’s the bridge — combining ML’s flexibility with econometrics’ discipline of identification.
Researchers like Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, and Vasilis Syrgkanis show that we can merge both worlds:
- From machine learning, borrow flexible algorithms for complex patterns.
- From econometrics, borrow identification strategies to isolate causality.
- From statistics, borrow inference tools for uncertainty and robustness.
A Real-World Case — Online Advertising
A standard supervised ML model might predict that showing more ads increases purchases, because ads and purchases are positively correlated.
But randomized experiments have often revealed a much smaller, sometimes zero, causal effect.
Why? Because users likely to purchase were already targeted with more ads.
Prediction looked strong.
Causality revealed the truth.
This illustrates why prediction ≠ causation — and why combining both is powerful.
How Causal ML Works (Intuition First)
Causal ML methods are sophisticated, but their core intuitions are simple.
- Sample splitting / cross-fitting: fit nuisance models on one part of the data and estimate the causal parameter on another, so the same information is never "reused" and overfit (see the code sketch below).
- Orthogonalization (Double ML): construct estimators whose bias is only second-order sensitive to errors in the nuisance models, so imperfect ML fits do not wreck the causal estimate.
- Heterogeneous treatment effects: use trees, forests, or boosting to uncover who benefits most.
- Beyond IID: Causal inference helps reason about distribution shifts, interventions, and counterfactuals — what if tomorrow doesn’t look like today?
Together, these ideas unify prediction and inference under one theoretical umbrella.
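To make the first two ideas concrete, here is a hand-rolled sketch of the partially linear Double ML recipe: residualize the outcome and the treatment with flexible learners, use cross-fitting so each observation's residuals come from models that never saw it, then regress residual on residual. The helper name double_ml_plr and all the simulated numbers are my own illustrative choices, not a library API.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def double_ml_plr(X, d, y, learner, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + noise."""
    y_res = np.zeros_like(y, dtype=float)
    d_res = np.zeros_like(d, dtype=float)
    for train_idx, test_idx in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # Fit nuisance models E[y|X] and E[d|X] on the training folds...
        m_y = clone(learner).fit(X[train_idx], y[train_idx])
        m_d = clone(learner).fit(X[train_idx], d[train_idx])
        # ...and orthogonalize (residualize) on the held-out fold
        y_res[test_idx] = y[test_idx] - m_y.predict(X[test_idx])
        d_res[test_idx] = d[test_idx] - m_d.predict(X[test_idx])
    # Final stage: residual-on-residual regression gives the effect estimate
    return (d_res @ y_res) / (d_res @ d_res)

# Toy check on simulated data where the true effect of d on y is 2.0
rng = np.random.default_rng(6)
n = 2_000
X = rng.normal(size=(n, 5))
d = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(size=n)            # treatment
y = 2.0 * d + np.cos(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)   # outcome
forest = RandomForestRegressor(n_estimators=200, random_state=0)
print(double_ml_plr(X, d, y, forest))  # should land near 2.0
```

Packages such as DoubleML and econml implement this recipe, and many refinements, with proper standard errors; the sketch is only meant to show the structure: nuisance prediction, cross-fitting, orthogonal final stage.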
Focus of This Project
My main focus is to master the mathematical foundations that underpin Causal ML — to move beyond using tools, and toward understanding why they work.
Mathematical depth transforms intuition into clarity. Too often, methods are applied skillfully but without true comprehension. This project aims to bridge that gap.
“Mathematical maturity is not about memorizing formulas,
but about seeing how different ideas connect.”
The Foundational Path (The Journey Ahead)
To understand Causal ML properly, we’ll build the entire mathematical backbone step by step:
- Logic & Set Theory — precision and proof techniques.
- Real Analysis — limits, continuity, optimization.
- Linear Algebra — geometry of models and data.
- Functional Analysis & Hilbert Spaces — kernels, projections, and RKHS.
- Topology & Measure Theory — σ-algebras and integration.
- Probability — rigorous uncertainty, LLN/CLT, expectations.
- Mathematical Statistics — estimation, hypothesis testing, asymptotics.
- Statistical Learning Theory (Interlude) — generalization, VC dimension, ERM.
- Causality — SCMs, potential outcomes, identifiability, and Causal ML methods.
Each phase builds a bridge — from logic to learning, from inference to causation.
Causal ML is not just a toolkit.
It’s a synthesis — the meeting point of mathematical rigor, algorithmic flexibility, and causal reasoning.
Understanding its foundations is the first step toward using it wisely.