© 2025 Caio Velasco. All rights reserved.
This is my structured learning roadmap to prepare for research in causal machine learning with mathematical rigor.
Here you can find both mathematical foundations and applications.
Journey Phases
Why I’m Interested in Causal ML
- Statistics and econometrics build models on top of stochastic mechanisms (data-generating processes, or DGPs), aiming for explanation and inference.
- Machine learning often ignores those mechanisms and focuses on prediction accuracy, generalization, and performance.
- Causal ML is a synthesis: it combines ML’s flexibility with statistics’ concern for identification, adding a causal lens to reason about interventions and counterfactuals.
Phase 1 - Logic & Set Theory
Goal: Build comfort with the language of mathematics.
- Proof techniques: direct, contrapositive, contradiction, induction.
- Sets, Cartesian products, power sets.
- Functions and relations.
- Cardinality: countable vs. uncountable sets.
Theory Output: Formal proofs and mathematical maturity.
Application Project: Formalize simple statistical statements using logical quantifiers and analyze which assumptions are required for conclusions to hold. (not very hands-on, I know, but having mathematical maturity will change your life!)
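As one illustration of the project above, the informal claim "the sample mean gets close to the true mean" can be stated with explicit quantifiers (this is weak consistency; every symbol below is standard notation, not taken from a specific source):

```latex
% Weak consistency of the sample mean, with every quantifier explicit:
% for any tolerance eps and confidence level delta, a large enough N works.
\forall \varepsilon > 0,\ \forall \delta > 0,\ \exists N \in \mathbb{N} :
\quad n \ge N \implies \Pr\!\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) < \delta
```

Unpacking which assumptions (i.i.d. sampling, finite mean) each quantifier leans on is exactly the kind of exercise this phase trains.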
References:
- Velleman - How to Prove It
Phase 2 - Linear Algebra
Goal: Understand the geometry underlying estimation and projection.
- Vector spaces, bases, dimension.
- Linear transformations and matrices.
- Inner product spaces and orthogonality.
- Eigenvalues, spectral theorem (finite-dimensional).
- Singular value decomposition.
Theory Output: Proof of spectral theorem (symmetric case), rank-nullity theorem.
Application Project: Derive and implement PCA via SVD.
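A minimal sketch of the PCA-via-SVD project, on made-up Gaussian data (the data and the choice of k are illustrative, not prescriptive):

```python
import numpy as np

# Illustrative toy data: 200 samples, 5 correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

def pca_svd(X, k):
    """Project X onto its top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)             # center first: PCA is SVD of centered data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                 # rows are orthonormal principal directions
    scores = Xc @ components.T          # coordinates in the new basis
    explained_var = S[:k] ** 2 / (len(X) - 1)  # variance along each direction
    return scores, components, explained_var

scores, components, ev = pca_svd(X, k=2)
```

The derivation part of the project is showing that the right singular vectors of the centered data matrix are exactly the eigenvectors of the sample covariance, which is why no explicit covariance matrix is formed here.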
References:
- Axler - Linear Algebra Done Right
Phase 3 - Real Analysis (ℝ)
Goal: Rigorous calculus and convergence.
- Sequences and limits.
- Continuity and compactness in ℝ.
- Differentiation and Mean Value Theorem.
- Uniform convergence.
Theory Output: ε-δ proofs, compactness arguments.
Application Project: Analyze convergence of gradient descent under convexity assumptions.
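The convergence analysis above can be simulated on the simplest strongly convex case, a quadratic f(x) = ½ xᵀAx, where theory predicts linear convergence at rate (1 − μ/L) for step size 1/L (the matrix and dimensions below are arbitrary choices):

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * x^T A x with A symmetric positive definite:
# with step size 1/L, the error contracts by at least (1 - mu/L) each step.
rng = np.random.default_rng(0)
M = rng.normal(size=(10, 10))
A = M @ M.T + np.eye(10)                   # symmetric positive definite
mu, L = np.linalg.eigvalsh(A)[[0, -1]]     # strong convexity / smoothness constants

x = rng.normal(size=10)
errors = []
for _ in range(500):
    x = x - (1.0 / L) * (A @ x)            # gradient of f is A x
    errors.append(np.linalg.norm(x))       # distance to the minimizer x* = 0
```

Plotting log(errors) against the iteration count and comparing the slope to log(1 − μ/L) turns the ε-style bound into something you can see.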
References:
- Rudin - Principles of Mathematical Analysis
- Tao - Analysis I
Phase 4 - Metric Spaces
Goal: Introduce minimal topology needed for probability and asymptotics.
- Metric spaces and convergence.
- Open/closed sets.
- Completeness.
- Compactness in metric spaces.
Theory Output: General convergence beyond ℝ.
Application Project: Compare different notions of convergence in simulated estimators.
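A sketch of the comparison project, tracking two notions of convergence for the sample mean of exponential(1) data (the distribution, ε, and sample sizes are illustrative choices):

```python
import numpy as np

# For X̄_n with exponential(1) data (mu = 1), estimate two diagnostics:
#   in probability:  P(|X̄_n - mu| > eps) -> 0
#   in L²:           E[(X̄_n - mu)²] -> 0
rng = np.random.default_rng(0)
eps, reps = 0.1, 2000

def convergence_metrics(n):
    means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    tail = np.mean(np.abs(means - 1.0) > eps)   # in-probability diagnostic
    mse = np.mean((means - 1.0) ** 2)           # L² diagnostic
    return tail, mse

metrics = {n: convergence_metrics(n) for n in (10, 100, 1000)}
```

The interesting part of the exercise is constructing estimators where these notions disagree; the sketch above only covers the well-behaved case where both shrink together.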
References:
- Rudin - Principles of Mathematical Analysis (Ch. 2)
- Abbott - Understanding Analysis
- Munkres - Topology (selected sections on metric spaces only)
Phase 5 - Measure Theory
Goal: Build the foundation of integration and probability.
- σ-algebras and measurable functions.
- Lebesgue measure and integration.
- Convergence theorems (MCT, DCT).
- Product measures and Fubini.
Theory Output: Construction of Lebesgue integral and convergence theorems.
Application Project: Monte Carlo integration and convergence analysis.
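A minimal sketch of the Monte Carlo project, estimating ∫₀¹ eˣ dx = e − 1 and checking the O(n⁻¹ᐟ²) error rate that the convergence theorems justify (the integrand is an arbitrary choice with a known answer):

```python
import numpy as np

# Monte Carlo integration: the sample mean of e^U over U ~ Uniform(0, 1)
# converges to the integral at rate O(n^{-1/2}) by the LLN/CLT.
rng = np.random.default_rng(0)
true_value = np.e - 1.0

def mc_estimate(n):
    u = rng.uniform(0.0, 1.0, size=n)
    return np.exp(u).mean()

errors = {n: abs(mc_estimate(n) - true_value) for n in (10**2, 10**4, 10**6)}
```

Averaging the error over many replications and regressing log-error on log-n makes the ½ slope explicit; a single run, as here, only shows the trend.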
References:
- Schilling - Measures, Integrals and Martingales
Phase 6 - Probability (Measure-Theoretic)
Goal: Define uncertainty rigorously.
- Probability spaces and random variables.
- Modes of convergence.
- LLN and CLT.
- Conditional expectation as L² projection.
Theory Output: Proof sketches of LLN/CLT; conditional expectation as projection.
Application Project: A/B testing simulation with asymptotic confidence intervals.
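The A/B project can start from a sketch like this, using a CLT-based (Wald) interval for the difference in conversion rates; the rates 0.10 and 0.12 and the sample size are made-up inputs:

```python
import numpy as np

# Simulated A/B test: difference in conversion rates with an asymptotic
# 95% confidence interval justified by the CLT.
rng = np.random.default_rng(0)
n = 5000
a = rng.binomial(1, 0.10, size=n)    # control conversions
b = rng.binomial(1, 0.12, size=n)    # treatment conversions

diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
ci = (diff - 1.96 * se, diff + 1.96 * se)   # Wald 95% interval
```

Repeating the simulation many times and counting how often the interval covers the true difference (0.02 here) is the natural check that the asymptotic guarantee holds at this n.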
References:
- Durrett - Probability: Theory and Examples
Phase 7 - Mathematical Statistics
Goal: Move from probability to inference.
- Likelihood and estimation (MLE).
- Consistency and asymptotic normality.
- Delta method.
- Efficiency and influence functions.
Theory Output: Consistency of MLE; asymptotic normality.
Application Project: Logistic regression - prove Bernoulli MLE consistency and simulate convergence.
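For the Bernoulli half of the project, the MLE is simply the sample mean p̂ = X̄_n, and its consistency can be watched directly (p = 0.3 and the sample sizes are arbitrary choices):

```python
import numpy as np

# The Bernoulli MLE is p̂ = X̄_n; its average absolute error should shrink
# like O(n^{-1/2}) as consistency plus the CLT predict.
rng = np.random.default_rng(0)
p_true = 0.3

def mle_error(n, reps=1000):
    """Average |p̂ - p| over `reps` simulated samples of size n."""
    p_hat = rng.binomial(1, p_true, size=(reps, n)).mean(axis=1)
    return np.mean(np.abs(p_hat - p_true))

errs = [mle_error(n) for n in (10, 100, 1000, 10000)]
```

The logistic-regression part of the project replaces this closed-form MLE with a numerically maximized likelihood, but the consistency argument follows the same template.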
References:
- Casella & Berger - Statistical Inference
Phase 8 - Decision & Risk
Goal: Formalize optimal decisions under uncertainty.
- Loss functions.
- Risk and Bayes risk.
- Minimax and admissibility.
- Connection between estimation and decision rules.
Theory Output: Basic optimality results under quadratic loss.
Application Project: Compare decision rules under different loss functions.
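One concrete version of the comparison: the risk of two decision rules (sample mean vs. sample median) for the center of a contaminated normal, where the contamination fraction and scale below are illustrative choices:

```python
import numpy as np

# Monte Carlo risk under squared loss for estimating the center (0) of
# 0.9 * N(0, 1) + 0.1 * N(0, 10^2): the median's robustness pays off.
rng = np.random.default_rng(0)

def contaminated(reps, n):
    base = rng.normal(size=(reps, n))
    outliers = rng.normal(scale=10.0, size=(reps, n))
    return np.where(rng.random((reps, n)) < 0.1, outliers, base)

samples = contaminated(reps=2000, n=50)           # true center is 0
risk_mean = np.mean(samples.mean(axis=1) ** 2)    # risk of the mean rule
risk_median = np.mean(np.median(samples, axis=1) ** 2)
```

Swapping in absolute loss, or an uncontaminated normal (where the mean wins), shows how the "optimal" rule depends on both the loss and the model, which is the point of the phase.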
References:
- Berger - Statistical Decision Theory and Bayesian Analysis
- Lehmann & Casella - Theory of Point Estimation
- Ferguson - Mathematical Statistics: A Decision Theoretic Approach
Phase 9 - Uncertainty Quantification
Goal: Make uncertainty explicit and operational.
- Parameter uncertainty (confidence intervals, bootstrap).
- Predictive distributions.
- Robustness under misspecification.
- Conformal inference (distribution-free guarantees).
- Uncertainty propagation.
Theory Output: Coverage guarantees and bootstrap consistency.
Application Project: Implement bootstrap intervals and conformal prediction for regression.
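A minimal sketch of the conformal half of the project, using split conformal prediction around a linear fit (the DGP, split sizes, and α = 0.1 are illustrative; the bootstrap half would follow the same resampling pattern on the residuals):

```python
import numpy as np

# Split conformal prediction for 1-D regression: fit on one half,
# calibrate absolute residuals on the other; the resulting band has
# >= 90% marginal coverage for a fresh point, distribution-free.
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)    # synthetic DGP

fit_idx, cal_idx = np.arange(0, n // 2), np.arange(n // 2, n)
slope, intercept = np.polyfit(x[fit_idx], y[fit_idx], 1)
pred = lambda t: intercept + slope * t

scores = np.abs(y[cal_idx] - pred(x[cal_idx]))        # conformity scores
alpha = 0.1
k = int(np.ceil((len(cal_idx) + 1) * (1 - alpha)))    # conformal quantile index
q = np.sort(scores)[k - 1]

interval = lambda t: (pred(t) - q, pred(t) + q)       # prediction band
```

The coverage guarantee is marginal and holds for any fitted model; the price of a bad model is a wider band, not lost coverage, which is the distribution-free punchline.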
References:
- Efron & Tibshirani - An Introduction to the Bootstrap
- Wasserman - All of Statistics (bootstrap & nonparametrics sections)
- van der Vaart - Asymptotic Statistics (confidence sets & efficiency)
- Romano, Shafer & Candès - Conformal inference papers
- Rasmussen & Williams - Gaussian Processes for Machine Learning (predictive uncertainty perspective)
Phase 10 - Statistical Learning Theory
Goal: Bridge inference and prediction rigorously.
- Empirical Risk Minimization (ERM).
- Generalization bounds.
- VC dimension and capacity control.
- Regularization and bias-variance tradeoff.
Theory Output: Derive simple generalization bound via Hoeffding’s inequality.
Application Project: Simulate empirical vs. generalization error under increasing model complexity.
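A sketch of the simulation, using polynomial degree as the complexity knob on a cubic-with-noise DGP (all the specific degrees and sample sizes are illustrative choices):

```python
import numpy as np

# Train vs. test MSE of polynomial least squares as degree grows:
# training error can only fall, while the gap to test error widens.
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, size=n)
    return x, x ** 3 - x + rng.normal(scale=0.2, size=n)

x_tr, y_tr = make_data(30)       # small training set to make overfitting easy
x_te, y_te = make_data(1000)

def train_test_mse(deg):
    coeffs = np.polyfit(x_tr, y_tr, deg)
    err = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    return err(x_tr, y_tr), err(x_te, y_te)

results = {d: train_test_mse(d) for d in (1, 3, 15)}
```

The widening train–test gap at degree 15 is exactly the quantity the Hoeffding-style bound from the theory output controls, and seeing it numerically motivates why the bound must depend on capacity.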
References:
- Shalev-Shwartz & Ben-David - Understanding Machine Learning
- Breiman - Statistical Modeling: The Two Cultures
Phase 11 - Causal Inference + Causal ML
Goal: Reason about interventions with statistical guarantees.
- Structural Causal Models and potential outcomes.
- Identifiability (backdoor/frontdoor).
- Orthogonalization and Double ML.
- Valid uncertainty for treatment effects.
Theory Output: Proof sketches of backdoor criterion and Neyman orthogonality.
Application Project: Implement Double ML for heterogeneous treatment effects.
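A minimal sketch of the Double ML machinery for the homogeneous-effect (partially linear) case; the full project extends this to heterogeneous effects, and the OLS nuisance fits below are stand-ins for the flexible ML learners used in practice (the DGP and θ = 1.5 are made up):

```python
import numpy as np

# Double ML for the partially linear model
#   Y = theta * D + g(X) + eps,   D = m(X) + v,
# via residualization (Neyman orthogonality) and 2-fold cross-fitting.
rng = np.random.default_rng(0)
n, p, theta = 2000, 5, 1.5
X = rng.normal(size=(n, p))
D = X @ rng.normal(size=p) + rng.normal(size=n)        # confounded treatment
Y = theta * D + X @ rng.normal(size=p) + rng.normal(size=n)

def ols_predict(X_tr, y_tr, X_te):
    """Plain OLS as a placeholder nuisance learner."""
    beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return X_te @ beta

# Cross-fitting: nuisances for each fold are trained on the other fold.
folds = np.array_split(rng.permutation(n), 2)
num = den = 0.0
for k in (0, 1):
    tr, te = folds[1 - k], folds[k]
    y_res = Y[te] - ols_predict(X[tr], Y[tr], X[te])   # Y - E[Y | X]
    d_res = D[te] - ols_predict(X[tr], D[tr], X[te])   # D - E[D | X]
    num += d_res @ y_res
    den += d_res @ d_res

theta_hat = num / den    # orthogonalized (partialling-out) estimator
```

Because the moment condition is Neyman-orthogonal, first-order errors in the two nuisance fits do not bias θ̂, which is what licenses plugging in regularized ML learners without giving up √n inference.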
References:
- Pearl - Causality
- Peters, Janzing, Schölkopf - Elements of Causal Inference
- Chernozhukov et al. - Causal ML papers