© 2025 Caio Velasco. All rights reserved.
This is my structured learning roadmap to prepare for research in causal machine learning with mathematical rigor.
Here you can find both mathematical foundations and applications.
Journey Phases
Why I’m Interested in Causal ML
- Statistics and econometrics build models on top of stochastic mechanisms (data-generating processes, or DGPs), aiming for explanation and inference.
- Machine learning often ignores those mechanisms and focuses on prediction accuracy, generalization, and performance.
- Causal ML is a synthesis: it combines ML’s flexibility with statistics’ concern for identification, adding a causal lens to reason about interventions and counterfactuals.
Phase 1 - Logic & Set Theory
Goal: Build comfort with the language of mathematics.
- Proof techniques: direct, contrapositive, contradiction, induction.
- Sets, Cartesian products, power sets.
- Functions and relations.
- Cardinality: countable vs. uncountable sets.
Theory Output: Formal proofs and mathematical maturity.
Application Project: Formalize simple statistical statements using logical quantifiers and analyze which assumptions are required for conclusions to hold. (not very hands-on, I know, but having mathematical maturity will change your life!)
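As one illustration of the project above, the informal claim "the sample mean gets close to the true mean" can be stated with explicit quantifiers (this is weak consistency; every symbol below is standard notation, not taken from a specific source):

```latex
% Weak consistency of the sample mean, with every quantifier explicit:
% for any tolerance eps and confidence level delta, a large enough N works.
\forall \varepsilon > 0,\ \forall \delta > 0,\ \exists N \in \mathbb{N} :
\quad n \ge N \implies \Pr\!\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) < \delta
```

Unpacking which assumptions (i.i.d. sampling, finite mean) each quantifier leans on is exactly the kind of exercise this phase trains.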
References:
- Velleman - How to Prove It
Phase 2 - Linear Algebra
Goal: Understand the geometry underlying estimation and projection.
- Vector spaces, bases, dimension.
- Linear transformations and matrices.
- Inner product spaces and orthogonality.
- Eigenvalues, spectral theorem (finite-dimensional).
- Singular value decomposition.
Theory Output: Proof of spectral theorem (symmetric case), rank-nullity theorem.
Application Project: Derive and implement PCA via SVD.
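A minimal sketch of the PCA-via-SVD project, on made-up Gaussian data (the data and the choice of k are illustrative, not prescriptive):

```python
import numpy as np

# Illustrative toy data: 200 samples, 5 correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

def pca_svd(X, k):
    """Project X onto its top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)             # center first: PCA is SVD of centered data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                 # rows are orthonormal principal directions
    scores = Xc @ components.T          # coordinates in the new basis
    explained_var = S[:k] ** 2 / (len(X) - 1)  # variance along each direction
    return scores, components, explained_var

scores, components, ev = pca_svd(X, k=2)
```

The derivation part of the project is showing that the right singular vectors of the centered data matrix are exactly the eigenvectors of the sample covariance, which is why no explicit covariance matrix is formed here.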
References:
- Axler - Linear Algebra Done Right
Phase 3 - Real Analysis (ℝ)
Goal: Rigorous calculus and convergence.
- Sequences and limits.
- Continuity and compactness in ℝ.
- Differentiation and Mean Value Theorem.
- Uniform convergence.
Theory Output: ε-δ proofs, compactness arguments.
Application Project: Analyze convergence of gradient descent under convexity assumptions.
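The convergence analysis above can be simulated on the simplest strongly convex case, a quadratic f(x) = ½ xᵀAx, where theory predicts linear convergence at rate (1 − μ/L) for step size 1/L (the matrix and dimensions below are arbitrary choices):

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * x^T A x with A symmetric positive definite:
# with step size 1/L, the error contracts by at least (1 - mu/L) each step.
rng = np.random.default_rng(0)
M = rng.normal(size=(10, 10))
A = M @ M.T + np.eye(10)                   # symmetric positive definite
mu, L = np.linalg.eigvalsh(A)[[0, -1]]     # strong convexity / smoothness constants

x = rng.normal(size=10)
errors = []
for _ in range(500):
    x = x - (1.0 / L) * (A @ x)            # gradient of f is A x
    errors.append(np.linalg.norm(x))       # distance to the minimizer x* = 0
```

Plotting log(errors) against the iteration count and comparing the slope to log(1 − μ/L) turns the ε-style bound into something you can see.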
References:
- Rudin - Principles of Mathematical Analysis
- Tao - Analysis I
Phase 4 - Metric Spaces
Goal: Introduce minimal topology needed for probability and asymptotics.
- Metric spaces and convergence.
- Open/closed sets.
- Completeness.
- Compactness in metric spaces.
Theory Output: General convergence beyond ℝ.
Application Project: Compare different notions of convergence in simulated estimators.
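A sketch of the comparison project, tracking two notions of convergence for the sample mean of exponential(1) data (the distribution, ε, and sample sizes are illustrative choices):

```python
import numpy as np

# For X̄_n with exponential(1) data (mu = 1), estimate two diagnostics:
#   in probability:  P(|X̄_n - mu| > eps) -> 0
#   in L²:           E[(X̄_n - mu)²] -> 0
rng = np.random.default_rng(0)
eps, reps = 0.1, 2000

def convergence_metrics(n):
    means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    tail = np.mean(np.abs(means - 1.0) > eps)   # in-probability diagnostic
    mse = np.mean((means - 1.0) ** 2)           # L² diagnostic
    return tail, mse

metrics = {n: convergence_metrics(n) for n in (10, 100, 1000)}
```

The interesting part of the exercise is constructing estimators where these notions disagree; the sketch above only covers the well-behaved case where both shrink together.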
References:
- Rudin - Principles of Mathematical Analysis (Ch. 2)
- Abbott - Understanding Analysis
- Munkres - Topology (selected sections on metric spaces only)
Phase 5 - Measure Theory
Goal: Build the foundation of integration and probability.
- σ-algebras and measurable functions.
- Lebesgue measure and integration.
- Convergence theorems (MCT, DCT).
- Product measures and Fubini.
Theory Output: Construction of Lebesgue integral and convergence theorems.
Application Project: Monte Carlo integration and convergence analysis.
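A minimal sketch of the Monte Carlo project, estimating ∫₀¹ eˣ dx = e − 1 and checking the O(n⁻¹ᐟ²) error rate that the convergence theorems justify (the integrand is an arbitrary choice with a known answer):

```python
import numpy as np

# Monte Carlo integration: the sample mean of e^U over U ~ Uniform(0, 1)
# converges to the integral at rate O(n^{-1/2}) by the LLN/CLT.
rng = np.random.default_rng(0)
true_value = np.e - 1.0

def mc_estimate(n):
    u = rng.uniform(0.0, 1.0, size=n)
    return np.exp(u).mean()

errors = {n: abs(mc_estimate(n) - true_value) for n in (10**2, 10**4, 10**6)}
```

Averaging the error over many replications and regressing log-error on log-n makes the ½ slope explicit; a single run, as here, only shows the trend.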
References:
- Schilling - Measures, Integrals and Martingales
Phase 6 - Probability (Measure-Theoretic)
Goal: Define uncertainty rigorously.
- Probability spaces and random variables.
- Modes of convergence.
- LLN and CLT.
- Conditional expectation as L² projection.
Theory Output: Proof sketches of LLN/CLT; conditional expectation as projection.
Application Project: A/B testing simulation with asymptotic confidence intervals.
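The A/B project can start from a sketch like this, using a CLT-based (Wald) interval for the difference in conversion rates; the rates 0.10 and 0.12 and the sample size are made-up inputs:

```python
import numpy as np

# Simulated A/B test: difference in conversion rates with an asymptotic
# 95% confidence interval justified by the CLT.
rng = np.random.default_rng(0)
n = 5000
a = rng.binomial(1, 0.10, size=n)    # control conversions
b = rng.binomial(1, 0.12, size=n)    # treatment conversions

diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
ci = (diff - 1.96 * se, diff + 1.96 * se)   # Wald 95% interval
```

Repeating the simulation many times and counting how often the interval covers the true difference (0.02 here) is the natural check that the asymptotic guarantee holds at this n.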
References:
- Durrett - Probability: Theory and Examples
Phase 7 - Mathematical Statistics
Goal: Move from probability to inference.
- Likelihood and estimation (MLE).
- Consistency and asymptotic normality.
- Delta method.
- Efficiency and influence functions.
Theory Output: Consistency of MLE; asymptotic normality.
Application Project: Logistic regression - prove Bernoulli MLE consistency and simulate convergence.
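For the Bernoulli half of the project, the MLE is simply the sample mean p̂ = X̄_n, and its consistency can be watched directly (p = 0.3 and the sample sizes are arbitrary choices):

```python
import numpy as np

# The Bernoulli MLE is p̂ = X̄_n; its average absolute error should shrink
# like O(n^{-1/2}) as consistency plus the CLT predict.
rng = np.random.default_rng(0)
p_true = 0.3

def mle_error(n, reps=1000):
    """Average |p̂ - p| over `reps` simulated samples of size n."""
    p_hat = rng.binomial(1, p_true, size=(reps, n)).mean(axis=1)
    return np.mean(np.abs(p_hat - p_true))

errs = [mle_error(n) for n in (10, 100, 1000, 10000)]
```

The logistic-regression part of the project replaces this closed-form MLE with a numerically maximized likelihood, but the consistency argument follows the same template.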
References:
- Casella & Berger - Statistical Inference
Phase 8 - Decision & Risk
Goal: Formalize optimal decisions under uncertainty.
- Loss functions.
- Risk and Bayes risk.
- Minimax and admissibility.
- Connection between estimation and decision rules.
Theory Output: Basic optimality results under quadratic loss.
Application Project: Compare decision rules under different loss functions.
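One concrete version of the comparison: the risk of two decision rules (sample mean vs. sample median) for the center of a contaminated normal, where the contamination fraction and scale below are illustrative choices:

```python
import numpy as np

# Monte Carlo risk under squared loss for estimating the center (0) of
# 0.9 * N(0, 1) + 0.1 * N(0, 10^2): the median's robustness pays off.
rng = np.random.default_rng(0)

def contaminated(reps, n):
    base = rng.normal(size=(reps, n))
    outliers = rng.normal(scale=10.0, size=(reps, n))
    return np.where(rng.random((reps, n)) < 0.1, outliers, base)

samples = contaminated(reps=2000, n=50)           # true center is 0
risk_mean = np.mean(samples.mean(axis=1) ** 2)    # risk of the mean rule
risk_median = np.mean(np.median(samples, axis=1) ** 2)
```

Swapping in absolute loss, or an uncontaminated normal (where the mean wins), shows how the "optimal" rule depends on both the loss and the model, which is the point of the phase.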
References:
- Berger - Statistical Decision Theory and Bayesian Analysis
- Lehmann & Casella - Theory of Point Estimation
- Ferguson - Mathematical Statistics: A Decision Theoretic Approach
Phase 9 - Uncertainty Quantification
Goal: Make uncertainty explicit and operational.
- Parameter uncertainty (confidence intervals, bootstrap).
- Predictive distributions.
- Robustness under misspecification.
- Conformal inference (distribution-free guarantees).
- Uncertainty propagation.
Theory Output: Coverage guarantees and bootstrap consistency.
Application Project: Implement bootstrap intervals and conformal prediction for regression.
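A minimal sketch of the conformal half of the project, using split conformal prediction around a linear fit (the DGP, split sizes, and α = 0.1 are illustrative; the bootstrap half would follow the same resampling pattern on the residuals):

```python
import numpy as np

# Split conformal prediction for 1-D regression: fit on one half,
# calibrate absolute residuals on the other; the resulting band has
# >= 90% marginal coverage for a fresh point, distribution-free.
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)    # synthetic DGP

fit_idx, cal_idx = np.arange(0, n // 2), np.arange(n // 2, n)
slope, intercept = np.polyfit(x[fit_idx], y[fit_idx], 1)
pred = lambda t: intercept + slope * t

scores = np.abs(y[cal_idx] - pred(x[cal_idx]))        # conformity scores
alpha = 0.1
k = int(np.ceil((len(cal_idx) + 1) * (1 - alpha)))    # conformal quantile index
q = np.sort(scores)[k - 1]

interval = lambda t: (pred(t) - q, pred(t) + q)       # prediction band
```

The coverage guarantee is marginal and holds for any fitted model; the price of a bad model is a wider band, not lost coverage, which is the distribution-free punchline.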
References:
- Efron & Tibshirani - An Introduction to the Bootstrap
- Wasserman - All of Statistics (bootstrap & nonparametrics sections)
- van der Vaart - Asymptotic Statistics (confidence sets & efficiency)
- Romano, Shafer & Candès - Conformal inference papers
- Rasmussen & Williams - Gaussian Processes for Machine Learning (predictive uncertainty perspective)
Phase 10 - Statistical Learning Theory
Goal: Bridge inference and prediction rigorously.
- Empirical Risk Minimization (ERM).
- Generalization bounds.
- VC dimension and capacity control.
- Regularization and bias-variance tradeoff.
Theory Output: Derive simple generalization bound via Hoeffding’s inequality.
Application Project: Simulate empirical vs. generalization error under increasing model complexity.
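A sketch of the simulation, using polynomial degree as the complexity knob on a cubic-with-noise DGP (all the specific degrees and sample sizes are illustrative choices):

```python
import numpy as np

# Train vs. test MSE of polynomial least squares as degree grows:
# training error can only fall, while the gap to test error widens.
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, size=n)
    return x, x ** 3 - x + rng.normal(scale=0.2, size=n)

x_tr, y_tr = make_data(30)       # small training set to make overfitting easy
x_te, y_te = make_data(1000)

def train_test_mse(deg):
    coeffs = np.polyfit(x_tr, y_tr, deg)
    err = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    return err(x_tr, y_tr), err(x_te, y_te)

results = {d: train_test_mse(d) for d in (1, 3, 15)}
```

The widening train–test gap at degree 15 is exactly the quantity the Hoeffding-style bound from the theory output controls, and seeing it numerically motivates why the bound must depend on capacity.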
References:
- Shalev-Shwartz & Ben-David - Understanding Machine Learning
- Breiman - Statistical Modeling: The Two Cultures
Phase 11 - Causal Inference + Causal ML
Goal: Reason about interventions with statistical guarantees.
- Structural Causal Models and potential outcomes.
- Identifiability (backdoor/frontdoor).
- Orthogonalization and Double ML.
- Valid uncertainty for treatment effects.
Theory Output: Proof sketches of backdoor criterion and Neyman orthogonality.
Application Project: Implement Double ML for heterogeneous treatment effects.
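A minimal sketch of the Double ML machinery for the homogeneous-effect (partially linear) case; the full project extends this to heterogeneous effects, and the OLS nuisance fits below are stand-ins for the flexible ML learners used in practice (the DGP and θ = 1.5 are made up):

```python
import numpy as np

# Double ML for the partially linear model
#   Y = theta * D + g(X) + eps,   D = m(X) + v,
# via residualization (Neyman orthogonality) and 2-fold cross-fitting.
rng = np.random.default_rng(0)
n, p, theta = 2000, 5, 1.5
X = rng.normal(size=(n, p))
D = X @ rng.normal(size=p) + rng.normal(size=n)        # confounded treatment
Y = theta * D + X @ rng.normal(size=p) + rng.normal(size=n)

def ols_predict(X_tr, y_tr, X_te):
    """Plain OLS as a placeholder nuisance learner."""
    beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return X_te @ beta

# Cross-fitting: nuisances for each fold are trained on the other fold.
folds = np.array_split(rng.permutation(n), 2)
num = den = 0.0
for k in (0, 1):
    tr, te = folds[1 - k], folds[k]
    y_res = Y[te] - ols_predict(X[tr], Y[tr], X[te])   # Y - E[Y | X]
    d_res = D[te] - ols_predict(X[tr], D[tr], X[te])   # D - E[D | X]
    num += d_res @ y_res
    den += d_res @ d_res

theta_hat = num / den    # orthogonalized (partialling-out) estimator
```

Because the moment condition is Neyman-orthogonal, first-order errors in the two nuisance fits do not bias θ̂, which is what licenses plugging in regularized ML learners without giving up √n inference.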
References:
- Pearl - Causality
- Peters, Janzing, Schölkopf - Elements of Causal Inference
- Chernozhukov et al. - Causal ML papers