Data Scientist with strong foundations in data engineering, focused on building reliable analytics and data platforms that support statistical analysis and decision-making.
Over the past ~4 years, I’ve worked across the US, UK, Spain, and Brazil, designing end-to-end data platforms using dbt, Snowflake, Redshift, SQL, Python, and AWS in industries including airlines, sports, e-commerce, and IoT.
My work sits at the intersection of analytics engineering and data science, with a strong background in causal inference and statistical modeling from academia, and growing industry exposure to applied modeling.
I have a background in Mechanical Engineering from UFRJ (Federal University of Rio de Janeiro) and a Master’s in Economics & Public Policy from UCLA (University of California, Los Angeles), which I completed on a full scholarship from the Lemann Foundation.
Earlier, I was the first employee at Stone Payments (NASDAQ: STNE) and founded an online math prep platform that helped low-income Brazilian students prepare for the GRE and GMAT exams, experiences that sharpened my entrepreneurial mindset and communication skills.
Throughout my journey, I’ve been recognized with awards and scholarships from UCLA, Yale University, the Lemann Foundation, the General Electric Foundation, and The Club of Rome.
Focus: reliable data movement, schema control, and cost-efficient storage
Focus: analytical correctness, business logic, and scalable modeling
Focus: historical correctness and temporal consistency
Focus: trust, monitoring, and CI/CD-driven quality guarantees
Focus: ML-ready data, feature pipelines, and reproducibility
Tech Stack: Python (Pandas, NumPy, Statsmodels, scikit-learn, CausalInference)
Focus: estimating impact, not just predicting outcomes
Focus: translating data into actionable signals
Focus: analytical rigor, feature preparation, and reproducibility
Tech Stack: Quarto, Markdown, LaTeX, GitHub Pages
Focus: rigorous mathematical foundations, intuition-building, and applied causal reasoning
See all projects below!
This project uses a Dockerized environment to extract data from Postgres (simulating a production source), convert it into Parquet files, and save them to an AWS S3 bucket. I used my AWS Free Tier account and the dbt-DuckDB adapter to extend dbt’s core functionality (transformation) into an ingestion layer.
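As a rough illustration, the extraction step can be sketched in Python as below; the connection string, table name, and bucket are placeholders, and the actual project orchestrates this inside Docker together with dbt-DuckDB.

```python
import io

import boto3
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string for the "production" Postgres container
engine = create_engine("postgresql://user:password@postgres:5432/source_db")


def extract_table_to_s3(table: str, bucket: str, prefix: str = "raw") -> str:
    """Read a Postgres table and land it in S3 as a Parquet file."""
    df = pd.read_sql(f"SELECT * FROM {table}", engine)

    # Serialize to Parquet in memory (requires pyarrow or fastparquet)
    buffer = io.BytesIO()
    df.to_parquet(buffer, index=False)

    key = f"{prefix}/{table}.parquet"
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())
    return key


if __name__ == "__main__":
    extract_table_to_s3("orders", bucket="my-free-tier-bucket")
```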

This ETL pipeline uses Python functions to extract data from an external API and transform it into CSV files for downstream consumption by Tableau or other visualization tools. The project runs in a Dockerized environment with PostgreSQL and Jupyter Notebook for interactive exploration.
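A minimal sketch of this extract-transform-load pattern is shown below; the API endpoint, payload shape, and output path are placeholders, not the project’s actual source.

```python
import pandas as pd
import requests

API_URL = "https://api.example.com/v1/records"  # placeholder endpoint


def extract(url: str) -> list[dict]:
    """Pull raw JSON records from the external API."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def transform(records: list[dict]) -> pd.DataFrame:
    """Flatten the payload and drop empty or duplicated rows."""
    df = pd.json_normalize(records)
    return df.dropna(how="all").drop_duplicates()


def load(df: pd.DataFrame, path: str = "records.csv") -> None:
    """Write a CSV that Tableau (or any BI tool) can read directly."""
    df.to_csv(path, index=False)


if __name__ == "__main__":
    load(transform(extract(API_URL)))
```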

This project extracts Parquet files stored in S3 using Snowflake External Tables. dbt performs transformations and materializes dimension and fact tables in the Silver layer, along with aggregated tables in the Gold schema, following the Medallion Architecture and Kimball Dimensional Modeling.
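For illustration, registering the S3 Parquet files as a Snowflake External Table could be scripted as in the sketch below; the credentials, stage, database, and table names are assumptions, and the real project manages the downstream Silver and Gold models in dbt.

```python
import snowflake.connector

# Placeholder credentials; in practice these come from environment variables
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="TRANSFORMING",
    database="ANALYTICS",
)

# Expose the S3 Parquet files through an external table over an existing stage,
# so dbt can treat them as a source for the Silver-layer models.
DDL = """
CREATE OR REPLACE EXTERNAL TABLE bronze.orders_ext
  LOCATION = @bronze.s3_parquet_stage/orders/
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = FALSE
"""

cur = conn.cursor()
cur.execute(DDL)
cur.close()
conn.close()
```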

This project expands a previous Python-based ETL to simulate a real-world migration to dbt. Data is extracted from multiple CSV files, and transformation and loading are performed in PostgreSQL via dbt, following Bronze, Silver, and Gold layers and a star schema design.
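Before dbt takes over, the CSV files have to land in the Bronze schema. A simple loading sketch (paths, table names, and connection string are placeholders) could look like this:

```python
from pathlib import Path

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string for the local Postgres container
engine = create_engine("postgresql://user:password@localhost:5432/warehouse")


def load_bronze(csv_dir: str = "data") -> None:
    """Load every CSV into the Bronze schema; dbt handles Silver/Gold from there."""
    for path in Path(csv_dir).glob("*.csv"):
        df = pd.read_csv(path)
        # Assumes the "bronze" schema already exists in the target database
        df.to_sql(path.stem, engine, schema="bronze",
                  if_exists="replace", index=False)


if __name__ == "__main__":
    load_bronze()
```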

This guide covers four essential pillars of Snowflake mastery:

This project implements a Slowly Changing Dimension (SCD) Type 2 to track historical changes in product status using a CDC stream as the source. The pipeline ensures ordered, deduplicated events, idempotency, and basic data quality checks via stored procedures.
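The project itself implements this logic in SQL stored procedures; purely for illustration, the same SCD Type 2 batch update can be sketched in pandas (column names are assumptions):

```python
import pandas as pd


def apply_scd2(dim: pd.DataFrame, events: pd.DataFrame) -> pd.DataFrame:
    """Apply a batch of CDC events to an SCD Type 2 product dimension.

    dim:    product_id, status, valid_from, valid_to, is_current
    events: product_id, status, event_ts (raw CDC stream, possibly duplicated)
    """
    dim = dim.copy()

    # 1. Order and deduplicate the stream: the latest event per product wins
    latest = (events.sort_values("event_ts")
                    .drop_duplicates(subset="product_id", keep="last"))

    # 2. Idempotency: keep only events that change the current status
    current = dim.loc[dim["is_current"], ["product_id", "status"]]
    changed = latest.merge(current, on="product_id", how="left",
                           suffixes=("", "_current"))
    changed = changed[changed["status"] != changed["status_current"]]

    # 3. Close out superseded versions, using each event's timestamp as valid_to
    close_ts = changed.set_index("product_id")["event_ts"]
    mask = dim["is_current"] & dim["product_id"].isin(close_ts.index)
    dim.loc[mask, "valid_to"] = dim.loc[mask, "product_id"].map(close_ts)
    dim.loc[mask, "is_current"] = False

    # 4. Insert the new current versions
    new_rows = changed.assign(valid_from=changed["event_ts"],
                              valid_to=pd.NaT, is_current=True)
    cols = ["product_id", "status", "valid_from", "valid_to", "is_current"]
    return pd.concat([dim, new_rows[cols]], ignore_index=True)
```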

This project provides a lightweight observability layer for raw Stripe data ingested into S3 via Meltano. The goal is to validate the raw layer before downstream transformations.
Key features include validation checks on the raw S3 layer, implemented in Python with boto3.
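A sketch of what such checks could look like with boto3 is shown below; the bucket, prefix, and the specific checks (empty objects and freshness) are illustrative assumptions, not the project’s exact feature list.

```python
from datetime import datetime, timedelta, timezone

import boto3

BUCKET = "my-meltano-raw-bucket"   # placeholder bucket name
PREFIX = "stripe/"                 # placeholder prefix for the raw Stripe extract


def validate_raw_layer(max_age_hours: int = 24) -> list[str]:
    """Run basic checks on the raw S3 layer and return a list of issues."""
    s3 = boto3.client("s3")
    issues = []

    # Note: list_objects_v2 returns at most 1,000 keys per call
    response = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
    objects = response.get("Contents", [])

    if not objects:
        return [f"No objects found under s3://{BUCKET}/{PREFIX}"]

    # Empty files usually signal a failed or partial extract
    for obj in objects:
        if obj["Size"] == 0:
            issues.append(f"Empty object: {obj['Key']}")

    # Freshness: the newest object should be recent enough
    newest = max(obj["LastModified"] for obj in objects)
    if newest < datetime.now(timezone.utc) - timedelta(hours=max_age_hours):
        issues.append(f"Raw layer is stale; newest object from {newest:%Y-%m-%d %H:%M}")

    return issues


if __name__ == "__main__":
    for issue in validate_raw_layer():
        print("WARN:", issue)
```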
This project uses a Dockerized environment to extract Parquet and CSV data from S3 and load it into PostgreSQL, following the Medallion Architecture and object-oriented transformation design.
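A stripped-down sketch of the object-oriented design is shown below; the class names, S3 paths, and connection string are placeholders.

```python
from abc import ABC, abstractmethod

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string for the target Postgres container
ENGINE = create_engine("postgresql://user:password@localhost:5432/warehouse")


class Transformer(ABC):
    """Base class: each subclass owns one dataset's Bronze -> Silver step."""

    source_path: str
    target_table: str

    def run(self) -> None:
        df = self.transform(self.extract())
        df.to_sql(self.target_table, ENGINE, schema="silver",
                  if_exists="replace", index=False)

    def extract(self) -> pd.DataFrame:
        # pandas reads directly from S3 when s3fs is installed
        if self.source_path.endswith(".parquet"):
            return pd.read_parquet(self.source_path)
        return pd.read_csv(self.source_path)

    @abstractmethod
    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        ...


class OrdersTransformer(Transformer):
    source_path = "s3://my-bucket/bronze/orders.parquet"  # placeholder path
    target_table = "orders"

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.drop_duplicates(subset="order_id")


if __name__ == "__main__":
    OrdersTransformer().run()
```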

This project builds an end-to-end Python ETL pipeline designed for machine learning use cases. The pipeline runs in Docker, uses PostgreSQL and Jupyter Notebook, and follows the Medallion Architecture and Kimball star schema to produce ML-ready feature tables.
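As an illustration of the final step, building one ML-ready feature table from the star schema could look like the sketch below; the table and column names are assumptions.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string for the warehouse Postgres container
engine = create_engine("postgresql://user:password@localhost:5432/warehouse")


def build_customer_features() -> pd.DataFrame:
    """Join the fact table to a dimension and aggregate ML-ready features."""
    fact = pd.read_sql("SELECT * FROM gold.fact_orders", engine)
    dim_customer = pd.read_sql("SELECT * FROM gold.dim_customer", engine)

    df = fact.merge(dim_customer, on="customer_key", how="left")

    features = (
        df.groupby("customer_key")
          .agg(order_count=("order_key", "count"),
               total_spend=("order_amount", "sum"),
               avg_spend=("order_amount", "mean"))
          .reset_index()
    )
    # Persist as a feature table for downstream training notebooks
    features.to_sql("customer_features", engine, schema="gold",
                    if_exists="replace", index=False)
    return features


if __name__ == "__main__":
    build_customer_features()
```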

Measuring the Effect of a New Recommendation System on an E-Commerce Marketplace

Measuring the Effect of a New Customer-Satisfaction Program on an Airline Company

Focus: analytical rigor, feature preparation, and statistical best practices
When datasets are large, it can take a long time for a machine learning model to make predictions. This project focuses on storing and encoding categorical data efficiently, shrinking the in-memory footprint without changing the size of the dataset itself.
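A tiny, self-contained illustration of the idea using the pandas category dtype (the data here is synthetic):

```python
import pandas as pd

# Synthetic example: a large dataframe with a repetitive string column
df = pd.DataFrame({
    "product_category": ["electronics", "books", "electronics"] * 1_000_000,
    "price": [199.0, 12.5, 349.0] * 1_000_000,
})

before = df.memory_usage(deep=True).sum()

# The 'category' dtype stores each label once and replaces the column with
# small integer codes: the row count stays the same, only the in-memory
# representation shrinks.
df["product_category"] = df["product_category"].astype("category")

after = df.memory_usage(deep=True).sum()
print(f"Memory: {before / 1e6:.1f} MB -> {after / 1e6:.1f} MB")
```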
Best practices for parsing, standardizing, and validating date, time, and time zone data prior to modeling.
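For example, a minimal sketch of the parsing-and-validation pattern in pandas (the sample values are made up):

```python
import pandas as pd

# Hypothetical raw column mixing time zone offsets and a malformed value
raw = pd.Series(["2024-03-01 10:15:00+00:00",
                 "2024-03-01T12:30:00+02:00",
                 "not a date"])

# Parse to timezone-aware UTC timestamps; unparseable values become NaT
# instead of raising, so they can be inspected rather than silently dropped
ts = pd.to_datetime(raw, utc=True, errors="coerce")

# Validate before modeling: surface parse failures and impossible future dates
print("Unparsed values:", ts.isna().sum())
print("Future timestamps:", (ts > pd.Timestamp.now(tz="UTC")).sum())
print(ts)
```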
I have a strong interest in teaching and in building clear bridges between mathematical foundations, statistical reasoning, and real-world data science practice. I care deeply about rigor, intuition, and the responsible use of quantitative methods in decision-making.
I’m developing a long-term open study book (and future course) focused on the mathematical and statistical foundations underlying Data Science, Econometrics, and Causal Machine Learning. The goal is to make advanced concepts accessible without sacrificing rigor, and to connect theory directly to modern ML and applied data problems.
The project is freely available online:
Foundations of Data Science & Causal Machine Learning – A Mathematical Journey