Senior Analytics Engineer with strong foundations in data science, focused on building reliable analytics-ready data platforms, reporting layers, and predictive models that support statistical analysis and decision-making.
For more than four years, I’ve worked with international clients across the US, Brazil, the UK, Spain, and the rest of Europe, designing data platforms and warehouse transformations with dbt, Snowflake, Redshift, Databricks, SQL, Python, CI/CD, and AWS in industries including insurance, airlines, sports, e-commerce, and IoT.
My work sits at the intersection of analytics engineering, data science, and applied statistics. Alongside modern data platform work, I have developed churn and reactivation propensity models for a football club loyalty program in Brazil and contributed to econometric analysis for a World Bank research project.
I am a Mechanical Engineer from UFRJ (cum laude) and hold a Master’s in Economics & Public Policy from UCLA, supported by a full scholarship from the Lemann Foundation. My academic background includes rigorous training in mathematics, statistics, econometrics, machine learning, and causal inference.
Earlier, I was the first employee at Stone Payments / StoneCo (NASDAQ: STNE), working directly with the founders, and founded an online math prep platform that helped low-income Brazilian students prepare for the GRE and GMAT exams.
Throughout my journey, I’ve been recognized with awards and scholarships from UCLA, Yale University, the Lemann Foundation, the General Electric Foundation, and The Club of Rome.
Focus: reliable data movement, schema control, and cost-efficient storage
Focus: analytical correctness, business logic, and scalable modeling
Focus: historical correctness and temporal consistency
Focus: trust, monitoring, and CI/CD-driven quality guarantees
Focus: ML-ready data, feature pipelines, and reproducibility
Tech Stack: Python (Pandas, NumPy, Statsmodels, scikit-learn, CausalInference)
Focus: estimating impact, not just predicting outcomes
Focus: translating data into actionable signals
Focus: analytical rigor, feature preparation, and reproducibility
Tech Stack: Quarto, Markdown, LaTeX, GitHub Pages
Focus: rigorous mathematical foundations, intuition-building, and applied causal reasoning
See all projects below!
This project uses a Dockerized environment to extract data from Postgres, which stands in for a production database, converts it into Parquet files, and saves them to an AWS S3 bucket. I used my AWS Free Tier account and the dbt-duckdb adapter to extend dbt beyond its core role (transformation) into an ingestion layer.

This ETL pipeline uses Python functions to extract data from an external API and transform it into CSV files for downstream consumption by Tableau or other visualization tools. The project runs in a Dockerized environment with PostgreSQL and Jupyter Notebook for interactive exploration.

This project extracts Parquet files stored in S3 using Snowflake External Tables. dbt performs transformations and materializes dimension and fact tables in the Silver layer, along with aggregated tables in the Gold schema, following the Medallion Architecture and Kimball Dimensional Modeling.

This project expands a previous Python-based ETL to simulate a real-world migration to dbt. Data is extracted from multiple CSV files, and transformation and loading are performed in PostgreSQL via dbt, following Bronze, Silver, and Gold layers and a star schema design.

This guide covers four essential pillars of Snowflake mastery:

This project implements a Slowly Changing Dimension (SCD) Type 2 to track historical changes in product status using a CDC stream as the source. The pipeline ensures ordered, deduplicated events, idempotency, and basic data quality checks via stored procedures.
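The SCD Type 2 logic described above can be sketched in plain Python. This is a minimal illustration rather than the project's stored procedures: the `Event` and `DimRow` classes, their field names, and the `apply_events` function are all hypothetical, and timestamps are assumed to be ISO-8601 strings in a single time zone so that string order matches time order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One CDC change event (hypothetical shape)."""
    product_id: int
    status: str
    changed_at: str  # ISO-8601 string; assumed to sort chronologically


@dataclass
class DimRow:
    """One version of a product in the dimension (hypothetical shape)."""
    product_id: int
    status: str
    valid_from: str
    valid_to: Optional[str] = None  # None marks the current version


def apply_events(dim, events):
    """Apply CDC events to an SCD Type 2 dimension, in place."""
    # Drop exact duplicates, then process in event-time order.
    for ev in sorted(set(events), key=lambda e: (e.changed_at, e.product_id)):
        current = next(
            (r for r in dim if r.product_id == ev.product_id and r.valid_to is None),
            None,
        )
        if current is not None:
            # Skip replayed or stale events and no-op changes (idempotency).
            if ev.changed_at <= current.valid_from or ev.status == current.status:
                continue
            current.valid_to = ev.changed_at  # close the previous version
        dim.append(DimRow(ev.product_id, ev.status, ev.changed_at))
    return dim
```

Running `apply_events` twice with the same batch leaves the dimension unchanged, which is the property that makes a pipeline of this kind safe to retry.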

This project provides a lightweight observability layer for raw Stripe data ingested into S3 via Meltano. The goal is to validate the raw layer before downstream transformations.
Key features include:
boto3
This project uses a Dockerized environment to extract Parquet and CSV data from S3 and load it into PostgreSQL, following the Medallion Architecture and object-oriented transformation design.

This project builds an end-to-end Python ETL pipeline designed for machine learning use cases. The pipeline runs in Docker, uses PostgreSQL and Jupyter Notebook, and follows the Medallion Architecture and Kimball star schema to produce ML-ready feature tables.

Measuring the Effect of a New Recommendation System on an E-Commerce Marketplace

Measuring the Effect of a New Customer-Satisfaction Program on an Airline Company

Focus: analytical rigor, feature preparation, and statistical best practices
On large datasets, training a machine learning model and generating predictions can become very slow. This project focuses on storing and encoding categorical data efficiently without changing dataset size.
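One common way to store categorical data compactly is dictionary encoding, the same idea behind the pandas `category` dtype: each distinct value is stored once, and every row holds a small integer code. The sketch below uses only the standard library; the function names are hypothetical and it is not taken from the project itself. It assumes at most 65,536 distinct categories so that codes fit in 16 bits.

```python
from array import array


def dictionary_encode(values):
    """Return (codes, categories) such that categories[codes[i]] == values[i]."""
    index = {}  # category -> integer code, in order of first appearance
    codes = array("H")  # unsigned 16-bit codes; raises OverflowError past 65,535
    for value in values:
        codes.append(index.setdefault(value, len(index)))
    return codes, list(index)


def dictionary_decode(codes, categories):
    """Rebuild the original values from codes and the category list."""
    return [categories[code] for code in codes]
```

The number of rows is unchanged; only the representation of each value shrinks, from a full string to a two-byte code.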
Best practices for parsing, standardizing, and validating date, time, and time zone data prior to modeling.
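As a small illustration of the parse, standardize, and validate steps, the sketch below normalizes ISO-8601 timestamp strings to timezone-aware UTC using only the standard library. The `to_utc` helper and its `assume_tz` parameter are hypothetical names, and treating naive timestamps as UTC is an assumption that should be replaced with whatever the data source actually guarantees.

```python
from datetime import datetime, timezone


def to_utc(raw, assume_tz=timezone.utc):
    """Parse an ISO-8601 timestamp string and return an aware datetime in UTC."""
    text = raw.strip()
    if text.endswith(("Z", "z")):
        # datetime.fromisoformat rejects a trailing 'Z' before Python 3.11.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)  # raises ValueError on malformed input
    if parsed.tzinfo is None:
        # Naive input: apply the stated assumption explicitly.
        parsed = parsed.replace(tzinfo=assume_tz)
    return parsed.astimezone(timezone.utc)
```

Letting `ValueError` propagate keeps malformed values visible instead of silently turning them into missing data before modeling.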
I have a strong interest in teaching and in building clear bridges between mathematical foundations, statistical reasoning, and real-world data science practice. I care deeply about rigor, intuition, and the responsible use of quantitative methods in decision-making.
I’m developing a long-term open study book (and future course) focused on the mathematical and statistical foundations underlying Data Science, Econometrics, and Causal Machine Learning. The goal is to make advanced concepts accessible without sacrificing rigor, and to connect theory directly to modern ML and applied data problems.
The project is freely available online:
Foundations of Data Science & Causal Machine Learning – A Mathematical Journey