I’m a Mechanical Engineer from Brazil with a Master’s in Economics & Public Policy from UCLA. In 2021, I left a PhD in Economics in the Netherlands to take care of my family during the pandemic and transitioned fully into Data Science.
My journey blends engineering, economics, and data, with experience spanning Business Analysis, Consulting, Data & Analytics Engineering, and Data Science across the US, UK, Spain, and Brazil. I thrive at the intersection of advanced analytics and real-world impact.
Along the way, I’ve been honored with awards and scholarships from Yale University, UCLA, General Electric Foundation, Lemann Foundation, and The Club of Rome. Earlier in my career, I helped build Stone Payments (NASDAQ: STNE) and founded MePrepara, an online math prep platform with 140+ videos that helped low-income Brazilian students prepare for GRE/GMAT exams.
I bring not only strong technical skills (Python, SQL, dbt, Snowflake/Redshift, AWS, Looker, Advanced Mathematics, Statistics, Econometrics, and Machine Learning) but also entrepreneurial drive, teaching ability, and leadership. Education changed my life, and I aim to use data and technology to create the same opportunities for others.
See all projects below!
This project provides a lightweight observability layer for raw Stripe data landing in S3 from Meltano ingestion. The goal is to give immediate confidence in the raw layer before any downstream transformations or analytics.
Key features:
- run_checks.sh orchestrates all checks and provides immediate feedback on failures.
- .env file for credentials and configuration.
- boto3 to interact with S3 securely and efficiently.
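For illustration, here is a minimal sketch of one such check: a freshness test on the raw Stripe files, using boto3 and a .env file. The bucket and prefix names below are hypothetical placeholders; in the project, run_checks.sh orchestrates several checks like this one.

```python
import os
from datetime import datetime, timedelta, timezone

import boto3
from dotenv import load_dotenv

# Credentials and configuration come from the .env file
load_dotenv()

# Bucket and prefix names are hypothetical placeholders
BUCKET = os.environ["RAW_BUCKET"]
PREFIX = os.environ.get("STRIPE_PREFIX", "meltano/stripe/")


def check_raw_layer_freshness(max_age_hours: int = 24) -> bool:
    """Fail if no Stripe file has landed in S3 within the last `max_age_hours`."""
    s3 = boto3.client("s3")
    objects = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX).get("Contents", [])

    if not objects:
        print(f"FAIL: no objects found under s3://{BUCKET}/{PREFIX}")
        return False

    newest = max(obj["LastModified"] for obj in objects)
    age = datetime.now(timezone.utc) - newest
    if age > timedelta(hours=max_age_hours):
        print(f"FAIL: newest object is {age} old (threshold: {max_age_hours}h)")
        return False

    print(f"OK: {len(objects)} objects, newest landed at {newest.isoformat()}")
    return True


if __name__ == "__main__":
    # A non-zero exit code lets a shell orchestrator stop on failure
    raise SystemExit(0 if check_raw_layer_freshness() else 1)
```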
This task involves implementing a Slowly Changing Dimension (SCD) Type 2 to track changes to a product’s status over time within Snowflake. The source for this dimension is a Change Data Capture (CDC) stream that logs all data modification events (DML operations) from a transactional system. The main goal is to maintain historical records of product status changes, based on an ordered and deduplicated stream of changes, ensuring idempotency and applying basic data quality checks.
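The real implementation is Snowflake SQL, but a small pandas sketch conveys the core SCD Type 2 logic: order and deduplicate the CDC events, keep only genuine status changes, and derive valid_from / valid_to windows plus a current-record flag. Column names here are illustrative.

```python
import pandas as pd

# Illustrative CDC events for one product (column names are hypothetical)
cdc = pd.DataFrame({
    "product_id": [1, 1, 1, 1],
    "status": ["draft", "draft", "active", "retired"],
    "changed_at": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-02-10", "2024-06-01"]),
})


def build_scd2(events: pd.DataFrame) -> pd.DataFrame:
    # Order the stream and drop exact duplicates so reruns stay idempotent
    ordered = (
        events.sort_values(["product_id", "changed_at"])
              .drop_duplicates(subset=["product_id", "status", "changed_at"])
    )
    # Keep only rows where the status actually changed
    changed = ordered[
        ordered["status"] != ordered.groupby("product_id")["status"].shift()
    ].copy()

    # Each version is valid until the next change; the latest one stays open
    changed["valid_from"] = changed["changed_at"]
    changed["valid_to"] = changed.groupby("product_id")["changed_at"].shift(-1)
    changed["is_current"] = changed["valid_to"].isna()

    # Basic data quality check: exactly one current record per product
    assert changed.groupby("product_id")["is_current"].sum().eq(1).all()
    return changed.drop(columns="changed_at")


print(build_scd2(cdc))
```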
This project uses a Dockerized environment to extract both Parquet and CSV data from S3 buckets, then load and transform it in PostgreSQL, following the Medallion Architecture.
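As a rough sketch of the extract-and-load step into the bronze schema (the connection string, bucket, and object keys below are placeholders):

```python
import io

import boto3
import pandas as pd
from sqlalchemy import create_engine

# Connection string, bucket, and object keys are hypothetical placeholders
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/warehouse")
s3 = boto3.client("s3")
BUCKET = "my-raw-bucket"


def load_to_bronze(key: str) -> None:
    """Read a Parquet or CSV object from S3 and land it as-is in the bronze schema."""
    body = io.BytesIO(s3.get_object(Bucket=BUCKET, Key=key)["Body"].read())
    df = pd.read_parquet(body) if key.endswith(".parquet") else pd.read_csv(body)

    table = key.split("/")[-1].split(".")[0]
    df.to_sql(table, engine, schema="bronze", if_exists="replace", index=False)


for key in ["sales/orders.parquet", "sales/customers.csv"]:
    load_to_bronze(key)
```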
This project uses a Dockerized environment to extract data from PostgreSQL (as if it were production data), convert it into Parquet files, and save them to an AWS S3 bucket. I used my AWS Free Tier account and the dbt-DuckDB adapter to extend dbt’s core function (the Transformation step) into an ingestion machine as well.
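In the project itself, dbt-DuckDB handles this step; a plain-Python sketch of the same data movement (connection details and bucket name are hypothetical placeholders) would look roughly like this:

```python
import io

import boto3
import pandas as pd
from sqlalchemy import create_engine

# Connection details and bucket name are hypothetical placeholders
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/production")
s3 = boto3.client("s3")
BUCKET = "my-data-lake"


def export_table_to_s3(table: str) -> None:
    """Extract a 'production' table from Postgres and land it in S3 as Parquet."""
    df = pd.read_sql_table(table, engine, schema="public")

    buffer = io.BytesIO()
    df.to_parquet(buffer, index=False)

    s3.put_object(Bucket=BUCKET, Key=f"raw/{table}.parquet", Body=buffer.getvalue())
    print(f"Exported {len(df)} rows to s3://{BUCKET}/raw/{table}.parquet")


for table in ["orders", "customers"]:
    export_table_to_s3(table)
```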
This project uses a Dockerized environment to read Parquet files stored in S3 buckets. External Tables were created in Snowflake following Snowflake’s Storage Integration and External Stage procedures. Then, dbt performs the Transformation step and materializes dimensions and facts in the Silver layer and aggregated tables in the Gold schema, following the Medallion Architecture and Kimball’s Dimensional Modeling.
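The Snowflake side is plain DDL; a hedged sketch of those statements, executed here through the Snowflake Python connector with placeholder names, ARNs, and paths, looks roughly like this:

```python
import snowflake.connector

# Account, credentials, role ARN, and bucket paths are hypothetical placeholders
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="COMPUTE_WH", database="ANALYTICS", schema="BRONZE",
)

ddl_statements = [
    # Storage Integration: lets Snowflake assume an AWS role to read the bucket
    """
    CREATE STORAGE INTEGRATION IF NOT EXISTS s3_int
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'S3'
      ENABLED = TRUE
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_reader'
      STORAGE_ALLOWED_LOCATIONS = ('s3://my-data-lake/raw/')
    """,
    # External Stage: points at the Parquet files through the integration
    """
    CREATE STAGE IF NOT EXISTS raw_stage
      URL = 's3://my-data-lake/raw/'
      STORAGE_INTEGRATION = s3_int
      FILE_FORMAT = (TYPE = PARQUET)
    """,
    # External Table: exposes the files to dbt as a queryable source
    """
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_orders
      LOCATION = @raw_stage/orders/
      FILE_FORMAT = (TYPE = PARQUET)
      AUTO_REFRESH = FALSE
    """,
]

cursor = conn.cursor()
for ddl in ddl_statements:
    cursor.execute(ddl)
cursor.close()
conn.close()
```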
I built an ETL pipeline in which Python functions perform each ETL step. The project runs within a Dockerized environment, using PostgreSQL as the database and a Jupyter Notebook as a quick way to interact with the data and materialize schemas and tables. The ETL process follows the Medallion Architecture (bronze, silver, and gold schemas) and Kimball’s Dimensional Modeling (Star Schema).
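A simplified sketch of the silver-layer step (table, column, and connection names are placeholders): a conformed dimension plus a fact table keyed to it, which is the essence of the star schema.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Connection string and table/column names are hypothetical placeholders
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/warehouse")

# Make sure the medallion schemas exist before materializing anything
with engine.begin() as conn:
    for schema in ("bronze", "silver", "gold"):
        conn.execute(text(f"CREATE SCHEMA IF NOT EXISTS {schema}"))


def build_dim_customer() -> None:
    """Silver layer: a conformed customer dimension with a surrogate key."""
    customers = pd.read_sql_table("customers", engine, schema="bronze")
    dim = customers[["customer_id", "name", "country"]].drop_duplicates()
    dim.insert(0, "customer_key", range(1, len(dim) + 1))
    dim.to_sql("dim_customer", engine, schema="silver", if_exists="replace", index=False)


def build_fct_orders() -> None:
    """Silver layer: an orders fact joined to the customer dimension's surrogate key."""
    orders = pd.read_sql_table("orders", engine, schema="bronze")
    dim = pd.read_sql_table("dim_customer", engine, schema="silver")
    fact = orders.merge(dim[["customer_key", "customer_id"]], on="customer_id")
    fact[["customer_key", "order_id", "order_date", "amount"]].to_sql(
        "fct_orders", engine, schema="silver", if_exists="replace", index=False
    )


build_dim_customer()
build_fct_orders()
```

Gold-layer aggregates follow the same pattern, reading from silver and writing to the gold schema.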
I expanded a previous project to mimic a scenario where Python ETL processes are migrated to dbt within a Dockerized environment. The data is extracted from multiple CSV files, and both the Transformation and Loading steps run against PostgreSQL via dbt. The ETL process follows the Medallion Architecture (bronze, silver, and gold schemas) and Kimball’s Dimensional Modeling (Star Schema).
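After such a migration, the former Python ETL functions reduce to a handful of dbt invocations. A thin Python wrapper around the dbt CLI (project path and selector names are placeholders) shows the shape of the orchestration:

```python
import subprocess

# Project directory and selector names are hypothetical placeholders
DBT_PROJECT_DIR = "/usr/app/dbt_project"


def dbt(*args: str) -> None:
    """Run a dbt CLI command and fail fast on errors."""
    subprocess.run(["dbt", *args, "--project-dir", DBT_PROJECT_DIR], check=True)


dbt("seed")                       # load the raw CSV files into the warehouse
dbt("run", "--select", "silver")  # materialize dimensions and facts
dbt("run", "--select", "gold")    # materialize aggregated gold tables
dbt("test")                       # run schema and data tests
```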
This ETL pipeline uses Python functions to perform each ETL step, extracting data from an external API and transforming it into CSV files for later use in Tableau or any other visualization tool. The project runs within a Dockerized environment, using PostgreSQL as the database and a Jupyter Notebook as a quick way to interact with the data.
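A condensed sketch of the three steps (the endpoint URL, field names, and output path are placeholders):

```python
import pandas as pd
import requests

# The endpoint and field names are hypothetical placeholders
API_URL = "https://api.example.com/v1/orders"


def extract() -> pd.DataFrame:
    """Pull raw records from the external API."""
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    return pd.DataFrame(response.json())


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Light cleaning: parse dates and keep only the columns the dashboard needs."""
    df["order_date"] = pd.to_datetime(df["order_date"])
    return df[["order_id", "order_date", "customer_id", "amount"]]


def load(df: pd.DataFrame, path: str = "output/orders.csv") -> None:
    """Save the curated data as CSV for Tableau or any other visualization tool."""
    df.to_csv(path, index=False)


if __name__ == "__main__":
    load(transform(extract()))
```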
When datasets are large, it can take a long time for a Machine Learning model to make predictions. We want to make sure the data is stored efficiently without having to shrink the dataset itself.
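One common way to do this, assuming the data sits in a pandas DataFrame, is to downcast numeric columns and convert low-cardinality strings to categoricals; the columns below are just an example.

```python
import numpy as np
import pandas as pd

# Example data: column names and values are illustrative
df = pd.DataFrame({
    "user_id": np.arange(1_000_000),
    "country": np.random.choice(["BR", "US", "NL", "ES"], size=1_000_000),
    "score": np.random.rand(1_000_000),
})


def shrink(df: pd.DataFrame) -> pd.DataFrame:
    """Reduce memory usage without dropping a single row or column."""
    out = df.copy()
    for col in out.select_dtypes("integer"):
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes("float"):
        out[col] = pd.to_numeric(out[col], downcast="float")
    for col in out.select_dtypes("object"):
        # Categoricals pay off when a column has few distinct values
        if out[col].nunique() / len(out) < 0.5:
            out[col] = out[col].astype("category")
    return out


before = df.memory_usage(deep=True).sum() / 1e6
after = shrink(df).memory_usage(deep=True).sum() / 1e6
print(f"{before:.1f} MB -> {after:.1f} MB")
```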
Best practices for cleaning dates, times, and time zones.
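A short pandas example of the kind of practice the post covers (the timestamps and source time zone are illustrative): parse explicitly, localize once, and store everything in UTC.

```python
import pandas as pd

# Raw strings and the source time zone are illustrative
raw = pd.Series(["2024-03-10 01:30", "2024-03-10 03:30", "not a date"])

# 1. Parse with an explicit format; surface bad values as NaT instead of failing silently
ts = pd.to_datetime(raw, format="%Y-%m-%d %H:%M", errors="coerce")

# 2. Attach the time zone the data was recorded in, handling DST edge cases explicitly
ts = ts.dt.tz_localize("America/Sao_Paulo", nonexistent="shift_forward", ambiguous="NaT")

# 3. Store and compare everything in UTC; convert to local time only for display
print(ts.dt.tz_convert("UTC"))
```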
As for hobbies, I play football competitively (as a forward); it’s my passion. I have played in amateur leagues in Brazil, the USA, and the Netherlands. I also have a strong passion for teaching and educating others. A personal trait I am proud of is the ability to turn very complex subjects into intuitive topics for any audience.
I find happiness in the little things in life, and I have learned a lot from every mistake I have made so far (and still do).
I have a passion for teaching, and I have been trained by amazing professors at top-notch universities around the globe.
Therefore, I have started writing a book that belongs to a (future) course I call “An Intuitive Course in Probability (and Statistics) for Data Science.” The idea is to provide strong intuition for every major concept while keeping the mathematical formalization and rigor close at hand. I had this idea after taking a Probability Theory course from MIT, of which I am a big fan. The book will be available in both English and Portuguese.
Please check the English version here and the Portuguese version aqui!
It’s a work in progress, so you may find only part of Chapter 1 for now.
Sometimes, I try to contribute to some interesting communities. You can check an example below.