Caio Velasco

I’m a Mechanical Engineer from Brazil with a Master’s in Economics & Public Policy from UCLA. In 2021, I left a PhD in Economics in the Netherlands to take care of my family during the pandemic and transitioned fully into Data Science.

My journey blends engineering, economics, and data, with experience spanning Business Analysis, Consulting, Data & Analytics Engineering, and Data Science across the US, UK, Spain, and Brazil. I thrive at the intersection of advanced analytics and real-world impact.

Along the way, I’ve been honored with awards and scholarships from Yale University, UCLA, General Electric Foundation, Lemann Foundation, and The Club of Rome. Earlier in my career, I helped build Stone Payments (NASDAQ: STNE) and founded MePrepara, an online math prep platform with 140+ videos that helped low-income Brazilian students prepare for GRE/GMAT exams.

I bring not only strong technical skills (Python, SQL, dbt, Snowflake/Redshift, AWS, Looker, Advanced Mathematics, Statistics, Econometrics, and Machine Learning) but also entrepreneurial drive, teaching ability, and leadership. Education changed my life, and I aim to use data and technology to create the same opportunities for others.

Projects

See all projects below!

Data & Analytics Engineering


Data Observability for Raw Stripe Data in S3 with CI/CD

Check it out here!

This project provides a lightweight observability layer for raw Stripe data landing in S3 from Meltano ingestion. The goal is to give immediate confidence in the raw layer before any downstream transformations or analytics.
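To give a flavor of what such checks can look like, here is a minimal sketch (not the repository's actual code) that validates the freshness and file sizes of the raw objects with boto3; the bucket and prefix names are placeholders.

```python
from datetime import datetime, timedelta, timezone

import boto3  # assumes AWS credentials are available in the environment

# Hypothetical bucket/prefix where Meltano lands raw Stripe objects.
BUCKET = "my-raw-bucket"
PREFIX = "stripe/charges/"


def check_raw_layer(bucket: str, prefix: str, max_age_hours: int = 24) -> None:
    """Fail fast if the raw layer looks empty, stale, or contains zero-byte files."""
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    objects = response.get("Contents", [])

    assert objects, f"No raw files found under s3://{bucket}/{prefix}"
    assert all(obj["Size"] > 0 for obj in objects), "Found zero-byte raw files"

    newest = max(obj["LastModified"] for obj in objects)
    age = datetime.now(timezone.utc) - newest
    assert age < timedelta(hours=max_age_hours), f"Raw data is stale ({age} old)"


if __name__ == "__main__":
    check_raw_layer(BUCKET, PREFIX)
```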



Implementing an SCD Type 2 dimension from a CDC source using Snowflake's Stored Procedures and Data Quality Checks.

Check it out here!

This task involves implementing a Slowly Changing Dimension (SCD) Type 2 to track changes to a product’s status over time within Snowflake. The source for this dimension is a Change Data Capture (CDC) stream that logs all data modification events (DML operations) from a transactional system. The main goal is to maintain historical records of product status changes, based on an ordered and deduplicated stream of changes, ensuring idempotency, with basic data quality checks.
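The project implements this logic inside Snowflake via a stored procedure; purely as an illustration of the SCD Type 2 mechanics, here is a small pandas sketch on a toy CDC stream with hypothetical column names.

```python
import pandas as pd

# Toy CDC stream: one row per DML event on the products table (hypothetical columns).
cdc = pd.DataFrame({
    "product_id": [1, 1, 1, 2],
    "status":     ["active", "active", "discontinued", "active"],
    "changed_at": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-03-01", "2024-02-01"]),
})

# 1. Order and deduplicate the stream so re-running the load stays idempotent.
cdc = (cdc.sort_values(["product_id", "changed_at"])
          .drop_duplicates(["product_id", "status", "changed_at"]))

# 2. Keep only rows where the status actually changed.
changes = cdc[cdc.groupby("product_id")["status"].shift() != cdc["status"]].copy()

# 3. Build SCD Type 2 validity windows: valid_from / valid_to / is_current.
changes["valid_from"] = changes["changed_at"]
changes["valid_to"] = changes.groupby("product_id")["changed_at"].shift(-1)
changes["is_current"] = changes["valid_to"].isna()

print(changes[["product_id", "status", "valid_from", "valid_to", "is_current"]])
```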


Lead Quality Process (Reading Parquet and CSVs from S3 -> Transforming with Object-Oriented Design -> Postgres (Bronze, Silver, Gold Layers)).

Check it out here!

This project uses a Dockerized environment to extract both Parquet and CSV data from S3 buckets, then load and transform it in PostgreSQL, following the Medallion Architecture.
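As a rough illustration of the object-oriented design (not the project's actual classes), here is a minimal sketch that reads Parquet/CSV from S3 with pandas and loads it into a Medallion schema in Postgres; the bucket, credentials, and table names are placeholders, and s3fs is assumed to be installed.

```python
import pandas as pd
from sqlalchemy import create_engine


class S3Extractor:
    """Reads raw lead files (CSV or Parquet) straight from S3 via pandas/s3fs."""

    def __init__(self, bucket: str):
        self.bucket = bucket

    def read(self, key: str) -> pd.DataFrame:
        path = f"s3://{self.bucket}/{key}"
        return pd.read_parquet(path) if key.endswith(".parquet") else pd.read_csv(path)


class PostgresLoader:
    """Writes DataFrames into the bronze/silver/gold schemas of the Medallion layout."""

    def __init__(self, dsn: str):
        self.engine = create_engine(dsn)

    def load(self, df: pd.DataFrame, schema: str, table: str) -> None:
        df.to_sql(table, self.engine, schema=schema, if_exists="replace", index=False)


# Hypothetical usage: land raw leads into the bronze schema.
extractor = S3Extractor("my-leads-bucket")
loader = PostgresLoader("postgresql://user:password@localhost:5432/leads")
loader.load(extractor.read("raw/leads.parquet"), schema="bronze", table="leads")
```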


Part 1 of 2 - Leveraging dbt-DuckDB to perform an Ingestion Step (Postgres -> AWS S3 Bucket (Parquet)).

Check it out here!

This project uses a Dockerized environment to extract data from Postgres (as if it were data in “Production”). It then converts the data into Parquet files and saves them to an AWS S3 bucket. I used my AWS Free Tier account and the dbt-DuckDB adapter to extend dbt’s core function (the Transformation step) into an ingestion tool.
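Conceptually, the adapter lets DuckDB read straight from Postgres and write Parquet to S3. The sketch below shows that same flow directly through the DuckDB Python API (credentials, database, and bucket names are placeholders), which is roughly what happens under the hood of the dbt models.

```python
import duckdb

con = duckdb.connect()

# Extensions: postgres lets DuckDB query a live Postgres database,
# httpfs lets it write directly to S3.
con.execute("INSTALL postgres; LOAD postgres;")
con.execute("INSTALL httpfs; LOAD httpfs;")

# Placeholder credentials for the "production" Postgres and the S3 bucket.
con.execute("SET s3_region='us-east-1';")
con.execute("SET s3_access_key_id='YOUR_KEY';")
con.execute("SET s3_secret_access_key='YOUR_SECRET';")
con.execute("ATTACH 'dbname=prod user=app password=secret host=localhost' AS pg (TYPE POSTGRES);")

# Extract a source table and land it in S3 as Parquet.
con.execute("""
    COPY (SELECT * FROM pg.public.orders)
    TO 's3://my-ingestion-bucket/raw/orders.parquet' (FORMAT PARQUET);
""")
```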


Part 2 of 2 - Leveraging dbt-Snowflake to perform a Transformation Step (Parquet in S3 -> Snowflake External Tables -> Transformation in Snowflake via dbt).

Check it out here!

This project uses a Dockerized environment to read Parquet files stored in S3 buckets. External tables were created in Snowflake following Snowflake’s Storage Integration and External Stage procedures. Then, dbt performs the Transformation step and materializes dimension and fact tables in the Silver layer and aggregated tables in the Gold schema, following the Medallion Architecture and Kimball’s Dimensional Modeling.
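For context, the Snowflake setup boils down to three statements: a storage integration, an external stage, and an external table. Here is an illustrative sketch (placeholder names and credentials, not the project's actual code) that runs them through the Snowflake Python connector; it assumes the target database and bronze schema already exist.

```python
import snowflake.connector

# Placeholder credentials; in practice these would live in environment variables.
con = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    warehouse="transforming", database="analytics",
)
cur = con.cursor()

# 1. Storage integration: lets Snowflake assume an AWS role to read the bucket.
cur.execute("""
    CREATE STORAGE INTEGRATION IF NOT EXISTS s3_int
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'S3'
      ENABLED = TRUE
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-reader'
      STORAGE_ALLOWED_LOCATIONS = ('s3://my-ingestion-bucket/raw/');
""")

# 2. External stage pointing at the Parquet files produced in Part 1.
cur.execute("""
    CREATE STAGE IF NOT EXISTS bronze.raw_stage
      URL = 's3://my-ingestion-bucket/raw/'
      STORAGE_INTEGRATION = s3_int
      FILE_FORMAT = (TYPE = PARQUET);
""")

# 3. External table that dbt models can then select from and transform.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS bronze.orders_ext
      LOCATION = @bronze.raw_stage
      FILE_FORMAT = (TYPE = PARQUET);
""")
```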


ETL (Medallion Architecture and Kimball Dimensional Modeling) for Machine Learning (Churn Prediction), with Dockerized Postgres, Jupyter Notebook, and Python.

Check it out here!

I built an ETL pipeline using Python functions to perform each ETL step. This project runs within a Dockerized environment, using PostgreSQL as the database and Jupyter Notebook as a quick way to interact with the data and materialize schemas and tables. The ETL process followed the Medallion Architecture (bronze, silver, and gold schemas) and Kimball’s Dimensional Modeling (Star Schema).
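Here is a minimal sketch of that function-based approach (placeholder connection string, file, and column names; not the project's actual code).

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connection string for the Dockerized Postgres instance.
engine = create_engine("postgresql://user:password@localhost:5432/churn")


def create_schemas() -> None:
    """Materialize the Medallion schemas (done from the notebook in the project)."""
    with engine.begin() as conn:
        for schema in ("bronze", "silver", "gold"):
            conn.execute(text(f"CREATE SCHEMA IF NOT EXISTS {schema};"))


def extract(csv_path: str) -> pd.DataFrame:
    """Extract: read a raw source file."""
    return pd.read_csv(csv_path)


def load_bronze(df: pd.DataFrame, table: str) -> None:
    """Load the raw data as-is into the bronze schema."""
    df.to_sql(table, engine, schema="bronze", if_exists="replace", index=False)


def transform_to_silver() -> None:
    """Transform: build a cleaned star-schema dimension in the silver schema."""
    with engine.begin() as conn:
        conn.execute(text("""
            CREATE TABLE IF NOT EXISTS silver.dim_customer AS
            SELECT DISTINCT customer_id, plan, signup_date
            FROM bronze.customers;
        """))


create_schemas()
load_bronze(extract("data/customers.csv"), table="customers")
transform_to_silver()
```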


Migrating ETL (Medallion Architecture and Kimball Dimensional Modeling) to dbt, with Dockerized Postgres, Jupyter Notebook, and dbt.

Check it out here!

I expanded a previous project to mimic a scenario where we want to migrate Python ETL processes to dbt, within a Dockerized environment. The data is extracted from multiple CSV files, and both the Transformation and Loading steps are done against PostgreSQL via dbt. The ETL process followed the Medallion Architecture (bronze, silver, and gold schemas) and Kimball’s Dimensional Modeling (Star Schema).
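Assuming dbt-core 1.5 or later, the migrated models can also be triggered from Python inside the same Dockerized environment; this is only an illustrative snippet, not part of the project.

```python
from dbt.cli.main import dbtRunner

# Run the silver-layer models from Python, equivalent to `dbt run --select silver`
# on the command line. Handy when orchestrating dbt from the notebook.
runner = dbtRunner()
result = runner.invoke(["run", "--select", "silver"])
print(result.success)
```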


ETL Pipeline from Crypto API to Tableau (CSV), with Dockerized Postgres, Jupyter Notebook, and Python.

Check it out here!

This ETL pipeline uses Python functions to perform ETL steps, extracting from an external API and transforming the data to be saved as CSV files for later use by Tableau or any other visualization tool. This project runs within a Dockerized environment, using PostgreSQL as a database and Jupyter Notebook as a quick way to interact with the data.
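As an illustration of the shape of this pipeline (not the project's actual code), here is a minimal sketch that pulls prices from a public endpoint and writes a Tableau-ready CSV; the API, parameters, and column names are assumptions.

```python
import pandas as pd
import requests

# Public CoinGecko endpoint used only as an illustration; the project's
# actual source API may differ.
URL = "https://api.coingecko.com/api/v3/coins/markets"


def extract(vs_currency: str = "usd") -> pd.DataFrame:
    """Extract: pull current market data for the top coins."""
    resp = requests.get(URL, params={"vs_currency": vs_currency, "per_page": 50}, timeout=30)
    resp.raise_for_status()
    return pd.DataFrame(resp.json())


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: keep only the columns the dashboard needs."""
    return df[["id", "symbol", "current_price", "market_cap", "last_updated"]]


def load(df: pd.DataFrame, path: str = "crypto_prices.csv") -> None:
    """Load: write a CSV that Tableau (or any BI tool) can read."""
    df.to_csv(path, index=False)


load(transform(extract()))
```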

Applied Data Science and Machine Learning


Causal Inference (Propensity Score Matching & Difference-in-Differences): Measuring the Effect of a New Recommendation System on an E-Commerce Marketplace

Check it out here!
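For readers unfamiliar with the technique, here is a self-contained sketch of propensity score matching on synthetic data (all column names and numbers are made up; this is not the notebook's code).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Tiny synthetic dataset purely for illustration: buyers exposed (or not) to the
# new recommendation system, with pre-treatment covariates and an outcome (spend).
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "past_orders": rng.poisson(3, n),
    "tenure_months": rng.integers(1, 48, n),
})
df["treated"] = rng.binomial(1, 1 / (1 + np.exp(-(0.2 * df["past_orders"] - 1))))
df["spend"] = 50 + 4 * df["past_orders"] + 10 * df["treated"] + rng.normal(0, 10, n)

# 1. Estimate propensity scores: P(treated | covariates).
covariates = ["past_orders", "tenure_months"]
ps_model = LogisticRegression(max_iter=1000).fit(df[covariates], df["treated"])
df["pscore"] = ps_model.predict_proba(df[covariates])[:, 1]

# 2. 1-to-1 nearest-neighbour matching of controls to treated units on the score.
treated, control = df[df["treated"] == 1], df[df["treated"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_control = control.iloc[idx.ravel()]

# 3. Compare outcomes on the matched sample (the notebook then adds a DiD step).
print(treated["spend"].mean() - matched_control["spend"].mean())
```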


Causal Inference (Difference-in-Differences): Measuring the Effect of a New Customer-Satisfaction Program on an Airline Company

Check it out here!
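Likewise, a minimal difference-in-differences sketch on a synthetic customer panel (illustrative only; the coefficient on the interaction term is the DiD estimate).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Tiny synthetic panel purely for illustration: one row per customer per period,
# with a treatment flag, a post-launch flag, and a satisfaction score.
rng = np.random.default_rng(0)
n = 500
panel = pd.DataFrame({
    "customer_id": np.repeat(np.arange(n // 2), 2),
    "treated": np.repeat(rng.integers(0, 2, n // 2), 2),
    "post": np.tile([0, 1], n // 2),
})
panel["satisfaction"] = (
    60 + 5 * panel["treated"] + 2 * panel["post"]
    + 3 * panel["treated"] * panel["post"]        # "true" program effect = 3
    + rng.normal(0, 5, n)
)

# The coefficient on treated:post is the difference-in-differences estimate
# of the customer-satisfaction program's effect, with errors clustered by customer.
did = smf.ols("satisfaction ~ treated + post + treated:post", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["customer_id"]}
)
print(did.params["treated:post"])
```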


Data Analytics with Python (Best Practices)


Data Cleaning - Preparing Categorical Data for Modeling

Check it out here!

When datasets are large, training and prediction can become slow and memory-hungry. We want to make sure the data is stored efficiently without having to change the size of the dataset.
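A minimal example of the idea, using the pandas categorical dtype on made-up data.

```python
import numpy as np
import pandas as pd

# Toy example: a large column with only a few distinct values (hypothetical data).
df = pd.DataFrame({"plan": np.random.choice(["free", "basic", "premium"], size=1_000_000)})

before = df["plan"].memory_usage(deep=True)
df["plan"] = df["plan"].astype("category")   # store integer codes + a small lookup table
after = df["plan"].memory_usage(deep=True)

print(f"object: {before / 1e6:.1f} MB -> category: {after / 1e6:.1f} MB")
```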


Data Cleaning - Parsing Date and Time Zone for Modeling

Check it out here!

Best practices for cleaning dates, times, and time zones.
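A small illustrative example (assuming pandas): parse mixed offsets into UTC and convert to a local time zone only at presentation time.

```python
import pandas as pd

# Hypothetical raw timestamps with mixed offsets, as they often arrive from source systems.
raw = pd.Series(["2024-03-01 10:00:00+00:00", "2024-03-01 07:05:00-03:00"])

# Parse everything into a single timezone-aware dtype (UTC)...
ts_utc = pd.to_datetime(raw, utc=True)

# ...and only convert to a local time zone for display or reporting.
ts_local = ts_utc.dt.tz_convert("America/Sao_Paulo")
print(ts_local)
```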


Data Analysis and Inferential Statistics with Python

Check it out here!


Sharing Knowledge

As a hobby, I play football competitively (as a forward); it’s my passion. I have played in amateur leagues in Brazil, the USA, and the Netherlands. I also have a strong passion for teaching and educating others. A personal characteristic I am proud of is the ability to transform very complex subjects into intuitive topics for any audience.

I find happiness in the little things in life, and I have learned a lot from every mistake I have made so far (and still do).


Statistics & Data Science

I have a passion for teaching, and I have been trained by amazing professors at top-notch universities around the globe.

Therefore, I have started to write a book that belongs to a (future) course I call “An Intuitive Course in Probability (and Statistics), for Data Science”. The idea is to provide strong intuition for every major concept while keeping the mathematical formalization and rigor very close. I had this idea after taking a Probability Theory course from MIT; I am a fan. It will be available in both English and Portuguese.

Please check the English version here and the Portuguese version aqui!

It’s a work in progress, so you may find only part of Chapter 1 for now.

Math

Sometimes, I try to contribute to interesting communities. You can check out an example below.