Available for opportunities

Johnny Costa

Data Engineer & Python Developer

Building end-to-end data solutions across cloud environments, from raw ingestion to actionable dashboards. Experienced in ETL pipelines, machine learning, and process automation.

4+ Years of Experience

Cloud Solutions

ETL Pipelines

Python · BigQuery · GCP · AWS · Databricks · Streamlit · Machine Learning · SQL

About Me

End-to-end data solutions, from raw ingestion to insights

I'm a Data Engineer with experience building scalable data solutions across cloud environments — GCP, AWS, and Azure. My work spans ETL pipeline design, machine learning, business intelligence, and process automation.

At Banese Card, I developed a fraud detection ML model with 80% recall and reduced operational SLA by 60%. At Ford Motor Company, I built a scalable ETL pipeline centralizing multi-source vehicle data into BigQuery and co-developed an ML model for vehicle weight estimation.

I hold a Bachelor's in Computer Science and two postgraduate degrees, in Big Data & AI and in Software Engineering. I am currently deepening my expertise in AWS and Databricks. My English is at B2 level (CEFR).

Technical Skills

Languages & Frameworks

Python · SQL · Streamlit · Pandas · Pydantic · Selenium

Cloud & Data Platforms

GCP · BigQuery · AWS S3 · Azure Data Factory · Databricks · Looker Studio

Analytics & BI

Power BI · DAX · QlikView · Machine Learning · dbt · Apache Spark

Projects

Selected work

AWS S3 Upload Automation

Automates the process of reading files from a local directory and uploading them to Amazon S3. Features secure credential management via environment variables and structured logging with Loguru — built as part of a hands-on AWS learning journey.

Python · Boto3 · AWS S3 · Loguru · python-dotenv
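
A minimal sketch of the directory-to-S3 flow described above, not the project's actual code. To stay runnable without AWS access or third-party packages, it swaps Loguru for the standard `logging` module and takes the upload call as a parameter. The names `plan_uploads` and `upload_directory`, the `uploads/` key prefix, and the `S3_BUCKET` variable are all hypothetical.

```python
import logging
import os
from pathlib import Path
from typing import Callable, Iterator

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("s3_upload")


def plan_uploads(directory: Path, prefix: str = "uploads/") -> Iterator[tuple[Path, str]]:
    """Yield (local_path, s3_key) for every regular file directly inside `directory`."""
    for path in sorted(directory.iterdir()):
        if path.is_file():  # subdirectories are skipped in this sketch
            yield path, f"{prefix}{path.name}"


def upload_directory(
    directory: Path,
    bucket: str,
    upload: Callable[[str, str, str], None],
) -> int:
    """Call `upload(filename, bucket, key)` per file and return how many succeeded."""
    uploaded = 0
    for path, key in plan_uploads(directory):
        try:
            upload(str(path), bucket, key)
        except Exception:
            # Log the failure with its traceback and move on to the next file.
            log.exception("Failed to upload %s", path)
            continue
        log.info("Uploaded %s to s3://%s/%s", path.name, bucket, key)
        uploaded += 1
    return uploaded


# Example wiring (assumed, requires boto3 and valid AWS credentials in the environment):
#   import boto3
#   bucket = os.environ["S3_BUCKET"]  # hypothetical variable name
#   upload_directory(Path("data"), bucket, boto3.client("s3").upload_file)
```

Passing the uploader in as a callable keeps the file-discovery and logging logic testable offline; Boto3's `upload_file(Filename, Bucket, Key)` fits the same three-argument shape.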

Steam DB — ETL Pipeline

Full ETL pipeline that scrapes game data from Steam via Selenium, transforms it with Pandas, and loads it into Google BigQuery. Connected to a live Google Sheets dashboard for real-time analysis of pricing trends and sales.

Python · Selenium · BigQuery · Pandas · GCP

CRM & Sales System — XPTO

Full-stack sales management app built with Streamlit and PostgreSQL. Features Pydantic data validation, a Medallion architecture (Bronze/Silver/Gold) orchestrated with dbt, and auto-generated docs via MkDocs.

Python · Streamlit · PostgreSQL · dbt · Pydantic · SQLAlchemy
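
As a rough illustration of the Bronze/Silver/Gold layering mentioned above, here is a plain-Python sketch. In the project these layers are dbt models over PostgreSQL with Pydantic handling validation; the record fields (`customer`, `amount`), the sample rows, and the function names below are assumptions made purely for the example.

```python
from collections import defaultdict

# Bronze: raw records as ingested, including a malformed row (hypothetical fields).
bronze = [
    {"customer": " Ana ", "amount": "100.50"},
    {"customer": "ana", "amount": "49.50"},
    {"customer": "Bruno", "amount": "not-a-number"},
    {"customer": "Bruno", "amount": "20"},
]


def to_silver(rows):
    """Silver: cleaned, typed rows. Rows whose amount cannot be parsed are dropped."""
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue  # stands in for a validation step such as a Pydantic model
        silver.append({"customer": row["customer"].strip().title(), "amount": amount})
    return silver


def to_gold(rows):
    """Gold: business-level aggregate, here total sales per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


gold = to_gold(to_silver(bronze))
print(gold)  # {'Ana': 150.0, 'Bruno': 20.0}
```

Each layer reads only from the one before it, which is the property that lets the raw data be reprocessed when cleaning rules change.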

Experience

Professional journey

May 2025 — Present

Data Engineer / Data Analyst

Global Group · São Paulo, Brazil

Developed and automated an ETL pipeline to process and standardize vehicle data for engineering analysis (Python, BigQuery, GCP). Built an automated web scraping solution to collect and store vehicle data as a reliable source for project analysis. Created a Looker Studio dashboard providing engineers with real-time access to key vehicle insights. Currently building a data tool to replace manual Excel workflows — automating data transformation and generating interactive visualizations with Python, Streamlit, and Pandas.

Oct 2024 — May 2025

Researcher / Data Engineer

Ford Motor Company · São Paulo, Brazil

Built a scalable ETL pipeline to integrate and normalize vehicle data from multiple sources into Google BigQuery for advanced analytical modeling. Co-developed a machine learning model to estimate vehicle weights, delivering predictive insights at both total and subsystem levels directly to the engineering team (Python, ML, Streamlit). Standardized team templates and documentation, significantly reducing report preparation time.

Jul 2023 — Mar 2024

Fraud Analyst / Data Analyst

Banese Card · Aracaju, Brazil

Developed an ML model to predict fraudulent transaction profiles, achieving an 80% recall rate and significantly improving fraud detection accuracy. Automated the customer credit analysis workflow with Python, reducing operational SLA by 60%. Built and deployed interactive dashboards to monitor KPIs with real-time transactional data (Power BI, QlikView, DAX). Conducted deep-dive analysis on large datasets to identify complex fraud patterns and anomalies.

Jan 2023 — Jul 2023

Fraud Prevention Intern / Data Analyst Intern

Banese Card · Aracaju, Brazil

Created and optimized KPI queries that reduced the operational team's data response time by 40% (SQL, Power BI). Led the restructuring of team dashboards, establishing a new visual standard that improved usability. Built a reusable query library to increase analyst team efficiency and streamline data analysis workflows.

Jul 2021 — Dec 2022

Information Security Intern / IT Operations Intern

Banese Card · Aracaju, Brazil

Developed a system mapping solution that improved service efficiency and reduced the team's SLA by 60%. Built a centralized employee access management tracking system, reducing manual verification work and improving security auditing. Managed the full lifecycle of IT incidents including profile creation, access adjustments, and authorizations.

Education

Postgraduate — Data Engineering

Faculdade Focus · 2026

Postgraduate — Software Engineering

CENES Pós-Graduação · 2025

Postgraduate — Big Data & AI

Faculdade Focus · 2024

Bachelor's — Computer Science

Universidade Tiradentes · 2023

Get in Touch

Let's build something worth the data.

Open to data engineering roles, freelance pipeline work, or just a conversation about data systems and Python.

Connect on LinkedIn · johnnywscosta@gmail.com · GitHub Profile
