joaomj/README.md

Hello everyone!

Here you will find my professional experience, skills, tools, and projects from my work with data.

(take a look at my Data Science portfolio)

As the experience below demonstrates, I can implement a complete end-to-end Data Science project, from gathering business requirements to deploying it in the cloud, including building tools that give non-technical users access to the Machine Learning models.

Skills:

  • Programming and Databases: Python (pandas, sklearn, seaborn, matplotlib, numpy, among others), SQL (PostgreSQL, MySQL), REST APIs (Flask).
  • Data Science: Descriptive Statistics, Supervised Machine Learning Algorithms, feature selection, Sklearn, CRISP-DM framework, Jupyter Notebooks, Data Balancing, ML Performance Metrics (RMSE, MAE, MAPE, Confusion Matrix, Precision, Recall, cumulative gain curve, ROC curve, and AUC).
  • Software Engineering: Git (code versioning), DVC (data versioning), MLflow (ML experiment tracking), virtual environments, Streamlit, Flask, Docker.
  • Cloud: Heroku Cloud, AWS (S3, Lambda, and EC2), Azure.
  • Big Data: Azure, PySpark, Databricks.
  • Data Visualization: Python (Matplotlib, Plotly, Seaborn, Folium), Power BI, Looker, Excel, Streamlit.
  • Soft Skills: Experience translating analyses into actionable insights for business teams; accustomed to working in multidisciplinary teams.
  • Digital Marketing: Google Analytics, Google Ads, Facebook Ads, Tag Manager.
  • Data Analysis: Power BI (DAX, Power Query), Excel, VBA, SQL, data storytelling, agile methodologies (Scrum, Kanban, Trello).
  • English Level: C2.

Professional Experience:

Data Mundo (Data Analyst, March 2024 – Present)

  • Followed DataOps protocols to optimize insight generation.
  • Built Business Intelligence automation solutions using Power BI (DAX, VBA, and Power Query) and Looker.
  • Developed Power BI dashboards to support business decisions, using Excel and SQL for specific functions.
  • Utilized agile methodologies (Scrum and Kanban) for efficient BI project execution.
  • Marketing Analytics: Google Analytics, Facebook Ads, Google Ads, Google Tag Manager.
  • Conducted exploratory analysis with descriptive statistics and hypothesis testing for clients in e-commerce, marketing, and industrial sectors.
  • Used Streamlit and REST APIs (Flask) to make interactive models available to users.

Ticker Research (Data Scientist, September 2022 - May 2024)

  • Developed predictive models for stock markets with Python.
  • Analyzed large volumes of financial data to identify trends and patterns.
  • Built ETLs in Azure, extracting and transforming data from various sources.
  • Versioned code with Git and tracked machine learning experiments with MLflow.
  • Created dynamic dashboards and detailed reports using Power BI to support data-driven decision-making.
  • Used Streamlit and REST APIs (Flask) to make interactive models available to users.

KPMG Brazil (Data Scientist, March 2022 - July 2022)

  • Data Science consultant for publicly traded companies in the utilities and industrial sectors.
  • Conducted business analysis to identify opportunities for digital transformation through data tools.
  • Built ETL processes in Big Data environments using Azure technologies.
  • Developed end-to-end Data Science solutions to support predictive maintenance processes for clients, following the CRISP-DM framework.
  • Utilized Databricks for end-to-end data science projects involving massive data volumes (along with Spark).
  • Proficient in Azure services including Data Factory, Synapse, Databricks, and Functions.
  • Performed exploratory analysis (descriptive statistics, hypothesis testing) on complex datasets using Jupyter notebooks.
  • Implemented supervised machine learning techniques such as regression and XGBoost in cloud environments.
  • Versioned code with Git and tracked experiments with MLflow.

Education

  • Bachelor's degree in Computer Science from São Paulo State University (UNESP), February 2014.

(some) Data Science Learning Projects:

Sales Forecast

  • Business Problem: The CFO of a drugstore chain wanted a sales forecast for each individual store for the next six weeks.
  • Solution Strategy: Following the steps of the CRISP-DM framework, I first analyzed the data to extract insights and validate business hypotheses. I then selected and validated Machine Learning algorithms, choosing XGBoost for its balance between computational performance and accuracy. Finally, I deployed the application in the cloud (Heroku). Forecasts are available through a Telegram bot that I built.
  • Business Result: The chosen model had a mean absolute percentage error (MAPE) of 15%. Expected revenue was in the range of R$ 275.15 million to R$ 277.46 million. The drugstore sales forecasts were delivered through a Telegram bot.
  • Project: Sales Forecast Drugstores.
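
MAPE, the metric reported above, is the mean of the absolute forecast errors expressed as a fraction of the actual values. A minimal sketch in plain Python follows; the sales figures and the function name `mape` are illustrative assumptions, not data or code from the project.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.15 == 15%)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Each term is |actual - predicted| / |actual|; actual values must be non-zero.
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical weekly sales for one store (not real project data).
actual_sales = [100.0, 200.0, 400.0]
predicted_sales = [90.0, 220.0, 380.0]

print(f"MAPE: {mape(actual_sales, predicted_sales):.1%}")
```

Because each error is divided by the actual value, MAPE is undefined when any actual value is zero, so such records would need to be filtered out or handled separately.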

Marketing Budget Optimization

  • Business Problem: An insurance company wanted to sell another insurance policy (vehicle insurance) to its clients. However, due to the limited amount of its marketing budget and the size of its customer base, the company wanted to know which customers should be given priority in receiving a cross-sell proposal.
  • Solution: After framing the problem as optimizing the use of the marketing budget, the company's data science team adopted a ranking strategy: a Machine Learning algorithm estimates each customer's probability of purchasing the new insurance, and the customer dataset is sorted by that probability. The marketing team can then focus its efforts on the customers at the top of the sorted dataset, who are the most likely to buy.
  • Business Result: With the Machine Learning model adopted, the marketing team was able to reach 90% of interested customers by contacting only 40% of the customer base. Assuming a cost of $10.00 per call and a total of 381,109 customers, the company saved $2,286,660 in phone call costs (a 60% cost reduction).
  • Project: Marketing Optimization
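
The ranking strategy and the savings figure above can be sketched in plain Python. This is an illustration only: the scores and labels are made up, `cumulative_gain` is a hypothetical helper rather than the project's code, and the savings calculation assumes one $10.00 call per customer, with the contacted 40% rounded down to a whole number of customers.

```python
def cumulative_gain(scores, interested, fraction):
    """Share of interested customers reached by contacting the top `fraction`
    of the base, ranked by predicted purchase probability (highest first)."""
    ranked = sorted(zip(scores, interested), key=lambda pair: pair[0], reverse=True)
    n_contacted = int(len(ranked) * fraction)
    reached = sum(label for _, label in ranked[:n_contacted])
    return reached / sum(interested)


# Toy data (hypothetical): model scores and whether each customer was interested.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
interested = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
gain_at_40 = cumulative_gain(scores, interested, 0.40)
print(f"Interested customers reached in the top 40%: {gain_at_40:.0%}")

# Savings arithmetic using the figures quoted in the text.
TOTAL_CUSTOMERS = 381_109
COST_PER_CALL = 10  # dollars per call (assumption stated in the text)
contacted = TOTAL_CUSTOMERS * 40 // 100      # top 40% of the base, rounded down
calls_avoided = TOTAL_CUSTOMERS - contacted  # the remaining ~60% are not called
savings = calls_avoided * COST_PER_CALL
print(f"Calls avoided: {calls_avoided:,}  Savings: ${savings:,}")
```

With these inputs the savings line reproduces the $2,286,660 quoted above; a different rounding of the 40% cut-off shifts the result by one call.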

Exploratory Data Analysis of Real Estate Data

  • Business Problem: The CEO of a real estate company located in King County wanted to know which properties available for sale would be a good business opportunity.
  • Solution: The data team analyzed all properties for sale in the county and identified those priced below market price, defined as the median price per m² in each zip code. The recommendation was to buy a property if it was priced below market and in good condition, then resell it at market price.
  • Business Result: Thanks to this simple data analysis, the company was able to resell properties for up to 30x the purchase price.
  • Project: Exploratory Data Analysis
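
The "below market price" rule described above can be sketched as follows. The listings, field names, and condition scale are hypothetical, and "good condition" is assumed here to mean a condition score of at least 3; the project may define these differently.

```python
from collections import defaultdict
from statistics import median

# Hypothetical listings (not the King County dataset).
listings = [
    {"id": 1, "zipcode": "98001", "price": 300_000, "area_m2": 100, "condition": 4},
    {"id": 2, "zipcode": "98001", "price": 450_000, "area_m2": 100, "condition": 3},
    {"id": 3, "zipcode": "98001", "price": 600_000, "area_m2": 100, "condition": 2},
    {"id": 4, "zipcode": "98002", "price": 200_000, "area_m2": 80, "condition": 5},
    {"id": 5, "zipcode": "98002", "price": 400_000, "area_m2": 100, "condition": 1},
]

MIN_GOOD_CONDITION = 3  # assumed threshold for "good condition"

# Step 1: collect price per square meter for every listing, grouped by zip code.
prices_by_zip = defaultdict(list)
for home in listings:
    prices_by_zip[home["zipcode"]].append(home["price"] / home["area_m2"])

# Step 2: the market price of a zip code is the median price per square meter.
market_price = {zipcode: median(values) for zipcode, values in prices_by_zip.items()}

# Step 3: recommend homes priced below market that are in good condition.
recommended = [
    home["id"]
    for home in listings
    if home["price"] / home["area_m2"] < market_price[home["zipcode"]]
    and home["condition"] >= MIN_GOOD_CONDITION
]
print(recommended)
```

The median is used rather than the mean so that a few very expensive properties in a zip code do not inflate the reference price.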

Curiosities about me:

  • Areas of Interest: Data Science, MLOps, Stock Analysis, and History 😎

Pinned repositories:

  • health_insurance_cross_sell (Public): Using Machine Learning algorithms to optimize an insurance company's marketing campaigns. Jupyter Notebook, 5 stars, 1 fork.
  • tipos_distancias (Public): Obtaining driving distances. Jupyter Notebook, 25 stars, 9 forks.
  • dataeng_roadmap (Public): A study track for Data Engineering. 4 stars, 4 forks.