High-Traffic Recipe Prediction

Overview

This repository focuses on predicting recipes that generate high traffic for a Grocery App's official webpage. The Grocery App, initially designed for grocery shopping, observed a significant increase in traffic (29%) when posting certain recipes on their webpage.

In response to this discovery, an analyst on the team identified recipes that generated high traffic on the day of posting. It's important to note that this list is not exhaustive.

Objective

The goal for data scientists involved in this project is to analyze the provided data and uncover patterns in the relationship between recipes and their popularity. The objective is to gain insights into the factors that contribute to a recipe becoming popular, enabling accurate predictions of high-traffic recipes to be posted on the website daily.

Dataset

The dataset contains information about recipes that have historically generated high traffic. Data scientists will explore, analyze, and model this dataset to extract meaningful patterns and develop a predictive model.

Key Tasks

Data Exploration: Dive into the dataset to understand its structure, variables, and potential patterns. Link to Notebook
Analysis: Uncover insights into the characteristics of recipes that lead to high traffic. Link to Notebook
Model Development: Build a predictive model to accurately forecast popular recipes. Link to Notebook
Evaluation: Assess the model's performance and refine as necessary for optimal predictions. Link to Notebook
Interactive API Endpoints: Enhance the usability of the model by implementing interactive API endpoints, facilitating seamless interaction for prediction generation. For more details, refer to the app README.
DataOps + MLOps: Steps to ensure the model is seamlessly integrated into production, continuously monitored, and efficiently manages data operations and machine learning operations for sustained performance and reliability.

Visualizations

Data Exploration and Analysis
Below is the plot to analyze the pattern in the feature columns(servings, protein, sugar, carbohydrates, calories) with respect to the target(for high/low traffic).

Model Development
Below are plots for the champion model.
The first plot is a confusion matrix, highlighting the model's performance in classifying unseen data.
The second plot is a receiver operating characteristic curve. The closer the curve leans to the vertical left y-axis (orange line), the better the model. The dashed diagonal blue line represents a random choice model and can be used as a baseline.

Interacting with the Model

CLI (coming soon)
REST API controllers/endpoints
REST API

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
app		app
data		data
imgs		imgs
models		models
notebooks		notebooks
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

High-Traffic Recipe Prediction

Overview

Objective

Dataset

Key Tasks

Visualizations

License

About

Releases

Packages

Languages

License

cs-uche/High-Traffic-Recipe-Prediction

Folders and files

Latest commit

History

Repository files navigation

High-Traffic Recipe Prediction

Overview

Objective

Dataset

Key Tasks

Visualizations

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages