---
theme: slidev-theme-dataroots
layout: intro
lineNumbers: false
themeConfig:
  title: MLOps - Bringing ideas to production 🚀
  github: murilo-cunha
  twitter: _murilocunha
  linkedin: in/murilo-cunha
title: MLOps
info: |
  ## MLOps
class: text-center
highlighter: shiki
drawings:
  persist: true
mdc: true
hideInToc: true
favicon:
titleTemplate: "%s"
---

[MLOops]{v-mark.crossed-off=1} to MLOps



Bringing ideas to production 🚀

March 15th, 2024


---
hideInToc: true
---

Agenda



::left::

::right::



<style> li:not(li:first-child) { margin-top: 0; } </style>

About me

  • 🇧🇷 → 🇧🇪: Brazilian @ Belgium
  • 🤓 B.Sc. in Mechanical Engineering @PNW
  • 👨‍🎓 M.Sc. in Artificial Intelligence @KUL
  • Professional Data & ML Engineer
  • Machine Learning Specialty
  • Terraform Associate
  • DAG Authoring & Airflow
  • SnowPro Core
  • Prefect Associate
  • 🤪 Fun facts: 🐍, 🦀, 🐓
  • 🫂 Python User Group Belgium
  • 📣 Confs: 🇯🇵, 🇵🇱, 🇮🇪, 🇵🇹, 🇪🇸, 🇸🇪
  • 🎙️ Datatopics Unplugged Podcast
  • 🤖 Tech lead AI @
<style> li { margin-top: 0 !important; } </style>

---
hideInToc: true
layout: default
---

I have worked on different [Data/AI projects]{.gradient-text}


::left::



  • Events company 📣
  • No show prediction 🫥
  • Record deduplication 👯‍♀️
  • Recommend visitors and exhibitors 🤝
  • PoC → MVP → Production 🚀

::right::





---
hideInToc: true
layout: twocols
---

From the [prototyping]{.gradient-text} side...


::left::

::right::



  • Content moderation @ social media company 🤬
  • NER @ clinical studies 🔎
  • Q&A chatbots @ automotive industry 🏎️
  • Energy consumption forecasting @ public sector 📈
  • Network analysis @ accounting company 🕸️

---
hideInToc: true
---

...to

production

applications


::left::




  • Financial sector 💰
  • Early customer lifetime value 🤑
  • Pipeline migrations 🧑‍🔧
  • Churn prediction 🫠

::right::


---
hideInToc: true
---

Why am I [here]{.gradient-text}?


"To help us understand what it takes for a machine learning project to go from [idea to production]{v-mark.box.yellow=1}, looking closely at the differences between [machine learning]{.gradient-text} and [operations]{.gradient-text}"


Use case: [content moderation]{.gradient-text}

“pixel art angry face with symbols on mouth censoring profanity” - DALL·E 2


---
hideInToc: true
---

What is [content moderation]{.gradient-text}?

::left::


  • You're the CEO of 10gag (congrats! 🎉)
    • (Like 9gag, but better)
  • Things haven't been so good lately 🫣
  • You have some trolls leaving nasty comments 🤬
  • You have an idea! 💡
    • You can probably detect these comments, and remove them from the platform
    • How well can we identify these comments using machine learning?

::right::

{.rounded-lg .shadow-lg .scale-80}


---
hideInToc: true
---

So you get to [work]{.gradient-text}...




::left::

{.rounded.shadow.scale-120.mx-10 v-click}

::right::

{.rounded .shadow-xl .object-contain v-click}


---
layout: cover
---

ML [Deployment]{.gradient-text}

(Congrats! 🎉 Many projects don't get this far)


---
hideInToc: true
---

What does it [mean]{.gradient-text}?

We are getting [value]{v-mark.highlight.yellow=1} from our models

How do we go about it for [content moderation]{.gradient-text}? How are we using the model?


::left::

{.h-60 .w-60 .object-cover .rounded-xl .shadow}

::right::

{.h-60 .w-60 .object-cover .rounded-xl .shadow}


Batch vs. real time

::left::

Batch 🍪

$\approx$ scheduled

  1. Accumulate predictions and run them together
  2. Schedule runs every hour/day/week/month
  3. Write the predictions to a table or a dashboard
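The three steps above as a scheduled job — a minimal sketch with illustrative names (`moderate_batch` is not the demo's actual code):

```python
# Minimal batch-scoring sketch: accumulate comments, run them together
# on a schedule, and write the resulting rows to a table/dashboard.
def moderate_batch(comments: list[str]) -> list[dict]:
    """Score accumulated comments; rows are ready to persist."""
    # A real model call goes here; we naively flag comments containing "hate".
    return [{"comment": c, "toxic": "hate" in c.lower()} for c in comments]
```

In practice an orchestrator (a cron job, an Airflow DAG, etc.) invokes this on the chosen schedule.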

{.scale-150}

::right::

Real time 👟

$\approx$ event-driven

  1. User does something
  2. This "something" triggers a (REST) API call
  3. The call returns a result/action
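The event path, sketched as plain functions (in a real service this would sit behind a REST framework; `classify` and `on_comment_posted` are illustrative names):

```python
def classify(comment: str) -> bool:
    """Stand-in for the real model call."""
    return "hate" in comment.lower()

def on_comment_posted(comment: str) -> dict:
    """Triggered per user event, e.g. via a (REST) API call; returns the action."""
    return {"action": "remove" if classify(comment) else "keep"}
```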

{.rounded-lg .shadow}

::bottom::

When should we choose one over the other?

<style> img { @apply h-25 !important; } </style>

---
layout: cover
---

Hacker News content moderation in action 🚀

(Simplified) batch deployment demo


---
hideInToc: true
---

The [tech]{.gradient-text} 🧱




---
layout: cover
---

[Real time]{.gradient-text} deployments 🚀

And the problem of [latency]{v-mark.red=0}


---
hideInToc: true
---

Real time applications

[Serving]{.gradient-text} models

::right::

::left::

```mermaid
flowchart LR
  subgraph API
    direction TB
    cloud("☁️") <--> phone("📱")
  end
  API --> batch("🍪")
  API --> realtime("👟")
```

---
hideInToc: true
---

ML in production

Serving (API)

<iframe src="https://asciinema.org/a/646947/iframe?speed=5" p-5 w-full h-105/>

The problem of latency


---
hideInToc: true
---

Why latency?


compute available ⚖️ compute required

::left::

ML


  • Compute
  • Complexity
  • Size

::right::

Ops


  • Cold starts
  • Network
  • IO

::bottom::

```mermaid
graph LR
    bread("🥖") <--> bike("🚴‍♀️") <--> person("🥸")
```

---
hideInToc: true
---

What is latency?


“Latency is a [measurement]{v-mark.red="'+1'"} in Machine Learning to determine the performance of various models for a specific application. Latency refers to the [time taken to process one unit of data provided only one unit of data is processed at a time]{v-mark="{type:'highlight', color:'yellow', multiline:true, at:'+1'}"}.”
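Under that definition, latency is just the wall-clock time for one unit of data — a minimal measurement sketch (the `predict` stand-in is illustrative):

```python
import time

def predict(comment: str) -> bool:
    return "hate" in comment.lower()  # stand-in for a model forward pass

# Time a single request: one unit of data, processed alone.
start = time.perf_counter()
predict("some comment")
latency_ms = (time.perf_counter() - start) * 1000
print(f"latency: {latency_ms:.3f} ms")
```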


---
hideInToc: true
---

Latency [in action]{.gradient-text}

::left::

"ChatGPT-like"

  • Prompts:
    • Generate a list of the 10 most beautiful cities in the world.
    • How can I tell apart female and male red cardinals?

```
rootsacademy-model-latency/
├── ...
├── common
│   ├── __init__.py
│   └── utils.py
├── pyproject.toml
└── scripts
    ├── local.py
    └── remote.py
```

::right::

<style> li:not(li:first-child) { margin-top: 0; } </style>

---
hideInToc: true
---

Latency [in action]{.gradient-text}


<iframe src="https://asciinema.org/a/8PRQDcwFUTLXQ2WQYqczGV5IY/iframe?speed=2&idleTimeLimit=3" p-5 w-full h-105/>

---
hideInToc: true
---

Latency - what can we do about it?


::left::

::right::

::bottom::


---
hideInToc: true
---

Each scaling strategy has its [trade-offs]{.gradient-text}


::left::

  • Cloud makes it easy to scale vertically
  • Higher overhead to manage clusters (horizontal scaling)
  • Vertical scaling has limits
  • [Serverless machines]{.gradient-text.font-bold} can help optimize costs
  • Horizontal scaling can be autoscaled
  • Both may be cost efficient, depending on the setup
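The distinction matters because scaling out raises throughput more than it cuts per-request latency — a sketch, treating each worker as a model replica (`predict` is a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def predict(x: int) -> int:
    time.sleep(0.01)  # stand-in for one model inference (~10 ms)
    return 2 * x

# Horizontal scaling: 4 "replicas" serve 8 requests in parallel.
# Total wall time drops, but each individual request still takes ~10 ms.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(predict, range(8)))
```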

::right::


{.rounded-lg.h-60.shadow}


[What's the main cause of your latency?]{v-mark="{type:'highlight', color:'yellow', multiline:true, at:'+0'}"}

<style> li { margin-top: 0 !important; } </style>

---
hideInToc: true
---

Scaling [in action]{.gradient-text}


<iframe src="https://asciinema.org/a/x5zX5jFKYtFVzg0FZXL9lW9Mo/iframe?speed=3&idleTimeLimit=3" p-5 w-full h-105/>

---
hideInToc: true
---

Latency [solved]{.gradient-text}?



::left::

::right::


---
hideInToc: true
---

Instead of growing, we can [shrink]{.gradient-text}


::left::


<iframe bg-slate-50 p-2 w-full h-60 rounded shadow src="https://ggml.ai/" />

::right::
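Quantization shrinks models by storing weights in fewer bits — a framework-free sketch of affine int8 quantization (real toolkits such as ggml do this per tensor, with calibrated scales):

```python
def quantize(weights: list[float], bits: int = 8) -> tuple[list[int], float]:
    """Map floats to signed ints so that w ≈ q * scale (4x smaller than fp32)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale

def dequantize(quants: list[int], scale: float) -> list[float]:
    return [q * scale for q in quants]

weights = [0.12, -0.5, 0.33]
quants, scale = quantize(weights)
restored = dequantize(quants, scale)  # close to `weights`, small rounding error
```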


---
hideInToc: true
---

Quantization [in action]{.gradient-text}

<iframe src="https://asciinema.org/a/iphbEPhRNdS01C2aupWDj5VMM/iframe?speed=2&idleTimeLimit=3" p-5 w-full h-115/>

---
hideInToc: true
---

Scaling is not always possible on [the edge]{.gradient-text}



“Edge machine learning (edge ML) is the process of running machine learning algorithms on computing devices [at the periphery of a network]{v-mark="{type:'highlight', color:'yellow', multiline:true, at:'+1'}"} to make decisions and predictions as close as possible to the originating source of data.”



---
hideInToc: true
---

For limited resources and extremely low latencies, you may need to [look outside ML]{.gradient-text}


::left::

<iframe w-full h-85 rounded shadow src="https://arxiv.org/pdf/2212.09410.pdf" />

::right::



---
layout: cover
---

[Real time]{.gradient-text} use cases


---
hideInToc: true
---

Auto-en-joker @ dataroots


---
hideInToc: true
---

Auto-en-joker @ dataroots


---
hideInToc: true
---

Edge defect detection @ chip manufacturer





::left::

::right::


```mermaid
flowchart LR
    camera("📸") --> ruler("📐")
    ruler --> pos("👍")
    ruler --> neg("👎")
    neg --> brain("🧠")
    brain --> _pos("👍")
    brain --> _neg1("👎")
```

---
layout: cover
---

Why MLOps?


---
hideInToc: true
---

So you have a [model]{.gradient-text}...



::right::


{.rounded .shadow-xl .object-contain v-click=1}

::left::

👨‍💼 "How long will it take to go through 100 posts? How can we make it faster?"

👷‍♀️ "How can we make sure the model scales?"

👷‍♂️ "What packages did you use?"

😡 "Why is it removing my posts?"

👩‍🔬 "What models did you already try?"

🕵️ "What data was used to train this model?"

::bottom::

[MLOps decreases the burden of deploying ML systems by establishing best practices]{v-mark.highlight.yellow=8}


Real life [testimonials]{.gradient-text}

“At this point, everybody does what they like, there is [little to no standardisation]{v-mark.red=1}. Since there are little to [no best practices]{v-mark.box.yellow=1}, the current platform contains the largest common denominator of a lot of heterogeneous projects. This causes a lot of [burden in maintaining]{v-mark.circle.pink=1} these projects”

“For quite some time, the focus was on more traditional Business Intelligence and Data Engineering. More recently we have seen the focus shift more towards Advanced Analytics in the form of some scattered initiatives and products, which in turn led to [little success]{v-mark.highlight.yellow=2} on these.”

“While I love our Data Science team, the [code they write is not at all up to standards]{v-mark.box.red=3} in comparison to what we normally push into production. This puts a [heavy burden]{v-mark.circle.yellow=3} on the Data Engineering team to [rewrite and refactor]{v-mark.highlight.yellow=3} this. At the same time the Data Science team is often [unhappy]{v-mark.blue=3}, because this refactoring process tends to introduce [mistakes or misunderstandings]{v-mark.highlight.cyan=3}.”

When we talk MLOps we often talk deployment and/or [deployed models]{.gradient-text}

<style> h3 { @apply text-base !important; } </style>

---
layout: cover
title: What is MLOps?
---

What is [ML]{v-mark.red=1}[Ops]{v-mark.circle.yellow=2}?


---
hideInToc: true
---

What's in the [name]{.gradient-text}?


::left::

Machine learning 🧠


  • Experimentation
  • Data exploration
  • Modelling
  • Hyperparameter tuning
  • Evaluation

::right::

Operations ⚙️


  • Infrastructure
  • Scalability
  • Reproducibility
  • Monitoring/Alerting
  • Automation

::bottom::

[✨ MLOps ✨]{.flex .justify-center .'-mt-20' v-click}


DevOps vs. [MLOps]{.gradient-text}


::left::

MLOps


  • Iterative-Incremental Development
  • [Automation]{v-mark.blue="'+1'"}
  • [Continuous Deployment]{v-mark.blue="'+1'"}
  • [Versioning]{v-mark.blue="'+1'"}
  • [Testing]{v-mark.blue="'+1'"}
  • [Reproducibility]{v-mark.blue="'+1'" v-mark.box.red="'+2'"}
  • Monitoring

::right::

vs. DevOps


+ Model

+ Features

+ Data


---
hideInToc: true
---

DevOps vs. [MLOps]{.gradient-text}

{.rounded .shadow .bg-blue .scale-50 v-click .'-mt-20' .'-mb-25'}

"[...] By codifying these practices, we hope to accelerate the adoption of ML/AI in software systems and fast delivery of intelligent software. In the following, we describe a set of important concepts in MLOps such as [Iterative-Incremental Development, Automation, Continuous Deployment, Versioning, Testing, Reproducibility, and Monitoring]{v-mark="{type:'highlight', color:'yellow', multiline:true, at:2}"}."


---
hideInToc: true
---

So... what is it?


“The [level]{v-mark.circle.blue=1} of automation of these steps defines the maturity of the ML process, which [reflects the velocity of training new models given new data or training new models given new implementations]{v-mark="{type:'highlight', color:'yellow', multiline:true, at:2}"}. The following sections describe three levels of MLOps, starting from the most common level, which involves no automation, up to automating both ML and CI/CD pipelines.”


---
hideInToc: true
---

(MLOps in theory vs. [practice]{.gradient-text})

[> “In theory, theory and practice are the same. In practice, they are not.” - Einstein]{v-click}

<iframe src="https://arxiv.org/ftp/arxiv/papers/2205/2205.02302.pdf" w-full h-95 rounded shadow-lg v-click/>

---
hideInToc: true
---

Pop Quiz 💥

For each of these challenges, which ones are related to [ML]{v-mark.highlight.red=1} or [Ops]{v-mark.highlight.cyan=1}?


::left::

  • [Models and experiments are not properly tracked]{v-mark.highlight.cyan=2}
  • [Model decay]{v-mark.highlight.red=3}
  • [Changing business objectives]{v-mark.highlight.red=4}
  • [Model monitoring and (re)training]{v-mark.highlight.cyan=5}
  • [Data quality]{v-mark.highlight.red=6}
  • [Consistent project structure]{v-mark.highlight.cyan=7}
  • [Data availability]{v-mark.highlight.red=8}

::right::

  • [Code and dependencies tracking]{v-mark.highlight.cyan=9}
  • [Auditability and regulations - reproducibility and explainability]{v-mark.highlight.cyan=10}
  • [Wrong initial assumptions (problem definition)]{v-mark.highlight.red=11}
  • [Locality of the data (distributional shift)]{v-mark.highlight.red=12}
  • [Recreate model artifacts]{v-mark.highlight.cyan=13}
  • [Deploy model systems (not just one-off solutions)]{v-mark.highlight.cyan=14}

---
layout: cover
---

MLOps Illustrated

ML Lifecycle Recap


---
hideInToc: true
---

ML lifecycle & development (simplified)

```mermaid
%%{init: {"flowchart": {"htmlLabels": false}} }%%
flowchart LR
    idea["`💡 Idea`"]
    poc["`Proof-of-Concept 🤖`"]
    mvp["`Minimal Viable Product 🦴`"]
    prod["`Iterate 🚀`"]
    idea --> poc --> mvp --> prod
```

```mermaid
%%{init: {"flowchart": {"htmlLabels": false}} }%%
flowchart LR
    eda["`Exploratory Data Analysis 🔎`"]
    model["`Modeling 📦`"]
    eval["`Evaluation ⚖️`"]
    deploy["`Deployment 🏗️`"]
    monitor["`Monitoring 👀`"]
    eval .-> model
    eval .-> eda
    eda --> model --> eval --> deploy --> monitor
```

{.absolute .top-0 .scale-110}


---
hideInToc: true
---

MLOps [Illustrated]{.gradient-text}

::left::

  • Data versioning 🚀
    • Reproducing models and scores
  • Feature engineering 📦
    • Version code + artifacts
  • Model training 🌱
    • Track experiments (models, hyperparameters, etc.)
    • Use seeds
  • Quality assurance 🔍
    • Unit/integration tests
    • Statistical tests
    • Stability tests
    • GenAI tests? - Validation, self reflection, etc.
  • Prepare for deployment 🏗️
    • Packaging and containerizing!
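The "use seeds" point above, minimally (in real projects you would also seed `numpy`, `torch`, etc.; `train_run` is an illustrative stand-in):

```python
import random

def train_run(seed: int) -> list[float]:
    """Stand-in for a training run whose result depends on random state."""
    random.seed(seed)
    return [random.random() for _ in range(3)]

# Same seed => identical "experiment", so models and scores can be reproduced.
assert train_run(42) == train_run(42)
```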

::right::

{.px-3}

{.rounded-lg}

<style> li { margin-top: 0 !important; } img { @apply h-20 m-2 inline !important; } </style>

---
hideInToc: true
---

Recap




  • How to deploy models depends on whether you have a [batch or real time]{v-mark.box.blue=1} use case
  • It's important to minimize [latency]{v-mark="{at:2, color:'red', brackets:['bottom'], type: 'bracket'}"} in real time use cases
  • Latency can be reduced by [scaling resources or reducing computational needs]{v-mark.highlight.yellow=3}
  • MLOps is a [set of principles]{v-mark.red=4} that reduces the burden of deploying and maintaining models
  • Unless you're doing research, the [value]{v-mark.box.yellow=5} of models only comes after they have been [deployed]{v-mark.circle.green=5}

::bottom::

[As ML gets easier and easier to do, MLOps and software engineering skills become increasingly important]{.gradient-text}


---
hideInToc: true
---

Questions?