👾 Personal notes:

  1. Google Slides - Intro to diffusion models
  2. 🌟Google Slides - Stable Diffusion, explained (this version includes the important content from the previous slides)
  3. Review of ICLR 2023 diffusion-related papers
  4. Google Slides - ControlNet, explained

🦄 Some good materials to get started with:

  1. Lilian Weng's Lil'Log - What are Diffusion Models?
  2. The Annotated Diffusion Model@HuggingFace
  3. YouTube@AICoffeeBreak - "diffusion" related videos
  4. Bilibili@deep_thoughts - DDPM algorithm and its codes, video explained (Chinese)
  5. Talks on Generative Diffusion Models (生成扩散模型漫谈)@Su Jianlin (苏剑林) - article series, only the first post is listed here (Chinese)

Awesome Paper List of Diffusion Models

This list collects notable diffusion model papers across different areas. Click 💡 to open a paper's GitHub repository.

Cornerstone papers of diffusion models:

  1. Deep unsupervised learning using nonequilibrium thermodynamics, 💡, Nov 2015, the first appearance of "diffusion" in machine learning
  2. NCSN, Generative modeling by estimating gradients of the data distribution, 💡, Jul 2019, Noise Conditional Score Network
  3. DDPM, Denoising diffusion probabilistic models, 💡, Jun 2020, can be seen as the cornerstone of subsequent diffusion model work
  4. Score-Based Generative Modeling through Stochastic Differential Equations, Nov 2020, from the SDE perspective, lots of math...

Speed Up

  1. DDIM: Denoising Diffusion Implicit Models 
  2. PLMS: Pseudo Numerical Methods for Diffusion Models on Manifolds, ICLR 2022
  3. Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models, ICLR 2022 (Outstanding Paper Award)
  4. EDM, Elucidating the Design Space of Diffusion-Based Generative Models, 💡, Jun 2022, from NVIDIA, NeurIPS 2022 (Outstanding Paper Award)
  5. Diffusion-GAN: Training GANs with Diffusion (https://arxiv.org/abs/2206.02262), 💡, Jun 2022, from Microsoft Azure AI.
  6. DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps, NeurIPS 2022
  7. DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models

Vision

Image generation

  1. Improved diffusion, Improved Denoising Diffusion Probabilistic Models, 💡, Feb 2021, from OpenAI, PMLR 2021.
  2. Guided diffusion, Diffusion Models Beat GANs on Image Synthesis, 💡, Apr 2021, from OpenAI, NeurIPS 2021.
  3. FastDPM, On Fast Sampling of Diffusion Probabilistic Models, 💡, May 2021, from NVIDIA, ICLR Workshop 2021.
  4. LSGM, Score-based Generative Modeling in Latent Space, 💡, Jun 2021, from NVIDIA, NeurIPS 2021.
  5. Distilled-DM, Progressive Distillation for Fast Sampling of Diffusion Models, 💡, Feb 2022, from Google Brain, ICLR 2022.
  6. GGDM, Learning Fast Samplers for Diffusion Models by Differentiating Through Sample Quality, Feb 2022, from Google Brain, ICLR 2022.
  7. DiT, Scalable Diffusion Models with Transformers, Dec 2022, from UCB. Replaces the U-Net with a transformer that uses adaptive layer norm layers.

Text to Image

  1. 🌟Stable Diffusion/LDM, High-Resolution Image Synthesis with Latent Diffusion Models, 💡, Dec 2021, from Stability.AI & LMU Munich & Runway. Awesome for releasing the code & weights for free!
  2. Glide: Towards photorealistic image generation and editing with text-guided diffusion models, 💡, Dec 2021, from OpenAI
  3. DALL-E 2, Hierarchical Text-Conditional Image Generation with CLIP Latents, 💡, Apr 2022, from OpenAI. Note that the original DALL-E uses a different architecture (VQ-VAE).
  4. KNN Diffusion: Image Generation via Large-Scale Retrieval, Apr 2022, from Meta AI.
  5. Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, 💡, May 2022, from Google Brain, NeurIPS 2022 (Outstanding Paper Award). Uses a pretrained language model (T5).
  6. LAION-RDM, Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models, 💡, Jul 2022, from Ludwig-Maximilian University of Munich.
  7. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation, 💡, Aug 2022, from Google Research & Boston University.
  8. DreamFusion: Text-to-3D using 2D Diffusion, 💡, 29 Sep 2022, from Google Research & UCB.

Image Editing

  1. SDEdit, SDEdit: Image Synthesis and Editing with Stochastic Differential Equations, 💡 Aug 2021, from Stanford University & Carnegie Mellon University, ICLR 2022
  2. RePaint, RePaint: Inpainting using Denoising Diffusion Probabilistic Models, 💡, Jan 2022, from ETH Zurich, CVPR 2022.
  3. Prompt-to-Prompt Image Editing with Cross Attention Control, Aug 2022, from Google.
  4. DiffEdit, DiffEdit: Diffusion-based semantic image editing with mask guidance, Oct 2022, from Meta AI.
  5. InstructPix2Pix, InstructPix2Pix: Learning to Follow Image Editing Instructions, 💡, Nov 2022, from UC Berkeley (uses GPT-3)
  6. Diffusion Guided Domain Adaptation of Image Generators, 💡, Dec 2022, from Rutgers University

Image Captioning

  1. Bit Diffusion, Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning, from Google Brain, ICLR 2023. Introduces analog bits, self-conditioning, and asymmetric time intervals. (Can also be used for discrete/categorical image generation tasks.)

Video Generation:

  1. Video diffusion models, Apr 2022, ICLR 2022 Workshop. GIF-like results.
  2. MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation, 💡, May 2022, from University of Montreal
  3. Make-A-Video: Text-to-Video Generation without Text-Video Data, 29 Sep 2022, from Meta AI (the preceding work is Make-A-Scene: VQ-VAE + classifier-free guidance)
  4. Imagen Video: High Definition Video Generation with Diffusion Models, 5 Oct 2022, from Google Brain (also announced: Phenaki for long video generation, but it is not a diffusion model)

Object Detection

  1. DiffusionDet: Diffusion Model for Object Detection, 💡, 17 Nov 2022, from HKU and Tencent AI

Segmentation

  1. Pix2Seq-D: A Generalist Framework for Panoptic Segmentation of Images and Videos

Natural language:

  1. Diffusion-LM, Diffusion-LM Improves Controllable Text Generation, 💡, May 2022, from Stanford University.

Audio

Audio Generation

  1. DiffWave, DiffWave: A Versatile Diffusion Model for Audio Synthesis, 💡, Jun 2020, from UCSD & Nvidia & Baidu, ICLR 2021
  2. WaveGrad, WaveGrad: Estimating Gradients for Waveform Generation, 💡, Sep 2020, from Google Brain, ICLR 2021.
  3. Symbolic Music Generation with Diffusion Models, 💡, Mar 2021, from Google Brain, ISMIR 2021
  4. DiffSinger, DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism, 💡, May 2021, from Zhejiang University, AAAI 2022
  5. VDM, Variational Diffusion Models, 💡, Jul 2021, from Google Brain, NeurIPS 2021.
  6. FastDiff, FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis, 💡, Apr 2022, from Zhejiang University & Tencent AI Lab, IJCAI 2022
  7. BDDMs, BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis, 💡, May 2022, from Tencent AI Lab, ICLR 2022
  8. SawSing, DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation, 💡, Aug 2022, ISMIR 2022
  9. ProDiff, ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech, 💡, Jul 2022, from Zhejiang University, ACM Multimedia 2022

Audio Conversion

  1. DiffVC, Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme, 💡, Sep 2021, from Huawei Noah, ICLR 2022.

Audio Enhancement

  1. NU-Wave, NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling, 💡, Apr 2021, from MINDSLAB & Seoul National University, Interspeech 2021
  2. CDiffSE, Conditional Diffusion Probabilistic Model for Speech Enhancement, 💡, Feb 2022, from CMU & Reality Labs Research, Pittsburgh & Academia Sinica, IEEE 2022

Text to Speech

  1. Grad-TTS, Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech, May 2021, from Huawei Noah
  2. EdiTTS, EdiTTS: Score-based Editing for Controllable Text-to-Speech, 💡 Oct 2021, from Yale University & Supertone Inc & Neosapience Inc
  3. DiffGAN-TTS, DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs, 💡, Jan 2022, from Tencent AI Lab
  4. Diffsound, Diffsound: Discrete Diffusion Model for Text-to-sound Generation, 💡, Jul 2022, from Peking University & Tencent AI Lab