Leveraging LLMs to generate better prompts for Stable Diffusion models.

theerfan/Reimage-GPT

Reimage-GPT

Leveraging GPT-2 (or any other GPT-style model) to generate better prompts for Stable Diffusion 2.

The pipeline works as follows:

  1. We sample a number of images from the COCO dataset and use them as our training data.
  2. We pass these images to a frozen instance of Meta AI's Detectron2 model, which returns a JSON object describing the items in each picture.
  3. We convert this JSON into a simplified string describing the image.
  4. We pass this string to GPT-2, together with a system prompt that explains how the string is structured and asks for an image-generation prompt.
  5. We pass GPT-2's output to a frozen Stable Diffusion 2 pipeline to generate the output image.
  6. We use an SSIM loss between the input image and the output image to fine-tune the weights of our GPT-2 instance.
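
As a rough illustration of step 3, the sketch below collapses a list of detections into a short description string. The JSON layout (a list of dicts with `label` and `score` keys), the `describe_detections` name, and the `min_score` threshold are all assumptions made for this example; the actual format produced by this project may differ.

```python
from collections import Counter


def describe_detections(detections, min_score=0.5):
    """Summarize detections as a comma-separated "count label" string.

    `detections` is assumed to be a list of dicts with a "label" key and an
    optional "score" key (hypothetical schema, not taken from the repo).
    """
    # Count each label, ignoring low-confidence detections.
    counts = Counter(
        d["label"] for d in detections if d.get("score", 1.0) >= min_score
    )
    if not counts:
        return "no objects detected"
    # Sort by label so the same detections always yield the same string.
    return ", ".join(f"{n} {label}" for label, n in sorted(counts.items()))


example = [
    {"label": "person", "score": 0.98},
    {"label": "dog", "score": 0.91},
    {"label": "person", "score": 0.87},
    {"label": "kite", "score": 0.30},  # dropped: below min_score
]
print(describe_detections(example))  # prints "1 dog, 2 person"
```

A deterministic, compact string like this is easy to explain in a system prompt, which is what step 4 needs.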

[Final Project for UCLA's COM SCI 263 - Natural Language Processing]
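
For reference, the SSIM comparison in step 6 can be sketched in plain Python. This is a simplified, single-window (global) SSIM over flattened grayscale pixel values, not the sliding-window variant that image libraries usually implement, and defining the loss as `1 - SSIM` is an assumption about how the "SSIM loss" is formed. The function names are hypothetical.

```python
from statistics import fmean


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equal-length pixel sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(x, y, data_range=255.0):
    """Assumed loss form: 0 for identical images, larger as they diverge."""
    return 1.0 - global_ssim(x, y, data_range)


original = [0, 64, 128, 255]
inverted = [255 - p for p in original]
print(ssim_loss(original, original))  # approximately 0 for identical inputs
print(ssim_loss(original, inverted))  # larger for dissimilar inputs
```

In a real training loop the same quantity would need to be computed on tensors so that gradients can be used to update GPT-2.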
