Lesson 9: Deep Learning Foundations to Stable Diffusion, 2022

(All lesson resources are available at http://course.fast.ai.) This is the first lesson of part 2 of Practical Deep Learning for Coders. It starts with a tutorial on how to use pipelines in the Diffusers library to generate images. Diffusers is (in our opinion!) the best library available at the moment for image generation: it has many features and is very flexible. We explain how to use these features, and discuss options for accessing the GPU resources needed to use the library.
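For context, here is a minimal sketch of what generating an image with a Diffusers pipeline looks like (the model id and prompt below are illustrative, not necessarily the ones used in the lesson notebook):

```python
import torch
from diffusers import StableDiffusionPipeline

# Download the pretrained weights and build a text-to-image pipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")  # needs a CUDA GPU; drop torch_dtype and use "cpu" otherwise (much slower)

# One call runs the full text-to-image process and returns PIL images
image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```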

We talk about some of the nifty tweaks available when using Stable Diffusion in Diffusers, and show how to use them: guidance scale (for varying how strongly the prompt influences the image), negative prompts (for removing concepts from an image), image initialisation (for starting from an existing image), textual inversion (for adding your own concepts to generated images), and Dreambooth (an alternative approach to textual inversion).
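As a rough illustration of the first two of these tweaks (a sketch assuming the `pipe` object from the snippet above; the values are just examples):

```python
# guidance_scale and negative_prompt are standard pipeline arguments
image = pipe(
    "a watercolour painting of a fox",
    guidance_scale=7.5,                     # higher = follow the prompt more strongly
    negative_prompt="blurry, low quality",  # concepts to push the image away from
).images[0]
```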

The second half of the lesson covers the key concepts involved in Stable Diffusion (sketched in code after this list):
- CLIP embeddings
- The VAE (variational autoencoder)
- Predicting noise with the U-Net
- Removing noise with schedulers
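Putting those four pieces together, here is a condensed sketch of the "deconstructed" sampling loop built from the individual components (model id, step count, and guidance scale are illustrative; device placement and error handling are omitted for brevity):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, LMSDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-4"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
scheduler = LMSDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")

def embed(texts):
    # CLIP text embeddings: tokenize, then run the text encoder
    toks = tokenizer(texts, padding="max_length",
                     max_length=tokenizer.model_max_length, return_tensors="pt")
    with torch.no_grad():
        return text_encoder(toks.input_ids)[0]

# Embeddings for the prompt and for an empty (unconditional) prompt
text_emb = torch.cat([embed([""]), embed(["a photograph of an astronaut riding a horse"])])

# Start from pure noise in the VAE's latent space (4x64x64 for a 512x512 image)
scheduler.set_timesteps(50)
latents = torch.randn(1, unet.config.in_channels, 64, 64) * scheduler.init_noise_sigma

guidance_scale = 7.5
for t in scheduler.timesteps:
    # Predict the noise with the U-Net for both unconditional and conditional embeddings
    inp = scheduler.scale_model_input(torch.cat([latents] * 2), t)
    with torch.no_grad():
        noise_pred = unet(inp, t, encoder_hidden_states=text_emb).sample
    uncond, cond = noise_pred.chunk(2)
    noise_pred = uncond + guidance_scale * (cond - uncond)  # classifier-free guidance
    # The scheduler removes a fraction of the predicted noise at this step
    latents = scheduler.step(noise_pred, t, latents).prev_sample

# Decode the denoised latents back to pixel space with the VAE decoder
with torch.no_grad():
    image = vae.decode(latents / 0.18215).sample
```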

You can discuss this lesson, and access links to all notebooks and resources from it, at this forum topic: https://forums.fast.ai/t/lesso....n-9-part-2-preview/1

0:00 - Introduction
6:38 - This course vs DALL-E 2
10:38 - How to take full advantage of this course
12:14 - Cloud computing options
14:58 - Getting started (Github, notebooks to play with, resources)
20:48 - Diffusion notebook from Hugging Face
26:59 - How Stable Diffusion works
30:06 - Diffusion notebook (guidance scale, negative prompts, init image, textual inversion, Dreambooth)
45:00 - Stable Diffusion explained
53:04 - Math notation correction
1:14:37 - Creating a neural network to predict noise in an image
1:27:46 - Working with images and compressing the data with autoencoders
1:40:12 - Explaining latents that will be input into the U-Net
1:43:54 - Adding text as a one-hot encoded input to the noise and drawing (aka guidance)
1:47:06 - How to represent numbers vs text embeddings in our model with CLIP encoders
1:53:13 - CLIP encoder loss function
2:00:55 - Caveat regarding "time steps"
2:07:04 - Why don’t we do this all in one step?

Thanks to fmussari for the transcript, and to Raymond-Wu (on forums.fast.ai) for the timestamps.
