Guide created by Chris Allen (@zippy731 on twitter)

EZ Charts – Diffusion Parameter Studies

Visual Reference Guides for CLIP-Guided Diffusion

“I got sick of floundering with the settings on Jax2.4, so I’m running some more systematic visualizations”

@EnzymeZoo, 12/16/21

Info for Visitors:

CLIP-Guided Diffusion parameters are complicated.  This visual library is meant to help you get a glimpse of some of the effects of adjusting parameters, to give you a leg up in your projects.

Two popular CLIP-guided diffusion models are JAX (by nshepperd, modded by others) and Disco Diffusion (by somnai and gandamu, modded by others). Both use many of the same underlying parameters, so test results for both JAX and DD are included here. Tests run from December 2021 to the present, spanning several versions of the notebooks.

These are meant to show you DIRECTIONALITY of the effect of changing parameters, not absolutes. Your project will differ from the tested project, but the effects of changing parameters should persist.

These tests are very lightly tagged. Search using the name of the parameter you’re interested in (eta, cut_ic_pow, cut_pow, etc.), use the TOC, or just browse. Where possible, we’ve linked to the original source so you can research further there.

Many of these studies were created in @EnzymeZoo’s parameter explorer Colab notebooks.


Got ideas for your own study? Run it and share! 

  • This is all about images, not words, so your study should clearly show the effect of the parameter change. New combinations of parameters (vs what’s here) are preferred 
  • How to share? SHARE the image (DD Discord, twitter, reddit, imgur) along with some text about the other conditions of the test. Then alert one of us and we will get it into the library.

Librarians: @zippy731, @EnzymeZoo, @KyrickYoung, @EErratica, @annetropy, @kaliyuga_ai

Librarian workflow: just a compilation

  • MANTRA: KEEP IT SIMPLE. 
    • This is a visual reference. 
    • This is a compilation. No new research, just gathering.
    • Most EZ Charts are already labeled with key params.
  • How we will store newly discovered studies
    • Add to the bottom. Not in chrono order.
    • One line header. Format it as ‘Heading 1’ and it will show up in the TOC automatically
      • params tested (required) (down, then across), platform & date (if known)
      • E.g. clamp_max vs steps, DD, 3/5/22 
    • Other tester notes, if provided directly with the source.
      • Don’t do any new research beyond what’s shared.
    • Link to source (twitter, Discord msg, reddit)
      • User can chase down remaining details using link.
    • Shrink image to fit on one page.
      • Use image options (right click image) to set to max 6.5” wide and 9” tall.
        • Otherwise, no reformatting of existing images. Just compile.
      • OK for multiple images to span multiple pages
      • User can chase down full size image at source link.
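The 6.5” by 9” limit in the workflow above comes down to a simple fit-inside-a-box calculation. The sketch below is only an illustration of that arithmetic: the function name `fit_to_page` is hypothetical, and the choice never to enlarge a small image is an assumption, not something the workflow states.

```python
def fit_to_page(width_in, height_in, max_w=6.5, max_h=9.0):
    """Scale an image (sizes in inches) down to fit the page box, keeping its aspect ratio.

    Hypothetical helper; the 6.5 x 9 inch limits come from the librarian workflow.
    """
    # Use the tighter of the two limits; cap at 1.0 so small images are not enlarged
    # (that cap is an assumption, the workflow only talks about shrinking).
    scale = min(max_w / width_in, max_h / height_in, 1.0)
    return round(width_in * scale, 2), round(height_in * scale, 2)


print(fit_to_page(13.0, 9.0))   # wide chart: the width limit decides the scale
print(fit_to_page(6.0, 18.0))   # tall chart: the height limit decides the scale
```

Setting both limits at once in the image options should give the same result, as long as the aspect ratio stays locked.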

NOW FOR THE GOOD STUFF 

TABLE OF CONTENTS

Tv_scale vs sat_scale, JAX 2.4, 12/16/21

Tv_scale vs sat_scale #2, JAX 2.4, 12/16/21

Tv_scale vs steps, JAX 2.4, 12/17/21

Clip_guidance_scale vs steps, JAX 2.4, 12/22/21

Clip_guidance_scale vs steps, JAX, 12/31/21

Eta vs steps, JAX, 1/5/22

Clamp_max vs steps, JAX, 1/18/22

Clip_guidance_scale vs steps, DD, 1/30/22

Steps comparison, DD, 3/5/22

Clamp_max study, DD, 3/5/22

Clamp_max pt.2, DD, 3/5/22

steps vs clamp_max, DD, 3/5/22

ETA, DD, 3/5/22

ETA Negative Range, DD, 3/6/22

ETA vs clamp_max, DD, 3/6/22

Cut_overview vs cut_Innercut, DD, 3/9/22

Cut_ic_pow, DD, 3/8/22

Cut_ic_pow vs cut_innercut, DD, 3/10/22

Clamp_max vs skip_steps, DD, 3/11/22

Clip_guidance_scale, DD, 3/12/22

Clip_guidance_scale pt2, DD, 3/12/22

CLIP model comparisons, DD4.1, 3/3/22

Skip_steps, DD4.1, 3/4/22

Skip_steps, DD4.1, 3/4/22

Skip_steps, DD4.1, 3/6/22

Output image size (width_height), DD4.1, 3/8/22

Output image size (width_height), DD4.1, 3/8/22

use_secondary_model, DD, 3/13/22

Tv_scale vs sat_scale, JAX 2.4, 12/16/21

https://discord.com/channels/869630568818696202/893041802720993310/921229247459241996

[THIS IS THE OG EZ CHART] EZ: I got sick of floundering with the settings on Jax2.4, so I’m running some more systematic visualizations:  all_title = “cursed toilet by Zdzisław Beksiński”

cutn = 8, cut_pow = 1.0, cut_batches = 10, steps = 250, eta = 1.0, starting_noise = 1.0

init_weight_mse = 0, Using default openai model

Tv_scale vs sat_scale #2, JAX 2.4, 12/16/21

Here’s some more with all_title = “colorful forest by Zdzisław Beksiński” and cut_batches=4. I thought maybe we would see more effect with sat_scale if there was something “colorful” in the prompt. Might need to use different value ranges.

Tv_scale vs steps, JAX 2.4, 12/17/21

All tests with nshepperd Jax notebook v2.4: all_title = “cursed toilet by Zdzisław Beksiński”, image_size = (512, 512), cutn = 8, cut_pow = 1.0, cut_batches = 4, sat_scale = 1000, eta = 1.0, starting_noise = 1.0, init_weight_mse = 0. Using default openai model, cutn

Image Image Image 

Clip_guidance_scale vs steps, JAX 2.4, 12/22/21

https://discord.com/channels/869630568818696202/869675061211181107/923436854504751115

I ran this test to see for myself that your cgs/steps=10 ratio works out. nshepperd Jax notebook v2.4, all_title = “cursed toilet by Zdzisław Beksiński”, image_size = (512, 512), cutn = 8, cut_pow = 1.0, cut_batches = 4, tv_scale = 0, sat_scale = 0, eta = 1.0, starting_noise = 1.0, init_weight_mse = 0. Using default openai model.

You can see it looking more like the prompt as cgs increases.

https://cdn.discordapp.com/attachments/869675061211181107/923436853552635944/beksinskitoilet_tvscale_0_iterstepscg_long.png

Clip_guidance_scale vs steps, JAX, 12/31/21

https://discord.com/channels/869630568818696202/869675223648202842/926585404906405958

 

Eta vs steps, JAX, 1/5/22

https://discord.com/channels/869630568818696202/869675061211181107/928489173801902142

“Stepping through eta might make for some interesting animations”

Image

Clamp_max vs steps, JAX, 1/18/22

https://discord.com/channels/869630568818696202/913163239657992232/933078535084597318
Large Image:  https://cdn.discordapp.com/attachments/913163239657992232/933078534396727296/FGbwfDYMlQAAAAAASUVORK5CYII-1.png

Clamp_max .01 to .5, steps 25 – 500

Clip_guidance_scale vs steps, DD, 1/30/22

https://discord.com/channels/869630568818696202/869675061211181107/937509848596250638

“Here’s from disco with clamp_max=0.2” Image

Steps comparison, DD, 3/5/22

All tests with 1280×960, cutn_batches=2 and seed=87654321. All other params on default. Steps Comparison: 250, 500, 1000, 1500 with four prompts.

Prompts:

  • a beautiful painting of a building in a serene landscape by Greg Rutkowski and Thomas Kinkade, trending on ArtStation.
  • an ominous painting of the Eiffel tower by Zdzisław Beksiński
  • a beautiful portrait of mecha statue of liberty by  James Jean and Ross Tran
  • a magic realism painting by Gediminas Pranckevicius depicting an abandoned building in a field of flowers landscape, vibrant, cinematic lighting

Original twitter thread.

Image
Image
Image
Image

Clamp_max study, DD, 3/5/22

Clamp_max: For clamp_max 0.03, 0.035, 0.04, 0.045, 0.05 with steps 250 across four prompts

Image
Image
Image

Clamp_max pt.2, DD, 3/5/22


For clamp_max 0.01 – 0.08 with steps 250 across four prompts. Same as previous test, but the range is increased. Should make it more apparent how clamp_max affects the image overall!

Image
Image
Image
Image

steps vs clamp_max, DD, 3/5/22


Steps vs clamp_max: For steps 250, 500, 1000 with clamp_max 0.03, 0.04, 0.05 with four prompts. 
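A two-parameter study like the one above is a grid: one value list runs down the chart, the other runs across, and every cell is one render. The sketch below only enumerates the cells. `grid_runs` is a hypothetical helper, and how each setting is fed into the notebook is not shown here because the source does not describe it.

```python
from itertools import product

# Values taken from the study description above.
steps_values = [250, 500, 1000]        # down the chart
clamp_max_values = [0.03, 0.04, 0.05]  # across the chart


def grid_runs(rows, cols, row_name, col_name):
    """Return one settings dict per chart cell, row by row (hypothetical helper)."""
    return [{row_name: r, col_name: c} for r, c in product(rows, cols)]


runs = grid_runs(steps_values, clamp_max_values, "steps", "clamp_max")
for run in runs:
    print(run)
```

With three values on each axis this gives nine renders per prompt, so the four-prompt version of the study would need 36.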

ETA, DD, 3/5/22

For eta 0.0, 0.2, 0.4, 0.6, 0.8, 1.0 with 250 steps and clamp_max=0.05 across four prompts

ETA Negative Range, DD, 3/6/22

For eta -1.0, -0.08, -0.06, -0.04, 0.02 with 250 steps and clamp_max=0.05 across four prompts.

ETA vs clamp_max, DD, 3/6/22

For ETA -1.0 – 1.0 vs clamp_max 0.01 – 0.09 with 250 steps across four prompts.

Cut_overview vs cut_Innercut, DD, 3/9/22

For 3 cut_overview vs cut_innercut across four prompts.

Cut_ic_pow, DD, 3/8/22

For cut_ic_pow 1, 10, 100 across four prompts

Cut_ic_pow vs cut_innercut, DD, 3/10/22

For cut_ic_pow 1, 10, 100, 1000 vs cut_innercut 4, 8, 16, 32 with cut_overview=4 across four prompts.

Clamp_max vs skip_steps, DD, 3/11/22

For clamp_max 0.05, 0.1, 0.2, 0.3 vs skip_steps 0, 10, 30, 50 across four prompts.

Clip_guidance_scale, DD, 3/12/22

For clip_guidance_scale 500, 1500, 5000, 15000, 45000, 135000 across four prompts.

Clip_guidance_scale pt2, DD, 3/12/22

For clip_guidance_scale 135000, 405000, 1215000 across four prompts.

CLIP model comparisons, DD4.1, 3/3/22

https://discord.com/channels/944025072648216586/944025513322774608/949139807341252619
Using Disco Diffusion v4.1

Prompt: “A painting of sea cliffs in a tumultuous storm, Trending on ArtStation.”

Seed: 2472644150

250 steps

Initial image:

Image

Only using “RN101”:

Image

Only using “ViTL14”:

Image

Only using “ViTB32”:

Image

Only using “ViTB16”:

Image

Using “ViTB32, ViTB16, RN50” (default):

Image

Using “ViTB32, ViTB16, RN50x4”:

Image

Using “ViTB32, ViTB16, RN101”:

Image

Using “ViTB32, ViTB16, RN101, RN50”:

Image

Using “ViTB32, ViTB16, ViTL14, RN50”:

Image

— END OF MODEL COMPARISONS —-

Skip_steps, DD4.1, 3/4/22

https://discord.com/channels/944025072648216586/944025513322774608/949430945772089465
Using Disco Diffusion v4.1

Prompt: “A painting of sea cliffs in a tumultuous storm, Trending on ArtStation.”

Seed: 2472644150

init:

Image

Skip_steps = 50

Image

Skip_steps, DD4.1, 3/4/22

https://discord.com/channels/944025072648216586/944025513322774608/949451996325347350

Using Disco Diffusion v4.1

Prompt: “A painting of sea cliffs in a tumultuous storm, Trending on ArtStation.”

Prompts set to start at frame 0

Seed: 2472644150

350 total steps, 100 steps skipped

Initial image:

Image

Using ViTB32, ViTB16, RN50:

Image

Using only ViTB32:

Image

Using only ViTB16:

Image

Using ViTB32, ViTB16, ViTL14, RN101, RN50, RN50x4:

Image

Using only RN50:

Image

Using only RN50x4:

Image

Using only RN50x16:

(lol what? It turned into a dragon)

Image

Only using RN101:

Image

Using only ViTL14:

Image

Using ViTB32, ViTB16:

Image

Using ViTB32, ViTL14:

Image

Using ViTB32, ViTB16, RN101:

Image

Using ViTB32, ViTB16, RN50, RN50x4:

Image

TESTER COMMENTS: I am currently running more examples. But let’s ask ourselves: what can we learn from the results thus far? 🤔

Well, at least in the short term (250 unskipped steps):

* ViTB32 and ViTB16 tend to be the most detailed

* Using a large number of settings at the same time tends to make the end picture more abstract overall

Skip_steps, DD4.1, 3/6/22

https://discord.com/channels/944025072648216586/949776749456138320/950153822947397762

Even more tests using the Seascape Settings!

This time we will be testing the number of steps skipped under the “skip_steps:” setting. This is used when you use an initial image (“init_image:”).  It’s a bit hard to describe, but here’s my understanding of it:

When you “skip steps”, it makes the initial image sharper and less blurred. In other words, the more steps you tell it to skip, the more your final version will resemble the initial image. REMEMBER that “skipped steps” will not actually be run, so if you have 350 steps total and it’s set to skip 100, you will only end up running 250 steps. It can be a little hard to visualize, so let’s just show the tests.

Initial image (same as prior test):

Image

0% skipped (0 skipped, 250 total):

Image

5% skipped (13 skipped, 263 total): [NOTE: The numbers may end up seeming odd. This is because I used some math to make sure that the end result always runs for 250 frames, no matter how many are skipped]

Image

10% skipped (28 skipped, 278 total):

Image

15% skipped (44 skipped, 294 total):

Image

20% skipped (63 skipped, 313 total):

Image

~30% (technically 28%) skipped (100 skipped, 350 total):

Image

40% skipped (167 skipped, 417 total):

Image

50% skipped (250 skipped, 500 total):

Image

60% skipped (375 skipped, 625 total):

Image
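The skipped/total pairs listed above are consistent with picking a total so that exactly 250 steps still run after skipping. The tester does not show the math they used, so the sketch below is a guess that happens to reproduce the listed numbers: `plan_steps` is a hypothetical name, and rounding halves up is an assumption. The ~30% row is the exception, since the tester used the round numbers 100 and 350 there.

```python
import math
from fractions import Fraction


def plan_steps(executed_steps, skip_percent):
    """Return (skip_steps, total_steps) such that total - skip == executed_steps.

    Hypothetical helper. skip_percent is a whole-number percentage of the total.
    """
    if not 0 <= skip_percent < 100:
        raise ValueError("skip_percent must be at least 0 and below 100")
    # total * (1 - p) = executed  =>  total = executed / (1 - p), kept exact with Fraction
    exact_total = Fraction(executed_steps * 100, 100 - skip_percent)
    total = math.floor(exact_total + Fraction(1, 2))  # round half up (assumption)
    return total - executed_steps, total


for pct in (0, 5, 10, 15, 20, 40, 50, 60):
    skip, total = plan_steps(250, pct)
    print(f"{pct}% skipped ({skip} skipped, {total} total)")
```

Exact fractions are used instead of floats so that a borderline case such as 312.5 rounds the same way every time.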

Output image size (width_height), DD4.1, 3/8/22

https://discord.com/channels/944025072648216586/949776749456138320/950813424051441674

Time for another Seascape Test! We’ll be using all of these previously mentioned settings (look at the comment I’m replying to right now). We’ll also be using the settings ViTB32, ViTB16, RN50 (default settings). But there will be one BIG difference: we’ll be changing the resolution!

Note that in this first test, the initial image will still remain 1280×768. Later tests will change the size of the initial images used, and other tests will not use an initial image at all:

Initial image (same as prior test):

Image

12.5% size (160,96):

Image

25% size (320,192):

Image

37.5% size (480,288):

Image

50% size (640,384):

Image

75% size (960,576):

Image

100% size (1280,768):

Image

125% size (1600,960):

Image

137.5% size (1760,1056):

Image

TESTER COMMENTS: What can we infer from this? 🤔

Well, for one thing, in my opinion much lower resolutions get some really awesome abstract results. Slightly lower resolutions seem to get attractive painterly / impressionistic results. Larger resolutions aren’t particularly noteworthy to me.
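The sizes in this test are plain percentages of the 1280×768 base size. A small sketch of that arithmetic follows; `scaled_size` is a hypothetical helper, and raising an error for a non-whole pixel size is an assumption, since the source does not say how the notebook handles that case.

```python
from fractions import Fraction

BASE_SIZE = (1280, 768)  # width, height of the 100% case in this test


def scaled_size(percent, base=BASE_SIZE):
    """Return (width, height) at the given percentage of the base size (hypothetical helper)."""
    factor = Fraction(str(percent)) / 100  # exact arithmetic, e.g. "12.5" -> 1/8
    width, height = (side * factor for side in base)
    if width.denominator != 1 or height.denominator != 1:
        raise ValueError(f"{percent}% of {base} is not a whole number of pixels")
    return int(width), int(height)


for pct in (12.5, 25, 37.5, 50, 75, 100, 125, 137.5):
    print(f"{pct}% size {scaled_size(pct)}")
```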

Output image size (width_height), DD4.1, 3/8/22

https://discord.com/channels/944025072648216586/949776749456138320/950813424051441674

Once again, I’m using all the same settings in this post I’m quoting (click on it to read more), in addition to the ViTB32, ViTB16, RN50 (default) settings. But this time I decreased the number of steps to hilariously small levels, and also used an initial image.

Note: I altered the number of skipped steps each time to keep it at roughly 20% of the total number of steps.

5 steps (1 skipped, 6 total):

Image

10 steps (2 skipped, 12 total):

Image

15 steps (4 skipped, 19 total):

Image

20 steps (5 skipped, 25 total):

Image

25 steps (6 skipped, 31 total):

Image

30 steps (8 skipped, 38 total):

Image

35 steps (9 skipped, 44 total):

Image

40 steps (10 skipped, 50 total):

Image

45 steps (12 skipped, 57 total):

Image

50 steps (13 skipped, 63 total):

Image

TESTER NOTES: Okay, those are the results I have for now. Conclusion? People who do a huge number of steps are cowards 😤

Low steps look awesome IMO

use_secondary_model, DD, 3/13/22

https://discord.com/channels/944025072648216586/949776749456138320/952628761726165083

Not using the Secondary Model really makes the rendering slow. I thought I was smart, so I ran it disabled for a few days.

without secondary model: Finished in 07m27s

with secondary model: Finished in 03m12s

The images I did without it look a little more interesting though; you can tell they’re more advanced, with more things added to the image overall.

Without secondary: 

Image

With secondary:

Image

Test 2: without secondary

Image

Test 2: with secondary

Image

range_scale study, 4/27/22


Test of impact of range_scale over a very wide range of values.  

Analyst comment: Over a wide range from 0 up to 1 billion,  range_scale seemed to have little discernible effect.

Clamp_grad turned OFF to allow coloration to run without clamping. Cgs range as shown. Otherwise default values. 

https://discord.com/channels/944025072648216586/949776749456138320/968022827955523624

Model Comparison Study, DD v. 5–DDIM Sampling; 5/02/2022

From my (KaliYuga’s) original post, slightly edited to correct terminology: “Hey guys! I’ve been hard at work for the last few days putting together a set of comparisons exploring the results given by different combinations of the models available in Disco Diffusion V5.

The first chart is a comparison of all ViT model combinations vs all RN model combinations (excepting the RN50x64 model, which I can’t run on Colab Pro). The second chart is the same thing with transposed axes. The rest of the charts are a comparison for each ViT and RN model in isolation. I’ve posted all of them below; additionally, here’s a link to an imgur album of the same images, which are slightly higher resolution, I think. Because I screwed up, the first image in this series is at the very bottom of the imgur album.

All of these models can be found and selected under the “Models Settings” section of the Disco Diffusion V.5 notebook.”

Note from KaliYuga: You’ll note that the axes are labeled “Primary” and “Secondary” on the charts. This is an error. “Primary” should read “ViT” and “Secondary” should read “RN.”

[Images begin on next page]

Model Comparison Study, DD v. 5–PLMS Sampling; 5/02/2022

Original Post by KaliYuga

Link to Higher-Res Imgur Album

Images begin on next page