    • vessenes 3 hours ago

      Flux is so frustrating to me. Really good prompt adherence, a strong ability to keep track of multiple parts of a scene; it's technically very impressive. However, it seems to have had no training on art-art. I can't get it to generate even something that looks like Degas, for instance, and I can't even fine-tune a painterly art style of any sort into Flux dev. I get that there was backlash at SD from working, living artists, so I can imagine the BFL team decided not to train on art, but it's a real loss, both in terms of human knowledge of, say, composition and emotion, and in terms of style diversity.

      For goodness' sake, the Met in New York has a massive trove of openly licensed (CC0-type) art. Dear BFL, please ease up a bit on this and add some art-art to your models; they will be better as a result.

      • whywhywhywhy 7 minutes ago

        >However it seems to have had no training on art-art. I can't get it to generate even something that looks like Degas, for instance

        It feels like they just removed names from the datasets to make it worse at recreating famous people and artists.

        • crystal_revenge 2 hours ago

          I've had a similar experience: incredible at generating a very specific style of image, but not great at generating anything with a specific style.

          I suspect the answer to this will turn out to be LoRAs. Two examples that stick out are:

          - Flux Tarot v1 [0]

          - Flux Amateur Photography [1]

          Both of these do a great job of combining all the benefits of Flux with custom styles that seem to work quite well (a rough loading sketch follows the links below).

          [0] https://huggingface.co/multimodalart/flux-tarot-v1 [1] https://civitai.com/models/652699?modelVersionId=756149
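
          If you want to try one, loading a style LoRA on top of Flux dev with the diffusers library looks roughly like the sketch below (the prompt, step count, and guidance value are made up for illustration; check each LoRA card for its actual trigger words):

            import torch
            from diffusers import FluxPipeline

            # Base FLUX.1 dev pipeline; assumes a GPU with enough VRAM (or add offloading).
            pipe = FluxPipeline.from_pretrained(
                "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
            ).to("cuda")

            # Style LoRA from [0]; the one from [1] loads the same way.
            pipe.load_lora_weights("multimodalart/flux-tarot-v1")

            image = pipe(
                "a tarot card of a lighthouse keeper",  # illustrative prompt
                num_inference_steps=28,
                guidance_scale=3.5,
            ).images[0]
            image.save("tarot.png")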

          • throwup238 2 hours ago

            I’ve had the same problem with photography styles, even though the photographer I’m going for is Prokudin-Gorskii who used emulsion plates in the 1910s and the entire Library of Congress collection is in the public domain. I’m curious how they even managed to remove them from the training data since the entire LoC is such an easy dataset to access.

            • throwaway314155 2 hours ago

              I'm fairly confident they did a broad FirstName LastName removal.

            • gs17 2 hours ago

              And I can't imagine there's a real copyright (or ethical) issue with including artwork in the public domain because the artist died over a century ago.

              • pdntspa 2 hours ago

                I wonder if you can use Flux to generate the base image, then run img2img on SD 1.4 to impart artistic style?

                • vunderba 2 hours ago

                  That's what a refiner is for in auto1111: taking an image through the last 10% of the process and touching it up with an alternative model.

                  I actually use Flux to generate the image for prompt adherence, then pull it in as a canny/depth ControlNet input for more established models like RealVis, UnstableXL, etc.
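
                  In diffusers terms that handoff looks roughly like the sketch below (the checkpoint names, prompt, and canny thresholds are placeholders rather than my exact settings, and the real workflow lives in auto1111/ComfyUI rather than code):

                    import cv2
                    import numpy as np
                    import torch
                    from PIL import Image
                    from diffusers import (ControlNetModel, FluxPipeline,
                                           StableDiffusionXLControlNetPipeline)

                    prompt = "an impressionist painting of water lilies at dusk"

                    # 1. Flux generates the base image (strong prompt adherence).
                    flux = FluxPipeline.from_pretrained(
                        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
                    ).to("cuda")
                    base = flux(prompt, num_inference_steps=28).images[0]
                    del flux
                    torch.cuda.empty_cache()

                    # 2. Extract a canny edge map to use as the control image.
                    edges = cv2.Canny(np.array(base), 100, 200)
                    control = Image.fromarray(np.stack([edges] * 3, axis=-1))

                    # 3. Re-render through an SDXL checkpoint + canny ControlNet to impart the style.
                    controlnet = ControlNetModel.from_pretrained(
                        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
                    )
                    sdxl = StableDiffusionXLControlNetPipeline.from_pretrained(
                        "SG161222/RealVisXL_V4.0",  # stand-in for "realvis"; any SDXL checkpoint works
                        controlnet=controlnet,
                        torch_dtype=torch.float16,
                    ).to("cuda")
                    styled = sdxl(prompt + ", oil on canvas", image=control).images[0]
                    styled.save("styled.png")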

                • thomastjeffery 2 hours ago

                  I think that's part of what makes FLUX.1 so good: the content it's trained on is very similar.

                  Diversity is a double-edged sword. It's a desirable feature where you want it, and an undesirable feature everywhere else. If you want an impressionist painting, then it's good to have Monet and Degas in the training corpus. On the other hand, if you want a photograph of water lilies, then it's good to keep Monet out of the training data.

                • ilaksh 2 hours ago

                  Pretty smart model. Here's one I made: https://replicate.com/p/6ez0x8xqvsrga0cjadg8m7bah0

                    • drdaeman an hour ago

                      Yet it doesn't seem to know what a Tektronix 4010 actually looks like... ;)

                      I had similar issues trying to paint an "I cast non-magic missile" meme with a fantasy wizard using a missile launcher. No model out there (I've tried SD, SDXL, FLUX.1 dev and now this FLUX 1.1 pro) knows what a missile launcher looks like (neither as a generic term nor as any specific system), and none has a clue how it's held, so they all draw really weird contraptions.

                      • loufe 2 hours ago

                        That is astoundingly good adherence to the description. I already liked and was impressed by Flux1 but that is perhaps the most impressive image generation I've ever seen.

                        • doctorpangloss 3 hours ago

                          I'm worried about what happens when more people find out about Ideogram.

                          There are a lot of things that don't appear in ELO scores. For one, they will not reflect that you cannot prompt women's faces in Flux. We can only speculate why.

                          • liuliu 3 hours ago

                            What do you mean? FLUX.1 handles prompts for women or women's faces just fine. Do you mean the skin texture is unrealistic, or some other artifact?

                            • jjcm 2 hours ago

                              Flux tends to gravitate towards a single face archetype for both sexes. For women it's a narrow face with a very slightly cleft chin. Men almost always appear with a very short cut beard or stubble. r/stablediffusion calls it the "flux face", and there are several LoRAs that aim to steer the model away from them.

                              • doctorpangloss 3 hours ago

                                Flux will not adhere to your detailed description of a woman's face nearly as well as it does for a man, and it doesn't adhere to text descriptions of faces well in general. This is not a technical limitation; it was a choice in the captioning of the model's dataset, and maybe other, more sophisticated decisions like the loss. It exhibits similar flaws in its representation of male versus female celebrities; it also exhibits this flaw when you use language that describes male celebrities' versus female celebrities' appearances.

                                • throwaway314155 an hour ago

                                  What they really mean is that it's not useful for generating lewd imagery of women. It was likely nerfed in this regard on purpose because BFL didn't want to be associated with that (however legal it may be).

                                • giancarlostoro 3 hours ago

                                  How locked down is it? My problem with a lot of these is I like to make really ridiculous meme-type images, but I run into walls for dumb reasons. Like if I want to make something that's "copyrighted", like a mix of certain characters from one franchise or whatever, sometimes I can't: I get told that the model cannot generate copyrighted content, even though courts ruled that AI-generated stuff cannot be copyrighted either way...

                                  I feel like AI should just be treated as fair use as long as it's not 100% blatantly a literal clone of the original work.

                                  • sdenton4 19 minutes ago

                                    It's perfectly happy to make an imperial storm trooper riding a dragon, for what it's worth

                                    • doctorpangloss 3 hours ago

                                      > How locked down is it? ... I get told that the model cannot generate copyrighted... AI should just be treated as fair use

                                      Ideogram and Flux both have their own broad set of limitations that are non-technical and unpublished. IMO they are not really motivated by legal concerns, other than the lack of transparency itself.

                                      So maybe the issue is that lack of transparency, and the hazy legal climate means no transparency. You can't go anywhere and see the detailed list of dataset-collection and captioning opinions for proprietary models. The Open Model Initiative, which is trying to make a model, did publish their opinions, and they're not getting sued anytime soon. However, their opinions are an endless source of conflict.

                                      • jjordan 2 hours ago

                                        I've been using Venice.ai, which afaik offers the most uncensored service currently available outside of running your own instance. No problem with prompts that include copyrighted terms.

                                    • skybrian 2 hours ago

                                      It doesn’t get piano keyboards right, but it’s the first image generator I’ve tried that sometimes gets “someone playing accordion” mostly right.

                                      When I ask for a man playing accordion, it’s usually a somewhat flawed piano accordion, but if I ask for a woman playing accordion, it’s usually a button accordion. I’ve also seen a few that are half-button, half-piano monstrosities.

                                      Also, if I ask for “someone playing accordion”, it’s always a woman.

                                      • vunderba 2 hours ago

                                        Periodic data is always hard for generative image systems - particularly if that "cycle" window is relatively large (as would be the case for octaves of a piano).

                                      • sharkjacobs 3 hours ago

                                        "state of the art" has become such tired marketing jargon.

                                        "our most advanced and efficient model yet"

                                        "a significant step forward in our mission to empower creators"

                                        I get it, you can't sell things if you don't market them, and you can't make a living making things if you don't sell them, but it's exhausting.

                                        • bemmu 3 hours ago

                                          Flux genuinely is the best model I’ve tried though. If there is a better one I’d love to know.

                                          • GaggiX 2 hours ago

                                            Have you tried Ideogram v2?

                                            • SV_BubbleTime an hour ago

                                              Have you run Ideogram offline?

                                          • johnfn 2 hours ago

                                            Flux is state of the art. You can see an ELO-scored leaderboard here:

                                            https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Ima...

                                            • halJordan 3 hours ago

                                              It is state of the art. And it's not like the art has stagnated.

                                              • vunderba 2 hours ago

                                                Agreed, but the flux dev model is easily the best model out there in terms of overall prompt adherence that can also be run locally.

                                                Some comparisons against DALL-E 3.

                                                https://mordenstar.com/blog/flux-comparisons

                                                • arizen 3 hours ago

                                                  - How do copywriters greet each other in the morning?

                                                  - Take your morning to the next level!

                                                  • minimaxir 3 hours ago

                                                    The official blog post justifies the marketing copy a bit more with metrics.

                                                    • sharkjacobs 3 hours ago

                                                      The point is that the metrics say the thing; this stuff doesn't actually say anything.

                                                      What does "state of the art" mean? That it's using the latest "cutting edge" model technology?

                                                      When Apple releases a new iPhone Pro Max, it's "state of the art". When they release a new iPhone SE, there's an argument to be made that it's not, because it uses 2-year-old chips. But what would it even mean for BFL to release a model which wasn't "state of the art"?

                                                      > our most advanced and efficient model yet

                                                      Yes, likewise, this is how technology companies work. They release something and then the next thing they release is more advanced.

                                                      > a significant step forward in our mission to empower creators

                                                      Going from 12 seconds to 4 seconds is a significant speed boost, but does it move the needle on their mission to empower creators? These are their words, not mine. It's a technical achievement and impressive incremental progress, but are there users out there who are more empowered by this? Significantly more empowered!?

                                                      • throwaway314155 3 hours ago

                                                        Holy shit, the level of pedantry. State of the art in this context means it outperforms all other models to date on standard evaluations, which is precisely what it does.

                                                        Did you miss the first flux release? Black forest labs aren't screwing around. The team consists of many of the _actual_ originators of Stable Diffusion's research (which was effectively co-opted by Emad Mostaque who is likely a sociopath).

                                                        • sharkjacobs an hour ago

                                                          > State of the art in this context means it out performs all other models to date on standard evaluations, which is precisely what it does.

                                                          That's not what "state of the art" means, and if it did, it would still be hollow marketing jargon, because there are specific and meaningful ways to say that FLUX1.1 [pro] outperforms all competitors (and they do say so, later in the press release).

                                                          Your confusion about what "state of the art" means is exactly why marketers still use the phrase even though it has been overused and worn out since at least the 1980s. State of the art means something is "new", and that it is the "latest development", and that it incorporates "cutting edge" technology. The implication is that new is better, and that the "state of the art" is an improvement over what came before. (And to be clear, that's often true! Including in this case!) But that's not what the phrase actually means; it just means that something is new. And every press release is about something new.

                                                          FLUX1.1 [pro] would be state of the art even if it was worse than the previous version. Stable Diffusion 2.0 was state of the art when it was released.

                                                          • throwaway314155 an hour ago

                                                            I said in this context for a reason. That's how state of the art has been used (in papers, not copy) with regard to deep learning since well before DALL-E 1. I maintain that you're being pedantic about appropriating a term of art to mean something else. Everyone else here knows what the meaning is in context. Just not you.

                                                  • Jackson__ 3 hours ago

                                                    Ah, that was one short gravy train even by modern tech company standards. Really wish the space was more competitive and open so it wouldn't just be one company at the top locking their models behind APIs.

                                                    • Der_Einzige 2 hours ago

                                                      Far more interesting will be when pony diffusion V7 launches.

                                                      No one in the image space wants to admit it, but well over half of your user base wants to generate hardcore NSFW with your models and they mostly don’t care about any other capabilities.

                                                      • ks2048 2 hours ago

                                                        Is there a good site that compares text-to-image models - showing a bunch of examples of text w/ output on each model?

                                                        • byteknight 3 hours ago

                                                          I won't pay for a model, but that cake image looks dang good.

                                                          • nirav72 3 hours ago

                                                            Are there any projects that allow for easy setup and hosting Flux locally? Similar to SD projects like InvokeAI or a1111

                                                              • vunderba 2 hours ago

                                                                The answer is it really depends on your hardware, but the nice thing is that you can split out the text encoder when using ComfyUI. On a 24gb VRAM card I can run the Q8_0 GGUF version of flux-dev with the T5 FP16 text encoder. The Q8_0 gguf version in particular has very little visual difference from the original fp16 models. A 1024x1024 image takes about 15 seconds to generate.
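
                                                                For anyone who would rather script it than use ComfyUI, recent diffusers releases can also load GGUF checkpoints; here's a rough sketch (the Q8_0 repo path is the usual community conversion and the offload call is optional, so treat the specifics as assumptions and adjust for your diffusers version and hardware):

                                                                  import torch
                                                                  from diffusers import (FluxPipeline, FluxTransformer2DModel,
                                                                                         GGUFQuantizationConfig)

                                                                  # Q8_0 transformer weights from a community GGUF conversion (assumed repo/path).
                                                                  gguf = ("https://huggingface.co/city96/FLUX.1-dev-gguf"
                                                                          "/blob/main/flux1-dev-Q8_0.gguf")
                                                                  transformer = FluxTransformer2DModel.from_single_file(
                                                                      gguf,
                                                                      quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
                                                                      torch_dtype=torch.bfloat16,
                                                                  )
                                                                  pipe = FluxPipeline.from_pretrained(
                                                                      "black-forest-labs/FLUX.1-dev",
                                                                      transformer=transformer,
                                                                      torch_dtype=torch.bfloat16,
                                                                  )
                                                                  pipe.enable_model_cpu_offload()  # helps below 24 GB of VRAM
                                                                  image = pipe("a tarot card of a fox", num_inference_steps=28).images[0]
                                                                  image.save("out.png")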

                                                                  • minimaxir 3 hours ago

                                                                    Flux is weirder than the old SD projects since Flux is extremely resource-dependent and won't run on most hardware.

                                                                    • waffletower 3 hours ago

                                                                      Doesn't take a lot of effort to get Flux dev/schnell to run on 3090s unquantized, but I agree that 24gb is the consumer GPU memory limit and there are many with less than that. Flux runs great on modern Mac hardware as well, if you have at least 32gb of unified memory.

                                                                      • stoobs 2 hours ago

                                                                        I'm running Flux dev fine on a 3080 10GB, unquantised; on Windows the Nvidia drivers have a feature that lets it spill over into system RAM. It runs a little slower, but it's not a deal-breaker, unlike Nvidia's pricing and power requirements at the moment.

                                                                        • zamadatix 24 minutes ago

                                                                          What are you using to run it? When I run Flux Dev in Windows using comfy on a 4090 (24 GB) sometimes it all crashes because it runs out of VRAM when I'm doing too much other stuff.

                                                                        • drcongo an hour ago

                                                                          Really? I tried using it in ComfyUI on my Mac Studio, failed, went searching for answers and all I could find said that something something fp8 can't run on a Mac, so I moved on.

                                                                        • ziddoap 3 hours ago

                                                                          People have Flux running on pretty much everything at this point, assuming you are comfortable waiting 3+ minutes for a 512x512 image.

                                                                          I managed to get it running on an old computer with a 2060 Super, taking ~1.5 minutes per image gen. People are generating on a 1080.

                                                                          • Filligree 3 hours ago

                                                                            The GGUF quantisations do run on most recent hardware, albeit at increasingly concerning quality tradeoffs.

                                                                            • tripplyons 3 hours ago

                                                                              I haven't noticed any quality degradation with the 8-bit GGUF for Flux Dev, but I'm sure the smaller quantizations perform worse.

                                                                          • leumon 3 hours ago

                                                                            Using comfyui with the official flux workflow is easy and works nicely. comfy can also be used via API.

                                                                            • pdntspa 2 hours ago

                                                                              DrawThings on Mac

                                                                            • jchw 3 hours ago

                                                                              The generated images look impressive of course but I can't help but be mildly amused by the fact that the prompt for the second example image insists strongly that the image should say 1.1:

                                                                              > ... photo with the text "FLUX 1.1 [Pro]", ..., must say "1.1", ...

                                                                              ...And of course, it does not.

                                                                            • jeffbee 2 hours ago

                                                                              I asked for a simple scene and it drew in the exact same AI girl that every text-to-image model wants to draw, same face, same hair, so generic that a Google reverse image search pulls up thousands of the exact same AI girl. No variety of output at all.

                                                                              • melvinmelih 3 hours ago

                                                                                In case you want to try it out without hassling with the API, I've set up a free tool for it so you can try it out on WhatsApp: https://instatools.ai/products/fluxprovisions