Comments Page - FLUX1.1 [pro] – New SotA text-to-image model from Black Forest Labs

« Back FLUX1.1 [pro] – New SotA text-to-image model from Black Forest Labsreplicate.comSubmitted by fagerhult a year ago

in3d a year ago
Better link https://blackforestlabs.ai/announcing-flux-1-1-pro-and-the-b...
vessenes a year ago
Flux is so frustrating to me. Really good prompt adherence, strong ability to keep track of multiple parts of a scene, it's technically very impressive. However it seems to have had no training on art-art. I can't get it to generate even something that looks like Degas, for instance. And, I can't even fine tune a painterly art style of any sort into Flux dev. I get that there was working, living artist backlash at SD and I can therefore imagine that the BFL team has decided not to train on art, but, it's a real loss. Both in terms of human knowledge of, say composition, emotion, and so on, but also for style diversity.
For goodness sake, the MET in New York has a massive trove of open CC0 type licensed art. Dear BFL, please ease up a bit on this, and add some art-art to your models, they will be better as a result.
- crystal_revenge a year ago
  I've had a similar experience, incredible at generating a very specific style of image, but not great at generating anything with a specific style.
  I suspect we'll see the answer to this is LoRAs. Two examples that stick out are:
  - Flux Tarot v1 [0]
  - Flux Amateur Photography [1]
  Both of these do a great job of combining all the benefits of Flux with custom styles that seem to work quite well.
  [0] https://huggingface.co/multimodalart/flux-tarot-v1 [1] https://civitai.com/models/652699?modelVersionId=756149
  vessenes a year ago
  I like those, and there's an electroshock lora that's just awesome out there. That said, Tarot and others like it are "illustrator" type styles with extra juice. I have not successfully trained a LoRa for any painting style, Flux does not seem to know about painting.
  davidbarker a year ago
  I'm curious to give this a go. I've been training a lot of LoRAs for FLUX dev recently (purely for fun). I'm sure there must be a way to get this working.
  Here are a few I've recently trained: https://civitai.com/user/dvyio
  spython a year ago
  This looks really good! What is your process to get this kind of high quality LoRAs?
  davidbarker a year ago
  Thank you!
  A reasonable amount of training images (50 or so), and then I train for 2,000-ish steps for a new style.
  Many of them work well with Flux, particularly if they're illustration-based. Some don't seem to work at all, so I didn't upload those!
  stavros a year ago
  How long does this take, and on what equipment? It's amazing to me that you can do this from just 50 images, I would have thought tens of thousands.
  davidbarker a year ago
  It's very impressive. I aim for around 50 images if I'm training a style, but only 10 to 20 if training a concept (like an object or a face).
  I have a MacBook Air so I train using the various API providers.
  For training a style, I use Replicate: https://replicate.com/ostris/flux-dev-lora-trainer/train
  For training a concept/person, I use fal: https://fal.ai/models/fal-ai/flux-lora-fast-training
  With fal, you can train a concept in around 2 minutes and only pay $2. Incredibly cheap. (You could also use it for training a style if you wanted to. I just found I seem to get slightly better results using Replicate's trainer for a style.)
  throw14082020 a year ago
  $2 for 2 minutes? Can't you get less than $2 for 1 hour using GPU machines from providers like runpod or AirGPU? I found it a bit expensive to use replicate and fal after 10 minutes of prompting.
  I have not used runpod or airgpu, and not affiliated.
  reissbaker a year ago
  Yes, renting raw compute via Runpod and friends will generally be much cheaper than renting a higher level service that uses that compute e.g. fal.ai or Replicate. For example, an A6000 on fal.ai is a little over $2/hr (they only show you the price in seconds, perhaps to make it more difficult to compare with ordinary GPU providers); on Runpod an A6000 is less than half that, $0.76/hr in their managed "Secure Cloud." If you're willing to take some risk of boxes disappearing, and don't need much security, Runpod's "Community Cloud" is even cheaper at $0.49/hr.
  Similar deal with Replicate: an A100 there is over $5/hr, whereas on Runpod it's $1.64/hr.
  And if you use the "serverless" services, the pricing becomes even more astronomical; as you note, $1/minute is unreasonably expensive: that's over 20x the cost of renting 8xH100s on Runpod's "Secure Cloud" (and 8xH100s are extreme overkill for finetuning image generators: even 1xH100 would be sufficient, meaning it's actually 160x markup).
  stavros a year ago
  Wow, fantastic, thanks! I thought it would be much, much more expensive than this. Thanks for the info!
  davidbarker a year ago
  Happy to help! It's a lot of fun. And it becomes even more fun when you combine LoRAs. So you could train one on your face, and then use that with a style LoRA, giving you a stylised version of your face.
  If you do end up training one on yourself with fal, it should ultimately take you here (https://fal.ai/models/fal-ai/flux-lora) with your new LoRA pre-filled.
  Then:
  1. Click 'Add item' to add another LoRA and enter the URL of a style LoRA's SafeTensor file (with Civitai, go to any style you like and copy the URL from the download button) (you can also find LoRAs on Hugging Face)
  2. Paste that SafeTensor URL as the second LoRA, remembering to include the trigger word for yourself (you set this when you start the training) and the trigger word for the style (it tells you on the Civitai page)
  3. Play with the strength for the LoRAs if you want it to look more like you or more like the style, etc.
  -----
  If you want a style LoRA to try, this one of SNL title cards I trained actually makes some great photographic images. https://civitai.com/models/773477/flux-lora-snl-portrait (the download link would be https://civitai.com/api/download/models/865105?type=Model&fo...)
  -----
  There's a lot of trial and error to get the best combinations. Have fun!
  throwup238 a year ago
  Have you tried img2text when training a style?
  I want to make a LoRA of Peokudin-Gorskii photographs from the Library of Congress collection and they have thousands of photos, so I’m curious whether that’s effective for autogenerating the caption for images.
  davidbarker a year ago
  It's funny you should ask. I recently released a plugin (https://community-en.eagle.cool/plugin/4B56113D-EB3E-4020-A8...) for Eagle (an asset library management app) that allows you to write rules to caption/tag images and videos using various AI models.
  I have a preset in there that I sometimes use to generate captions using GPT-4o.
  If you use Replicate, they'll also generate captions for you automatically if you wish. (I think they use LLaVA behind the scenes.) I typically use this just because it's easier, and seems to work well enough.
  throwup238 a year ago
  That’s awesome! Thank you for the replicate link too. I didn’t know they also did LoRA training. They’ve been kind of hitting it lit the park lately.
  stavros a year ago
  Thanks for all this! I had created a SD LoRA of my face back in the day, time for another one!
  davidbarker a year ago
  Awesome! :)
  vessenes a year ago
  @davidbarker -- please do, that sounds awesome! I did not have good results.
  davidbarker a year ago
  It's trickier than I thought it would be.
  Here are a few in Degar style I made after training for 2,500 steps. I'd love to hear what you think of them. To my (untrained) eye, they seem a little too defined, perhaps?
  https://imgur.com/a/sqsQLPg
  vessenes a year ago
  Yep absolutely nothing like degas well I take that back. I think it picked up some favorite colors/tones. But it has no concept of the materials or poses or composition. So plasticky! Compare to https://images.app.goo.gl/JiDRYNNKUP9tczkQ7
  davidbarker a year ago
  I suspect it really needs more training examples. The problem I found when I looked for images to use was that 60% were of dancers, and from past experience, it will end up trying to fit a dancer into every image you create. But of course, there are only a (small) finite number of Degas images that you can train with.
  A possible solution may be to incorporate artificial images in the training data. So, create an initial LoRA with the original Degas images and generate 500 images. From those generated images, pick the ones that most resemble Degas. Add those to the training set and train again. Repeat until (hopefully) it learns the correct style.
  davidbarker a year ago
  Out of curiosity, what do you think of these? https://imgur.com/a/8p7RlMe
  vessenes a year ago
  Significantly better - they feel like watercolor more than degas but if that’s flux I am impressed!
  davidbarker a year ago
  Unfortunately, not Flux. They're from Midjourney, using a few Degas as a style reference.
  Whatever they're doing at Midjourney is still impressive. No training needed and a better result.
- whywhywhywhy a year ago
  >However it seems to have had no training on art-art. I can't get it to generate even something that looks like Degas, for instance
  It feels like they just removed names from the datasets to make it worse at recreating famous people and artists.
  vessenes a year ago
  No, they absolutely did not just do that in this case, although that was the SD plan. If you prompt for "painterly, oil painting, thick brush strokes, impressionistic oil painting style" to flux, you will get ... anime-ish renderings.
  whywhywhywhy a year ago
  That's not what I'm talking about, SDXL you can literally prompt a famous artists entire style and mix and match them, even conceptual artists and sculptors.
- throwup238 a year ago
  I’ve had the same problem with photography styles, even though the photographer I’m going for is Prokudin-Gorskii who used emulsion plates in the 1910s and the entire Library of Congress collection is in the public domain. I’m curious how they even managed to remove them from the training data since the entire LoC is such an easy dataset to access.
  vessenes a year ago
  Yes, exactly. I think they purposely did not train on stuff like this. I'd bet that you could do a LoRa of Prokudin-Gorskii though; there's a lot of photographic content in flux's training set.
  throwaway314155 a year ago
  i'm fairly confident they did a broad FirstName LastName removal.
- gs17 a year ago
  And I can't imagine there's a real copyright (or ethical) issue with including artwork in the public domain because the artist died over a century ago.
- thomastjeffery a year ago
  I think that's part of what makes FLUX.1 so good: the content it's trained on is very similar.
  Diversity is a double-edged sword. It's a desirable feature where you want it, and an undesirable feature everywhere else. If you want an impressionist painting, then it's good to have Monet and Degas in the training corpus. On the other hand, if you want a photograph of water lilies, then it's good to keep Monet out of the training data.
  doctorpangloss a year ago
  DALL-E3 doesn't struggle with this. It's just opinions. There's no technical limitation. They chose to weaken the model in this regard.
  thomastjeffery a year ago
  Nonsense. FLUX.1-dev is famous for its consistency, prompt adherence, etc.; and it fits on a consumer GPU. That has to come with compromises. You can call any optimization weakness: that's the nature of compromise.
- weebull a year ago
  I wonder if part of the reason it's good is because it's been trained for a more specific task. I can only imagine that if your concept of a "house" includes range from a stately home to "a pineapple under the sea" you're going to end up with a very generalised concept. It's then takes specific prompting to remove the influences you're not interested in.
  I suspect the same goes for art styles. There's such huge variety that really they'd be better surveys by separate models.
- DeathArrow a year ago
  There are people who undistilled Flux so it can be further finetuned, so adding art training won't be an issue.
  https://huggingface.co/nyanko7/flux-dev-de-distill
- pdntspa a year ago
  I wonder if you can use Flux to generate the base image then img2img on SD1.4 to impart artistic style?
  vunderba a year ago
  That's what a refiner is for in auto1111. Taking an image the last 10% and touching it up with an alternative model.
  I actually use flux to generate image for purposes of adherence, then pull it in as a canny/depth controlnet with more established models like realvis, unstableXL, etc.
  andersa a year ago
  That is an interesting idea, I somehow hadn't thought of using flux in a chain like that, thanks!
  vessenes a year ago
  Yes, that is my current workflow as well.
- skort a year ago
  >but, it's a real loss. Both in terms of human knowledge of, say composition, emotion, and so on, but also for style diversity
  But that real art still exists, and can still be found, so what exactly is the loss here?
  vessenes a year ago
  We may differ on our take about the usefulness of diffusion models, but I'd say it's a loss in that many of the visuals humans will see in the next ten years are going to be generated by these models, and I for one wish they weren't just trained on weeb shit.
  dagaci a year ago
  Just think that before 1995 (and in reality, decades later than that) most of the world would never have access to 99% of the worlds art.
  And between 1995 and 2022 the amount of Art produced surpasses the cumulative output of all other periods of human history.
  vessenes a year ago
  ... And between 2022 and 2025 the amount of imagery generated will drive the percent of Art created to roughly 0% of all imagery.
  skort a year ago
  You'll still be able to ask a person to create art in a specific style if you'd like.
  vessenes a year ago
  Unfortunately we will have a generation of young artists who learn to draw based on models like flux, unless they get classical training..
ilaksh a year ago
Pretty smart model. Here's one I made: https://replicate.com/p/6ez0x8xqvsrga0cjadg8m7bah0
- jug a year ago
  One thing that makes FLUX so special is the prompt understanding. I now gave FLUX 1.1 a prompt "Closeup of a doll house built to resemble a famous room in the TV show Friends" and it gave me one with the sign "Central Perk". I never prompted for the text "Central Perk". A Redditor also discovered that it has an associative understanding of emotions. For example "Rose of passion" and it may draw a flower that is burning, because passion is fiery.
  This is miles ahead of most other image generation models available today.
- drdaeman a year ago
  Yet, it doesn't seem to know how a Tektronix 4010 actually looks like... ;)
  I had similar issues trying to paint a "I cast non-magic missile" meme with a fantasy wizard using a missile launcher. No model out there (I've tried SD, SDXL, FLUX.1dev and now this FLUX1.1pro) knows how a missile launcher looks like (neither as a generic term, nor any specific systems) and even has no clue how it's held, so they all draw really weird contraptions.
  morbicer a year ago
  Isn't it because the shoulder launched weapon is usually called rocket launcher, rpg or bazooka? Never heard it referred as misille launcher.
  drdaeman a year ago
  I've tried all of those and then some (e.g. "ATGM"), plus various specific names (like "FGM-148 Javelin", "M1 Bazooka", or "RPG-7", which are all quite iconic and well-recognized so I thought some of those may appear in training data) - all no bueno. Models are simply unaware about such devices, best of their "guesses" is that it's a weapon, so they draw something rifle- or pistol-shaped.
  And, sure, that's what LoRAs are for. If I can figure out how to train one for FLUX, in a way that would actually produce something meaningful (my pitiful attempts at SDXL LoRA training were... less that stellar, and FLUX is quite different from everything). Although that's probably not worth it for making a meme picture...
- nikcub a year ago
  I've gone from counting fingers on a hand to keys on a keyboard
- PcChip a year ago
  agreed - pretty impressive! https://replicate.com/p/ajfrva4p4hrge0cjaf3bncfwn4
- loufe a year ago
  That is astoundingly good adherence to the description. I already liked and was impressed by Flux1 but that is perhaps the most impressive image generation I've ever seen.
  miohtama a year ago
  Is it going be able to go head-to-head against Midjourney?
  vunderba a year ago
  MJ is by far the worst model for complex prompt ADHERENCE, though it has excellent compositional quality.
  Comparisons of similar prompt using Midjourney 6.1
  https://imgur.com/a/WBnPl7I
  Also, flux (schnell, dev) can be run on your local machine.
  If you really want to use a paid service, Ideogram is probably the best one out there that balances quality with adherence. DALL-E 3 also has good adherence as well though the quality can sometimes be iffy, and it's very puritanical in terms of censorship.
- loxias a year ago
  It's quite good at following a detailed paragraph long description of an scene, which is a double edged sword. A lot of the fun for me with early text to image models was underspecifying an image and then enjoying how the model "invents" it. "Steampunk spaceship", "communist bear", "glass city".
  flux is amazing, but I find it requires a very literal description, which pushes the "creative work" back to the text itself. Which can certainly be a good thing, just a bit less gratifying to non visual types like myself. :)
  I wonder, only somewhat jokingly, if one could make text generators which "imagine" detailed fantastical scenes, suitable for feeding to a text to image model.
  vunderba a year ago
  That's what Fooocus is - it allows you to specify a "text expander" LLM that sits in between the input prompt and the diffusion model.
  https://github.com/lllyasviel/Fooocus
  ilaksh a year ago
  Prompt enhancement is now a standard feature in many image generation tools.
ChrisArchitect a year ago
Announcement post: https://blackforestlabs.ai/announcing-flux-1-1-pro-and-the-b...
(https://news.ycombinator.com/item?id=41730626)
sharkjacobs a year ago
"state of the art" has become such tired marketing jargon.
"our most advanced and efficient model yet"
"a significant step forward in our mission to empower creators"
I get it, you can't sell things if you don't market them, and you can't make a living making things if you don't sell them, but it's exhausting.
- johnfn a year ago
  Flux is state of the art. You can see an ELO-scored leaderboard here:
  https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Ima...
- bemmu a year ago
  Flux genuinely is the best model I’ve tried though. If there is a better one I’d love to know.
  GaggiX a year ago
  Have you tried Ideogram v2?
  SV_BubbleTime a year ago
  Have you run Ideogram offline?
  GaggiX a year ago
  Have you run Flux Pro offline?
  SV_BubbleTime a year ago
  No, only a dozen Flux Dev models different distillations, quantizations, and fine-tunes with LORAs.
  But you keep pretending that close source AI is a sustainable comparison.
  GaggiX a year ago
  Flux Pro (v1 and v1.1) is a close source model.
  SV_BubbleTime a year ago
  Yea thanks.
  That’s why I said I run Flux Dev.
  Kerbonut a year ago
  Flux dev is also closed _source_ but at least weights available. Schnell is open weight.
  SV_BubbleTime a year ago
  Your info is terrible and outdated.
  OpenFlux son
  Kerbonut a year ago
  I’ve been meaning to give it a try. How is tooling support? Drop in replacement?
- halJordan a year ago
  It is state of the art. And it's not like the art has stagnated.
- vunderba a year ago
  Agreed, but the flux dev model is easily the best model out there in terms of overall prompt adherence that can also be run locally.
  Some comparisons against DALL-E 3.
  https://mordenstar.com/blog/flux-comparisons
- arizen a year ago
  - How do copywriters greet each other in the morning?
  - Take your morning to the next level!
- minimaxir a year ago
  The official blog post justifies the marketing copy a bit more with metrics.
  sharkjacobs a year ago
  The point is that the metrics say the thing, this stuff doesn't say actually anything.
  What does "state of the art" mean? That it's using the latest "cutting edge" model technology?
  When Apple releases a new iPhone Pro Max, it's "state of the art". When they release a new iPhone SE, there's an argument to be made that it's not because it uses 2 year old chips. But what would it even mean for BFL to release a model which wasn't "state of the art"
  > our most advanced and efficient model yet
  Yes, likewise, this is how technology companies work. They release something and then the next thing they release is more advanced.
  > a significant step forward in our mission to empower creators
  Going from 12 seconds to 4 seconds is a significant speed boost, but does it move the needle on their mission to empower creators? These are their words, not mine, it's a technical achievement and impressive incremental progress, but are there users out there who are more empowered by this? significantly more empowered!?
  throwaway314155 a year ago
  Holy shit the level of pedantry. State of the art in this context means it out performs all other models to date on standard evaluations, which is precisely what it does.
  Did you miss the first flux release? Black forest labs aren't screwing around. The team consists of many of the _actual_ originators of Stable Diffusion's research (which was effectively co-opted by Emad Mostaque who is likely a sociopath).
  sharkjacobs a year ago
  > State of the art in this context means it out performs all other models to date on standard evaluations, which is precisely what it does.
  That's not what "state of the art" means, and if it did it would still be hollow marketing jargon, because there are specific and meaningful ways to say that FLUX1.1 [pro] outperforms all competitors (and they do say so, later in the press release)
  Your confusion about what "state of the art" means is exactly why marketers still use the phrase even though it has been overused and worn out since at least the 1980's. State of the art means something is "new", and that it is the "latest development", and that it incorporates "cutting edge" technology. The implication is that new is better, and that the "state of the art" is an improvement over what came before. (And to be clear, that's often true! Including in this case!) But that's not what the phrase actually means, it just means that something is new. And every press release is about something new.
  FLUX1.1 [pro] would be state of the art even if it was worse than the previous version. Stable Diffusion 2.0 was state of the art when it was released.
  throwaway314155 a year ago
  I said in this context for a reason. That's how state of the art has been used (in papers, not copy) with regard to deep learning since well before DALL-E 1. I maintain that you're being pedantic about appropriating a term of art to mean something else. Everyone else here knows what the meaning is in context. Just not you.
Der_Einzige a year ago
Far more interesting will be when pony diffusion V7 launches.
No one in the image space wants to admit it, but well over half of your user base wants to generate hardcore NSFW with your models and they mostly don’t care about any other capabilities.
Jackson__ a year ago
Ah, that was one short gravy train even by modern tech company standards. Really wish the space was more competitive and open so it wouldn't just be one company at the top locking their models behind APIs.
skybrian a year ago
It doesn’t get piano keyboards right, but it’s the first image generator I’ve tried that sometimes get “someone playing accordion” mostly right.
When I ask for a man playing accordion, it’s usually a somewhat flawed piano accordion, but If I ask for a woman playing accordion, it’s usually a button accordion. I’ve also seen a few that are half-button, half-piano monstrosities.
Also, if I ask for “someone playing accordion”, it’s always a woman.
- vunderba a year ago
  Periodic data is always hard for generative image systems - particularly if that "cycle" window is relatively large (as would be the case for octaves of a piano).
  skybrian a year ago
  Yeah, it's my informal test to see if a new model has made any progress on that.
whitehexagon a year ago
I'm running Asahi Linux on a 32GB M1 Pro. Any chance of being able to run text-to-image models locally? I've had some success with LLMs, but only the smaller models. No idea where to start with images, everything seems geared towards msft+nvda.
- loxias a year ago
  Try https://github.com/leejet/stable-diffusion.cpp
- LeoPanthera a year ago
  "Draw Things" is a native Mac app for text to image. It's a a lot more advanced than DiffusionBee, it will download the models for you, and it's free. It's also available for iOS. (!)
  smcleod a year ago
  Draw things is neat but it's so damn slow compared to other tools (e.g. invokeai), I'm not sure why it takes so long to generate images with any model?
  liuliu a year ago
  On the same Mac hardware, Draw Things should be the fastest on models such as SDXL / FLUX.1 against other tools based on PyTorch (I stopped benchmarking SD v1.5 results for a while so that might regress a little bit here or there).
  LeoPanthera a year ago
  It's not any slower than invokeai for me. Maybe check the settings, and try using the GPU instead of CoreML.
- collinvandyck76 a year ago
  DiffusionBee will let you do this quite easily.
  edit: nevermind, it's a macos app
  lagniappe a year ago
  Is DiffusionBee still in development? I had stopped using it because it seemed like the dev interest had stalled.
  collinvandyck76 a year ago
  It gets periodic releases, but the source isn't typically updated at the same time.
doctorpangloss a year ago
I'm worried about what happens when more people find out about Ideogram.
There are a lot of things that don't appear in ELO scores. For one, they will not reflect that you cannot prompt women's faces in Flux. We can only speculate why.
- liuliu a year ago
  What do you mean? FLUX.1 prompts women or women faces just fine? Do you mean the skin texture is unrealistic or some other artifacts?
  jjcm a year ago
  Flux tends to gravitate towards a single face archetype for both sexes. For women it's a narrow face with a very slightly cleft chin. Men almost always appear with a very short cut beard or stubble. r/stablediffusion calls it the "flux face", and there are several LoRAs that aim to steer the model away from them.
  undefined a year ago
  [deleted]
  doctorpangloss a year ago
  Flux will not adhere to your detailed description of a woman's face nearly as well as it does for a man, and it doesn't adhere to text descriptions of faces well in general. This is not a technical limitation, this was a choice in the captioning of the model's dataset and maybe other more sophisticated decisions like loss. It exhibits similar flaws with its representation of male versus female celebrities; it also exhibits this flaw when you use language that describes male celebrities versus female celebrities appearances.
  fortran77 a year ago
  I found Flux will barely pay attention to a celebrity name. I like Flux but it makes all realistic human men and women look the same. I tried using celebrity names and it barely made a difference.
  throwaway314155 a year ago
  what they really mean is that it's not useful for generating lewd imagery of women. It was likely nerfed in this regard on purpose because BFL didn't want to be associated with that (however legal it may be).
  doctorpangloss a year ago
  I'm not sure why you're being downvoted because I think this is a misconception that's worth clearing up. There is no aspect of what I'm doing that is lewd or lewd adjacent. I just want control of a character's face for making art for an open source game. While I do not totally understand what specific decisions Flux made that would make their model weak in the regard of specifying the appearance of someone's face, one thing is clear: the humanities people are right, this is like a great example of how censorship and Big Prude has impacted artmaking.
  It is actually making it harder to use the technology to represent women characters, which is so ironic. That said, I could just lEaRn tO dRaW or pAy aN aRtIsT right? The discourse around this is so shitty.
  throwaway314155 a year ago
  I appreciate that you think downvotes shouldn't be used so liberally on this site (i.e. purely for disagreement). I _have_ checked this comment somewhat rigorously since I posted it and it's only ever hovered at around 1-2 points, but never below 0. So not sure what you're referring to.
  In any case, I believe that you're not generating lewd photos (I hope you don't blame me too much for being surprised though, it's _incredibly_ common). I still imagine the reason you're having trouble though is _because_ such descriptions are relatively adjacent enough for whatever censoring measures they used to nerf lewd outputs.
  Censorship is kinda bullshit. But, it's a very expensive toy that they have given away for free (the lesser models, anyway). You can use a LORA (or hell, train your own presumably million-dollar model) to remove these restrictions.
  It's more frustrating when you're paying for it and it's behind an API you have no control over. Which, I realize BFL is doing for their pro model - but this isn't anywhere near as egregious as, say, the level of censoring done behind the API for the DALL-E models. And, at the very least it is understandable from a _company_ perspective. They have both legal and public relations concerns to manage. To say that they should just ignore those facets of running a company feels a little entitled to me.
- giancarlostoro a year ago
  How locked down is it? My problem with a lot of these is I like to make really ridiculous meme type images, but I run into walls for dumb reasons. Like if I want to make something thats "copyrighted" like a mix of certain characters from one franchise or whatever, I cannot sometimes I get told that the model cannot generate copyrighted content, even though courts ruled that AI generated stuff cannot be copyrighted either way...
  I feel like AI should just be treated as fair use as long as its not 100% blatantly a literal clone of the original work.
  doctorpangloss a year ago
  > How locked down is it? ... I get told that the model cannot generate copyrighted... AI should just be treated as fair use
  Ideogram and Flux both have their own broad set of limitations that are non-technical and unpublished. IMO they are not really motivated by legal concerns, other than the lack of transparency itself.
  So maybe the issue is that transparency, and that the hazy legal climate means no transparency. You can't go anywhere and see the detailed list of dataset collection and captioning opinions for proprietary models. Open Model Initiative, trying to make a model, did publish their opinions, and they're not getting sued anytime soon. However, their opinions are an endless source of conflict.
  sdenton4 a year ago
  It's perfectly happy to make an imperial storm trooper riding a dragon, for what it's worth
  jjordan a year ago
  I've been using Venice.ai which offers afaik the most uncensored service currently available, outside of running your own instances. No problem with prompts that include copyrighted terms.
byteknight a year ago
I won't pay for a model, but that cake image looks dang good.
- mainframed a year ago
  Although culinarily incorrect :)
evrim189111 a year ago
I think Flux is better than SDXL and Dall e. I tried the models from here https://apps.apple.com/us/app/art-x-a-i-art-generator-aiart/...
fortran77 a year ago
I've been playing with Flux.Dev and such a big step forward from Stable Diffusion and all the other Generative AIs that could run on consumer GPUs.
I just tried this Flux1.1 pro page (prompt: "A sad Macintosh user who is upset because his computer can't play games") and was very impressed by the detail and "understanding" this model has.
undefined a year ago
[deleted]
jeffbee a year ago
I asked for a simple scene and it drew in the exact same AI girl that every text-to-image model wants to draw, same face, same hair, so generic that a Google reverse image search pulls up thousands of the exact same AI girl. No variety of output at all.
ks2048 a year ago
Is there a good site that compares text-to-image models - showing a bunch of examples of text w/ output on each model?
nirav72 a year ago
Are there any projects that allow for easy setup and hosting Flux locally? Similar to SD projects like InvokeAI or a1111
- vunderba a year ago
  The answer is it really depends on your hardware, but the nice thing is that you can split out the text encoder when using ComfyUI. On a 24gb VRAM card I can run the Q8_0 GGUF version of flux-dev with the T5 FP16 text encoder. The Q8_0 gguf version in particular has very little visual difference from the original fp16 models. A 1024x1024 image takes about 15 seconds to generate.
- undefined a year ago
  [deleted]
- nickthegreek a year ago
  Forge
  https://github.com/lllyasviel/stable-diffusion-webui-forge
  https://www.reddit.com/r/StableDiffusion/comments/1esxkk8/ho...
- doctorpangloss a year ago
  https://huggingface.co/docs/diffusers/main/en/api/pipelines/...
  It's about 6 lines of Python.
- sophrocyne a year ago
  Invoke is model agnostic, and supports Flux, including quantized versions.
- minimaxir a year ago
  Flux is more weird than old SD projects since Flux is extremely resource dependant and won't run on most hardware.
  waffletower a year ago
  Doesn't take a lot of effort to get Flux dev/schnell to run on 3090s unquantized, but I agree that 24gb is the consumer GPU memory limit and there are many with less than that. Flux runs great on modern Mac hardware as well, if you have at least 32gb of unified memory.
  stoobs a year ago
  I'm running Flux dev fine on a 3080 10GB, unquantised, on windows the nvidia drivers have a function to let it spill over into system ram. It runs a little slower, but it's not a deal-breaker unlike nvidia's pricing and power requirements at the moment
  zamadatix a year ago
  What are you using to run it? When I run Flux Dev in Windows using comfy on a 4090 (24 GB) sometimes it all crashes because it runs out of VRAM when I'm doing too much other stuff.
  waffletower a year ago
  Not a good reference for windows -- I use HuggingFace APIs on cog/docker deployments in Linux. I needed to use `PYTORCH_NO_CUDA_MEMORY_CACHING=1 -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` envvars to eliminate memory errors on the 3090s. When I run on the Mac there is enough memory not to require shenanigans. Runs approximately as fast as the 3090s but the 3090s heat my basement and the Mac heats my face.
  drcongo a year ago
  Really? I tried using it in ComfyUI on my Mac Studio, failed, went searching for answers and all I could find said that something something fp8 can't run on a Mac, so I moved on.
  zamadatix a year ago
  If you're looking for a prebuilt "no tinkering" solution https://diffusionbee.com/ is an open source app (Github link at the bottom of the page if you want to see the code) which has a built in button to import Flux models at the bottom of the home screen.
  liuliu a year ago
  I usually don't want to comment on these, but: DiffusionBee's repo https://github.com/divamgupta/diffusionbee-stable-diffusion-... don't have any updates for 9 months except regular binary releases. There is no source code available for their recent builds. I think it is a bit unfair to say it is open-source app at this point given you are using a binary probably far different from the repo.
  drcongo a year ago
  Thanks, I'll take a look.
  waffletower a year ago
  I should have qualified that I run Flux.1 dev and schnell on a Mac via HuggingFace and pytorch, and am not knowledgeable about ComfyUI support for these models. The code required is pretty tiny though.
  ziddoap a year ago
  People have Flux running on pretty much everything at this point, assuming you are comfortable waiting 3+ minutes for a 512x512 image.
  I managed to get it running on an old computer with a 2060 Super, taking ~1.5 minutes per image gen. People are generating on a 1080.
  Filligree a year ago
  The GGUF quantisations do run on most recent hardware, albeit at increasingly concerning quality tradeoffs.
  tripplyons a year ago
  I haven't noticed any quality degradation with the 8-bit GGUF for Flux Dev, but I'm sure the smaller quantizations perform worse.
- leumon a year ago
  Using comfyui with the official flux workflow is easy and works nicely. comfy can also be used via API.
- Mashimo a year ago
  I use InvokeAI to run flux.dev and flux.schnell.
- pdntspa a year ago
  DrawThings on Mac
kindkang2024 a year ago
I really enjoy its service. It's promising for UI design. My advocacy website pages' UI design was bootstrapped using it. It is quite good for developers without much design ability.
Ironically, I am afraid to type the website out and will keep it unknown here. My account could be suspended because of this. It had already reached -1 karma. It's better to keep my account alive.
nubinetwork a year ago
I tried using schnell, it won't fit in a 16gb GPU, and I couldn't get it to run on CPU.
- washadjeffmad a year ago
  Try an fp8: https://huggingface.co/Kijai/flux-fp8/
- TobTobXX a year ago
  I've sucessfully run schnell and dev on a 12G GPU. They do take 40s/60s repectively, but it works. I used ComfyUI and didn't have to tweak anything.
Mashimo a year ago
Oh neat. I wonder if they also improve .schnell and .dev soon. That would be nice :)
undefined a year ago
[deleted]
jchw a year ago
The generated images look impressive of course but I can't help but be mildly amused by the fact that the prompt for the second example image insists strongly that the image should say 1.1:
> ... photo with the text "FLUX 1.1 [Pro]", ..., must say "1.1", ...
...And of course, it does not.
- thisisnotauser a year ago
  [flagged]
ionwake a year ago
Sorry to be a noob, but how does this relate to fastflux.ai which seems to work great and creates an image in less than a second? Is this a new model on a slower host?
simeon989 a year ago
[flagged]
simeon989 a year ago
[flagged]
pieter2222 a year ago
[flagged]
pieter2222 a year ago
[flagged]
jamesc4 a year ago
[flagged]
jamesc4 a year ago
[flagged]
davidddef223 a year ago
[flagged]
pieter2222 a year ago
[flagged]
jamesc4 a year ago
[flagged]
bobdenver8008 a year ago
[flagged]
basitsoomro123 a year ago
[flagged]
melvinmelih a year ago
In case you want to try it out without hassling with the API, I've set up a free tool for it so you can try it out on WhatsApp: https://instatools.ai/products/fluxprovisions