• bee_rider 3 hours ago

    The two images shown in the article using the new method are sort of… stylized, or slightly cartoonish, in a way that the images generated without their method are not. Their images also have a “perfectly framed, looking straight at the camera” quality, which looks a little artificial. The images not using their method have a more natural look (although, obviously, they have the issue with the duplicated subject).

    I wonder if it is an unavoidable result of their method, or if it is just a minor issue (of course it is hard to get infinite compute as an academic; maybe they just need to train more. Is that a thing? I don’t do AI).

    • BugsJustFindMe 3 hours ago

      Cartoonish output is a problem across the board. If you explicitly ask DALL-E for a "photograph" of something, you will very often get a result that looks like a cartoonified illustration. Prompt writers resort to specifying exact camera models and lenses to try to constrain the process.

      • undefined 2 hours ago
        [deleted]
        • adamanonymous an hour ago

          There are fine-tuned models out there that can generate near-photorealistic results. The base SD models, and those offered by the major AI services, have a more stylized look to them. That's probably partly so they work across a wider array of prompts that may include non-photorealistic subjects, and partly for safety.

      • whywhywhywhy 2 hours ago

        Nothing here seems that impressive, and none of the ratios shown deviate much from what anything post-SDXL can already do.

        I might have been impressed if some extreme letterbox or vertical-banner-style portrait had been shown, but everything shown here works fine in SDXL, and especially in Flux, and the cat image doesn't even feature a press conference or journalists.

        • mattstir an hour ago

          The original paper [0] this article is based on raises a few questions for me. It compares the authors' new technique against Stable Diffusion but fails to specify which version of SD they're using for that comparison. It doesn't mention how example outputs were chosen (were they cherry-picked?). For non-square images, they seem to have specifically chosen resolutions that the other models weren't trained to output (e.g., 384 x 512) without also including ones that they were trained on (e.g., 896 x 1152). I wonder how this new technique would compare with all of that accounted for.

          [0] https://openaccess.thecvf.com/content/CVPR2024/papers/Haji-A...

          • notum 3 hours ago

            Just using "cropped" as a negative prompt eliminates this issue entirely on my end and produces the same results as their owl example in SDXL.
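
            Mechanically, a negative prompt just takes the place of the empty/unconditional prompt in classifier-free guidance, so the sampler is steered away from whatever the negative embedding predicts (here, "cropped" compositions). A toy numpy sketch of that update step (illustrative only, not the actual diffusers API):

```python
import numpy as np

def cfg_step(eps_pos, eps_neg, guidance_scale=7.5):
    """Classifier-free guidance with a negative prompt: start from the
    negative-prompt noise prediction and push toward the positive one,
    i.e. away from whatever the negative prompt describes."""
    return eps_neg + guidance_scale * (eps_pos - eps_neg)

# Toy noise predictions; with guidance_scale=1.0 the combined prediction
# collapses to the positive-prompt prediction.
eps_pos = np.array([1.0, 2.0, -0.5])
eps_neg = np.array([0.5, 0.5, 0.0])
print(cfg_step(eps_pos, eps_neg, guidance_scale=1.0))
```

            With scale 0 you would get the pure negative-prompt prediction, and larger scales push proportionally harder away from it.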

            • undefined 2 hours ago
              [deleted]
              • mhog_hn 3 hours ago

                Any diffusion models out there that work well for generating stylized graphics? Think the stuff on your typical SaaS website.

                • GaggiX 2 hours ago

                  Ideogram 2

                  If you want something open source, SDXL or Flux with the right LoRA.

              • GaggiX 2 hours ago

                The aspect ratio problem was solved by NovelAI when they trained SD v1.4 on images with different aspect ratios using a technique they call "Aspect Ratio Bucketing", and after that it became commonly used in the final stage of training.

                https://blog.novelai.net/novelai-improvements-on-stable-diff...
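
                The bucketing idea itself is simple enough to sketch: pick a fixed set of training resolutions, assign each image to the bucket with the closest aspect ratio, and only batch images within a bucket so no batch mixes shapes. A toy illustration (the bucket list and function names are made up for this sketch, not NovelAI's code):

```python
# Toy sketch of aspect-ratio bucketing: group images by nearest-aspect
# bucket so every training batch has a single, consistent shape.

BUCKETS = [(512, 512), (384, 512), (512, 384), (256, 768), (768, 256)]

def nearest_bucket(w, h):
    """Return the (w, h) bucket whose aspect ratio is closest."""
    ar = w / h
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ar))

def bucketize(images):
    """images: list of (w, h) pairs. Returns {bucket: [image indices]}."""
    groups = {}
    for i, (w, h) in enumerate(images):
        groups.setdefault(nearest_bucket(w, h), []).append(i)
    return groups

print(bucketize([(1920, 1080), (600, 600), (1080, 1920)]))
```

                In the real pipeline each image is then resized/cropped to its bucket's resolution, which is why no padding or attention masking is needed.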

                • gwern 2 hours ago

                  It was also solved by the even simpler approach of aspect-ratio conditioning, where you just pass the original image dimensions and crop coordinates into the NN, as SDXL does: https://arxiv.org/pdf/2307.01952#page=3&org=stability
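
                  The conditioning itself is lightweight: per the SDXL report, integers like original size and crop offsets are Fourier-embedded the same way a timestep is, and the embeddings are folded into the model's conditioning vector. A rough numpy sketch of the embedding step (dimensions and function names are illustrative, not the actual implementation):

```python
import numpy as np

def fourier_embed(x, dim=8, max_period=10000.0):
    """Timestep-style sinusoidal embedding of a single scalar."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = x * freqs
    return np.concatenate([np.sin(args), np.cos(args)])

def size_crop_conditioning(orig_hw, crop_tl, target_hw, dim=8):
    """Embed (orig height/width, crop top/left, target height/width)
    and concatenate; the real model adds this to the timestep embedding."""
    vals = [*orig_hw, *crop_tl, *target_hw]  # 6 integers
    return np.concatenate([fourier_embed(v, dim) for v in vals])

cond = size_crop_conditioning((1024, 768), (0, 128), (1024, 1024))
print(cond.shape)  # (48,) = 6 values * dim 8
```

                  Because the network sees the true pre-crop size at training time, at inference you can just claim an uncropped image of the desired aspect ratio.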

                  • GaggiX an hour ago

                    How does this replace "Aspect Ratio Bucketing"? Are they padding the smaller images and masking the attention?

                • refulgentis 3 hours ago

                  The problem as described was solved eons ago. I'm honestly struggling to remember when this was an issue. Certainly pre-SD 1.5, maybe 2021?

                  I assume something got lost in translation to PR.

                  • NBJack 3 hours ago

                    1.5 still has this issue, particularly with specific subjects (e.g. the owl), any time you step significantly beyond the stock resolution (e.g. 1024x512). SDXL, while more stable, can also suffer from this.

                    The trouble is really the "window" the model operates in.

                    • refulgentis 3 hours ago

                      I use 1.5 hundreds of times a day outside this resolution, so it must be the subjects I'm using. Now that you mention it, SDXL was awful at this.

                    • isoprophlex 3 hours ago

                      That's academia for you...

                      • refulgentis 3 hours ago

                        I was being polite and shading toward the common interpretation HN has of academic PR; the article contains a quite lengthy technical description.

                    • bongodongobob 2 hours ago

                      What aspect ratio problem? I played around with my Midjourney account just now and it flawlessly works with extreme aspect ratios.
