Comments Page - Meissonic, High-Resolution Text-to-Image Synthesis on consumer graphics cards

mysteria 15 minutes ago
Interesting how pretty much all the example images look like renders/paintings as opposed to photographs. Maybe that's what it's trained on?
fngjdflmdflg 4 hours ago
>Meissonic, with just 1B parameters, offers comparable or superior 1024×1024 high-resolution, aesthetically pleasing images while being able to run on consumer-grade GPUs with only 8GB VRAM without the need for any additional model optimizations. Moreover, Meissonic effortlessly generates images with solid-color backgrounds, a feature that usually demands model fine-tuning or noise offset adjustments in diffusion models.
This looks really cool. Also nice to see another architecture being used for image generation besides diffusion. It seems like every NLP problem can be solved with transformers now: text generation/understanding, image generation/understanding, translation, OCR. Perhaps llama 4/5 will have image generation as well. Edit: llama 3.2 already has image editing, they probably just don't want to release an image generator for other reasons.
jensenbox an hour ago
The images in the PDF are amazing.