Gemini 3.1 Pro (deepmind.google)
Submitted by PunchTornado 2 hours ago
  • simonw an hour ago

    Title is incorrect, should be "Gemini 3.1 Pro".

    Pretty great pelican: https://gist.github.com/simonw/03a755865021739a3659943a22c12...

    • embedding-shape an hour ago

      It's an excellent demonstration of the main issue I have with the Gemini family of models, they always go "above and beyond" to do a lot of stuff, even if I explicitly prompt against it. In this case, most of the SVG ends up consisting not just of a bike and a pelican, but clouds, a sun, a hat on the pelican and so much more.

      Exactly the same thing happens when you code, it's almost impossible to get Gemini to not do "helpful" drive-by-refactors, and it keeps adding code comments no matter what I say. Very frustrating experience overall.

      • mullingitover 33 minutes ago

        > it's almost impossible to get Gemini to not do "helpful" drive-by-refactors

        Just asking "Explain what this service does?" turns into

        [No response for three minutes...]

        +729 -522

        • cowmoo728 19 minutes ago

          it's also so aggressive about taking out debug log statements and in-progress code. I'll ask it to fill in a new function somewhere else and it will remove all of the half written code from the piece I'm currently working on.

          • chankstein38 11 minutes ago

            I ended up adding "NEVER REMOVE LOGGING OR DEBUGGING INFO, OPT TO ADD MORE OF IT" to my user instructions, and that has _somewhat_ fixed the problem but introduced a new one: no matter what I'm talking to it about, it tries to add logging, even if it's not a code problem. I've had it explain that I could set up an ESP32 with a sensor so that I could get logging from it, then write me firmware for it.

          • kylec 23 minutes ago

            "I don't know what did it, but here's what it does now"

          • enobrev an hour ago

            I have the same issue. Even when I ask it to do code-reviews and very explicitly tell it not to change files, it will occasionally just start "fixing" things.

            • tyfon 15 minutes ago

              I was using gemini antigravity in opencode a few weeks ago before they started banning everyone for that and I got into the habit of writing "do x, then wait for instructions".

              That helped quite a bit, but it would still go off on its own from time to time.

              • gavinray an hour ago

                Do you have Personalization Instructions set up for your LLM models?

                You can make their responses fairly dry/brief.

                • embedding-shape 40 minutes ago

                  I'm mostly using them via my own harnesses, so I have full control of the system prompts and so on. And no matter what I try, Gemini keeps "helpfully" adding code comments every now and then. With every other model, "- Don't add code comments" tends to be enough, but with Gemini I'm not sure how I could stop the comments from eventually appearing.

                  • WarmWash 32 minutes ago

                    I'm pretty sure it writes comments for itself, not for the user. I always let the models comment as much as they want, because I feel it makes the context more robust, especially when cycling contexts often to keep them fresh.

                    There is a tradeoff though, as comments do consume context. But I tend to dispose of instances pretty liberally and start with a fresh window.

                    • embedding-shape 27 minutes ago

                      > I'm pretty sure it writes comments for itself, not for the user

                      Yeah, that sounds worse than "trying to be helpful". Read the code instead; why add indirection that way, just to be able to understand what other models understand without comments?

                  • metal_am 32 minutes ago

                    I'd love to hear some examples!

                    • gavinray 30 minutes ago

                      I use LLM's outside of work primarily for research on academic topics, so mine is:

                        Be a proactive research partner: challenge flawed or unproven ideas with evidence; identify inefficiencies and suggest better alternatives with reasoning; question assumptions to deepen inquiry.
                  • zengineer 37 minutes ago

                    true, whenever I ask Gemini to help me with a prompt for generating an image of XYZ, it generates the image.

                  • SoKamil 5 minutes ago

                    It seems they trained the model to output good SVGs.

                    In their blog post[1], the first use case they mention is SVG generation. Thus, it might not be any indicator at all anymore.

                    [1] https://blog.google/innovation-and-ai/models-and-research/ge...

                    • AmazingTurtle 3 minutes ago

                      At this point, the pelican benchmark became so widely used that there must be high quality pelicans in the dataset, I presume. What about generating an okapi on a bicycle instead?

                      • MrCheeze 5 minutes ago

                        Does anyone understand why LLMs have gotten so good at this? Their ability to generate accurate SVG shapes seems to greatly outshine what I would expect, given their mediocre spatial understanding in other contexts.

                        • WarmWash 41 minutes ago

                          Less pretty and more practical, it's really good at outputting circuit designs as SVG schematics.

                          https://www.svgviewer.dev/s/cqGvPGML

                          • InitialLastName 19 minutes ago

                            I don't know what of this is the prompt and what was the output, but that's a pretty bad schematic (for both aesthetic and circuit-design reasons).

                            • svnt 3 minutes ago

                              Yes but you concede it is a schematic.

                            • 0_____0 22 minutes ago

                              that's pretty amazing for an LLM but as an EE, if my intern did this i would sigh inwardly and pull up some existing schematics for some brief guidance on symbol layout.

                            • sam_1421 an hour ago

                              Models are soon going to start benchmaxxing generating SVGs of pelicans on bikes

                              • cbsks 16 minutes ago

                                That’s Simon’s goal. “All I’ve ever wanted from life is a genuinely great SVG vector illustration of a pelican riding a bicycle. My dastardly multi-year plan is to trick multiple AI labs into investing vast resources to cheat at my benchmark until I get one.”

                                https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

                                • embedding-shape an hour ago

                                  Soon? I'd be willing to bet it's been included in the training set for at least 6 months by now. Not so obviously that it always generates perfect pelicans on bikes, but enough that the "minibench" is less useful today than in the past.

                                  • Imustaskforhelp 22 minutes ago

                                    Simon, can you now unleash the thing you once mentioned, that you can do permutations? So if there are doubts that the model was trained on the pelican on a bike, can you do something else as well, my friend :d

                                    I love pokemon and in pokemon silver/gold, I think my favourite pokemon is ampharos (mareep third form)

                                    Ampharos has legs and I feel like one can intuitively imagine him riding a bicycle

                                    I tried this and here's the response. https://gist.github.com/SerJaimeLannister/0d91539f868ccbc88a...

                                    I think it's good in some ways and not in others? I really don't know, but I can observe some issues myself; I haven't compared how other models do with this task. Also, I hope Pokemon might not sue me for this picture.

                                    It thought for 6 minutes though, and I think it's the most time I have seen a model think, at least in the web version of these models, which can be both good and bad imo :/

                                    Edit: Here's direct svg/img link as well https://gist.github.com/SerJaimeLannister/0d91539f868ccbc88a...

                                    To be honest I don't know what to make of it without more such tests with more pokemon/animals (I might do this tomorrow), so right now I am just gonna share the image as-is.

                                    I have a test tomorrow so I am gonna be busy today but wish me luck for the test :]

                                  • jsheard an hour ago

                                    Simon's been doing this exact test for nearly 18 months now, plenty of time for it to be integrated into benchmaxxing suites already.

                                    • stri8ted 33 minutes ago

                                      Exactly. As far as I'm concerned, the benchmark is useless. It's way too easy and rewarding to train on it.

                                      • Legend2440 6 minutes ago

                                        Y'all are way too skeptical; no matter what cool thing AI does, you'll make up an excuse for how they must somehow be cheating.

                                        • pixl97 5 minutes ago

                                          I mean, if you want to make your own benchmark, simply don't make it public and don't run it often. If your salamander on skis or whatever gets better with time, it likely has nothing to do with being benchmaxxed.

                                    • Arcuru an hour ago

                                      Did you stop using the more detailed prompt? I think you described it here: https://simonwillison.net/2025/Nov/18/gemini-3/

                                      • infthi 39 minutes ago

                                        Wonder when will we get something other than a side view

                                        • mikepurvis 26 minutes ago

                                          That would be especially challenging for vector output. I tried just now on ChatGPT 5.2 to jump straight to an image, with this prompt:

                                          "make me a cartoon image of a pelican riding a bicycle, but make it from a front 3/4 view, that is riding toward the viewer."

                                          The result was basically a head-on view, but I expect if you then put that back in and said, "take this image and vectorize it as an SVG" you'd have a much better time than trying to one-shot the SVG directly from a description.

                                          ... but of course, if that's so, then what's preventing the model from being smart enough to identify this workflow and follow it on its own to get the task completed?

                                        • bredren an hour ago

                                          What is that, a snack in the basket?

                                          • sigmar an hour ago

                                            "integrating a bicycle basket, complete with a fish for the pelican... also ensuring the basket is on top of the bike, and that the fish is correctly positioned with its head up... basket is orange, with a fish inside for fun."

                                            how thoughtful of the ai to include a snack. truly a "thanks for all the fish"

                                            • defen 10 minutes ago

                                              A pelican already has an integrated snack-holder, though. It wouldn't need to put it in the basket.

                                            • WarmWash an hour ago

                                              A fish for the road

                                            • steve_adams_86 an hour ago

                                              Ugh, the gears and chain don't mesh and there's no sprocket on the rear hub

                                              But seriously, I can't believe LLMs are able to one-shot a pelican on a bicycle this well. I wouldn't have guessed this was going to emerge as a capability from LLMs 6 years ago. I see why it does now, but... It still amazes me that they're so good at some things.

                                              • emp17344 27 minutes ago

                                                Is this capability “emergent”, or do AI firms specifically target SVG generation in order to improve it? How would we be able to tell?

                                                • 0_____0 20 minutes ago

                                                  next time you host a party, have people try to draw a bicycle on your whiteboard (you have a whiteboard in your house right? you should, anyway...)

                                                  human adults are generally quite bad at drawing them, unless they spend a lot of time actually thinking about bicycles as objects

                                                • HPsquared 31 minutes ago

                                                  And the left leg is straight while the right leg is bent.

                                                  EDIT: And the chain should pass behind the seat stay.

                                                • calny an hour ago

                                                  Great pelican but what’s up with that fish in the basket?

                                                  • coldtea an hour ago

                                                    It's a pelican. What do you expect a pelican to have in his bike's basket?

                                                    It's a pretty funny and coherent touch!

                                                    • embedding-shape 8 minutes ago

                                                      > What do you expect a pelican to have in his bike's basket?

                                                      Probably stuff it cannot fit in the gullet, or doesn't want there. I wouldn't expect a pelican to store fish there, that's for sure.

                                                    • gavinray an hour ago

                                                      Where else are cycling pelicans meant to keep their fish?

                                                    • mohsen1 an hour ago

                                                      is there something in your prompt about hats? why is the pelican always wearing a hat recently?!

                                                      • bigfishrunning an hour ago

                                                        At this point, i think maybe they're training on all of the previous pelicans, and one of them decided to put a hat on it?

                                                        Disclaimer: This is an unsubstantiated claim that i made up

                                                      • xnx an hour ago

                                                        Not even animated? This is 2026.

                                                        • readitalready an hour ago

                                                          Jeff Dean just posted an animated version: https://x.com/JeffDean/status/2024525132266688757

                                                          • benbreen 15 minutes ago

                                                            One underrated thing about the recent frontier models, IMO, is that they are obviating the need for image gen as a standalone thing. Opus 4.6 (and apparently 3.1 Pro as well) doesn't have the ability to generate images but it is so good at making SVG that it basically doesn't matter at this point. And the benefit of SVG is that it can be animated and interactive.

                                                            I find this fascinating because it literally just happened in the past few months. Up until ~summer of 2025, the SVG these models made was consistently buggy and crude. By December of 2026, I was able to get results like this from Opus 4.5 (Henry James: the RPG, made almost entirely with SVG): https://the-ambassadors.vercel.app

                                                            And now it looks like Gemini 3.1 Pro has vaulted past it.

                                                            • embedding-shape 7 minutes ago

                                                              > doesn't have the ability to generate images but it is so good at making SVG that it basically doesn't matter at this point

                                                              Yeah, since the invention of vector images, suddenly no one cares about raster images anymore.

                                                              Obviously not true, but that's how your comment reads right now. An "image" as raster output is very different from an "image" as SVG markup, and one doesn't automagically replace the other.

                                                              • cachius 3 minutes ago

                                                                2025 that is

                                                              • bigfishrunning 44 minutes ago

                                                                That Ostrich Tho

                                                            • saberience an hour ago

                                                              I hope we keep beating this dead horse some more, I'm still not tired of it.

                                                            • takoid an hour ago

                                                              Shared this in the other Gemini Pro 3.1 thread (https://news.ycombinator.com/item?id=47074735) but wanted to share it here as well.

                                                              I just tested the "generate an SVG of a pelican riding a bicycle" prompt and this is what I got: https://codepen.io/takoid/pen/wBWLOKj

                                                              The model thought for over 5 minutes to produce this. It's not quite photorealistic (some parts are definitely "off"), but this is definitely a significant leap in complexity.

                                                              • onionisafruit 34 minutes ago

                                                                Good to see it wearing a helmet. Their safety team must be on their game.

                                                              • minimaxir an hour ago

                                                                Price is unchanged from Gemini 3 Pro: $2/M input, $12/M output. https://ai.google.dev/gemini-api/docs/pricing

                                                                Knowledge cutoff is unchanged at Jan 2025. Gemini 3.1 Pro supports "medium" thinking where Gemini 3 did not: https://ai.google.dev/gemini-api/docs/gemini-3

                                                                Compare to Opus 4.6's $5/M input, $25/M output. If Gemini 3.1 Pro does indeed have similar performance, the price difference is notable.
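                                                                At those list prices, a quick back-of-envelope sketch (the token counts below are made up for illustration; only the $/M rates come from the pricing pages):

                                                                ```python
                                                                def request_cost(input_tokens, output_tokens, in_rate, out_rate):
                                                                    """Cost in dollars for one request; rates are $ per million tokens."""
                                                                    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

                                                                # A hypothetical agentic turn: 50k tokens in, 5k tokens out.
                                                                gemini = request_cost(50_000, 5_000, 2, 12)   # Gemini 3.1 Pro: $2/M in, $12/M out
                                                                opus   = request_cost(50_000, 5_000, 5, 25)   # Opus 4.6: $5/M in, $25/M out

                                                                print(f"Gemini 3.1 Pro: ${gemini:.3f}")  # $0.160
                                                                print(f"Opus 4.6:       ${opus:.3f}")    # $0.375
                                                                ```

                                                                Under those assumptions Gemini 3.1 Pro comes out at well under half the cost per request.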

                                                                • rancar2 3 minutes ago

                                                                  If we don't see a huge gain on the long-term horizon thinking reflected with the Vendor-Bench 2, I'm not going to switch away from CC. Until Google can beat Anthropic on that front, Claude Code paired with the top long-horizon models will continue to pull away with full stack optimizations at every layer.

                                                                  • plaidfuji 18 minutes ago

                                                                    Sounds like the update is mostly system prompt + changes to orchestration / tool use around the core model, if the knowledge cutoff is unchanged

                                                                    • sigmar 14 minutes ago

                                                                      knowledge cutoff staying the same likely means they didn't do a new pre-train. We already knew there were plans at DeepMind to integrate new RL changes in the post-training of the weights. https://x.com/ankesh_anand/status/2002017859443233017

                                                                  • nickandbro an hour ago

                                                                    Does well on SVGs outside of the "pelican riding a bicycle" test. Like this prompt:

                                                                    "create a svg of a unicorn playing xbox"

                                                                    https://www.svgviewer.dev/s/NeKACuHj

                                                                    Still some tweaks needed to the final result, but I am guessing that with the ARC-AGI benchmark jumping so much, the model's visual abilities are what's allowing it to do this well.

                                                                    • simonw an hour ago

                                                                      Interesting how it went a bit more 3D with the style of that one compared to the pelican I got.

                                                                      • andy12_ an hour ago

                                                                        I'm thinking now that as models get better and better at generating SVGs, there could be a point where we can use them to just make arbitrary UIs and interactive media with raw SVGs in realtime (like flash games).

                                                                        • nickandbro an hour ago

                                                                          Or quite literally a game where SVG assets are generated on the fly using this model

                                                                        • roryirvine 10 minutes ago

                                                                          On the other hand, creation of other vector image formats (eg. "create a postscript file showing a walrus brushing its teeth") hasn't improved nearly so much.

                                                                          Perhaps they're deliberately optimising for SVG generation.

                                                                        • tenpoundhammer 27 minutes ago

                                                                          In an attempt to get outside of benchmark gaming, I had it make a platypus on a tricycle. It's not as good as the pelican on a bicycle. https://www.svgviewer.dev/s/BiRht5hX

                                                                          • textlapse a few seconds ago

                                                                            To really confuse it, ask it to take that tricycle with the platypus on it to a car wash.

                                                                            • dinosor 24 minutes ago

                                                                              For a moment I assumed the output would look like Perry the Platypus from the Disney (I think?) show. It's surprising to me (as a layman) that a show with lots of media that would've made it into the training corpus didn't show up.

                                                                              • 0_____0 20 minutes ago

                                                                                that's better than i thought it would be

                                                                              • davidguetta 32 minutes ago

                                                                                Implementation and Sustainability Hardware: Gemini 3 Pro was trained using Google’s Tensor Processing Units (TPUs). TPUs are specifically designed to handle the massive computations involved in training LLMs and can speed up training considerably compared to CPUs. TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training, which can lead to better model quality. TPU Pods (large clusters of TPUs) also provide a scalable solution for handling the growing complexity of large foundation models. Training can be distributed across multiple TPU devices for faster and more efficient processing.

                                                                                So Google doesn't use NVIDIA GPUs at all?

                                                                                • lejalv 4 minutes ago

                                                                                  Bla bla bla yada sustainability yada often come with large better growing faster...

                                                                                  It's such an uninformative piece of marketing crap

                                                                                  • dekhn 8 minutes ago

                                                                                    When I worked there, there was a mix of training on nvidia GPUs (especially for sparse problems when TPUs weren't as capable), CPUs, and TPUs. I've been gone for a few years but I've heard a few anecdotal statements that some of their researchers have to use nvidia GPUs because the TPUs are busy.

                                                                                    • PunchTornado 27 minutes ago

                                                                                      no. only tpus

                                                                                      • paride5745 19 minutes ago

                                                                                        Another reason to use Gemini then.

                                                                                        Less impact on gamers…

                                                                                      • mijoharas 17 minutes ago

                                                                                        Gemini 3 is still in preview (limited rate limits) and 2.5 is deprecated (still live but won't be for long).[0]

                                                                                        Are Google planning to put any of their models into production any time soon?

                                                                                        Also somewhat funny that some models are deprecated without a suggested alternative(gemini-2.5-flash-lite). Do they suggest people switch to Claude?

                                                                                        [0] https://ai.google.dev/gemini-api/docs/deprecations

                                                                                        • andrewmutz 14 minutes ago

                                                                                          I agree completely. I don't know how anyone can be building on these models when all of them are either deprecated or not actually released yet. As someone who has production systems running on the deprecated models, this situation really causes me grief.

                                                                                        • the_duke an hour ago

                                                                                          Gemini 3 is pretty good, even Flash is very smart for certain things, and fast!

                                                                                          BUT it is not good at all at tool calling and agentic workflows, especially compared to the recent two mini-generations of models (Codex 5.2/5.3, the last two versions of Anthropic models), and also fell behind a bit in reasoning.

                                                                                          I hope they manage to improve things on that front, because then Flash would be great for many tasks.

                                                                                          • chermi an hour ago

                                                                                            You can really notice the tool use problems. They gotta get on that. The agent trend seems real, and powerful. They can't afford to fall behind on it.

                                                                                            • verdverm 37 minutes ago

                                              I don't really have tool-usage issues beyond ones I'd file under "doesn't follow system prompt instructions consistently"

                                              there are these times where it puts a prefix on all function calls, which is weird and I think a hallucination, so maybe that one

                                                                                              3.1 hopefully fixes that

                                                                                            • anthonypasq an hour ago

                                              yeah, it seems to me like Gemini is a little behind on the current RL patterns, and they don't seem interested in creating a dedicated coding model. I think they have so much product surface (search, AI mode, Gmail, YouTube, Chrome, etc) that they are prioritizing making the model very general. but who knows, I'm just talking out of my ass.

                                                                                              • verdverm 39 minutes ago

                                                                                                These improvements are one of the things specifically called out on the submitted page

                                                                                                • spwa4 an hour ago

                                                                                                  In other words: they just need to motivate their employees while giving in to finance's demands to fire a few thousand every month or so ...

                                                                                                  And don't forget, it's not just direct motivation. You can make yourself indispensable by sabotaging or at least not contributing to your colleagues' efforts. Not helping anyone, by the way, is exactly what your managers want you to do. They will decide what happens, thank you very much, and doing anything outside of your org ... well there's a name for that, isn't there? Betrayal, or perhaps death penalty.

                                                                                                • Robdel12 an hour ago

                                                                                                  I really want to use google’s models but they have the classic Google product problem that we all like to complain about.

                                                  I am legit scared to log in and use Gemini CLI, because the last time I thought I was using my “free” account allowance via Google Workspace. Ended up spending $10 before realizing it was API billing, and the UI was so hard to figure out that I gave up. I’m sure I can spend 20-40 more mins to sort this out, but ugh, I don’t want to.

                                                                                                  With alllll that said.. is Gemini 3.1 more agentic now? That’s usually where it failed. Very smart and capable models, but hard to apply them? Just me?

                                                                                                  • phamilton 25 minutes ago

                                                                                                    > For those building with a mix of bash and custom tools, Gemini 3.1 Pro Preview comes with a separate endpoint available via the API called gemini-3.1-pro-preview-customtools. This endpoint is better at prioritizing your custom tools (for example view_file or search_code).

                                                                                                    It sounds like there was at least a deliberate attempt to improve it.

                                                                                                    • alpineman an hour ago

                                                                                                      100% agreed. I wish someone would make a test for how reliably the LLMs follow tool use instructions etc. The pelicans are nice but not useful for me to judge how well a model will slot into a production stack.

                                                                                                      • embedding-shape 28 minutes ago

                                                                                                        At first, when I got started using LLMs, I read and analyzed benchmarks, looked at what example prompts people used, and so on. But many times a new model does best on the benchmarks and you think it'll be better, and then in real work it completely drops the ball. Since then I've stopped even reading benchmarks; I don't care an iota about them. They always seem more misleading than helpful.

                                                                                                        Today I have my own private benchmarks, with tests I run myself and private test cases I refuse to share publicly. These have been built up over the last year to year and a half: whenever I find something that my current model struggles with, it becomes a new test case in the benchmark.

                                                                                                        Nowadays it's as easy as `just bench $provider $model` and it runs my benchmarks against it, and I get a score that actually reflects what I use the models for, and it feels like it more or less matches with actually using the models. I recommend people who use LLMs for serious work to try the same approach, and stop relying on public benchmarks that (seemingly) are all gamed by now.
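                                                                                                        A minimal sketch of what such a harness can look like (the runner, the cases, and the stub model are all made up for illustration; the real value is in keeping the test cases private):

```python
# Minimal private-benchmark harness sketch. `call_model` is whatever
# client you already use (HTTP, CLI wrapper, SDK); here it is stubbed.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the model's answer passes

def run_benchmark(call_model: Callable[[str], str],
                  cases: list[TestCase]) -> float:
    """Run every case against one model and return a 0..1 pass rate."""
    passed = sum(1 for c in cases if c.check(call_model(c.prompt)))
    return passed / len(cases)

# Example cases; real ones would encode tasks your current model fumbles.
CASES = [
    TestCase("arith", "What is 17 * 23? Reply with the number only.",
             lambda a: "391" in a),
    TestCase("regex", "Write a regex for a 4-digit year, nothing else.",
             lambda a: "[0-9]{4}" in a or "\\d{4}" in a),
]

def stub_model(prompt: str) -> str:
    # Stands in for a real provider/model call.
    return "391" if "17 * 23" in prompt else "no idea"

score = run_benchmark(stub_model, CASES)  # 0.5: passes arith, fails regex
```

                                                                                                        Scoring per model/provider then reduces to calling `run_benchmark` with the right client, which is all a `just bench $provider $model` recipe needs to wrap.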

                                                                                                        • cdelsolar 24 minutes ago

                                                                                                          share

                                                                                                          • embedding-shape 9 minutes ago

                                                                                                            The harness? Trivial to build yourself, ask your LLM for help, it's ~1000 LOC you could hack together in 10-15 minutes.

                                                                                                            As for the test cases themselves, that would obviously defeat the purpose, so no :)

                                                                                                      • pdntspa an hour ago

                                                                                                        You can delete the billing from a given API key

                                                                                                        • Stevvo 33 minutes ago

                                                                                                          You could always use it through Copilot. The credits based billing is pretty simple without surprise charges.

                                                                                                          • surgical_fire 38 minutes ago

                                                                                                            May be very silly of me, but I avoid using Gemini on my personal Google account. I use it at work, because my employer provides it.

                                                                                                            I am scared some automated system may just decide I am doing something bad and terminate my account. I have been moving important things to Proton, but there is some stuff I couldn't change that would cause me a lot of annoyance. It's not trivial to set up an alternative account just for Gemini, because my Google account is basically on every device I use.

                                                                                                            I mostly use LLMs as coding assistant, learning assistant, and general queries (e.g.: It helped me set up a server for self hosting), so nothing weird.

                                                                                                            • horsawlarway 44 minutes ago

                                                                                                              So much this.

                                                                                                              It's absolutely amazing how hostile Google is to releasing billing options that are reasonable, controllable, or even fucking understandable.

                                                                                                              I want to do relatively simple things like:

                                                                                                              1. Buy shit from you

                                                                                                              2. For a controllable amount (ex - let me pick a limit on costs)

                                                                                                              3. Without spending literally HOURS trying to understand 17 different fucking products, all overlapping, with myriad project configs, api keys that should work, then don't actually work, even though the billing links to the same damn api key page, and says it should work.

                                                                                                              And frankly - you can't do any of it. No controls (at best delayed alerts). No clear access. No real product differentiation pages. No guides or onboarding pages to simplify the matter. No support. SHIT LOADS of completely incorrect and outdated docs, that link to dead pages, or say incorrect things.

                                                                                                              So I won't buy shit from them. Period.

                                                                                                              • sciencejerk 23 minutes ago

                                                                                                                You think AWS is better?

                                                                                                                • 3form 17 minutes ago

                                                                                                                  Exactly the reason I've never used any of these platforms for my personal projects.

                                                                                                              • himata4113 an hour ago

                                                                                                                use openrouter instead

                                                                                                              • spankalee 19 minutes ago

                                                                                                                I hope this works better than 3.0 Pro

                                                                                                                I'm a former Googler and know some people near the team, so I mildly root for them to at least do well, but Gemini is consistently the most frustrating model I've used for development.

                                                                                                                It's stunningly good at reasoning, design, and generating the raw code, but it just falls over a lot when actually trying to get things done, especially compared to Claude Opus.

                                                                                                                Within VS Code Copilot, Claude will have a good mix of thinking streams and responses to the user. Gemini will almost completely use thinking tokens, and then just do something but not tell you what it did. If you don't look at the thinking tokens you can't tell what happened, but the thinking token stream is crap. It's all "I'm now completely immersed in the problem...". Gemini also frequently gets twisted around, stuck in loops, and unable to make forward progress. It's bad at using tools and tries to edit files in weird ways instead of using the provided text editing tools. In Copilot, it won't stop and ask clarifying questions, though in Gemini CLI it will.

                                                                                                                So I've tried to adopt a plan-in-Gemini, execute-in-Claude approach, but while I'm doing that I might as well just stay in Claude. The experience is just so much better.

                                                                                                                For as much as I hear Google's pulling ahead, Anthropic seems to be to me, from a practical POV. I hope Googlers on Gemini are actually trying these things out in real projects, not just one-shotting a game and calling it a win.

                                                                                                                • knollimar 9 minutes ago

                                                                                                                  Is the thinking token stream obfuscated?

                                                                                                                  Im fully immersed

                                                                                                                • timabdulla 16 minutes ago

                                                                                                                  Google tends to trumpet preview models that aren't actually production-grade. For instance, both 3 Pro and Flash suffer from looping and tool-calling issues.

                                                                                                                  I would love for them to eliminate these issues, because just touting benchmark scores isn't enough.

                                                                                                                  • mixel an hour ago

                                                                                                                    Google really seems to be pulling ahead in this AI race. For me personally they offer the best deal, although the software is not quite there compared to OpenAI or Anthropic (in regards to the web GUI and the agent CLI). I hope they can fix that in the future, and I think once Gemini 4 or whatever launches we will see a huge leap again.

                                                                                                                    • rubslopes 22 minutes ago

                                                                                                                      I don't understand this sentiment. It may hold true for other LLM use cases (image generation, creative writing, summarizing large texts), but when it comes to coding specifically, Google is *always* behind OpenAI and Anthropic, despite having virtually infinite processing power and money, and being the ones who started this race in the first place.

                                                                                                                      Until now, I've only ever used Gemini for coding tests. As long as I have access to GPT models or Sonnet/Opus, I never want to use Gemini. Hell, I even prefer Kimi 2.5 over it. I tried it again last week (Gemini Pro 3.0) and, right at the start of the conversation, it made the same mistake it's been making for years: it said "let me just run this command," and then did nothing.

                                                                                                                      My sentiment is actually the opposite of yours: how is Google *not* winning this race?

                                                                                                                      • hobofan a minute ago

                                                                                                                        > despite having virtually infinite processing power, money

                                                                                                                        Just because they have the money doesn't mean they spend it excessively. OpenAI and Anthropic are both offering coding plans that are possibly severely subsidized, as they are more concerned with growth at all costs, while Google is more concerned with profitability. Google has the bigger warchest and could just wait until the other two run out of money, rather than forcing growth on that product line through unprofitable means.

                                                                                                                      • eknkc 37 minutes ago

                                                                                                                        I hope they fail.

                                                                                                                        I honestly do not wish Google to have the best model out there and be forced to use their incomprehensible subscription / billing / project management whatever shit ever again.

                                                                                                                        I don’t know what their stuff cost. I don’t know why would I use vertex or ai studio. What is included in my subscription what is billed per use.

                                                                                                                        I pray that whatever they build fails and burns.

                                                                                                                        • otherme123 12 minutes ago

                                                                                                                          They all suck. OpenAI ignores scanning limits and disabled routes in robots.txt; after a 429 "Too Many Requests" they retry the same URL half a dozen times from different IPs in the next couple of minutes, and they once DoS'ed my small VPS trying to do a full scan of sitemaps.xml in less than one hour, trying and retrying whenever any endpoint failed.

                                                                                                                          Google and the others at least respect both robots.txt and 429s. They invested years scanning all of the internet, so they can now train on what they have stored on their servers. OpenAI seems to assume that MY resources are theirs.
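                                                                                                                          (For reference, OpenAI's main crawler identifies itself as GPTBot, so in theory a robots.txt like the one below opts a site out entirely; the complaint above is precisely that such directives aren't reliably honored.)

```
User-agent: GPTBot
Disallow: /
```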

                                                                                                                          • dybber 16 minutes ago

                                                                                                                            Eventually the models will all be so good that the competition moves from the best model to the best user experience, and there I think we can expect others to win, e.g. Microsoft with GitHub and VS Code.

                                                                                                                            • eknkc 13 minutes ago

                                                                                                                              That's my hope, but Google has unlimited cash to throw at model development and can basically burn more cash than OpenAI and Anthropic combined. Might tip the scale in the long run.

                                                                                                                        • josalhor an hour ago

                                                                                                                          I speculated that 3 pro was 3.1... I guess I was wrong. Super impressive numbers here. Good job Google.

                                                                                                                          • refulgentis an hour ago

                                                                                                                            > I speculated that 3 pro was 3.1

                                                                                                                            ?

                                                                                                                            • josalhor 2 minutes ago

                                                                                                                              Sorry... I speculated that 3 deep think is 3.1 pro.. model names are confusing..

                                                                                                                          • Murfalo 15 minutes ago

                                                                                                                            I like to think that all these pelican riding a bicycle comments are unwittingly iteratively creating the optimal cyclist pelican as these comment threads are inevitably incorporated in every training set.

                                                                                                                            • alpineman 7 minutes ago

                                                                                                                              More like half of Google's AI team is hanging out on HN, and they can optimise for that outcome to get a good rep among the dev community.

                                                                                                                            • janalsncm 40 minutes ago

                                                                                                                              This model says it accepts video inputs. I asked it to transcribe a 5-second video of a digital water curtain which spelled “Boo Happy Halloween”, and it came back with “Happy”, which wasn't even the first frame, and is also incomplete.

                                                                                                                              This kind of test is good because it requires stitching together info from the whole video.

                                                                                                                              • aabhay 34 minutes ago

                                                                                                                                It reads videos at 1fps by default. You have to set the video resolution to high in ai studio

                                                                                                                              • ArmandoAP an hour ago
                                                                                                                                • azuanrb 29 minutes ago

                                                                                                                                  The CLI needs work, or they should officially allow third-party harnesses. Right now, the CLI experience is noticeably behind other SOTA models. It actually works much better when paired with Opencode.

                                                                                                                                  But with accounts reportedly being banned over ToS issues, similar to Claude Code, it feels risky to rely on it in a serious workflow.

                                                                                                                                  • markerbrod an hour ago
                                                                                                                                    • dxbednarczyk an hour ago

                                                                                                                                      Every time I've used Gemini models for anything besides code or agentic work, they lean so far into the RLHF-induced bold lettering and bullet-point list barf that everything they output reads as if the model was talking _at_ me and not _with_ me. In my Openclaw experiment(s) and in the Gemini web UI, I've specifically added instructions to avoid this type of behavior, but it only seemed to obey those rules when I reminded the model of them.

                                                                                                                                      For conversational contexts, I don't think the (in some cases significantly) better benchmark results compared to a model like Sonnet 4.6 can convince me to switch to Gemini 3.1. Has anyone else had a similar experience, or is this just a me issue?

                                                                                                                                      • augusto-moura an hour ago

                                                                                                                                        Gemini sounds less personal, but I think that is good. From my experience, the quality of response is much higher than ChatGPT or Grok, and it cites real sources. I want to have a mini-wikipedia response for my questions, not a friend's group chat response

                                                                                                                                        • staticman2 38 minutes ago

                                                                                                                                          I'm not familiar with Openclaw, but the trick to solve this would be to embed a style reminder at the bottom of each user message, and ideally hide it from the user in the UI.

                                                                                                                                          This is how roleplay apps like SillyTavern customize the experience for power users: they allow hidden style reminders to accompany each chat message as part of the user message.
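                                                                                                                                          A rough sketch of the idea, assuming a generic role/content message list (the reminder text and function names here are made up):

```python
# Append a hidden style reminder to the latest user message before sending.
# The UI renders `history`, which never contains the reminder.
STYLE_REMINDER = (
    "\n\n[Style: answer conversationally in plain prose; "
    "no headings, no bullet lists, no bold text.]"
)

def with_style_reminder(messages: list[dict]) -> list[dict]:
    """Return a copy of the history with the reminder appended to the
    last user message; the original list is left untouched."""
    out = [dict(m) for m in messages]
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] += STYLE_REMINDER
            break
    return out

history = [{"role": "user", "content": "Explain red-black trees."}]
sent = with_style_reminder(history)  # what actually goes to the model
```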

                                                                                                                                          • gavinray 41 minutes ago

                                                                                                                                            I have the opposite viewpoint:

                                                                                                                                            If a model doesn't optimize the formatting of its output display for readability, I don't want to read it.

                                                                                                                                            Tables, embedded images, use of bulleted lists and bold/italicizing etc.

                                                                                                                                            • InkCanon an hour ago

                                                                                                                                              I think they all output that bold lettering, point by point style output. I strongly suspect it's part of a synthetic data pipeline all these AI companies have, and it improves performance. Claude seems to be the least of them, but it will start writing code at the drop of a hat. What annoys me in Gemini is that it has a really strange tendency to come up with weird analogies, especially in Pro mode. You'll be asking it about something like red black trees and it'll say "Red Black Trees (The F1 of Tree Data Structures)".

                                                                                                                                              • markab21 an hour ago

                                                                                                                                                You just articulated why I struggle to personally connect with Gemini. It feels so unrelatable and exhausting to read its output. I prefer to read Opus/Deepseek/GLM over Gemini, Qwen and the open-source GPT models. Maybe it is RLHF that is creating my distaste for using it. (I pay for Gemini; I should be using it more... but the outputs just bug me, and it feels like more work to get actionable insight.)

                                                                                                                                                • verdverm 33 minutes ago

                                                                                                                                                  I have no issues adjusting gemini tone & style with system prompt content

                                                                                                                                                • onlyrealcuzzo an hour ago

                                                                                                                                                  We've gone from yearly releases to quarterly releases.

                                                                                                                                                  If the pace of releases continues to accelerate, by mid-2027 or 2028 we're headed for weekly releases.

                                                                                                                                                  • rubicon33 an hour ago

                                                                                                                                                    But actual progress seems to be slower. These models are releasing more often but aren't big leaps.

                                                                                                                                                    • gallerdude 25 minutes ago

                                                                                                                                                      We used to get one annual release which was 2x as good, now we get quarterly releases which are 25% better. So annually, we’re now at 2.4x better.
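                                                                                                                                                      The compounding, spelled out (25% per quarter over four quarters):

```python
# Four quarterly gains of 25% compound multiplicatively.
quarterly = 1.25
annual = quarterly ** 4  # 1.25^4 = 2.44140625, a bit better than the old 2x
```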

                                                                                                                                                      • minimaxir 16 minutes ago

                                                                                                                                                        Due to the increasing difficulty of scaling up training, the gains instead appear to be coming from better training techniques, which seems to be working well for everyone.

                                                                                                                                                        • wahnfrieden 32 minutes ago

                                                                                                                                                          GPT 5.3 (/Codex) was a huge leap over 5.2 for coding

                                                                                                                                                      • zokier an hour ago

                                                                                                                                                        > Last week, we released a major update to Gemini 3 Deep Think to solve modern challenges across science, research and engineering. Today, we’re releasing the upgraded core intelligence that makes those breakthroughs possible: Gemini 3.1 Pro.

                                                                                                                                                        So this is same but not same as Gemini 3 Deep Think? Keeping track of these different releases is getting pretty ridiculous.

                                                                                                                                                        • WarmWash 37 minutes ago

                                                                                                                                                          Deep Think is a few 3.1 models working together. It was suspected last week that Deep Think was composed using the new 3.1 model.

                                                                                                                                                          • verdverm 35 minutes ago

                                                                                                                                                            3.1 == model

                                                                                                                                                            deep think == turning up thinking knob (I think)

                                                                                                                                                            deep research == agent w/ search

                                                                                                                                                          • 1024core 21 minutes ago

                                                                                                                                                            It's been hugged to death. I keep getting "Something went wrong".

                                                                                                                                                            • jcims an hour ago

                                                                                                                                                              Pelican on a bicycle in drawio - https://imgur.com/a/tNgITTR

                                                                                                                                                              (FWIW I'm finding a lot of utility in LLMs doing diagrams in tools like drawio)

                                                                                                                                                              • pqdbr 12 minutes ago

                                                                                                                                                                How are you prompting it to draw diagrams in drawio?

                                                                                                                                                              • impulser_ an hour ago

                                                                                                                                                                Seems like they actually fixed some of the problems with the model. The hallucination rate seems to be much better. Seems like they also tuned the reasoning; maybe that's where they got most of the improvements from.

                                                                                                                                                                • whynotminot 35 minutes ago

                                                                                                                                                                  The hallucination rate with the Gemini family has always been my problem with them. Over the last year they’ve made a lot of progress catching the Gemini models up to/near the frontier in general capability and intelligence, but they still felt very late 2024 in terms of hallucination rate.

                                                                                                                                                                  Which made the Gemini models untrustworthy for anything remotely serious, at least in my eyes. If they’ve fixed this or at least significantly improved, that would be a big deal.

                                                                                                                                                                • hsaliak an hour ago

                                                                                                                                                                  The eventual nerfing gives me pause. Flash is awesome. What we really want is gemini-3.1-flash :)

                                                                                                                                                                  • seizethecheese 42 minutes ago

                                                                                                                                                                    I use Gemini flash lite in a side project, and it’s stuck on 2.5. It’s now well behind schedule. Any speculation as to what’s going on?

                                                                                                                                                                    • foruhar 31 minutes ago

                                                                                                                                                                      Gemini-3.0-flash-preview came out right away with the 3.0 release and I was expecting 3.0-flash-lite before a bump on the pro model. I wonder if they have abandoned that part of the Pareto/price-performance.

                                                                                                                                                                    • quacky_batak an hour ago

                                                                                                                                                                      I’m keen to know how and where you are using Gemini.

                                                                                                                                                                      Anthropic is clearly targeted at developers, and OpenAI is the general go-to AI model. Who is the target demographic for the Gemini models? I know that they are good and Flash is super impressive, but I’m curious.

                                                                                                                                                                      • jdc0589 an hour ago

                                                                                                                                                                        I use it as my main platform right now, both for work/SWE stuff and personal stuff. It works pretty well, and they have the full suite of tools I want, from general LLM chat to NotebookLM to Antigravity.

                                                                                                                                                                        My main use-cases outside of SWE generally involve the ability to compare detailed product specs and come up with answers/comparisons/etc... Gemini does really well for that, probably because of the deeper google search index integration.

                                                                                                                                                                        Also, I got a year of Pro for free with my phone... so that's a big part.

                                                                                                                                                                        • fatherwavelet 19 minutes ago

                                                                                                                                                                          I feel like Gemini 3 was incredible on non-software/coding research. I have learned so much systems biology the last two months it blows my mind.

                                                                                                                                                                          I had only started using Opus 4.6 this week. It seems like Sonnet is much better for having a long conversation with. Gemini is good for knowledge retrieval, but I think Opus 4.6 has caught up. The biggest thing that made Gemini worth it for me the last 3 months is that I crushed it with questions. I wouldn't have gotten even 10% of that usage out of Opus before being made to slow down.

                                                                                                                                                                          I have a deep research going right now on 3.1 for the first time and I honestly have no idea how I am going to tell if it is better than 3.

                                                                                                                                                                          It seems like Gemini wasn't as good at agentic coding, but for just asking it to write a function, I think there were only two times it didn't one-shot what I asked, and it fixed the problem on the next prompt.

                                                                                                                                                                          I haven't logged in to bother with chatGPT in about 3 months now.

                                                                                                                                                                          • hunta2097 an hour ago

                                                                                                                                                                            I use the Gemini web interface just as I would ChatGPT. They also have coding-environment analogues of Claude Code in Antigravity and Gemini CLI.

                                                                                                                                                                            When you sign up for the pro tier you also get 2TB of storage, Gemini for workspace and Nest Camera history.

                                                                                                                                                                            If you're in the Google sphere it offers good value for money.

                                                                                                                                                                            • minimaxir an hour ago

                                                                                                                                                                              Gemini has an obvious edge over its competitors in one specific area: Google Search. The other LLMs do have a Web Search tool but none of them are as effective.

                                                                                                                                                                              • jug an hour ago

                                                                                                                                                                                I personally use it as my general purpose and coding model. It's good enough for my coding tasks most of the time, has very good and rapid web search grounding that makes the Google index almost feel like part of its training set, and Google has a family sharing plan with individual quotas for Google AI Pro at $20/month for 5 users which also includes 2 TB in the cloud. Family sharing is a unique feature for Gemini 3 Flash Thinking (300 prompts per day and user) & Pro (100 prompts per day and user).

                                                                                                                                                                                • dinosor an hour ago

                                                                                                                                                                                  I find Gemini to be the best at travel planning and at storytelling about geographical places. For a road trip, I tried all three mainstream providers and liked Gemini best (partly personal preference, because Gemini took a verbose approach instead of the bullet points from the others) for its responses, the stories it discovered about places I wanted to explore, the places it suggested for me, and the things it gave me to consider in fitting those places into the route.

                                                                                                                                                                                  • dekhn 38 minutes ago

                                                                                                                                                                                    I am a professional software developer who has been programming for 40 years (C, C++, Python, assembly, any number of other languages). I work in ML (infrastructure, not research) and spent a decade working at Google.

                                                                                                                                                                                    In short, I consider Gemini to be a highly capable intern (grad student level) who is smarter and more tenacious than me, but also needs significant guidance to reach a useful goal.

                                                                                                                                                                                    I used Gemini to completely replace the software stack I wrote for my self-built microscope. That includes:

                                                                                                                                                                                    writing a brand new ESP32 console application for controlling all the pins of my ESP32 that drives the LED illuminator. It wrote the entire ESP-IDF project and did not make any major errors. I had to guide with updated prompts a few times but otherwise it wrote the entire project from scratch and ran all the build commands, fixing errors along the way. It also easily made a Python shared library so I can just import this object in my Python code. It saved me ~2-3 days of working through all the ESP-IDF details, and did a better job than I would have.

                                                                                                                                                                                    writing a brand new C++-based Qt camera interface (I have a camera with a special SDK that allows controlling strobe and trigger and other details. It can do 500FPS). It handled all the concurrency and message-passing details. I just gave it the SDK PDF documentation for the camera (in mixed English/Chinese), and asked it to generate an entire project. I had to spend some time guiding it around making shared libraries but otherwise it wrote the entire project from scratch and I was able to use it to make a GUI to control the camera settings with no additional effort. It ran all the build commands and fixed errors along the way. Saved me another 2-3 days and did a better job than I could have.

                                                                                                                                                                                    Finally, I had it rewrite the entire microscope stack (Python with Qt) using the two drivers I described above, along with complex functionality like compositing multiple images during scanning, video recording during scanning, measurement tools, computer vision support, and a number of other features. This involved a lot more testing on my part, and updating prompts to guide it towards my intended destination (a fully functional replacement of my original self-written prototype). When I inspect the code, it definitely did a good job on some parts, while it came up with non-ideal solutions for some problems (for example, it does polling when it could use event-driven callbacks). This saved literally weeks' worth of work that would have been a very tedious slog.

                                                                                                                                                                                    From my perspective, it's worked extremely well: doing what I wanted in less time than it would take me (I am a bit of a slow programmer, and I'm doing this in hobby time) and doing a better job (with appropriate guidance) than I could have (even if I'd had a lot of time to work on it). This greatly enhances my enjoyment of my hobby by doing tedious work, allowing me to spend more time on the interesting problems (tracking tardigrades across a petri dish for hours at a time). I used Gemini 3 Pro for this; it seems to do better than 2.5, and Flash seemed to get stuck and loop more quickly.

                                                                                                                                                                                    I have only lightly used other tools, such as ChatGPT/Codex and have never used Claude. I tend to stick to the Google ecosystem for several reasons- but mainly, I think they will end up exceeding the capabilities of their competitors, due to their inherent engineering talent and huge computational resources. But they clearly need to catch up in a lot of areas- for example, the VS Code Gemini extension has serious problems (frequent API call errors, messed up formatting of code/text, infinite loops, etc).

                                                                                                                                                                                    • mehagar 39 minutes ago

                                                                                                                                                                                      I use Gemini for personal stuff such as travel planning and research on how to fix something, which product to buy, etc. My company has a Pro subscription, so I use that instead of ChatGPT.

                                                                                                                                                                                      • epolanski an hour ago

                                                                                                                                                                                        Various friends of mine work in non-technology companies (banking, industries, legal, Italy) and in pretty much all of them there's Gemini enterprise + NotebookLM.

                                                                                                                                                                                        In all of them the approach is: this is the solution, now find problems you can apply it to.

                                                                                                                                                                                        • esafak an hour ago

                                                                                                                                                                                          I'd use it for planning, knowledge, and anything visual.

                                                                                                                                                                                          • verdverm 20 minutes ago

                                                                                                                                                                                            I use gemini for everything because I trust google to keep the data I send them safe, because they know how to run prod at scale, and they are more environmentally friendly than everyone else (tpu,us-central1).

                                                                                                                                                                                            This includes my custom agent / copilot / coworker (which uses Vertex AI and all the models therein). This is where I do most of my searching now (with GenAI grounding). I'm about to work on several micro projects that will use AI a little differently.

                                                                                                                                                                                            All that being said, Google's AI products suck hard. I hate using every one of them. This is more a reflection of the continued degradation of PM/design at Big G, from before AI, but it has gotten worse at an accelerating rate since. I support removing Logan from the head of this shit show.

                                                                                                                                                                                            disclaimer: long time g-stan, not so stan any more

                                                                                                                                                                                          • pawelduda an hour ago

                                                                                                                                                                                            Is it safe to assume they'll be releasing an improved Gemini Flash soon? The current one is so good & fast I rarely switch to Pro anymore.

                                                                                                                                                                                            • derac 9 minutes ago

                                                                                                                                                                                              When 3 came out, they mentioned (in an HN comment) that Flash included many improvements that didn't make it into Pro. I imagine this release includes those.

                                                                                                                                                                                            • eric15342335 an hour ago

                                                                                                                                                                                              My first impression is that the model sounds slightly more human and a little more flattering. Still comparing its abilities.

                                                                                                                                                                                              • makeavish an hour ago

                                                                                                                                                                                                Great model until it gets nerfed. I wish they had a higher paid tier for the non-nerfed model.

                                                                                                                                                                                                • Mond_ 38 minutes ago

                                                                                                                                                                                                  Bad news, John Google told me they already quantized it immediately after the benchmarks were done and it sucks now.

                                                                                                                                                                                                  I miss when Gemini 3.1 was good. :(

                                                                                                                                                                                                  • spyckie2 an hour ago

                                                                                                                                                                                                    I think there's a pattern: it always gets nerfed in the few weeks before they launch a new model. Probably because they're throwing a bunch of compute at the new model.

                                                                                                                                                                                                    • makeavish 39 minutes ago

                                                                                                                                                                                                      Yeah, maybe that, but at least let us know about it, or have dynamic limits? Nerfing breaks trust. Though I'm not sure if they actually nerf it intentionally; I haven't heard it from any credible source. I did experience it in my workflow, though.

                                                                                                                                                                                                    • xnx an hour ago

                                                                                                                                                                                                      What are you talking about?

                                                                                                                                                                                                    • johnwheeler 15 minutes ago

                                                                                                                                                                                                      I know Google has Antigravity, but do they have anything like Claude Code in terms of user interface, i.e. basically a terminal TUI?

                                                                                                                                                                                                    • matrix2596 an hour ago

                                                                                                                                                                                                      Gemini 3.1 Pro is based on Gemini 3 Pro

                                                                                                                                                                                                      • skerit an hour ago

                                                                                                                                                                                                        Lol, and this line:

                                                                                                                                                                                                        > Geminin 3.1 Pro can comprehend vast datasets

                                                                                                                                                                                                        Someone was in a hurry to get this out the door.

                                                                                                                                                                                                      • LZ_Khan an hour ago

                                                                                                                                                                                                        The biggest problem is that it's slow. Also, safety seems overtuned at the moment; I'm getting some really silly refusals. Everything else is pretty good.

                                                                                                                                                                                                        • throwaw12 10 minutes ago

                                                                                                                                                                                                          Can we switch from Claude Code to Google yet?

                                                                                                                                                                                                          Benchmarks are saying: just try

                                                                                                                                                                                                          But real world could be different

                                                                                                                                                                                                          • naiv an hour ago

                                                                                                                                                                                                            OK, so they are scared that 5.3 (Pro) will be released today/tomorrow and blow it out of the water, and rushed this out while they could still reference 5.2 benchmarks.

                                                                                                                                                                                                            • PunchTornado an hour ago

                                                                                                                                                                                                              I don't think models blow other models out of the water anymore. We have the big 3, which are neck and neck in most benchmarks, and then the rest. I doubt that 5.3 will blow away the others.

                                                                                                                                                                                                              • scld 40 minutes ago

                                                                                                                                                                                                                easy now

                                                                                                                                                                                                            • mustaphah an hour ago

                                                                                                                                                                                                              Google is terrible at marketing, but this feels like a big step forward.

                                                                                                                                                                                                              As per the announcement, Gemini 3.1 Pro scored 68.5% on Terminal-Bench 2.0, which makes it the top performer on the Terminus 2 harness [1]. That harness is a "neutral agent scaffold," built by researchers at Terminal-Bench to compare different LLMs in the same standardized setup (same tools, prompts, etc.).

                                                                                                                                                                                                              It's also taken top model place on both the Intelligence Index & Coding Index of Artificial Analysis [2], but on their Agentic Index, it's still lagging behind Opus 4.6, GLM-5, Sonnet 4.6, and GPT-5.2.

                                                                                                                                                                                                              ---

                                                                                                                                                                                                              [1] https://www.tbench.ai/leaderboard/terminal-bench/2.0?agents=...

                                                                                                                                                                                                              [2] https://artificialanalysis.ai

                                                                                                                                                                                                              • saberience an hour ago

                                                                                                                                                                                                                Benchmarks aren't everything.

                                                                                                                                                                                                                Gemini consistently has the best benchmarks but the worst actual real-world results.

                                                                                                                                                                                                                Every time they announce the best benchmarks I try again at using their tools and products and each time I immediately go back to Claude and Codex models because Google is just so terrible at building actual products.

                                                                                                                                                                                                                They are good at research and benchmaxxing, but the day to day usage of the products and tools is horrible.

                                                                                                                                                                                                                Try using Google Antigravity and you will not make it an hour before switching back to Codex or Claude Code, it's so incredibly shitty.

                                                                                                                                                                                                                • mustaphah 41 minutes ago

                                                                                                                                                                                                                  That's been my experience too; can't disagree. Still, when it comes to tasks that require deep intelligence (esp. mathematical reasoning [1]), Gemini has consistently been the best.

                                                                                                                                                                                                                  [1] https://arxiv.org/abs/2602.10177

                                                                                                                                                                                                                  • gregorygoc 42 minutes ago

                                                                                                                                                                                                                    What’s so shitty about it?

                                                                                                                                                                                                                • PunchTornado an hour ago

                                                                                                                                                                                                                  The biggest increase is LiveCodeBench Pro: 2887. The rest are in line with Opus 4.6 or slightly better or slightly worse.

                                                                                                                                                                                                                  • shmoogy an hour ago

                                                                                                                                                                                                                    but is it still terrible at tool calls in actual agentic flows?

                                                                                                                                                                                                                  • jeffbee an hour ago

                                                                                                                                                                                                                    Relatedly, Gemini chat seems to be, if not down, then extremely slow.

                                                                                                                                                                                                                    • sergiotapia an hour ago

                                                                                                                                                                                                                      To use it in OpenCode, refresh its model list:

                                                                                                                                                                                                                          opencode models --refresh
                                                                                                                                                                                                                      
                                                                                                                                                                                                                      Then run /models and choose Gemini 3.1 Pro.

                                                                                                                                                                                                                      You can use the model through OpenCode Zen right away and avoid that Google UI craziness.
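                                                                                                                                                                                                                      If you'd rather pin it than pick it interactively each time, OpenCode also reads a default model from its config file. A minimal sketch; the exact provider/model id is an assumption here (check the output of opencode models for the real one):

```json
{
  "$schema": "https://opencode.ai/config.json",
  "model": "google/gemini-3.1-pro"
}
```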

                                                                                                                                                                                                                      ---

                                                                                                                                                                                                                      It is quite pricey! Good speed, though, and it has nailed all my tasks so far. For example:

                                                                                                                                                                                                                          @app-api/app/controllers/api/availability_controller.rb 
                                                                                                                                                                                                                          @.claude/skills/healthie/SKILL.md 
                                                                                                                                                                                                                      
                                                                                                                                                                                                                          Find Alex's id, and add him to the block list, leave a comment 
                                                                                                                                                                                                                          that he has churned and left the company. we can't disable him 
                                                                                                                                                                                                                          properly on the Healthie EMR for now so 
                                                                                                                                                                                                                          this dumb block will be added as a quick fix.
                                                                                                                                                                                                                      
                                                                                                                                                                                                                      Result was:

                                                                                                                                                                                                                          29,392 tokens
                                                                                                                                                                                                                          $0.27 spent
                                                                                                                                                                                                                      
                                                                                                                                                                                                                      So a relatively small task, hitting an API and using one of my skills, cost a quarter. Pricey!
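                                                                                                                                                                                                                      For context, those two figures imply a blended rate (input and output tokens mixed together) of roughly $9 per million tokens. A quick back-of-envelope, assuming nothing beyond the numbers quoted above:

```python
# Figures reported by OpenCode for the run above
tokens = 29_392      # total tokens (input + output, blended)
cost_usd = 0.27      # total spend

# Effective blended cost per million tokens
per_million = cost_usd / tokens * 1_000_000
print(f"${per_million:.2f} per 1M tokens (blended)")  # → $9.19 per 1M tokens (blended)
```

Note this blends input and output pricing, so it says nothing about the per-direction rates, only what this particular task cost per token.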

                                                                                                                                                                                                                      • gbalduzzi 37 minutes ago

                                                                                                                                                                                                                        I don't see it even after refresh. Are you using the opencode-gemini-auth plugin as well?

                                                                                                                                                                                                                        • sergiotapia 23 minutes ago

                                                                                                                                                                                                                          No, I'm not; just vanilla OpenCode. I do have OpenCode Zen credits, and I did opencode login (or whatever their command is) to auth against OpenCode itself. Maybe that's why I see these premium models.

                                                                                                                                                                                                                      • nautilus12 39 minutes ago

                                                                                                                                                                                                                        Ok, why don't you work on getting 3.0 out of preview first? A 10-minute response time is pretty heinous.

                                                                                                                                                                                                                        • mucai82 19 minutes ago

                                                                                                                                                                                                                          I agree. According to Google's terms, you are not allowed to use a preview model for production use cases, and 3.0 has been in preview for a loooong time now :(

                                                                                                                                                                                                                        • dude250711 an hour ago

                                                                                                                                                                                                                          I hereby allow you to release models not at the same time as your competitors.

                                                                                                                                                                                                                          • sigmar an hour ago

                                                                                                                                                                                                                            It is super interesting that this is the same thing that happened in November (i.e. all the labs shipping within the same stretch, 11/12-11/23).

                                                                                                                                                                                                                            • zozbot234 21 minutes ago

                                                                                                                                                                                                                              They're just throwing a big Chinese New Year celebration.

                                                                                                                                                                                                                          • ChrisArchitect an hour ago