• antirez a day ago

    Very good move. In my experience, for system programming at least, GPT 5.4 xhigh is vastly superior to Claude Opus 4.6 max effort. I ran many brutal tests, including reconstructing for QEMU the SCSI controller (no longer accessible) of a SYSV UNIX of the early 90s used on a 386. Side by side, always re-mirroring the source trees each time one made a breakthrough in the implementation. Well, GPT 5.4 single-handedly did it all, while Opus kept taking wrong paths. The same goes for my Redis bug tracking and development. But $200 is too much for many people (right now, at least: the reality is that if frontier LLMs are not democratized, we will end up paying something like a house rent to a few providers), and also, while GPT 5.4 is much stronger, it is slower and less sharp when the thing to do is simple, so many people went for Claude (also because of better marketing and ethical concerns, even if my POV is different on that side: both companies sell LLMs with similar capabilities and similar internal IP protection and so forth, so to me they look very similar in practical terms). This will surely change things, and I bet many people will end up with a Claude 5x account + a Codex 5x account.

    • dweekly a day ago

      GPT 5.4 is the surly physics PhD post-doc who slowly and angrily sits in a basement to write brilliant, undocumented, uncommented code that encapsulates a breakthrough algorithm.

      Opus 4.6 is the L5 new hire SWE keen to prove their chops and quickly turn out totally reasonable code with putatively defensible reasons for doing it that way (that are sometimes tragically wrong) and then catch an after-work yoga class with you.

      • pdntspa a day ago

        Who replies to you with fucking emoji brainrot

        • ponector a day ago

          You are absolutely right!

          • Gud 17 hours ago

            You can tell it to be no nonsense

          • fragmede a day ago

            > and then catch an after-work yoga class with you.

            That's cute, but do you mean something concrete by this, i.e. is there some non-coding prompting you use it for that you're referring to, or is it simply a throwaway line about L5 SWEs (at a FAANG)?

            (FWIW, I find myself using ChatGPT rather than Claude for non-coding prompting, like random questions about whether oil is fungible, for some reason.)

            • dghlsakjg a day ago

              It’s an analogy about the “personalities” of the models.

              They are saying that Claude is more of a team player and conformist. It isn’t really much deeper than that.

              • joncrane a day ago

                I think the point they are trying to make is the golden retriever vibe/energy you get from Claude gives "after work yoga."

              • simianwords a day ago

                GPT is also cautious and defensive, but Opus is agreeable.

              • Tiberium a day ago

                Thanks for confirming my impressions, it's been like 4 months now that I've arrived at the same conclusions. GPT models are just better at any kind of low-level work: reverse engineering including understanding what the decompiled code/assembly does, renaming that decompiled code (functions/types), any kind of C/C++, way more reliable security research (Opus will find way more, but most will turn out to be false positives). I've had GPT create non-trivial custom decompilers for me for binaries built with specific compilers (it's a much simpler task than what IDA Pro/Ghidra are doing but still complex), and modify existing Java decompilers.

                Regarding speed, I don't use xhigh that often, and surprisingly for me GPT 5.4 high is faster than Claude 4.6 Opus high (unless you enable fast mode for Opus).

                Of course I still use Opus for frontend, for some small scripts, and for criticizing GPT's code style, especially in Python (getattr).

                • antirez a day ago

                  In the SCSI controller work I mentioned, a very big part of the work was indeed reasoning about assembly code and how IRQs and completion of DMAs worked and so forth. Opus, even though TOOLS.md listed the disassembler and it was asked to use it many times, didn't even bother much. GPT 5.4 instead did great reverse engineering work, and it was also a lot more responsive to my high-level suggestions, like: work in that way to make more isolated progress, and so forth.

                  • amluto a day ago

                    GPT 5.4 is remarkably good at figuring out machine code using just binutils. Amusingly, I watched it start downloading ghidra, observe that the download was taking a while, and then mostly succeed at its assignment with objdump :)

                  • beering a day ago

                    Codex also gives you a lot more usage for $20/mo than Claude, so there isn't the same fear that high or xhigh reasoning will eat up all your quota. It really comes down to whether you want to try to save some time or not. (I default to xhigh because it’s still fast enough for me.)

                  • Asyne a day ago

                    +1 to this, I've found GPT/Codex models consistently stronger in engineering tasks (such as debugging complex, cross-systems issues, concurrency problems, etc).

                    I use both OpenAI and Anthropic models, though for different purposes. What surprises me is how underrated GPT still feels (or, alternatively, how overhyped Anthropic models can be) given how capable it is in these scenarios. There also seems to be relatively little recognition of this in the broader community (like your recent YouTube video). My guess is that demand skews toward general codegen rather than the kind of deep debugging and systems work where these differences really show.

                    • mediaman a day ago

                      It's surprising to me how much LLM "personality" seems to matter to people, more than actual capability.

                      I do turn to Anthropic for ideation and non-tech things. But I find little reason to use it over codex for engineering tasks. Sometimes for planning, but even there, 5.4 is more critical of my questionable ideas, and will often come up with simpler ways to do things (especially when prompted), which I appreciate.

                      And I don't do hard-tech things! I've chosen a b2b field where I can provide competent products for a niche that is underserved and where long term relationships matter, simply because I'm not some brilliant engineer who can completely reinvent how something is done. I'm not writing kernels or complex ML stacks. So I don't really understand what everyone is building where they don't see the limits of Opus. Maybe small greenfield projects with few users.

                      • randomNumber7 a day ago

                        > I'm not some brilliant engineer who can completely reinvent how something is done

                        With an honest evaluation of your own capabilities you are already far above average. Also, it's hard to see the insane amount of work that was often necessary to invent the brilliant stuff, and most people cannot shit that out consistently.

                        • fcarraldo a day ago

                          > It's surprising to me how much LLM "personality" seems to matter to people, more than actual capability.

                          > I do turn to Anthropic for ideation and non-tech things. But I find little reason to use it over codex for engineering tasks. Sometimes for planning, but even there, 5.4 is more critical of my questionable ideas, and will often come up with simpler ways to do things (especially when prompted), which I appreciate.

                          Aren't you saying here that the LLM personality matters to you, too? Being critical of you is a personality attribute, not a capabilities one.

                          • lo_zamoyski a day ago

                            Not necessarily. Criticism is the analysis, evaluation, or judgment of the qualities of something. This is a matter of intellectual act. However, you could say that being habitually critical can be partly a result of "personality" or temperament.

                            (Of course, strictly speaking, LLMs have neither temperament, "personality", nor intellect, but we understand these terms are used in an analogical or figurative fashion.)

                        • beering a day ago

                          Or rather, it’s hard to ask everyone to side-by-side compare both products on their use cases. So the choice really comes down to word-of-mouth even though their use cases may be better served by Codex.

                          • dvfjsdhgfv a day ago

                            I use Codex for cleaning up after Claude and it always finds so many bugs, some of them quite obvious.

                          • thisisit a day ago

                            In my non-scientific tests, GPT models follow prompts literally. Every time I give it an example, it uses the example in the literal sense instead of using it to enhance its understanding of the ask. This is a good thing if I want it to follow instructions but bad if I want it to be creative. I have to tell it that the examples I gave are just examples and not to be used in the output. I feel comfortable using it when I have everything mapped out.

                            Claude, on the other hand, can be creative. It understands that examples are for reference purposes only. But there are times it decides to go off on a tangent of its own and not follow instructions closely. I find it useful for bouncing ideas around or testing something new.

                            The other thing I notice is Claude has slightly better UI design sensibilities even if you don’t give instructions. GPT on the other hand needs instructions otherwise every UI element will be so huge you need to double scroll to find buttons.

                            • veber-alex a day ago

                              This is also what I noticed.

                              GPT doesn't know how to get creative, you need to tell it exactly what to do and what code you want it to write.

                              For Claude you can be more general and it will look up solutions for you outside of the scope you gave it.

                              I personally prefer Claude.

                              • sixothree a day ago

                                I think you might benefit from the "superpower" plugin. Add the word "brainstorm" before your prompt and it does a little bit better at figuring out how you want things.

                              • postalcoder a day ago

                                What I like most about gpt coding models is how predictable of a lever that thinking effort is.

                                Xhigh will gather all the necessary context. low gathers the minimum necessary context.

                                That doesn’t work as well for me with Opus. Even at max effort it’ll overlook files necessary to understanding implementations. It’s really annoying when you point that out and you get hit with a “you’re absolutely right”.

                                Codex isn’t the greatest one shot horse in the race but, once you figure out how to harness it, it’s hard to go back to other models.

                                • munksbeer 11 hours ago

                                  > right now, at least: the reality is that if frontier LLMs are not democratized, we will end up paying something like a house rent to a few providers

                                  This part of your comment has slipped through but is very worrying for me. I _think_ we're passing the point now where programmers are accepting that LLMs writing code are the real deal. Lots of antagonism along the way, but the reality is these things are good, and getting better all the time.

                                  What this means in reality, in my opinion, is that if you're an independent programmer, or smaller company trying to compete with others to earn a living, you're almost certainly going to have to use coding agents, which means your competitiveness in the market is going to be gated by the big model providers until we have more options. If you somehow get banned from a few of them, which seems like it can happen through no fault of your own, you're going to be seriously negatively impacted.

                                  That's quite worrying having gatekeepers to our industry where it was previously in our own hands.

                                  • osti a day ago

                                    Yup, I've mentioned this in another thread: I got GPT 5.4 xhigh to improve the throughput of a very complex, atypical CUDA kernel by 20x. This was through a combination of architecture changes and then low-level optimizations, and it did the profiling all by itself. I was extremely impressed.

                                    • bob1029 a day ago

                                      GPT5.4 with any effort level is scary when you combine it with tricks like symbolic recursion. I actually had to reduce the effort level to get the model to stop trying to one shot everything. I struggled to come up with BS test cases it couldn't dunk in some clever way. Turning down the reasoning effort made it explore the space better.

                                      • rolls-reus a day ago

                                        can you explain what you mean by symbolic recursion tricks in this context?

                                        • bob1029 a day ago

                                          The model can call a copy of itself as a tool (i.e., we maintain actual stack frames in the hosting layer). Explicit tools are made available: Call(prompt) & Return(result).

                                          The user's conversation happens at level 0. Any actual tool use is only permitted at stack depths > 0. When the model calls the Return tool at stack depth 0 we end that logical turn of conversation and the argument to the tool is presented to the user. The user can then continue the conversation if desired with all prior top level conversation available in-scope.

                                          It's effectively the exact same experience as ChatGPT, but each time the user types a message an entire depth-first search process kicks off that can take several minutes to complete each time.
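                                          The arrangement described above can be sketched roughly as follows. This is a minimal illustration, not bob1029's actual code: `Frame`, `run_frame`, `user_turn`, and the stub `toy_model` are hypothetical names, and a real host would replace `toy_model` with an LLM API call that is offered the Call and Return tools.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# An action is ("call", prompt), ("tool", text) or ("return", result).
Action = Tuple[str, str]


@dataclass
class Frame:
    """One host-side stack frame; it holds only its own context."""
    depth: int
    context: List[str] = field(default_factory=list)


Model = Callable[[Frame], Action]


def run_frame(model: Model, frame: Frame, max_depth: int = 8) -> str:
    """Drive one frame until the model issues Return; recurse on Call."""
    while True:
        kind, arg = model(frame)
        if kind == "return":
            return arg
        if kind == "call":
            if frame.depth + 1 > max_depth:
                raise RecursionError("stack depth limit reached")
            # The child starts from the prompt alone, not the parent's context.
            child = Frame(depth=frame.depth + 1, context=[arg])
            result = run_frame(model, child, max_depth)
            frame.context.append(f"result: {result}")
        elif kind == "tool":
            # Actual tool use is only permitted below the top level.
            if frame.depth == 0:
                raise PermissionError("tools are only allowed at depth > 0")
            frame.context.append(f"tool output for {arg}")
        else:
            raise ValueError(f"unknown action: {kind}")


def user_turn(model: Model, top: Frame, message: str) -> str:
    """One logical turn: ends when the model calls Return at depth 0."""
    top.context.append(f"user: {message}")
    answer = run_frame(model, top)
    top.context.append(f"assistant: {answer}")
    return answer


def toy_model(frame: Frame) -> Action:
    """Stand-in for the LLM (assumption): delegates once, then returns."""
    last = frame.context[-1]
    if frame.depth == 0:
        if last.startswith("user: "):
            return ("call", last[len("user: "):])
        return ("return", last[len("result: "):])
    if last.startswith("tool output"):
        return ("return", last.upper())
    return ("tool", last)


if __name__ == "__main__":
    top = Frame(depth=0)
    print(user_turn(toy_model, top, "hi"))
```

                                          The recursion lives in the host's Python call stack rather than in the token stream, which seems to be the point of the "actual stack frames in the hosting layer" remark.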

                                          • krackers 16 hours ago

                                            How is this different from a standard tool-call agentic loop, or subagents?

                                            • bob1029 16 hours ago

                                              Each stack frame has its own isolated context. This pushes the token pressure down the stack. The top level conversation can go on for days in this arrangement. There is no need for summarization or other tricks.
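                                              A back-of-the-envelope sketch of why isolation relieves the top-level context, under made-up numbers: the token counts below are purely hypothetical, and the two function names are illustrative only.

```python
def flat_context_size(turns: int, work_tokens: int, io_tokens: int) -> int:
    """Flat agent loop: each turn's intermediate work stays in the one context."""
    return turns * (work_tokens + io_tokens)


def stacked_top_level_size(turns: int, io_tokens: int) -> int:
    """Isolated frames: only the user message and returned result stay at level 0."""
    return turns * io_tokens


if __name__ == "__main__":
    # Hypothetical workload: 50 turns, 20,000 tokens of tool chatter per turn,
    # 500 tokens of message plus final answer per turn.
    print(flat_context_size(50, 20_000, 500))
    print(stacked_top_level_size(50, 500))
```

                                              Under these assumed numbers the flat loop would hold about a million tokens at the top level, while the stacked arrangement holds 25,000, with the intermediate work discarded as each child frame returns.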

                                              • krackers 15 hours ago

                                                Is this related to the paper on Recursive Language Models? I remember it mentioned something similar about "symbolic recursion", but the way you describe it makes it sound too simple, why is there an entire paper about it?

                                                • bob1029 15 hours ago

                                                  The RLM paper did inspire me to try it. This is where the term comes from. "Symbolic" should be taken to mean "deterministic" or "out of band" in this context. A lot of other recursive LLM schemes rely on the recursion being in the token stream (i.e., "make believe you have a call stack and work through this problem recursively"). Clearly this pales in comparison to actual recursion with a real stack.

                                      • nealmueller 6 hours ago

                                        Price change is ChatGPT not Codex, you may be mixing them up, Codex (for coding) remains $200

                                        • SunshineTheCat a day ago

                                          1000%. I have been running claude's work through codex for about a week now and it's insane the number of mistakes it catches. Not really sure why I've been doing this, just interesting to watch I guess.

                                          Not to mention a billion times more usage than you get with claude, dollar for dollar.

                                          • sixothree 2 minutes ago

                                            Funny, I've been doing the same thing. I've also been giving them both the same task and seeing who does a better job.

                                            I think it's all of this controversy around usage limits and model nerfing that made me start doing this.

                                            In the end though, I _much_ prefer working with claude because it understands the task at hand so much better and I feel like I understand the results better. It's just that codex is doing a better job at the actual coding lately.

                                            • scrollop a day ago

                                              It's widely reported that opus has been greatly reduced for a number of weeks since Mythos was released internally

                                            • zozbot234 a day ago

                                              The $100/mo giving access to GPT Pro (with reduced usage) is a nice counter to the just teased Claude Mythos. But GPT 5.4 xhigh being able to perform that kind of low-level reconstruction task is very impressive already.

                                              • sho_hn a day ago
                                                • aerhardt a day ago

                                                  I completely agree with you on both the technical and ethical reasoning.

                                                  Thank you for speaking out. I think it's important that reputable engineers like you do so. The Claude gang gaslighting is unhinged right now. It would be none of my concern but I have to deal with it in the real world - my customers are susceptible to these memes. I'm sure others have to deal with similar IRL consequences, too.

                                                  • TacticalCoder 20 hours ago

                                                    I use Claude Code / Anthropic models but...

                                                    > I ran many brutal tests, including reconstructing for QEMU the SCSI controller (no longer accessible) of a SYSV UNIX of the early 90s used on a 386.

                                                    QEMU is one project that, for a variety of reasons, has said that for now they simply refuse any code written by an LLM. Is this just a test? Or just for you? Or do you think QEMU should accept that patch?

                                                  • mrdependable a day ago

                                                    It's interesting seeing all the ChatGPT users in this thread, knowing what we know about OpenAI. Either they don't care about what OpenAI does, don't know their reputation, or feel like their use is too insignificant to matter.

                                                    • Yizahi 11 hours ago

                                                      Absolutely not surprising. Just ask HN users what browser they are using and the answer will be Chrome or a Chrome clone in 99% of cases. I even got a reply once along the lines of "why do you use Firefox?". I was at a loss for words.

                                                      I also observe the exact same pattern in two different countries among experienced IT workers. They mostly don't care at all about any non-tech implications of the services or employers they are using. Creepto, gambling, tax evasion, supporting monopolies, etc. - all fair game.

                                                      PS: I'm guilty of the same too, in other areas. But at least I'm self-aware about my transgressions.

                                                      • BAHKA 7 hours ago

                                                        GPT user here ;) And Chrome user too :-D I can tell you straight away why I choose Chrome (or Chromium-based browsers) over Firefox. In fact it is a very simple reason: speed. Just run Speedometer 3.1 and observe. On my machine Chrome scores around 45, while Firefox is stuck at 34. On MotionMark the difference is about 3x. I have to use very heavy websites on a daily basis (like Salesforce) and I can literally feel the difference.

                                                      • burnte a day ago

                                                        If Sam Altman told me what time it was I'd check my watch and probably still not believe him.

                                                        • mootothemax a day ago

                                                          For what it’s worth, I cancelled my ChatGPT subscription, and every time I try debugging a Linux system issue, I feel sad that Claude is sooooooo confidently bad at it.

                                                          Claude is noticeably poor for my use case on this particular issue. That said, I imagine I’m not alone in refusing to continue paying OpenAI. We’re in for a wild ride.

                                                          • randomNumber7 a day ago

                                                            Do you know game theory? If you look at it from that perspective, this doesn't sound like a good strategy.

                                                            Basically the classic prisoner's dilemma. The other devs with fewer morals can then outperform you.

                                                            It could be a valid strategy if you can increase your credibility with this relinquishment.

                                                            • mootothemax a day ago

                                                              > Do you know game theory?

                                                              Never heard of it. The food there good?

                                                              > The other devs with fewer morals can then outperform you.

                                                              I long for the days where it’s only my moral compass holding me back.

                                                              • kakacik a day ago

                                                                Life is more than just empty status games and money hoarding at (almost) all cost. In fact, a good life lived well (TM) is anything but that.

                                                                But I write this on a mostly US forum full of FAANG types and the like, so I don't expect strong agreement.

                                                                • selfhoster11 11 hours ago

                                                                  I am on this very forum as an explicit effort to counterbalance that very view. You have my strong agreement.

                                                                  • jemmyw a day ago

                                                                    A lot of folks here will be startup types though, and while there is the idea that you'll make it big, I think day to day people work at startups for the satisfaction.

                                                                    • randomNumber7 a day ago

                                                                      Someone who is never rational is just as bad as someone who claims there is nothing else in humans.

                                                                • monkeyisland a day ago

                                                                  Could you spell it out? I pay a $20/m OpenAI subscription and I haven't read the reasons why I might want to stop.

                                                                  • freely0085 a day ago
                                                                    • rcbdev 17 hours ago

                                                                      Paywall.

                                                                    • Yizahi 11 hours ago

                                                                      Sam is a pathological liar. He is also trying to build a monopoly which is never a good thing. And finally he is trying to get humans into a binary choice where either there is massive unemployment due to overly successful LLMs resulting in a crisis, or there is a crisis because of the failed LLM expectations.

                                                                      tl;dr: If you need to pay for an LLM for work, at least don't pay the market leader.

                                                                    • Apr1026 21 hours ago

                                                                      There is a saying in India that roughly translates to - Everyone is naked in this bath house.

                                                                      All AI players are 50 shades of evil and are only concerned about their profits.

                                                                      Instead of virtue signaling it's best to use the tools that work best for your needs.

                                                                      • Yizahi 10 hours ago

                                                                        You are halfway to the correct answer. You correctly recognize that evil comes in an infinite spectrum of severity and that many actors are evil at different levels at the same time. Now take the next step and recognize that fighting said evil also comes in many different levels of severity. It is not just either a clear 100% win or doing nothing and immediately resigning without a fight. There are many intermediate levels of fight between the two extremes. For example, as a small step, one can continue using and paying LLM corpos, but at least avoid the worst one of them, which OAI objectively is today.

                                                                        • arw0n 12 hours ago

                                                                          This is moral relativism at its finest, and just plain wrong. I'm not willing to go so far as to call Anthropic a good player, but they are surprisingly often willing to put their money where their mouth is. Obviously everything can be interpreted as a PR move as well, but we just lack context to know true intentions. Personally I have repeatedly sold being a good org as a PR move, it is the easiest way to do good in a capitalist environment. The success of such a sales pitch significantly relies on the moral values (or lack thereof) of the other decision makers at the company.

                                                                          With Project Glasswing for example, I'm impressed at how generally well thought out it is, and very much appreciate that they donate a lot of money to OSS. I would have liked them to extend the Project to smaller players as well, but power centralisation is an inherent problem of AI, not something that is unique to Anthropic.

                                                                          • nickvec 20 hours ago

                                                                            There are varying degrees of evil though. Saying that Anthropic and OpenAI are both "evil" to the same degree is disingenuous (in my opinion) given Altman's sociopathic behavior.

                                                                            • joquarky 7 hours ago

                                                                              You would have to practically live in a hippie commune to avoid buying products and services from companies led by sociopaths.

                                                                          • MattGaiser a day ago

                                                                            What has the tech industry ever resisted on moral or reputational grounds?

                                                                            • sonofhans a day ago

                                                                              This is sarcastically-stated but an excellent point, and an honest answer will come up with a vanishingly small list. We geeks may think we care about Important Things, but our industry cares for nothing but money and power — morality is a hindrance to the accumulation of those.

                                                                              • pdntspa a day ago

                                                                                Worse, they exploit our curiosity and open-mindedness to build their empires for them. Which we willingly do because cool shiny shit.

                                                                                Nerd-sniping as a weapon of oppression

                                                                                • rcbdev 17 hours ago

                                                                                  > cool shiny shit

                                                                                  You mean absurdly high compensation for very comfortable, low-stress office work? People work to feed their families, not just because working is 'cool'.

                                                                                  • MattGaiser a day ago

                                                                                    A lot of it is simply that they are far more open to the idea of curiosity as having value than most people.

                                                                                  • Throaway199999 a day ago

                                                                                    llm made this post?

                                                                                    • NewsaHackO a day ago

                                                                                      Because of the em-dash? Unfortunately, some writing hipsters created this "uh, actually, we were writing em-dashes first; their dramatic increase in use since LLM proliferation in the 2020s shouldn't mean we can't use them!" movement. This has led to purposeful use of em-dashes to bait people into calling them LLMs. You can tell because the spaces around it are most likely there because they had to copy and paste it from somewhere else, as they (like most humans on non-Macs) don't actually know how to type an em-dash otherwise.

                                                                                      • sonofhans a day ago

                                                                                        Wow, that’s some pretty farfetched speculation. I’ve been using double-dash as punctuation since I started writing on a computer in the 1980s. I like that MacOS connects them for me.

                                                                                        Consider for a moment how different your assumptions are from reality. Can you learn from that?

                                                                                        • recursive a day ago
                                                                                    • abound a day ago

                                                                                      One example I can think of is Google + Project Maven [1], where Google was partnering with the DoD but "withdrew in 2018 after internal protests". Though they've since partnered with the DoD on other initiatives [2].

                                                                                      [1] https://en.wikipedia.org/wiki/Project_Maven

                                                                                      [2] https://www.reuters.com/business/autos-transportation/us-dep...

                                                                                      • MattGaiser a day ago

                                                                                        That is probably the most notable example and in the end, those few still lost.

                                                                                      • selfhoster11 11 hours ago

                                                                                        There was Lavabit, though that's an example of just one such event.

                                                                                        Edit: and to some extent Apple, at least in the past.

                                                                                        • burnte a day ago

                                                                                          Maybe it's time to start.

                                                                                        • yrds96 21 hours ago

                                                                                          I don't think people paying for AI at this moment are concerned about morals or ethics.

                                                                                          • Devasta a day ago

                                                                                            The difference between Silicon Valley and Wall Street is that Wall Street knows they are lying when they justify the awful things they do in the name of enriching themselves.

                                                                                            • rs_rs_rs_rs_rs a day ago

                                                                                              What's with the holier-than-thou attitude? Why do you think you're better than someone using chatgpt?

                                                                                              • simianwords a day ago

                                                                                                People love to roleplay as activists because it gives their life some meaning and illusion of control

                                                                                                • archagon a day ago

                                                                                                  And people love to roleplay as nihilists because it means they don't have to be responsible for anything.

                                                                                                • Morromist a day ago

                                                                                                  If an AI company has done unethical things do you think it is inappropriate to discuss that? Take Grok: among other things it created sexualized images of underaged women without their consent, not by accident but as a feature. Is that just something you want to ignore? In response the people in charge merely restricted the feature to paid subscribers instead of removing it.

                                                                                                  Do you think mentioning Grok creating CSAM is a holier-than-thou attitude? Do you not think the people who ignore that are worse than other people?

                                                                                                  • ThalesX a day ago

                                                                                                    I for one am appalled at TCP/IP because it facilitates so much unethical behavior. I of course am holier than thou because I do not ignore this and am a voice that raises awareness. I shall not be silenced!

                                                                                                    • user34283 a day ago

                                                                                                      I don't see a comment in this thread concretely discussing said unethical things.

                                                                                                      Not sure why you felt the need to switch the topic to Grok. About its nudification incident, it seems a bit of a stretch to say that malicious actors bypassing its safety controls was not an accident.

                                                                                                      Initially, the image features were restricted to paying subscribers to prevent abuse by anonymous actors; this obviously happened while they were tightening safety controls to stop abuse.

                                                                                                      If you're going to bring up that old topic, at least try to get the facts straight.

                                                                                                      • Morromist a day ago

                                                                                                        I switched to Grok because it's a very cut-and-dried case of an AI company having poor ethics.

                                                                                                        To me it seems a LOT of a stretch to think that the people behind Grok believed their safety controls worked, but you can believe that if you wish. Deepfakes of non-consenting adults were trending on X all the time, Elon even appears to have shared them himself, which is pretty bad even if they're all just adults, and I'm sure you believe that they believed the AI could tell the difference between an underage person and an adult perfectly, although it seems clear they didn't test it very much.

                                                                                                        • jonny_eh a day ago

                                                                                                          to grok or from?

                                                                                                  • somenameforme a day ago

                                                                                                    I assume in any sort of thread on a topic like this there is going to be inorganic activity. These companies are all fighting rather hard to try to gain marketshare, potentially worth $trillions, with a product fully capable of producing endless reasonably compelling content to populate an account, a website, or any other basic proof of identity one might ever want.

                                                                                                    It's probably never been the case that plurality of views meant anything since online is a bubble to begin with, filtered by endless biases wherever we happen to be reading, making it an even more fringe bubble, but the advent of AI has pushed it all over the edge to the point that perceived pluralities are just completely and utterly meaningless. Somewhat depressing for one who enjoys online chat as a pastime, but it's the reality of the world now.

                                                                                                    • rs_rs_rs_rs_rs a day ago

                                                                                                      Yeah yeah yeah, everyone's a bot except you with all the right opinions...

                                                                                                      • somenameforme 10 hours ago

                                                                                                        This issue is independent of topic or side. Astroturfing is real. For instance you obviously don't just take Amazon reviews at face value. In the past doing such things in social media, including forums like this, was much more difficult because you needed to generate an entire persona around an account to keep it from being an immediately obvious inorganic account.

                                                                                                        And so the cost:reward there was relatively poor, leaving it to things like militaries and governments to carry it out for influence campaigns and whatnot. But LLMs have now completely changed the game. You can easily create an arbitrarily large number of passably believable personas and backstories, autonomously, with no real limitations on scale.

                                                                                                        This is obviously going to be abused when the stakes are sufficiently high. And in this case we're talking about a market that these companies likely believe to be worth trillions of dollars. And they can likely even convince themselves that what they're doing isn't immoral pretty easily, in the same way they convinced themselves that letting their software be used to kill people by the oh-so-ethical US military is perfectly cool. So why in the world wouldn't they 'inform people of the strengths of their product' on a wide scale?

                                                                                                        • Barbing a day ago

                                                                                                          hehe :)

                                                                                                          What's a good way to think about this? Because the billions of dollars at play do cross my mind - at the same time, I'm not a pessimist. I think my middle ground is kind of just the usual, taking things with a grain of salt. I mean, I chose to reply to this comment in good faith that it's human to human, commenter to unpaid/unaffiliated commenter.

                                                                                                          I hope I keep that faith. I hope our billions of neighbors on the web enable me to keep that faith over the coming years. Definitely uncertain about the future of the web but want to love it like I've loved it 1990s-today. (Guess I should volunteer w/the EFF while job hunting, try for for-purpose jobs...)

                                                                                                          • jmull a day ago

                                                                                                            I don’t know why you dismiss it. There is plenty of astroturfing here, bots and otherwise.

                                                                                                            I believe the rule around here is to not assume everyone who disagrees with you or has opinions you don’t understand is a shill. Perhaps there’s a bit of that in the post you replied to, but to me seems mostly about mourning the loss of quality conversations online.

                                                                                                            Gotta say, I agree. Not that things were ever great, but it’s really in the crapper now.

                                                                                                      • patates a day ago

                                                                                                        5.4, in my own testing, was almost always ahead of Opus 4.6 for reviews and planning. I'm on the Plus plan on OpenAI, so I couldn't test it very deeply. Could anyone with more experience on both chime in? Pros/cons compared to Opus? I'm invested in the Claude ecosystem, but the recent decreases in quality and session limits have me on edge.

                                                                                                        • azuanrb a day ago

                                                                                                          Same for me. I'm on the $20 plan for both and I use them interchangeably. Similar "intelligence" imo. Just a different way of doing things, that's all. But Claude is getting worse in terms of token usage, so I cancelled my plan last month.

                                                                                                          • conradkay a day ago

                                                                                                            Yeah it's probably a bit better overall. 5.4 is a month newer than Opus 4.6

                                                                                                            My guess is that 5.5 will come out soon and be significantly better so you'd want to be using Codex then, but then when Opus 5 comes out probably back to claude code

                                                                                                            Also 5.4 has fast mode, and higher usage limits since it's cheaper

                                                                                                            • psadauskas a day ago

                                                                                                              I use opencode, so can toggle between Claude and Codex fairly easily, and do so whenever one of them is having problems (until yesterday, that is, when Claude blocked opencode for good, and I cancelled my account). This means I'm using the same prompts and instructions for both.

                                                                                                              Personally, it seems like I have to redirect Opus/Sonnet much less often. GPT felt pretty "dense", it was more likely to ignore earlier instructions in the session, I had to remind it more often, and when I reviewed the code it produced I had to make more corrections that seemed obvious.

                                                                                                              Entirely subjective, but I also find I prefer Claude's "personality" to ChatGPT, but I couldn't point to any specific differences.

                                                                                                              • giwook a day ago

                                                                                                                Do you mind elaborating on your experience here?

                                                                                                                Just curious as I've often heard that Claude was superior for planning/architecture work while ChatGPT was superior for actual implementation and finding bugs.

                                                                                                                • patates a day ago

                                                                                                                  Claude makes more detailed plans that seem better if you just skim them, but when analyzed, has a lot of errors, usually.

                                                                                                                  It compensates for most of them during implementation if you make it use TDD, by using superpowers et al. or just telling it to do so.

                                                                                                                  GPT 5.4 makes simpler plans (compared to superpowers - a plugin from the official claude plugin marketplace - not the plan mode), but can better fill in the details while implementing.

                                                                                                                  Plan mode in Claude Code got much better in the last months, but the model cannot compensate for the missing details during implementation.

                                                                                                                  So my workflow has been:

                                                                                                                  Make claude plan with superpowers:brainstorm, review the spec, make updates, give the spec to gpt, usually to witness grave errors found by gpt, spec gets updates, another manual review, (many iterations later), final spec is written, write the plan, gpt finds mind boggling errors, (many iterations later), claude agent swarm implements, gpt finds even more errors, I find errors, fix fix fix, manual code review and red tests from me, tests get fixed (many iterations later) finally something usable with stylistic issues at most (human opinion)!

                                                                                                                  This happens with the most complex features that'd be a nightmare to implement even for the most experienced programmers, of course. For basic things, most SOTA models can one-shot anyway.

                                                                                                                  • giwook a day ago

                                                                                                                    Interesting. Have you ever had Claude re-review its plan after having it draft the original plan? Or do you give it to GPT right away to review?

                                                                                                                    Just curious as I'm trying to branch out from using Claude for everything, and I've been following a somewhat similar workflow to yours, except just having Claude review and re-review its plan (sometimes using different roles, e.g. system architect vs SWE vs QA eng) and it will similarly identify issues that it missed originally.

                                                                                                                    But now I'm curious to try this while weaving in more GPT.

                                                                                                              • satvikpendem a day ago

                                                                                                                The era of subsidization is over, it seems.

                                                                                                                For my money, on the code side at least, GitHub Copilot on VSCode is still the most cost effective option, 10 bucks for 300 requests gets me all I need, especially when I use OpenAI models which are counted as 1x vs Opus which is 3x. I've stopped using all other tools like Claude Code etc.
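
                                                                                                                Rough arithmetic on those numbers, as a minimal sketch. It assumes only the figures quoted above (10 dollars for 300 requests, 1x for OpenAI models, 3x for Opus); the constant and function names are made up for illustration and this is not official pricing.

```python
# Figures quoted in the comment above; treat them as assumptions, not official pricing.
MONTHLY_PRICE_USD = 10.0
MONTHLY_REQUESTS = 300
# Hypothetical labels for the two multipliers mentioned (1x and 3x).
MULTIPLIERS = {"openai_model": 1, "opus": 3}


def prompts_per_month(multiplier: int) -> int:
    """How many prompts the monthly allowance covers at a given multiplier."""
    return MONTHLY_REQUESTS // multiplier


def cost_per_prompt(multiplier: int) -> float:
    """Effective dollars per prompt at a given multiplier."""
    return MONTHLY_PRICE_USD * multiplier / MONTHLY_REQUESTS


for name, mult in MULTIPLIERS.items():
    print(f"{name}: {prompts_per_month(mult)} prompts/month, "
          f"${cost_per_prompt(mult):.3f} per prompt")
```

                                                                                                                Under those assumptions the same allowance covers 300 prompts at 1x or 100 prompts at 3x.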

                                                                                                                • giwook a day ago

                                                                                                                  I use both GH Copilot as well as CC extensively and it does seem more economical, though I wonder how long this will last as I imagine Github has also been subsidizing LLM usage extensively.

                                                                                                                  FWIW it feels like GH Copilot is a cheaper version of OpenRouter but with trade-offs like being locked into VSCode and the Microsoft ecosystem overall. I already use VSCode though and otherwise I don't see much downside to using GH Copilot outside of that.

                                                                                                                  • treesknees a day ago

                                                                                                                    You’re not locked into vscode. There are plugins for other IDEs, and a ‘copilot’ cli tool very similar to Claude Code’s cli tool.

                                                                                                                    I also wouldn’t say you’re locked into Microsoft’s ecosystem. At work we just have skills that allow for interaction with Bitbucket and other internal tooling. You’re not forced to use GitHub at all.

                                                                                                                    • satvikpendem a day ago

                                                                                                                      I'm hopeful because Microsoft already has a partnership with and owns much of OpenAI, so it can get their models at cost to host on Azure, which it already does, and pass the savings on to the user. This is why Opus is 3x as expensive in Copilot, because Microsoft needs to buy API usage from Anthropic directly.

                                                                                                                      • treesknees a day ago

                                                                                                                        I don’t think it’s API costs. Their Sonnet 4.6 is just 1x premium request which matches the 1x cost of the various GPT Codex models.

                                                                                                                        • satvikpendem a day ago

                                                                                                                          Sonnet is the worse model though, therefore it's expected that it is cheaper, the comparison would be Opus and GPT. That Anthropic's worse model is the same request cost as the best OpenAI model is what I mean when talking about Microsoft flexing their partnership.

                                                                                                                      • WithinReason a day ago
                                                                                                                        • sassymuffinz a day ago

                                                                                                                          You could use something like OpenCode (https://opencode.ai), which supports integration with Copilot.

                                                                                                                          • lossyalgo a day ago

                                                                                                                            > but with trade-offs like being locked into VSCode and the Microsoft ecosystem overall

                                                                                                                            You can use GH Copilot with most JetBrains IDEs.

                                                                                                                          • KellyCriterion a day ago

                                                                                                                            OT:

                                                                                                                            In general I view VS Code and VS.NET Community + SQL Server free universe as the most effective option :) I think these products are great actually.

                                                                                                                            • sassymuffinz a day ago

                                                                                                                              I tried Claude Code for a week straight recently to see what all the hype was about and while it pumped out a bunch of reasonable looking code and features I ended up feeling completely disconnected from my codebase and uncomfortable.

                                                                                                                              Cancelled the plan I had with them and happily went back to just coding like normal in VSCode with occasional dips into Copilot when a need arose or for rubber ducking and planning. Feels much better as I'm in full control and not trusting the magic black box to get it right or getting fatigue from reading thousands of lines of generated code.

                                                                                                                              Anyone who says they're able to review thousands of lines effectively that Claude might slop out in a day are lying to themselves.

                                                                                                                              • torben-friis a day ago

                                                                                                                                >Anyone who says they're able to review thousands of lines effectively that Claude might slop out in a day are lying to themselves.

                                                                                                                                The amount you can review before burning out is now the reasonable limit, for the same reason that a car is supposed to stay at the speed you can handle and not the max speed of the engine.

                                                                                                                                Of course, many people are secretly skipping reviews and some dare to publicly advocate for getting rid of them entirely.

                                                                                                                                • sassymuffinz a day ago

                                                                                                                                  > For the same reason that a car is supposed to stay at the speed you can handle and not the max speed of the engine.

                                                                                                                                  As we know with driving, sensible drivers stick to the speed limit most of the time, but there's a good percentage of knuckle draggers who just love speeding, some people get drunk, some just drive the wrong way down the highway entirely. Either way it's usually the sensible people who end up suffering.

                                                                                                                                  • ethbr1 a day ago

                                                                                                                                    > The amount you can review before burning out is now the reasonable limit

                                                                                                                                    I realized this is the crux of our moment, because a variant of Amdahl's law applies to AI code gen.

                                                                                                                                    {time gained} = {time saved via gen AI} - {time spent in human review}

                                                                                                                                    There's no way that results in a positive number with 100% human review coverage, which means that human review coverage is headed to < 100% (ideally as low as possible).
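
                                                                                                                                    A minimal sketch of that formula with review coverage as a knob. The numbers (8 hours saved, 10 hours for a full review) and the assumption that review time scales linearly with coverage are made up for illustration, not measurements.

```python
def time_gained(hours_saved_by_gen: float,
                full_review_hours: float,
                review_coverage: float) -> float:
    """{time gained} = {time saved via gen AI} - {time spent in human review}.

    Assumes (hypothetically) that review time scales linearly with the
    fraction of generated code a human actually reviews.
    """
    if not 0.0 <= review_coverage <= 1.0:
        raise ValueError("review_coverage must be between 0 and 1")
    return hours_saved_by_gen - full_review_hours * review_coverage


# Hypothetical inputs: generation saves 8 hours, a full review would cost 10.
for coverage in (1.0, 0.5, 0.25):
    print(f"coverage={coverage:.2f} -> net hours gained: "
          f"{time_gained(8.0, 10.0, coverage):+.1f}")
```

                                                                                                                                    With these made-up inputs the net is negative at full coverage and turns positive as coverage drops; whether real review costs look like this is exactly what the thread is arguing about.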

                                                                                                                                    • torben-friis a day ago

                                                                                                                                      I'm not sure that's so certain. I think just by virtue of LLMs being a better copy-paste/integrated Stack Overflow you can get a speed boost (prompts like "generate a similar test to this one checking condition X").

                                                                                                                                      The question is whether humans can sensibly judge the break even point and not generate faster than that. It's very easy to get lost in the woods and suddenly have a bunch of generated stuff you no longer grok.

                                                                                                                                      • ethbr1 10 hours ago

                                                                                                                                        But it boils down to 'can you review code faster than you write it'?

                                                                                                                                        To which, yes, everyone does in practice, but that's because we were relying on juniors and peers being human and making only human-style mistakes.

                                                                                                                                  • coreyburnsdev a day ago

                                                                                                                                    why not just use it to review your codebase/commits/prs? you don't have to let it write a bunch of code for you necessarily.

                                                                                                                                    • recursive a day ago

                                                                                                                                      I wouldn't trust it. When I do check its work, I often find factual or correctness errors. No way it's going to be the last step of defense against its own mistakes. I mean for me. Other people seem to have more luck. I'm probably still holding it wrong.

                                                                                                                                      • sassymuffinz a day ago

                                                                                                                                        That's my point - it's great as a tool to talk something through or rubber duck it, but as soon as you just let it loose to slop out thousands of lines a day and never read them, all you're really doing is filling your base with thousands of lines of technical debt.

                                                                                                                                      • bossyTeacher a day ago

                                                                                                                                        I think most people doing what is now called agentic development, aren't following most established dev methodologies and are to a great extent playing it by vibe.

                                                                                                                                        The codebase disconnect is real.

                                                                                                                                        We are like blue collar workers that need to hit the gym to maintain the body that our cavemen ancestors could maintain by doing their daily duties.

                                                                                                                                        Codebase gym sessions might become a thing.

                                                                                                                                        • bluegatty a day ago

                                                                                                                                          I don't like calling a posture 'ignorant', but I think that's what we have here. I don't mean that as an insult.

                                                                                                                                          It's likely you didn't learn how to use the tool properly, and I'd suggest 'trying again' because not using AI soon will be tantamount to digging holes with shovels instead of using construction equipment. Yes, we still need our 'core skills', but we're not going to be able to live without the leverage of AI.

                                                                                                                                          Yes - AI can generate slop, and probably too many Engineers do that.

                                                                                                                                          Yes - you can 'feel a loss of control' but that's where you have to find your comfort zone.

                                                                                                                                          It's generally a bad idea to produce 'huge amounts of code' - unless it's perfectly consistent with a design, and he architecture is derived from well-known conventions.

                                                                                                                                          Start by using it as an 'assistant' aka research, fill in all the extra bits, and get your testing going.

                                                                                                                                          You'll probably want to guide the architecture, and at least keep an eye on the test code.

                                                                                                                                          Then it's a matter of how much further 'up' you can go.

                                                                                                                                          There are few situations in which we should be 'accepting' large amounts of code, but some of it can be reviewed quickly.

                                                                                                                                          The AI, already now in 2026, can write better code than you at the algorithmic level - it will be tight, clean, 'by the book' and far less likely to have errors.

                                                                                                                                          It fails at the architectural and modular level still, that will probably change.

                                                                                                                                          The AI 'makes a clean cut' in the wood, tighter to the line than any carpenter could - like a power tool.

                                                                                                                                          A carpenter that does not use power tools is an 'artisanal craftsperson', not really building functional things.

                                                                                                                                          This is the era of motor cars, there is really no option - I don't say that because I'm pro or anti anything, AI is often way over-hyped - that's something else entirely.

                                                                                                                                          It's like the web / cloud etc. it's just 'imminent'.

                                                                                                                                          So try again, experiment, stay open minded.

                                                                                                                                          • 000ooo000 a day ago

                                                                                                                                            >It's likely you didn't learn how to use the tool properly

                                                                                                                                            Yeah this gets rolled out every time. Boosters love to pitch LLM dev as some difficult, new 'skill' that must be learned, mastered, revered.

                                                                                                                                            • bluegatty 21 hours ago

                                                                                                                                              New tools and techniques have to be learned.

                                                                                                                                              The entire industry is moving towards integrating these new platforms, because they obviously work.

                                                                                                                                              It's perfectly reasonable to find problems with AI use, perfectly reasonable to 'not actually want' to use it, but it's basically irresponsible to reject the notion outright.

                                                                                                                                            • sassymuffinz a day ago

                                                                                                                                              Like I said I still use Copilot as needed, I just don't trust Claude to go off on its own and generate a mountain of technical debt that I can't 100% trust.

                                                                                                                                              To use your own analogy, there's plenty of carpenters still around for when someone needs something doing properly and bespoke, even though we can all go to Ikea, or any other flat pack furniture company, to get wobbly furniture cheaply at any time.

                                                                                                                                              I'd rather be the last carpenter charging a liveable wage, working on interesting problems for clients who appreciate a human touch than just pumping out mountains of slop to keep up with the broligarchy. If that makes me ignorant that's fine, but I'll be happily enjoying the craft while you're worrying about your metrics.

                                                                                                                                              • bluegatty a day ago

                                                                                                                                                You're offering to deliver parcels by horse - thinking that somehow your 'delivering is better because it's more natural' and that your customers will appreciate it, over the 'smog' that the cars create.

                                                                                                                                                Or in other words - 'non existent'.

                                                                                                                                                It is arrogant and luddite to suggest that 'using AI is not doing it properly' or that anyone will care.

                                                                                                                                                They care that it's done well - that's it.

                                                                                                                                                FYI, the code that AI produces is probably better than what you produce - at least at a functional level.

                                                                                                                                                'Artisanility' is worthless in 'code' - there are no 'winding staircases' for us to custom build, as a master carpenter would.

                                                                                                                                                Where you can continue to 'write code by hand' is for very arcane, things, but even then you're still going to have to use AI for a lot of things in support of that.

                                                                                                                                                So if you want to get into compiler design - sure.

                                                                                                                                                But still - without mastery of AI, you'll be left behind.

                                                                                                                                                At least with horses, there's a naturalist component; with 'code', nobody cares at all. There's zero interest in it, there's no 'organic' angle to sell.

                                                                                                                                                • sassymuffinz a day ago

                                                                                                                                                  Maybe in your industry but in mine working with small and medium businesses they value reliability above everything else. They don't give a shit whether you use AI or not as long as it's stable and works and are prepared to pay a premium for someone who knows what they're doing.

                                                                                                                                                  If you want to have a race to the bottom and be Sam Altman's lap dog, that's your business.

                                                                                                                                                  • eudamoniac a day ago

                                                                                                                                                    I really don't understand why you people always say these things so matter of factly. I'd put a lot of money (and do, in the markets) on you being wrong. I'm pretty sure in ten years I will not have a problem keeping a software job without using AI.

                                                                                                                                                    • sassymuffinz a day ago

                                                                                                                                                      Maybe it's because of the environment they work in or their understanding of other people because of the business they're in.

                                                                                                                                                      Your average 50 year old business owner doesn't understand AI at all and doesn't care to know, he's too busy thinking about getting a new order for 5000 widgets that he invented. What he needs is a website with inventory management, some sort of email marketing software, some sort of CRM, maybe a dashboard or something. What he wants to do is pick up the phone to someone and get them to take care of it for a reasonable price.

                                                                                                                                                      AI is coming for programmers with no social skills, but it isn't coming for the human relationship side of the business where you need to have a few meetings to work out what they want to achieve, build a plan that works long term, have a call with other third parties or their vendors etc to alleviate pain points and then build a project around the business needs that won't crash every five minutes and leak their internal information because Claude decided security was optional.

                                                                                                                                                      Half my job is understanding what they need and then instead of accepting their original scope, building a brand new scope in collaboration with them to fit the business needs long term. If one of these guys just wants to plow the original scope into Claude and let it rip then the customer isn't getting what they need.

                                                                                                                                                  • bitwize a day ago

                                                                                                                                                    No one is paying a liveable wage for purely human-authored code anymore. This is the job now, and you are far more effective with these tools than without. If you still have an issue with their output, that's a PEBKAC and you need to upskill and/or attitude adjust. Stop thinking like a programmer and start thinking like a business person. Delegate! It doesn't matter if the machine wrote code just the way you would have, only that it gets you closer to the goal, and the machine can help with vetting and assuring that it does. If you choose to remain stubborn and closed-minded, what you will find is that clients will not care about the "human touch" in their code, and some AI-assisted consultant will come along and deliver more for less money, drinking your entire fucking milkshake.

                                                                                                                                                    In 2005, Tim Bryce wrote that programmers were by and large a lazy, discipline-averse lot who are of average intelligence at best but get very precious about their "craft", not realizing that it's only a small part of a greater whole and it's the business people who drive actual value in a company. AI is proving him 100% correct.

                                                                                                                                                    • sassymuffinz a day ago

                                                                                                                                                      Interesting - I'm still making a very liveable wage building projects for small and medium companies, because they don't know their AI from their elbow and don't want to know, they have their own business to run.

                                                                                                                                                      You forget that templates and off the shelf SAAS products have been around forever and yet I'm still here getting work because there's always a catch and it always shits the bed.

                                                                                                                                                      You mention that I must have a user/skill issue because the AI can't be trusted. I had to explain multiple times to Claude during my work that it had left a very obvious security hole in a controller and in a different policy. Stop pretending it's some sort of super intelligence, they can't even do a timer bro and OpenAI is laughing at you while taking your money.

                                                                                                                                                      • swdf a day ago

                                                                                                                                                        "Stop thinking like a programmer and start thinking like a business person."

                                                                                                                                                        Lmao software engineers are engineers because it's not their job to be the business guy. Man you have been here since 2007 but you sound like an absolute bozo.

                                                                                                                                                        FYI I am a CEO and I would never expect my engineers to be thinking like a business person - that's my job. Their job is to go make my vision a reality whilst ensuring the product is trusted and so on.

                                                                                                                                                        • eudamoniac a day ago

                                                                                                                                                          This sort of angry post, with many demands ("attitude adjust", "delegate") and invectives ("closed-minded", "lazy"), never appeared for any other technology shift. React devs never posted like this about jQuery devs. Mobile app devs never posted like this about mobile web devs. Yet tons of AI users post like this about non-AI-using devs.

                                                                                                                                                          Is it some kind of fear or doubt? It's a strange phenomenon.

                                                                                                                                                          Like for example I strongly believe Typescript is better than Javascript and needs to be used instead for any serious project. But if someone says they don't like it, I cannot imagine myself writing a post like yours about it. First of all I don't care what they use, but second of all if I really wanted to convince them it would not look anything like this. Your post and many like it read like anger and condescension and incredulity.

                                                                                                                                                          • sassymuffinz a day ago

                                                                                                                                                            He's drunk the kool aid and forgotten that some of us have been working in this industry for decades and got along just fine without AI. While he's busy debugging his technical debt and getting sued for leaking customer data, I'll just be over here quietly enjoying coding for customers who like dealing with human beings and not black box robots.

                                                                                                                                                            • swdf a day ago

                                                                                                                                                              Indeed it is quite bizarre. Why are they so emotionally charged? I don't quite get it. Frankly if they are so confident in what they say, why not just watch from the sidelines and laugh at the people who get bulldozed?

                                                                                                                                                              Methinks something far more bizarre is afoot.

                                                                                                                                                              I'm not even a SWE btw so I have no financial interest here, but I can see how bizarre his post is.

                                                                                                                                                    • dismalaf a day ago

                                                                                                                                                      > The era of subsidization is over

                                                                                                                                                      Of course it is. Returns are diminishing, AGI isn't happening with current techniques but it is good enough to sell, so it's time to monetize. I just got an email from OpenAI as well about ads in their free tier (I signed up once out of curiosity).

                                                                                                                                                      • bluegatty a day ago

                                                                                                                                                        Subsidization is not nearly over.

                                                                                                                                                        It's true AGI is 'not happening' but it doesn't matter.

                                                                                                                                                        Demand for AI is explosive, sales are skyrocketing.

                                                                                                                                                        We have another 5-8 years of this crazy investment stuff.

                                                                                                                                                        Altman will step aside before they turn into a 'normal company'.

                                                                                                                                                        Like they did at Uber.

                                                                                                                                                        • rvz a day ago

                                                                                                                                                          > AGI isn't happening with current techniques but it is good enough to sell, so it's time to monetize.

                                                                                                                                                          Or perhaps it was a scam in the first place for an IPO.

                                                                                                                                                        • deadbabe a day ago

                                                                                                                                                          Not over yet. More hikes will come. It will reach $1000.

                                                                                                                                                          • satvikpendem a day ago

                                                                                                                                                            That's what I said by subsidization being over.

                                                                                                                                                            • zamadatix a day ago

                                                                                                                                                              Can you expand what you mean by "subsidization being over" in terms of the plan prices?

                                                                                                                                                              - Plus is still the same $20

                                                                                                                                                              - 20x Pro is still the same $200

                                                                                                                                                              - The new 5x tier is $100

                                                                                                                                                              https://help.openai.com/en/articles/9793128-what-is-chatgpt-... is probably a better direct comparison of the 3

                                                                                                                                                              • conradkay a day ago

                                                                                                                                                                Codex had 2x usage until April 1, I think when that ended there were a lot more people (like myself) who were fine on $20 but now want more usage

                                                                                                                                                                • satvikpendem a day ago

                                                                                                                                                                  They're trying to slowly move up market. I assume the $20 plan will soon get its own restrictions (and/or ads) to get people to pay $100.

                                                                                                                                                                  • zamadatix a day ago

                                                                                                                                                                    The same could have been claimed in 2024 when they introduced the $200 Pro plan. Nothing is over yet just because of what could possibly happen next.

                                                                                                                                                                    • satvikpendem a day ago

                                                                                                                                                                      Yes, and now they (will) have ads in the lower tiers and eventually will add them in higher tiers as well. It's no different to Netflix's methods. Therefore the statement would have been proven right if it were claimed back then.

                                                                                                                                                                      • zamadatix a day ago

                                                                                                                                                                        But these are, again, additional tiers.

                                                                                                                                                                        You can't just say because they've added more things the old things are over - the old things actually have to go away first. Eventually they may get there (or not). It may be another few years (or not). Nothing is actually now over though any more than it was now over in 2024.

                                                                                                                                                                        • satvikpendem a day ago

                                                                                                                                                                          Additional tiers to get people to move up as they enshittify the lower tiers. We already see it in other companies as well as OpenAI themselves so my inference is based on that, not to wait and see until they do indeed enshittify it.

                                                                                                                                                                          • zamadatix a day ago

                                                                                                                                                                            That the inference would say these existing paid tiers should have already enshittified with the 2024 $200 Pro announcement is precisely why one does need to wait & see.

                                                                                                                                                                • operatingthetan a day ago

                                                                                                                                                                  The subsidization being "over" would mean we are paying their actual cost or more.

                                                                                                                                                                  • satvikpendem a day ago

                                                                                                                                                                    I see what you mean, I misread your initial reply.

                                                                                                                                                              • creddit a day ago

                                                                                                                                                                Pro used to be $200.

                                                                                                                                                              • 2001zhaozhao a day ago

                                                                                                                                                                The title is misleading. The only thing they seem to have done was add a $100 plan identical to Claude's, which gives 5x usage of ChatGPT Plus. There is still a $200 plan that gives 20x usage.

                                                                                                                                                                • jstummbillig a day ago

                                                                                                                                                                  That is not the "only" thing: You get access to GPT-5.4 pro.

                                                                                                                                                                  • giwook a day ago

                                                                                                                                                                    Just to clarify, one does not get access to the pro model on the Pro plan?

                                                                                                                                                                    • carbocation a day ago

                                                                                                                                                                      The $20 Plus plan still exists, and does not give access to the pro model.

                                                                                                                                                                      The $200 Pro plan still exists, and does give access to the pro model.

                                                                                                                                                                      What is new is a $100 Pro plan that does give access to the pro model, with lower usage limits than the $200 Pro plan.

                                                                                                                                                                      • dimmke a day ago

                                                                                                                                                                        This is still worse than Anthropic's, right? Because you get access to their top model even at the $20 price point.

                                                                                                                                                                        • Tiberium a day ago

                                                                                                                                                                          It's not worse, Anthropic simply has no equivalent model (if you don't consider Mythos) of GPT 5.4 Pro. Google does though: Gemini 3.1 Deep Think.

                                                                                                                                                                          GPT 5.4 Pro is extremely slow but thorough, so it's not meant for the usual agentic work, rather for research or solving hard bugs/math problems when you provide it all the context.

                                                                                                                                                                          • giwook a day ago

                                                                                                                                                                            I'm genuinely asking, when you say Gemini 3.1 DT is an equivalent model of GPT 5.4 Pro, is there a specific benchmark/comparison you're referring to or is this more anecdotal?

                                                                                                                                                                            And do you mean to say that you don't really use GPT 5.4 Pro unless it's for a hard bug? Curious which models you use for system design/architecture/planning vs execution of a plan/design.

                                                                                                                                                                            TIA! I'm still trying to figure out an optimal system for leveraging all of the LLMs available to us as I've just been throwing 100% of my work at Claude Code in recent months but would like to branch out.

                                                                                                                                                                            • simianwords a day ago

                                                                                                                                                                              The Pro and DT models are equivalent because:

                                                                                                                                                                              - internally, both use the same best-of-N architecture

                                                                                                                                                                              - neither is available in a coding harness like Codex, only in the UI (GPT also has an API)

                                                                                                                                                                              - GPT-5.4 pro is extremely expensive: $30.00 input vs $180.00 output

                                                                                                                                                                              - both DT and Pro are really good at solving math problems
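
                                                                                                                                                                              To put the quoted prices in perspective, here is a rough cost sketch. It assumes the $30.00 / $180.00 figures are per one million input / output tokens (the unit is not stated above), and the token counts and function name are made up for illustration.

                                                                                                                                                                              ```typescript
                                                                                                                                                                              // Assumption: quoted prices are USD per 1,000,000 tokens (unit not stated in the thread).
                                                                                                                                                                              const INPUT_USD_PER_MTOK = 30.0;
                                                                                                                                                                              const OUTPUT_USD_PER_MTOK = 180.0;

                                                                                                                                                                              // Estimate the cost of a single request from its token counts.
                                                                                                                                                                              function estimateCostUsd(inputTokens: number, outputTokens: number): number {
                                                                                                                                                                                return (
                                                                                                                                                                                  (inputTokens / 1e6) * INPUT_USD_PER_MTOK +
                                                                                                                                                                                  (outputTokens / 1e6) * OUTPUT_USD_PER_MTOK
                                                                                                                                                                                );
                                                                                                                                                                              }

                                                                                                                                                                              // Hypothetical request: 200k tokens of context in, 50k tokens out.
                                                                                                                                                                              console.log("$" + estimateCostUsd(200000, 50000).toFixed(2));
                                                                                                                                                                              ```

                                                                                                                                                                              Under that per-million assumption, output tokens dominate the bill for any long-running answer, since they cost six times as much as input tokens.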

                                                                                                                                                                        • irishcoffee a day ago

                                                                                                                                                                          So, reading the tea leaves, they're either losing subscribers for the $200 plan, or they're not following the same hockey stick path of growth they thought they were... maybe?

                                                                                                                                                                          Edit: I wonder if this is actually compute-bound as the impetus

                                                                                                                                                                          • tedsanders a day ago

                                                                                                                                                                            Nope, it's just that a lot of people (especially those using Codex) asked us for a medium-sized $100 plan. $20 felt too restrictive and $200 felt like a big jump.

                                                                                                                                                                            Pricing strategy is always a bit of an art, without a perfect optimum for everyone:

                                                                                                                                                                            - pay-per-token makes every query feel stressful

                                                                                                                                                                            - a single plan overcharges light users and annoyingly blocks heavy users

                                                                                                                                                                            - a zillion plans are confusing / annoying to navigate and change

                                                                                                                                                                            This change mostly just adds a medium-sized plan for people doing medium-sized amounts of work. People were asking for this, and we're happy to deliver.

                                                                                                                                                                            (I work at OpenAI.)

                                                                                                                                                                            • aryehof 17 hours ago

                                                                                                                                                                              Did you modify the Plus plans usage recently or as part of this introduction? Given that Pro plans usage are multiples of it (5x/20x) and given reports of less Plus usage, clarification would be appreciated?

                                                                                                                                                                              Transparency on this sort of thing is the best way to address negative company sentiment.

                                                                                                                                                                              • tedsanders 16 hours ago

                                                                                                                                                                                I'm honestly not sure, as I don't work on it. My understanding from afar is:

                                                                                                                                                                                - There was a 2x promotion in March that ended on April 2, so limits have felt tighter since then

                                                                                                                                                                                - We sometimes reset rate limits after bugs or milestones or because Tibo feels generous, which can make some days feel different than others (they are typically announced here: https://x.com/thsottiaux)

                                                                                                                                                                                - Recently Plus was tweaked to have a smaller 5h limit but an increased weekly limit

                                                                                                                                                                                - Lastly, as part of the new Pro launch, the $100 & $200 Pro tiers are getting a 2x promotion, meaning they are temporarily 10x/40x instead of 5x/20x

                                                                                                                                                                                I've asked our team to clarify the pricing page. Agree it's not clear.
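
                                                                                                                                                                                The multipliers above work out as follows. This is only a sketch: the prices and base multipliers are the ones quoted in this thread, the 2x factor is the temporary promotion, and the type and function names are illustrative.

                                                                                                                                                                                ```typescript
                                                                                                                                                                                // Figures as quoted in this thread; names are illustrative, not an official API.
                                                                                                                                                                                interface Tier {
                                                                                                                                                                                  name: string;
                                                                                                                                                                                  usdPerMonth: number;
                                                                                                                                                                                  baseMultiplier: number; // usage relative to Plus
                                                                                                                                                                                  promoFactor: number; // 1 means no promotion
                                                                                                                                                                                }

                                                                                                                                                                                const tiers: Tier[] = [
                                                                                                                                                                                  { name: "Plus", usdPerMonth: 20, baseMultiplier: 1, promoFactor: 1 },
                                                                                                                                                                                  { name: "Pro $100", usdPerMonth: 100, baseMultiplier: 5, promoFactor: 2 },
                                                                                                                                                                                  { name: "Pro $200", usdPerMonth: 200, baseMultiplier: 20, promoFactor: 2 },
                                                                                                                                                                                ];

                                                                                                                                                                                // Usage multiplier relative to Plus, with or without the promotion applied.
                                                                                                                                                                                function usageMultiplier(tier: Tier, withPromo: boolean): number {
                                                                                                                                                                                  return tier.baseMultiplier * (withPromo ? tier.promoFactor : 1);
                                                                                                                                                                                }

                                                                                                                                                                                // Dollars paid per "one Plus plan's worth" of usage.
                                                                                                                                                                                function usdPerPlusUnit(tier: Tier, withPromo: boolean): number {
                                                                                                                                                                                  return tier.usdPerMonth / usageMultiplier(tier, withPromo);
                                                                                                                                                                                }

                                                                                                                                                                                tiers.forEach((tier) => {
                                                                                                                                                                                  console.log(
                                                                                                                                                                                    tier.name + ":",
                                                                                                                                                                                    usageMultiplier(tier, false) + "x normally,",
                                                                                                                                                                                    usageMultiplier(tier, true) + "x during the promotion,",
                                                                                                                                                                                    "$" + usdPerPlusUnit(tier, false) + " per Plus-equivalent normally"
                                                                                                                                                                                  );
                                                                                                                                                                                });
                                                                                                                                                                                ```

                                                                                                                                                                                On these numbers, the $100 tier costs the same per unit of usage as Plus once the promotion ends, while the $200 tier costs half as much per unit.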

                                                                                                                                                                              • irishcoffee a day ago

                                                                                                                                                                                Thanks for the response. I tried to phrase my postulations as just that; I didn’t intend to be accusatory.

                                                                                                                                                                                You like the job? How’s the day-to-day go? Yanking tickets or more organic?

                                                                                                                                                                                • tedsanders a day ago

                                                                                                                                                                                  All good, I interpreted it as postulation and not accusation. :)

                                                                                                                                                                                  I do like the job! Much more organic than yanking tickets, though I'm on the model training side of things, rather than product side. Always a balance between short-term sprints patching bad behaviors for the next model vs long-term investments in infra and science that make future work easier. Sometimes the negative press gets to me a bit (it's a very different feeling than 2022 or 2023), but my goal is just to make the most useful product I can for people. It's been wild how much Codex has already changed my day-to-day work, I'm so curious to see what it looks like in 2030 or 2040.

                                                                                                                                                                                  • irishcoffee 10 hours ago

                                                                                                                                                                                    What kind of bad behaviors? How is the whole SDLC there? I imagine, given that this tech is kind of redefining how software is being written, it's not your standard workflow pipeline? Are there code reviews at all? Have you been in any particularly interesting meetings about how you're trying to "shape" the models?

                                                                                                                                                                                    I won't misrepresent myself, I've never spent a penny on any of these services. I am just super curious what it's like to work at one of these frontrunner companies. I bet it's pretty neat.

                                                                                                                                                                              • alyxya a day ago

                                                                                                                                                                                Plenty of people wanted to spend more than $20 but less than $200 for a plan. It's long overdue IMO.

                                                                                                                                                                            • patates a day ago

                                                                                                                                                                              The Plus plan doesn't get the pro model, which is (AFAICT) the same 5.4 model but thinks, like, a lot.

                                                                                                                                                                              • jgalt212 a day ago

                                                                                                                                                                                You're trying to make words mean what we all think they mean. Stop foisting your Textualism upon us!

                                                                                                                                                                              • J_Shelby_J a day ago

                                                                                                                                                                                Will they fix the pro model so it actually finishes the last step instead of hanging for 10-20m doing nothing?

                                                                                                                                                                                Its only use case now is when you can walk away for an hour.

                                                                                                                                                                                • taoh a day ago

                                                                                                                                                                                  Does GPT-5.4 pro give a much better result in some circumstances? What are its typical uses in your experience?

                                                                                                                                                                                  • dyauspitr a day ago

                                                                                                                                                                                    If you want it to deeply research something, pro is great. I had a problem with my oven that I just couldn’t track down, so I gave it a lot of information and it went off on its own for about 2 hours, then gave me what I needed to fix it (the fan was turning off too quickly, which was causing the panel to overheat). I have no idea how it figured it out, and I couldn’t find anything after hours of googling, so it was very impressive. Even once I knew what the problem was, I went back and googled for it and still couldn’t find the solution that it came up with.

                                                                                                                                                                                    • taoh a day ago

                                                                                                                                                                                      Thanks for sharing this experience. Does it cost a lot of token in the deep analysis - which will make the $100 plan much quicker to drain all budgets.

                                                                                                                                                                                      • dyauspitr a day ago

                                                                                                                                                                                        I think it’s going to be very hard to blow through your tokens just using chat. I mostly bought the plan so I could use Codex and on the $200 a month plan I’ve basically been using it 15 hours a day almost nonstop and I don’t run out of tokens for the week.

                                                                                                                                                                                • exitb a day ago

                                                                                                                                                                                  Notably, up until now Pro had 6x usage of Plus. So the title is only slightly misleading.

                                                                                                                                                                                  On the other hand, the benchmark of Plus usage seems to be to be all over the place, so it’s difficult to say now how does the usage compare to the old Pro.

                                                                                                                                                                                  • strongpigeon a day ago

                                                                                                                                                                                    You’re right. I missed the “From $100”. Edited title.

                                                                                                                                                                                    • creamyhorror 19 hours ago

                                                                                                                                                                                      r/codex is reporting that $20 (Plus) seems to have had its usage limit reduced (some people are saying it feels like 1/3 the previous limit now). The theory[1] is that reducing $20's limit lets them claim $200 has 20x $20's limit (and $100 has 10x).

                                                                                                                                                                                      If that's true, then the value comparison is not so positive for Codex any more

                                                                                                                                                                                      [1] https://old.reddit.com/r/codex/comments/1sgxy71/so_did_they_...

                                                                                                                                                                                      • selectively a day ago

                                                                                                                                                                                        Oh. Yikes.

                                                                                                                                                                                      • pseudosavant a day ago

                                                                                                                                                                                        That has me quite tempted. In general, I stay under the Plus limits, but I do watch my consumption. I could use `/fast` mode all of the time, with extra high reasoning, and use gpt-5.4-pro for especially complex tasks. It wasn't worth 10x the price to me before, but 5x is approachable.

                                                                                                                                                                                        • jstummbillig a day ago

                                                                                                                                                                                          I think you currently can't use pro inside codex, or can you?

                                                                                                                                                                                          • pseudosavant a day ago

                                                                                                                                                                                            Good question. I'd hope/expect that you could, but that doesn't mean much.

                                                                                                                                                                                            • simianwords a day ago

                                                                                                                                                                                              no you can't use pro in the harness.

                                                                                                                                                                                              • pseudosavant a day ago

                                                                                                                                                                                                That is disappointing. I find myself mostly using Codex these days. ChatGPT still for one off questions/prompts, but usually complex problems require context, like files and tools I already have on my system, and Codex is way better for that.

                                                                                                                                                                                                • simianwords 15 hours ago

                                                                                                                                                                                                  I think pro is really subsidised and they don't want to encourage it in the harness.

                                                                                                                                                                                            • codybontecou a day ago

                                                                                                                                                                                              That's an odd choice. I wonder why.

                                                                                                                                                                                          • gizmodo59 a day ago

                                                                                                                                                                                            Gpt 5.4 high with fast mode in codex app is hands down the best way to do anything coding or non coding. If you have not tried it you are missing out. 100$ well spent. Claude code is too hyped up on HN.

                                                                                                                                                                                            • sourcecodeplz a day ago

                                                                                                                                                                                              I like that they kept limited access to Codex even on free tier.

                                                                                                                                                                                              LE: Someone said this is how the tiers are now counted:

                                                                                                                                                                                              "Essentially if old plus is 1x then new limits are: Plus - 0.3x, Pro $100 - 1.5x, Pro $200 - 6x (unchanged)"

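                                                                                                                                                                                              The quoted figures are consistent with the advertised 5x/20x multipliers if the baseline moved. A minimal sketch of that arithmetic, where the 0.3 factor is the unverified claim from the quote and the names are illustrative:

                                                                                                                                                                                              ```typescript
                                                                                                                                                                                              // Unverified claim from the quote: new Plus offers 0.3x the usage of old Plus.
                                                                                                                                                                                              const NEW_PLUS_VS_OLD_PLUS = 0.3;

                                                                                                                                                                                              // Advertised multipliers, which are relative to the *new* Plus plan.
                                                                                                                                                                                              const advertised = [
                                                                                                                                                                                                { plan: "Plus", vsNewPlus: 1 },
                                                                                                                                                                                                { plan: "Pro $100", vsNewPlus: 5 },
                                                                                                                                                                                                { plan: "Pro $200", vsNewPlus: 20 },
                                                                                                                                                                                              ];

                                                                                                                                                                                              // Convert a multiplier relative to new Plus into one relative to old Plus.
                                                                                                                                                                                              function relativeToOldPlus(vsNewPlus: number): number {
                                                                                                                                                                                                return vsNewPlus * NEW_PLUS_VS_OLD_PLUS;
                                                                                                                                                                                              }

                                                                                                                                                                                              advertised.forEach((entry) => {
                                                                                                                                                                                                console.log(entry.plan + ": " + relativeToOldPlus(entry.vsNewPlus).toFixed(1) + "x old Plus");
                                                                                                                                                                                              });
                                                                                                                                                                                              ```

                                                                                                                                                                                              Under that assumption, 5x and 20x of the new Plus come out at 1.5x and 6x of the old Plus, so the $200 tier would be unchanged in absolute terms while only the baseline shrank.
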
                                                                                                                                                                                              • xur17 a day ago

                                                                                                                                                                                                Any idea what "5x or 20x more usage" means?

                                                                                                                                                                                                • josh_p a day ago

                                                                                                                                                                                                  What’s the difference between the two Pro plans?

                                                                                                                                                                                                  Both Pro plans include the same core capabilities. The main difference is usage allowance: Pro $100 unlocks 5x higher usage than Plus (and 10x Codex usage vs. Plus for a limited time), while Pro $200 unlocks 20x usage than Plus.

                                                                                                                                                                                                  From their faq

                                                                                                                                                                                                  • terramex a day ago

                                                                                                                                                                                                    5x more usage than in Plus is 100$

                                                                                                                                                                                                    20x more usage than in Plus is 200$

                                                                                                                                                                                                    I see this when I try to upgrade my Plus subscription.

                                                                                                                                                                                                    • reed1234 a day ago

                                                                                                                                                                                                      If you pay 200 you get 20x

                                                                                                                                                                                                      • recursive a day ago

                                                                                                                                                                                                        The price is $100 according to this post. Where is there an option for $200?

                                                                                                                                                                                                        • orphea a day ago

                                                                                                                                                                                                          You choose on checkout. There it says

                                                                                                                                                                                                              Plan details
                                                                                                                                                                                                          
                                                                                                                                                                                                              5x more usage than Plus        20x more usage than Plus
                                                                                                                                                                                                              $120/month                     $200/month
                                                                                                                                                                                                          • recursive a day ago

                                                                                                                                                                                                            So curious that the cost in the comparison is just a flat $100, not "$100 or $200" and yet the usage has the "or". Surely just a lapse in copy editing.

                                                                                                                                                                                                            • AstroBen a day ago

                                                                                                                                                                                                              Surely they weren't trying to be deceptive... surely.

                                                                                                                                                                                                              • conradkay a day ago

                                                                                                                                                                                                                Anthropic is the exact same way, I think they're just trying to avoid having 5 different subscription tiers visible. Probably needing 20x is very niche

                                                                                                                                                                                                              • layer8 a day ago

                                                                                                                                                                                                                It states “From $100”. Standard pricing speak.

                                                                                                                                                                                                                • recursive a day ago

                                                                                                                                                                                                                  Unfortunately also standard pricing speak to make the "From" 20% the font size and decreased contrast. Maybe they learned it from car marketing.

                                                                                                                                                                                                            • AstroBen a day ago

                                                                                                                                                                                                              seems like this $100 replaced the $200 plan

                                                                                                                                                                                                              So.. cheaper?

                                                                                                                                                                                                              • readitalready a day ago

                                                                                                                                                                                                                No, the same $200 plan is still there. They hid it behind the $100 click-through.

                                                                                                                                                                                                                This just adds a $100 plan that's 1/4 the usage of the $200 plan..

                                                                                                                                                                                                          • recursive a day ago

                                                                                                                                                                                                            I assume it means 5x if they get to choose. They're the ones enforcing the limits.

                                                                                                                                                                                                            • rvz a day ago

                                                                                                                                                                                                              Suppose you enter a casino and the owner welcomes you in and sees that you are a frequent loyal s̶p̶e̶n̶d̶e̶r̶ customer (with the amount of tokens you are spending a month) with an existing membership.

                                                                                                                                                                                                              With this new VIP membership that comes with 5x or 20x usage, if you spend $100 you get 5x. $200 you get 20x and you get to spin the wheel and use the slot machines unlimited times even at peak hours more than most without any restrictions, 24/7, no waiting for hours with priority.

                                                                                                                                                                                                              So spend more to get more abundance and more simultaneous spins at the wheel.

                                                                                                                                                                                                              Except if you're trying to abuse the slot machines themselves or sharing or reselling your membership to other customers who want a spin at the roulette wheel; but were previously banned. [0]

                                                                                                                                                                                                              [0] https://help.openai.com/en/articles/9793128-about-chatgpt-pr...

                                                                                                                                                                                                              • gib444 a day ago

                                                                                                                                                                                                                Exactly! :)

                                                                                                                                                                                                              • laacz a day ago

                                                                                                                                                                                                                They are actively exploiting the compute shortages of Anthropic. In our team we're pushing for more or less vanilla and portability, since the best harness today might not be the best one in 6 months.

                                                                                                                                                                                                                • AbstractH24 10 hours ago

                                                                                                                                                                                                                  I deleted my ChatGPT account after alll that stuff with the pentagon. Have I missed anything?

                                                                                                                                                                                                                  • gmig a day ago

                                                                                                                                                                                                                    This is an additional offering to the existing plan.

                                                                                                                                                                                                                    5x=$100 20x=$200

                                                                                                                                                                                                                    • rossant a day ago

                                                                                                                                                                                                                      How much was it before?

                                                                                                                                                                                                                      • coreyburnsdev a day ago

                                                                                                                                                                                                                        $20 or $200 plan, now we have a middle plan.

                                                                                                                                                                                                                      • disiplus a day ago

                                                                                                                                                                                                                        It looks like it's called prolite.

                                                                                                                                                                                                                        https://snipboard.io/jmGKfM.jpg

                                                                                                                                                                                                                        • daft_pink a day ago

                                                                                                                                                                                                                          It says 5x or 20x more usage, so does that mean they have copied clause and have a 5x for 100 and 20x for 200?

                                                                                                                                                                                                                          • Brosper 15 hours ago

                                                                                                                                                                                                                            It's the end of cheap AI. (It's my opinion.)

                                                                                                                                                                                                                            It helps them cut the subsidization of tokens. Then they will release Pro x2, which could be the same as the old $200 but with fewer tokens.

                                                                                                                                                                                                                            • bottlepalm a day ago

                                                                                                                                                                                                                              Are you allowed to run your own autonomous agents with it outside of Codex, like OpenClaw and others?

                                                                                                                                                                                                                              • yoniknak a day ago

                                                                                                                                                                                                                                I’ve used both a fair amount, and for actual coding work I still prefer Codex over Opus.

                                                                                                                                                                                                                                • koolba a day ago

                                                                                                                                                                                                                                  Does this give you something different than the $20/mo plan when using codex?

                                                                                                                                                                                                                                  • Tiberium a day ago

                                                                                                                                                                                                                                    Yes, it's 5x more usage than Plus, and with the current promotions you actually get 10x more usage than Plus on the $100 plan until May 31st.

                                                                                                                                                                                                                                    Same for the $200 plan, it's still 2x its normal usage until that date.

                                                                                                                                                                                                                                  • I_am_tiberius a day ago

                                                                                                                                                                                                                                    For me it's not the price. It's the fact that they obviously read my prompts and may even use a derived version of my data for training. As it's very clear in the meantime that SAMA lies most of the time, there's just no way I can trust this company in any way.

                                                                                                                                                                                                                                    • asadm a day ago

                                                                                                                                                                                                                                      are your prompts that important that you would not use SOTA models just to protect them?

                                                                                                                                                                                                                                      For me, they are just a means to an end and disposable.

                                                                                                                                                                                                                                      • Toutouxc a day ago

                                                                                                                                                                                                                                        So.. would you mind releasing all your code on GitHub?

                                                                                                                                                                                                                                    • MallocVoidstar a day ago

                                                                                                                                                                                                                                      https://x.com/OpenAI/status/2042296046009626989

                                                                                                                                                                                                                                      >Our existing $200 Pro tier still remains our highest usage option.

                                                                                                                                                                                                                                      • jerrygoyal 19 hours ago

                                                                                                                                                                                                                                        Astroturfing here is beyond control

                                                                                                                                                                                                                                        • hackable_sand a day ago

                                                                                                                                                                                                                                          Can you guys remind me again why you're doing this?

                                                                                                                                                                                                                                          • christkv a day ago

                                                                                                                                                                                                                                            I wish these plans had burst mode where I could set default plan size and max plan size and just scale up for a month automatically if needed but automatically drop back to my default plan at the next billing cycle

                                                                                                                                                                                                                                          • jedisct1 a day ago

                                                                                                                                                                                                                                            Awesome news.

                                                                                                                                                                                                                                            And that includes usage of the API with any agent without risking being banned. OpenAI is also very supportive of open source software.

                                                                                                                                                                                                                                            I'm using GPT-5.4 with Swival (https://swival.dev) for a while, alongside local models, and it's absolutely fantastic.

                                                                                                                                                                                                                                            • bossyTeacher a day ago

                                                                                                                                                                                                                                              It really feels like LLMs will mostly become tools for tech workers rather than the kind of civilization-level transformation sama has been peddling. Every single comment here seems to confirm the above.

                                                                                                                                                                                                                                              • wolttam a day ago

                                                                                                                                                                                                                                                This is kind of the goldilocks zone for LLMs right now.

                                                                                                                                                                                                                                                I wouldn't mistake this for any kind of capability plateau. There is a massive push towards making transformers the engine of humanoid (and other kinds of) robotics, we just haven't reached the hype moment for those yet.

                                                                                                                                                                                                                                                • bossyTeacher a day ago

                                                                                                                                                                                                                                                  > I wouldn't mistake this for any kind of capability plateau. There is a massive push towards making transformers the engine of humanoid (and other kinds of) robotics, we just haven't reached the hype moment for those yet.

                                                                                                                                                                                                                                                  Problem is that the fuel to get this train going relies on investors money. Investors aren't going to be happy with the quote I took from your message.

                                                                                                                                                                                                                                                  And that's the real bet really, can the industry turn the spark into fire before the investor money runs out?

                                                                                                                                                                                                                                                  • wolttam a day ago

                                                                                                                                                                                                                                                    Development towards those goals will continue with or without massive investor capital, see Google’s involvement with Boston Dynamics.

                                                                                                                                                                                                                                                    And plenty of very wealthy folks see the writing on the wall wrt robotics.

                                                                                                                                                                                                                                                    • swdf a day ago

                                                                                                                                                                                                                                                      "Development towards those goals will continue with or without massive investor capital, see Google’s involvement with Boston Dynamics"

                                                                                                                                                                                                                                                      Lol google holds and invests its shareholders cash on their behalf. Using that money comes with a cost - they demand a rate of return.

                                                                                                                                                                                                                                                      Swear most of you need to take a corporate finance 101 class.

                                                                                                                                                                                                                                                      • wolttam 4 hours ago

                                                                                                                                                                                                                                                        But Google is not a scrappy startup. They're sitting on over $100 bil in cash.

                                                                                                                                                                                                                                                • randomNumber7 a day ago

                                                                                                                                                                                                                                                  > Every single comment here seems to confirm the above.

                                                                                                                                                                                                                                                  The population on Hacker News heavily skewed towards tech workers so I wouldn't draw a conclusion from that.

                                                                                                                                                                                                                                                  • swdf a day ago

                                                                                                                                                                                                                                                    That's exactly what they will be, and they will not earn excess returns. So the valuations of OAI and Anthropic are bizarro.

                                                                                                                                                                                                                                                    And I don't see open source development stopping either.

                                                                                                                                                                                                                                                    • 000ooo000 a day ago

                                                                                                                                                                                                                                                      Wouldn't put too much weight on HN comments boosting AI. Lots of brand new accounts, obvious LLM drivel ("it's not X, it's Y"). I just tried to reply to one to call out how overt it was and the comment was already killed, so they're definitely here.

                                                                                                                                                                                                                                                      • dude250711 a day ago

                                                                                                                                                                                                                                                        I have heard a CTO had a major success building a side project over a weekend.

                                                                                                                                                                                                                                                        • bossyTeacher a day ago

                                                                                                                                                                                                                                                          Is that CTO in the room with us?

                                                                                                                                                                                                                                                      • righthand a day ago

                                                                                                                                                                                                                                                        This is like the 2010s hosting price wars.

                                                                                                                                                                                                                                                        • varispeed a day ago

                                                                                                                                                                                                                                                          What is the difference between Pro and normal mode apart from the fact the Pro takes ages to finish? I see not much difference in output quality.

                                                                                                                                                                                                                                                          • d3rockk a day ago

                                                                                                                                                                                                                                                            Now they just need every iPhone owner on the planet to subscribe, and this AI bubble will officially be unpoppable!

                                                                                                                                                                                                                                                            • flextheruler a day ago

                                                                                                                                                                                                                                                              Tell me you're losing market share to competitors without telling me you're losing market share to competitors

                                                                                                                                                                                                                                                              • Archerlm a day ago

                                                                                                                                                                                                                                                                just a rumor, but i heard altman was adding a timer which required the R&D dept. to triple

                                                                                                                                                                                                                                                                • throwatdem12311 a day ago

                                                                                                                                                                                                                                                                  I heard it’ll take about a year. Timers are a hard problem to solve.

                                                                                                                                                                                                                                                                  • sassymuffinz a day ago

                                                                                                                                                                                                                                                                    While they work on getting it to tell the time they'll just be over there listing targets for military strikes.

                                                                                                                                                                                                                                                                • selectively a day ago

                                                                                                                                                                                                                                                                  Price drops are nice. Unfortunately, the quality differential versus the competitor is night and day.

                                                                                                                                                                                                                                                                  And everyone serious uses the API rate billing anyway.

                                                                                                                                                                                                                                                                  • aerhardt a day ago

                                                                                                                                                                                                                                                                    > the quality differential versus the competitor is night and day.

                                                                                                                                                                                                                                                                    This myth about the inferiority of ChatGPT and Codex is becoming a meme.

                                                                                                                                                                                                                                                                    I have active subscriptions to both. I am throwing at Codex all kinds of data engineering, web development and machine learning problems, have been working on non-tech tasks in the "Karpathy Obsidian Wiki" [1] style before he posted about it.

                                                                                                                                                                                                                                                                    Not only does Codex crush Claude on cost, it's also significantly better at adherence and overall quality. Claude is there on my Mac, gathering dust, to the point I am thinking of not renewing the sub.

                                                                                                                                                                                                                                                                    There are plenty of fellow HNers here who feel the same from what I read in the flamewars. I suspect none of us really has a horse in this race and many are half-competent (in other threads, they mention they do things like embedded programming, distributed DL systems, etc.)

                                                                                                                                                                                                                                                                    I'm starting to suspect a vast majority of people pushing the narrative that Claude is vastly better haven't even tried the 5.3 / 5.4 models and are doing it out of sheer tribalism.

                                                                                                                                                                                                                                                                    [1] https://gist.github.com/karpathy/442a6bf555914893e9891c11519...

                                                                                                                                                                                                                                                                    • selectively a day ago

                                                                                                                                                                                                                                                                      I have access to effectively infinite API tokens for all models from Anthropic as well as OpenAI. The differential in performance in complex tasks is vast and strongly in favor of Opus, in my experience. I do not use the official harnesses for either model, though - as they are not my taste.

                                                                                                                                                                                                                                                                      Codex is closer to my taste, as it is at least a native app and not typescript slop. But the model is just not up to snuff.

                                                                                                                                                                                                                                                                    • hyperionultra a day ago

                                                                                                                                                                                                                                                                      Disagree. I use Codex extensively. It just works so well with VS Code and Python. Claude, with its ridiculous limits: thanks, no. For some, even xAI is a good fit.

                                                                                                                                                                                                                                                                      • randomNumber7 a day ago

                                                                                                                                                                                                                                                                        > For some, even xAI is a good fit.

                                                                                                                                                                                                                                                                        Grok makes sense if you want s.th. less censored that is not biased towards woke ideology.

                                                                                                                                                                                                                                                                        I don't see how this matters for coding though. I only use it to give me a summary of recent news (so I don't have to actually read the bs newspapers and X posts myself).

                                                                                                                                                                                                                                                                      • nilkn a day ago

                                                                                                                                                                                                                                                                        This take is out-of-date by months (which is an eternity in this space). Codex today has caught up and is very much on par with CC.

                                                                                                                                                                                                                                                                        • satvikpendem a day ago

                                                                                                                                                                                                                                                                          I prefer and use 5.4 over Opus, it's simply better, faster, and doesn't glaze me like Claude models want to do for some reason.