• sobiolite 13 hours ago

    I wonder if we're going to end up in an arms race between AIs masquerading as contributors (and security researchers) trying to introduce vulnerabilities into popular libraries, and AIs trying to detect and fix them.

    • torginus 3 hours ago

      If you are doing security of all things - why wouldn't you verify the provenance of your tooling and libs?

      • sublinear 13 hours ago

        Why would it be like that instead of the way we already handle low-trust environments?

        Projects that get a lot of attention already put up barriers to new contributions, and the ones that get less attention will continue to get less attention.

        The review process cannot be left to AI because it will introduce uncertainty nobody wants to be held responsible for.

        If anything, the people who have always seen code as a mere means to an end will finally come to a forced decision: either stop fucking around or get out of the way.

        An adversarial web is ultimately good for software quality, but less open than it used to be. I'm not even sure if that's a bad thing.

        • sobiolite 12 hours ago

          What I'm suggesting is: what if AIs get so good at crafting vulnerable (but apparently innocent) code that human review cannot reliably catch it?

          And saying "ones that get less attention will continue to get less attention" is like imagining that only popular email addresses get spammed. Once malice is automated, everyone gets attention.

          • courseofaction 12 hours ago

            Isn't it significantly easier to detect than to create? Not quite NP (where verifying a solution is easier than finding one), but intuitively an AI that can create such an exploit could also detect it.

            The economics are more about how much the defender is willing to spend on protection in advance vs. the expected cost of a security failure.
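
            A back-of-the-envelope sketch of that trade-off (all numbers here are invented):

                # Toy expected-value comparison (made-up numbers): pay for
                # protection up front only if it beats the expected breach cost.
                breach_probability = 0.05    # chance of a successful attack this year
                breach_cost = 2_000_000      # total cost if it happens
                protection_cost = 150_000    # annual spend on tooling/review/pen-testing
                risk_reduction = 0.8         # fraction of breach risk the spend removes

                expected_loss_without = breach_probability * breach_cost
                expected_loss_with = (1 - risk_reduction) * breach_probability * breach_cost

                if protection_cost + expected_loss_with < expected_loss_without:
                    print("protection pays for itself in expectation")
                else:
                    print("cheaper (in expectation) to accept the risk")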

            • cookiengineer 9 hours ago

              I think the issue I have with this argument is that it's not a conclusion that follows from the technology itself.

              It's an argument about affordability and the economics behind it, which puts more burden on the (open source) supply chain which is already stressed to its limit. Maintainers simply don't have the money to keep up with foreign state actors. Heck, they don't even have money for food at this point, and have to work another job to be able to do open source in their free time.

              I know there are exceptions, but they are veeeery marginal. The norm is: open source is unpaid, tedious, hard work. And it will only get harder; just look at the sheer number of slopcode pull requests that already plague a lot of projects.

              The trend is more likely going to be projects blocking pull requests by default rather than trying to read and evaluate each of them.

      • summarity 4 hours ago

        If you want to get reliable automated fixes today, I'd encourage you to enable code scanning on your repo. It's free for open-source repos and includes Copilot Autofix (also for free).

        We've already seen more than 100,000 fixes applied with Autofix in the last 6 months, and we're constantly improving it. It's powered by CodeQL, our deterministic and in-depth static analysis engine, which also recently gained support for Rust.

        To enable it, go to your repo -> Security -> code scanning.

        Read more about how autofix works here: https://docs.github.com/en/code-security/code-scanning/manag...

        And stay tuned for GitHub Universe in a few weeks for other relevant announcements ;).

        Disclaimer: I'm the Product lead on detection & remediation engines at GitHub

        • inemesitaffia 3 hours ago

          Please tell your people about 2FA SMS delivery issues to certain West African countries. I'd rather have it via email or have the option of WhatsApp

          I was fine before 2FA and I'm willing to pay to go without. Same username

          Can't scan my code if I can't access my account

        • nickpinkston 13 hours ago

          I'm optimistic that it's easier to find/solve vulnerabilities via auto pen-testing / patching, and other security measures, than it will be to find/exploit vulnerabilities after - ie defense is easier in an auto-security world.

          Does anyone disagree?

          This is purely my intuition, but I'm interested in how others are thinking about it.

          All this with the mega caveat that it assumes very widespread adoption of these defenses, which we know won't be the case, so auto-hacking may be rampant for a while.

          • closeparen 10 hours ago

            If you can compromise an employee desktop and put a too-cheap-to-meter intelligence equivalent to a medium-skilled software developer in there to handcraft an attack on whatever internal applications they have access to, it's kind of over. This kind of stuff isn’t normally hardened against custom or creative attacks. Cybersecurity rests on bot attacks having known signatures, and sophisticated human attackers having better things to do with their time.

            • squigz 8 hours ago

              Why not put a more powerful agent in there to handcraft defences?

            • dotancohen 2 hours ago

              In many small companies (e.g. startups), the attackers are far more experienced and skilled than the defenders. When attacking specific targets, they also have the luxury of choosing the timing of the attack - maybe the CTO just boarded a four-hour flight?

              • NitpickLawyer 6 hours ago

                > I'm optimistic that it's easier to find/solve vulnerabilities via auto pen-testing / patching, and other security measures, than it will be to find/exploit vulnerabilities after - ie defense is easier in an auto-security world.

                I somewhat share the feeling that this is where it's going, but I'm not sure fixing will be easier. In "meatbag" red vs. blue teams, red has it easier: it only has to succeed once, while blue has to be right every time.

                I do imagine something adversarial being the new standard, though. We'll have red vs blue agents that constantly work on owning the other side.

                • manquer 13 hours ago

                  In open source codebases, perhaps - if big tech is generous enough to run these tools and generate PRs (if they are welcome) for those issues.

                  In proprietary/closed source, it depends on the ability to spend the money these tools will end up costing.

                  As there are more and more vibe-coded apps, there will be more security bugs, because app owners just don't know better or don't care to fix them.

                  The same thing happened when the rise of WordPress and other CMSes with their plugin ecosystems, or languages like early PHP (or, for that matter, even C), opened up software development to wider communities.

                  On average we will see more issues, not fewer.

                  • courseofaction 12 hours ago

                    I've also thought this for scam perpetration vs mitigation. An AI listening to grandma's call would surely detect most confidence or pig butchering scams (or suggest how to verify), and be able to cast doubt on the caller's intentions or inform a trusted relative before the scammer can build up rapport. Security and surveillance concerns notwithstanding.

                    • Joel_Mckay 12 hours ago

                      In general, most modern vulnerabilities are initially identified with fuzzing systems under abnormal conditions. Whether these issues can be consistently exploited is often probabilistic, and thus repeatability with a POC dataset is already difficult.
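
                      For context, fuzzing just means hammering a target with mutated inputs and reporting crashes or uncaught errors; a minimal sketch using Python's atheris (parse_record is a made-up stand-in, not any particular project's code):

                          # Minimal coverage-guided fuzz harness using atheris.
                          # parse_record is hypothetical; swap in the real code under test.
                          import sys
                          import atheris

                          def parse_record(data: bytes):
                              # Stand-in target: trusts a length field taken from the input.
                              if data.startswith(b"REC") and len(data) > 4:
                                  length = data[3]
                                  return data[4:4 + length]
                              return None

                          def TestOneInput(data: bytes):
                              # atheris feeds mutated byte strings here and tracks coverage;
                              # any uncaught exception is reported as a finding.
                              parse_record(data)

                          atheris.instrument_all()
                          atheris.Setup(sys.argv, TestOneInput)
                          atheris.Fuzz()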

                      That being said, most modern exploits are already auto-generated through brute force, as nothing more complex is required.

                      >Does anyone disagree?

                      CVE agents already pose a serious threat vector in and of themselves.

                      1. Models can't currently be made inherently trustworthy, and the people claiming otherwise are selling something.

                      "Sleeper Agents in Large Language Models - Computerphile"

                      https://www.youtube.com/watch?v=wL22URoMZjo

                      2. LLMs can negatively impact logical function in human users. However, people feel 20% more productive, and that makes their contributed work dangerous.

                      3. People are already bad at reconciling their instincts and rational evaluation. Adding additional logical impairments is not wise:

                      https://www.youtube.com/watch?v=-Pc3IuVNuO0

                      4. Auto-merging vulnerabilities into open source is already a concern, as it falls into the ambiguous "malicious sabotage" or "incompetent noob" classifications. How do we know a person's or a model's intent? We can't, and thus the code base could turn into an incoherent mess for human readers.

                      Mitigating risk:

                      i. Offline agents should only have read-access to advise on identified problem patterns.

                      ii. Code should never be cut-and-pasted, but rather evaluated for its meaning.

                      iii. Assume a system is already compromised, and consider how to handle the situation. In this line of reasoning, the policy choices should become clear.

                      Best of luck, =3

                    • narmiouh 13 hours ago

                      Not a fan of future products being announced as if they are here when they are basically still in the "internal research" stage. I'm not sure who this is really helping, except to create unnecessary anticipation; we all kinda know we're in this loop lately of "yes it works great, but".

                      • sigmar 13 hours ago

                        4.5 million lines of code for one fix is impressive for an LLM agent, but there's so little detail in this post otherwise. Perhaps this is a tease for what will be released on Thursday...

                        • wrs 13 hours ago

                          That's how I read it at first too, but I think the more probable interpretation is that it was a fix to a project that has 4.5M lines of code.

                          • sigmar 13 hours ago

                            oh, that would definitely make more sense.

                        • Yoric 5 hours ago

                          Does anybody know how such LLMs are trained/fine-tuned?

                          • bgwalter 13 hours ago

                            So it is a secret tool: they will "gradually reach out to interested maintainers of critical open source projects with CodeMender-generated patches", and then they "hope to release CodeMender as a tool that can be used by all software developers".

                            Why is everything in "AI" shrouded in mystery, hidden behind $200 monthly payments, and wrapped in glossy announcements? Just release the damn thing and let us test it. You know, like the software we write and that you steal from us.

                            • mmaunder 13 hours ago

                              Can we just flag this since it’s not actually a thing available to anyone?

                              • zb3 13 hours ago

                                DeepMind = not available for use

                                • esafak 13 hours ago

                                  It's lost its charm.

                                • blibble 13 hours ago

                                  what an annoying page

                                  pointless videos, without enough time to read the code