ChatGPT o3 found a Linux Kernel vulnerability. "The future" has an 8% success rate, and a 28% chance of false positives.

wizardbeard@lemmy.dbzer0.com · edit-2 27 days ago

ChatGPT o3 found a Linux Kernel vulnerability. "The future" has an 8% success rate, and a 28% chance of false positives.

scruiser@awful.systems · 27 days ago

Of course, part of that wiring will be figuring out how to deal with the the signal to noise ratio of ~1:50 in this case, but that’s something we are already making progress at.

This line annoys me… LLMs excel at making signal-shaped noise, so separating out an absurd number of false positives (and investigating false negatives further) is very difficult. It probably requires that you have some sort of actually reliable verifier, and if you have that, why bother with LLMs in the first place instead of just using that verifier directly?

killingspark@feddit.org · 27 days ago

Trying to take anything positive from this:

Maybe someone with the skills of verifying a flagged code path now doesn’t have to roam the codebase for candidates? So while they still do the tedious work of verifying, the mundane task of finding candidates is now automatic?

Not sure if this is a real world usecase…

scruiser@awful.systems · 27 days ago

As the other comments have pointed out, an automated search for this category of bugs (done without LLMs) would do the same job much faster, with much less computational resources, without any bullshit or hallucinations in the way. The LLM isn’t actually a value add compared to existing tools.

DickFiasco@lemm.ee · 27 days ago

Additionally, we already have tools like Valgrind that would have uncovered the use-after-free bug.

diz@awful.systems · 24 days ago

Can’t be assed to read the bs but sometimes the use after free only happens in some rarely executed code path, or only when one branch is executed then later another branch. So you still may need fuzzing to trigger use after free for Valgrind to detect.

Sailor Sega Saturn@awful.systems · edit-2 27 days ago

LLMs: now as effective as enumerating use-after-frees as grep "free" source.cc.

flaviat@awful.systems · 27 days ago

Yet another LLM guy claiming it solved a problem when in fact it was already solved, with it being told almost exactly where and what to look for. Cold reading for use-after-frees.

David Gerard@awful.systems · 27 days ago

He did fuzzing but boiling more oceans to find a vuln he’d already found

diz@awful.systems · edit-2 25 days ago

I swear I’m gonna plug an LLM into a rather traditional solver I’m writing. I may tuck deep into the paper a point how it’s quite slow to use an LLM to mutate solutions in a genetic algorithm or a swarm solver. And in any case non LLM would be default.

Normally I wouldn’t sink that low but I got mouths to feed, and frankly, fuck it, they can persist in this madness for much longer than I can stay solvent.

This is as if there was a mass delusion that a pseudorandom number generator can serve as an oracle, predicting the future. Doing any kind of Monte Carlo simulation of something like weather in that world would of course confirm all the dumb shit.

wizardbeard@lemmy.dbzer0.com · 25 days ago

While I dislike further proliferation of the AI idiocy, I have mouths to feed too, and I’ve definitely seen that strategy work. Good luck and god speed.

My workplace has started using multiple new software/systems over the past few years that advertised heavily on their AI features automating away a bunch of the grunt work of running say, a NOC monitoring solution.

Then we got hands on with the thing and learned that the automation was all various “normal” algorithms and automation, and there was like two optional features where you could have an AI analyze data and try to analyze trends instead of using the actual statiscal algorithms it would use by default. Even the sales people running our interactive demos steered us clear of the AI stuff. We cared that it made things easier for us, not the specifics of how, so it was all roses.

ChatGPT o3 found a Linux Kernel vulnerability. "The future" has an 8% success rate, and a 28% chance of false positives.

ChatGPT o3 found a Linux Kernel vulnerability. "The future" has an 8% success rate, and a 28% chance of false positives.

How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation