This blog post has been reported on and distorted by a lot of tech news sites using it to wax delusional about AI’s future role in vulnerability detection.

But they all gloss over the critical bit: in fairly ideal circumstances where the AI was being directed to the vuln, it had only an 8% success rate, and a whopping 28% false positive rate!

  • scruiser@awful.systems · 42 points · 3 days ago

    Of course, part of that wiring will be figuring out how to deal with the signal to noise ratio of ~1:50 in this case, but that’s something we are already making progress at.

    This line annoys me… LLMs excel at making signal-shaped noise, so separating out an absurd number of false positives (and investigating false negatives further) is very difficult. It probably requires that you have some sort of actually reliable verifier, and if you have that, why bother with LLMs in the first place instead of just using that verifier directly?
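
    For a rough sense of what that noise does in practice, here’s a back-of-envelope calculation using only the headline numbers quoted above (8% detection, 28% false positives over repeated runs); the figures are the post’s, everything else is just arithmetic:

        /* Back-of-envelope triage math, assuming the numbers quoted above:
         * out of 100 runs against code containing a known bug, ~8 runs
         * flag the real vulnerability and ~28 runs flag something bogus. */
        #include <stdio.h>

        int main(void) {
            double true_reports  = 8.0;   /* runs that found the actual vuln */
            double false_reports = 28.0;  /* runs that reported a non-issue  */

            double precision = true_reports / (true_reports + false_reports);
            double noise_per_signal = false_reports / true_reports;

            printf("precision of a raw report: %.0f%%\n", precision * 100.0);
            printf("false reports per real one: %.1f\n", noise_per_signal);
            return 0;
        }
        /* Prints roughly 22% and 3.5 -- and that's before counting runs
         * against code paths that contain no bug at all, which is how you
         * end up at something like the ~1:50 ratio quoted above. */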

    • killingspark@feddit.org · 14 points · 3 days ago

      Trying to take anything positive from this:

      Maybe someone with the skills of verifying a flagged code path now doesn’t have to roam the codebase for candidates? So while they still do the tedious work of verifying, the mundane task of finding candidates is now automatic?

      Not sure if this is a real-world use case…

      • scruiser@awful.systems · 11 points · 2 days ago

        As the other comments have pointed out, an automated search for this category of bugs (done without LLMs) would do the same job much faster, with far fewer computational resources, and without any bullshit or hallucinations in the way. The LLM isn’t actually a value add compared to existing tools.
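
        As a toy illustration of what such a non-LLM search looks like: deterministic static analyzers already target exactly this bug class, e.g. GCC’s -fanalyzer (GCC 10+), which flags the contrived snippet below at compile time (my own example, not the code from the post):

            /* uaf_static.c -- contrived example of the bug class. A deterministic
             * static analyzer such as GCC's (gcc -fanalyzer -c uaf_static.c,
             * GCC 10 or later) reports a use-after-free warning on the last
             * line, with no model anywhere in the loop. */
            #include <stdlib.h>

            int stale_read(void) {
                int *p = malloc(sizeof *p);
                if (!p) return -1;
                *p = 42;
                free(p);
                return *p;   /* use after free */
            }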

    • diz@awful.systems · 8 points · edited · 22 hours ago

      I swear I’m gonna plug an LLM into a rather traditional solver I’m writing. I may tuck a point deep into the paper about how it’s quite slow to use an LLM to mutate solutions in a genetic algorithm or a swarm solver. And in any case, the non-LLM operator would be the default.
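
      Purely as a sketch of the “non-LLM by default” shape (nothing here is real solver code, and the LLM hook is entirely hypothetical and stubbed out):

          /* Minimal genetic-algorithm loop with a pluggable mutation operator.
           * The default mutator is plain random bit-flipping; llm_mutate is a
           * hypothetical hook, not a real API, and just falls back to it. */
          #include <stdio.h>
          #include <stdlib.h>

          #define POP   32
          #define GENES 16

          typedef void (*mutate_fn)(unsigned char *genome);

          /* Default: cheap random bit flips. */
          static void random_mutate(unsigned char *genome) {
              int i = rand() % GENES;
              genome[i] ^= (unsigned char)(1u << (rand() % 8));
          }

          /* Hypothetical LLM-backed mutator: would ship the genome off to a
           * model and parse a suggested variant back. Stubbed out here. */
          static void llm_mutate(unsigned char *genome) {
              random_mutate(genome);
          }

          /* Toy fitness: count set bits (maximise). */
          static int fitness(const unsigned char *genome) {
              int score = 0;
              for (int i = 0; i < GENES; i++)
                  for (int b = 0; b < 8; b++)
                      score += (genome[i] >> b) & 1;
              return score;
          }

          int main(void) {
              unsigned char pop[POP][GENES] = {{0}};
              mutate_fn mutate = random_mutate;        /* non-LLM is the default */
              if (getenv("USE_LLM_MUTATOR"))           /* opt-in, hypothetical   */
                  mutate = llm_mutate;

              for (int gen = 0; gen < 200; gen++) {
                  /* Clone the current best over a random slot, then mutate it. */
                  int best = 0;
                  for (int i = 1; i < POP; i++)
                      if (fitness(pop[i]) > fitness(pop[best])) best = i;
                  int target = rand() % POP;
                  for (int g = 0; g < GENES; g++) pop[target][g] = pop[best][g];
                  mutate(pop[target]);
              }

              int best = 0;
              for (int i = 1; i < POP; i++)
                  if (fitness(pop[i]) > fitness(pop[best])) best = i;
              printf("best fitness after 200 generations: %d / %d\n",
                     fitness(pop[best]), GENES * 8);
              return 0;
          }

      The operator interface is the same either way, which is the whole point: the slow LLM variant can be bolted on and benchmarked without changing the solver.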

      Normally I wouldn’t sink that low but I got mouths to feed, and frankly, fuck it, they can persist in this madness for much longer than I can stay solvent.

      This is as if there were a mass delusion that a pseudorandom number generator can serve as an oracle, predicting the future. Doing any kind of Monte Carlo simulation of something like weather in that world would of course confirm all the dumb shit.

      • wizardbeard@lemmy.dbzer0.com (OP) · 5 points · 16 hours ago

        While I dislike further proliferation of the AI idiocy, I have mouths to feed too, and I’ve definitely seen that strategy work. Good luck and godspeed.

        My workplace has started using multiple new software systems over the past few years that advertised heavily on their AI features automating away a bunch of the grunt work of running, say, a NOC monitoring solution.

        Then we got hands-on with the thing and learned that the automation was all various “normal” algorithms and automation, and there were like two optional features where you could have an AI analyze the data and try to spot trends instead of using the actual statistical algorithms it would use by default. Even the sales people running our interactive demos steered us clear of the AI stuff. We cared that it made things easier for us, not the specifics of how, so it was all roses.
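
        For what it’s worth, the “actual statistical algorithms” in that kind of monitoring product are usually nothing exotic. A toy version of the default behaviour (my own illustration, not any vendor’s code) is just a rolling mean and standard deviation with a threshold:

            /* Toy version of a plain statistical anomaly check a monitoring
             * product might run by default: flag samples more than three
             * standard deviations away from the mean of recent history. */
            #include <stdio.h>
            #include <math.h>

            #define WINDOW 12

            static int is_anomaly(const double *history, int n, double sample) {
                double mean = 0.0, var = 0.0;
                for (int i = 0; i < n; i++) mean += history[i];
                mean /= n;
                for (int i = 0; i < n; i++)
                    var += (history[i] - mean) * (history[i] - mean);
                var /= n;
                double sd = sqrt(var);
                return sd > 0.0 && fabs(sample - mean) > 3.0 * sd;
            }

            int main(void) {
                /* Made-up latency samples in ms; the last incoming one spikes. */
                double history[WINDOW] = {21, 23, 22, 24, 20, 22,
                                          23, 21, 22, 24, 23, 22};
                double incoming[] = {23.0, 22.5, 95.0};

                for (int i = 0; i < 3; i++)
                    printf("sample %5.1f ms -> %s\n", incoming[i],
                           is_anomaly(history, WINDOW, incoming[i]) ? "ALERT" : "ok");
                return 0;   /* compile with -lm */
            }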

  • DickFiasco@lemm.ee · 29 points · 3 days ago

    Additionally, we already have tools like Valgrind that would have uncovered the use-after-free bug.
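
    For anyone who hasn’t run it: a minimal use-after-free (a contrived example of the bug class, not the code from the post) that Valgrind’s default memcheck tool reports out of the box looks like this:

        /* uaf.c -- contrived use-after-free. Running
         *   gcc -g uaf.c -o uaf && valgrind ./uaf
         * makes memcheck report an "Invalid read" at the printf below,
         * along with the allocation and free stack traces. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        int main(void) {
            char *buf = malloc(32);
            if (!buf) return 1;
            strcpy(buf, "session state");
            free(buf);

            printf("%c\n", buf[0]);   /* read after free */
            return 0;
        }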