• scruiser@awful.systems
    link
    fedilink
    English
    arrow-up
    8
    ·
    15 hours ago

    Unlike with coding, there are no simple “tests” to try out whether an AI’s answer is correct or not.

    So for most actual practical software development, writing tests is in fact an entire job in and of itself and its a tricky one because covering even a fraction of the use cases and complexity the software will actually face when deployed is really hard. So simply letting the LLMs brute force trial-and-error their code through a bunch of tests won’t actually get you good working code.

    AlphaEvolve kind of did this, but it was testing very specific, well defined, well constrained algorithms that could have very specific evaluation written for them and it was using an evolutionary algorithm to guide the trial and error process. They don’t say exactly in their paper, but that probably meant generating code hundreds or thousands or even tens of thousands of times to generate relatively short sections of code.

    I’ve noticed a trend where people assume other fields have problems LLMs can handle, but the actually competent experts in that field know why LLMs fail at key pieces.

    • HedyL@awful.systems
      link
      fedilink
      English
      arrow-up
      2
      ·
      14 hours ago

      I’ve noticed a trend where people assume other fields have problems LLMs can handle, but the actually competent experts in that field know why LLMs fail at key pieces.

      I am fully aware of this. However, in my experience, it is sometimes the IT departments themselves that push these chatbots onto others in the most aggressive way. I don’t know whether they found them to be useful for their own purposes (and therefore assume this must apply to everyone else as well) or whether they are just pushing LLMs because this is what management expects them to do.