OK, sounds like we broadly agree then.
But as you can see in the paper I linked, ELIZA passes the Turing test in their experiment about 20% of the time (that is to say, it doesn’t pass; passing is 50% in this test) whereas the best LLMs pass about 70% of the time (that is to say, they are significantly more convincing at being human than real humans).
That 20% figure is just a clear indication of how shit people are at conducting such a test, and that was basically my original point. 2 in 10 times, people were convinced by a particularly echoey room.
If an LLM is correct 2 in 10 times, would you call it “reliably correct”?
If a person murders people only two days out of 10, they’re a murderer; to not be a murderer, they need to never do that.
Reliably correct is when you’re correct always. Demonstrably incorrect is when you’re incorrect even sometimes.