• Goun@lemmy.ml
    arrow-up
    1
    ·
    10 days ago

    Can someone explain what this actually is? It’s a Python script that generates… screenshots? I don’t get it.

    • Sphks@lemmy.dbzer0.com
      arrow-up
      6
      ·
      edit-2
      10 days ago

      It is based on image generators like DALL-E (more precisely, video generators like Sora). An AI image generator takes an input, such as random noise, and tries to fill in the gaps according to some guidance (usually a text prompt like “a cat playing saxophone”). The AI has been taught what cats look like, what saxophones look like, and what playing a saxophone looks like.

      Here, the AI has been taught what Minecraft’s first-person view looks like, using hours and hours of video of someone playing (maybe bots).

      Now, if you press the forward arrow, zoom the picture by spreading the pixels out from the center of the screen. That leaves blank space between the pixels. Have the AI fill in the blanks with what it thinks Minecraft should look like. Repeat for each frame and you move forward. Do similar things for the other commands (turn left, jump…). This way you can explore the world infinitely, and the AI invents the world in real time.
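
      To make the "spread the pixels, then fill the blanks" idea concrete, here is a toy sketch. It is only an illustration of the mental model above, not how the real system works: the frame is a tiny grid of numbers, and `fill_gaps` is a made-up stand-in for the trained model that just copies a nearby known pixel. All function names are hypothetical.

```python
import random

def zoom_from_center(frame, factor=2):
    """Spread pixels away from the center; cells no pixel lands on stay None."""
    h, w = len(frame), len(frame[0])
    cy, cx = h // 2, w // 2
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny = cy + (y - cy) * factor
            nx = cx + (x - cx) * factor
            if 0 <= ny < h and 0 <= nx < w:  # pixels pushed off-screen are dropped
                out[ny][nx] = frame[y][x]
    return out

def fill_gaps(frame, rng):
    """Stand-in for the model: copy a known neighbour, or invent a value."""
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(h):
        for x in range(w):
            if frame[y][x] is None:
                known = [frame[j][i]
                         for j in range(max(0, y - 1), min(h, y + 2))
                         for i in range(max(0, x - 1), min(w, x + 2))
                         if frame[j][i] is not None]
                out[y][x] = rng.choice(known) if known else rng.randrange(10)
    return out

def step_forward(frame, rng):
    """One 'forward' key press: zoom, then let the stand-in fill the blanks."""
    return fill_gaps(zoom_from_center(frame), rng)

rng = random.Random(0)
frame = [[rng.randrange(10) for _ in range(8)] for _ in range(8)]
for _ in range(3):  # hold the forward key for three frames
    frame = step_forward(frame, rng)
```

      A real model would replace `fill_gaps` with something that has learned what Minecraft looks like, so the blanks get filled with plausible terrain instead of copied numbers.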

      I have not looked at the details, but I think the issue is that there is no memory of the world other than what you see on the screen. If you look left you see something; look right, then look left again, and you see a different world. Edit: Yeah, that’s an issue shown in the article.
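
      The no-memory problem can be shown with a similar toy. This is a sketch under the same assumptions: the only state is the current frame, and a hypothetical `turn` function invents brand-new values for any columns that scroll into view (a counter stands in for the model here, so every invented pixel is different).

```python
from itertools import count

fresh = count(100)  # stand-in for the model: each invented pixel gets a new value

def turn(frame, direction, k=2):
    """Shift the view by k columns; columns that scroll off are lost for good."""
    out = []
    for row in frame:
        invented = [next(fresh) for _ in range(k)]
        if direction == "left":   # scene slides right, new columns appear on the left
            out.append(invented + row[:-k])
        else:                     # "right": scene slides left, new columns on the right
            out.append(row[k:] + invented)
    return out

start = [[y * 6 + x for x in range(6)] for y in range(2)]
back = turn(turn(start, "right"), "left")  # look right, then look left again

print(start[0])  # [0, 1, 2, 3, 4, 5]
print(back[0])   # [104, 105, 2, 3, 4, 5]
```

      The columns that stayed on screen the whole time survive, but the two that scrolled off come back as something else, because nothing outside the frame was ever stored.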