Fair enough, though another possibility I see is that the automated training process for LLMs used OCR for those papers (Or an already existing text version in the internet was using bad OCR) and those papers with the mashed word were written partially or fully by an LLM.
Either way, the blanket term “AI” sucks and it’s honestly getting kind of annoying. Same with how much LLMs are used.
With Hetzner you can get “Storage Boxes” and mount those. They are pretty cheap for 1tb and 5tb. From my experience they are fast enough for media streaming for multiple people.