So, I was having a phone conversation with my boss yesterday. The topic was a clustered filesystem that can hold huge amounts of data, and how we would best allow local users to access this data without the storage cluster becoming overloaded (for various reasons, the I/O is relatively slow; this was built for quantity, not speed).
Right now there’s an SMB share, and we’re looking at replacing that so that we can have better control over the data throughput. My suggestion was to simply spin up an FTP server.
Then my boss asks: “I’m just curious, but would rsync or NFS work as a protocol instead?”
Well, it’s a valid question, so the only thing I could do was reply with the honest answer as to why I chose FTP. Paraphrased and translated:
“Because some 20 years ago my then username carried a lot of recognition in certain communities revolving around software and media distribution, whose rights holders would not necessarily approve of said distribution. We used FTP, because when you’re on an ADSL from 2002, you want to have as much fine control as you can to make sure your internet connection doesn’t get flooded with requests. One connection at a time, and only one file at a time, which would be ideal in our particular case.”
The response I got was a chuckle and that he couldn’t think of a better endorsement of FTP as a preferred transfer protocol.
So there you have it - my career revolves around a lot of skills that I picked up while sailing the high seas. And coincidentally, my career now also involves literally sailing the high seas, as these storage clusters are used on survey ships.
Was IPFS considered? I’ve tried it myself but it seems like an unstable product and I’m not sure if it’s living up to its promise…
Unusable in our case
Ceph?
No. Beegfs.
Do check seaweedfs too! Haven’t tried it (yet) but their ‘erasure coding’ reads as super sophisticated to me ;)
I wonder how it compares to beegfs
Lol @ “some 20 years ago … ADSL from 2002”. Thanks for making me feel old!
Horses were domesticated some 6000 years ago. I feel so old!
Using FTP (I assume you mean SFTP) will buy you some performance, as would other protocols that are faster and require less compute than SMB.
I predict whatever solution you use will only buy you time. Usage is bound to increase so you’ll still hit the performance limits for the hardware platform at some point, unless you can constrain the simultaneous connections. File sizes will impact scalability a lot as well.
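To make the "constrain the simultaneous connections" part concrete, here is a rough sketch of the idea in Python. It is not tied to any particular FTP server; `TransferGate`, `fake_transfer` and `run_clients` are made-up names for illustration, and the sleep just stands in for a real file transfer. Most FTP daemons expose a similar limit as a config setting, so treat this as a picture of the behaviour rather than something to deploy.

```python
import threading
import time


class TransferGate:
    """Admit at most `max_active` transfers at once (hypothetical helper)."""

    def __init__(self, max_active=1):
        self._slots = threading.BoundedSemaphore(max_active)
        self._lock = threading.Lock()
        self.active = 0  # transfers currently running
        self.peak = 0    # highest concurrency observed so far

    def __enter__(self):
        # Blocks until a slot is free, so extra clients simply wait in line.
        self._slots.acquire()
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        self._slots.release()
        return False


def fake_transfer(gate, seconds=0.01):
    """Stand-in for one file transfer; the sleep simulates slow I/O."""
    with gate:
        time.sleep(seconds)


def run_clients(n_clients, max_active):
    """Start n_clients competing transfers and report the peak concurrency."""
    gate = TransferGate(max_active)
    threads = [
        threading.Thread(target=fake_transfer, args=(gate,))
        for _ in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate.peak


if __name__ == "__main__":
    print("peak concurrent transfers:", run_clients(n_clients=8, max_active=1))
```

With `max_active=1` you get the "one connection at a time" behaviour from the original post: the other clients queue up instead of hitting the storage all at once. Whether that is fast enough for your file sizes is still something you would have to measure.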
You can’t guess this one. You need to test.
tl;dr - I suspect you can’t win.