Statement From thphd
After communicating with our engineers on site, I can confirm that our server has experienced a major disk failure, which unfortunately looks a lot like the one that happened earlier to pincong.rocks/xsden.
Sign of failure: a database timeout of about 1 minute every half an hour, and as time goes on this duration gets shorter and shorter. The exact same problem happened to pincong.rocks a few days before their catastrophic data loss. Programs like 2047 are write-heavy by nature (several hundred GB per day, mostly due to the hackernews ranking algorithm), and can shorten the life of a specific portion of the disk. Optimizations have been added to mitigate this problem; expect more to come.
We believe that running the hard drive at elevated temperatures for an extended period of time, while repeatedly rewriting a small section of the surface, caused that section to overheat and lose its magnetic properties (or something like that). After some tests we found that the hard drive works just fine and can read/write at full speed, except for a small section in the middle where 2047's virtual hard disk file resides. Hardware checksum errors have prevented any software from reading that section.
Other than that, we're using a WD drive, and the mean life expectancy of a commercial hard drive is 3-5 years, longer for HITACHI, shorter for WD. This was a strategic mistake (we should have placed the VM on our HITACHI drives).
It will take about 48 hours to bring 2047 back online, since we have a lot of hardware to replace and a lot of checks to perform.
Regarding data, this is in fact the fourth time we have experienced a catastrophic failure. The first time we lost about 10 days of data; the second and third times we recovered without a scratch from our 10-minute backups.
10-minute backups are GREAT. But the script that was supposed to continuously copy the 10-minute backups to another, safer storage location stopped working some time ago, and as a result we now only have data from before 2021-09-04 23:57:10. That's about 11 days behind.
That's entirely my fault and I'm extremely sorry that every one of you has to bear the consequences. In the future we will make our backup process more robust against more kinds of failures.
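A copy script that silently stops, as happened here, could be caught by a periodic freshness check on the off-site copy. The following is only a minimal sketch, not the script 2047 actually runs: the function names, the flat directory layout and the 30-minute threshold are all assumptions made for illustration.

```python
import os
import time


def newest_mtime(backup_dir):
    """Return the modification time of the newest file in backup_dir,
    or None if the directory contains no files."""
    newest = None
    for entry in os.scandir(backup_dir):
        if entry.is_file():
            mtime = entry.stat().st_mtime
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def backup_is_stale(backup_dir, max_age_seconds=1800, now=None):
    """Return True if the newest file in backup_dir is older than
    max_age_seconds, or if there are no files at all.

    The 1800-second default (three missed 10-minute backups) is an
    assumed threshold, not a value taken from the real setup.
    """
    if now is None:
        now = time.time()
    newest = newest_mtime(backup_dir)
    return newest is None or (now - newest) > max_age_seconds
```

Run every few minutes (from cron, for example) against the off-site directory, a True result would be the cue to send an alert. Checking the destination rather than the source is the point: the local 10-minute backups kept working while the copy to safer storage had stopped.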
If you registered your account within the 11-day period, you will not be able to log in. Just register again. If you want your account back, we will restore it for you on a case-by-case basis.
Threads and posts from this 11-day period will be restored from 3rd-party sources. We've done that before, and we'll do it again.
If you find a backup of your thread or post online (Google cache / archive.org), save it as HTML (choose 'only html'). That will be of great help later, in case we miss it during the process.
best wishes,
thphd
reach me via matrix: @thphd:matrix.org
2021 09 16
Beijing Wudaokou Computer Technology Co., Ltd. (北京市五道口计算机技术有限公司)
The disk has blown up, the data is lost, no need to worry, everyone.