Downtime Over
Hello, Everyone.
I'd like to talk about what happened this week, what the current state of recovery is, and what you can do to help. If you don't want to read, the bottom line is: we need your help!

Our webserver has gone through a bit of a shock. The dirty secret is that we have always kept the donation goal significantly less than our actual costs. In the past, I felt like we should get by on what we can, without asking more of people. Perhaps that was the right attitude a few years ago, but now the community has really grown. That's awesome! But it means we have to be more proactive and responsible about our infrastructure.

So, if you want to help, please donate! We need to upgrade our hardware, backup capabilities, and more. I'll be talking more over the next few weeks as we bring things online and start on longer-term improvements.

What Happened

Early Wednesday morning, all AlliedModders websites became very slow. We'd come to recognize this as an intermittent problem, usually causing site errors, and always characterized by extremely high disk I/O wait times. What we didn't realize is that our primary hard drive had been failing, and on Wednesday it failed completely. We did not have RAID, so it quickly became a worst-case situation.

We had partial backups, but I didn't know what was included. The backup system wouldn't let me see without having a working operating system. So I decided the best course was to keep the server offline and try to copy as much data as I could before the drive completely failed. But the drive quickly degraded so much that I decided it was best not to attempt anything further.

Meanwhile, the communication channel with our provider wasn't good. I now know how to deal with this better in the future, but suffice it to say we wasted a lot of time. I didn't want to replace the drive without first securing physical ownership of the old one, in order to send it to a recovery service. We got that negotiated on Thursday night.
Then we had the drive replaced and an identical one added for RAID-1. Very, very early Friday morning, I reinstalled the operating system and restored our partial backups.

Recovery

The damage report is pretty good. Our partial backups had enough to restore:
This list isn't comprehensive. Our partial backups don't have anything that could otherwise be easily recovered, so a lot of our infrastructure may simply be broken. Files might be missing, pages might not work, services might be down, etc. I will try to list those in a second post, and cross them off as they come back online.

Why didn't you do X, Y, Z, etc?

I've gotten a lot of suggestions, rants, and complaints from people about various things over the past few days. Why didn't we have RAID? Why didn't we do complete backups? Why don't we switch hosting? Some of it has been really helpful. I especially owe MatthiasVance, asherkin, devicenull and others in #smdevs and #sourcemod for their advice.

It's important to put this site into perspective. It started out of my first college dorm room. It was a computer sitting next to my desktop, made from scrap parts. When it broke, we had our first donations drive to buy a new server. In 2005, we started renting a dedicated server. There was no way I could afford it as a college student, and we worked out a deal with SteamFriends (then, GameConnect) to be sponsored. That ended in 2006.

We've always run things on a tight budget, and our whole motif is kind of, "We're scrappy, but we get things done!" We didn't have any backups at all until 2008. Off-site backup is billed by the GB, so I was pretty selective in choosing what to back up. We didn't have a drive fail until 2010.

But it's clear we as a community have grown really big, and that's awesome. We almost always meet the donation goal, which is a spectacular testament to how much people care about the project. It sucks when things like this happen. So immediately, here's what I'm doing:
Thanks for your patience and support. I'll answer questions in this thread, or by e-mail if you're more comfortable with that. |
Re: Downtime
Not yet functional:
|
Re: Downtime Over
I would like to be the first to congratulate you on dealing with this calmly (stressful event is stressful :P) and getting stuff up and running again as quickly as possible.
Nice Work! |
Re: Downtime Over
Nicely Done :)
|
Re: Downtime Over
Thanks for your hard work, BAILOPAN. I will gladly donate.
|
Re: Downtime Over
What happened to the stuff the backups didn't include?
Rautamiekka File Server is glad to help by storing any data. |
Re: Downtime Over
good to see you back :)
|
Re: Downtime Over
Good to see the websites are back online!
Thanks to everyone helping for their hard work :) |
Re: Downtime Over
Bail,
Let me know if you guys need any help with hardware. I sometimes have access to older servers that our company throws away... -Mal. |