Thread: Downtime Over
BAILOPAN
Join Date: Jan 2004
02-09-2011, 02:08   Re: Downtime Over
#195

vaan123, thanks for your support!

An update on the recovery status: the lab called me a day ago to say that they're still working on it, that they're hoping for "this week" (again), and that they might have to settle for a partial recovery. At this point, I'm not going to speculate or plan anything until we know what we have back, and it looks like that's going to take at least a few more days. If we go much longer without the bug tracker, I will put up a new one.

In the meantime, we have made excellent progress on the 500 errors. The problem is in MySQL, but we don't quite understand the cause yet. It looks like the forum tables become locked for about two minutes, during which MySQL stops handling queries. That stall bubbles up through PHP, mod_fcgid, and then Apache, causing the site to return errors. It happens about three to five times at night, and two to three times during the day (CST). During those two minutes, MySQL is reading massive amounts of data off the drive, pushing drive utilization up to 100%.
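For anyone chasing a similar stall, here is a rough sketch of how you might pick the lock-waiting threads out of a `SHOW FULL PROCESSLIST` snapshot. It is not what we actually run. The threshold, the helper names, and the sample rows (including the table name) are all made up for illustration, and it assumes you have already fetched the rows yourself in MySQL's column order.

```python
from typing import NamedTuple, Optional

# Hypothetical threshold (seconds); tune it to taste.
STALL_SECONDS = 30

# States MySQL reports for a thread waiting on a table lock:
# "Locked" in MySQL 5.1 and earlier, "Waiting for table level lock" from 5.5.
LOCK_WAIT_STATES = {"Locked", "Waiting for table level lock"}


class ProcessRow(NamedTuple):
    """One row of SHOW FULL PROCESSLIST, in MySQL's column order."""
    id: int
    user: str
    host: str
    db: Optional[str]
    command: str
    time: int             # seconds the thread has spent in its current state
    state: Optional[str]
    info: Optional[str]   # the SQL text, if any


def stalled_queries(rows, min_seconds=STALL_SECONDS):
    """Return lock-waiting rows stuck for at least min_seconds, longest wait first."""
    stuck = [
        r for r in rows
        if r.state in LOCK_WAIT_STATES and r.time >= min_seconds
    ]
    return sorted(stuck, key=lambda r: r.time, reverse=True)


if __name__ == "__main__":
    # Invented sample snapshot; a real one would come from your MySQL client.
    sample = [
        ProcessRow(11, "forum", "localhost", "forum", "Query", 95, "Locked",
                   "UPDATE example_table SET views = views + 1"),
        ProcessRow(12, "forum", "localhost", "forum", "Query", 118,
                   "Sending data", "SELECT * FROM example_table"),
        ProcessRow(13, "forum", "localhost", "forum", "Sleep", 4, None, None),
    ]
    for row in stalled_queries(sample):
        print(f"thread {row.id} waited {row.time}s: {row.info}")
```

Run on a schedule and logged, something like this would at least show which queries pile up behind the lock during each two-minute window. Note that the long-running query in a state such as "Sending data" is usually the one holding things up, and this filter deliberately does not flag it.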

We don't know the root cause yet, but we're close. Hopefully I'll be able to say more soon. At this point, we can at least say definitively that replacing any piece of our infrastructure would not have solved the problem, nor will it.
__________________
egg

Last edited by BAILOPAN; 02-09-2011 at 02:11.