Last week was another one of those weeks where my server decided not to cooperate. Around 8:15 am on Wednesday morning, I copied 7 GB of photos to my server so my wife could tag them and upload some of them (not all 7 GB, of course). Every 2 hours, my server automatically backs up to a secondary hard drive. Apparently this process (using rsync) combined with other stuff going on (that's my current theory) caused my server to have a fit and crash around 10:30 am. The problem kept recurring as the backup tried to copy the files every 2 hours. Then at around 1 am, when my server does its nightly backup of everything to tar/gzipped files, it crashed again because the load got way too high, combined with higher outside temperatures. I finally figured out what was going on around 2:15 am when I was up with the little tyke, and got my server stabilized by excluding the photos directory from the backup.

Then on Friday when I was doing backups, there was some corruption on my backup drive, which caused the CPU load to spike to 14 (normal is about 0.25 or less) and prevented me from unmounting the drive. A quick reboot got me up and running again. I reformatted the backup drive, ran backups to it, and everything has been running smoothly since. During this whole fiasco, I also increased the case fan speed to the max and brought the temperature down a bit.
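For the record, the backup setup looks roughly like the cron entries below. This is just a sketch: the paths, mount point, and photos exclusion are placeholders rather than my actual config, but it shows the 2-hour rsync mirror and the 1 am tar/gzip job I mentioned.

    # /etc/cron.d/backups -- hypothetical paths, not my real config
    # Every 2 hours: mirror the data directory to the secondary drive,
    # skipping the photos directory that was causing the load spikes.
    0 */2 * * * root rsync -a --delete --exclude='photos/' /srv/data/ /mnt/backup/data/

    # 1 am daily: full tar/gzip snapshot onto the secondary drive.
    0 1 * * * root tar -czf /mnt/backup/snapshots/data-$(date +\%F).tar.gz /srv/data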
Phew, everything is working again with very little downtime. I still refuse to co-locate my server somewhere, as I like to have complete control over things and am extremely paranoid about backups (see my post on what I actually do).
Soon I should probably replace the main drives in my machine as they're over a year and a half old. While that may not sound old, they've been running 24/7 since I installed them. Since the drives are in a RAID 1 configuration, replacing them should just be a matter of shutting down, yanking one drive, putting in a new drive, partitioning it, letting the RAID rebuild, and then repeating the process for the other drive. Anything I can do to prevent another crisis is well worth it.
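Assuming Linux software RAID with mdadm (and these device names are placeholders, not necessarily my actual array), the swap for each drive goes roughly like this:

    # Mark the old drive as failed and pull it out of the array
    # (hypothetical device names).
    mdadm --manage /dev/md0 --fail /dev/sdb1
    mdadm --manage /dev/md0 --remove /dev/sdb1

    # Shut down, physically swap in the new drive, boot back up.

    # Copy the partition table from the surviving drive to the new one,
    # then add the new partition to the array so it starts rebuilding.
    sfdisk -d /dev/sda | sfdisk /dev/sdb
    mdadm --manage /dev/md0 --add /dev/sdb1

    # Watch the rebuild progress.
    cat /proc/mdstat

The important part is waiting for /proc/mdstat to show the rebuild as finished before repeating the whole thing on the second drive.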