The Mothership called

No, I’m not talking about aliens, I’m talking about Apple Computer. I got a cold call from a researcher at Apple who saw my resumé on LinkedIn. (I’d always thought that LinkedIn was kind of a joke, but now I might reconsider.) Apparently my resumé looked interesting for a couple of positions. Unfortunately the big sticking point is my unwillingness to leave San Diego and move to Cupertino. I know my in-laws would love for us to move, but it just isn’t happening. I got a follow-up call from an Apple recruiter, as they’re looking for people who do human interface work. Now, if I were willing to move, I could possibly get a job there. The recruiter will pass my name along to the group that does contract hiring, so who knows, maybe I can become a contractor.

Missing Sync for Palm OS 5 Shipped!

For the last 6 or 7 months, I’ve been working on Missing Sync for Palm OS. We have finally shipped and it is a big relief. I acted not only as the lead engineer, but as product manager due to others at Mark/Space being quite busy. Needless to say, at times, I’ve been overwhelmed taking on two major roles. I think that this product has killer features. It has much better syncing support for Calendar, Contacts, and Tasks, as well as the features I worked on: overhauled UI, auto sync vs sharing detection, iTunes conduit, folder sync conduit, iPhoto to handheld, as well as a number of bug fixes. If you’re a Macintosh and Palm OS user, I highly recommend you purchase or upgrade! If you’re on the fence about upgrading or purchasing, trust me, it is well worth the money as it is leaps and bounds better than Palm’s HotSync Manager, Apple’s iSync Palm Conduit and even Missing Sync for Palm OS 4.

Backup scheme

I think that I finally have my backup scheme worked out. My server meltdown had me re-think the strategy. However, my meltdown, while time consuming to get everything working again, really only cost me one day’s worth of data that I was able to restore from another machine. My backups sort of worked, except for the monolithic “dump” archive that I created. This archive got corrupted and caused the restore to fail. Luckily I had created tar/gzipped backups and had the files.

So my new backup strategy is quite involved. First off, I have 2 drives doing RAID 1, which will protect against hard drive failure (I hope). Next, I have a third drive in my server, and every hour an rsync of the main drive is done to it. Every night, a nightly rsync of the main drive is also done to the third drive. Next, I have a TrayDock that I do an rsync to every few days. Then I take the tray to my safe deposit box. I have a second tray for it and rotate those backups. I also do nightly tar/gzipped backups of important stuff to the third drive, which then get copied to the TrayDocks. Lastly, I periodically copy the tar/gzipped archives to a TrayDock attached to my PowerBook (I have 2 trays for my PowerBook).

I think that I should be covered in terms of backups. While there is still the potential for downtime, I’ll sleep better knowing that I can restore any one file or files without having to rely on a massive dump file still working. Yes, I may be paranoid, but my livelihood depends on data on my server.
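
A minimal sketch of what the rsync and tar/gzip steps might look like. The function names, paths and archive naming are hypothetical, not the actual scripts on the server. The idea worth showing is verifying each archive right after it is written, since a corrupt archive is exactly what made the earlier restore fail.

```shell
#!/bin/sh
# Hypothetical helper: mirror one directory to another with rsync.
# Usage: mirror_dir SOURCE_DIR DEST_DIR
mirror_dir() {
    # -a preserves permissions, times and symlinks;
    # --delete removes files from the copy that no longer exist in the source.
    rsync -a --delete "$1/" "$2/"
}

# Hypothetical helper: archive one directory into a dated tar.gz and verify it.
# Usage: backup_dir SOURCE_DIR DEST_DIR
# Prints the path of the archive on success.
backup_dir() {
    src=$1
    dest=$2
    name=$(basename "$src")
    archive="$dest/$name-$(date +%Y%m%d).tar.gz"

    mkdir -p "$dest" || return 1

    # -C keeps the paths inside the archive relative to the source's parent.
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1

    # Check the gzip stream, then make sure tar can list every entry.
    gzip -t "$archive" || return 1
    tar -tzf "$archive" >/dev/null || return 1

    echo "$archive"
}
```

Calling something like `backup_dir /var/www /mnt/backup/archives` from a nightly cron job (both paths made up here) would leave one small archive per directory per day, so a single bad file only costs that one archive rather than the whole backup.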

RAM

So I went back to The Chip Merchant to exchange my RAM. The guy said that it was unlikely that both RAM modules were bad and went off to talk to his tech. He came back and said that my report of bad RAM wasn’t the first and that there appears to be an issue with the combination of RAM, motherboard, and processor they sold me. They gave me a different type of RAM and told me that the DDR 400 could only operate at 333, but that the BIOS’s Auto setting should detect it. I popped the RAM in, set the BIOS to a RAM frequency of 333 and will cross my fingers. It only took me a few days of futzing around to come to the same conclusion.

Stable server?

Now that my server is rebuilt, my problem is that it keeps crashing with kernel panics and I’m seeing segmentation faults all over the place. All roads point to hardware problems. So how do I solve this? Well, first off, my old memory modules work in the new machine. I installed one of them (512 MB) and the machine seemed to stay up all night with one exception. I noticed that it had rebooted at 5:32 am. In all the other crashing, it never once rebooted. That got me thinking that the UPS I plugged the machine into (an old one) wasn’t powerful enough: a surge that should have switched the system to battery failed to do so and the server restarted. At least, that’s what I hope happened.

So I got to thinking, how could 2 brand new memory modules fail? I remembered that when I was handed the memory, they were in adjoining pouches. I checked the serial numbers and they were 12 apart, meaning that they most likely came from the same batch, and if a batch was bad, both modules could be bad. So this evening I used a program called Memtest86, which supposedly tests RAM thoroughly. I popped in each new RAM module one at a time and after less than a minute, each module showed thousands of errors. Then I put both in and after 20 minutes I saw 500+ errors; I’m not sure why the results were different with 1 vs. 2, but it convinced me that there was a real problem. I then tested my 2 old memory modules (slower, but the same capacity) and after an hour, they showed no errors.

Now I’m running the server with the old RAM and will see what happens. On Monday, I’ll go back to The Chip Merchant and get the RAM replaced.

I wish all this just worked and I didn’t have to futz with it.

Server Recovery

This sure has been a nightmare to get my server running again adequately. I got almost everything working yesterday and today I tackled converting to software RAID1 so that I have a mirror. With most Linux tasks, there is some help on the web. A co-worker pointed me to a document for “crazy sysadmins”. I didn’t think that applied to me, until I re-read it several times and realized that it is almost what I need. I followed the directions and was stoked that things were going smoothly. Then came the hard part, rebooting. I always have problems with grub, fstab, etc. After much Google searching and futzing, I figured out the solution…I had to rebuild the ram disk image that got loaded so that it knows to boot off the RAID. This normally wouldn’t be necessary, but the default Fedora Core 3 install used an LVM volume and the old initrd file was based on that. So, I figured out that:

mkinitrd -v --preload=raid1 --fstab=/mnt/newroot/etc/fstab initrd-2.6.12-1.1378_FC3.img 2.6.12-1.1378_FC3

worked. It’s hard to tell from the documentation what is going on, but if you don’t specify the fstab file, it uses the current active one, which happens to have the LVM mess in it.

Just to make sure I didn’t screw anything up, I removed the original drive and set up a clean drive as the second drive for the RAID (I bought 4 drives with the idea that 2 were for the RAID and 2 were hot swappable spares).

In about 40 minutes, when the drives finish mirroring, I’ll restart the server and see what happens.

I’m now convinced more than ever that sysadmins (at least those that run Linux/UNIX machines) don’t make enough money. It is extremely frustrating to have a server crash and then to have trouble restoring it. I also forgot to mention that one of the times I was restarting the server, it tripped my UPS and somehow killed the UPS. The UPS definitely has enough capacity for the server, but something went haywire and I have to get the UPS replaced. A new one will be here in 5-7 business days. I do have a spare, but it’s significantly smaller.
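
For checking whether the mirror has finished building, Linux software RAID reports its state in /proc/mdstat. Here is a rough sketch of a check, assuming the usual mdstat layout (a status such as [UU] or [U_] per array, plus a "recovery" or "resync" progress line while a rebuild is running). The function name raid_state is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: summarize the state of the md arrays listed in an
# mdstat-style file. Defaults to /proc/mdstat.
# Prints one of: rebuilding, degraded, clean
raid_state() {
    mdstat=${1:-/proc/mdstat}

    # While a rebuild is running (or queued) the kernel prints a
    # "recovery = ..." or "resync = ..." line for the array.
    if grep -Eq '(recovery|resync) *=' "$mdstat"; then
        echo "rebuilding"
    # An underscore in the member status, e.g. [U_], marks a missing member.
    elif grep -q '\[[U_]*_[U_]*]' "$mdstat"; then
        echo "degraded"
    else
        echo "clean"
    fi
}
```

Running `raid_state` with no argument reads the live /proc/mdstat, so waiting for it to print "clean" before rebooting would be one way to avoid restarting in the middle of a rebuild.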

Server crashed again

This time, my backup was corrupted and the server seemed hosed, so I got a new one and started rebuilding from backups. Unfortunately the backup appears to be corrupt (I think it was the drive as I restored parts later from another backup from last week and the files came across fine). I still have a long way to go, but mail and web are back up. I hate computers.

Server crashed

When I went to check my email this morning, it failed, which was odd. When I started investigating, my server said that the file system was mounted read-only. I lugged a monitor and keyboard into the other room and started taking a look at the server. I was unable to repair the file system, so I went down the path of reformatting the primary drive and restoring from the secondary drive. I’ve never restored the drive before, so I had to use my Mac to do a ‘man restore’ to figure out what I needed to do. The good news is that the corruption appears to have started around 6 am and my backup is done daily at 4 am. At most, I should only lose a few hours’ worth of stuff. However, this kind of file corruption is worrisome and I’ll have to keep an eye on things. Nothing like having to spend hours fixing a server on a Monday morning.
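
One way to keep an eye on this would be a small check for file systems that have flipped to read-only. The sketch below assumes the standard /proc/mounts layout (device, mount point, type, comma-separated options). The function name readonly_mounts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the mount point of every file system that is
# currently mounted read-only. Defaults to /proc/mounts.
readonly_mounts() {
    mounts=${1:-/proc/mounts}
    # Field 4 is the option list; match "ro" only as a whole option so that
    # something like errors=remount-ro on a read-write mount is not counted.
    awk '$4 ~ /(^|,)ro(,|$)/ { print $2 }' "$mounts"
}
```

Run from cron, any output from `readonly_mounts` could be mailed off as a warning. Note that it would also list mounts that are read-only on purpose, such as a CD-ROM.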

Server Downtime this morning

My server was offline for a few hours this morning through no fault of my own. My cable provider had some network issues that finally got resolved by around 8 am. I called them (Time Warner Cable) around 6:15 am when I woke up and discovered the issue. When I spoke to the help desk, the tech (national help desk) was pretty useless. He ran the standard diagnostics, but luckily didn’t ask me to reboot my computer, as I know that would have been a waste of time. I specifically asked if there were network problems and he said he didn’t know of any. I received a call around 10:15 am from a tech in the local office and he said that they resolved the problem around 9:45 am. I appreciate getting calls back from them when I have issues; however, when I call and speak to the national help desk, they’re not always helpful. Overall, I’m pleased with Time Warner as a provider…they just need to give their techs updates on network issues.

User Interface

Yesterday I had the privilege (or was it?) of having Missing Sync for Palm OS reviewed by a user interface guru at Apple. He had some good ideas on how we can improve the user interface that I think will be quite beneficial to the product. It is quite intimidating having the product I’ve been working on for more than a year and a half critiqued and ripped apart. After thinking about some of the things said, I have to respectfully disagree with some of the comments made as there isn’t one type of user interface for all applications and even Apple’s own applications seem to contradict many of the things the guru said.

I really appreciated the time that was spent reviewing the app; however, it was only slightly more enjoyable than going to the dentist.