First look at OCR

A recent comment on my blog struck a nerve: the commenter said that OCR would basically put a competitor ahead of ReceiptWallet. I still don't believe OCR is all that useful for receipts. When you're generally entering only 3 small pieces of data, a single OCR mistake wipes out any time saved, because you have to review every entry carefully. Even so, I took a look at an open source OCR package. The code is a bit rusty, but there has been some recent work on it. My first test was a Rite-Aid receipt, where I wanted to see if it could read 3 pieces of data: the merchant name, the date, and the total. It failed on the merchant name because it was a graphic; however, it picked up the date and total in a form that I could parse to grab what I needed. I then tried 2 other receipts, both from Costco, and the results were so miserable that I couldn't get anything from them. I'll keep plugging away and testing to see if my results improve.
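Parsing the date and total out of raw OCR text can be as simple as a couple of regular expressions. This is a minimal sketch, assuming US-style dates (MM/DD/YY or MM/DD/YYYY) and a line beginning with the word TOTAL; the function name and patterns are hypothetical, and real receipts (like the Costco ones) will break assumptions like these quickly.

```python
import re
from datetime import date
from decimal import Decimal

# Assumed formats: US-style dates and a "TOTAL" line followed by an amount.
# \b before TOTAL keeps "SUBTOTAL" from matching.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})\b")
TOTAL_RE = re.compile(r"\bTOTAL\b[^\d\n]*(\d+\.\d{2})", re.IGNORECASE)


def parse_receipt(text):
    """Pull (date, total) out of raw OCR text; either may be None."""
    found_date = None
    m = DATE_RE.search(text)
    if m:
        month, day, year = (int(g) for g in m.groups())
        if year < 100:
            year += 2000  # assume two-digit years mean 20xx
        try:
            found_date = date(year, month, day)
        except ValueError:
            pass  # an OCR misread produced an impossible date; leave it None

    total = None
    m = TOTAL_RE.search(text)
    if m:
        total = Decimal(m.group(1))  # Decimal keeps cents exact
    return found_date, total
```

Returning None for a field, rather than guessing, matches the point above: anything the parser can't read with confidence still has to be checked by hand.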

In addition, I put in a request for a quote on a commercial OCR engine, though I suspect it will be cost-prohibitive. If it costs $5,000 to $10,000 upfront plus a per-copy licensing fee, I can't afford it; it would completely wipe out any profit unless I significantly raised the price of ReceiptWallet.

If anyone has more information on OCR engines for the Mac (commercial or open source), please let me know.

Goldilocks and the 3 keyboards

I somehow managed to spill stuff on my keyboard (not the internal one in my MacBook Pro) and have been looking for a replacement for a while. I dried it out enough that it mostly worked, but I kept missing keys. So a few months back I bought an Apple keyboard from CompUSA (when they still had a San Diego store) and had 2 problems with it: first, the unit was defective (not all the keys worked), and second, it was too small for my keyboard tray, which has spring-loaded holders on the sides. I returned it and ignored the problem for a while. On Friday I went to my most hated store, Fry's, and got another keyboard. It looked OK, but when I got it home, I discovered that the command key was too small. My choice of wired Mac keyboards is quite limited; I want wired so that I don't need drivers and so that it works with my KVM switch. I also want a Mac keyboard, because Mac-compatible Windows keyboards have the command and option keys swapped unless you install the drivers.

Today I went back to Fry's and picked up a Macally keyboard. It seems to work well, doesn't require drivers (they have drivers, but I have no idea what they're for), and fits my keyboard tray. The only problem is that I paid $10 over retail. It's not worth my time to return it to Fry's and then order it online (or maybe it is, since it's $45 from Amazon with free shipping).

Running my own server vs hosting elsewhere

With all the problems I've had with my server, you'd think I'd give up, host it somewhere else, and make it someone else's problem. After a little thinking, I've come up with the pros and cons of running my own server.

Pros:
- Physical security of the server
- Music server: I need a server to run my Squeezeboxes
- Unlimited bandwidth (at 1 MBps upstream, but unlimited)
- Full backups: I back up the server to an external drive and take it offsite
- Lots of storage (currently 300 GB, with only about 15% used)
- It's a learning experience
- 5 static IP addresses
- Full control over the server
- Ability to run my own PBX with good voice quality

Cons:
- Electricity usage
- I have to keep it running
- Limited upstream bandwidth
- Potentially more costly (it costs me about $50 per month to host my own server); however, for the hardware I have, a hosting service might charge more for RAID 1 and backups

I'm sure I'm missing some items, but it's pretty clear to me that running my own server is the right way to go. My issues last week appear to have stemmed from failing drives; it just happened that when I was putting in the new drives, the drive I left running was failing, which caused some issues. I've now been up and running on my 2 new drives for about 5 days. Lesson for the future: replace drives about once a year whether they need it or not, since they're relatively cheap ($150 for both) and they run 24/7, which puts a lot of wear and tear on them. Now the question is, what do I do with the old drives? Would reformatting them do the trick and keep them running? What less mission-critical role can I put them in?

The new world of IMAP

I realize that IMAP isn't new, but I just started using it yesterday. I've resisted for years, as I've always had one main store for my email. Lately, however, I've been checking email through webmail, and everything I've heard about IMAP makes it sound perfect for reading mail from multiple locations. So far it's working OK, but I'm still getting used to it. For instance, if Mail on my Mac filters messages out of the Inbox, they're pulled off the IMAP server, which kind of defeats the purpose of using IMAP, and most of my email is filtered. So I created a rule in Mail that moves all messages less than 1 day old into my Inbox and made it my first rule. This should leave everything in my Inbox, and when I run the rules over my Inbox manually later, they should put everything in the right spot.
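Mail's rules are set up in its preferences rather than in code, but the ordering described above can be modeled in a few lines of Python to make the logic concrete. This sketch assumes the first rule also stops further rule evaluation (Mail has a separate "Stop evaluating rules" action for that); the filing rules, mailbox names, and addresses are made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical filing rules: (predicate, mailbox), checked in order.
RULES = [
    (lambda msg: "list@" in msg["to"], "Mailing Lists"),
    (lambda msg: msg["from"].endswith("@example.org"), "Vendors"),
]


def choose_mailbox(msg, now, rules=RULES, keep_for=timedelta(days=1)):
    """Return the mailbox a message should end up in.

    The first rule wins: anything younger than `keep_for` stays in the
    Inbox and no other rules run, so new mail remains on the server's
    Inbox where every client can see it. Older mail falls through to
    the normal filing rules.
    """
    if now - msg["received"] < keep_for:
        return "INBOX"
    for matches, mailbox in rules:
        if matches(msg):
            return mailbox
    return "INBOX"  # nothing matched; leave it where it is
```

Running the rules again later is what files a message once it has aged past the one-day window.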

Hopefully I'll get used to this soon; I've been using POP3 for close to 15 years, ever since I started using Eudora in college. It's hard to change my thinking, but since I switched to POP3 over SSL, I can no longer telnet into my server and issue commands by hand (I know the POP3 commands, as I've written 2 POP3 email clients in my life), so there really is no advantage to my using POP3.

Useless application?

I downloaded and installed Little Snitch today to give it a spin, since it was part of the software bundle I bought (which raises the question of why I bought the bundle, but that's another story). The program is well implemented, but just about every application makes outgoing network connections these days, so it is always popping up, basically saying "xyz application wants to connect to abc server. Allow?" After clicking "always allow" more than a few dozen times, I finally disabled it (for now). I'm sure it will be less annoying once I train it, but how can I really tell which applications are making legitimate requests and which are not? It feels like the boy who cried wolf.

Speeding up Ruby

When I set up my store, I used FastCGI to get acceptable performance, which was fine, as I don't get many hits. Today I installed BrowseBack because it came as part of the software bundle I purchased through MacHeist/MacUpdate. Since I hit my store a lot to check stats and such, BrowseBack kept loading the store, which spiked the server load to 4 and forced me to kill BrowseBack to get my server back to normal. While this isn't normally a problem, it exposed a potential chink in my server's armor, so I went looking for an alternative. I'd read about running lighttpd behind Apache, with Apache proxying requests to it. Setting it up was straightforward and seemed to work well, until a customer complained that he couldn't make a purchase. I tried it myself (again) and it worked fine. I had someone else try, and it failed. Hmmm. After much tinkering, I figured out the problem: my store code required an https connection and otherwise redirected to https, which was fine, except that requests coming through the proxy were always plain http. Since I already redirect any request for http://store.receiptwallet.com to https://store.receiptwallet.com, I can be assured that all requests are secure. So I commented that line out of my store code, and everything is working fine.

(On a side note, the reason I couldn't reproduce it myself is that the check exempted local requests, and since I access my server from a private IP address range, my requests were flagged as local and never required the SSL connection.)
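The store's code isn't shown here, so this is a language-neutral sketch in Python of the check as described: the app server only sees the scheme of the proxied connection (always http), requests from private addresses are exempt, and everything else gets redirected. The `forwarded_proto` parameter illustrates a common alternative to removing the check: having the front end pass the original scheme in an `X-Forwarded-Proto` header (with Apache, for example, mod_headers' `RequestHeader set X-Forwarded-Proto "https"` in the SSL virtual host). That header and the function name are assumptions, not part of the setup described above.

```python
import ipaddress


def needs_https_redirect(scheme, client_ip, forwarded_proto=None):
    """Decide whether a 'require SSL' check would redirect this request.

    scheme: scheme the app server itself saw ("http" behind the proxy)
    client_ip: the original client's address
    forwarded_proto: optional X-Forwarded-Proto value from the front end
        (hypothetical here; only trustworthy if clients can't reach the
        backend directly)
    """
    ip = ipaddress.ip_address(client_ip)
    # Private and loopback clients count as "local" and are exempted,
    # which is why the bug never showed up from inside the LAN.
    if ip.is_private or ip.is_loopback:
        return False
    effective_scheme = forwarded_proto or scheme
    return effective_scheme != "https"
```

Without the header, an outside customer's request arrives as http and is redirected even though the browser already used https, while the same request from a 192.168.x.x address sails through.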

Am I being told something?

I went to install a new hard drive in my server today to preemptively avoid a disaster: the drives have been running 24/7 for 1.5 years, and drives are so cheap that replacing them now should save me time later. I thought all I had to do was power down, yank one drive, power up, partition the new drive, and rebuild the RAID. That's great in theory, but my RAID had errors, so the server failed to come back up when I rebooted. I ran fsck on it (after spending too long failing to read what the screen told me to do) and let it repair lots of little problems. After that, I was able to reboot and rebuild the RAID. In about an hour, my RAID will be rebuilt, and I'll let it sit for a few days before replacing the second drive. With all the problems I have with my server, you'd think I'd learn to just use a hosting service and let someone else manage the hardware. Oh well.
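The post doesn't say which RAID implementation the server uses. Assuming Linux software RAID (md), whose status appears in `/proc/mdstat`, a small Python sketch like this could report whether the mirror is degraded and how far a rebuild has gotten. The function name and patterns are hypothetical, and the mdstat layout shown in the comments is the usual one for RAID 1 but may differ between kernel versions.

```python
import re

# Member status looks like "[UU]" (healthy) or "[U_]" (one mirror missing).
STATUS_RE = re.compile(r"\[([U_]+)\]")
# A rebuild in progress shows a line like "recovery = 22.4% (...)".
PROGRESS_RE = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%")


def raid_status(mdstat_text):
    """Return (degraded, rebuild_percent or None) from /proc/mdstat text."""
    degraded = any("_" in members for members in STATUS_RE.findall(mdstat_text))
    m = PROGRESS_RE.search(mdstat_text)
    progress = float(m.group(2)) if m else None
    return degraded, progress
```

On the server itself it would be called as `raid_status(open("/proc/mdstat").read())`; a degraded result with no progress percentage means the array is running on one drive and nothing is rebuilding.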

An adventure in shopping

Yesterday, my wife and I went shopping for some outdoor umbrellas. Not just any umbrellas, but ones that would extend over our pool (I went swimming on Saturday while my wife and son sat under one of our regular umbrellas beside the pool, and we thought it would be a good idea to get some shade actually over the pool). We went to Home Expo and found nothing; next stop was Home Depot, where we found 2 umbrellas but no bases for them. Then came Costco, Lowe's, and another Home Depot. Nothing. (We always tend to miss the right time in the season to buy.) We looked online and I found several, some with free shipping. The problem was that if we didn't like them, returning them would be difficult because of the weight. I found the same umbrella we saw at Home Depot on their website for the same price with free shipping. However, online purchases can't be returned at the stores. Hmmm. The listing had an in-store SKU, so I started calling stores. The store we had visited said they had 5 available and would visually check the stock. After being transferred around for a bit, I finally spoke to someone who was yelling to someone who had a clue. It turned out they had 2 without bases (I could have told them that). They didn't know where the other 3 or the bases were. I'm not sure how you lose bases that weigh 114 pounds (there were 2 boxes per umbrella)! I found another store (35 minutes away) that said they actually had 6 and set one aside for me (I had asked for two). We drove up there (this was the first adventure with the little one in my car; his seat is on one side in my car but in the center in my wife's car, so we could fold down one seat) and managed to get 2 umbrellas into my car. Wow, those are heavy. After getting home and having a neighbor help me pull them out of the car, I put them together, and I'm quite pleased. It's a good thing we bought them at the "end of the season," otherwise I would have overlooked this model (it was 25% off). Too bad it was such a pain to get what I wanted. Such is life in the big city.

A week of server woes

Last week was another one of those weeks when my server decided not to cooperate. On Wednesday morning around 8:15 am, I copied 7 GB of photos to my server so my wife could tag them and upload some of them (not all 7 GB, of course). Every 2 hours, my server automatically backs up to a secondary hard drive. Apparently this process (using rsync), combined with other activity on the server (that's my current theory), caused my server to have a fit and crash around 10:30 am. The problem kept recurring, since the backup kept trying to copy the files every 2 hours. Then at around 1 am, my server backs up everything to tar/gzipped files, which crashed it again: the load got way too high, combined with higher outside temperatures. I finally figured out what was going on around 2:15 am, when I was up with the little tyke, and stabilized the server by excluding the photos directory. Then on Friday, while I was doing backups, corruption on my backup drive caused the load to spike to 14 (normal is about 0.25 or less) and prevented me from unmounting the drive. A quick reboot got me up and running again. I reformatted the backup drive, ran backups to it, and everything has been running smoothly since. During this whole fiasco, I increased the case fan speed to the max, which brought the temperature down a bit.
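The backup script itself isn't shown. As a hedged sketch, assuming the 2-hourly job simply calls rsync, here is the photos exclusion expressed in Python, plus a load-average guard (an addition, not something the original setup had) that skips a run when the server is already busy. `SOURCE`, `DEST`, and `MAX_LOAD` are hypothetical values.

```python
import os
import subprocess

# Hypothetical paths and threshold; the real ones aren't given above.
SOURCE = "/home/"
DEST = "/mnt/backup/home/"
EXCLUDES = ["photos/"]  # trailing slash: rsync matches directories only
MAX_LOAD = 2.0


def build_rsync_cmd(source, dest, excludes):
    """Build an rsync command that copies source to dest, minus excludes."""
    cmd = ["rsync", "-a"]  # archive mode: recurse, keep permissions and times
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    return cmd + [source, dest]


def run_backup(max_load=MAX_LOAD):
    """Run one backup pass unless the 1-minute load average is too high.

    Returns True if rsync ran, False if the run was skipped.
    """
    one_minute, _, _ = os.getloadavg()
    if one_minute > max_load:
        return False  # try again at the next scheduled run
    subprocess.run(build_rsync_cmd(SOURCE, DEST, EXCLUDES), check=True)
    return True
```

Skipping a run rather than queuing it means a busy server just misses one 2-hour backup instead of piling more I/O onto an already high load.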

Phew, everything is working again with very little downtime. I still refuse to co-locate my server, as I like to have complete control over things and am extremely paranoid about backups (see my post on what I actually do).

Soon I should probably replace the main drives in my machine, as they're over a year and a half old. While that may not sound old, they've been running 24/7 since I installed them. Since the drives are in a RAID 1 configuration, replacing them should just be a matter of shutting down, yanking one drive, putting in a new drive, formatting it, letting the RAID rebuild, and then repeating the process. Anything I can do to prevent another crisis is well worth it.