A Most Hilariously Bootleg Optimization.

- -

I've been working on a project with my dear friends at VSA for the past month or so. It's finally online now, which is awesome. Along the way, I came up with a hilariously obvious bug, then an hilariously bootleg fix, followed by an hilariously bootleg optimization to the fix.

The application is an ATM locator for the STAR network, written using the Google Maps API like no other. I was storing the ATM locations, as latitudes and longitudes, in a MySQL database, so of course I indexed on those columns. This led to a pretty hilarious bug where no matter where when the map was zoomed out to any reasonable level, the only markers that would show up were the ones at the very bottom and left of the screen. This, obviously, was because those columns were indexed; MySQL was thinking of that table as being ordered by those keys already.

The hilariously bootleg fix was to stash an "ORDER BY RAND()" into the query, knowing full well this was epically wrong. I knew this, and made a mental note to fix it later. That mental note was promptly forgotten when a thousand other things went wrong with the site and the data set was small enough that the absurd fix wasn't causing a major performance hit. I added another sub-query to filter only ATMs ±1.5° of latitude or longitude of the origin, and that sped things up quite a bit, but was still kinda slow (~1.5 seconds) in highly populated areas like Chicago, where 1.5° of latitude or longitude might include several tens of thousands of ATMs. To make matters worse, MySQL was using Filesort to do this, which is really just adding shame to misery. I needed to fix the hilariously bootleg fix with another hilariously bootleg fix, which was shockingly effective.

The hilariously bootleg fix for the hilariously bootleg fix was to add another column, `ordering`, which was filled with random numbers. I realized that ordering randomly worked as a fix to allow the table to be indexed by latitude and longitude, but also to appear to the user in random order so that they see an even distribution of ATMs on-screen. The ordering column was also indexed so that ordering by it could be done faster, and just like that, I have the vague illusion of clustering in google maps, without all the work.

It sounds really stupid, but I was able to get the behaviour that I wanted and for those keeping score out there the fix for the fix was 2,400 times faster than how it was before. The lesson to be learned here: don't prematurely optimize. It'll make you look like so much more of a miracle worker if you just write clear, easy-to-read code at first then get creative in a no-holds-barred optimization session later, speeding things up noticeably. The project was a pain in my ass at times but turned out really well, and for my trouble I learned an awful lot more than I really cared to about MySQL query optimization, but that's alright.

Remember, space cadets, we pronounce it "my-squirrel" now.