It's the driver, not the car!

Sometimes, a project comes with the opportunity to make new choices of platform, technology, hardware, software, etc. And I'll be honest: I feel like a kid in a toy store. No points for guessing which "aisle" any software engineer heads to in this store.

Performance Benchmarks vs Real Life

Benchmarking websites are a favourite "hangout" of us engineers. Their lure is like the smell of popcorn in a movie theatre; you're not hungry, yet you'll buy some. Those charts, tables and numbers bring an inexplicable satisfaction to the inner data-science nerd in us. After all, it is real, authentic data. No bias, no prejudice, no opinions ... just an untamed data set ... pure information without the human manipulation.

But believing benchmarking sites blindly is like believing the mileage numbers published by car manufacturers. Does anyone achieve those numbers in day-to-day driving? There's an asterisk at the end of the brochure, which we choose to ignore: *UNDER TEST CONDITIONS.

When I (used to) drive my car from home to office, I would hit several red lights on the way. Bad roads, especially potholes after the rains, would always slow me down. Sometimes there's a lot of traffic; just too many cars on the road. I also have to look out for pedestrians crossing the road, and let ambulances and fire engines get ahead. Servers in real life are like that too. As many threads compete to execute their jobs, my process will be set aside momentarily. Sometimes my memory will be swapped out to disk. Sometimes several threads will want to access the disk at the same time. Sometimes other processes, like the firewall, anti-virus and backup, will be given higher priority.

It's the driver, not the car!

When I get pulled over for speeding, it is not the car maker who gets fined for making a fast car. As much as we'd like to believe that tools and platforms will solve the problems of performance and scale, it is the patterns and practices that make or break the solution. It is possible to use the latest tools, platforms and hardware, and still mess up badly in the implementation. Writing bad code full of anti-patterns is no rocket science. Getting the concurrency configuration wrong with respect to the underlying hardware resources is always a possibility. A badly designed data structure will make the execution inefficient. And these problems can't be solved by tools; they can only be solved by habits.
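To make the data-structure point concrete, here is a minimal, hypothetical sketch (the workload and sizes are invented for illustration): the same membership check, against a list versus a set. No fancy tooling can rescue the linear scan; only the choice of structure can.

```python
import timeit

# Hypothetical hot path: checking whether an ID belongs to a known collection.
# A list forces a linear scan on every lookup; a set is a hash lookup.
ids_list = list(range(100_000))
ids_set = set(ids_list)

target = 99_999  # worst case for the list: the last element

list_time = timeit.timeit(lambda: target in ids_list, number=100)
set_time = timeit.timeit(lambda: target in ids_set, number=100)

print(f"list lookup: {list_time:.4f}s, set lookup: {set_time:.6f}s")
```

On any hardware, old or new, the list version loses by orders of magnitude; that is the "habit" at work, not the platform.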

If you are comparing the cost of leading JSON parsers to choose a library for your project, one of two things is true:

  • Either you have optimised the performance of every other component/module in the system to such an extent that parsing the content/payload is your biggest worry. The evidence for that would be that the cost of parsing exceeds 33% of the total job cost.
    OR

  • Most probably, you are being penny-wise and pound-foolish. The real problem is in some other module, and you are losing pounds by the hour, but you're hoping that saving a few pennies in an unrelated space will solve it. We end up thinking like this because it is the easier problem to solve.
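Before comparing parsers at all, it's worth measuring what fraction of the job parsing actually costs. A hedged sketch, with an entirely hypothetical payload and "rest of the job" standing in for your real workload:

```python
import json
import time

# Hypothetical payload; substitute your real content.
payload = json.dumps({"items": [{"id": i, "value": i * 2} for i in range(10_000)]})

t0 = time.perf_counter()
data = json.loads(payload)  # the parsing step under scrutiny
t1 = time.perf_counter()

# Stand-in for the remainder of the job (business logic, I/O, etc.).
total = sum(item["value"] for item in data["items"])

t2 = time.perf_counter()
parse_fraction = (t1 - t0) / (t2 - t0)
print(f"parsing is {parse_fraction:.0%} of total job time")
```

Only if that fraction is large (say, over a third) is swapping parsers likely to pay off; otherwise the pounds are leaking elsewhere.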

I'm not saying that tools are irrelevant. But buying a Tesla doesn't make me a better driver. I don't want to fall into the trap of thinking that I will not face problems if I choose a specific set of tools and platforms. More often than not, I've tackled performance and scale by writing better code.

But the story doesn't end there. After doing our best with what we had, when we decided to raise the bar further, we found ourselves in a completely different mode of problem-solving and engineering. And the smell of that, my friends, is like fine wine!

Next up: The only metric important for Scalability.