Why speed is more important than scalability

Software developers creating web applications like to talk about scalability as if it were totally unrelated to computing efficiency. They also like to argue that abstractions are important. They will tell you that their DBAL, Router, View Engine, Dependency Injection and even ORM do not have that much overhead (only a little). They will argue that the learning curve pays off and that the performance loss is not that bad. In this post I’ll argue that page loading speed is more important than scalability (or pretty abstractions).

Orders of magnitude of speed on the web

Just to get an idea of speed, I tried to search for a web action for every order of magnitude:

  • 0.1 ms – A simple database lookup from memory
  • 1 ms – Serving small static content from RAM
  • 10 ms – A very simple API that only does a DB lookup
  • 100 ms – A complex page load that calls multiple APIs
  • 1000 ms – Nothing should take this long… 🙂

But I’m sure that your website has pages that take a full second or more (even this site has). How can that be?

CPUs and web farms

Most servers have 1 to 4 CPUs. Each CPU has 2 to 32 cores. A single web request uses (at most) a single core of a single CPU. Maybe that is why people say that page loading speed is irrelevant: if you have more visitors, they will use other cores or even other CPUs. This is true, but what if you have more concurrent requests than cores? You can simply add machines and configure a web farm, as most people do.

At some point you may have 16 servers running your popular web application with page loads that average 300 ms. If you could bring the page load time down to 20 ms, you could run this on roughly a single box! Imagine how much simpler that is! The “one big box” strategy is also called the “big iron” strategy. Often programmers are not very careful with resources, because software developers tend to aim for beautiful abstractions and not for fast software.
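The arithmetic behind the 16-servers example can be sketched in a few lines of Go. This is a back-of-the-envelope model, not a benchmark: it assumes the servers are CPU-bound and that the number of servers needed scales linearly with the average page load time. The function name `serverEquivalents` is made up for this illustration.

```go
package main

import "fmt"

// serverEquivalents estimates how many servers the same traffic needs after
// a speed-up. Assumption (hypothetical model): total work is proportional to
// servers * average page load time, so faster pages need fewer servers.
func serverEquivalents(currentServers, currentMs, targetMs float64) float64 {
	return currentServers * targetMs / currentMs
}

func main() {
	// 16 servers at 300 ms per page, sped up to 20 ms per page.
	fmt.Printf("%.2f servers\n", serverEquivalents(16, 300, 20))
}
```

Under these assumptions a 15x faster page load shrinks 16 servers to just over one server's worth of work, which is where the “single box” comes from.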

Programming languages matter

Only when hardware enthusiasts and software developers work together will you get efficient software. Nice frameworks may have to be removed. Toys like Dependency Injection that mainly bring theoretical value may have to be sacrificed. Also, languages need to be chosen for execution speed. Languages that typically score well are C, C# (Mono), Go and even Java (OpenJDK). Languages that typically score very badly are PHP, Python, Ruby and Perl (source: benchmarksgame). This is understandable, as these are all interpreted languages, but it should be considered when building a web application.

Stop using web application frameworks

But even if you use an interpreted language (which may be 10x slower), then you can still have good performance. Unless of course you build applications consisting of tens of thousands of lines of code that all need to be loaded. Normally people would not do that as it would take too much time to write. (warning: sarcasm) In order to be able to fail – against all odds – people have created frameworks. These are tens of thousands of lines of code that you can install on your server. With it your small and lean application will still be slow (/sarcasm). I think it is fair to say that frameworks will make your web application typically 10-100x slower.

And that is exactly the approach that the industry regards as “best practice”. Unbelievable, right?

5 reasons your application is slow

If you are on shared hardware (VM or shared webhosting), then you need to fix that first. I’m sure switching to dedicated hardware will give you better (and more consistent) performance. Still, each of the following problems may make your application an order of magnitude slower:

  1. Not enough RAM
  2. No SSD disks (or wrong controllers)
  3. Using an interpreted programming language
  4. A bloated web framework
  5. Not using Memcache where possible

How many of these apply to you? Let me know in the comments.
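For the last item on the list, the idea is to keep the result of a slow lookup around so that the next request does not pay for it again. The sketch below shows that pattern in Go. It does not talk to a real Memcache server: an in-process map stands in for it, and the names `Cache`, `NewCache` and `GetOrLoad` are hypothetical, chosen only for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is a cached value together with its expiry time.
type entry struct {
	value   string
	expires time.Time
}

// Cache is a tiny in-process stand-in for Memcache (an assumption made
// for this sketch; a real setup would talk to a cache server).
type Cache struct {
	mu    sync.Mutex
	items map[string]entry
	ttl   time.Duration
}

// NewCache creates an empty cache whose entries live for ttl.
func NewCache(ttl time.Duration) *Cache {
	return &Cache{items: make(map[string]entry), ttl: ttl}
}

// GetOrLoad returns the cached value for key, or calls load (the slow path,
// for example a database query) and stores its result for ttl.
// For simplicity the lock is held while load runs, which serializes loads.
func (c *Cache) GetOrLoad(key string, load func() (string, error)) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.items[key]; ok && time.Now().Before(e.expires) {
		return e.value, nil // cache hit: the slow lookup is skipped
	}
	v, err := load()
	if err != nil {
		return "", err
	}
	c.items[key] = entry{value: v, expires: time.Now().Add(c.ttl)}
	return v, nil
}

func main() {
	calls := 0
	slowLookup := func() (string, error) {
		calls++ // counts how often the slow path actually runs
		return "row for user 42", nil
	}
	c := NewCache(time.Minute)
	c.GetOrLoad("user:42", slowLookup)         // miss: runs slowLookup
	v, _ := c.GetOrLoad("user:42", slowLookup) // hit: served from the map
	fmt.Println(v, "- slow lookups:", calls)
}
```

The second call is answered from memory, so the slow lookup runs only once for two requests.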

Finally

This post will probably be considered offensive by programmers that like VMs, frameworks and interpreted languages. Please, don’t use that negative energy in the comments. Use it to “Go” and try Gorilla, I dare you! It is not a framework and it is fast, very fast! It will not only be interesting and a lot of fun, it may also change your mind about this article.

NB: Go is almost as fast as the highly optimized C code of Nginx (about 10-100x faster than PHP).
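To give an idea of how little code a framework-free Go web application needs, here is a minimal sketch. It deliberately uses only the Go standard library (`net/http`) rather than Gorilla, to keep the example free of external dependencies. The handler name `hello`, the helpers `newMux` and `fetch`, and port 8080 are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// hello answers every request with a short plain-text greeting.
func hello(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	fmt.Fprintf(w, "Hello from %s\n", r.URL.Path)
}

// newMux wires the routes by hand using only the standard library.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/", hello) // "/" matches every path
	return mux
}

// fetch runs one in-memory request against the mux and returns the
// status code and body, without opening a network socket.
func fetch(path string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	newMux().ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := fetch("/demo")
	fmt.Printf("%d %s", code, body)
	// To serve real traffic instead (port 8080 is an arbitrary choice):
	// http.ListenAndServe(":8080", newMux())
}
```

The whole request path is a router lookup plus one function call, which is the kind of short code path this post argues for.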


MindaPHP: a new PHP framework optimized for learning

When people talk about Web Application Frameworks (WAF), they often refer to web frameworks with a model–view–controller (MVC) architecture. MVC is a software architecture pattern that separates the representation of information from the user’s interaction with it. Most popular frameworks actually follow the model–view–adapter (MVA) pattern, which decouples the model and the view as described below:

Traditional MVC arranges model (e.g., data structures and storage), view (e.g., user interface), and controller (e.g., business logic) in a triangle, with model, view, and controller as vertices, so that some information flows between the model and views outside of the controller’s direct control. The model–view–adapter solves this rather differently than the model–view–controller does by arranging model, adapter or mediating controller, and view linearly without any connections whatsoever directly between model and view. — Wikipedia

More and more frameworks consist of a set of components (e.g. Zend). This is why people start to talk about “full-stack” vs. “glue” frameworks. A “glue” framework allows the programmer to create a tailor-made framework by gluing the needed components together. Full-stack frameworks, on the other hand, do not require you to do this.

Others talk about the difference between “push-based” vs. “pull-based” frameworks. This difference essentially is whether the framework pushes data towards the view or pulls the data in from the view. Most frameworks use the “push” approach.
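The difference can be sketched with Go's standard `text/template` package. This is only an illustration of the two data flows, not how any particular framework implements them. The function `latestPosts`, the helper `render` and the two template variables are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// latestPosts stands in for the model layer (hypothetical data).
func latestPosts() []string {
	return []string{"Speed matters", "MindaPHP released"}
}

// Push: the controller fetches the data and hands it to the view as ".".
var pushView = template.Must(template.New("push").Parse(
	"{{range .}}- {{.}}\n{{end}}"))

// Pull: the view itself asks for the data through a registered function.
var pullView = template.Must(template.New("pull").
	Funcs(template.FuncMap{"latestPosts": latestPosts}).
	Parse("{{range latestPosts}}- {{.}}\n{{end}}"))

// render executes a template into a string.
func render(t *template.Template, data interface{}) string {
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "error: " + err.Error()
	}
	return sb.String()
}

func main() {
	fmt.Print(render(pushView, latestPosts())) // controller pushes data in
	fmt.Print(render(pullView, nil))           // view pulls the data itself
}
```

Both calls render the same list. The only difference is which side decides what data is fetched: the calling code (push) or the template (pull).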

Separation of concerns

What everybody seems to agree on is that we need some form of “separation of concerns” or an n-tier architectural model. This means that we “divide the system cleanly into three tiers: the presentation tier, the business-logic tier, and the data-access or resource tier”, like MVC or MVA do.

What many framework architects do not seem to optimize for are these three important things:

  1. Cost of learning – maximize documentation reuse & minimize innovation
  2. Cost of scaling – maximize compatibility & minimize lines of code executed
  3. Cost of defects – maximize best practices & minimize complexity

They seem obsessed with optimizing separation of concerns, a.k.a. reducing the “Cost of spaghetti”. In their efforts they create hard-to-grasp concepts, like “Dependency Injection” and “Aspect-oriented programming”. Do not get me wrong: I am not saying that these methods do not help you to deal with cross-cutting concerns, but IMHO the complexity problems they cause outweigh the benefit of keeping things organized.

MindaPHP to the rescue

So, it may be clear that I believe that simple is better. With that “vision” I wrote MindaPHP. Whether you like it or not you may decide for yourself, but it certainly is easy to learn and about 10-20 times faster than CakePHP or Symfony, while providing the same abstraction layers to keep things organized.

MindaPHP aims to be a full-stack framework that is:

  1. Easy to learn
  2. Secure by design
  3. Light-weight

By design, it does:

  1. … have one variable scope for all layers.
  2. … require you to write SQL queries (no ORM).
  3. … use PHP as a templating language.

These choices were made mainly to make it easy to learn for PHP developers. Check it out!

Code: https://github.com/mevdschee/MindaPHP
Demo: http://maurits.server.nlware.com/
