Become a better programmer in 30 minutes

Programmers nowadays should all know these 3 very important rules:

  1. Don’t use floating points for money
  2. Store date/time in “UTC” timezone
  3. Always use the “UTF8” character set

Tom Scott brilliantly explains the above 3 topics in videos that roughly last 10 minutes each. He explains them clearly, with depth and in a passionate way. They are listed below. Start the video’s by clicking on them.

1. Don’t use floating points for money

…never ever…

utf8_video

2. Store date/time in “UTC” timezone

…yes, always…

timezone_video

3. Always use the “UTF8” character set

…no exceptions…

floating_points_video

IMHO every programmer should watch these videos, because they are very educational. And “Yes”, it will takes you 30 valuable minutes of your precious time. But “No”, you will not regret it, because they teach you some valuable and elementary lessons. Lessons that apply on any software project, written in any programming language.

Thank you Tom Scott, you rock! Keep doing videos on your YouTube channel!

Share

Redis sorted set stores score as a floating point number

Today I was playing a little with our statistics. I was writing some “Go” (golang) that was requesting a top 20 customers from Redis using a “Sorted Set” and the “ZREVRANGEBYSCORE” command. Then I found out that the score of the sorted set was actually stored as a double precision floating point number. Normally, I would not be bothered about this and use the float storage for integer values.

But this time I wanted to make a top 20 of the real-time monthly data traffic of our entire CDN platform. A little background: Hits on the Nginx edges are measures in bytes and the logs are streamed to our statistics cluster. Therefor the real-time statistics counters for the traffic are in bytes. Normally we use 64 bit integers (in worst case they are signed and you lose 1 bit).

2^64 = 9,223,372,036,854,775,807
      EB, PB, TB, GB, MB, kB,  b

If you Google for: “9,223,372,036,854,775,807 bytes per month in gigabit per second” you will find that this is about 26 Tbps on average. We do not have such big customers yet, so that will do for now. So an “int64” will do, but how about the double precision float? Since it has a floating point, theory says it can not reliably count when numbers become too large. But how large is too large? I quickly implemented a small script in golang to find out:

package main

import (
	"fmt"
)

func main() {
	bits := 1
	float := float64(1)
	for float+1 != float {
		float *= 2
		bits++
	}
	fmt.Printf("%.0f = %d bits\n", float, bits)
}

Every step the script doubles the number and tries to add 1 until it fails. This is the output showing when the counting goes wrong:

9007199254740992 = 54 bits

So from 54 bits the counting is no longer precise (to the byte). What does that mean for our CDN statistics? Let’s do the same calculation we did before:

2^54 = 9,007,199,254,740,992
      PB, TB, GB, MB, kB,  b

If you Google for: “9,007,199,254,740,992 bytes per month in gigabits per second” you will find that this is about 25 Gbps on a monthly average. We definitely have customers that do much more than that.

I quickly calculated that the deviation would be less than 0.000000000000001%. But then I realized I was wrong: At 26 Tbps average the deviation might as well be as big as 1 kB (10 bits). Imagine that the customer is mainly serving images and JavaScript from the CDN and has an average file size of 10 kB. In this case the statistics will be off by 10% during the last days of the month!

Okay, this may be the worst case scenario, but still I would not sleep well ignoring it. I feel that when it comes to CDN statistics, accuracy is very important. You are dealing with large numbers and lots of calculation and as you see this may have some unexpected side effects. That is why these kind of seemingly small things keep me busy.

Share

The PHP floating point precision is wrong by default

Let me show you that PHP is bad at Math:

<?php
echo "0.1 + 0.2 = ". ( 0.1 + 0.2 ) ."\n";
$true = 0.1 + 0.2 == 0.3 ? "Equal" : "Not equal";
echo "0.1 + 0.2 = 0.3 => $true\n";

Output:

0.1 + 0.2 = 0.3
0.1 + 0.2 = 0.3 => Not equal

Now, to analyze what is happening here let’s first look at JavaScript:

>>> 0.1 + 0.2
0.30000000000000004
>>> 0.1 + 0.2 == 0.3
false

Okay, so JavaScript is also bad at Math, but at least it prints numbers (better) as it internally represents them, making debugging easier. We see that the answer is slightly off and that is why the test is returning false. Before explaining why there are errors, let me give you some more examples of the problem:

0.1 * .1 results in 0.010000000000000002
0.7 + .1 results in 0.7999999999999999
1.1 + .1 results in 1.2000000000000002

As you can see the problem also exists when multiplying. This is because the problem does not lie in the operation, but in the way computers internally store numbers that contain a decimal point. The internal representation is called a “floating point “. This floating point representation has accuracy problems as we have shown above, but it doesn’t only apply to PHP floating points. The reason we have these accuracy problems is described in the floating point guide:

Because internally, computers use a format (binary floating-point) that cannot accurately represent a number like 0.1, 0.2 or 0.3 at all.

When the code is compiled or interpreted, your “0.1” is already rounded to the nearest number in that format, which results in a small rounding error even before the calculation happens.  — floating point guide

The floating point guide also explains clearly that:

…binary fractions are different from decimal fractions in what numbers they can accurately represent with a given number of digits, and thus also in what numbers result in rounding errors:

Specifically, binary can only represent those numbers as a finite fraction where the denominator is a power of 2. Unfortunately, this does not include most of the numbers that can be represented as finite fraction in base 10, like 0.1.

Fraction Base Positional Notation Rounded to 4 digits Rounded value as fraction Rounding error
1/10 10 0.1 0.1 1/10 0
1/3 10 0.3 0.3333 3333/10000 1/30000
1/2 2 0.1 0.1 1/2 0
1/10 2 0.00011 0.0001 1/16 3/80

And this is how you already get a rounding error when you just write down a number like 0.1 and run it through your interpreter or compiler. It’s not as big as 3/80 and may be invisible because computers cut off after 23 or 52 binary digits rather than 4. But the error is there and will cause problems eventually if you just ignore it. — floating point guide

Now let’s go back to the PHP floating point calculation and evaluate this code:

<?php
ini_set('precision', 17);
echo "0.1 + 0.2 = ". ( 0.1 + 0.2 ) ."\n";
$true = 0.1 + 0.2 == 0.3 ? "Equal" : "Not equal";
echo "0.1 + 0.2 = 0.3 => $true\n";

Output:

0.1 + 0.2 = 0.30000000000000004
0.1 * 0.2 = 0.3 => Not equal

That is more like it. It does not solve the problem, but makes it easier to understand. Note that we have set the “precision” of the representation of floating point numbers to 17 with the “ini_set” PHP command. Gustavo Lopes explains on the php-internals mailing list why other values (like 100) do not make sense:

Given that the implicit precision of a (normal) IEEE 754 double precision number is slightly less than 16 digits [2], this is a serious overkill. Put another way, while the mantissa is composed of 52 bits plus 1 implicit bit, 100 decimal digits can carry up to 100*log2(10) =~ 332 bits of information, around 6 times more.

Given this, I propose changing the default precision to 17 (while the precision is slightly less than 16, a 17th digit is necessary because the first decimal digit carries little information when it is low). — source

So for now, let’s change the precision to from 14 to 17 in “/etc/php5/apache2/php.ini” on our servers and save ourselves some headaches when we are using PHP floating points.

; The number of significant digits displayed in floating point numbers.
; http://php.net/precision
precision = 17

If you want more background on this topic read the excellent article “What Every Computer Scientist Should Know About Floating-Point Arithmetic“.

Share