On Packing Data

[updated 16 April 2010]

The number of bits needed to store an integer, a, a > 0, is:



Using this formula, 13 bits are needed for the value 7333. In binary, 7333 is 1110010100101, which is indeed 13 bits.

The number of bits needed to store two values,
a and b, independently would be:



Using the identity:



The number of bits needed to store the product of
a and b is:



Therefore, we can save one bit by multiplying
a and b when



As a trivial example, the values 5 (101) and 6 (110) each require three bits so storing them independently would take 6 bits. However, 6*5 is 30, which takes 5 bits.

Just for curiousity, this graph shows the fractional part of



using the equation



It's an interesting "sawtooth" pattern, with zero values at 2
n. The larger version of the graph also shows lines of constant slope between the endpoints of each tooth (in blue), and the point where the tooth equals 0.5 (√2*2n - in red).

Click on the graph for a larger version.



Let's consider a practical application. The start and end of Daylight Savings Time can be specified as "the nth day of the week in month at hour and minute". For example, in 2010, DST in the US begins on the second Sunday in March at 2AM. Let:

d = day of week, Sunday..Saturday = 0..6
r = day rank, first..fifth = 0..4
m = month, January..December = 0..11
h = hour, 0..59
μ = minute, 0..23

Naively, this can be stored in 3+3+4+6+5 = 21 bits.

When multiplying 6 * 5 to save one bit, one of the numbers is needed in order to determine the other. Since we don't a priori know what values were multiplied together, we can't unpack the result. So while the possibility of saving a bit by multiplication made for an interesting analysis, it doesn't help here. There's no point in storing the product of n numbers if you have to know n-1 numbers to recover the nth number.

If we're going to conserve space, it will have to be through application of the first equation. Given two integers,
a and b, a requires fewer bits than b when:



So we want to minimize the maximum value of the number being stored.

For the packing algorithm, construct a set, ω, of an upper limit for each item.

Then the largest number that can be encoded is:



To pack the values vi which correspond to the upper limit values ωi, evaluate:



To unpack
rn into values vi which correspond to the upper limit values ωi, evaluate:



Consider the decimal number 12,345. For base 10 encoding,
ωi would be (10, 10, 10, 10, 10), and the largest number that could be encoded would be 99,999. But suppose we knew that the largest possible value of the first digit was one. Then let ωi be (2, 10, 10, 10, 10). The largest value that could be encoded would be 19,999. Since 19,999 < 99,999, it requires fewer bits.

For the daylight savings time values, construct a set, ω, of the maximum value of each item + 1. For (d, r, m, h, μ) this would be (7, 5, 12, 60, 24).

Packing the start times would requlre 20 bits:



Packing both the start and end times would use 39 bits, for a savings of 3 bits.



Thanks to Lee for his comments on this article.
blog comments powered by Disqus