Computers can be little-endian or big-endian. Wondering what endianness is all about, where it came from and how it affects code performance? I was too.
A first look at endianness
Let’s take the integer 168496141 (0x0A0B0C0D in hexadecimal) as an example.
In little-endian, the least significant bytes come first, which gives us:
address | byte value
--------------------
0x00    | 0x0D -> least significant
0x01    | 0x0C
0x02    | 0x0B
0x03    | 0x0A
In big-endian, the most significant bytes come first, which gives us:
address | byte value
--------------------
0x00    | 0x0A -> most significant
0x01    | 0x0B
0x02    | 0x0C
0x03    | 0x0D
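The two layouts above can be reproduced with a short Python sketch. It relies only on the built-in int.to_bytes method; the variable names are illustrative.

```python
# Byte layout of 168496141 (0x0A0B0C0D) in both byte orders.
value = 168496141

little = value.to_bytes(4, byteorder="little")  # least significant byte first
big = value.to_bytes(4, byteorder="big")        # most significant byte first

# Print one row per address, mirroring the tables above.
for address, (little_byte, big_byte) in enumerate(zip(little, big)):
    print(f"0x{address:02X} | little: 0x{little_byte:02X} | big: 0x{big_byte:02X}")
```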
But why bother having two different ways to save integers in memory?
The question “What are the pros and cons of little-endian versus big-endian?” on Quora seems to indicate that neither one is better than the other.
But x86-64 is a very popular CPU architecture and it is little-endian. ARM CPUs, even recent ones, can run in either byte order but almost always run little-endian. So on these CPU architectures, networked programs regularly need to reorder bytes to match the network byte order, which is big-endian.
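As a rough illustration of that reordering, here is a minimal Python sketch using the standard struct and socket modules. The value is the same example integer as before; what the program prints depends on the byte order of the machine it runs on.

```python
import socket
import struct
import sys

value = 168496141  # 0x0A0B0C0D

# "!" asks struct for network (big-endian) byte order, whatever the host uses.
wire = struct.pack("!I", value)

# socket.htonl converts a 32-bit integer from host to network byte order:
# a byte swap on little-endian hosts, a no-op on big-endian hosts.
converted = socket.htonl(value)

print(sys.byteorder, wire.hex(), hex(converted))
```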
Given all of this byte shuffling, would network performance decrease? Probably, but not by much.
I just checked the Intel 64 instruction set and it has a BSWAP instruction that is specially made to convert integers from big-endian to little-endian and vice-versa.
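To make the effect of a 32-bit byte swap concrete, here is a small Python sketch. The function name bswap32 is hypothetical and chosen for illustration; it only models the result of swapping, not how the CPU implements the instruction.

```python
def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer."""
    x &= 0xFFFFFFFF  # keep only the low 32 bits
    return (
        ((x & 0x000000FF) << 24)    # byte 0 moves to byte 3
        | ((x & 0x0000FF00) << 8)   # byte 1 moves to byte 2
        | ((x & 0x00FF0000) >> 8)   # byte 2 moves to byte 1
        | ((x & 0xFF000000) >> 24)  # byte 3 moves to byte 0
    )


swapped = bswap32(0x0A0B0C0D)
print(hex(swapped))
```

Swapping twice gives back the original value, which is why the same operation converts in both directions.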
This instruction is used by operating system kernels. For instance, on my Arch Linux installation, BSWAP appears in the
I trust that Intel probably optimized this instruction as much as possible, so I suppose the performance impact of calling BSWAP is small. In addition, there are several cases where swapping byte order can be completely avoided:
- big-endian machines already use network byte order internally, so they don’t need to swap byte order before sending data off to the network
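- little-endian machines that talk to each other over TCP do not need to swap the bytes of their own application data, as long as both ends agree on the format: the bytes are read in-order on the receiving end (the TCP/IP headers themselves still use network byte order)
The only case where endianness should really cause a slowdown is when a big-endian machine talks with a little-endian machine.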
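The cases above can be sketched in Python with the standard struct module. The payload here is a stand-in for four bytes of application data; no real network connection is involved.

```python
import struct

value = 168496141  # 0x0A0B0C0D

# The sender writes the integer in little-endian order, without swapping.
payload = struct.pack("<I", value)

# A receiver that assumes the same byte order recovers the value as-is.
same_order = struct.unpack("<I", payload)[0]

# A receiver that assumes big-endian reads a different number,
# so one of the two sides has to swap.
other_order = struct.unpack(">I", payload)[0]

print(same_order, other_order)
```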
Math and typecasts
However, Akash Sharma makes a good case for little-endian.
He explains how little-endian makes common math operations easier:
Consider an example where you want to find whether a number is even or odd. Now that requires testing the least significant bit. If it is 0 it is even. In Little Endian Ordering least significant byte is stored at starting address. So just retrieve the byte and look at its last bit. — Akash Sharma
And how little-endian makes typecasts to smaller types easier:
For example you want to typecast a 4 bytes data type to 2 bytes data type. In Little Endian Ordering it is pretty straightforward as you just need to retrieve the first 2 bytes from starting address in order and you would get the correct number. — Akash Sharma
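Both points can be checked with a short Python sketch that treats a bytes object as memory, with index 0 playing the role of the starting address. This is only a model of the memory layout, not of what a compiler actually emits.

```python
value = 168496141  # 0x0A0B0C0D

little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")

# Parity: in little-endian, the least significant byte sits at the first address.
is_odd = bool(little[0] & 1)

# Narrowing to 2 bytes: in little-endian, the first two bytes are the low half.
low16_little = int.from_bytes(little[:2], "little")

# In big-endian, the first two bytes are the high half,
# so the low half has to be read from an offset of 2.
first_two_big = int.from_bytes(big[:2], "big")
low16_big = int.from_bytes(big[2:], "big")

print(is_odd, hex(low16_little), hex(first_two_big), hex(low16_big))
```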
Consistency and the coin toss
The paper “On Holy Wars and a Plea for Peace” by Danny Cohen also gives some interesting background on the decision to use big-endian for network byte order. RFC 1700, which defines network byte order, references this paper, so it makes sense to talk about it here.
Cohen’s paper points to the inconsistency of some little-endian systems back in the eighties, when the paper was written:
Most computers were designed by Big-Endians, who under the threat of criminal prosecution pretended to be Little-Endians, rather than seeking exile in Blefuscu. They did it by using the B0-to-B31 convention of the Little-Endians, while keeping the Big-Endians’ conventions for bytes and words. — Danny Cohen
This sounds like a joke, but Cohen goes on to describe how the M68000 microprocessor was little-endian for bits but big-endian for words (bytes), double-words and quad-words.
And it points to the better consistency of some big-endian systems:
The PDP10 and the 360, for example, were designed by Big-Endians: their bit order, byte-order, word-order and page-order are the same. The same order also applies to long (multi-word) character strings and to multiple precision numbers. — Danny Cohen
Finally, it ends by saying that making a decision and having all computer scientists agree on it will be extraordinarily difficult. For this reason, it suggests:
How about tossing a coin ??? — Danny Cohen
Almost 40 years later, the network folks have settled on big-endian and the majority-share CPUs (x86, ARM) are little-endian. In software development though, the choice is not clear. Luckily, we will probably never have to worry about bit-endianness, only byte-endianness. Partial win, I guess.