Friday, October 12, 2007

Little endian vs Big endian

In simple words, endianness is the order in which the bytes that represent a piece of data are laid out in memory. Computer memory is generally visualized as a sequence of bytes, but in medium or high level languages like C we work with data types larger than a byte.

Consider a 32 bit integer (in hex): 0xabcdef12
It consists of 4 bytes: ab, cd, ef, and 12, so this integer occupies 4 bytes in memory. Say we store it starting at memory address 1000. There are 24 different ways to arrange these 4 bytes in the 4 locations (1000 - 1003), and 2 of them are very popular. These are called little endian and big endian.
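Where the 24 comes from: any of the 4 bytes can go at address 1000, any of the remaining 3 at 1001, either of the last 2 at 1002, and the final byte at 1003:

```latex
\underbrace{4}_{\text{at }1000} \times \underbrace{3}_{\text{at }1001} \times \underbrace{2}_{\text{at }1002} \times \underbrace{1}_{\text{at }1003} = 4! = 24
```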

  • Little endian - Stores the least significant byte at the lowest address. Example: Intel Pentium processors.
  • Big endian - Stores the most (big) significant byte at the lowest address. Example: Sun/SPARC, IBM/RISC 6000.

On a little endian system, memory will look like this:

Address   Value
1000      12
1001      ef
1002      cd
1003      ab

On a big endian system, memory will look like this:

Address   Value
1000      ab
1001      cd
1002      ef
1003      12

Now, the good news is that usually we don't need to care about endianness; it's taken care of by hardware platforms and compilers. But in some scenarios we do. A common one is when data needs to be exchanged between different systems. In such a situation a standard layout is specified. Example: network protocols like TCP use Network Byte Order, which is big endian. Programs that write such data therefore have to ensure it is in the standardized order.

Before learning how to write/read in different orders, let's play around with a simple problem: determining the endianness of a given system.

There are 2 easy ways.

1. Write a known integer value in a binary file and view the file contents using a hex utility.

This method is more visual, it is easy to implement, and it also makes layouts other than little/big endian easy to detect. (Yes, there are other byte orders too, like mixed endian.)

Let's dump a 64 bit integer to a file:


#include <stdio.h>
#include <inttypes.h>

int
main ()
{
  uint64_t integerForTesting = UINT64_C (0xabcdef1234567890);
  FILE *fp = NULL;
  fp = fopen ("dump.bin", "wb");
  if (fp)
    {
      printf ("Writing 64 bit number = 0x%" PRIx64 " to dump.bin as it is\n", integerForTesting);
      fwrite (&integerForTesting, sizeof (uint64_t), 1, fp); /* pass the address of the data, not its value */
      fclose (fp);
      printf ("Done writing\n");
      return 0;
    }
  printf ("Not written\n");
  return 1; /* signal failure to the caller */
}


Now, dump the contents of the binary file (dump.bin) using a command line hex utility like od:
$ od -t x1 --width=8 dump.bin


If it dumps something like
0000000 90 78 56 34 12 ef cd ab
0000010

then the system is little endian.

If it dumps something like
0000000 ab cd ef 12 34 56 78 90
0000010

then the system is big endian.

2. Programmatically by looking at the value of byte stored at the starting address of a known number.

This method is limited in scope: it looks only at the first byte and, on that basis alone, distinguishes between big and little endian. It won't detect other byte orders.


#include <stdio.h>
#include <inttypes.h>

int
main ()
{
  uint64_t integerForTesting = UINT64_C (0xabcdef1234567890);
  /* view the integer's storage as raw bytes; *ch is the byte at the lowest address */
  unsigned char *ch = (unsigned char *) &integerForTesting;
  if (*ch == 0x90)
    {
      printf ("this is little endian system\n");
    }
  else
    {
      printf ("this is big endian system\n");
    }
  return 0;
}

2 comments:

Rahul Upakare said...

Good! A ready-made answer to a general interview question. ;)

BTW, code snippet 1's line number 13 needs a correction.

Anonymous said...

Commander Karla here:

You don't need the FILE * fp code.

Otherwise very sound!