This post is a free extract from Chapter 1 of the Coding Essentials Guidebook for Developers. If you get value out of it, support us by purchasing it here. Subscribe with your email address if you'd like to stay up to date with new content from Initial Commit.

Data: Ones, Zeros, Bits, and Binary

In the previous sections, we talked about storing data on hard drives and in RAM. But what exactly do we mean by the word "data"? At a high level, we think of things like text documents, images, videos, emails, files, and folders. These are all high-level data structures that we create and save on our computers every day.

But under the hood, a computer chip (like a CPU or RAM chip) has no idea what an "image" or a "video" is. From a chip's perspective, all of these structures are stored as long sequences of ones and zeros. These ones and zeros are called bits. Bits are commonly grouped into sets of eight, known as bytes. A byte is simply a sequence of eight bits, such as 00000001, 01100110, or 00001111.

Representing information in this way is called a binary representation. A binary system is one that defines all information using only two values, in this case, a one and a zero. Binary is also called a base-2 numeral system. By comparison, our standard numeral system is called decimal or base-10. The decimal system represents all numbers using ten digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. The best way to see how decimal (base-10) and binary (base-2) stack up is to look at the numbers side by side.

Here is how the numbers 0-20 are written in binary:

Comparison of Decimal and Binary

| ----------- | ------------------- |
| Decimal     | Binary              |
| ----------- | ------------------- |
| 0           | 0                   |
| 1           | 1                   |
| 2           | 10                  |
| 3           | 11                  |
| 4           | 100                 |
| 5           | 101                 |
| 6           | 110                 |
| 7           | 111                 |
| 8           | 1000                |
| 9           | 1001                |
| 10          | 1010                |
| 11          | 1011                |
| 12          | 1100                |
| 13          | 1101                |
| 14          | 1110                |
| 15          | 1111                |
| 16          | 10000               |
| 17          | 10001               |
| 18          | 10010               |
| 19          | 10011               |
| 20          | 10100               |
| ----------- | ------------------- |

Can you see how this works? It's just like counting with regular decimal numbers, but we only have two digits to work with. As a result, we need to reset and create new digit columns much quicker than we are used to in the decimal system. To appreciate this, consider the decimal number 16, which is 10000 in binary. In binary, it needs a whopping five digits to write, whereas in decimal it only requires two!
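To make the counting pattern concrete, here is a minimal Python sketch that builds the binary digits of a number by repeatedly dividing by 2 and collecting the remainders. The function name `to_binary` is just an illustrative choice; Python's built-in `bin()` does the same job.

```python
def to_binary(n: int) -> str:
    """Return the binary digits of a non-negative integer as a string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next digit, right to left
        n //= 2                    # move on to the next digit column
    return "".join(reversed(digits))


# Reproduce a few rows of the table above.
for value in (0, 5, 12, 16, 20):
    print(value, to_binary(value))
```

Going the other way, `int("10000", 2)` converts a string of binary digits back into the decimal number 16.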

To further clarify how computers store data in binary, let's consider a few examples:

Storing numbers

Assume we have a calculator program open with the number 12 typed in it. At this point, the calculator and its content are stored in RAM since the program is open and active. Nothing is stored on the hard drive since we haven't saved anything yet. We'll assume for this example that the calculator has a "save" function.

We see a 12 on our screen since we typed it into the calculator, but behind the scenes, the computer represents the number 12 in binary. As we see from the table above, the number 12 is written as 1100 in binary. This binary value 1100 is stored in RAM. If the file is saved using our calculator's save option, the binary content gets stored on the hard drive. So now it is stored both in RAM and on the hard drive. If the calculator program is closed, its content (including the value 12) will eventually be erased or overwritten in RAM. But since our saved file still exists on the hard drive, it can be retrieved later. In a nutshell, numbers can be directly converted to their binary form and written either to RAM or to a hard drive.
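The save-and-retrieve cycle can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper names `save_number` and `load_number` and the file name `calculator.sav` are hypothetical, and the number is stored as a single byte, so it only handles values from 0 to 255. A real calculator program would define its own file format.

```python
import os
import tempfile


def save_number(path: str, value: int) -> None:
    """Write a small number (0-255) to disk as a single byte."""
    with open(path, "wb") as f:
        f.write(value.to_bytes(1, byteorder="big"))


def load_number(path: str) -> int:
    """Read the stored byte back and turn it into a number again."""
    with open(path, "rb") as f:
        return int.from_bytes(f.read(), byteorder="big")


# The value 12 padded to a full byte of eight bits.
print(format(12, "08b"))  # 00001100

# Use a temporary folder so the sketch cleans up after itself.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "calculator.sav")  # hypothetical file name
    save_number(path, 12)      # the value now also lives on disk
    print(load_number(path))   # 12
```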

Storing characters

Now what if we have a text editor open with letters, punctuation, white space, and symbols typed in it? Since RAM and hard drives can only store ones and zeroes, how can each of these text characters be stored? The key is to use a standardized system that represents each text character with a unique numeric value. Systems that do this are called encodings. One of the most commonly used character encoding systems is called ASCII (American Standard Code for Information Interchange). ASCII provides a set of numeric values representing 128 different characters. These characters include the digits 0-9, lowercase letters, uppercase letters, common punctuation, and common symbols. ASCII assigns each of these characters a unique numeric value, which the computer can store in binary.

NOTE: Technically the ASCII set also includes a set of control codes like tab, line feed, carriage return, and other holdovers from the typewriter/teletype era.

Here are a few examples of some characters and their ASCII codes, shown both in binary and decimal formats:

ASCII Character Example

| ----------- | ------------------- | ------------------- |
| Character   | ASCII Code (Binary) | ASCII Code (Decimal)|
| ----------- | ------------------- | ------------------- |
| !           | 00100001            | 33                  |
| $           | 00100100            | 36                  |
| +           | 00101011            | 43                  |
| 1           | 00110001            | 49                  |
| 2           | 00110010            | 50                  |
| 3           | 00110011            | 51                  |
| A           | 01000001            | 65                  |
| B           | 01000010            | 66                  |
| C           | 01000011            | 67                  |
| a           | 01100001            | 97                  |
| b           | 01100010            | 98                  |
| c           | 01100011            | 99                  |
| ----------- | ------------------- | ------------------- |

From the computer's perspective, no characters exist at all. Text files are just long sequences of these binary codes. When converted to their character images and displayed on a screen, these values are meaningful to us humans, but to a computer, it is all just ones and zeroes.
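The mapping in the table can be reproduced with Python's built-in `ord()` and `chr()` functions, which convert between a character and its numeric code. The sketch below is a minimal illustration; the helper names `to_ascii_bits` and `from_ascii_bits` are made up for this example.

```python
from typing import List


def to_ascii_bits(text: str) -> List[str]:
    """Turn each ASCII character into its 8-digit binary code."""
    codes = text.encode("ascii")  # raises an error for non-ASCII characters
    return [format(code, "08b") for code in codes]


def from_ascii_bits(bits: List[str]) -> str:
    """Turn a list of 8-digit binary codes back into text."""
    return "".join(chr(int(b, 2)) for b in bits)


print(ord("A"))                # 65
print(to_ascii_bits("A!"))     # ['01000001', '00100001']
print(from_ascii_bits(["01100001", "01100010", "01100011"]))  # abc
```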

Storing arbitrary file content

As a final example, let's touch on how arbitrary data files like images and videos are stored. Images and videos are a bit more complex than numbers and characters since more steps are required to break them down into the ones and zeroes that the computer understands. An image is essentially a grid of pixels, each with a specific color and transparency value. A grid can be represented numerically with X and Y axes. Colors and transparencies can also be represented numerically using a color model like RGBA (Red-Green-Blue-Alpha).

The RGBA color model breaks down colors into their red, green, and blue components, plus an alpha value that specifies the transparency. The red, green, and blue components can each be represented by a decimal number between 0 and 255. The alpha or transparency value is specified as a decimal number between 0 and 1. The closer the alpha value is to 0, the higher the transparency. The four values are typically lumped together as a tuple, which is a fancy way to say they are placed inside parentheses and separated by commas, as follows:

RGBA Color Example

| ---------------------- | ---------------------- |
| RGBA Tuple             | Color Description      |
| ---------------------- | ---------------------- |
| rgba(255, 0, 0, 1)     | Pure red, fully opaque |
| rgba(0, 255, 0, 0.75)  | Pure green, 75% opaque |
| rgba(0, 0, 255, 0.25)  | Pure blue, 25% opaque  |
| rgba(0, 0, 0, 1)       | Black, fully opaque    |
| rgba(255, 255, 255, 1) | White, fully opaque    |
| rgba(233, 71, 12, 0.5) | Orange, 50% opaque     |
| ---------------------- | ---------------------- |

Now that we have a scheme to represent colors with numbers, all that's left to do is to convert the numbers into binary and they can be stored in RAM and on hard drives. There are many formats for storing images, like PNG (Portable Network Graphics), JPEG (Joint Photographic Experts Group), and BMP (Bitmap). Each of these defines a set of rules for storing image data. Videos are essentially collections of images played rapidly in sequence. By ordering a set of images using a sequential time variable, video data can also be stored as binary in many formats like MP4, MOV, and AVI.
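As a rough sketch of that conversion step, the Python below flattens a tiny 2x2 grid of RGBA pixels into raw bytes. It makes two assumptions that are not part of any format named above: each pixel is stored as four bytes in red, green, blue, alpha order, and the alpha value is scaled from the 0 to 1 range up to 0 to 255 so it fits in one byte. Real formats like PNG, JPEG, and BMP add headers and often compression on top of this idea. The names `Pixel`, `pack_pixel`, and `pack_image` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, float]  # (red, green, blue, alpha)


def pack_pixel(pixel: Pixel) -> bytes:
    """Store one RGBA pixel as four bytes."""
    r, g, b, a = pixel
    if not all(0 <= channel <= 255 for channel in (r, g, b)):
        raise ValueError("color components must be between 0 and 255")
    if not 0.0 <= a <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Assumption: scale alpha to 0-255 so it fits in a single byte.
    return bytes([r, g, b, round(a * 255)])


def pack_image(grid: List[List[Pixel]]) -> bytes:
    """Flatten a grid of pixels row by row into one long byte sequence."""
    return b"".join(pack_pixel(p) for row in grid for p in row)


image = [
    [(255, 0, 0, 1.0), (0, 255, 0, 0.75)],
    [(0, 0, 255, 0.25), (255, 255, 255, 1.0)],
]
data = pack_image(image)

print(len(data))  # 16 bytes: 4 pixels x 4 bytes each
# The first pixel (pure red, fully opaque) as ones and zeros:
print(" ".join(format(byte, "08b") for byte in data[:4]))
# 11111111 00000000 00000000 11111111
```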

In general, any arbitrary information that can be converted into a binary format can be stored on a computer. The key is devising systems that allow the information to be converted back and forth between binary and a human-readable form.

In this chapter, we identified some of the important parts of a computer and discussed how data is represented in computing terms. We will build on this in the next chapter by exploring how this foundation supports computer programming.

