Hashtables vs. Arrays

by kirupa | 10 December 2011

Have questions? Discuss this HTML5 / JavaScript tutorial with others on the forums.

Two data structures that you probably use frequently are hashtables and arrays. On the surface, they both allow you to perform common tasks such as adding data, accessing data, and removing data. To the untrained eye, they practically look like the same thing. What sets them apart is how they work behind the scenes, and this difference has a significant impact in how and when you would use them.

By the end of this tutorial, you will have learned enough about how hashtables and arrays work to make an informed decision on which data structure to use. Let's get started!

The Data

To better help explain the differences between arrays and hashtables, I am going to use some sample data from Wikipedia on best-selling artists and how many records they've sold:

*Artist*	*Certified Sales (millions)*
The Beatles	248.3
Elvis Presley	201.2
Michael Jackson	155.8
Madonna	157.8
Elton John	153
Led Zeppelin	133.4
Queen	88.2
ABBA	54.4
Mariah Carey	118.2
Celine Dion	111.4
Pink Floyd	110.1
AC/DC	99.3
The Rolling Stones	89.5
Bee Gees	63.2

What we want to do is store these values and then retrieve the number of albums sold for any given artist. For example, if I wanted to know the sales figures for the Rolling Stones, I should be able to get 89.5 from my application very quickly.

In the following sections, you'll see how to do that using both arrays and hashtables.

Examining Arrays

Let's start with our friendly array. If I wanted to store my data in an array, the JavaScript will look as follows:

<script>
var myData = [];

myData.push(["The Beatles", "248.3"]);
myData.push(["Elvis Presley", "201.2"]);
myData.push(["Michael Jackson", "155.8"]);
myData.push(["Madonna", "157.8"]);
myData.push(["Elton John", "153"]);
myData.push(["Led Zeppelin", "133.4"]);
myData.push(["Queen", "88.2"]);
myData.push(["ABBA", "54.4"]);
myData.push(["Mariah Carey", "118.2"]);
myData.push(["Celine Dion", "111.4"]);
myData.push(["Pink Floyd", "110.1"]);
myData.push(["AC/DC", "99.3"]);
myData.push(["The Rolling Stones", "89.5"]);
myData.push(["Bee Gees", "63.2"]);

</script>

I am creating a new array called myData. Initially, this array is empty, but I painstakingly add each artist and their sales figures. Notice that the artist information that I push into the myData array is wrapped in an array of its own as shown by the brackets. After all of this code has run, you have an array that contains many other arrays.

Populating our array is only one part of what we are trying to do. What we really want is to be able to easily find the sales figures for a particular artist. Remembering that myData's items are arrays themselves, the code for retrieving the sales figure given an artist is:

<script>
var myData = [];

myData.push(["The Beatles", "248.3"]);
myData.push(["Elvis Presley", "201.2"]);
myData.push(["Michael Jackson", "155.8"]);
myData.push(["Madonna", "157.8"]);
myData.push(["Elton John", "153"]);
myData.push(["Led Zeppelin", "133.4"]);
myData.push(["Queen", "88.2"]);
myData.push(["ABBA", "54.4"]);
myData.push(["Mariah Carey", "118.2"]);
myData.push(["Celine Dion", "111.4"]);
myData.push(["Pink Floyd", "110.1"]);
myData.push(["AC/DC", "99.3"]);
myData.push(["The Rolling Stones", "89.5"]);
myData.push(["Bee Gees", "63.2"]);

var artist = "The Rolling Stones";

for (var i = 0; i < myData.length; i++) {
}

</script>

Notice that I create a loop to go through each item in our array and check if the first value in the item's array (myData[i][0]) is the same as the artist we are looking for. Once the item and the artist match, we retrieve the sales figure and call it a day.

To look at this example more generically, let's we have an array with n items with a value somewhere that we are interested in:

If you had no idea where the value is and wanted to find it, you'll need to start at the beginning of your array and then move through each item and check if its contents match the value you are looking for.

If your value is at the beginning of the array, you won't have too many items to go through resulting in you finding the value quickly. If your value is at the end of the array, you'll have to go through a lot of items and take more time finding the value.

On average, we say that the amount of time it takes to find a value in an array is proportional to the number of items in your array. The more items, the longer it will take to find an item.

Examining Hashtables

Now that you have an idea of how arrays search for data, let's see how a hashtable would handle a similar scenario.

The code for adding data using a hashtable looks as follows:

<script>
var myData = new Object();

myData["The Beatles"] ="248.3";
myData["Elvis Presley"] ="201.2";
myData["Michael Jackson"] ="155.8";
myData["Madonna"] ="157.8";
myData["Elton John"] ="153";
myData["Led Zeppelin"] ="133.4";
myData["Queen"] ="88.2";
myData["ABBA"] ="54.4";
myData["Mariah Carey"] ="118.2";
myData["Celine Dion"] ="111.4";
myData["Pink Floyd"] ="110.1";
myData["AC/DC"] ="99.3";
myData["The Rolling Stones"] ="89.5";
myData["Bee Gees"] ="63.2";

</script>

To explain the code a bit, I first create a new Object called myData. Inside myData, I declare properties/keys named after our artists with the values set to the artists' sales figures.

To retrieve a particular sales record, all you do is pass the property (the artist name) to the myData object:

<script>
var myData = new Object();

myData["The Beatles"] ="248.3";
myData["Elvis Presley"] ="201.2";
myData["Michael Jackson"] ="155.8";
myData["Madonna"] ="157.8";
myData["Elton John"] ="153";
myData["Led Zeppelin"] ="133.4";
myData["Queen"] ="88.2";
myData["ABBA"] ="54.4";
myData["Mariah Carey"] ="118.2";
myData["Celine Dion"] ="111.4";
myData["Pink Floyd"] ="110.1";
myData["AC/DC"] ="99.3";
myData["The Rolling Stones"] ="89.5";
myData["Bee Gees"] ="63.2";

alert(myData["ABBA"]);

</script>

That's all you have to do. When you call myData["ABBA"], your browser evaluates that to 54.4. That's all there is to it. What you write is clean and simple, but what goes on behind the scenes is more complex.

Hashtables work by finding the exact location your value is stored and returning that value instantaneously or nearly instantaneously. You may be wondering how that is done.

Let's first look at a simplified view of how a hashtable works:

When you add data to a hashtable, you specify both a key and a value to go with it:

myData["Pink Floyd"] ="110.1";

In the above case, the key is "Pink Floyd", and the value is 110.1.

When your hashtable sees this key, it passes it through what sounds like a painful process called hashing, and the output of this hashing function is a number indicating where in the hashtable our value needs to go.

If you want to retrieve the value from the hashtable, all you have to do is provide the same key. When your hashtable sees this key, the hashing function returns the same number of the location inside the hashtable containing your value. Once you have this number, the hashtable has no difficulty retrieving the exact value.

Unlike with the array, notice that you did not have to scan through your hashtable to find the value. Because the hashing function returns the same index number for the same key, the hashtable knows exactly where to go to retrieve the value.

Conclusion

The following table summarizes the memory and performance characteristics of hashtables and arrays:

*Task*	*Average Complexity*
Memory (hashtable)	O(n) - Linear
Memory (array)	O(n) - Linear
Inserting (array)	O(1) - Constant
Search (array)	O(n) - Linear
Delete (array) - middle	O(n) - Linear
Delete (array) - beginning/end	O(1) - Constant
Inserting (hashtable)	O(1) - Constant
Search (hashtable)	O(1) - Constant
Delete (hashtable	O(1) - Constant

From a performance point of view, the hashtable is generally a clear winner. It takes constant time to find the destination for a key and to place a value there. Retrieving a value is the same story. The hash function ensures that for the same key the same location of the bucket is returned.

Arrays do not work with that same level of efficiency. If you know the index position of where your item is, accessing the item via the array is instantaneous. More than likely, you do not know where the item in the array the item you are looking for is. In these cases, you will have to scan through your items which linearly takes as much time as the number of items in your array.

Not everything is about efficiency though. Sometimes, you will want to use an array if you will be iterating through the items. Iterating through your data sounds simple, but it isn't something you can easily do with a hashtable. I know, bummer!

Just a final word before we wrap up. If you have a question and/or want to be part of a friendly, collaborative community of over 220k other developers like yourself, post on the forums for a quick response!

Kirupa's signature!

Need help?

Ask on the forums!

Customize Theme

Hashtables vs. Arrays

The Data

Examining Arrays

Examining Hashtables

Conclusion

The KIRUPA Newsletter

Follow:

Popular

Loose Ends