Removing Duplicate Items from an Array

by kirupa   |   17 July 2012

  Have questions? Discuss this HTML5 / JavaScript tutorial with others on the forums.

When you are dealing with data in an Array, you may be interested in filtering out all duplicate values. You may be interested in having a count of the unique values only. Whatever your reason, not having duplicate values in your array might be a good thing.

Note

What you are about to see will only work if your array contains strings, numbers, or boolean values.

For example, let's say your array called myArray contains the following data:

array with duplicate values

This data has some duplicate values. To help you see them, I've highlighted them all below:

duplicate values from the array highlighted

What you are going to learn in this tutorial is how you can easily remove these duplicate values to end up with something that looks like this:

duplicate values removed

Let's get started.

Looking at the Code

Let's first look at the full code for making this all work. We'll dive into why the code works the way it does later.

Without further ado, the full code is:

var myArray = ["bar", "foo", "zorb", "bar", "baz", "fum", "baz"];

Array.prototype.removeDuplicates = function() {
	var input = this;
	var hashObject = new Object();
	
	for (var i = input.length - 1; i >= 0; i--) {
		var currentItem = input[i]; 
	
		if (hashObject[currentItem] == true) {
			input.splice(i, 1);
		}
		
		hashObject[currentItem] = true;
	}
	return input;
}

myArray.removeDuplicates();
alert(myArray);

If you add this code to a script tag and run it, you'll see a dialog containing all of the duplicate values removed:

the output

Now that you see this code working, let's actually learn why it works the way it does.

Understanding What the Code Does

The chunk of code responsible for all of this, is the removeDuplicates function:

var myArray = ["bar", "foo", "zorb", "bar", "baz", "fum", "baz"];

Array.prototype.removeDuplicates = function() {
	var input = this;
	var hashObject = new Object();
	
	for (var i = input.length - 1; i >= 0; i--) {
		var currentItem = input[i]; 
	
		if (hashObject[currentItem] == true) {
			input.splice(i, 1);
		}
		
		hashObject[currentItem] = true;
	}
	return input;
}

myArray.removeDuplicates();
alert(myArray);

Before looking at each line of this code in greater detail, let's first understand what we are trying to do.

At a high level, this code works as follows. You have your array:

array with duplicate values

This array contains duplicate values that we would like to remove. To help with this, you call in some outside help. That outside help arrives in the form of a generic object called hashObject whose only purpose is to act like a hashtable:

here we have our hashobject

If you are not familiar with hashtables, I encourage you to read the following article. In a nutshell, a hashtable allows you to store values. It stores the values by picking a precise location to store the value based on a name (aka key) you provide. Boring, right?

The excitement happens when you try to retrieve a value. Instead of searching through each item in order to find what you are looking for, it uses the name you provided earlier to instantaneously find the value associated with it. To put it bluntly, a hashtable is ridiculously fast when it comes to retrieving stored values. You'll see shortly how it is used to help us detect duplicate values.

We have our hashObject - which is initially empty. We have our array. It's time to begin. For reasons I will explain shortly, let's start at the end of our list and move to the front. As you hit each item in your array, we use the following heurestic to figure out what to do:

Let's walk through our example. The last item in our array is baz. We check if baz exists inside hashObject. Since hashObject is initially empty, it doesn't exist. We go ahead and add baz to our hashObject by making baz be the name and true the value:

hashObject filling up

Let's move to the next item, and that item is fum. This item does not exist in our hashObject either, so let's go ahead and add it to hashObject as well:

fum gets added because it doesn't already exist

The next item in our list is baz. Now things start to get interesting. The item baz does exist in our hashObject - it exists from having been added at the very beginning:

conflict time!

What this means is that this instance of baz is a duplicate. When we hit a duplicate value, there is no point in keeping it around in our array. It's best to take it out back and put it out of its misery by simply removing it.

Notice that our duplicate baz item has now been removed:

hashObject removed

That's all there is to it. This entire process of verifying whether an item exists in our hashObject, adding the item to it if it doesn't exist, or removing the item from the array if does exist repeats itself until you hit the end of the array:

only unique items left

At the end, what you are left with is just a unique list of items in your array. Seems pretty simple now, right?

Diving into the Code

Now that you have a good idea of how the code works (or should work!), let's go through each line of the code in our removeDuplicates function in greater detail:

Array.prototype.removeDuplicates = function() {
	var input = this;
	var hashObject = new Object();
	
	for (var i = input.length - 1; i >= 0; i--) {
		var currentItem = input[i]; 
	
		if (hashObject[currentItem] == true) {
			input.splice(i, 1);
        } else {
        	hashObject[currentItem] = true;
        }
    }
    return input;
}

The first line of this function tells you that you are extending the prototype of your Array object:

Array.prototype.removeDuplicates = function() {

I won't cover prototypes in this article, but just know that it is this line that allows us to call the removeDuplicates function as a method on your array instance.


The next two lines setup the two variables you will be using - the one for your array and the one for your hashObject:

var input = this;
var hashObject = new Object();

Because this function is meant to be used via an array object, the keyword this refers to an array. To reduce confusion, I am simply referring to this via the input variable.

In the second line, I create my hashObject object. Note that in JavaScript you don't have an explicit hashtable implementation. Instead, plain old objects emulate the functionality of a hashtable quite well, so that's why we initialize the hashObject variable to be a new Object.


The next line is where the action starts to go down. You have your for loop declaration:

for (var i = input.length - 1; i >= 0; i--) {
	...
}

Notice that our loop goes backwards. We start at the length of your array (normalized to a 0-based index position by subtracting a 1) and then decrementing until we hit 0.

Now, you may be wondering why we are going backwards. The reason is that we will be modifying the size of our array by removing duplicate values. The only way to ensure your code doesn't crash in the middle of your loop is to end on a value that you are guaranteed to hit. That value is 0 aka the index position associated with the beginning of your loop.


The next line is simple:

var currentItem = input[i]; 

We create the currentItem variable to store the current item you are examining as part of our loop.


The bulk of the heurestic I described earlier can be found next:

if (hashObject[currentItem] == true) {
	input.splice(i, 1);
} else {
	hashObject[currentItem] = true;
}

We first check if the currentItem exists in our hashObject. That is done by passing in the currentItem as the name and seeing if that returns a value of true. If a value of true is returned, that means it exists and the currentItem is a duplicate.

In that case, we need to remove the current item from our array by calling the splice method on it:

if (hashObject[currentItem] == true) {
	input.splice(i, 1);
} else {
	hashObject[currentItem] = true;
}

If our hashObject doesn't contain an entry matching our currentItem, then this check will fail. In that case, let's add it to our hashObject by using the currentItem value as the name/key, and giving that entry a value of true:

if (hashObject[currentItem] == true) {
	input.splice(i, 1);
} else {
	hashObject[currentItem] = true;
}

This block of code essentially ensures that you keep your array free from duplicate values. If anybody asks, that is what you should tell them.


The last thing we do is return our freshly modified array back to whatever called it - in this case, the array which we wished to cleanse of all duplicated values:

return input;

There is nothing fancy going on here. Seriously. Absolutely nothing at all. Once this line is run, you are done.

Conclusion

Ok, so here is the deal. The code I just showed you and explained works well. BUT, it could work better. As fellow Blenders, Jacob Alber and DoRon Motter pointed out, the main problem with the simple approach is that modifying the size of your array by removing an item each time a duplicate value is found is a very expensive operation. Each time you modify the size of your array, a new array is allocated in memory with the new size that you want. Your browser will eventually clean all that up, but there is the initial cost of having to deal with creating a new array that needs to be paid.

For smaller sized arrays, that doesn't really make much of a difference. When you start going into having tens of thousands of items, then the few milliseconds you lose from the simple approach can easily add up. Just be aware of this potential issue when you are using this approach for removing duplicate values from really large arrays.

Getting Help

If you have questions, need some assistance on this topic, or just want to chat - post in the comments below or drop by our friendly forums (where you have a lot more formatting options) and post your question. There are a lot of knowledgeable and witty people who would be happy to help you out

Share

Did you enjoy reading this and found it useful? If so, please share it with your friends:

If you didn't like it, I always like to hear how I can do better next time. Please feel free to contact me directly via e-mail, facebook, or twitter.

Brought to you by...

Kirupa Chinnathambi
I like to talk a lot - A WHOLE LOT. When I'm not talking, I've been known to write the occasional English word. You can learn more about me by going here.

Add Your Comment (or post on the Forums)

blog comments powered by Disqus

Awesome and high-performance web hosting!
BACK TO TOP
new books - yay!!!