Java - HashSet
Set interface is also an implementation of Collections interface which is mainly needed if we do not want any duplicate. In other words Set will never allow duplicate elements to be added in it.
The concrete implementations of Set are:
HashSet
As the name suggests, HashSet is a concrete implementation of Set, which follows a technique called 'Hashing'. So, to understand HashSet, you don't have to understand Hashing in detail.
Just keep in mind that elements are inserted in a HashSet based on the HashCode of the element.
Sounds difficult. No worries, we will simplify it.
Just like ArrayList and LinkedList, HashSet also has an add() method to add elements to HashSet.
Now, when we add elements to the HashSet, the elements/objects doesn't gets added at the end of the HashSet, but they are added to the location which is calculated based on the elements/objects HashCode.
What exactly is a HashCode?
Example :
Just imagine there is a large hall and the desks and benches in the hall are placed in only
a single row. Benches are marked from roll numbers 1 to 500.
Now, there are a group of students and we need to calculate their rollno based on their names.
And make them sit on their seats based on their roll numbers.
But how? Let's mark the alphabets from 1 to 26. i.e. A is 1, B is 2, C is 3 and so on.
Let's suppose there is a guy 'SAM' and we will make his roll number by adding the letters
in his names.
i.e. S - 19, A - 1, M - 13.
So, the added value of SAM is 19+1+13=33.
So, Sam is assigned roll no 33.
Similarly if 'ABY' comes, her rollno would be A-1, B-2, y-23. i.e. 1+2+23=26.
Thus, Aby would be assigned rollno 26.
So, in this way roll numbers would be assigned to all the students in that group.
And that is Hashing and the rollno we are calculating is called its HashCode.
Now, if someone comes and asks you to find 'ABY', simply you will calculate the roll number
by adding the letters of her name and find her seat.
Hash Collision
But stop, think for a while. There can be a guy named 'BAY'.
So, his roll number would be: B-2, A-1, Y-23. i.e 2+1+23=26.
Strange! Bay and Aby both have roll number 26.
This is called a Hash Collision where two elements(or guys in this case) have the same roll numbers. So, one more seat will be reserved for 'BAY' with number 26 attached on the seat.
But there can be a situation where a large group of students are assigned roll number 26.
So, in that case you have to go to seat number 26 and ask the names of each and every students.
So the logic of calculating HashCode(roll number in this case) should be in such a way that hash collisions(equal roll numbers in this case) does not occur.
HashSet Definition
Set hashSet = new HashSet();
The HashSet will be created with the Initial capacity of 16 and Load Factor 0.75.
As the name suggests Initial capacity is the size of the HashSet when it is initially created.
And Load Factor is 0.75 or 75% which means when the HashSet is 75% full, a new HashSet of size 16 will be created and linked to the existing one.
Now, just think if someone wants to change the initial capacity( i.e 16).
How to create a HashSet of Custom size?
i.e. Say someone wants to create an HashSet of size 50. HashSet gives us the flexibility to do that as well.
HashSet hashSet = new HashSet(50);
How to create a HashSet of Custom load factor and size ?
Again if someone wants to change the Initial capacity and Load factor(Say we want to change the load factor to 0.90 i.e. 90%) as well. It can be done in the below way:
HashSet hashSet = new HashSet(50,0.90);
Constructors in HashSet
- HashSet() : Creates a HashSet with initial capacity 16.
- HashSet(int initialCapacity) : Creates a HashSet with a custom initial capacity.
- HashSet(int initialCapacity, float LoadFactor) : Creates a HashSet with a custom initial
capacity and custom Load Factor.
public class TestCollection{
public static void main(String[] arg){
String s1 = new String("Sham");
String s2 = new String("Paul");
String s3 = new String("John");
// We will be adding these Strings to the HashSet.
Set hashSet = new HashSet(); // Declare an HashSet.
hashSet.add(s1); // Add the String to the HashSet.
hashSet.add(s2);
hashSet.add(s3);
System.out.println(hashSet); // Output would be : [Paul,Sham,John]. Not in the same order they were inserted.
hashSet.remove(s2); // Deletes "Paul" from the HashSet.
System.out.println(hashSet); // New output would be : [John,Sham]
if(hashSet.contains("Sham")){
System.out.println("Sham is present in the HashSet.");
}
}
}
1) We have created three Strings s1, s2 & s3.
2) Then we have created a HashSet() using the below format.
Set hashSet = new HashSet();
3) Next we have added Strings s1,s2 & s3 using the add() method.
hashSet.add(s1);
hashSet.add(s2);
hashSet.add(s3);
4) When we have printed the values of the HashSet, we have seen the names were not printed in the order they were inserted.
i.e. We have inserted Sham, Paul and then John.
But when we printed the names. Paul, Sham and then John was printed.
This is because the names are not inserted serially at the HashSet(). Whenever we try to insert an object, the HashCode is calculated from that object and then the object is inserted. Same way the roll numbers of the students were calculated based on names in the above example.
System.out.println(hashSet); // Output would be : [Paul,Sham,John].
5) Since, 'Paul' is stored in s2. We have removed 'Paul' by using the remove() method.
hashSet.remove(s2);
6) Next we are checking if 'Sham' is present in the HashSet. We are doing it using the contains() method of HashSet.
Advantages and disadvantages of HashSet
So, HashSet is very effective for searching, deletion & insertion as well.
The only drawback is, if the HashCodes of two objects are equal a LinkedList is created and the equals() method comes into picture for finding that particular element.
Sounds confusing? We will learn in detail the equals() and HashCode() methods next.