Data structure: insert, remove, contains, get random element, all at O(1)
I was given this problem in an interview. How would you have answered?
Design a data structure that offers the following operations in O(1) time:
- insert
- remove
- contains
- get random element
Solution 1:
Consider a data structure composed of a hashtable H and an array A. The hashtable keys are the elements in the data structure, and the values are their positions in the array.
- insert(value): append the value to the array A and let i be its index. Set H[value]=i.
- remove(value): We replace the cell that contains value in A with the last element of A. Let d be the last element of A, and let i=H[value] be the index of the value to be removed. Set A[i]=d and H[d]=i, decrease the size of the array by one, and remove value from H.
- contains(value): return H.contains(value)
- getRandomElement(): let r=random(current size of A). return A[r].
Since the array needs to auto-grow, insertion is amortized O(1), but I guess that's OK.
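A minimal sketch of Solution 1 in Java (the class name RandomizedSet is my own, and I assume int elements; ArrayList stands in for the auto-growing array A):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;

// Sketch of Solution 1: H maps each value to its index in the array A.
class RandomizedSet {
    private final ArrayList<Integer> a = new ArrayList<>();      // the array A
    private final HashMap<Integer, Integer> h = new HashMap<>(); // value -> index in A
    private final Random rnd = new Random();

    // insert(value): append to A and record the index; amortized O(1).
    public boolean insert(int value) {
        if (h.containsKey(value)) return false;
        a.add(value);
        h.put(value, a.size() - 1);
        return true;
    }

    // remove(value): overwrite value's slot with the last element, then shrink A by one.
    public boolean remove(int value) {
        Integer i = h.remove(value);
        if (i == null) return false;
        int last = a.get(a.size() - 1);
        a.set(i, last);
        a.remove(a.size() - 1);
        if (last != value) h.put(last, i); // no re-insert when value occupied the last slot
        return true;
    }

    public boolean contains(int value) {
        return h.containsKey(value);
    }

    // getRandomElement(): uniform over A; assumes the structure is non-empty.
    public int getRandomElement() {
        return a.get(rnd.nextInt(a.size()));
    }
}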
Solution 2:
O(1) lookup implies a hashed data structure.
By comparison:
- O(1) insert/delete with O(N) lookup implies a linked list.
- O(1) insert, O(N) delete, and O(N) lookup implies an array-backed list.
- O(log N) insert/delete/lookup implies a tree or heap.
Solution 3:
You might not like this, because they're probably looking for a clever solution, but sometimes it pays to stick to your guns: a hash table already satisfies the requirements, probably better overall than anything else will (albeit in amortised constant time, and with different compromises from other solutions).
The requirement that's tricky is the "random element" selection: in a hash table, you would need to scan or probe for such an element.
For closed hashing / open addressing, the chance of any given bucket being occupied is size() / capacity(), but crucially this ratio is typically kept in a constant multiplicative range by the hash-table implementation (e.g. the table may be kept between about 1.2x and 10x larger than its current contents, depending on performance/memory tuning). This means on average we can expect to search 1.2 to 10 buckets, totally independent of the total size of the container: amortised O(1).
I can imagine two simple approaches (and a great many more fiddly ones):
- search linearly from a random bucket
  - consider empty/value-holding buckets ala "--AC-----B--D": you can say the first "random" selection is fair even though it favours B, because B had no more probability of being favoured than the other elements; but if you're doing repeated "random" selections using the same values, then clearly having B repeatedly favoured may be undesirable (nothing in the question demands even probabilities, though)
- try random buckets repeatedly until you find a populated one
  - "only" capacity() / size() average buckets visited (as above), but in practical terms more expensive because random number generation is relatively expensive, and infinitely bad in the (infinitely improbable) worst case
  - a faster compromise would be to use a list of pre-generated random offsets from the initial randomly selected bucket, %-ing them into the bucket count
- "only" capacity() / size() average buckets visited (as above) - but in practical terms more expensive because random number generation is relatively expensive, and infinitely bad if infinitely improbable worst-case behaviour...
Not a great solution, but it may still be a better overall compromise than the memory and performance overheads of maintaining a second index array at all times.
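To make the "try random buckets repeatedly" idea concrete, here is a sketch assuming direct access to the bucket array of a hypothetical open-addressing table (real library hash tables such as Java's HashMap don't expose their buckets, so the table here is just a bare array with null marking an empty slot):

import java.util.Random;

// Sketch of "probe random buckets until one is populated".
// buckets is the hypothetical open-addressing bucket array; null == empty slot.
class RandomFromOpenTable {
    private final Integer[] buckets;
    private final Random rnd = new Random();

    RandomFromOpenTable(Integer[] buckets) {
        this.buckets = buckets;
    }

    // Expected capacity()/size() probes per call; assumes at least one occupied bucket.
    int getRandomElement() {
        while (true) {
            int i = rnd.nextInt(buckets.length);
            if (buckets[i] != null) return buckets[i];
        }
    }
}

With a load factor of 0.5 this visits 2 buckets on average, matching the capacity()/size() figure above.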
Solution 4:
For this question I will use two data structures:
- HashMap
- ArrayList / array / doubly linked list
Steps:
- Insertion: check whether X is already present in the HashMap (O(1)). If not present, append it to the end of the ArrayList (O(1)) and add it to the HashMap with X as the key and the last index as the value (O(1)).
- Remove: check whether X is present in the HashMap (O(1)). If present, look up its index and remove X from the HashMap (O(1)). Swap this element with the last element in the ArrayList and remove the last element (O(1)). Update the index of the moved element in the HashMap (O(1)).
- GetRandom: generate a random number from 0 to the last index of the ArrayList and return the element at that index (O(1)).
- Search: look up X as a key in the HashMap (O(1)).
Code:
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Random;
import java.util.Scanner;

public class JavaApplication1 {
    public static void main(String args[]) {
        Scanner sc = new Scanner(System.in);
        ArrayList<Integer> al = new ArrayList<Integer>();               // the backing array
        HashMap<Integer, Integer> mp = new HashMap<Integer, Integer>(); // value -> index in al
        Random rm = new Random();
        while (true) {
            System.out.println("**menu**");
            System.out.println("1.insert");
            System.out.println("2.remove");
            System.out.println("3.search");
            System.out.println("4.random");
            int ch = sc.nextInt();
            switch (ch) {
                case 1:
                    System.out.println("Enter the element");
                    int a = sc.nextInt();
                    if (mp.containsKey(a)) {
                        System.out.println("Element is already present");
                    } else {
                        al.add(a);                // append at the end: amortized O(1)
                        mp.put(a, al.size() - 1); // record its index
                    }
                    break;
                case 2:
                    System.out.println("Enter the element you want to remove");
                    a = sc.nextInt();
                    if (mp.containsKey(a)) {
                        int size = al.size();
                        int index = mp.get(a);
                        int last = al.get(size - 1);
                        Collections.swap(al, index, size - 1); // move the victim to the end
                        al.remove(size - 1);                   // O(1) removal from the tail
                        mp.put(last, index);                   // the old last element now lives at index
                        mp.remove(a);                          // forget the removed element
                        System.out.println("Data deleted");
                    } else {
                        System.out.println("Data not found");
                    }
                    break;
                case 3:
                    System.out.println("Enter the element to search");
                    a = sc.nextInt();
                    if (mp.containsKey(a)) {
                        System.out.println(mp.get(a));
                    } else {
                        System.out.println("Data not found");
                    }
                    break;
                case 4:
                    if (al.isEmpty()) {            // rm.nextInt(0) would throw
                        System.out.println("Structure is empty");
                    } else {
                        int index = rm.nextInt(al.size());
                        System.out.println(al.get(index));
                    }
                    break;
            }
        }
    }
}
Time complexity: O(1) per operation. Space complexity: O(N).