Interview Question
AnalystsCountry: India
Interview Type: In-Person
We cant store duplicate data, as it will lead to waste lot of memory. Instead we can have arrays, one for each key.
1. FirstName[0..length], lastName[0..length], number[0..length], designation[0..length]
2. Search value in respective array, which return the index of item.
3. Fetch data from other arrays from same index. This will return complete details of person.
If you are doing it using java , you just need to create people object , you can add as many fields to the object , and getter and setter methods for them and store them in an array or Arraylist or Set might be even better based on the requirements. When searching and sorting you can implement one Comparable interface for the natural ordering and various comparator objects for different kind of sorting. Seems straightforward in case of Java. If this problem is a database problem , store the values in different fields and do not allow extra spaces to the fields. Index the fields you want to do search upon.
Can someone suggest what should be the general approach of solving such problems? I believe, when they ask you to design this system for the entire population of a country, they are expecting you to talk about databases and B-tree indexes. If this database has to have a front-end web application, we can talk about how our classes will be designed to hold the data coming from the database. Right?
Such a big corpus can not me stored in Arrays since you need consecutive free memory addresses for initialization. You will most definitely run into out of memory exception. If corpus is large but can be accommodated in memory than hybrid of Map and tree can be used or if corpus is too large we can use Apache solr or Lucene to index the corpus and then query it.
Solution to this problem must solve 3 criteria efficiently:-
1. Memory wastage
2. should dynamically grow or shrink as per need
3, addition/removal/query of a person's data should be optimal
Technically we can think of 3 ways to solve this problem:
1. Arrays (Not Possible)
Addition/Removal/Searching of a person's data is efficient in this way. But can you think of array size equal to population of India. If there are less records, then lot of memory will be wasted. Also the size of array can't grow when there is need for it.
2. Linked List (Not Efficient)
We can implement Linked list for this as the records can be added dynamically when there is need for it. Memory wastage problem is solved as the record is created only when a new member appears. But addition/Removal/searching of a person's data with specific phone no becomes tedious. Imagine you have someones phone no in the last node. Then we have to iterate whole linked list to get to that node. So not Efficient.
3. Indexing/Hashing (Solution)
Indexing/Hashing gives us the way to store data efficiently. It grows with space requirement so memory wastage is taken care of. The best thing about this approach is that Addition/Removal/Searching of a person's data is very efficient. When we query for a person's phone no like 98********. First all nos starting from 9 only are taken into account. Then nos starting with 8 and hence forth. So this is an optimal solution.
If the computer has enough primary memory, simply three hashmaps of type :
(telephone#, String[])
(first name, String[])
(last name, String[])
If not enough primary memory, create a folder which contains files by numbers example
1,2,3,4 ......
and create three hashmaps as above by replacing string[] with the respective numbers.
The best way to design such system is to use indexing/searching framework e.g., lucene/ solr frameworks if you heard about those.
- Anonymous July 23, 2014Where each object, in this case
Person[firstname, lastname, number] will be one document and would be stored in index containing fields - firstname, lastname, number and then you can query for any required field like
---firstname: mike
-- lastname: kahn
etc.
B-Trees or tries are generally used for indexing data based on the type of use-cases you want to perform with searches and performance considerations