Bloomberg LP Interview Question
Financial Software DevelopersCountry: United States
Interview Type: Phone Interview
I guess the reason of 10 bytes for 7b coming from 7,000,000,000. That takes 10 digits, and each person takes a unique number. If it is already unique, why is the international code needed ?
An unsigned int covers the 0 to 2^32 - 1 about 8.5b. That means an unsigned int is good enough for 7b people phone #. The phone# db size is 7b*4 bytes = 28 gb. For today's commercial grade computer, 28g ram is small. It can be up to couple hundred gb ram. Even a PC, can put 32gb there. Also, SSD is as good as ram.
Memory size is not an issue.
You could, but think about the consequences. For example, trie trees only store keys at the leaf nodes, which means we will end up with 7 billion leaf nodes! If I am correct, that is going to be a memory nightmare! Perhaps BST build like a prefix tree is a better option.
Keep in mind various operations that you may have to performs on the data structure. For example, deleting people, adding new people keeping in mind the performance of these operations.
Base will be hashmap structure name - phone. As many maps as there are first letters. And a reference table letter - how to find the data map.
This way you just find the needed Map by the first letter, load it in memory and then retrieve the phone. It works well only if the letter distribution is more or less uniform.
A phone book typically stores name and phone number , but first name and last name is used to identify clients. So we will need to come with a data structure to model phonebook assuming first and last name uniquely refer to a contact number. A possible option would be using a dynamic perfect hashing technique that uses a 2-tier hash level schemes such that the second level of hashing guarantees collision free hashing.
Now given the gigantic size of the customer base , we can maintain 2 hash structures..one smaller one in the main memory of maybe size (say X) and the bigger hash table in the secondary memory will be some multiple of X...so that a any of the portion from secondary memory fit completely into the main memory space for queries.
I believe this architecture will hold suffice for the problem statement.
//
//
// A client wants to build a software phone book that contains
// everyone in the world (7 billion people). Every person has only
// the first name and the name is unique. What data structure would
// you use to store the data?
//
// -- rv July 14, 2015 in United States | Report
// Bloomberg LP Interview Question
//
// Run with VM arguments -ea to enable assert testing
//
// (c) 2016 Perry Anderson, All Rights Reserved, worldwide.
//
//
import java.util.*;
import java.util.zip.CRC32;
/*
* ---- Beginning of Answer ---
*
* In any typical database real world application you have to take
* into consideration available ram versus database not to mention
* database servers and the like. But what really makes such a large
* database application manageable is to make use of the CRC32
* coding algorithm to minimize indexing demands. The CRC32 method
* produces a number that is at least 99.9999% effective such that
* the possibilities of a duplicate is exceedingly low for two
* different strings. For example, like trying to pick out the Planet
* Earth out of 10,000 galaxies the size of the Milky Way.
*
* So the data structure that I would use to store the data would be
* a simple hash map implementation based on the CRC key value which
* keeps the size of the hash map key to a mere 4 bytes compared to
* the incredible database demands required to store duplicate name
* strings in an index database. In other words, very few people use
* four bytes to represent their first name. Hence the storage costs
* are minimized and performance gains are maximized.
*
* ---- End of Answer ----
*
* What you see below is the skeleton of what would be required to
* coordinate such a large database in a typical distributed server
* application.
*
* This example also shows the effective use of interfaces in Java
* to coordinate changes amongst a large number of classes. The same
* effect can be implemented in C++ use Abstract Classes.
*
*
*/
class PhoneBookEntry implements Comparable<PhoneBookEntry> {
static CRC32 checksum = new CRC32();
String name;
String number;
long key;
public PhoneBookEntry(String name, String number) {
this.name = name;
this.number = number;
checksum.reset();
checksum.update(name.getBytes(),0,name.getBytes().length);
this.key = checksum.getValue();
}
@Override
public int compareTo(PhoneBookEntry o) {
return (int)(key - o.key);
}
}
/**
* interface PhoneBookInterface
*
* Defines the interface to the PhoneBook conceptual database for
* use by all implementations of the actual PhoneBook database.
* @author perry
*
*/
interface PhoneBookInterface {
PhoneBookEntry create(String name, String number);
PhoneBookEntry retrieve(String name);
PhoneBookEntry update(String name, String number);
PhoneBookEntry delete(String name);
PhoneBookEntry put(PhoneBookEntry entry);
}
/*
* Typically, you would need a cache for use by applications to
* read in whatever phone book entries needed during a session.
*/
class PhoneBookCache implements PhoneBookInterface {
static CRC32 checksum = new CRC32();
HashMap<Long, PhoneBookEntry> cache = new HashMap<Long, PhoneBookEntry>();
@Override
public PhoneBookEntry retrieve(String name) {
checksum.reset();
checksum.update(name.getBytes(),0,name.getBytes().length);
long searchkey = checksum.getValue();
return cache.get(searchkey);
}
@Override
public PhoneBookEntry create(String name, String number) {
// TODO Auto-generated method stub
return null;
}
@Override
public PhoneBookEntry update(String name, String number) {
// TODO Auto-generated method stub
return null;
}
@Override
public PhoneBookEntry delete(String name) {
// TODO Auto-generated method stub
return null;
}
@Override
public PhoneBookEntry put(PhoneBookEntry entry) {
// TODO Auto-generated method stub
return null;
}
}
/*
* Typically, you would store all phone book entries into a
* professional database like Oracle or MySQL. This class
* specializes in converting application requests into
* the vendor's implementation. Here, the CRC32 value key
* greatly simplies the need for indexing.
*
*/
class PhoneBookDatabase implements PhoneBookInterface {
static CRC32 checksum = new CRC32();
HashMap<Long, PhoneBookEntry> database = new HashMap<Long, PhoneBookEntry>();
@Override
public PhoneBookEntry retrieve(String name) {
long searchkey = getSearchKey(name);
return database.get(searchkey);
}
Long getSearchKey(String name) {
checksum.reset();
checksum.update(name.getBytes(),0,name.getBytes().length);
long searchkey = checksum.getValue();
return searchkey;
}
@Override
public PhoneBookEntry create(String name, String number) {
long searchkey = getSearchKey(name);
PhoneBookEntry entry = new PhoneBookEntry(name, number);
database.put(searchkey,entry);
return entry;
}
@Override
public PhoneBookEntry update(String name, String number) {
// TODO Auto-generated method stub
return null;
}
@Override
public PhoneBookEntry delete(String name) {
// TODO Auto-generated method stub
return null;
}
@Override
public PhoneBookEntry put(PhoneBookEntry entry) {
// TODO Auto-generated method stub
return null;
}
}
/*
* Assuming that a server can have more than one database
* we use the mod operation on the key value to find out which
* database the phone entry has been assigned.
*
*/
class PhoneBookServer implements PhoneBookInterface {
static CRC32 checksum = new CRC32();
static int maxDatabasePerServer = 100;
HashMap<Long, PhoneBookDatabase> databases = new HashMap<Long, PhoneBookDatabase>();
@Override
public PhoneBookEntry retrieve(String name) {
long searchkey = getSearchKey(name);
PhoneBookDatabase actualDatabase = databases.get(searchkey);
return actualDatabase.retrieve(name);
}
Long getSearchKey(String name) {
checksum.reset();
checksum.update(name.getBytes(),0,name.getBytes().length);
long searchkey = checksum.getValue() % maxDatabasePerServer;
return searchkey;
}
PhoneBookDatabase getDatabase(String name) {
long searchkey = getSearchKey(name);
PhoneBookDatabase actualDatabase = databases.get(searchkey);
if ( actualDatabase == null ) {
actualDatabase = new PhoneBookDatabase();
databases.put(searchkey, actualDatabase);
}
return actualDatabase;
}
@Override
public PhoneBookEntry create(String name, String number) {
PhoneBookDatabase actualDatabase = getDatabase(name);
return actualDatabase.create(name, number);
}
@Override
public PhoneBookEntry update(String name, String number) {
// TODO Auto-generated method stub
return null;
}
@Override
public PhoneBookEntry delete(String name) {
// TODO Auto-generated method stub
return null;
}
@Override
public PhoneBookEntry put(PhoneBookEntry entry) {
// TODO Auto-generated method stub
return null;
}
}
/*
* To find where the server holding the database is located
* we assume a fixed value of 10,000 servers online at any time
* and use a mod operation on the CRC32 value to know where the
* actual database for that particular phone book entry is.
*
*/
class PhoneBookServerCache implements PhoneBookInterface {
static CRC32 checksum = new CRC32();
static int maxServers = 10000;
HashMap<Long, PhoneBookServer> servers = new HashMap<Long, PhoneBookServer>();
static PhoneBookServerCache singleton = new PhoneBookServerCache();
private PhoneBookServerCache() {
}
public static PhoneBookServerCache getInstance() {
return singleton;
}
Long getSearchKey(String name) {
checksum.reset();
checksum.update(name.getBytes(),0,name.getBytes().length);
long searchkey = checksum.getValue() % maxServers;
return searchkey;
}
PhoneBookServer getServer(String name) {
long searchkey = getSearchKey(name);
PhoneBookServer actualServer = servers.get(searchkey);
if ( actualServer == null ) {
actualServer = new PhoneBookServer();
servers.put(searchkey, actualServer);
}
return actualServer;
}
@Override
public PhoneBookEntry retrieve(String name) {
PhoneBookServer actualServer = getServer(name);
return actualServer.retrieve(name);
}
@Override
public PhoneBookEntry create(String name, String number) {
PhoneBookServer actualServer = getServer(name);
return actualServer.create(name, number);
}
@Override
public PhoneBookEntry update(String name, String number) {
// TODO Auto-generated method stub
return null;
}
@Override
public PhoneBookEntry delete(String name) {
// TODO Auto-generated method stub
return null;
}
@Override
public PhoneBookEntry put(PhoneBookEntry entry) {
return null;
}
}
/*
* Typically, you would need a cache for use by applications to
* read in whatever phone book entries needed during a session.
*/
class PhoneBookApp implements PhoneBookInterface {
PhoneBookCache cache = new PhoneBookCache();
PhoneBookServerCache server = PhoneBookServerCache.getInstance();
@Override
public PhoneBookEntry retrieve(String name) {
PhoneBookEntry entry = cache.retrieve(name);
if ( entry == null ) {
entry = server.retrieve(name);
cache.put(entry);
}
return entry;
}
@Override
public PhoneBookEntry create(String name, String number) {
return server.create(name,number);
}
@Override
public PhoneBookEntry update(String name, String number) {
// TODO Auto-generated method stub
return null;
}
@Override
public PhoneBookEntry delete(String name) {
// TODO Auto-generated method stub
return null;
}
@Override
public PhoneBookEntry put(PhoneBookEntry entry) {
// TODO Auto-generated method stub
return null;
}
}
public class LPMain007 {
public static void main(String[] args) {
PhoneBookApp app = new PhoneBookApp();
String[] entry1 = { "Jack", "800-555-1212" };
PhoneBookEntry test1 = new PhoneBookEntry("Jack", "800-555-1212");
String[] entry2 = { "Bill", "888-555-1212" };
PhoneBookEntry test2 = new PhoneBookEntry("Bill", "888-555-1212");
assert app.create(entry1[0],entry1[1]).compareTo(test1)==0;
assert app.create(entry2[0],entry2[1]).compareTo(test2)==0;
assert app.retrieve(entry1[0]).compareTo(test1)==0;
assert app.retrieve(entry1[0]).compareTo(test2)!=0;
assert app.retrieve(entry2[0]).compareTo(test2)==0;
assert app.retrieve(entry2[0]).compareTo(test1)!=0;
}
}
For 7 billion people representing phone number as 12 bytes 10bytes + international code , you will need 70 billion bytes minimum 1 million = 1 mb , 70 billion=70000 million , 70 gb *12 = 840 gb... Average ram size is 8gb so you cannot store all this information in memory ..
- smarthbehl July 14, 2015You will need a distributed data structure, may be something like a distributed hash table.