Given n number of legacy servi

Google Interview Question for Member Technical Staffs

0

of 2 votes

7
Answers
Given n number of legacy services with user data<userid, info, date>
Design an API to return user data in a given date range, it should collect data from each service and merge and return the data sorted by date.
- next_big_gig December 05, 2014 in United States | Report Duplicate | Flag | PURGE
Google Member Technical Staff System Design

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 1 vote

From a systems standpoint, you've got to take into account a bunch of different messaging types. I'd use a combination of the adaptor pattern (to wrap the interface to each legacy server), with the interface:

#include <ctime>
    #include <set>

    struct Record { int uid; std::string info; tm date;};
    struct CompRecord {
       bool operator()(const Record& l, const Record& r) { return difftime(l.date,r.date) < 0;}
    };

    struct iGetDates {
       virtual ~iGetDates{}
       virtual std::vector<Record> getDates(tm from, tm to) = 0;
    }
    class DateGetter : public iGetDates {
    public:
        template <typename T>
        void add_server(T* server) {
           servers_.push(std::unique_ptr<Record>(server));
        }
        virtual std::vector<Record> getDates(tm from, tm to) {
           std::set<Record,CompRecord> return_set;
           for (auto& server : servers_) {
             for (auto& record : server->getDates(from,to)) {
               return_set.insert(record);
             }
           }
           return std::vector<Record>{return_set.begin(),return_set.end()};
        }
     private:
       std::vector<std::unique_ptr<iGetDates >> servers_;
     };

Ownership is a little funny, probably should provide a clone method and/or a move-clone method.

From an algorithms standpoint, we can say that set is an implementation of a Red-Black tree. Adding the value is O(lg(n)), overall it is O(n*lg(n)). I'd guess that from that point they'll probably ask me to implement a red-black tree, to which I will groan and then do it.

- Anonymous December 15, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Why not use a priority queue to for the storing and sorting of the dates?

- Anonymous December 19, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I would strongly agree with Red-Black trees or Balanced BST. Priority queue may give max/min but will fail in giving a range of results efficiently.

Using a balanced BST,
Insertion - O(n logn)
Querying based on given date - O(log n).
Sorting the queried results in Inorder - O(logn)

- Sudhindra.A December 25, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Priority Queue: Get the data from different services and insert them into priority Queue.. and retrieve either delMin or delMax

- naren December 05, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

struct userData{
    int userId;
    string info;
    int date;
};

bool acompare(userData lhs, userData rhs) { return lhs.date < rhs.date; }

void merge(&useDate arr[], int arrSize, int startDate, int finishDate){
    std::sort(array, array+1000, acompare);
    for (int i=0; i<arrSize; i++){
       if (arr[i].date <= finishDate && arr[i].date >= startDate)
       cout<<"User ID: "<<arr[i].userId<<" " <<" User Info: " << arr[i].info<<endl;
    }
}

- Mike December 06, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

My experience with things like this, a service would not keep the internal contents in Date sorted order, so that needs to be addressed too.

Using a mix of QuickSort and MergeSort-like operations.

static interface Service{
	UserData[] getUserData();
}

static interface UserData{
	long userId;
	String info;
	long date;
}

public static UserData[] getDateSortedUserData(Service[] services){
	UserData results = new UserData[0];
	//comparator for Java QuickSort
	Comparator<UserData> userDataComparator = new Comparator<UserData>{
		final long maxVal = (long)Integer.MAX_VALUE;
		final long minVal = (long)Integer.MIN_VALUE;
		public int compare(UserData data1, UserData data2){
			long diff = data2.date - data1.date;
			if(diff > maxVal){
				return Integer.MAX_VALUE;
			}
			if(diff < minVal){
				return Integer.MIN_VALUE;
			}
			return (int)diff;
		}
	}
	for(Service service : services){
		UserData[] serviceUserData = service.getUserData();
		//Java QuickSort
		Arrays.sort(serviceUserData, userDataComparator);
		//custom merge sort
		results = mergeData(results, serviceUserData);
	}
	return results.toArray(new UserData[results.size()]);
}

private static void mergeData(UserData[] arr1, UserData[] arr2){
	UserData[] results = new UserData[arr1.length + arr2.length];
	int resultIndex = 0;
	int arr1Index = 0;
	int arr2Index = 0;
	while(arr1Index < arr1.length && arr2Index < arr2.length){
		if(arr1[arr1Index].date < arr2[arr2Index].date){
			results[resultIndex] = arr1[arr1Index];
			arr1Index++;
		else{
			results[resultIndex] = arr2[arr2Index];
			arr2Index++;
		}
		resultIndex++;
	}
	while(arr1Index < arr1.length){
		results[resultIndex] = arr1[arr1Index];
		resultIndex++;
		arr1Index++;
	}
	while(arr2Index < arr2.length){
		results[resultIndex] = arr2[arr2Index];
		resultIndex++;
		arr2Index++;
	}
	return results;
}

- zortlord December 10, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

There's an awful lot of algorithmic code here for a question that requests to define an interface.

I would treat this question as a design question.

First, I would ask a bunch of questions, some of them obvious, some are not.
1. How many services do I need to support?
2. How many requests do I need to support per second?
3. How much data does a typical request return?
4. Can I assume the result fits in memory?
5. Is there a disparity between the speeds in the services response?
6. Do different services use different formats? Are they able to convert it to a uniform format like XML/JSON?
7. What is the format in which I need to return the dates/info? How will this data be used?
8. What is the format of the user id? Is it global to all services or does it need conversion? If it isn’t global, do I have an API to translate global ids to local ids?
9. Is the way a user id is represented likely to be changed? How about date and info?
10. Is the user info in different languages? Are the dates in different time zones?
11. What should happen if one service is not responding / reports an error?
12. What are the APIs for these services? Do I need to design an API for them as well?
13. Should I validate the input? What happens if the date range is invalid?

Then, based on the answers (or lack thereof), I would propose the following ideas:

We want to be able to access the services uniformly so we need to have a base interface for all of them. In the more likely case that the services are not available for extension this would be an interface to an adaptor class. This interface will be very similar to the aggregator interface. The main method of this interface is something like

struct service_interface 
{
	virtual map<date_t, info_t> get_user_data_in_range(uid_t uid, date_t start, date_t end) = 0;
	virtual ~service_interface() = default;
};

We have to define a virtual destructor for the derived / implementing classes, since we have a virtual method. The types date_t, info_t and uid_t are the aggregator defined types that the different services need to convert their representation of ids/dates/info from and into. For maximum compatibility these types should not be complex classes but simple strings / ints. It may also be useful to use simple types to avoid exposing a complex class representation to the services if it may be changed later. Either way these types may be typedefs. If I knew nothing about the services and could assume nothing about the usage / requirements of the system I would simply use strings for everything, possibly unicode strings if the info is international. The important thing is that all services need to conform to a specific format for the dates and info they return and the user id they receive. This is the part of the system that shouldn’t change and that’s why wide strings are attractive, because they are fairly abstract and scalable. For example, if we use JSON or XML as the format of the string and for some reason a new service appears and wants to add new data to the user info representation, it doesn’t break the interface and only the parser has to account for this possibility. Now as for the aggregator service, if it is to be used in a very specific way that isn’t expected to change it may define a similar method with different data types, maybe even complex data types that make it easy to use. It may even provide a single data type but be open for extension via overloading. Now this depends on the language but generally it could provide a template method of the form

struct aggregator 
{
	template<class D, class I>
	void get_user_data_in_range(uid_t uid, D start, D end, map<D,I>& result)
};

Notice that the signature is slightly different because practically for c++ we can’t use the data representation as the return argument because of conflicting specialization overloads.
Now we can provide specific template specialization for different D/I return types. The basic implementation throws “not implemented”. Another option is that it would use basic types again and let the clients do the parsing. This seems like unnecessary delegation but it may be more flexible in case the service representations are changed often and only the clients know how they have changed. It would make the aggregator code very simple but would add more code to the client. These two approaches do not conflict and can be used together. Additionally, if the number of services is changing dynamically we may want the aggregator interface to support this, i.e. provide add and remove methods for the services of the form

void add_service(shared_pointer<service_interface> s);
void remove_service(shared_pointer<service_interface> s);

The return value may be boolean if we want to return previous existence or not. We will also need then to keep these interfaces in a data structure for the implementation.
Now about the aggregator, notice that this isn’t an interface but rather an implementation as it assumes only one way to combine the data, perhaps to be represented in different types. If we plan to have many different aggregators and we want to use them in some polymorphic fashion then the template solution is not suitable. We need to define an aggregator interface and we may need to resort to the solution of delegating data representation to the client. The interface will contain a set of virtual methods to get the data in relatively simple types that will be implemented differently by different aggregator implementations. For example, one aggregator may return the range in days and another in months, or one aggregator will do some processing on the data to convert it to a different format. This means that the previously defined methods will be virtual and again we need a virtual destructor. Additionally, we won’t have a data structure in the interface and we will let the different aggregators decide how / if they want to store their service interfaces.
Additionally, if some services are slow network services or disk services and the aggregator is read-heavy, i.e. there are many overlapping requests at a time it may be useful to implement an aggregator with an in-memory cache like memcached with LRU or some similar method. This may make the interface faster at the cost of more RAM and more complicated algorithms. It may also make things difficult in case the services update the data asynchronously, in which case they will have to provide publish/subscribe methods in the service interface to the aggregator so it can be informed of invalid cache entries.
Finally, if some results cannot fit in memory the aggregator implementation may need to perform some kind of external sort and the aggregator interface, rather than providing the entire map in memory, would provide an iterator to the map. This would have to be an extension to the interface.

- Omri.Bashari June 29, 2015 | Flag Reply

CareerCup

Google Interview Question for Member Technical Staffs

Books

Videos

Resume Review

Mock Interviews