Microsoft Interview Question
Senior Software Development Engineers
Country: United States
So you want one (or more) sender that reads the files and multiple machines that process the records and build the output file. Depending on the size of each machine's memory, we need to come up with a hashing function: run the id through the hash to find out which machine (1-16, for example) will process it. Note that records with the same id from both files have to go to the same machine so they can be merged there.
Each machine will receive id -> name or id -> address pairs and needs to merge the records, keeping them in a hash map keyed by id.
Once all the machines are done processing, they can write to a single output file one by one.
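The routing and merge steps above can be sketched in a single process, with each "machine" simulated as a bucket. This is only an illustration under assumptions: the function names, the choice of `zlib.crc32` as the hash, and the in-memory lists standing in for network transfer are all hypothetical, not something the question specifies.

```python
import zlib
from collections import defaultdict

NUM_MACHINES = 16  # assumed cluster size, matching the 1-16 example


def machine_for(record_id: str, num_machines: int = NUM_MACHINES) -> int:
    """Stable hash, so the same id always routes to the same machine."""
    return zlib.crc32(record_id.encode("utf-8")) % num_machines


def partition(records, num_machines=NUM_MACHINES):
    """Group (id, value) pairs by destination machine (the sender's job)."""
    buckets = defaultdict(list)
    for record_id, value in records:
        buckets[machine_for(record_id, num_machines)].append((record_id, value))
    return buckets


def merge_on_machine(names, addresses):
    """Join the id -> name and id -> address pairs that landed on one machine."""
    merged = {}
    for record_id, name in names:
        merged.setdefault(record_id, {})["name"] = name
    for record_id, address in addresses:
        merged.setdefault(record_id, {})["address"] = address
    return merged


def distributed_merge(name_records, address_records, num_machines=NUM_MACHINES):
    name_buckets = partition(name_records, num_machines)
    address_buckets = partition(address_records, num_machines)
    output = []
    # Machines append their results to the single output one by one.
    for m in range(num_machines):
        merged = merge_on_machine(name_buckets.get(m, []), address_buckets.get(m, []))
        for record_id in sorted(merged):
            row = merged[record_id]
            output.append((record_id, row.get("name"), row.get("address")))
    return output
```

Python's built-in `hash()` is randomized per process for strings, so a sender and a receiver could disagree on it; that is why the sketch uses a deterministic checksum instead.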
Integer ids are usually not preferred because they do not have enough range. GUIDs are usually preferred, and in that case you can simply take the first character of the GUID (a hex digit, so 16 possible values) and send the record to the corresponding machine, assuming we decide on 16 machines, each with enough memory for its share:
memory per machine = total possible number of ids / 16
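As a rough sketch of the first-character routing, assuming GUIDs arrive in their usual hex string form (the function name `machine_for_guid` is made up for illustration):

```python
import uuid


def machine_for_guid(guid: str) -> int:
    """Map a GUID string to a machine index 0-15 using its first hex digit."""
    first_char = guid.strip().lstrip("{")[0]  # tolerate the "{...}" GUID form
    return int(first_char, 16)  # raises ValueError if it is not a hex digit


# Example: a freshly generated random (version 4) GUID.
example = str(uuid.uuid4())
target = machine_for_guid(example)  # some value in 0..15
```

This only balances the load if the first character is close to uniformly distributed, which holds for random (version 4) GUIDs but should be checked for ids generated some other way.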
The solution will depend on the format of the files and whether they are sorted by id. If we assume that each line contains one entry, with a comma separating the id from the name (or the id from the address), and that a given line in the two files corresponds to the same id, then the solution is simple: read both files in step and join them line by line.
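Under that line-aligned assumption, a minimal sketch could look like the following. The function name and the `id,name,address` output layout are assumptions for illustration; the inputs can be any iterables of lines, such as two open file objects.

```python
def merge_aligned(name_lines, address_lines):
    """Yield 'id,name,address' rows from two line-aligned inputs.

    Assumes line N of both inputs carries the same id, formatted as
    'id,name' and 'id,address'. zip stops at the shorter input.
    """
    for name_line, address_line in zip(name_lines, address_lines):
        # Split on the first comma only, since addresses may contain commas.
        id_a, name = name_line.rstrip("\n").split(",", 1)
        id_b, address = address_line.rstrip("\n").split(",", 1)
        if id_a != id_b:
            raise ValueError(f"files are not aligned: {id_a!r} vs {id_b!r}")
        yield f"{id_a},{name},{address}"
```

Because it is a generator that holds only one line from each file at a time, memory use stays constant regardless of file size, so no partitioning across machines is needed in this case.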
- The Artist May 24, 2020