Interview Question
Country: United States
Interview Type: In-Person
@shilpitaroy87: distribution is no an easy problem, you have only 1 Mio sessions, active players will probably move with time zones around the world (maybe at night at a certain location...). I guess at this point the 10^6 sessions make maybe not so much sense anymore (technically). Maybe you have servers in each continent, just to improve gaming experience due to round-trip times and to save costs. So, a session might be local to a geographical location as well. That's the first and maybe easiest catch.
Then even in a geographical local zone, you might end up with the need to have two or three such session servers (replicas). If you can have a master and a hot-standby, then the only thing you need to make sure, is, that your slave is "almost always" in sync with the master and a load balancer that will switch to the slave if the master is offline (and a strategy how the master can get back online).
But what if one master can't deal with the traffic anymore? You could start with the DNS server again and do round robin etc. How ever, this will partition your session space, etc. etc.
I think at this point there are so many questions to ask, a generic answer is not possible anymore. What infrastructure does the company have, do they host on a public cloud? what does that cloud offer? how many concurrent sessions in a time zone do we expect? why 6^10 sessioin id's, why don't we extend? Should we do a perfect solution that only few can pay for or a pragmatic approach that we can grow later, if we have money? Are we in a startup or are we Google? etc. etc. etc.
Sorry the previous is not formatted.
Based on ChrisK's solution, here is a thread based implementation. I think the skip list can be avoided if we can calculate the array index -
public class SessionsGetAndPut {
public static void main(String[] args) {
int[] sessionArr = new int[320000];
Put put = new Put(sessionArr);
Get get = new Get(sessionArr);
for (int i = 0; i < 1000; i++) {
Thread pTh = new Thread(put);
Thread gTh = new Thread(get);
pTh.start();
gTh.start();
}
}
static class Put implements Runnable {
private int[] sessionArr;
public Put(int[] sessionArr) {
super();
this.sessionArr = sessionArr;
}
@Override
public void run() {
put();
}
public void put() {
Random rand = new Random();
int session = rand.nextInt(1000000);
System.out.println("Putting session value - " + session);
int val = sessionArr[session / 32];
int mod = session % 32;
sessionArr[session / 32] = (val | (1 << mod));
}
}
static class Get implements Runnable {
private int[] sessionArr;
public Get(int[] sessionArr) {
super();
this.sessionArr = sessionArr;
}
@Override
public void run() {
System.out.println("Got session value - " + get());
}
public int get() {
Random rand = new Random();
int session = -1;
while (session == -1) {
int loc = rand.nextInt(1000000);
int val = sessionArr[loc / 32];
int mod = loc % 32;
if ((val & (1 << mod)) > 0)
session = loc;
sessionArr[loc / 32] = (val & ~(1 << mod));
}
return session;
}
}
}
Based on ChrisK's solution, here is a thread based implementation. I think the skip list can be avoided if we can calculate the array index -
public class SessionsGetAndPut {
public static void main(String[] args) {
int[] sessionArr = new int[320000];
Put put = new Put(sessionArr);
Get get = new Get(sessionArr);
for (int i = 0; i < 1000; i++) {
Thread pTh = new Thread(put);
Thread gTh = new Thread(get);
pTh.start();
gTh.start();
}
}
static class Put implements Runnable {
private int[] sessionArr;
public Put(int[] sessionArr) {
super();
this.sessionArr = sessionArr;
}
@Override
public void run() {
put();
}
public void put() {
Random rand = new Random();
int session = rand.nextInt(1000000);
System.out.println("Putting session value - " + session);
int val = sessionArr[session / 32];
int mod = session % 32;
sessionArr[session / 32] = (val | (1 << mod));
}
}
static class Get implements Runnable {
private int[] sessionArr;
public Get(int[] sessionArr) {
super();
this.sessionArr = sessionArr;
}
@Override
public void run() {
System.out.println("Got session value - " + get());
}
public int get() {
Random rand = new Random();
int session = -1;
while (session == -1) {
int loc = rand.nextInt(1000000);
int val = sessionArr[loc / 32];
int mod = loc % 32;
if ((val & (1 << mod)) > 0)
session = loc;
sessionArr[loc / 32] = (val & ~(1 << mod));
}
return session;
}
}
}
Cool, I like the randomization approach, but would still combine it with skip list, imagine you have to find the last empty slot in 1 Mio, you will, in average, try 1 Mio Times. There are two issues, if the empty slots get few, it takes longer and longer to find it, the second is, it's non deterministic in terms of run time - that is all easy with few active sessions.
The other issue is, a 6 digit session number by random doesn't really introduce security, meaning it's still easy to guess a session id if there is no other security measure. Randomized session key that introduce some security are significantly larger and thus the space of used numbers is sparse.
But randomized with skip list would be cool :-) although still nondeterministic, but that might be ok... ...
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#define MIN 0
#define MAX 1000000
typedef struct {
char *p;
unsigned int id;
}session;
void get(session *s){
srand(time(0));
s->id = (random()%(MAX-MIN+1))+MIN;
usleep(500*1000);
return;
}
int main (){
session *ps;
int num,i;
printf("No of players?\n");
scanf("%d",&num);
ps = (session *)malloc(sizeof(session)*num);
for(i=0;i<num;i++){
printf("Name of player?\n");
(ps+i)->p = (char *)malloc(sizeof(char)*10);
scanf("%s",(ps+i)->p);
get (ps+i);
printf("%s %d\n",(ps+i)->p,(ps+i)->id);
}
return 0;
}
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#define MIN 0
#define MAX 1000000
typedef struct {
char *p;
unsigned int id;
}session;
void get(session *s){
srand(time(0));
s->id = (random()%(MAX-MIN+1))+MIN;
usleep(500*1000);
return;
}
int main (){
session *ps;
int num,i;
printf("No of players?\n");
scanf("%d",&num);
ps = (session *)malloc(sizeof(session)*num);
for(i=0;i<num;i++){
printf("Name of player?\n");
(ps+i)->p = (char *)malloc(sizeof(char)*10);
scanf("%s",(ps+i)->p);
get (ps+i);
printf("%s %d\n",(ps+i)->p,(ps+i)->id);
}
return 0;
}
updated:
He may wanted a bit array instead of HT. Why? Because a HT to store a single session id will use more than a 32 bit value, so, when all sessions are assigned you have more than 4*1 Mio. That's probably between 4 MB and 8 MB, not a big deal actually. Lookup is O(1), etc... But you can just do better in terms of memory usage and constant factors if you expect many many concurrent sessions (not good for 10 or 100).
Use an integer array, let's say 64 bit: each int can store 64 used/not used Infos. You will use 1MBit, that's 32 times less space. Put and set is bit shifting, direct memory access and bit masking. No compensated O(1) when growing HT, no hash calculation, no re-allocation.
something else to consider is the get function. How to get quickly a not used session. O(n) is obvious but not good. We could introduce a skip list (I think it is called) which would be an array that marks if there is an empty slot in the layer below, each layer reducing the size by, let's say 64.
at the bottom layer you have ceiling(10^6 / 64) 64 bit integers = ~16'000 ints, 128 KByte
the next layer above could store in a 64 bit if any of below 64 integers have a single slot free, that's 250 64 bit integers, 2 KBytes
the next layer above would be 4 64 bit integers
... you get log64(n) layers, so you could find an empty session in log64(1Mio) * 64 bit manipulation steps (worst case) 5 * 64 ~ 300 instructions, little cache misses, should be fast
The 64 I used because I assume a 64 bit architecture, so, it makes little sense to fetch 32 bit values from the memory / cache while I can 64 with the same speed. Bottlenect will be memory, cache I/O anyway, so use the bus width.
- Chris November 02, 2017As well interesting would be a follow up, how to distribute/scale across machines if you had many more sessions and want some stability against single machine failure / network partitioning.