Palantir Technology Interview Question
SDE1s, Country: India
Interview Type: Written Test
Implement a simple, persistent, thread-safe cache, which should ideally be able to store up to 1 million product names.
Define:
Cache: Key -> value. get, write
Persistent: Write to a database
Thread-safe: Multiple requests to get and write simultaneously
Store 1 million product names
Constraints:
Space: locking, data structure to store information
(alternative data structures, like trees, may require less locking;
trade-off space vs. time)
Request return time
thread-safe
Brute force (don't optimize; don't consider the constraints): unlimited space
data storage:
hash map
startup:
initialize cache (possibly have a serialized cache in the database; load it into the hash map)
event handlers
Maintain queue of requests.
Buffer queue of requests - handle sequentially
shut down:
write all data in the hash map to the database
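The brute-force design above (hash map plus sequentially handled requests) might be sketched roughly as below; the class name `SimpleCache`, the single global lock, and modeling the database as a plain dict are illustrative assumptions, not part of the question:

```python
import threading

class SimpleCache:
    """Brute-force cache: one hash map, one lock, requests handled sequentially."""

    def __init__(self, persisted=None):
        # startup: load any serialized cache state from the "database"
        # (modeled here as a plain dict)
        self._data = dict(persisted or {})
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:  # one lock makes every request effectively sequential
            return self._data.get(key)

    def write(self, key, value):
        with self._lock:
            self._data[key] = value

    def shutdown(self):
        # shut down: write all data in the hash map back to the database
        with self._lock:
            return dict(self._data)

cache = SimpleCache({"sku-1": "widget"})
cache.write("sku-2", "gadget")
print(cache.get("sku-2"))  # gadget
```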
test cases:
add individual features:
Parallelize requests:
New architecture:
load balancer
shared state:
Master slave, atomic commit protocol
sequence of get requests : trivially distribute to other threads
get(x), write(x', y)
x != x'
- parallelize
x == x':
sequential.
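The x != x' / x == x' rule above amounts to a commutativity check; a minimal sketch (the tuple encoding of operations is an assumption for illustration):

```python
def can_parallelize(op1, op2):
    # Ops are (kind, key) tuples, kind in {"get", "write"}; illustrative only.
    # Two operations commute if they touch different keys, or if both
    # are reads of the same key.
    (kind1, x1), (kind2, x2) = op1, op2
    if x1 != x2:
        return True
    return kind1 == "get" and kind2 == "get"

print(can_parallelize(("get", "a"), ("write", "b")))  # True  (x != x')
print(can_parallelize(("get", "a"), ("write", "a")))  # False (x == x')
```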
Thread safe:
Given a queue sequence.
perform topological sort on the queue and parallelize it.
write(x, y, t) depends on get(x, t') or write(x, y', t'') with largest t'/t''
computation of deps for static list:
given mp: optype -> x -> opId
and we have computed deps for operations until time 't'.
for operation at time 't+1'.
get(x, t+1) -> depends on mp[write][x]; then set mp[read][x] = t+1
write(x, y, t+1) -> depends on max(mp[write][x], mp[read][x]); then set mp[write][x] = t+1
Thus in O(1) we can compute the new dependency relation for any incoming operation.
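The O(1) dependency computation can be sketched as follows, with `mp` as the optype -> key -> opId map described above; the function name and the list-of-tuples input encoding are assumptions:

```python
def compute_deps(ops):
    # mp maps op type ("read"/"write") -> key -> last opId (timestamp).
    mp = {"read": {}, "write": {}}
    deps = {}  # opId -> opId it depends on (None if independent)
    for t, (kind, x) in enumerate(ops):
        if kind == "get":
            deps[t] = mp["write"].get(x)  # a read depends on the last write to x
            mp["read"][x] = t
        else:  # write
            cands = [c for c in (mp["write"].get(x), mp["read"].get(x))
                     if c is not None]
            deps[t] = max(cands) if cands else None  # last read or write, whichever is later
            mp["write"][x] = t
    return deps

ops = [("write", "a"), ("get", "a"), ("get", "b"), ("write", "a")]
print(compute_deps(ops))  # {0: None, 1: 0, 2: None, 3: 1}
```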
Now, given that we have topologically sorted the input into some directed graph G:
We need to do two things:
1. Define an update for the graph.
get(x, t+1): let n be the node it gets assigned to
if n == null:
a) create a new starting node n' in the graph G and update GM : nodeId -> node
nodeIds are, for now, the timestamps (problem: the space of timestamps is small; it could wrap around the address space, or ids are drawn from some pool and consumed nodes retire their ids)
attempt to acquire lock on n:
if n has been consumed then go to (a)
otherwise add current node as child of n.
release the lock
2. Traverse the graph and schedule events to corresponding nodes.
for each of the starting frontier:
try and acquire a node.
once you acquire the node,
dispatch for consumption to some worker from a thread pool
then mark as consumed
release the lock.
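Steps 1–2 above (acquire a frontier node, mark it consumed, dispatch to a thread pool, then treat its children as the new frontier) might look roughly like this; the `Node` class and the recursive `schedule` helper are illustrative assumptions:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Node:
    def __init__(self, op):
        self.op = op
        self.children = []
        self.lock = threading.Lock()
        self.consumed = False

def schedule(frontier, results, pool):
    futures = []
    for n in frontier:
        with n.lock:          # try to acquire the node
            if n.consumed:
                continue
            n.consumed = True  # mark consumed under the lock, so a node is dispatched at most once
        futures.append(pool.submit(run_node, n, results, pool))
    for f in futures:
        f.result()

def run_node(n, results, pool):
    results.append(n.op)                 # "dispatch for consumption"
    schedule(n.children, results, pool)  # children become the new frontier

root = Node("write(a)")
root.children = [Node("get(a)"), Node("write(a,2)")]
results = []
with ThreadPoolExecutor(max_workers=4) as pool:
    schedule([root], results, pool)
print(results)  # root first, then its children in some order
```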
add pairs of features
add triples of features
etc.
Persistence:
On the simple example with a queue of requests:
Intersperse writes to the database.
Log information for each intermediate write.
Functionality: state has a rollback; interspersing means a high volume of writes,
but it reduces the database as a bottleneck.
The intermediate log helps recover not all, but some, information from failed or missed transactions.
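One way to realize the interspersed writes and the intermediate log is an append-only log replayed on recovery; a minimal sketch, assuming a JSON-lines log file (the file format and names are illustrative):

```python
import json, os, tempfile

class LoggedCache:
    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}

    def write(self, key, value):
        # append an intermediate log entry before applying the write,
        # so a crash loses at most the unflushed hash map
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"k": key, "v": value}) + "\n")
        self.data[key] = value

    @classmethod
    def recover(cls, log_path):
        cache = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path) as f:
                for line in f:  # replay logged writes into the hash map
                    rec = json.loads(line)
                    cache.data[rec["k"]] = rec["v"]
        return cache

path = os.path.join(tempfile.mkdtemp(), "cache.log")
c = LoggedCache(path)
c.write("sku-1", "widget")
recovered = LoggedCache.recover(path)
print(recovered.data)  # {'sku-1': 'widget'}
```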
Trivially parallelizable to alternate solution:
Concerns: Race conditions from simultaneous writes.
Order writes to same resource:
Enforced by traversal of the topologically sorted graph
Data storage ballpark:
10^6 records
record: key, value
key - 8 bytes
value - 1 MB (total: ~10^6 × 1 MB ≈ 1 TB)
Graph: (noderef, edge, time)
Graph Map: key -> noderef
Stored in database values:
1000 requests per second.
time required for write:
0.1s
time required for read:
0.0001s
x reads per second,
goal:
calculate time required to service above quota of requests.
time: max possible response time.
sequence of writes all to the same resource.
(1000 - x) writes, all to the same resource e,
preceded by x reads, all to e:
i.e., x reads then (1000 - x) writes, all contending on e.
maximum possible delay is ~99.9 seconds if all 1000 requests are writes (the last write waits 999 × 0.1 s for those ahead of it).
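The worst-case delay arithmetic can be checked directly (assuming purely sequential service of conflicting writes):

```python
write_time = 0.1    # seconds per write
read_time = 0.0001  # seconds per read
writes = 1000       # worst case: every request is a write to the same resource

# The last write waits for the 999 writes ahead of it.
last_wait = (writes - 1) * write_time
print(round(last_wait, 1))  # 99.9 seconds
```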
Average consumption of space, assuming each resource receives an equal number
of requests.
1000 requests, n resources.
1000/n requests per resource.
(1000/n - 1) × 0.1 is the maximum time required to respond to a request (the last request to a resource waits for those ahead of it).
maximum possible space consumed locally when servicing such requests.
space consumed:
1000 requests per second.
Graph: (noderef, edge, time)
Maximum number of elements in the graph over the course of a second.
In the beginning 0
n write requests, all to the same resource.
~990 requests are buffered; each entry is (8 bytes, 4 bytes, 4 bytes) = 16 bytes, so ~16 KB.
Graph Map: key -> noderef
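The buffered-request space figure works out as follows, taking each buffered graph entry to be the (noderef, edge, time) triple of 8 + 4 + 4 bytes:

```python
entry_bytes = 8 + 4 + 4  # (noderef, edge, time)
buffered = 990           # requests buffered behind the sequential writes
total = buffered * entry_bytes
print(total)  # 15840 bytes, i.e. roughly 16 KB
```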
- Anonymous October 26, 2015