Palantir Technology Interview Question for SDE1s


Country: India
Interview Type: Written Test





Implement a simple, persistent, thread-safe cache that should ideally be able to store up to 1 million product names.

Define:
Cache: key -> value; operations: get, write
Persistent: backed by a database
Thread-safe: multiple simultaneous get and write requests


Store 1 million product names

Constraints:
Space: locking, data structure used to store the information
(alternative data structures such as trees may require less locking;
space-time trade-off)
Request return time
Thread safety

Brute force (don't optimize; ignore the constraints): unlimited space
data storage:
hash map

startup:
initialize cache (possibly have a serialized cache in the database; load it into the hash map)

event handlers:
Maintain a queue of requests.
Buffer the queue of requests and handle them sequentially.

shut down:
write all data in the hash map to the database
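The brute-force design above can be sketched as a hash map behind a single global lock, with the database stood in by a plain dict (the class and method names here are illustrative, not from the original):

```python
import threading

class SimpleCache:
    """Brute-force cache: a hash map guarded by one global lock.
    `db` is a stand-in for the persistent store (a plain dict here)."""

    def __init__(self, db):
        self._db = db
        self._lock = threading.Lock()
        self._map = dict(db)  # startup: load serialized state from the database

    def get(self, key):
        with self._lock:
            return self._map.get(key)

    def write(self, key, value):
        with self._lock:
            self._map[key] = value

    def shutdown(self):
        with self._lock:          # shutdown: flush the hash map to the database
            self._db.update(self._map)
```

One lock serializes every request, which satisfies thread safety but makes all gets and writes sequential; the rest of the answer is about relaxing exactly that.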

test cases:

add individual features:
Parallelize requests:
New architecture:
load balancer
shared state:
Master-slave replication, atomic commit protocol

A sequence of get requests is trivially distributed across threads.

get(x), write(x', y):
x != x': parallelize
x == x': sequential
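The pairwise rule above — operations commute unless they touch the same key and at least one is a write — can be expressed as a small predicate (a sketch; the tuple encoding of operations is an assumption):

```python
def can_parallelize(op1, op2):
    """Return True if the two operations may run concurrently.
    Each op is a ('get', key) or ('write', key, value) tuple."""
    if op1[1] != op2[1]:              # different keys: always safe
        return True
    return op1[0] == "get" and op2[0] == "get"  # same key: only two reads commute
```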

Thread safe:
Given a queued sequence of requests,
perform a topological sort on the queue and parallelize it.
write(x, y, t) depends on the get(x, t') or write(x, y', t'') with the largest t'/t''.
Computing dependencies for a static list:
given mp : optype -> x -> opId,
and having computed dependencies for operations up to time t,
for the operation at time t+1:
get(x, t+1) depends on mp[write][x]; then set mp[read][x] = t+1
write(x, y, t+1) depends on max(mp[write][x], mp[read][x]); then set mp[write][x] = t+1

Thus in O(1) we can compute the new dependency relation for any incoming request.
Now suppose we have topologically sorted the input into a directed graph G.
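A sketch of the O(1)-per-operation dependency computation, with mp kept as two dicts mapping keys to the latest read/write index (the list index doubles as the timestamp; all names are illustrative):

```python
def build_deps(ops):
    """ops: list of ('get', key) or ('write', key) in arrival order.
    Returns deps[i] = index of the operation op i must wait for,
    or None, following the mp update rule above."""
    last = {"get": {}, "write": {}}        # mp: optype -> key -> latest opId
    deps = []
    for i, (kind, key) in enumerate(ops):
        if kind == "get":
            deps.append(last["write"].get(key))   # read waits for the last write
            last["get"][key] = i                  # mp[read][key] = t+1
        else:
            d = max(last["write"].get(key, -1),   # write waits for the latest
                    last["get"].get(key, -1))     # read or write of the key
            deps.append(d if d >= 0 else None)
            last["write"][key] = i                # mp[write][key] = t+1
    return deps
```

Note this follows the text's rule of a single dependency on the latest conflicting operation; each op does O(1) dict work.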

We need to do two things:
1. Define an update for the graph.
get(x, t+1): let n be the node it is assigned to.
if n == null:
a) create a new starting node n' in the graph G and update GM : nodeId -> node
(nodeIds are unique; for now the timestamp serves as the id. Problem: the space of timestamps is small and could wrap around the address space, or ids are drawn from a pool and consumed nodes must retire their ids.)
otherwise, attempt to acquire the lock on n:
if n has been consumed, go to (a);
otherwise add the current node as a child of n,
then release the lock.

2. Traverse the graph and schedule events to corresponding nodes.
For each node in the starting frontier:
try to acquire the node;
once acquired,
dispatch it for consumption to a worker from a thread pool,
then mark it as consumed
and release the lock.
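The frontier traversal can be sketched with a thread pool: each pass dispatches the current frontier of ready nodes, and consumed nodes release their children into the next frontier (a simplified sketch assuming the single-parent deps form above; apply_op is a caller-supplied handler):

```python
from concurrent.futures import ThreadPoolExecutor

def schedule(ops, deps, apply_op):
    """Run ops respecting deps[i] (index of the op that op i waits on,
    or None), dispatching each independent frontier to a thread pool."""
    children = {}
    frontier = []
    for i, d in enumerate(deps):
        if d is None:
            frontier.append(i)            # starting frontier: no dependency
        else:
            children.setdefault(d, []).append(i)
    with ThreadPoolExecutor(max_workers=4) as pool:
        while frontier:
            # independent ops in the frontier run in parallel
            list(pool.map(lambda i: apply_op(ops[i]), frontier))
            # consumed nodes release their children into the next frontier
            frontier = [c for i in frontier for c in children.get(i, [])]
```

This waits for a whole frontier before starting the next, which is coarser than the per-node locking described above but preserves the same ordering guarantees.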


add pairs of features
add triples of features
etc.

Persistence:
For the simple queued example:
Intersperse writes to the database.
Keep an intermediate log entry for each operation.
This gives the state a rollback capability; batching the interspersed
writes keeps the high volume of writes from making the database a bottleneck.
The intermediate log helps recover some, though not all, information from
failed or missed transactions.
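One way to sketch the interspersed database writes plus intermediate log (the class name, flush_every parameter, and the dict/list stand-ins for the database and log are all assumptions):

```python
import threading

class LoggedCache:
    """Appends every write to an intermediate log before applying it,
    and flushes the map to the database every `flush_every` writes.
    After a crash, the log allows partial recovery of unflushed writes."""

    def __init__(self, db, log, flush_every=100):
        self._db, self._log = db, log       # stand-ins: dict db, list log
        self._lock = threading.Lock()
        self._flush_every = flush_every
        self._map = dict(db)                # startup: load from the database
        self._dirty = 0

    def get(self, key):
        with self._lock:
            return self._map.get(key)

    def write(self, key, value):
        with self._lock:
            self._log.append((key, value))  # log first, then apply
            self._map[key] = value
            self._dirty += 1
            if self._dirty >= self._flush_every:
                self._db.update(self._map)  # interspersed database write
                self._dirty = 0
```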

Trivially parallelizable into the alternate solution:
Concern: race conditions from simultaneous writes.
Writes to the same resource must be ordered,
which is enforced by traversal of the topologically sorted graph.

Data storage ballpark:
10^6 records
record: key, value
key: 8 bytes
value: 1 MB
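Multiplying out the ballpark (treating 1 MB as 10^6 bytes) shows why the full values cannot all sit in memory:

```python
records = 10**6
key_bytes = 8
value_bytes = 10**6            # ~1 MB per value
total = records * (key_bytes + value_bytes)
# ≈ 10^12 bytes, i.e. about 1 TB: the values must live in the
# database, with the in-memory cache holding only a hot subset.
```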

Graph: (noderef, edge, time)
Graph Map: key -> noderef
Values stored in the database.
Load: 1000 requests per second.

time required for write:
0.1s

time required for read:
0.0001s

x reads per second, (1000 - x) writes per second.


goal:
calculate the time required to service the above quota of requests.
time: maximum possible response time.
Worst case: a sequence of writes all to the same resource e:
(1000 - x) writes to e,
preceded by x reads to e.
With x reads and (1000 - x) writes all to the same resource,
the maximum possible delay is about (1000 - 1) * 0.1 ≈ 100 seconds if all are writes.

Average consumption, assuming each resource receives an equal number
of requests:
1000 requests, n resources,
1000/n requests per resource.
(1000/n - 1) * 0.1 s is the maximum time required to respond to a request.
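Putting numbers on the response-time bound (0.1 s per write, 1000 requests/s, n distinct resources; names are illustrative):

```python
write_s, reqs = 0.1, 1000

# Worst case: all requests write the same resource and run sequentially;
# the last one waits for the 999 requests ahead of it.
worst_wait = (reqs - 1) * write_s        # ~99.9 s, roughly 100 seconds

def max_wait(n):
    """Max wait when requests spread evenly over n resources."""
    return (reqs / n - 1) * write_s
```

With n = 1000 perfectly spread resources the queues are length 1 and the wait drops to zero, which is the point of the parallelization above.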

Maximum possible space consumed locally when servicing such requests:
space consumed:
1000 requests per second.

Graph: (noderef, edge, time)
Maximum number of elements in the graph over the course of a second:
initially 0;
with n write requests all to the same resource, at 0.1 s per write only
~10 are served per second, so ~990 requests stay buffered:
990 * (8 bytes + 4 bytes + 4 bytes) ≈ 16 KB.

Graph Map: key -> noderef

- Anonymous October 26, 2015 | Flag Reply

