Programming Interview Questions

Chris

SWE

Questions (1)

Comments (50)

Follows (1)

Followers (6)

Reputation 2645

Page:

1 2 3 4 5 6 7 8 9

Comment hidden because of low score. Click to expand.

of 3 vote

You need to find a path from start that does not end in the end node. You can brute force all paths for from start. You can optimize if you store for a given node if it has a path to end so you do not redo the same work over and over. Recursive solution should be straight forward.

- Chris January 02, 2019 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

like this

#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <unordered_map>
#include <vector>

using namespace std;

vector<string> topNVideos(const vector<pair<string, unsigned int>>& videos, 
                          size_t topN) {
    // TODO: topN is a constant 10 in the given problem, this method only behaves
    // nice if topN is small. 
    unordered_map<string, unsigned int> video_count;
    vector<pair<string, unsigned int>> result_count;
    
    for (const auto& v : videos) {
        video_count[v.first] += v.second; // TODO: ensure no overflow
        // notice here as well, the string as key, which makes the thing not O(n) 
        // but O(n*avg_string_length(video_title)), probabily in practice we would
        // work with an integer e.g. a reference to the actual title ...
        // As well notice the copying around of string's which is not nice
        // maybe we would use somethign like std::string_view in practice with C++ 17.
    }
    
    for (const auto& vc : video_count) {
        // insertion sort for small and constant topN ok, makes that thing
        // O(N*10) which is O(N). There are better ways to cut off topN.
        // (e.g. quicksorts partition, especially if topN doesn't need to be sorted)
        result_count.push_back(vc);
        for (size_t i = result_count.size() - 1; i > 0; --i) {
            if (result_count[i].second <= result_count[i - 1].second) break;
            swap(result_count[i], result_count[i - 1]);
        }
        if(result_count.size() > topN) {
            result_count.pop_back();
        }
    }

    vector<string> result(result_count.size());
    transform(result_count.begin(), result_count.end(), result.begin(), 
        [](const pair<string, unsigned int>& vcount) -> string { return vcount.first; });
    return result;
}
    
int main()
{
    vector<pair<string, unsigned int>> movies = {
        {"Rambo", 1}, 
        {"Indiana Jones", 2}, 
        {"Lethal Weapon", 2}, 
        {"Indiana Jones", 5}, 
        {"Rambo", 1},
        {"Lethal Weapon", 4},
        {"Casablanca", 7},
        {"Star Wars", 9},
        {"The Godfather", 15},
        {"Wallstreet", 3}};
    
    for(const string& video : topNVideos(movies, 4)) { // top 4 instead of top 10
        cout << video << endl;
    }
}

- Chris September 04, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

1-pass: per video the number of watches (sum, remove dublicates)
2-pass: pick top 10 from that list

both are O(n) time. It is as well O(n) space.

- Chris September 04, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

the maximum is obviously the set it self so 1|2|3 in above case (=1|3=2|3}, the smallest seem to be the smallest|2nd smallest. O(n) algo, find the smallest, 2nd smallest and or all elements... or and sum. done.

- Chris May 23, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

@sriram.itech: how about: ababaa with aba as pattern:
A) Replace first (aba)baa -> baa -> false
B) Replace 2nd ab(aba)a -> (aba) -> true
str.replace starts either from start (A) or end .. so it won't work for all productions

Maybe reconsider your solution

- Chris May 23, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

The plain recursion is obviously O(2^n) because if the pattern is something like aa and the string is like aaaa at every position you have two valid options, advance in the pattern or recurse it. So, the naive is exponential. Whether you start at the end or beginning doesn't make a difference.Other examples are ababaa with aba... O(m*n) using memoization isn't that bad.

Better will be preprocessing the generator string, finding out which suffix is a common prefix, so you can optimize backtracking. Knuth-Morris-Pratt O(n+m) style algorithm... but in 30 Minutes, assuming you first want to be safe by doing some brute force is tough.

- Chris May 22, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

took @Gabriels code, made it purely recursive and added missing case.
If you start thinking recursive it's easy:
- for the +: must advance the pattern, but two options with the string, advance or not
- for the *: three options: advance pattern and string, advance only pattern, advance only string
- if not + and not *: if they match advance pattern and string
- termination: if pattern reached the end but string didn't (no match), if both reached the end (match)
do not forget all the border cases, so there is no buffer overrun

#include <iostream>
#include <string>

using namespace std;
using iter = string::iterator;

bool reg(iter is, iter ie, iter rs, iter re) 
{
	if (is == ie && rs == re) return true;
	if (rs == re) return false;

	if (*rs == '+') {
		if (reg(is, ie, rs + 1, re)) return true; // advance pattern but not input (empty character)
		return (is != ie) && reg(is + 1, ie, rs + 1, re); // advance input and pattern, if input not at the end
	}
	if (*rs == '*') {
		if (reg(is, ie, rs + 1, re)) return true; // advance pattern but not input (empty sequence)
		return (is != ie) && (reg(is + 1, ie, rs, re) || reg(is + 1, ie, rs + 1, re)); // advance input but not patter OR advance input and pattern		
	}
	return (is != ie) && (*is == *rs) && reg(is + 1, ie, rs + 1, re); // if they match recurse
}

void test(string s, string regx, bool match) {	
	cout << "TEST CASE input: '" << s << "' expression '" << regx << "'"
		<< (reg(s.begin(), s.end(), regx.begin(), regx.end()) == match ? " PASSED" : " FAILED")
		<< endl;
}

int main() {
	test("baaabab", "ba*a++", true);
	test("baaabab", "ba*a+", true);
	test("baaabab", "a*ab", false);
	test("", "", true);
	test("1", "", false);
	test("", "+", true);
	test(" ", "+", true);
	test(" ", "+*", true);
	test(" ", "+++", true);
	test(" ", "+**", true);
	test("", "+**", true);
	test("", "+**+", true);
	test("a", "+**+", true);
	test("zzaazzuu", "+**+", true);
	test("zzaazzuup", "+*p*+", true);
	test("zzaazzuup", "+a**+", false);
	test("aaaaaa", "+**", true);
	test("aaaabbba", "+**", false);
	test("aabc", "a+++bc", true);

	return 0;
}

so what's space and runtime complexity (n is length string, m is length pattern)
space is the stack, and the stack depth will never exceed O(m + n) (actually O(max(n,m)) because it will always follow a branch to the end where it fails latest or succeeds.

worst case time is trickier:
- theoretically it branches of at the * case three times O(3 ^ (n + m))
- practically string: zzzq, pattern: ***z

how to improve is an interesting discussion here:
- preprocessing the pattern to only test the suffix if prefixes are common: this is not trivial with the * and +
- creating an automaton, in it's core the same as above
- no of them is trivial, there is a simpler dp way hidden in the recursion so one can prune some branches for example, if the pattern position and string position has already been senn, one can store the result already obtained. That way it should come down to O(m*n) when adding memoization.

- Chris May 21, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I am not sure I fully understand your approach. What I understand:
- find components of the graph (connected islands of sensors that are within reach)
- if such a components max y and min y is beyond both corridor lines it will block the way through

Finding those components can be done in O(n^2) naively, there is an optimization to this that does not improve worst case, but is better with lots of sensors. The question is, how to optimize looking at n-1 sensors to find adjacent sensors of a given sensor.
- rasterize the sensor positions and maintain a list of sensors located in a raster square
- for a given sensor, check the surrounding squares, if raster square is the sensor reach (diameter) you need to check 9 squares, accessing those sensors is O(1) e.g. with a hashtable on raster x,y... if all sensors are located in 9 or fewer squares, you do not win anything, if they are normally distributed over thousands of squares, you will end up in O(n*n/squarecount)

Cool question, thanks for sharing.

- Chris May 01, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

1. To only add a single person, do one pass and pick the largest gap, keep start index and gap. O(n) time O(1) space
2. To add multiple, pass once and build a max heap (prio. queue) with gaps and position, ordered by gap, when adding a person remove top element, calculate position where to add, if left or right gap exists, add those to the heap.
O(n) space, initially takes O(n*lg(n)) to form heap, later it takes O(lg(n)) to add a person

- Chris April 17, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

The question asks for O(1) access to a random element, you are allowed to preprocess. To pick a random element n O(1) you need to know how many quotes there are, create a random number and access in O(1). If the quotes do not change, you traverse once and store in a second file (the index file) the offset to the quote. You can read the index into memory or random access it (fixed record length). If you do not want to self create an index use a dbms. If you need to maintain the index after adding/deleting quotes, use a btree. Of the stuff doesn't fit on a single machine, shard.

- Chris March 27, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I would say inverse and test for prefix is best to achieve. How ever reverse it in place, no auxiliary space to reverse and maybe mention to reverse it back after wards and note what this would mean for other concurrent accesses to that same structure.

- Chris March 20, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

// let's assume only 26 characters, all lower case
// a >= b
bool is_greater_or_equal(const string& a, const string& b, const vector<int>& rank) {
	size_t l = min(a.size(), b.size());
	for (size_t i = 0; i < l; ++i) {
		int ao = rank[a[i] - 'a'];
		int bo = rank[b[i] - 'a'];
		if (ao != bo) return ao > bo;
	}
	return a.size() >= b.size();
}

int main()
{
	vector<string> words{"cc", "cb", "bb", "ac" };
	vector<char> ordering{ 'c', 'b', 'a' };
	vector<int> rank(26, -1);	
	
	int i = 0;
	for (char c : ordering) rank[c - 'a'] = i++;	
	for (size_t i = 1; i < words.size(); ++i) {
		if (!is_greater_or_equal(words[i], words[i - 1], rank)) {
			cout << "not ordered";
			return 1;
		}
	}
	cout << "ordered";
	return 0;

- Chris February 28, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@prudent_programmer: To determine if a sequence of strings is ordered one needs to look at every character of every word - thus O(number of words * length of word)

- Chris February 28, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

combinations of what? Any amount of any number and any operations? Of course its NP hard, as it is at least as hard as subset sum.

- Chris February 23, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Theoretical all paths, which is NP hard and makes little sense. You can adapt A* (a star) which estimates the remaining distance e.g. by using euclidian distance. You just do not consider candidates in the queue that are longer (actual + estimate) than a certain value (e.g. two times the distance from your location to the airport)...
That way you could solve it more or less efficiently. A star is basically Dijkstra, but it adds the estimated remaining distance to the actual distance. And it will not end in your case with the first route but it will continue searching untill all paths got to the destination or exceeded the max distance you decide to search for.

The other problem, I assume we can assume the large file is chronological, so we might want to use binary search.

- Chris February 17, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I would first define what a column in a tree is. I think a recursion can help:
#columns(root)=#columns(root.left)+length(root.value)+#columns(root.right),
#columns(empty tree)=0

The position at which to print node value is: (level of the node, #columns(root.left))

If I can't print at x,y positions I need a level order traversal to print line by line.

It is O(n) time and O(n) space (for the level order traversal and the position)

Note, there are better ways to layout the tree, if the tree has branches that can be kind of nested into each other.

- Chris February 07, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@Alex solution is what I'd do as well. Normalize pattern and string.

- Chris January 10, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

1. it's unclear why its relevant that push pushes to the head. Because if pop takes out all elements with the same probability, as soon as there is more than one element in it, the "list" acts like a perfect shuffler. What you get out is randomized, no order, nothing. So, you will need to sort in some way. That may be, you push the element and it's original position so you can reconstruct easy, or you sort the output.
2. There is no difference between 2 and 1. You see a shuffled array when you try to access the data. Where it has been shuffled is really not that much of a deal.

I sugest restating the question so it becomes clearer.

- Chris January 10, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

@dummy: obviously she/he is posting a lot of questions, they are not necessarily fake, they are good practice actually ;-) some of them are/were questions at these companies she's referring to.

- Chris January 09, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@Alex, two things:
1) I I defined it a round table, because I didn't like the special cases on the left and right end. so, 0 would be next to 6. First thing was a straight bench, came to mz mind, but a round one is more convenient with less corner cases and since that wasn't specified.
2) Any chance to PM? Maybe follow my profile link, if your interested.

- Chris January 05, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

pangram contains all letters of the alphabet Y. Palindrome reads the same forward and backwards.

Solution for palindrome is trivial, for every word find an inverse word or finding the inverse word missing the last character of the original word (even, odd palindrome) and minimize. If the words are in a dictionary it's O(n*m), n being the number of words, m being the average word length. How to imrpove: start with shorter words, needs some kind of order, use a trie (theoretically faster in constants, practically not due to cash misses)

For minimal pangram, the problem is a minimal cover set, I think it's called. It is known to be NP-hard. A greedy approach exists, that is not optimal, one takes the word with the most characters not already taken. This way the problem get's O(|Y|*n*m), Y is the alphabet, thus constant (e.g. 26), n is the number of words and m is the average word length. If we preprocess the words in O(n*m) we can get the algorithm down to O(|Y|^2*n) thus to O(n) for constant alphabet.

- Chris January 05, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

Questions:
- is the input given as a flag array or array with taken seat positions, is it sorted?
- has the bench a start and an end or is it a circular bench around a circular table?
- Should it be optimzed to seat a single person or multiple people?

Assumption/Definition:
- start with an array of sorted, occupied seats, assume the array is sorted
- to avoid edge cases, assume it's a circular bench around a round table
- n: number of seats

Solution:
- create a binary heap that contains the min distance from a seating between two
taken seats
- when querying for a seat, use that heaps top element, remove it, create two new \
once (left and right) and re-insert it
- O(n) space, O(lg(n)) time per query, O(m*lg(m)) to prepare, assuming, m seats are
taken initially.

Additionals:
what if seats can get unoccupied again once they were occupied?
- in this case, it's better to use a tree instead of a heap, becaue the tree has an
order and it's easy to have a vector with tree-items so, I can access either
using the seat index to merge empty spaces if a person leaves.
- Besides this, it stais the same problem.

#include <vector>
#include <queue>
#include <iostream>

using namespace std;

class SeatFinder
{	
	struct QComp {
		bool operator () (const pair<int, int>& a, const pair<int, int>& b) {
			return a.second < b.second;
		}
	};
	size_t n_;
	priority_queue<pair<int, int>, vector<pair<int, int>>, QComp> heap_; // start, length

public:
	SeatFinder(int seatCount, const vector<int>& takenSeats) : n_(seatCount) {
		size_t m = takenSeats.size();
		for (size_t i = 1; i <= takenSeats.size(); ++i) {
			int start = takenSeats[i - 1];
			int end = takenSeats[i % m];
			putSlot(start, end);
		}
	}

	// returns next empty slot with highest distance to left and right 
	int getNextEmptySlot() {
		if (heap_.empty()) {
			putSlot(0, 0);
			return 0;
		}
		if (heap_.top().second < 1) return -1; // returns -1, because there is no space left.
		auto slot = heap_.top(); // start, length --> start + length is the next taken seat, start is the previous taken seat, length must be >= 2
		heap_.pop();
		int seatIdx = (slot.first + 1 + slot.second / 2) % n_; // start + floor(length / 2) takes it to the middle
		putSlot(slot.first, seatIdx);
		putSlot(seatIdx, (slot.first + slot.second + 1) % n_);
		return seatIdx;
	}

private:
	// closed interval [start, end] where start is the previous taken seat and end is the next taken seat
	void putSlot(int start, int end) {
		int length = (n_ + (end - 1) - start) % n_;
		heap_.push({ start, length });
	}
};

int main()
{
	SeatFinder sf(16, {1, 2, 3, 9});
	for (int i = 0; i < 16; i++) {
		cout << sf.getNextEmptySlot() << endl;
	}
}

- Chris January 04, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

the question is totally unclear, the 2^32th fibonacci number is extremly large, a number with roughly 1 Billion digits. Even NoOnes refered closed form formula will never calculate it in O(1) unless n is considered to be a constant (2^32) which is a bit silly. In the other hand the question could be interpreted as well as telling for any number between 1 and 2^32 wheter it is a Fibonacci, which then is very easy, since there are 46 fibonacci numbers in this range and those can easily be precalculated.

- Chris January 02, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

it's an easy question but not that simple:
- is the input an element in the tree or an arbitrary value, I assume an arbitrary value
- I assume the tree is balanced, so the time complexity of below code is O(lg(n)), otherwise O(n) is best. space complexity is here the same as time complexity unless you do the search by patching pointers temporarily.

The naive algo is inorder traversal, keeping the previous element and when the search value is passed, exploring the next element to see which value is closer. O(n)

More sophisticated is:

const Node* nearestElement(const Node* root, int value, const Node* nearest = nullptr) 
{
	if(root == null) return nearest;
	if(root->value_ == value) return root;
	if(nearest == nullptr || abs(root->value_ - value) < abs(nearest->value_ - value)) { 
		nearest = root;
	}
	if(root->value_ > value) {
		return nearestElement(root->left_, value, nearest);
	} 
	return nearestElement(root->right_, value, nearest);
}

- Chris December 26, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Essentially the same question:
careercup.com/question?id=4509450021896192
Since interviewers gives milliseconds and might have a devices background, depending on allowed jitter, you might need to use RTC events, if that's easily available on your OS. You can craft this into the entioned algo.

- Chris December 11, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

subset sum problem is NP complete. There is a pseudo polinomial algorithm ala knapsack to solve it for small sums - which has limitted value in parctice.

- Chris December 10, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

0.5 is hardly any integer... Was it doubles or decimals?

- Chris December 10, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

Assumptions:
- There is no negative weight
- Multiple weights can occure as in the smaple given
- I assume first k-weight means max the top-k, according to summed weight

Approach:
- clearly, if the input was sorted according to weight, the highest is the one with all
balls (if no neg. weights)
- the next lower is when excluding the element with least weight
- the next lower is excluding the 2nd lowest weight
- the third is either excluding the 3rd lowest or the 2nd and lowest

The basic idea is pretty straight forward:
- sort input as described above
- maintain a priority queue with the highest combination on top
- When picking an element from that queue, calculate the next options, where it is
not straight forward to decide which will be the next higher one. Put all of them into
the queue, then next loop the highest is picked etc. That is similar to Dijkstra
shortest path. But in order to be efficient, I only calculate next elements when an
item gets picked and this needs a bith of thought, but I can create a tree, similar to
a Fenwick on the fly ...
When an item is picked there are at most two new options, one is to take the next
element, the other is to take the current element and the lowest possible
- For the time complexity, there is the issue we need to produce the results which is
O(n*k). Then there is the sort which is O(n*lg(n)) and the maintenance of the queue which
is O(k*lg(k)). The queue component is irrelevant since always smaller than the sort. Then
there are some copy operations inside the algo, which are executed at most 2k times, thus
resulting in O(k*n). Overall it is O(n*k + n*lg(n)) - depending on the input one or the
other term dominiates.

Cool question, never saw this one, really nice! Although a bit hard to come up with the
"next element" algorithm in 30 minutes together with all the rest.

#include <string>
#include <vector>
#include <queue>
#include <iostream>

using namespace std;
using uint = unsigned int;

void print_top_k_subsets(const vector<pair<string, uint>>& input, uint k) 
{
	if (input.size() < 1 || k < 1) return;
	uint result_count = 0;

	// sort
	auto sinput(input); // sorted input, smallest weight first
	sort(sinput.begin(), sinput.end(), 
		[](const pair<string, uint>& a, const pair<string, uint>& b) {
		return a.second < b.second;
	});

	// queue with queitem having least loss on top
	struct QueueItem {
		int weight_loss;
		int last_excluded;
		int last_excludable;
		vector<bool> excluded;
	};
	auto qcomp = [](const QueueItem& a, const QueueItem& b) {
		return a.weight_loss > b.weight_loss;
	};
	priority_queue<QueueItem, vector<QueueItem>, decltype(qcomp)> q(qcomp);
	q.push(QueueItem({0, -1, static_cast<int>(sinput.size() - 1), vector<bool>(sinput.size(), false)}));
	
	while (!q.empty() && result_count < k) {
		QueueItem current = q.top(); // copies the vector, not so nice
		q.pop();

		// print result (should be put into a seperate function) 
		result_count++;
		cout << result_count << "th: {";
		int sum = 0;
		bool first = false;
		for (int i = 0; i < sinput.size(); ++i) {
			if (!current.excluded[i]) {
				sum += sinput[i].second;
				cout << (first ? ", " : "") << sinput[i].first;
				first = true;
			}
		}
		cout << "} sum=" << sum << endl;

		// get next potential candidates and put it into the queue
		if (current.last_excluded > 0) {
			QueueItem new_item(current);
			new_item.weight_loss += sinput[0].second;
			new_item.last_excludable = current.last_excluded - 1;
			new_item.last_excluded = 0;
			new_item.excluded[0] = true;
			q.push(new_item);
		}
		if (current.last_excluded < current.last_excludable) {
			if (current.last_excluded >= 0) {
				current.weight_loss -= sinput[current.last_excluded].second;
				current.excluded[current.last_excluded] = false;
			}
			current.last_excluded++;
			current.weight_loss += sinput[current.last_excluded].second;
			current.excluded[current.last_excluded] = true;
			q.push(current);
		}
	}
}


int main()
{
	cout << "sample 1" << endl << "-----------------------------" << endl;
	print_top_k_subsets({ {"red", 4}, {"yellow", 1}, {"green", 2}, {"blue",  2} }, 5);
	cout << "sample 2" << endl << "-----------------------------" << endl;
	print_top_k_subsets({ { "red", 4 },{ "yellow", 1 },{ "green", 2 },{ "blue",  2 }, {"amber", 1}, {"black", 2}, {"white", 1} }, 15);

- Chris December 10, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

If that's the node p of G where sum of paths p to n1 and p to n2 is minimal:
1) reverse edges, since it's a DAG
2) run bi-source bfs form n1 and n2. The node where the frontiers first meet is the desired node p.

- Chris December 10, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

That's a systems design question, not coding, I assume.

- I define the thing I want the rolling statistics as fact (e.g. user id, hashtag, ...)
- Then I would split the time window into slots such a way that the timewindow / slots provides a reasonable update rate for the statistics while it keeps the number of slots small enough for a feasible solution.
- Now I have two things: a queue which represents a time slot. Each queue items holds a hash table which counts for the seen facts the occurrences (frequency)
- Further I keep a hashtable that will count the occurrences total, it will have the fact as key and a refrence to a tree-item as value. The tree-item has the frequency as key and the fact as value.
- If an item arrives, I will update the head of the queue or create a new queue item if a new slot is touched by the timestamp of the arriving item. I will as well increment the frequency of that item in my hash-table->tree-item structure
- I will check as well the oldest items in the queue and if they are older than the new item arrived, I will subtract the key-value pairs from my main statistic and throw away that queue item away

Now I can always query my tree for the top-k keys and receive the the top-k items.

This design has a few assumptions and weak points:
- it is optimized for a high query rate on top most k items.
- it is slightly wrong, because I add items as they arrive to the hash-table -> tree-item structure, but remove them with a lower granularity of time per slot
- I only update the statistics if items arrive, assuming, items arrive always and if not, the statistics relatives don't change (which is true, but maybe top k should be an empty list if no items arrived longer than the moving window)

For high availability and fault tolerance of single servers:
- a number of 'frequency aggregatos' that receive the events and assign them to time slots (either synchronized time or fine grained ... the whole time discussion could fill a couple of minutes here). Here I could place a load balancer in front or if I control the clients assign different clients a different aggregator and fallback if primary for that client fails, etc..
- multiple 'redundant statistics runners' that will receive data from aggregator and sum up to final statistics as pointed out in prior section. The aggregator will translate thousand or hundereds of thouasands of requests to just one per time slot, which will be a list of key-value pairs it will forward to the main server. It might already cut away the long tail in order to minimize traffic... but that's another discussion because it's not that easy, one might need to consider re-elevating long-tail's that peak up or if very small time slots are considered, if a fact occurs exactly once in a time slot, it might need re-elevation as well otherwise statistics might be wrong.

Now, I can query any one of the redundant statistics runner. How ever, I need to account for statistics runner that are temporarily overloaded or didn't receive all updates from the aggregators, because they will return a different and wrong statistics.

- Chris December 09, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@cool guy
the hash table is more space effective than a trie due to pointers etc. and therefore a hashtable can be cash friendlier if short enough. If you use hashtable in the naive way, you will need to recalculate the hash for the string with every lookup, which is not so nice. If you write your own hashtable and hash function you can avoid this by reusing the hashvalue of k-1 characters to calculate hash of k characters (expanding prefix). With a trie, you know if a prefix is not in the dict as soon as you expand a character c[k], with a hashtable you will have to expand every prefix to the max word size in order to make sure it's not in there. This makes the hash algorithm ineffective for some crafted sample input. The tries memory disadvantage can be improved using a patricia tree - sometimes compacted trie, but it will never be as compact in memory as a hashtable.

There was a question on LeetCode like this, and I tried to beat the fastest c++ implementation which used a hashtable approach. I tried using a trie and even a patricia tree implementation. It turned out, the time to create the trie dominated the advantage of the faster lookups because it was one dictionary per query, so, that's always a good thing to think about, how many queries per dictionary instance. An other thing I noticed there, the data in the testcases were just not advantageous for the trie implementation. Short dictionary (fits into cache) with relatively short words.

I could imagine a combination being good, create a sort of trie on substrings of words of fixed size, say 5 characters, so for short words it's like a single hashtable, long words that share a prefix, would behave like a trie etc... I haven't seen anybody doing that and haven't thought really about it, but it kind of jumped into my mind right now.

edited:
maybe I should have added, a ptrica trie can be more space effective than a hashtable for crafted input, but it's not the case for a regular western language.

- Chris December 09, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@Alex: of course, calculate the absolute time when the task should be run. this is calculated when a task get's added.

- Chris December 09, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

1) calculate absolute times when a task get's added
2) push it into a priority queue with smallest absolute on top (binary heap)
3) have a runner that polls the queue once in a time interval (e.g. second) and checks if top task is to be executed, pop items from the heap that need to be executed.
4) now, the question is, how long does the run take? can I execute it synchronously or will this delay the scheduler? I could as well span into a thread pool to call run there.

a more sophisticated edge case is, if two items receive the same absolute time, is it required to maintain order of insertion when running? Might be relevant in some use cases. The mentioned edge case, is obvious. at 5 task b runs, at 10 task a runs.

runtime and space complexity is obvious.

- Chris December 08, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@sachin323: You should pick the path with smallest p, where p is an element of each path from (0,{0..n-1}) to (m-1,x) where (c[i+1], {r[j+1], r[j-1], r[j]} is the largest value). This sounds strange... surely possible, but what for? I see no connection to tallest horizon.

Anyway, if that's the task, I'd scan the matrix for minimal value that can be reached from any of the 3 left hand pixels. Then I know through which pixel the path goes. From there I'd expand right and left: O(n*m), no extra space, except to store the path.

I assume he wanted something else, like detecting a couture, like maybe an upper edge of a scanned page or so... but I can't see that in here.

anyway, sorry to hear it didn't work out this time.

- Chris December 04, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

It's hard to tell what his intentions were, but certainly questions are needed such as:
- multiple "directories" per user?
- multiple users per directory?
- if prefix matches, does it mean he has access to all sub. dirs.?
- can there be a revoke in a sub. Dir?
- how many grants / user typically, outliers?
- how many users, how many directories.
- how deep are those paths typically, where is the grant in this hirarchy

One approach can be, have an entity for a user, and a list of directory prefixes, load the user with grants and check if path-prefix is in there, maybe cache user. If there are thousands of grants per user but only few grants per directory, you may do the opposite on the directory, load per prefix the number of users.

Etc...

- Chris December 04, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

between a and b, I assume a half open interval [a, b)

question is the distribution, if it should be normal, the previous posts are probably right (needs a prove though). If we want uniform distribution things are different.

I could create a random function that creates a uniform random number in the range [0 and 2^n) by applying the given function to each bit. Then I could use this new function to create a random number in the range [0, 2^n) where n would be lg2(b-a) + 1. If the random number exceeds b-a, I try again, if not, I add it to a and return the result.

- Chris December 03, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

I figured a good approach is a maze runner that detects the outer contour of the rooms. Then it should zig-zag through the interiour, similar like flood filling. It's going to be hard to prove it will really work in 30 minutes - as it is to implement it correct in 30 minutes.

- Chris December 03, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

since it seems to be a contest question and I noticed people posting questions from live contests, I'd just advice to use a segment tree. The queries etc. should be easy with it. If you set a one and it was a zero, check if there is a segment on the left and right and merge accordingly.

- Chris December 03, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

the question will be, random with what distribution?
I assume with a uniform distribution: each point in the final region has the same probability to be chosen. For a single rectangle this is straight forward:

x = rectangle.left + random(rectangle.width);
y = rectangle.top + random(rectangle.height);

For multiple rectangles that don't overlap, we can say, we choose the rectangle proportionally to it's area and once the rectangle is chosen, pick a point in there.

long long are_sum = 0;
for(Rectangle& r : rectangles) {
	area_sum += r.width * r.height;
	if(random(area_sum) < r.width * r.height) {
		x = rectangle.left + random(rectangle.width);
		y = rectangle.top + random(rectangle.height);
	}
}

But if they overlap it's tricky. The easiest way I can think of is to union the new rectangle with the existing area and only consider the rectangles that are not already covered and apply method 2.

- Chris December 03, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Assumptions:
- the cell where the boat is starting has hight -1, the boat can float
- it takes one time unit to move let, right, up and down (Manhatten moves)
- in order to be able to float on a field the water level must be field hight + 1 when
I reach there. for example level is 2 when I start at field (0,0) I can go to
field (1,0) in one time unit if level of that field (1,0) is 2.

I'd solve it using a Dijkstras sort of shortest path. The cost to go to an adjacent field
is current-path-length + min(1, field-height - current-path-length). That is, the cost is defined by the field and not the weight of an edge. Thus I don't need the relaxation step from Dijkstra.

I could add heuristics that estimates remaining distance, to create an A* like thing.

#include <queue>
#include <vector>
#include <algorithm>
#include <iostream>

using namespace std;

int min_time_to_reach_lower_right(const vector<vector<int>>& matrix)
{
	const pair<int, int> MOVES[] = { {1,0},{-1,0},{0,1},{0,-1} };
	struct QueueItem {
		int row, col, pathlen;
	};

	int m = matrix.size();
	if (m == 0) return 0;
	int n = matrix[0].size();
	if (n == 0) return 0;

	auto qcomp = [](const QueueItem& a, const QueueItem& b) { return a.pathlen > b.pathlen; };
	vector<vector<bool>> explored(m, vector<bool>(n));
	priority_queue<QueueItem, vector<QueueItem>, decltype(qcomp)> pq(qcomp);
	pq.push({0, 0, 0});
	explored[0][0] = true;
	while (!pq.empty()) {
		auto qi = pq.top();
		pq.pop();
		if (qi.row == n - 1 && qi.col == m - 1) return qi.pathlen;
		for (auto& move : MOVES) {
			int r = qi.row + move.first;
			int c = qi.col + move.second;
			if (r >= 0 && r < m && c >= 0 && c < n && !explored[r][c]) {
				explored[r][c] = true;
				pq.push({ r, c, max(qi.pathlen + 1, matrix[r][c] + 1) });
			}
		}
	}	
	return -1; // that's actually an error, it should always be able to reach the end
}

- Chris December 03, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Let's assume my interpretation is correct (see my other post), the solution is using a DFS-like approach. It-s more efficient than BFS because I expect a route to be found starting on a early row and if there is a choice, the upper row wins if there is a path to the right through that element. So, if I run BFS it will use that very primitive heuristic to expand the upper routes first.

A BFS, opposed to that, will explore all routes simultaneously looking for a shortest path. So it will expand a lot of routes it doesn't need to if the desired route doesn't start from the very bottom. How ever, worst case scenario is the same, has the same big-O properties.

That would be O(m*n) time and O(m*n) space complexity.

#include <vector>
#include <stack>
#include <algorithm>
#include <iostream>

using namespace std;

// return the sequence of rows to be choosen if columns start at 0 and end at n-1
vector<int> find_highest_path(const vector<vector<bool>>& matrix)
{
	vector<int> path;
	int m = matrix.size();
	if (m == 0) return path; 
	int n = matrix[0].size();
	if (n == 0) return path;
	vector<vector<int>> parent(m, vector<int>(n, -1));

	int lr = -1;
	for (int i = 0; lr == -1 && i < m; ++i) {
		stack<pair<int, int>> s;
		if (!matrix[i][0]) continue;
		s.push({ i, 0 });
		while (!s.empty()) {
			int r = s.top().first;
			int c = s.top().second;
			s.pop();
			if (c == n - 1) {
				lr = r;
				break;
			} 
			for (int nr = min(r + 1, m - 1); nr >= max(0, r - 1); --nr) {
				if (matrix[nr][c + 1] && parent[nr][c + 1] == -1) {
					parent[nr][c + 1] = r;
					s.push({nr, c + 1});
				}
			}
		}
	}

	// back track path
	int c = n - 1;
	while (lr != -1) {
		path.push_back(lr);
		lr = parent[lr][c];
		c--;
	}
	reverse(path.begin(), path.end());
	return path;
}

template<class T>
ostream& operator <<(ostream& os, const vector<T>& c)
{
	os << "{";
	bool first = true;
	for (auto e : c) {
		if (!first) os << "," << e;
		else os << e;
		first = false;
	}
	os << "}";
	return os;
}

int main()
{
	cout << find_highest_path(
	{
		{ 0,0,1,1,0,1,1,0,0,0,0,0 },
		{ 1,1,0,0,1,0,1,1,1,1,0,1 },
		{ 0,0,1,0,0,0,0,1,0,0,1,0 },
		{ 1,1,1,1,0,0,1,1,0,1,0,0 },
		{ 0,0,0,1,1,1,0,1,1,0,0,0 },
		{ 0,0,1,0,0,1,0,0,0,1,1,1 },
		{ 1,0,0,0,0,0,1,1,1,1,1,1 },
	}) << endl; // should return 1,1,0,0,1,0,0,1,1,1,2,1

	cout << find_highest_path(
	{
		{0,0,1,1,0,1,1,0,0,1,0,0},
		{1,1,0,0,1,0,0,1,1,0,0,1},
		{0,0,1,0,0,0,0,1,0,0,1,0},
		{1,1,1,1,0,0,1,1,0,1,0,0},
		{0,0,0,1,1,1,0,1,1,0,0,0},
		{0,0,1,0,0,1,0,0,0,1,1,1},
		{1,0,0,0,0,0,1,1,1,1,1,1},
	}) << endl; // should return 1,1,2,3,4,4,3,3,4,3,2,1
}

- Chris December 03, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Are you saying you want lines L which are paths from column 0 to column m-1 in a matrix where adjacent pixels are m[c+1][r-1], m[c+1][r], m[c+1][r+1] from pixel m[c][r] if that pixel is 1. I assume with tallest line you mean the line l of L that reaches the highest point, that is minimum r (row), if we assume upper-left is r=0, c=0? If there are multiple with the same minimum r, pick the one that has most points on that height.

- Chris December 03, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Opposed to exact content, one might want to have a comparison that is robust against typos, layout dependent differences (e.g. table layout vs. column layout), intended introduction of minor differences, etc.

To minimize false positives, the approaches previously described work fine. If you want to have a balance between false positives and false negatives, the following approach is be better suited:
1) strip html tags off, take the text, strip filling words (words with no content, like "and", "it", "a", ...) from these words, do word stemming (reading = read, etc...) of remaining words and then count the occurrence of each word, so you get two multi-dimensional vectors (k dimensions, for each word stem a dimension)
2) feed the two vectors into a comparison function (like cosine-similarity) and now define a threshold when do you consider the page same content and when not. Clearly, if you have a database to test against (where you know the result you want to achieve) you have an advantage.

- Chris December 02, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

does the field I click flip as well? The problem statement sais it's flipped twice = it's not flipped.

- Chris December 02, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

recursive definition should do it in a tree, in a generic graph it would be NP-hard.

int max_path_rec(Node* root, int& max_path)
{
	if (root == nullptr) return numeric_limits<int>::min();	
	int left_branch = max(0, max_path_rec(root->left_, max_path));
	int right_branch = max(0, max_path_rec(root->right_, max_path));
	max_path = max(max_path, root->value_ + left_branch + right_branch);
	return root->value_ + max(left_branch, right_branch);
}

int max_path(Node* root) 
{
	int mpath = numeric_limits<int>::min();
	max_path_rec(root, mpath);
	return mpath;
}

- Chris December 02, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I assume connected is adjacent and the Node has no parent pointer.

Node* find_local_min(Node *root)
{
	stack<pair<Node*, Node*>> s;
	s.push({ root, root });
	while (!s.empty()) {
		Node* p = s.top().first; // parent of current, none = root if p = n
		Node* c = s.top().second; // current 
		int cv = c->value_;	// current value
		s.pop();
		if (c->left_ != nullptr && c->right_ != nullptr
			&& cv < p->value_ && cv < c->left_->value_ && cv < c->right_->value_) {
			return c;
		}
		if (c->left_ != nullptr) s.push({ false, c->left_ });
		if (c->right_ != nullptr) s.push({ false, c->right_ });
	}
	return nullptr;
}

- Chris December 01, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@CoolGuy: I don't get the difference of Job runner and worker in your latest post. How ever, let's assume the job runner is a job scheduler and distributes the work to workers. So it acts like a Loadbalancer. A typical thing you can do to return results is IP rewriting. The job runner sends a request to the worker but the sender-ip is actually the client's-ip .. but your interface states not a synchronous interface...

- Chris November 29, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

Here the solution that will use a definable number of slots vs. tracking requestLimit-Amount of items in the queue. Que size never exceed slotCt.

class RateLimitter 
{
private:
	queue<std::pair<long long, unsigned int>> requests_;
	long long intervalUs_;
	unsigned int requestLimit_;
	unsigned int slotCt_;
	long long slotLengthUs_;
	unsigned int reqsPerInterv_;

public:
	RateLimitter(long long intervalMs, unsigned int requestLimit, unsigned int slotCt = 100)
		: intervalUs_(intervalMs * 1000), requestLimit_(requestLimit), slotCt_(slotCt),
		  slotLengthUs_(intervalMs * 1000 / slotCt), reqsPerInterv_(0) {
		if (slotLengthUs_ == 0) throw invalid_argument("intervalMs > slotCt");		
	}
	
	bool processRequest() {
		long long nowUs = duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count();
		long long slotId = nowUs / slotLengthUs_;

		// is oldest slot expired
		if (!requests_.empty() && requests_.front().first <= slotId - slotCt_) {
			reqsPerInterv_ -= requests_.front().second; // all that were in this slot get freed
			requests_.pop(); // remove it
		}

		// is there "space"?
		if (reqsPerInterv_ < requestLimit_) {				
			reqsPerInterv_++;
			if (!requests_.empty() && requests_.back().first == slotId) {
				requests_.back().second++; // append to open slot
			} else {
				requests_.push({slotId, 1}); // create new slot with count 1
			}
			return true;
		}
		return false;
	}
};

- Chris November 29, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@Alex, that's right as well. I'd as well create a histo if I had to reduce space and can't shorten the interval.

- Chris November 29, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@Alex: No problem, but Memory is max-requests per interval, not number of requests. The queue never grows larger than max-requests, note the "requests_.pop();" :-) - The ringbuffer would be of that size from the beginning, just like your solution...

- Chris November 29, 2017 | Flag Reply

Page:

1 2 3 4 5 6 7 8 9

CareerCup

Chris

Books

Videos

Resume Review

Mock Interviews