jarflores
This answer is what I am thinking, but explicitly use a max heap to store the characters by their count. BuildMaxHeap still runs in O(n) and then consecutively decrease the priority (i.e. a character's count) of the root and its max child. This way, the root will always be the character with the highest count remaining.
- jarflores July 05, 2015
In general, when you have N points and you want to find the closest M points to a special point P, you:
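1) Create a max heap of capacity M, keyed on distance to P, so that the farthest point kept so far sits at the root.
2) Loop through the N points: while the heap holds fewer than M points, add the point; otherwise, if the point X is closer than the root Y (i.e. distance(X,P) < distance(Y,P)), replace the root with X.
3) Extract the contents of the heap and print them in increasing order of distance.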
The total run time is O(N*lg(M) + M*lg(M)) == O(N*lg(M)). In this special case, you would need to determine how to get the N points in the first place. If hotels are stored (and indexed) by lat/long coordinates, you would likely query based on a circle C with center P and radius R, i.e. get hotels X where
(X.lat - P.lat)^2 + (X.long - P.long)^2 < R^2
Depending on the database you use, you can likely order the query results by distance to P and limit to M results.
If querying based on a distance function is not possible, you can simplify the query to use a square S with center P and side D, having only inequalities:
(P.lat - D/2) < X.lat < (P.lat + D/2) && (P.long - D/2) < X.long < (P.long + D/2)
and then run the algorithm above using the heap. But there is an issue: it is possible for S to contain N points, but not the *closest* M points to the center P. For example, S could have a hotel on one of the corners but there could be a closer hotel outside S very close to one of the sides of S. Here you would need to choose an extra large square (whatever that means) to ensure you have the correct set of points.
If you are able to design the app yourself, then you can likely choose a database which makes the query simple and avoid any post processing of the query results.
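A minimal Python sketch of the bounded-heap selection, assuming points are (lat, long) tuples and plain squared Euclidean distance is good enough. The function name `closest_points` is made up for illustration. Python's `heapq` is a min heap, so distances are negated to keep the farthest retained point at the root.

```python
import heapq

def closest_points(points, p, m):
    """Return the m points nearest to p, nearest first (O(N lg M))."""
    if m <= 0:
        return []

    def dist2(q):
        # squared distance preserves the ordering and avoids a sqrt
        return (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2

    heap = []  # entries are (-distance, index, point); at most m are kept
    for idx, q in enumerate(points):
        item = (-dist2(q), idx, q)
        if len(heap) < m:
            heapq.heappush(heap, item)
        elif item[0] > heap[0][0]:
            # q is closer than the farthest point kept so far
            heapq.heapreplace(heap, item)
    # largest negated distance first == smallest distance first
    return [q for _, _, q in sorted(heap, reverse=True)]

hotels = [(5, 5), (1, 0), (0, 2), (-3, 0), (0, 0.5)]
print(closest_points(hotels, (0, 0), 3))
```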
Every node N of a max heap satisfies the following property:
second largest value in subtree rooted at N = max(N.left, N.right)
Note that it is not true (in general) that the third largest value (of the subtree rooted at N) is min(N.left, N.right). We only know that if min(N.left, N.right) is not the third largest value (of the subtree rooted at N), then the third largest value lies within the subtree rooted at max(N.left, N.right).
Explaining this at the root should make the algorithm clear. Let H1 be the original heap and H2 the secondary. After the root, the second largest element of H1 is max(root.left, root.right) == root.maxChild(). As per the note above, we do not know if min(root.left, root.right) == root.minChild() is the third largest element of H1. We only know that the subtree rooted at root.maxChild() may contain values larger than root.minChild(). Therefore, we store root.minChild() in H2.
Now the loop begins. Basically H2 keeps a list of the nodes/indices of H1 that haven't been counted as one of the largest k elements "yet," and every time you pass over a node's "minChild" to check the "maxChild subtree" for larger elements, you store that "minChild" in H2 (which keeps the max yet-to-be-counted value at its root). If it is ever the case that the node stored at H2's root is larger, we must add both of curr's children to H2 since we have not traversed either.
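A small Python check of the property on a made-up heap, assuming the usual array layout where the children of index i are 2i+1 and 2i+2. Here the third largest value is not the root's smaller child; it sits in the subtree of the larger child.

```python
heap = [10, 9, 5, 8, 7]  # root 10, children 9 and 5; 9 has children 8 and 7

def is_max_heap(a):
    # every non-root element is no larger than its parent
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))

ranked = sorted(heap, reverse=True)
second_largest, third_largest = ranked[1], ranked[2]
max_child = max(heap[1], heap[2])
min_child = min(heap[1], heap[2])

print(second_largest == max_child)  # the second largest is the root's max child
print(third_largest, min_child)     # the third largest differs from the min child
```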
Here is an O(nlg(n)) algorithm:
1) Sort the array (which is O(nlg(n)) or O(n) if bounded).
2) For i = 1 to n, x=arr[i], binary search the array in the index range [i + 1, n + 1] for the index j such that arr[j - 1] <= x + 1 < arr[j] (take j = n + 1 when no element exceeds x + 1). Save count = j - i as the number of floats in the interval [x, x + 1]. This is O(nlg(n)).
3) Keep a running maximum of the counts and return it. O(1) per element.
Would be nice to have a solution that didn't require sorting.
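The sort-then-binary-search steps can be sketched in Python with the standard `bisect` module. The function name is an assumption for illustration, and the code uses 0-based indices where the steps above use 1-based ones.

```python
from bisect import bisect_right

def max_count_in_unit_interval(values):
    """Largest number of values that fit in some interval [x, x + 1]."""
    arr = sorted(values)
    best = 0
    for i, x in enumerate(arr):
        # j is the first index whose value exceeds x + 1
        j = bisect_right(arr, x + 1, lo=i)
        best = max(best, j - i)
    return best

print(max_count_in_unit_interval([3.0, 0.5, 1.4, 0.1, 1.05]))
```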
Even if you:
- sort the array (so, O(nlg(n)) or O(n) if they are bounded)
- for each value x in the array, loop through _every_ other value and count how many are in the interval [x, x + 1] (so O(n^2))
- return the x with the largest count
you have an O(n^2) solution that requires O(n) space. This is brute force and is nowhere near exponential.
Here is a basic algorithm that assumes all the nodes needed exist (checking for null nodes or "out of bounds" indices only adds clutter). The basic idea is that you use a secondary heap to store the nodes you have yet to traverse. I'm not sure whether it satisfies the O(n) bound. Getting this done in log(n) will require some clever binary search.
public class HeapStatistics<T extends Comparable<T>> {
    // Returns the node holding the k-th largest value (k == 0 is the max).
    // Heap<T> is assumed to be a max-heap implementation of IHeap<T> ordered by val.
    public Node<T> kthElem(int k, Heap<T> hp) {
        if (k == 0) return hp.max();
        if (k == 1) return hp.max().maxChild();
        // candidates holds the nodes of hp passed over but not yet counted
        Heap<T> candidates = new Heap<T>(hp.max().minChild());
        Node<T> curr = hp.max().maxChild();
        k--;
        while (k > 0) {
            if (curr.maxChild().val.compareTo(candidates.max().val) >= 0) {
                // next largest is below curr; remember the smaller child
                candidates.add(curr.minChild());
                curr = curr.maxChild();
            } else {
                // next largest was passed over earlier; neither child is traversed yet
                candidates.add(curr.maxChild());
                candidates.add(curr.minChild());
                curr = candidates.extractMax();
            }
            k--;
        }
        return curr;
    }
}
public interface IHeap<T extends Comparable<T>> {
    Node<T> max();
    void add(Node<T> n);
    Node<T> extractMax();
}
public abstract class Node<T extends Comparable<T>> {
    int key;
    T val;
    public abstract Node<T> parent();
    public abstract Node<T> left();
    public abstract Node<T> right();
    public Node<T> maxChild() {
        return left().val.compareTo(right().val) >= 0 ? left() : right();
    }
    public Node<T> minChild() {
        return left().val.compareTo(right().val) >= 0 ? right() : left();
    }
}
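For comparison, here is a short runnable Python sketch of the same secondary-heap idea on an array-backed max heap. It is a simpler variant, not the exact traversal above: every visited node pushes both of its children into the candidate heap, which costs O(k lg k). The function name and the array layout (children of i at 2i+1 and 2i+2) are assumptions.

```python
import heapq

def kth_largest_in_heap(heap, k):
    """Value of the k-th largest element of a max heap (k == 0 is the max)."""
    if not 0 <= k < len(heap):
        raise IndexError("k out of range")
    # heapq is a min heap, so values are negated; entries are (-value, index)
    candidates = [(-heap[0], 0)]
    for _ in range(k):
        _, i = heapq.heappop(candidates)
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap):
                heapq.heappush(candidates, (-heap[child], child))
    return heap[candidates[0][1]]

h = [10, 9, 5, 8, 7, 4, 3]
print([kth_largest_in_heap(h, k) for k in range(len(h))])
```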
The beauty of RPN is that the operator follows the operands...so you only need one stack of doubles to compute the final result. Simple pseudo-code:
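for(int i = 0, len = tokens.length; i < len; i++)
{
    if(IsANumber(tokens[i]))
        numberStack.push(ParseDouble(tokens[i]));
    else
    {
        if(IsAnInvalidOperator(tokens[i]) || numberStack.Count < 2)
            throw new Exception("Bad token array!");
        // the right-hand operand is on top of the stack
        double tmp = numberStack.pop();
        numberStack.push(compute(tokens[i], numberStack.pop(), tmp));
    }
}
// a well-formed expression leaves exactly one value on the stack
if(numberStack.Count != 1)
    throw new Exception("Bad token array!");
return numberStack.pop();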
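A runnable Python version of the single-stack evaluation, as a sketch. It assumes the tokens are strings and supports only the four binary operators shown; the function name `eval_rpn` is illustrative.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def eval_rpn(tokens):
    """Evaluate a reverse Polish notation token list using one stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            if len(stack) < 2:
                raise ValueError("Bad token array!")
            right = stack.pop()  # the right-hand operand is on top
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))  # raises ValueError on a bad token
    if len(stack) != 1:
        raise ValueError("Bad token array!")
    return stack[0]

print(eval_rpn(["5", "1", "2", "+", "4", "*", "+", "3", "-"]))
```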
Here is a C# solution. As per the above comments, there can be only one influencer and the algorithm is O(n). Imagine the matrix as a "cylinder"; then we must find a row where `false` values wrap all the way around. At the same time, as we test for false values, we eliminate other users from being an influencer candidate. Therefore, we should have to test each column for false at most twice: once to see if the current user can be the influencer, and once to see if it is false in the (possible) influencer's row. If we find a row of all false, afterward just check the column values for that user to be true.
using System;
namespace Influence
{
public interface InfluencerFinder
{
/**
* Given a matrix of following between N LinkedIn users (with ids from 0 to N-1):
* followingMatrix[i][j] == true iff user i is following user j
* thus followingMatrix[i][j] doesn't imply followingMatrix[j][i].
* Let's also agree that followingMatrix[i][i] == false
*
* Influencer is a user who is:
* - followed by everyone else and
* - not following anyone himself
*
* This method should find an Influencer by a given matrix of following,
* or return -1 if there is no Influencer in this group.
*/
int getInfluencer(bool[][] followingMatrix);
}
public class Finder : InfluencerFinder
{
public int getInfluencer(bool[][] followingMatrix)
{
int numUsers = followingMatrix.Length;
int numNotFollowing = 0;
int j = 0;
int user = -1; // no candidate yet
bool colOverflow = false;
while(numNotFollowing < numUsers)
{
if(user == numUsers - 1 || colOverflow)
return -1;
user = j;
numNotFollowing = 0;
while(!followingMatrix[user][j] && numNotFollowing < numUsers)
{
j++;
numNotFollowing++;
if(j >= numUsers)
{
j = j % numUsers;
colOverflow = true;
}
}
}
// if here, then row user has all false... check col
int numFollowedBy = 0;
for(int i = 0; i < numUsers; i++)
{
if(i == user)
continue;
if(!followingMatrix[i][user])
return -1;
}
return user;
}
}
}
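For comparison, here is a Python sketch of the more common two-pass formulation of the same O(n) elimination idea. It is not the wrap-around scan above: the first pass keeps a single candidate, the second verifies that candidate's row and column. The function name is an assumption.

```python
def get_influencer(following):
    """Return the influencer's id, or -1 if there is none."""
    n = len(following)
    if n == 0:
        return -1
    # Pass 1: if cand follows j, cand is ruled out; otherwise j is ruled out.
    cand = 0
    for j in range(1, n):
        if following[cand][j]:
            cand = j
    # Pass 2: cand must follow nobody and be followed by everyone else.
    for i in range(n):
        if i != cand and (following[cand][i] or not following[i][cand]):
            return -1
    return cand

matrix = [
    [False, False, True],
    [True,  False, True],
    [False, False, False],
]
print(get_influencer(matrix))
```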
Sorry for multiple posts...if someone can remove the others, please do!
Also, after thinking a little longer, there is symmetry within the sub-problems, so you can cache their solutions. For instance, the series of choices (1,2,3,4) produces a "game" with the same "state" as (3,2,1,4), (1,4,3,2), and (3,4,1,2). So change
scenario.PlayerAHasWinningStrategy |= recursiveCanPlayerAWin(branch);
to
if(ScenarioCache.HasDetermined(branch))
scenario.PlayerAHasWinningStrategy |= ScenarioCache.GetOutcome(branch);
else
scenario.PlayerAHasWinningStrategy |= recursiveCanPlayerAWin(branch);
The ScenarioCache is just a HashSet whose members are Scenario instances. You must override Equals and GetHashCode so that two scenarios are equivalent only when PlayerATotal, PlayerBTotal and InvalidChoices are all equal (by value). The HasDetermined(branch) method simply checks if the set has branch as a member (as determined by Equals). The GetOutcome(branch) method simply returns PlayerAHasWinningStrategy for that member.
- jarflores March 29, 2015
This is basically brute force, but the algorithm should be clear.
using System;
using System.Collections.Generic;
namespace SumGame
{
class Program
{
static void Main()
{
}
}
static class WinComputer
{
static bool canIWin(int maxChoice, int winTotal)
{
Scenario.maxChoice = maxChoice;
Scenario.winTotal = winTotal;
return recursiveCanPlayerAWin(new Scenario(0,0));
}
static bool recursiveCanPlayerAWin(Scenario scenario)
{
if(scenario.PlayerAHasWinningChoice())
return true;
for(int i = 0; i <= Scenario.maxChoice; i++)
{
if(scenario.IsInvalidChoice(i))
continue;
// try choice i on a copy so the other choices start from the same state
Scenario afterA = scenario.Clone();
afterA.PlayerAChoose(i);
// this choice hands B a win, so try the next one
if(afterA.PlayerBHasWinningChoice())
continue;
for(int j = 0; j <= Scenario.maxChoice; j++)
{
if(afterA.IsInvalidChoice(j))
continue;
Scenario branch = afterA.Clone();
branch.PlayerBChoose(j);
scenario.PlayerAHasWinningStrategy |= recursiveCanPlayerAWin(branch);
}
}
return scenario.PlayerAHasWinningStrategy;
}
}
class Scenario
{
public int PlayerATotal;
public int PlayerBTotal;
public HashSet<int> InvalidChoices;
public bool PlayerAHasWinningStrategy;
// set by WinComputer.canIWin, so these cannot be private
public static int maxChoice;
public static int winTotal;
public Scenario(int a, int b, HashSet<int> h = null)
{
PlayerATotal = a;
PlayerBTotal = b;
InvalidChoices = h ?? new HashSet<int>();
PlayerAHasWinningStrategy = false;
}
public Scenario Clone()
{
// copy the set so that a branch does not mutate its parent's choices
return new Scenario(PlayerATotal, PlayerBTotal, new HashSet<int>(InvalidChoices));
}
public void PlayerAChoose(int x)
{
PlayerATotal += x;
InvalidChoices.Add(x);
}
public void PlayerBChoose(int x)
{
PlayerBTotal += x;
InvalidChoices.Add(x);
}
public bool PlayerAHasWinningChoice()
{
return PlayerHasWinningChoice(PlayerATotal);
}
public bool PlayerBHasWinningChoice()
{
return PlayerHasWinningChoice(PlayerBTotal);
}
public bool IsInvalidChoice(int x)
{
return InvalidChoices.Contains(x);
}
public bool PlayerHasWinningChoice(int playerTotal)
{
for(int i = 0; i <= Scenario.maxChoice; i++)
{
if(!IsInvalidChoice(i) && (playerTotal + i >= Scenario.winTotal))
return true;
}
return false;
}
}
}
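The caching idea for this game can be sketched compactly in Python with `functools.lru_cache`. The rules here are assumptions read off the code above, not a confirmed problem statement: players alternate picking unused integers from 1 to max_choice, each adds the pick to their own total, the first to reach win_total wins, and running out of choices counts as no win for player A. Unlike the code above, this sketch requires A to win against every reply by B.

```python
from functools import lru_cache

def can_player_a_win(max_choice, win_total):
    """True if player A (who moves first) can force a win under the assumed rules."""

    @lru_cache(maxsize=None)
    def a_wins(a_total, b_total, used, a_to_move):
        # The cache key is the full state: both totals, used choices, and turn.
        moves = [c for c in range(1, max_choice + 1) if c not in used]
        if not moves:
            return False  # nobody reached win_total
        if a_to_move:
            # A needs one choice that wins now or wins against every reply.
            return any(
                a_total + c >= win_total
                or a_wins(a_total + c, b_total, used | {c}, False)
                for c in moves
            )
        # B must fail to win now, and A must still win after each B choice.
        return all(
            b_total + c < win_total
            and a_wins(a_total, b_total + c, used | {c}, True)
            for c in moves
        )

    return a_wins(0, 0, frozenset(), True)

print(can_player_a_win(3, 4))
```

Because the key ignores the order in which the choices were made, sequences such as (1,2,3,4) and (3,2,1,4) hit the same cache entry.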
Define the obvious hash function h:{alphabet} --> {0,1,2,...,25}. Since we are looking for the first "non-repeating" character, there should be a point when the characters stop "streaming" so that we can actually determine this. So let's suppose the streamed characters are contained in a character array A. Then we can use count sort to find the first non-repeating character:
B = integer array of length 26
//initialize array
for i=0, i < 26, i++
    B[i]=0
//count streamed characters
for i=0, i < A.length, i++
    B[h(A[i])]++
//find first non-repeating character
i = 0
while i < A.length and B[h(A[i])] > 1
    i++
return i //or A[i] if you want the actual character; i == A.length means every character repeats
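If the characters continue to stream, then the array A will have letters appended to it; update the counts for the new letters and continue the check from the indices >= N, where N is the index last returned (every character before it is already known to repeat).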
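A minimal Python sketch of the two-pass counting approach, assuming the whole stream has already been collected into a string. It uses `collections.Counter` in place of the 26-slot array, so it is not limited to a 26-letter alphabet.

```python
from collections import Counter

def first_unique_index(chars):
    """Index of the first character that occurs exactly once, or -1."""
    counts = Counter(chars)          # pass 1: count every character
    for i, c in enumerate(chars):    # pass 2: first character with count 1
        if counts[c] == 1:
            return i
    return -1

print(first_unique_index("aabcbd"))
```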
It is odd to say that the characters are "streaming" and that you want to compute the "first non-repeating" (aka unique) character. For if the characters are truly streaming (indefinitely) then how can we know if any character will remain unique?
For simplicity, assume that the values of the array A are all positive integers and A has length N (In the general case, hash the values of A into the positive integers). Now, count sort the values A[i] of A, starting from i=0, noting that the first bin to obtain two elements will be the first duplicate.
Let B = array of length max{ A[i] } + 1
for i=0, i < B.length, i++
    B[i]=0
for i=0, i < N, i++
    if B[A[i]] //returns true when B[A[i]]=1, aka A[i] has already appeared
        return i //index of the first duplicate
    B[A[i]]++
Since we begin with i=0, we are assured to return the first duplicate.
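The same single pass in Python, as a sketch. It swaps the counting array for a `set`, which avoids hashing the values into positive integers first; the function name is illustrative.

```python
def first_duplicate_index(values):
    """Index of the first element that repeats an earlier one, or -1."""
    seen = set()
    for i, v in enumerate(values):
        if v in seen:
            return i  # v has already appeared
        seen.add(v)
    return -1

print(first_duplicate_index([3, 1, 4, 1, 5, 3]))
```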
- jarflores December 01, 2013
Suppose you have a binary number 1101110. If we consider this as base -2, the difference is that the odd digits now represent a negative value. We simply need to convert the odd digits into base -2. Examples:
10=2^1 in binary and 110=2^2-2^1=2^1 in base -2
1000=2^3 in binary and 1100=2^4-2^3=2^3 in base -2
In general
2^n=2^{n+1}-2^n
and when n is odd, 2^{n+1} is even, hence positive in base -2. This allows us to rewrite an odd digit of a binary number ...0001000.... as ...0011000...
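Therefore, to find the base -2 expansion of a non-negative n, a simple solution to the problem would be:
1) Sum the odd powers of 2 until you obtain a number x > n.
2) Add x to n and then compute the `bit-wise xor` with x:
z = (n + x) ^ x
(as in C or C++)
The binary expansion of z is the base -2 expansion of n. Adding x and then XOR-ing it away applies the rewrite above to the odd digits while letting carries propagate. The shortcut y = n & x, z = (y << 1) + n gives the same answer only when no carry occurs: for n = 6 it yields 1010, which is -10 in base -2, while the correct expansion is 11010.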
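A minimal Python sketch of the conversion using the odd-bit mask x, computed here as (n + x) ^ x so that carries are handled, together with a helper that reads the digits back in base -2 as a check. Both function names are made up for illustration.

```python
def to_negabinary(n):
    """Base -2 digits of a non-negative integer n, as a string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # x = 2 + 8 + 32 + ... : sum odd powers of 2 until x > n
    x, power = 0, 2
    while x <= n:
        x += power
        power *= 4
    return bin((n + x) ^ x)[2:]

def from_negabinary(digits):
    """Value of a digit string interpreted in base -2."""
    return sum(int(d) * (-2) ** i for i, d in enumerate(reversed(digits)))

for n in (2, 3, 6, 10):
    s = to_negabinary(n)
    print(n, s, from_negabinary(s))
```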
- jarflores December 01, 2013
For simplicity, suppose we have 2N elements and that the array A is indexed from 1 to 2N. Sort the elements in increasing order then swap A[2*(i+1)] with A[2*N-(2*i+1)] for
i=0 to (N-2)/2 when N is even
i=0 to (N-3)/2 when N is odd
So the running time is simply O(sort) + O(N), which, if the elements are all integers in a bounded range (so that a counting or radix sort applies), should be O(N).
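To make the index arithmetic concrete, here is a small Python sketch that applies exactly these swaps, translating the 1-based positions to Python's 0-based lists. The function name is an assumption, and the sketch only shows what the swaps produce: on distinct inputs of the sizes tried, the result alternates low, high, low, high.

```python
def rearrange(values):
    """Sort, then swap A[2(i+1)] with A[2N-(2i+1)] (1-based) as described."""
    arr = sorted(values)
    if len(arr) % 2 != 0:
        raise ValueError("expects 2N elements")
    n = len(arr) // 2
    last = (n - 2) // 2 if n % 2 == 0 else (n - 3) // 2
    for i in range(last + 1):
        a = 2 * (i + 1)            # 1-based position from the front
        b = 2 * n - (2 * i + 1)    # 1-based position from the back
        arr[a - 1], arr[b - 1] = arr[b - 1], arr[a - 1]
    return arr

print(rearrange([1, 2, 3, 4, 5, 6, 7, 8]))
```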
This is the nicest solution, good work!
- jarflores July 05, 2015
It seems that all the solutions here require "pre-processing" all characters by their count. It would be nice to see an algorithm that does minimal work, i.e. if a string does not have a character that is repeated consecutively, then it only requires a single pass through the string (which does nothing). Whereas a string like 'hello' would only require a single swap. Not sure how manageable this would be.