Given a very long list of URLs

Google Interview Question for Software Engineer / Developers

7

of 7 votes

70
Answers
Given a very long list of URLs, find the first URL which is unique ( occurred exactly once ).

I gave a O(n) extra space and O(2n) time solution, but he was expecting O(n) time, one traversal.
- AK December 06, 2011 in India | Report Duplicate | Flag | PURGE
Google Software Engineer / Developer Algorithm Large Scale Computing

Email me when people comment.

An error occurred in subscribing you.

Country: India
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 20 vote

you can solve this in O(n) using a combination of trie and linked list. The leaf node of a trie maintains a flag to record duplicate urls and pointer to a node in a link list. If you encounter a new url, add a node to the head of the linked list and set the pointer in the trie. Whenever you encounter a url that is already in the trie, if the flag is not set, then set the flag, delete the node from the linked list and set pointer to null. If the flag is already set, then ignore and read the next url.
After processing all the urls, the link list should only contain the unique urls and the node at the tail is the first unique url from the list. For n urls, inserting urls into the trie in O(n) and link list operations are all constant time. The node could just keep the index of the url in the list so that we don't have to store urls in the link list as well.

- hello world December 14, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

i thought tries have insertion time o(nlogn) for n elements.
are you sure its o(n), its a tree structure...

- andy February 29, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

The trie insertion time is k, where k is the length of the string being inserted. So the total time is O(nk). Since the length of the URL's doesn't depend on n, though, and can be expected to be relatively constant, this is O(n).

- Anonymous March 25, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Why not just use a hashtable if your storage is going to be O(n) anyway though?

- Anonymous March 25, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Storage is going to be O(n) -- but a hash table is not as space efficient as it has to store the entire string in case of collision, but in all partial matches are collapsed together, no?

- wealthychef June 17, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Incorrect soln will not be O(n) as you would be deleting nodes from link list...parsing till node will be extra time.

- s November 03, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

To answer the question of above user:
There is no extra time needed to "parse till the node to be deleted".
The following code is my implementation of the same idea as that of the original poster,
the linked list is combined with the trie,
once you have finished travelling to end of an url in the trie,
you would have the node in the linked list to be deleted.

/* The fields 'parent' and 'child' are for the trie.
 * The fields 'prev' and 'next' are for the linked list.
 * I presume that the character-set of URL is {'a' - 'z', '.', '/', ':'}.
 * In real interview, need to discuss the assumption with your interviewer.
 */
class Node
{
 public Node parent = null;
 public Node[] child = new Node[29];
 public Node prev = null;
 public Node next = null;
 public char data = ' ';
 public int count = 0;
}

class G105
{
 public static String firstUnique(String[] array)
 {
  Node root = new Node();
  Node first = null;
  Node last = null;
  for (String s : array)
  {
   Node current = root;
   /* Modify the trie */
   for (char c : s.toCharArray())
   {
    int index = 0;
    if (':' == c) {index = 28;}
    else if ('/' == c) {index = 27;}
    else if ('.' == c) {index = 26;}
    else {index = c - 'a';}

    if (null == current.child[index])
    {
     Node next = new Node();
     next.parent = current;
     next.data = c;
     current.child[index] = next;
    }
    current = current.child[index];
   }

   /* Modify the linked list */
   current.count++;
   if (1 == current.count)
   {
    if (null == first) {first = last = current;}
    else
    {
     current.prev = last;
     last.next = current;
     last = current;
    }
   }
   else if (2 == current.count)
   {
    if (current == first)
    {
     first = current.next;
     if (null != first) {first.prev = null;}
    }
    else
    {
     Node prev = current.prev;
     prev.next = current.next;
     if (null != current.next)
     {
      current.next.prev = prev;
     }
    }
   }
  }

  StringBuffer sb = new StringBuffer();
  Node current = first;
  while (current != root)
  {
   sb.append(current.data);
   current = current.parent;
  }
  return sb.reverse().toString();
 }

 public static void main(String[] args)
 {
  String[] urls = {
   "abc.google.com",
   "abc.facebook.com",
   "abc.amazon.com",
   "abc.yahoo.com",
   "abc.facebook.com",
   "abc.yahoo.com",
   "abc.facebook.com",
   "abc.google.com"
   };
  System.out.println(firstUnique(urls));
 }
}

- Alva0930 February 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

This has to be a doubly linked list, for the removal of nodes to be O(1) , right ?

- sumanth232 October 13, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Even though delete operation for a single and the doubly linked list is O(1) operation once we have access to its current and previous pointers, finding the elements for deletion is still O(N). Therefore I dont think overall delete operation is O(1) operation when finding that element for removal is necessary

- Saurabh July 01, 2018 | Flag

Comment hidden because of low score. Click to expand.

of 9 vote

I believe you used hash(or trie or something similar) to store the counts of each URL.
To make it one traversal, you need only to use a doubly linked list to link all the unique ones (either link the items in the hash table or do it separately) so far as you traverse through.
Remove the URL from the list if its count goes over 1;
and of course connects its predecessor and successor (which is actually implied when I say "remove")
So after one traversal, the first one of your linked list is the desired one.
Extra spaces for two pointers for each URL are required for this solution.

- fentoyal December 07, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

suppose you are in the middle of traversing thru the list, u have a half prepared link list, now u get another url, so what u do?? u traverse thru the link list untill u find the match. But how u r going to do this "traverse thru" for each element on "overall O(n) complexity"??

- Anonymous December 07, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

The doubly linked list is in addition to the hash table.
To remove or to insert a node to this list is up to the count of the URL in the hash table.

- fentoyal December 09, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

If we have a hash table then why do we require a linked list. Can't we just traverse the array and keep on checking the count for each one??

- R January 30, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

You may consider an array is a special type of hashtable which integrates (partly) the doubly linked list. But in practice, I believe a real hast table is required and each of its item is pointing to a doubly linked list.You can tell the difference between these two different models

- fentoyal January 31, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

What does unique mean? appearing exactly once?

Hash the url and use the hash to check for dupes, should be faster than plain old string matching.

- Sergey Brin January 30, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

millions of url i dont think hash is a good idea

- anshulzunke September 18, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

I think this might work in linear time.
Assign a prime number for each character (letters, digits and special characters). Eg: 'a'--> 2, 'b'-->3, '1'-->17, '#'--> 29. Compute the hash code of the url based on the character's prime number and its position in the url.
After computing the hashcode, insert the hashcode in HashSet (in Java). Do this for every URL. Incase the hashcode of URL mathces the hashcode of some previous url inserted in HashSet, then its the first duplicate.

- badalrocks February 01, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

If urls are stored in xml file, then a quick way would be to use Muench's grouping technique to retrieve unique url.

- Anonymous February 07, 2009 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

bedalrocks,

Using prime numbers is usually a very bad idea, practically speaking.

- Anonymous March 10, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Sort the url list using quick sort in O(nlogn). Scan through the array to find the unique URL in O(n) time. Total time: O(nlogn).

PS: Here sorting and scnaning involves string comparision. Say if each string is length K, we can treat n*k = N as another constant.

- Sadineni.Venkat April 28, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

The list of url's is very big, and hence sorting does not seem to be a valid option here.

- Kk May 24, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

what abt md5 checksum

- Anonymous May 15, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Seems list quite a few here have missed a part of the question. The question is to find the first unique URL. Not any unique url or the first duplicate.

A solution would be
* go through the URLs, store in hash table, for key=url, a boolean value that represents if the URL has been seen once or more than once. That is first pass.
* go through the URLs again, and for each URL, check in the hash table, for how many times it has occured. If it has occured only once, that's the URL you want.

Now, there are some space constraints that I haven't considered. Millions of URLs. Let's say 10 million URLs. Each URL has let's say 50 characters on average. That's 500 million characters. That takes up a lot of space even though it is doable. However if I am using a hash table, I am not storing the entire URL, but just the hash code and a boolean. That would be let's say 4 bytes for hash code and a bit for boolean. ~40 MB for hash table. That's not bad at all.

One way to optimize the second pass would be .. if I could create another array that contains the URLs in the order in which they appear in the original list, however without any duplicates. I can construct this during the first pass. So during the second pass, I could go through this array, that way reducing the number of URLs I am processing. However this needs additional memory.

Now if I could make this more challenging, what if the number of URLs is not millions but billions. In that case the hash table cannot be stored in memory. It would be in the order of 40GB. How would we solve the problem then?

One solution I can think of is to use distributed computing. Have multiple server machines store the hash table pieces in their local memory. Since network communication could be faster than disk reads, this could work well. Otherwise the hash table could be stored on disk. This would be slow because you don't know where the hash code of next URL will be in the hash table memory.

- Rajesh Konda June 29, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

How will you check for duplicates if you don't store the entire url? After all, if the hash is just 4 bytes, there is a non-zero chance of a collision.

Also, you have perhaps missed an important step: Canonicalize the URLS. There could be multiple versions of the same url, which might hash differently but are actually the same.

Also, your comment about hashtable and hard disk access is incorrect, if you use chained linear hashing. In fact the original intent behind inventing the hash table was to have a fast disk access. An array -> quick access, followed by a linked list access, which would be fast if the size of the linked list (# of collisions for that hash) is small.

And you could just store the least index of the url, (whether it was first, second, nth etc) in the hash table itself and just walk the hash table ignoring dupes. For this just a bit array would be sufficient as a supporting structure to figure out the first unique.

Also, if the urls are valid, about using IP addresses? With a trie (patricia) of IP address + a lookup structure (say hash/tree) for each IP adress, you could perhap cut down on space and make the lookups faster.

- LOLer June 29, 2009 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

The linked list/trie combo approach above successfully captures the first unique URL.

- wealthychef June 17, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

One way would be to maintain a Bloom filter for the URL's that you have seen before. They are great for negative membership queries.

- interviewtomorrow September 15, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

the question is to find the 1st unique url. If u make the hash of all the url's and then traverse each hash to check the frequency you may be able to get all the unique url's but certainly not the one that comes first.
One way to come up with the 1st unique url is to go thourgh the list order in which the url's are presented to us and then look at the set of unique url's (found by Rajesh's method) to determine which is the 1st unique.

- aks on Rajesh March 20, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I like the bloom filter idea indeed. Now, since the list is very long but finite (ie. not a stream), we can iterate it backwards, keeping track of the last URL that we don't find in the bloom filter.

def first_unique_url(alist):
 latest_unique = None
 bloom = BloomFilter() #assuming some class here
 for url in backwards(alist):
  if not bloom.contains(url):
   bloom.add(url)
   latest_unique = url
 return latest_unique

- throwaway221 February 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

you can have a trie sort of structure O(n) time complexity, each node can have a flag which says whether a word finishes there or not and a count stating number of occurrences.

- Anonymous March 07, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

How about using linkedHashMap?

HashMap

md5(url) | reference to node in doubly linked list

while( lines ){
if( md5(url) exists )
delete node in in doubly linkedlist
else{
add to linkedlist queue end
set value to the node ( queue end).
}
}

return the first node of the queue.

I think this can be done in O(n)

- Anonymous January 02, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Trie would be better. But instead of having count of urls we should have the index of the string in the list. Whenever we encounter the url again make the index = -1. So now the trie will give indexes in each node of trie as Null(never got set) or -1(set more than once) or +ve index value of the url in the list(set only once). now getting all the non null and +ve indexes find the min index you will get first unique URL.

- anshulzunke September 18, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

uniq filename.txt > f2.txt

using unix commands? not sure about the time complexity tho :|

- dumdum December 07, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Or we can use std::sort + std::unique for a C++ position
if unix command is allowed.

- fentoyal December 07, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

uniq requires that its input be sorted, and sort takes O(n log(n))

- JeffD December 07, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

This is well explained in CC book

- Anonymous December 07, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

what is the CC book?

- Somisetty December 08, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Cracking the coding interview. You'l find it on the home page of this site.

- P December 08, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

As fentoyal says, put them into a hash table. Add to each actual bucket entry an iterator into a doubly-linked list of those items that have been seen once. When you walk down the original list of urls, if you don't find it in the hash table, add it, and add its entry (hash table iterator, or raw url, either way) to the back of the linked list. If you do find the entry in the hash table, look at the iterator stored there. If it's null, you've seen it 2+ times already, do nothing. If the iterator there is non-null, use it to pull the item out of the linked list (O(1) time), and then set the iterator in the hash table to null.

When you've gone through the list of urls, pick the front-most (which is the first added, that was not subsequently removed) url from the linked list.

- JeffD December 07, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I dont think this will give time O(n).

Look at the case when an element was previously considered unique and now as we iterate through the sequence we find that it has a double. In this case we have to remove from hash which is fine as it can be done in O(1) but removing the item from the linked list takes O(n)

doesnt this push the overall running time to O(n^2)

- Anonymous December 07, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

No, no - you never remove the url from the hash table (otherwise if you see it again, you'd think it was the first time). There are 3 states - 1) it's a new url (add to hash table and to back of list, add that list iterator to the url's hashtable entry), 2) you've seen the url and the hash table still has a valid iterator to the list (you've seen it once - so we remove the url from the list, and remove the iterator from the url's entry in the hashtable) 3) you've seen the url, and its hash table entry no longer has the iterator -> you've seen the url 2+ times. It's not in the list, so it's in no danger of being thought unique - do nothing).

- JeffD December 07, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

... to answer your objection more specifically, you're talking about case 2). We don't have to iterate through the doubly-linked list to find our candidate, because removing an element from a linked list is O(1), if you know exactly where it is. (I don't know how it is in Java, I'm sure it's the same, but in C++ you can have an iterator (acts like a pointer) to a list element. Being doubly-linked, removal is constant-time, and you don't mess up any other iterators.) And, again, you never take the url's entry out of the hash table.

- JeffD December 07, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I think space complexity here is way too high.
1. Keeping LONG list of URL as a DLL.
2. Maintaining a HashMap.

Slightly better solution is to use 2 hashmaps.

1. unqiue_url
2. all_url

Traverse through URLS
if URL is present in all_url
remove url from unique_url
else
add url to unique_url
add url to all_url
end

Items left in unique_url will be unique

- Anonymous December 07, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

> Items left in unique_url will be unique

... but how will you know which was first? And, my doubly-linked list contains only unique items, it should take less space than another hash table.

- JeffD December 07, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

unqiue_url you can store url as key and value as the order of occurence 1, 2, 3, 4. or use a Map which keeps track of insertion order, but I guess that uses a DLL internally.

I did not see that you only store unique items in DLL.

Both solutions have same complexity, guess choice would be between what is easier to read and maintain

- Anonymous December 11, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

if you change the unique map to a vector (C++) or arraylist (Java), you could have the position in the unique_url list as the value of the all_url map and just remove it from the list right away. This solves the knowing which was first problem since lists preserve order.

- Anonymous January 03, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

I guess you can first create a hashmap of all the url whose key contains
i) no of occurrences of the url
ii) corresponding linked list node containing the url.

So whenever a new url is met, it is determined whether the url is contained in the hashmap.
i) If it is contained and the no of occurences is 1, then we increase the number of occurences by 1 and delete the corresponding linked list node.
ii)If it is not contained in the hash map, then we create a new node and connect it to the tail of the linked list. Then we update the hash map accordingly.
iii) If it is contained in the hash map, and the number of occurrences is more than 1, it means that the corresponding node has already been deleted, and we do not need to do anything.

The head of the linked list is gives the answer.

- Ranjan December 07, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

This should work but you need to use a doubly linked list and you need to remember a pointer to the node in the doubly linked list in the hash map as well. Once an url is found to be duplicate you increment the counter, delete the node and also make the pointer in the hash map null. Rest should work as you mentioned.

- Anonymous December 12, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

You use mixed combination of trie and hashing on the reversed URL.
Basically by reversing the URL and treating the domains in it as complex keys you reduce the number of collisions (host.dom1.dom2 -> dom2(trie).dom1(trie).host(hash))

- kartikaditya December 07, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

<pre lang="" line="1" title="CodeMonkey48234" class="run-this">Dictionary<string, int> urlCount = new Dictionary<string, int>();
Queue<string> uniqueUrls = new Queue<string>();

foreach(string url in File.ReadLine())
{
if(!urlCount.Contains(url))
{
urlCount.Add(url, 1);
uniqueUrls.Enqueue(url);
}
else
{
//count > 1
urlCount[url]++;
if(uniqueUrls.Peek() == url)
{
uniqueUrls.Dequeue();
}
}
}

return uniqueUrls.Dequeue();</pre><pre title="CodeMonkey48234" input="yes">
</pre>

- Anonymous December 09, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

<pre lang="" line="1" title="CodeMonkey11335" class="run-this">package career.cup.google;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.NoSuchElementException;
import java.util.Random;

/**
* @author Given a very long list of URLs, find the first URL which is
* unique ( occurred exactly once ). I gave a O(n) extra space and O(2n)
* time solution, but he was expecting O(n) time, one traversal.
*/
public class G94FirstUniqueURL {

/**
* @param url
* @returns 1st non duplicate URL, otherwise throws NoSuchElementFind exception...
*
* traverse the List check if the URL was find delete his
* correspondent index from the LinkedHashSet index keep that URL as duplicate....
* Will use a hash map to register the list traversal
* status, key = URL, Value = A list of indexes in the list,
* Will use A LinkedHashSet for the indexes which assume are not duplicates, and remove those which are...
* Used this data structure because will contain the first element which is unique at the end of the algorithm..
* If this hash set is empty throw No Such Element Find...
*/
public static String findFirstUniqueURL(ArrayList<String> url) {

HashMap<String, HashSet<Integer>> hm = new HashMap<String, HashSet<Integer>>();
LinkedHashSet<Integer> fIndex = new LinkedHashSet<Integer>();
Iterator<String> it = url.iterator();
int listIndex = -1;
while (it.hasNext()) {
String u = it.next();
listIndex++;
if (!hm.containsKey(u)) {
HashSet<Integer> idx = new HashSet<Integer>();
idx.add(listIndex);
hm.put(u, idx);
fIndex.add(listIndex);
} else {
hm.get(u).add(listIndex);
deleteDuplicateIndexes(hm.get(u), fIndex);
hm.put(u, new HashSet<Integer>());
}
}
if (fIndex.isEmpty())
throw new NoSuchElementException("No Unique URL");
return url.get(fIndex.iterator().next());
}

/**
* @param idx - duplicate indexes list
* @param in - indexes processed
*/
private static void deleteDuplicateIndexes(HashSet<Integer> idx,
LinkedHashSet<Integer> in) {
for (Integer i : idx) {
in.remove(i);
}
}

public static void main(String[] args) {
Random r = new Random();
ArrayList<String> url = new ArrayList<String>();
for (int i = 0; i < 50; i++) {
Integer v = r.nextInt(37);
url.add(v.toString());
System.out.printf("%d ", v);
}
System.out.println();
System.out.println(findFirstUniqueURL(url));

}
}

</pre><pre title="CodeMonkey11335" input="yes">
</pre>

- Anonymous December 10, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- hello world December 14, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Wouldnt the algo below work ?

public static int findUnique(Url[] arr) {
		HashMap<Url, Integer> map = new HashMap<Url, Integer>();
		int minKey = Integer.MIN_VALUE;
		for (int i = 0; i < arr.length; ++i) {
			Url num = arr[i];
			if (map.containsKey(num)) {
				int count = map.get(num);
				count++;
				map.put(num, count);
				if (num == minKey) {
					minKey = Integer.MIN_VALUE;
				}
			} else {
				if (minKey == Integer.MIN_VALUE) {
					minKey = num;
				}
				map.put(num, 1);
			}
		}

		return minKey;
	}

- anon January 12, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

import java.util.ArrayList;
import java.util.HashMap;

public class UniqueUrl {
	
	HashMap<String, Boolean> urlKeyMap;
	ArrayList<String> urlList;
		
	public static void main(String[] args) {
		UniqueUrl uniqueUrl = new UniqueUrl();
		String[] stringArr = {"aab","abc","abc","abd","abd"};
		String nonRepeatedUrl = uniqueUrl.getUniqueUrl(stringArr);
		if(nonRepeatedUrl == null){
			System.out.println("No such string");
		}else{
			System.out.println(nonRepeatedUrl);
		}
	}

	private String getUniqueUrl(String[] arr) {
		urlKeyMap = new HashMap<String, Boolean>();
		urlList = new ArrayList<String>();
		for(String str : arr){
			if(urlKeyMap.containsKey(str)){
				if(urlList.contains(str)){
					urlList.remove(str);
				}
			}else{
				urlKeyMap.put(str,true);
				urlList.add(str);
			}
		}
		if(urlList.size()==0){
			return null;
		}
		return urlList.get(0);
	}
}

- Anonymous January 12, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

import java.util.ArrayList;
import java.util.HashMap;

public class UniqueUrl {
	
	HashMap<String, Boolean> urlKeyMap;
	ArrayList<String> urlList;
		
	public static void main(String[] args) {
		UniqueUrl uniqueUrl = new UniqueUrl();
		String[] stringArr = {"aab","abc","abc","abd","abd"};
		String nonRepeatedUrl = uniqueUrl.getUniqueUrl(stringArr);
		if(nonRepeatedUrl == null){
			System.out.println("No such string");
		}else{
			System.out.println(nonRepeatedUrl);
		}
	}

	private String getUniqueUrl(String[] arr) {
		urlKeyMap = new HashMap<String, Boolean>();
		urlList = new ArrayList<String>();
		for(String str : arr){
			if(urlKeyMap.containsKey(str)){
				if(urlList.contains(str)){
					urlList.remove(str);
				}
			}else{
				urlKeyMap.put(str,true);
				urlList.add(str);
			}
		}
		if(urlList.size()==0){
			return null;
		}
		return urlList.get(0);
	}
}

- Anonymous January 12, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

public static String findFirstUnique(String [] list) {
		Set<String> seen = new HashSet<String>();
		Set<String> notSeen = new HashSet<String>();
		
		for (String s : list) {
			notSeen.add(s);
			
			if (!seen.contains(s)) {
				seen.add(s);
			} else {
				notSeen.remove(s);
			}
		}
		
		return notSeen.isEmpty() ? null : notSeen.iterator().next();
	}

- Yong February 01, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

This is the answer: \w*?foo\w*?\s(?!foo+)

- Sarah Ross February 14, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think you are on to something. This can all be done with Linux/Unix commands.

if the master list of urls are in a text file (master.txt) then

sort master.txt | uniq -c > occurances.txt

will give you a file of the occurances for each url.

Obviously, the url(s) with associated occurance of one is what you want.

If for some reason the urls are scattered in different files you can use grep

to pipe them into one file with something like

grep -r "http:*" *.* > master.txt

I don't know what the run time of unix sort is but why would anyone write

a C++ program when Unix/Linux can tell you with no effort.

- Anonymous February 28, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

You can traverse the list from the end to the begining, then the last unique one in this method is the first one and the problem will be easier.

- LiSheng April 04, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

ok. so create two has tables.
One will store dup urls. Other will store unique urls.

1. Read url. create hash.
2. Try to insert hash into dup url hash. If success remove it. else go to step 1. Ignore insertion failures in dup url hash table.
3. Insert into unique url hash. If hash exists error will be generated. Remove that key from uniqu url hash and insert it into dup url hash. Ignore insert failures for dup url hash table. go to step 1
3. The unique url hash table contains all the urls that are unique.

- maddy April 05, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think all of you missed that the guy wanted a real string algo... pretty hard core:

#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;

class node{
public:
    map<char, node*> children;
    char c;
    int index;
};


int find_first_unique(vector<string> & urls)
{
    int result_index = -1;
    node * root = new node;
    root->c = '*';
    bool repeat[urls.size()];
    if(urls.size() <= 0)
    {
        return -1;

    }

    for(int i = 0; i < urls.size(); i++)

      {

	string & url = urls[i];

	node * n = root;

	for(int j = 0; j < url.length(); j ++)

	  {

	    node * child;

	    if((child = n->children[url[j]]) == NULL)

	      {

		child = new node;

		child->c = url[j];

		n->children[child->c] = child;

		child->index = i;

	      }

	    n = child;

	  }

	if(n->index != i)

	  {

	    repeat[i]=true;

	    repeat[n->index]=true;

	  }

	else{

	  repeat[n->index] = false;

	}

      }

    for(int i =0; i < urls.size(); i ++)

      {

	if(repeat[i] == false)

	  {

	    return i;

	  }

      }
    return -2;
}


int main(int argc, char ** a)
{

  vector<string> v;
  v.push_back("AAAA");
  v.push_back("AAAA");
  v.push_back("AAAAC");
  v.push_back("BBBB");
  cout <<find_first_unique(v)<< endl;
  return 0;
}

- Anonymous June 25, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

This does the job in cleaner, shorter, faster and least memory possible

static String findUniqueUrl(List<String> urls) {

           /** preserves insertion order, so you can eventually tell the first unique url. 
                it provides constant-time performance for the basic operations (add, contains and remove),
                assuming the the hash function disperses elements properly among the buckets           
            **/
           Set<String> unique = new LinkedHashSet<>();

        for (String url : urls) {
            if (unique.contains(url)) {
              /** removes a url if duplicated,
              only requires space O(unique urls)  **/
                   unique.remove(url);
            } else {
                /** add url the first time its found **/
                unique.add(url); 
                     }

        }

       /** get the first found unique url **/
        Iterator<String> it = unique.iterator();
        String firstunique = null;
        while (it.hasNext()) {
            firstunique = it.next();
            break;
        }
        output(unique);
        return firstunique;

    }

- Ebenezer August 01, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Can try
1) adding all values in hash table O(n)
2)Perform linear search on hash table O(n)

- brainkillerme November 15, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Are you serious?? Interviewer explicitly mentioned tht its a long list. Hashtable is rejected in the question itslf indirectly

- Anonymous November 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

z11z

- Anonymous November 25, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Use min heap with url as data of heap and url_frequency as the heap comparator.
Top of the heap would give u an unique url if its frequency is exactly 1.

O(nlogn) time complexity, O(n) space complexity.

- Sid January 30, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Sorry, this doesnt work in O(nlogn).
When we see a duplicated url, to find the url and increase its url_frequency count is O(n) (Searching in heap is O(n))

Hence, the overall time complexity is O(n^2).

- Sid January 30, 2009 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

i think following code will work.

package psan;
import java.io.*;
import java.util.*;

public class UniqueUrlSearch

{
public static void main(String...s) throws Exception
{
BufferedReader b=new BufferedReader(new InputStreamReader(System.in));
LinkedList<String> ans=new LinkedList<String>();
System.out.println("Enter the list");
String list[]=b.readLine().split(" ");
int len=list.length,itr=-1;
int old=1,current=1,v=0;
String tmp="",yy="";
HashMap<String,String> hs=new HashMap<String,String>();
for(int i=0;i< len;i++)
{

if(!hs.containsKey(list[i]))
{
try{++itr;
yy=yy+itr;
tmp=tmp+yy+" "+"1";
System.out.println("in if tmp="+tmp);
hs.put(list[i],tmp);
ans.add(list[i]);
}catch(Exception e){e.printStackTrace();}
}
else
{
try{

System.out.println("in else:");
String ss=hs.get(list[i]);
String x[]=ss.split(" ");
//for(int j=0;j<x.length;j++) System.out.print(" "+x[j]);
int key=new Integer(x[1]);
int index=new Integer(x[0]);
key+=old;
tmp="";
tmp=tmp+index+" "+key;
hs.put(list[i],tmp);
//System.out.println("index="+index+" "+"key="+key);
if((index-v) >= 0)
ans.remove(index-v);
//System.out.println("size of linked list="+ans.size());
++v;
}catch(Exception e1){e1.printStackTrace();}

}
yy="";
tmp="";

}
System.out.println("ans="+ans);
}

}

- Anonymous December 08, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

package psan;
import java.io.*;
import java.util.*;

public class UniqueUrlSearch

{
public static void main(String...s) throws Exception
{
BufferedReader b=new BufferedReader(new InputStreamReader(System.in));
LinkedList<String> ans=new LinkedList<String>();
System.out.println("Enter the list");
String list[]=b.readLine().split(" ");
int len=list.length,itr=-1;
int old=1,current=1,v=0;
String tmp="",yy="";
HashMap<String,String> hs=new HashMap<String,String>();
for(int i=0;i< len;i++)
{

if(!hs.containsKey(list[i]))
{
try{++itr;
yy=yy+itr;
tmp=tmp+yy+" "+"1";
System.out.println("in if tmp="+tmp);
hs.put(list[i],tmp);
ans.add(list[i]);
}catch(Exception e){e.printStackTrace();}
}
else
{
try{

System.out.println("in else:");
String ss=hs.get(list[i]);
String x[]=ss.split(" ");
//for(int j=0;j<x.length;j++) System.out.print(" "+x[j]);
int key=new Integer(x[1]);
int index=new Integer(x[0]);
key+=old;
tmp="";
tmp=tmp+index+" "+key;
hs.put(list[i],tmp);
//System.out.println("index="+index+" "+"key="+key);
if((index-v) >= 0)
ans.remove(index-v);
//System.out.println("size of linked list="+ans.size());
++v;
}catch(Exception e1){e1.printStackTrace();}

}
yy="";
tmp="";

}
System.out.println("ans="+ans);
}

}

- psan December 08, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

-2

of 2 vote

I was asked this question by an interviewer. I felt that no algorithm can be faster than O(n^2). But, I saw the methods using the hash table. Though I get the idea to use the hash table, the question simply shifts from finding the first unique URL to finding the optimal hashing function. There can still be collisions and that can end up increasing the search in the unique list from O(1).

CareerCup

Google Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews