Write a script to compare two

Microsoft Interview Question for Software Engineer / Developers

Write a script to compare two files.
One is the primary file and the other is the secondary.
I need to check that the secondary file contains every line of the primary (possibly in a different order) and contains no extra data.

Example:
cat primary
abc
lmn
xyz

cat secondary
xyz
abc
lmn

In this case the compare function should return true.

Note: the files may contain anything, such as HTML or XML code.
- Abhi February 26, 2013 in India
Microsoft Software Engineer / Developer Perl

Country: India
Interview Type: Written Test

sort and diff might work...

- Anonymous February 27, 2013

Yes, this can be done simply:

sort primary >primary_cp
sort secondary >secondary_cp
cmp primary_cp secondary_cp

- mike February 28, 2013

I believe this approach works only if we need to compare two files for identical contents. But the problem statement says that whatever the secondary has should be present in the primary.

The three-line solution above will not work if the primary contains extra or repeated lines:

secondary   primary
abc         def
xyz         abc
def         xyz
            abc
            stu

Now if you sort these files:

secondary   primary
abc         abc
def         abc
xyz         def
            stu
            xyz

So in this case it won't work.
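
To see exactly which lines keep two such files from matching, one option is comm on sorted copies. The following is only a sketch under assumptions: unmatched_lines is a hypothetical helper name, mktemp is assumed to be available, and LC_ALL=C is set so that sort and comm agree on ordering. comm pairs sorted lines one to one, so a repeated line with no partner shows up as unmatched.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of the primary ($1) and secondary ($2)
# files that have no partner in the other file, counting repeated lines.
unmatched_lines() {
    um_a=$(mktemp) || return 2
    um_b=$(mktemp) || { rm -f "$um_a"; return 2; }
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort "$1" > "$um_a"
    LC_ALL=C sort "$2" > "$um_b"
    # -23 keeps lines found only in the first file, -13 only in the second.
    LC_ALL=C comm -23 "$um_a" "$um_b" | sed 's/^/primary only: /'
    LC_ALL=C comm -13 "$um_a" "$um_b" | sed 's/^/secondary only: /'
    rm -f "$um_a" "$um_b"
}
```

For the two files listed above, unmatched_lines primary secondary should print "primary only: abc" and "primary only: stu"; empty output means the files hold the same lines.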

- Gaurav Khurana March 05, 2013

Use a suffix tree.
Build a suffix tree for the first file, then search for every string from the second file. If one is not found, return.

- chiragtayal February 26, 2013

Instead of building a suffix tree, we can just use a consistent hash function. We compute the hash code of each line and compare it against a map populated with the hash codes of the lines from the other file. This takes less space, O(number of lines), and O(n) time.

- Hinax February 26, 2013

@Hinax: how do you plan on finding out the hash key for a line?

- alex March 07, 2013

Main File:
abc
abc

Secondary File:
abc

your algorithm will give true, yet it should give false!
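
One way around this is to count occurrences instead of only recording presence. The sketch below is an assumption-laden illustration, not anyone's posted solution: same_line_counts is a hypothetical helper, and it uses an awk array keyed by the full line text rather than by a hash code, so collisions are not a concern.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the primary ($1) and secondary ($2)
# files hold the same lines the same number of times, in any order.
# Assumes the primary path contains no backslashes (awk -v interprets escapes).
same_line_counts() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    awk -v primary="$1" '
        BEGIN {
            # Count how many times each line occurs in the primary file.
            while ((getline line < primary) > 0) count[line]++
            close(primary)
        }
        {
            # Each secondary line uses up one occurrence from the primary.
            count[$0]--
            if (count[$0] < 0) { mismatch = 1; exit }
        }
        END {
            if (mismatch) exit 1
            # Anything left over was in the primary but not in the secondary.
            for (line in count) if (count[line] != 0) exit 1
        }
    ' "$2"
}
```

With the main file holding abc twice and the secondary holding it once, one occurrence is left over, so the function returns a non-zero status.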

- Kevin March 19, 2013

Build a trie of all the lines in file A, and before starting to match lines from file B, make sure both files have the same number of lines.
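
The line-count precheck could be sketched as below; line_count and same_line_total are hypothetical helper names. Note that equal counts plus membership is still not sufficient when lines repeat: A = abc, abc, xyz and B = abc, xyz, xyz have the same number of lines and every line of B occurs in A, yet the files differ.

```shell
#!/bin/sh
# Hypothetical helper: print the number of lines in a file, including a final
# line that lacks a trailing newline (which wc -l would not count).
line_count() {
    awk 'END { print NR }' "$1"
}

# Hypothetical helper: succeed when both files have the same number of lines.
same_line_total() {
    [ "$(line_count "$1")" -eq "$(line_count "$2")" ]
}
```

Running same_line_total first is cheap and rejects files of different lengths before any per-line matching starts.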

- Epic_coder May 04, 2013

1. Read one line at a time from file called "primary"
2. Grep for the line in the file called "secondary"
3. Print line if missing.

#! /bin/bash
missing=0
# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line
do
grep "$line" secondary > /dev/null 2>&1
if [ $? -ne 0 ]
then
let missing=$missing+1
echo "Missing line is: \"$line\"."
fi
done < primary
echo
if [ $missing -ne 0 ]
then
echo "$missing lines are missing in \"secondary\""
else
echo "Nothing missing"
fi

- Nitin February 26, 2013

I think grep will not help, because the secondary file might have "abc" on its first line while the primary file has "xyzabc".

If you grep for abc in the primary file, it will return true because abc appears inside xyzabc.

In this case the files are not the same; we need to check that the file does not contain anything extra.
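
The substring problem can be avoided with grep's standard -F (fixed string) and -x (whole line) options. A minimal sketch, where has_exact_line is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file ($1) contains a line that is
# exactly equal to the given text ($2).
has_exact_line() {
    # -F: treat the text literally, not as a regular expression
    # -x: the whole line must match, not just part of it
    # -q: print nothing, report the result through the exit status
    grep -Fxq -e "$2" -- "$1"
}
```

With a primary file whose only line is xyzabc, has_exact_line primary abc fails while has_exact_line primary xyzabc succeeds. This fixes the substring match only; repeated lines still need a count-based check.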

- Gaurav Khurana March 05, 2013

Create a dictionary<int,bool>.
Read every line from the secondary file, compute its hash code, and add it to the dictionary.
Then read the primary file line by line and compute each line's hash code:
if(dict.containskey(hashcode)) dict[hashcode] = true;

Finally, check whether the dictionary still has any false value.

- zhengliangjun February 27, 2013

/home/gkhurana> cat > primary
abc
lmn
xyz
/home/gkhurana> cat > secondary
xyz
abc
lmn

/home/gkhurana> perl
# The logic is: we read the first line of secondary and compare it with each line of primary.
# If there is a match, that line exists in primary.
# Similarly, we proceed for each line of secondary.
#
# Suppose the line is not there in primary, then it will se
#
#

open fh,"secondary";
$count=1;
foreach $sec (<fh>)
{
open fh1,"primary";

$found = 1;
print "\nsecondary line=$sec";
print "\nStarting Matching in primary file\n";
foreach $pr (<fh1>)
{
print "primary line=$pr";
if($sec eq $pr) #checking if secondary line is present in primary
{
print "Match found for line $count in primary\n";
$found = 0;
}
}
close(fh1);

if($found eq '1') #if a line is in secondary and not found in primary just exit, no need to check further
{
print "\n There is content that is present in secondary but not in primary\n.";
exit(0);
}
$count++;
}
close(fh);

if($found eq '0')
{
print "\n Whatever is there in secondary is present in primary.\n ";

}
^D
secondary line=xyz

Starting Matching in primary file
primary line=abc
primary line=lmn
primary line=xyz
Match found for line 1 in primary

secondary line=abc

Starting Matching in primary file
primary line=abc
Match found for line 2 in primary
primary line=lmn
primary line=xyz

secondary line=lmn

Starting Matching in primary file
primary line=abc
primary line=lmn
Match found for line 3 in primary
primary line=xyz

-------------------------
When the files don't match: we have added def to the secondary file, which is not in the primary.

/home/gkhurana> cat > secondary
abc
lmn
xyz
def
/home/gkhurana>

secondary line=abc

Starting Matching in primary file
primary line=abc
Match found for line 1 in primary
primary line=lmn
primary line=xyz

secondary line=lmn

Starting Matching in primary file
primary line=abc
primary line=lmn
Match found for line 2 in primary
primary line=xyz

secondary line=xyz

Starting Matching in primary file
primary line=abc
primary line=lmn
primary line=xyz
Match found for line 3 in primary

secondary line=def

Starting Matching in primary file
primary line=abc
primary line=lmn
primary line=xyz

There is content that is present in secondary but not in primary

/home/gkhurana>

I hope it helps.

- Gaurav Khurana March 05, 2013

Create a hash table keyed on the first 4 words (or fewer) of each line in the primary.
Now read lines from the secondary; if the key exists, match the rest of the line.
O((n + m) * l), where l is the length of a line and n and m are the numbers of lines.

- shr March 08, 2013

Here is the code for the file comparison

#! /usr/bin/perl
use strict;
use warnings;

print("Enter Primary File path\n");
my $p_file = <STDIN>;
chomp($p_file);
if(!-e $p_file){
print("Input Primary file does not exist\n");
exit 1;   # stop here instead of failing later in open()
}

print("Enter Secondary File path\n");
my $s_file = <STDIN>;
chomp($s_file);
if(!-e $s_file){
print("Input Secondary File does not exist\n");
exit 1;
}

open (FILE1, $p_file) || die ("Can't open file $p_file for reading\n");
open (FILE2, $s_file) || die ("Can't open file $s_file for reading\n");

my @file1 = <FILE1>;
my @file2 = <FILE2>;

foreach my $line (@file2) {
if(grep { $_ eq $line } @file1) {   # exact line comparison, not a regex or substring match
next;
} else {
print("Files are different.\n");
exit;
}
}

foreach my $line (@file1) {
if(grep { $_ eq $line } @file2) {   # exact line comparison, not a regex or substring match
next;
} else {
print("Files are different.\n");
exit;
}
}

print("Both the input files are same\n");
close (FILE1) || die ("Can't close file $p_file for reading\n");
close (FILE2) || die ("Can't close file $s_file for reading\n");

- Nakul May 18, 2013

#!/usr/bin/perl
# author: fructu
# v0.01
#
# description:
#
#  1) I use read_file_primary_to_hash to load the primary file
#     into a hash; if a line is repeated, the hash holds
#     the number of repetitions.
#
#  2) I use read_file_secondary_to_hash to read the secondary file.
#     For each line, I look it up in the hash and check that the entry
#     is defined; if not -> return an error.
#     If the entry is defined, I check that its count is greater than 0
#     and subtract 1 to match one line of the primary file.
#     Then, in the same function, I check that all hash entries are 0;
#     if not, I return 0.
#
#

use warnings;
use strict;

sub read_file_primary_to_hash($ $)
{
  my $file_name = shift;
  my $hash      = shift;

  open file_in, "<$file_name" or die "$!\n";

  while(<file_in>){
    chomp();
    $hash->{$_}++;
  }

  close file_in;
}

sub read_file_secondary_to_hash($ $)
{
  my $file_name = shift;
  my $hash      = shift;

  open file_in, "<$file_name" or die "$!\n";

  while(<file_in>){
    chomp();
    if(defined $hash->{$_}){
      if($hash->{$_} <= 0){
        print "line not found [$_]\n";
        return 0;
      }
      $hash->{$_}--;
    }else{
      print "line not found [$_]\n";
      return 0;
    }
  }

  foreach my $k (keys %$hash){
    if( 0 != $hash->{$k} ){
      print "line count bad [$k] : [$hash->{$k}]\n"; 
      return 0;
    }
  }

  close file_in;
  return 1;
}

sub main
{
  my $h1 = {};
  my $result = 0;

  read_file_primary_to_hash('e_2_primary.txt', $h1);
  $result = read_file_secondary_to_hash('e_2_secundary.txt', $h1);

  if(1 == $result){
    print "the files have the same lines\n";
  }else{
    print "the files do not have the same lines\n";
  }
}

main();

This script can be improved using a function to generate a hash like

use Digest::SHA qw(sha1 sha1_hex sha1_base64 ...);

$digest = sha1($data);

and store it in the hash instead of the whole line.

- fructu July 21, 2013

I don't know why everyone is making it so complicated. It's just a simple shell script:

sort primary_file >primary_file.s
sort secondary_file >secondary_file.s
# -3 drops the common lines, leaving lines unique to either file (missing or extra)
ret=`comm -3 primary_file.s secondary_file.s |wc -l`
if [ $ret -ne 0 ]
then
echo -1
exit
fi
echo 0
exit

- Aditya Kulkarni December 19, 2013

I tried this and worked for me, please review:

use strict;
use warnings;

use Tie::File;

my @array;
tie @array, 'Tie::File',"input.txt" or die "Not able to tie file: $! \n";

print join ("\n",@array), "\n";

my @array2;
tie @array2, 'Tie::File', "output.txt" or die "Not able to read output file : $! \n";

foreach my $element (@array2)
{
if (grep { $_ eq $element } @array)   # explicit comparison; smartmatch (~~) is deprecated in recent Perl
{
print "found $element \n";
}
else
{
print "Element $element not found \n";
}
}

- jaing February 18, 2014

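# Read whole lines; a for loop over `cat a.txt` would split on every word
while IFS= read -r line
do
# -F: literal text, -x: whole-line match, -q: quiet
if ! grep -Fxq -e "$line" b.txt; then
echo "LINE $line does not exist in the secondary file"
exit
fi
done < a.txt
echo "Each line in the primary file exists in the secondary file"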

- Sammy April 29, 2014

my %content=();
open(FILE,"<temp.txt");
while( my $line = <FILE>)
{
chomp($line);
$content{$line}=$line;

}
close(FILE);
open(FILE,"<temp1.txt");
$flag=1;
while( my $line = <FILE>)
{
chomp($line);

if (!exists($content{$line}))
{
$flag=0;
}
}

if ($flag==1){
print "temp1 is secondary of temp\n";}

- sudarson May 02, 2014

Read the contents of the two files into two arrays. After that, the logic below works.

#!/usr/bin/perl -w
use strict;

my @prim_array = qw(Cat Dog Snake Tiger);
my @sec_array = qw(Squirrel Cat Dog Snake);
my %hash;
my %hash1;
foreach (@prim_array)
{
	$hash{$_} = undef;
}

foreach (@sec_array)
{
	$hash1{$_} = undef;
}

my $flag = 0;
foreach my $key (sort keys %hash)
{
	if(!(exists($hash1{$key})))
	{
		$flag = 1;
		last;
	}
}

if($flag == 1)
{
	print "Secondary file does not contain all elements of primary file\n";
}
else
{
	print "Secondary file contains all elements of primary file\n";

}

- lochan.brijesh March 02, 2015

use strict;
use warnings;

open(FH1, "primary.txt");
open(FH2, "secondary.txt");



my @arr1=<FH1>;
my @arr2=<FH2>;

close(FH1);
close(FH2);

my $result=comp(\@arr1, \@arr2);

if($result) {
        print "pass\n";
}
else {
        print "fail\n";
}
sub comp {
        my $ref1=shift;
        my $ref2=shift;
        my %hash;
        foreach(@{$ref1}) {
                $hash{$_}=1;##We are not interested in value. Our only concern is key
        }
        foreach(@{$ref2}) {
                if(!exists($hash{$_})) {
                        return 0;
                }
        }
        return 1;
}

- Sughosh Divanji July 19, 2015