Microsoft Interview Question
Software Engineer / Developers, Country: India
Interview Type: Written Test
Yes, this can be done simply like:
sort primary > primary_cp
sort secondary > secondary_cp
cmp primary_cp secondary_cp
I believe this approach works only if we need to check that the two files have the same contents. But the problem statement here says that whatever secondary has should be present in primary.
The above 3-line solution will not work if primary contains extra lines or repeated values:
secondary   primary
abc         def
xyz         abc
def         xyz
            abc
            stu
Now, if you sort these files:
secondary   primary
abc         abc
def         abc
xyz         def
            stu
            xyz
So in this case it won't work: cmp reports a difference even though every line of secondary is present in primary.
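The sort idea can still be used for the subset check if, instead of cmp, the two sorted lists are walked together, consuming one primary line for each secondary line. A minimal Perl sketch, assuming both files have already been read into arrays of chomped lines and that a repeated secondary line needs the same number of copies in primary (the sub name sorted_contains and the sample data are illustrative, not from the thread):

```perl
use strict;
use warnings;

# Returns 1 if every line of @$secondary occurs in @$primary at least as
# many times as it occurs in @$secondary (multiset containment), else 0.
# Both lists are sorted first, then walked once with two indices.
sub sorted_contains {
    my ($primary, $secondary) = @_;
    my @p = sort @$primary;
    my @s = sort @$secondary;
    my ($i, $j) = (0, 0);
    while ($j < @s) {
        # Skip primary lines that sort before the current secondary line.
        $i++ while $i < @p && $p[$i] lt $s[$j];
        # Ran out of primary lines, or the next one is not a match.
        return 0 if $i >= @p || $p[$i] ne $s[$j];
        # Consume one matching primary line for this secondary line.
        $i++;
        $j++;
    }
    return 1;
}

# The example from this comment: secondary is contained in primary.
my $ok = sorted_contains([qw(def abc xyz abc stu)], [qw(abc xyz def)]);
print $ok ? "contained\n" : "missing lines\n";
```

Sorting dominates, so this is O(n log n) time; the walk itself is linear.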
Use a suffix tree:
build a suffix tree for the first file, then search for every string from the second file; if one is not found, return false.
Instead of building a suffix tree, we can just use a consistent hash function: compute the hash code of each line of one file and store it in a map, then compute the hash code of each line of the other file and look it up in that map. This takes less space, O(number of lines), and O(n) time.
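In Perl the lines themselves can serve as hash keys, so the built-in hash table does the hashing, and a collision can never make two different lines look equal (which can happen if only the hash codes are compared). A minimal sketch of the membership check; hash_contains is an illustrative name, and like the approach above it ignores how many times a line repeats:

```perl
use strict;
use warnings;

# Hash-set membership: returns 1 if every secondary line appears somewhere
# in primary (repeat counts are ignored), else 0. Perl hashes are hash
# tables, so each lookup is expected O(1) and the whole check expected O(n).
sub hash_contains {
    my ($primary, $secondary) = @_;
    my %seen;
    $seen{$_} = 1 for @$primary;    # one key per distinct primary line
    for my $line (@$secondary) {
        return 0 unless exists $seen{$line};
    }
    return 1;
}

my $ok = hash_contains([qw(abc def xyz)], [qw(xyz abc)]);
print $ok ? "contained\n" : "missing lines\n";
```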
Main File:
abc
abc
Secondary File:
abc
Your algorithm will give true, yet it should give false!
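Counting occurrences instead of only recording them handles this case. A minimal sketch, assuming the requirement is that both files hold the same lines with the same repeat counts, in any order (same_lines_with_counts is an illustrative name):

```perl
use strict;
use warnings;

# Multiset equality: 1 if both lists contain the same lines with the same
# number of repetitions (order ignored), else 0.
sub same_lines_with_counts {
    my ($primary, $secondary) = @_;
    my %count;
    $count{$_}++ for @$primary;    # +1 for each primary occurrence
    $count{$_}-- for @$secondary;  # -1 for each secondary occurrence
    # Every line's count must cancel out exactly.
    for my $n (values %count) {
        return 0 if $n != 0;
    }
    return 1;
}

# Main file "abc abc" vs secondary file "abc": should be false.
my $same = same_lines_with_counts([qw(abc abc)], [qw(abc)]);
print $same ? "same lines\n" : "different\n";
```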
1. Read one line at a time from file called "primary"
2. Grep for the line in the file called "secondary"
3. Print line if missing.
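#!/bin/bash
missing=0
# IFS= and -r keep each line verbatim; the [ -n ] test also handles a
# last line that has no trailing newline.
while IFS= read -r line || [ -n "$line" ]
do
    # -q: no output, -x: match the whole line, -F: fixed string (not a regex),
    # so "abc" does not match "xyzabc" and special characters are safe.
    if ! grep -qxF -- "$line" secondary
    then
        missing=$((missing + 1))
        echo "Missing line is: \"$line\"."
    fi
done < primary
echo
if [ "$missing" -ne 0 ]
then
    echo "$missing line(s) missing in \"secondary\""
else
    echo "Nothing missing"
fi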
I think grep will not help, because your file (secondary) might have "abc" on its first line while the primary file has "xyzabc".
If you grep for abc in the primary file, it will return true, because abc appears inside xyzabc.
In this case the files are not the same; we need to check that the file does not contain anything extra.
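Comparing whole lines with eq instead of a pattern match avoids the substring problem. A small Perl sketch (line_in is an illustrative helper) contrasting the two on the example above:

```perl
use strict;
use warnings;

# Exact whole-line membership test: compares with eq, so "abc" does not
# match "xyzabc" the way an unanchored grep or regex match would.
sub line_in {
    my ($line, @lines) = @_;
    return scalar grep { $_ eq $line } @lines;
}

my @primary = ('xyzabc');
print "regex match: ", ((grep { /abc/ } @primary) ? "yes" : "no"), "\n";  # yes
print "eq match:    ", (line_in('abc', @primary) ? "yes" : "no"), "\n";   # no
```

If a regex is preferred, anchoring and quoting it, as in /^\Q$line\E\z/, gives the same exact-line behavior for chomped lines.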
/home/gkhurana> cat > primary
abc
lmn
xyz
/home/gkhurana> cat > secondary
xyz
abc
lmn
/home/gkhurana> perl
# The logic: read the first line of secondary and compare it with each line of primary.
# If there is a match, that line exists in primary.
# Proceed the same way for each remaining line of secondary.
#
# If a line is not found in primary, the script reports it and exits
# without checking the rest.
#
open fh, "secondary" or die "Can't open secondary: $!\n";
$count = 1;
foreach $sec (<fh>)
{
    open fh1, "primary" or die "Can't open primary: $!\n";
    $found = 1;
    print "\nsecondary line=$sec";
    print "\nStarting Matching in primary file\n";
    foreach $pr (<fh1>)
    {
        print "primary line=$pr";
        if ($sec eq $pr) # check whether the secondary line is present in primary
        {
            print "Match found for line $count in primary\n";
            $found = 0;
        }
    }
    close(fh1);
    if ($found eq '1') # a secondary line is missing from primary: no need to check further
    {
        print "\n There is content that is present in secondary but not in primary.\n";
        exit(0);
    }
    $count++;
}
close(fh);
if ($found eq '0')
{
    print "\n Whatever is there in secondary is present in primary.\n";
}
^D
secondary line=xyz
Starting Matching in primary file
primary line=abc
primary line=lmn
primary line=xyz
Match found for line 1 in primary
secondary line=abc
Starting Matching in primary file
primary line=abc
Match found for line 2 in primary
primary line=lmn
primary line=xyz
secondary line=lmn
Starting Matching in primary file
primary line=abc
primary line=lmn
Match found for line 3 in primary
primary line=xyz
-------------------------
When the files don't match: here we have added def to the secondary file, which is not in primary.
/home/gkhurana> cat > secondary
abc
lmn
xyz
def
/home/gkhurana>
secondary line=abc
Starting Matching in primary file
primary line=abc
Match found for line 1 in primary
primary line=lmn
primary line=xyz
secondary line=lmn
Starting Matching in primary file
primary line=abc
primary line=lmn
Match found for line 2 in primary
primary line=xyz
secondary line=xyz
Starting Matching in primary file
primary line=abc
primary line=lmn
primary line=xyz
Match found for line 3 in primary
secondary line=def
Starting Matching in primary file
primary line=abc
primary line=lmn
primary line=xyz
There is content that is present in secondary but not in primary
/home/gkhurana>
I hope it helps.
Here is the code for the file comparison:
#!/usr/bin/perl
use strict;
use warnings;
print("Enter Primary File path\n");
my $p_file = <STDIN>;
chomp($p_file);
die("Input Primary file does not exist\n") unless -e $p_file;
print("Enter Secondary File path\n");
my $s_file = <STDIN>;
chomp($s_file);
die("Input Secondary File does not exist\n") unless -e $s_file;
open(my $fh1, '<', $p_file) or die("Can't open file $p_file for reading: $!\n");
open(my $fh2, '<', $s_file) or die("Can't open file $s_file for reading: $!\n");
chomp(my @file1 = <$fh1>);
chomp(my @file2 = <$fh2>);
close($fh1);
close($fh2);
# Compare whole lines with eq: grep(/$line/, ...) would do a substring regex
# match, so "abc" would match "xyzabc" and regex metacharacters would break it.
# Note: this checks set equality, so repeat counts are ignored.
foreach my $line (@file2) {
    unless (grep { $_ eq $line } @file1) {
        print("Files are different.\n");
        exit;
    }
}
foreach my $line (@file1) {
    unless (grep { $_ eq $line } @file2) {
        print("Files are different.\n");
        exit;
    }
}
print("Both input files have the same lines\n");
#!/usr/bin/perl
# author: fructu
# v0.01
#
# description:
#
# 1) I use read_file_primary_to_hash to load the primary file
#    into a hash; if a line is repeated, the hash holds
#    the number of repetitions.
#
# 2) I use read_file_secondary_to_hash to read the secondary file.
#    For each line I search for it in the hash and check whether the entry
#    is defined; if not -> return error (0).
#    If the entry is defined, I check that it is greater than 0
#    and subtract 1, matching it against one line of the primary file.
#    Then, in the same function, I check that all hash entries are 0;
#    if not, I return 0.
#
use warnings;
use strict;
sub read_file_primary_to_hash
{
    my ($file_name, $hash) = @_;
    open(my $file_in, '<', $file_name) or die "$file_name: $!\n";
    while (<$file_in>) {
        chomp();
        $hash->{$_}++;
    }
    close($file_in);
}
sub read_file_secondary_to_hash
{
    my ($file_name, $hash) = @_;
    open(my $file_in, '<', $file_name) or die "$file_name: $!\n";
    while (<$file_in>) {
        chomp();
        # Unknown line, or more copies in secondary than in primary.
        if (!defined $hash->{$_} || $hash->{$_} <= 0) {
            print "line not found [$_]\n";
            close($file_in);
            return 0;
        }
        $hash->{$_}--;
    }
    close($file_in);
    # Any non-zero count means primary has lines secondary lacks.
    foreach my $k (keys %$hash) {
        if (0 != $hash->{$k}) {
            print "line count bad [$k] : [$hash->{$k}]\n";
            return 0;
        }
    }
    return 1;
}
sub main
{
    my $h1 = {};
    read_file_primary_to_hash('e_2_primary.txt', $h1);
    my $result = read_file_secondary_to_hash('e_2_secundary.txt', $h1);
    if (1 == $result) {
        print "the files have the same lines\n";
    } else {
        print "the files do not have the same lines\n";
    }
}
main();
This script could be improved by using a function that generates a digest of each line, such as:
use Digest::SHA qw(sha1 sha1_hex sha1_base64 ...);
$digest = sha1($data);
and storing that digest in the hash instead of the whole line.
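A sketch of that idea applied to the counting approach, assuming the core Digest::SHA module and its sha1_hex function. same_lines_by_digest is an illustrative name, and the lines are assumed to be byte strings (files read without an :encoding layer):

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Multiset equality keyed by SHA-1 hex digests instead of the raw lines:
# every key is a fixed 40-character string, however long the line is.
# Assumes byte strings; encode decoded text (e.g. to UTF-8) before hashing.
sub same_lines_by_digest {
    my ($primary, $secondary) = @_;
    my %count;
    $count{ sha1_hex($_) }++ for @$primary;
    $count{ sha1_hex($_) }-- for @$secondary;
    # Equal only if every digest's count cancels out to zero.
    my $nonzero = grep { $_ != 0 } values %count;
    return $nonzero ? 0 : 1;
}

my $same = same_lines_by_digest([qw(abc xyz)], [qw(xyz abc)]);
print $same ? "the files have the same lines\n" : "the files do not have the same lines\n";
```

This trades a little certainty for memory: two different lines with the same SHA-1 digest would be counted as equal, which is extremely unlikely for ordinary text.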
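I tried this and it worked for me; please review:
use strict;
use warnings;
use Tie::File;
use Fcntl 'O_RDONLY';
# Tie each file to an array (one element per line, newlines removed).
# O_RDONLY stops Tie::File from creating or modifying the files.
my @array;
tie @array, 'Tie::File', "input.txt", mode => O_RDONLY or die "Not able to tie file: $!\n";
print join("\n", @array), "\n";
my @array2;
tie @array2, 'Tie::File', "output.txt", mode => O_RDONLY or die "Not able to read output file: $!\n";
foreach my $element (@array2)
{
    # Exact string comparison; smartmatch (~~) is experimental and deprecated.
    if (grep { $_ eq $element } @array)
    {
        print "found $element\n";
    }
    else
    {
        print "Element $element not found\n";
    }
}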
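use strict;
use warnings;
# Store every line of temp.txt as a hash key.
my %content = ();
open(my $fh, '<', "temp.txt") or die "Can't open temp.txt: $!\n";
while (my $line = <$fh>)
{
    chomp($line);
    $content{$line} = 1;
}
close($fh);
# Check that every line of temp1.txt is one of those keys.
open($fh, '<', "temp1.txt") or die "Can't open temp1.txt: $!\n";
my $flag = 1;
while (my $line = <$fh>)
{
    chomp($line);
    if (!exists($content{$line}))
    {
        $flag = 0;
        last;
    }
}
close($fh);
if ($flag == 1) {
    print "every line of temp1 is present in temp\n";
} else {
    print "temp1 has lines that are not in temp\n";
}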
Please read the contents of the two files into two arrays. After that, the logic below will work.
#!/usr/bin/perl -w
use strict;
my @prim_array = qw(Cat Dog Snake Tiger);
my @sec_array = qw(Squirrel Cat Dog Snake);
my %hash;
my %hash1;
foreach (@prim_array)
{
    $hash{$_} = undef;
}
foreach (@sec_array)
{
    $hash1{$_} = undef;
}
my $flag = 0;
foreach my $key (sort keys %hash)
{
    if (!(exists($hash1{$key})))
    {
        $flag = 1;
        last;
    }
}
if ($flag == 1)
{
    print "Secondary file does not contain all elements of primary file\n";
}
else
{
    print "Secondary file contains all elements of primary file\n";
}
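For reading the files into the two arrays, a small helper along these lines could be used. read_lines is an illustrative name, and the commented usage assumes files called primary.txt and secondary.txt:

```perl
use strict;
use warnings;

# Read a file into a list of lines with trailing newlines removed, so the
# last line compares equal whether or not the file ends with "\n".
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    chomp(my @lines = <$fh>);
    close($fh);
    return @lines;
}

# Hypothetical usage with the file names used elsewhere in this thread:
# my @prim_array = read_lines('primary.txt');
# my @sec_array  = read_lines('secondary.txt');
```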
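use strict;
use warnings;
open(my $fh1, '<', "primary.txt") or die "Can't open primary.txt: $!\n";
open(my $fh2, '<', "secondary.txt") or die "Can't open secondary.txt: $!\n";
# chomp so the last line matches even if one file lacks a trailing newline.
chomp(my @arr1 = <$fh1>);
chomp(my @arr2 = <$fh2>);
close($fh1);
close($fh2);
my $result = comp(\@arr1, \@arr2);
if ($result) {
    print "pass\n";
}
else {
    print "fail\n";
}
# Returns 1 if every element of @$ref2 is also in @$ref1, else 0.
sub comp {
    my $ref1 = shift;
    my $ref2 = shift;
    my %hash;
    foreach (@{$ref1}) {
        $hash{$_} = 1; ## We are not interested in the value; our only concern is the key
    }
    foreach (@{$ref2}) {
        if (!exists($hash{$_})) {
            return 0;
        }
    }
    return 1;
}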
Sort and diff might work...
- Anonymous February 27, 2013