Informatica Interview Question for Java Developers
Create a corpus reader and tokenizer.
- fgfsdgs November 06, 2014 in United States
Write a program that creates an inverted index for this corpus and supports free-text
search, e.g. [dor cabeça], [efeitos adversos]
a) Use the index structure and contents that you consider most suitable/relevant;
b) Use a default list of stopwords and accept a text file with stopwords as an optional
argument; add an option to disable the stopword filter;
c) Use the Porter stemmer by default, but allow it to be disabled;
d) Add the option to write / read the index to / from a text file.
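One possible shape for task 1 (a to d), as a rough sketch rather than a reference solution. The class name `InvertedIndex`, its methods, and the tab-separated file layout are all made up for illustration. The stemmer is passed in as a function: a real Porter stemmer implementation would be plugged in there (none is included here), and passing the identity function disables stemming. Passing an empty stopword set disables the stopword filter.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;
import java.util.function.UnaryOperator;

// Hypothetical sketch: term -> (docId -> term frequency).
class InvertedIndex {
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();
    private final Set<String> stopwords;          // an empty set disables the filter
    private final UnaryOperator<String> stemmer;  // identity disables stemming

    InvertedIndex(Set<String> stopwords, UnaryOperator<String> stemmer) {
        this.stopwords = stopwords;
        this.stemmer = stemmer;
    }

    // Lowercase, split on anything that is not a letter or digit, filter, stem.
    List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty() || stopwords.contains(raw)) continue;
            terms.add(stemmer.apply(raw));
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Ids of documents that contain every query term (boolean AND).
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Map.of()).keySet();
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Assumed text format, one line per term: term<TAB>docId:tf docId:tf ...
    void save(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (var e : postings.entrySet()) {
            StringJoiner sj = new StringJoiner(" ");
            e.getValue().forEach((doc, tf) -> sj.add(doc + ":" + tf));
            lines.add(e.getKey() + "\t" + sj);
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Replaces the current contents with the index stored in the file.
    void load(Path file) throws IOException {
        postings.clear();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t");
            Map<Integer, Integer> docs = new TreeMap<>();
            for (String p : parts[1].split(" ")) {
                String[] kv = p.split(":");
                docs.put(Integer.parseInt(kv[0]), Integer.parseInt(kv[1]));
            }
            postings.put(parts[0], docs);
        }
    }
}
```

Storing term frequencies in the postings (instead of bare document ids) costs little and lets the same index feed the tf-idf ranking asked for in task 2.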
2. Implement a ranked retrieval method based on the vector-space model and using the tf-idf
weighting scheme.
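A minimal sketch of what task 2 might look like, assuming one common tf-idf variant: weight = (1 + log10 tf) * log10(N / df), with documents ranked by cosine similarity to the query vector. The question does not fix the exact weighting, so this choice, along with the class name `TfIdfRanker` and its methods, is an assumption. Input terms are expected to be already tokenized (and stemmed, if enabled).

```java
import java.util.*;

// Hypothetical sketch of tf-idf ranked retrieval with cosine similarity.
class TfIdfRanker {
    private final Map<Integer, Map<String, Integer>> docTf = new HashMap<>(); // docId -> term -> tf
    private final Map<String, Integer> df = new HashMap<>();                   // term -> document frequency

    void add(int docId, List<String> terms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : terms) tf.merge(t, 1, Integer::sum);
        for (String t : tf.keySet()) df.merge(t, 1, Integer::sum);
        docTf.put(docId, tf);
    }

    // Weight = (1 + log10 tf) * log10(N / df); terms unseen in the corpus are dropped.
    private Map<String, Double> weigh(Map<String, Integer> tf) {
        Map<String, Double> w = new HashMap<>();
        int n = docTf.size();
        tf.forEach((t, f) -> {
            Integer d = df.get(t);
            if (d != null) w.put(t, (1 + Math.log10(f)) * Math.log10((double) n / d));
        });
        return w;
    }

    private static double norm(Map<String, Double> v) {
        double s = 0;
        for (double x : v.values()) s += x * x;
        return Math.sqrt(s);
    }

    // Document ids by descending cosine similarity; documents scoring zero are omitted.
    List<Integer> rank(List<String> queryTerms) {
        Map<String, Integer> qtf = new HashMap<>();
        for (String t : queryTerms) qtf.merge(t, 1, Integer::sum);
        Map<String, Double> q = weigh(qtf);
        double qn = norm(q);
        Map<Integer, Double> scores = new HashMap<>();
        for (var e : docTf.entrySet()) {
            Map<String, Double> d = weigh(e.getValue());
            double dot = 0;
            for (var qe : q.entrySet()) dot += qe.getValue() * d.getOrDefault(qe.getKey(), 0.0);
            double dn = norm(d);
            if (dot > 0 && qn > 0 && dn > 0) scores.put(e.getKey(), dot / (qn * dn));
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ranked;
    }
}
```

This version scores every document on every query, which keeps the sketch short. With a real corpus it would be more usual to walk only the postings lists of the query terms and to precompute document norms at indexing time.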
2.1. Use the queries and the list of relevant documents to evaluate your implementation (note:
these will be available later). Calculate and report the average precision for each query and
the mean average precision (MAP) over all queries.
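For the evaluation step, average precision for one query is the sum of precision@k over every rank k at which a relevant document appears, divided by the total number of relevant documents for that query. MAP is the mean of that value over all queries. A small sketch follows; the class and method names are illustrative, and it assumes relevance judgments arrive as sets of document ids.

```java
import java.util.*;

// Hypothetical helper for the evaluation in 2.1.
class Evaluation {
    // Sum of precision@k at each rank k holding a relevant doc, divided by |relevant|.
    static double averagePrecision(List<Integer> ranked, Set<Integer> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int k = 0; k < ranked.size(); k++) {
            if (relevant.contains(ranked.get(k))) {
                hits++;
                sum += (double) hits / (k + 1);
            }
        }
        return sum / relevant.size();
    }

    // Mean of the per-query average precision; rankings.get(i) pairs with relevants.get(i).
    static double meanAveragePrecision(List<List<Integer>> rankings, List<Set<Integer>> relevants) {
        if (rankings.isEmpty()) return 0.0;
        double total = 0.0;
        for (int i = 0; i < rankings.size(); i++) {
            total += averagePrecision(rankings.get(i), relevants.get(i));
        }
        return total / rankings.size();
    }
}
```

For example, with the ranking [3, 1, 7, 2] and relevant set {1, 2}, the relevant documents appear at ranks 2 and 4, so the average precision is (1/2 + 2/4) / 2 = 0.5.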
Docs: EMEA Corpus, https://drive.google.com/folderview?id=0B3Slz0zk1PRUSkxuTlE2VVl1Ym8&usp=sharing
Informatica Java Developer
Interview Type: Written Test