Question Answering (QA) System

1.      Introduction

The objective of this project is to design, implement, and evaluate a Question Answering (QA) system while gaining experience with standard off-the-shelf NLP toolkits. The toolkits used in this project include the Stanford CoreNLP library and the Apache Lucene library.

The designed QA system takes a question as input along with a set of relevant documents and outputs the top ten ranked guesses for the answer. Two data sets are available: Dev_Set and Test_Set. The Dev_Set contains questions and relevant documents along with the correct answers. Initially, the system's output on the Dev_Set is evaluated against the correct answers provided. Once the system reaches an acceptable accuracy, it is run on the Test_Set and the output answers are submitted for evaluation. The project implementation can be found at its github link.

2.      Block Diagram – QA System


Figure 1. Block Diagram of QA System

In the block diagram above, green boxes represent the various input/output files, blue boxes represent classes of the QA system, orange boxes represent threads running concurrently, and grey boxes represent buffers that store Question objects passed between the threads.


3.      Design

The QA system consists of three sections: Question Processing, Passage Retrieval, and Answer Formulation.

3.1.  Question Processing

The objective of this section is to process the question and generate the most relevant Answer Types. The Answer Formulation section later uses these generated Answer Types to extract candidate answers from the relevant passages. The design of Question Processing is explained below.

3.1.1. Inputs

The QuestionProcessor takes two questions.txt files, one for training and one for testing. For training, the questions.txt file is read from the Dev_Set; for testing, it is read from the Test_Set. Both files are read and processed when the QuestionProcessor is instantiated. The training questions.txt file is used to train the QA system by supervised learning, as explained later in this section.

3.1.2. Training QA Model

The QuestionProcessor defines various Question Types (Who, Where, What, etc.) and Answer Types. An Answer Type can be an NER tag, a POS tag, or NP; these are used to fetch answers by NER and POS tagging of the relevant passages. The QuestionProcessor identifies the Question Type of each question and matches the best Answer Types to each Question Type. For example, the Question Type "Where" best matches the Answer Type "LOCATION", which is an NER tag. Some Question Types can be mapped to Answer Types directly; these known standard mappings are stored in QAHashMap.
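The direct mapping stored in QAHashMap can be sketched as a simple lookup table. This is an illustrative Python sketch, not the project's actual Java identifiers; the entries shown are assumed from the "Where" → "LOCATION" example above.

```python
# Minimal sketch of the QAHashMap idea: known Question Types map
# directly to Answer Types (NER tags here). Entries other than
# WHERE -> LOCATION are illustrative assumptions.
QA_HASH_MAP = {
    "WHO": "PERSON",
    "WHERE": "LOCATION",
    "WHEN": "DATE",
}

def answer_type_for(question_type):
    """Return the Answer Type for a known Question Type, or NP as a fallback."""
    return QA_HASH_MAP.get(question_type.upper(), "NP")
```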

3.1.3. Modelling the QA system using Supervised Learning

The Question Types that cannot be mapped to Answer Types directly are handled by supervised learning. For example, a "What" question may lead to different Answer Types. Supervised learning is implemented by labelling each question in the training dataset with the most relevant Answer Type. To train the system, the QuestionProcessor identifies the question structure by fetching a combination of 'question' words (corresponding to the POS tags "WP", "WRB", and "WDT") and the first "NN". These question structures are then mapped to the labelled Answer Type and stored in trainedHashMap. Both hash maps, QAHashMap and trainedHashMap, are then used to identify the Question Type and Answer Type for the Test_Set questions.
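The question-structure extraction above can be sketched as follows. This is a simplified Python illustration operating on (word, tag) pairs; the real system obtains the tags from the Stanford POS tagger.

```python
# Sketch of question-structure extraction: collect the 'question' words
# (POS tags WP, WRB, WDT) plus the first NN, and join them into a key
# that can be looked up in trainedHashMap.
QUESTION_TAGS = {"WP", "WRB", "WDT"}

def question_structure(tagged_tokens):
    parts = []
    first_nn = None
    for word, tag in tagged_tokens:
        if tag in QUESTION_TAGS:
            parts.append(word.lower())
        elif tag == "NN" and first_nn is None:
            first_nn = word.lower()
    if first_nn:
        parts.append(first_nn)
    return " ".join(parts)
```

For "What is the capital of France", this yields the key "what capital", which would map to a labelled Answer Type such as LOCATION in the trained map.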

3.1.4. Question Processing

Once the QuestionProcessor thread starts executing, each question is processed to identify its Question Type and Answer Type by referring to both QAHashMap and trainedHashMap. If a question cannot be mapped to any Question Type, it is classified as the OTHER Question Type, for which the Answer Type is the Noun Phrase (NP). Finally, keywords are fetched from the question and stored in the Question object. Each Question object is queued in the Questions Queue for processing by the Passage Retriever. Once all questions are processed, the QuestionProcessor thread closes.

3.2.  Passage Retrieval

Our approach for retrieving the most relevant passages is to compute tf-idf values for the passages and then measure the similarity between the question and document vectors. We use Apache Lucene for this, since Lucene scores documents using tf-idf and cosine similarity. Among the variations we tried, the following proved to be the best use of Lucene for passage retrieval:
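The tf-idf and cosine-similarity scoring that Lucene performs can be sketched in a toy form. This does not reproduce Lucene's actual scoring formula (which includes normalizations and boosts); it only illustrates the underlying idea.

```python
# Toy tf-idf + cosine similarity, for illustration only.
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Documents whose vectors point in nearly the same direction as the question vector (cosine close to 1) are ranked highest.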


Figure 2. Passage Retrieval Block Diagram

3.2.1. Keyword Extraction:

Keywords (query words) are extracted from the input question by removing STOP words. These words are used to find relevant passages from the Passage Retrieval System.


The project implemented three kinds of keyword extraction schemes: POS-based keyword extraction, noun-phrase-based keyword extraction, and thesaurus-based keyword extraction using WordNet.

Approach 1: POS-based Keyword Extraction:

The idea is to filter out words based on their part of speech. This was achieved by stripping off the following parts of speech: -LRB-, -RRB-, WDT, WP, DT, TO, CC, IN, '?', '.'.
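The stripping step can be sketched as a simple filter over POS-tagged tokens. The (word, tag) input shown here is assumed to come from a POS tagger such as Stanford's; the strip list is the one given above.

```python
# Sketch of the POS-based keyword filter: drop tokens whose POS tag is
# in the strip list, keep the rest as query keywords.
STRIP_TAGS = {"-LRB-", "-RRB-", "WDT", "WP", "DT", "TO", "CC", "IN", "?", "."}

def extract_keywords(tagged_tokens):
    return [word for word, tag in tagged_tokens if tag not in STRIP_TAGS]
```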


Some further observations on POS-based keyword extraction are noted below.

  • The above strip list proved optimal; removing any additional parts of speech reduced precision. Our initial assumption that we could discard adverbs to increase precision turned out to be wrong; apparently all meaningful parts of speech matter.


  • Any phrase within double quotes in the question is deemed important and inserted as-is into the keyword list. This is also part of our final approach.


  • We also tried boosting/de-boosting query terms based on their POS; this did not give a significant improvement.


The final keyword list is extracted based on the POS tags {NN, NNP, JJ, VB, FW, JJR, JJS, NNS, NNPS, VBD, VBG, VBN, VBP, VBZ}. This worked well, achieving an MRR (Mean Reciprocal Rank) of 25% on the Dev_Set.

Approach 2: Noun-Phrase-based Keyword Extraction:

Noun-phrase-based keyword extraction rests on two decisions: which phrases to extract, and how to transform the keyword list to incorporate the extracted phrases. To extract noun phrases we used the Stanford parser. Two points had to be resolved:


  • What to do when one NP (outer) contains another NP (inner)?
  • What will be the regular expression that decides which POS tags to be included in the phrase?

We tried several combinations. The best results were achieved by including only outer NPs, with a regular expression that extracts NPs containing an NP or WHNP.

Approach 3: Thesaurus-based Keyword Extraction:

This approach expands the keyword list with synonyms, hypernyms, and hyponyms before querying through Lucene. The WordNet JWI API is incorporated to provide the WordNet functionality.


We retrieved the synonyms, hypernyms, and hyponyms of a word w by iterating over every sense S of the word and retrieving every word Ws corresponding to that sense.


After running multiple test cases we decided not to expand our keyword set by this method: some keywords (like 'flower') have a very large set of hyponyms, and it is not feasible to include all of them in the query word list.

We also did not include noun phrases in the keyword list, as there is little chance that a noun phrase will occur in exactly the same form in the document.

3.2.2. Passage Filtering:

  • The text file containing the 50 most relevant documents is split into independent documents. The 50 documents and the keywords extracted from the question are sent to Lucene to retrieve the 10 most relevant documents.
  • Each independent document is parsed using an XML parser to obtain passages. A custom parser function runs before the XML parsing to repair the document in places so that it can be read as standard XML without losing any data. The Stanford PTBTokenizer is used to tokenize the sentences. The documents are put through three rounds of filtering to retrieve the 10 most relevant sentences.
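A heavily simplified stand-in for the final sentence-filtering round can be sketched as ranking candidate sentences by keyword overlap. The real system scores with Lucene; this overlap count is only an illustration of the idea.

```python
# Toy sentence ranker: score each sentence by how many query keywords
# it contains, keep the top k. Not Lucene's actual scoring.
def top_sentences(sentences, keywords, k=10):
    kw = {w.lower() for w in keywords}
    scored = sorted(
        sentences,
        key=lambda s: sum(1 for w in s.lower().split() if w in kw),
        reverse=True,
    )
    return scored[:k]
```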


Other approaches tried:

  • Combining the 50 documents to be treated as one large document and finding the 10 most relevant sentences out of those.
  • Retrieving the 2 most relevant passages out of each of the 50 documents and finding the 10 most relevant sentences amongst those.

These approaches did not produce the desired outputs.

3.3.  Answer Formulation

The input to this section is the question and the 10 most relevant sentences retrieved by the Passage Retrieval process. For each sentence, NER and POS tagging are applied using the Stanford Named Entity Tagger and POS Tagger respectively. Of all the tags applied, only those matching the question's Answer Type are selected as answers.


If the desired number of answers has not been reached even after applying the above, noun phrases are retrieved from the relevant sentences and reported as the remaining plausible answers.



A simple algorithm treats adjacently occurring tokens with the same tag as a single phrase instead of reporting them as separate answers.

E.g.: [Will/PERSON Smith/PERSON] → answer of the form [Will Smith]

[Los_NN Angeles_NN Kings_NN] → answer of the form [Los Angeles Kings]
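The merging of adjacent same-tag tokens can be sketched as follows, on (word, tag) pairs as produced by an NER or POS tagger.

```python
# Sketch of merging adjacent tokens that carry the same tag into a
# single candidate answer phrase.
def merge_adjacent(tagged_tokens, target_tag):
    answers, current = [], []
    for word, tag in tagged_tokens:
        if tag == target_tag:
            current.append(word)
        elif current:
            answers.append(" ".join(current))
            current = []
    if current:
        answers.append(" ".join(current))
    return answers
```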

4.      Results and Conclusion

The QA system was successfully designed and implemented using standard off-the-shelf NLP toolkits. The design addressed the aspects described below.

4.1.  Incremental Design and Accuracy:

The system was designed incrementally, first implementing the basic functionality of each section. The baseline design searched for answers based on NER tags alone and achieved an MRR of 9%. Enhancing the Answer Types to incorporate POS tags raised this to 10% MRR, and removing duplicate answer guesses improved it further to 13% MRR. Answer Formulation was then modified to incorporate answer tiling and lemmatizing; however, these modifications lowered performance by 3% and 5% respectively, so they were removed.


Passage Retrieval was further enhanced by tokenizing passages into sentences, narrowing down the relevant text for answer extraction; this gave a large boost to 20% MRR. Next, the Question Processor was enhanced to train itself by supervised learning on the Dev_Set, bringing performance to 23% MRR. Finally, Answer Formulation was improved by removing keywords (words present in the question itself) from the answer guesses, reaching the final performance of 25% MRR. All these figures were measured on the Dev_Set. With these enhancements in place, the Test_Set questions were run through the QA system and the resulting answers submitted for evaluation.
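The MRR figures quoted throughout are computed as follows: for each question, take the reciprocal of the rank of the first correct guess (contributing 0 if no guess is correct), then average over all questions.

```python
# Mean Reciprocal Rank. ranks holds the 1-based rank of the first
# correct answer for each question, or None if no guess was correct.
def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks if r) / len(ranks)
```

For example, with first-correct ranks of 1, 2, a miss, and 4 over four questions, MRR = (1 + 0.5 + 0 + 0.25) / 4 = 0.4375.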

4.2.  Speed:

The system was made faster by employing a thread for each section. On a multi-core machine, the subsystems, namely Question Processor, Passage Retrieval, and Answer Formulation, run in separate threads in parallel. This design significantly increases execution speed on multi-core processors.
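The pipeline threading can be sketched as a producer-consumer stage: each section runs in its own thread and passes Question objects to the next section through a queue, mirroring the buffers in the block diagram. The stage function here is a placeholder, not one of the project's real classes.

```python
# Simplified single-stage pipeline: a worker thread consumes items from
# a queue, applies the stage function, and collects the results. A None
# sentinel signals that no more questions will arrive.
import queue
import threading

def run_pipeline(questions, stage_fn):
    in_q, out = queue.Queue(), []

    def worker():
        while True:
            item = in_q.get()
            if item is None:
                break
            out.append(stage_fn(item))

    t = threading.Thread(target=worker)
    t.start()
    for q in questions:
        in_q.put(q)
    in_q.put(None)
    t.join()
    return out
```

In the full system, several such stages are chained, each thread's output queue serving as the next thread's input queue.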

4.3.  Flexibility and Ease of Use:

The QA system is organized into separate packages, one per section. This allows individual packages to be imported into future projects, providing the flexibility to reuse individual sections. The system is designed around interfaces, so alternative implementations for each section can be swapped in. For example, documents are presently read from a text file; if documents need to be read from the Internet or from a database, the PassageReader interface can be implemented accordingly.

4.4.  Additional Note:

The answer guesses could be further improved by considering answers that lie within a particular window around the question keywords. However, within the time constraints we could not settle on a plausible strategy for ordering the significance of the multiple keywords.


Identifying known Answer Types for certain words such as 'birthday' or 'zipcode' was also considered: if these words appear in the question, the system could first look for a DATE NER tag and a NUMBER NER tag respectively.

5.      External Toolkits Used

  • Stanford CoreNLP (POS tagging, NER, parsing, and the PTBTokenizer)
  • Apache Lucene (tf-idf and cosine-similarity document scoring)
  • WordNet via the JWI API (synonyms, hypernyms, and hyponyms)
