The objective of this project is to design, implement, and evaluate a Question Answering (QA) system and to gain experience working with standard off-the-shelf NLP toolkits. The toolkits used in this project are the Stanford CoreNLP library and the Apache Lucene library.
The designed QA system takes a question and a set of relevant documents as input and outputs the top ten ranked guesses for the answer. Two data sets are available: Dev_Set and Test_Set. The Dev_Set contains questions and relevant documents along with the correct answers. Initially, the system's output on the Dev_Set is evaluated against the provided correct answers. Once the system reaches an acceptable accuracy, it is run on the Test_Set and the output answers are submitted for evaluation. The project implementation can be found at its GitHub link.
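The ranking step described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: it assumes candidate answers have already been extracted from the retrieved documents (each paired with its surrounding context), and it scores them by simple keyword overlap with the question, whereas the real system would rely on Lucene retrieval scores and CoreNLP annotations.

```python
def rank_answers(question, candidates, top_k=10):
    """Toy ranker: score each (candidate, context) pair by the number of
    question keywords appearing in its context, return the top_k guesses.

    `candidates` is a list of (answer_string, context_string) tuples;
    this helper and its scoring scheme are illustrative assumptions.
    """
    q_terms = set(question.lower().split())
    scored = []
    for cand, context in candidates:
        c_terms = set(context.lower().split())
        # Overlap between question terms and the candidate's context.
        scored.append((len(q_terms & c_terms), cand))
    # Sort by descending overlap score; ties keep their original order.
    scored.sort(key=lambda pair: -pair[0])
    return [cand for _, cand in scored[:top_k]]
```

For example, `rank_answers("who wrote hamlet", [("Shakespeare", "shakespeare wrote hamlet in 1600"), ("London", "london is a city")])` ranks "Shakespeare" first, since its context shares two terms with the question.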
Language models capture which sequences of words are more likely to appear in a given domain, which makes them applicable to many natural language processing tasks. In this project, we consider a type of text classification problem: the goal is to guess, from the content of an email, whether the sender is of a lower rank than the recipient (UPSPEAK) or vice versa (DOWNSPEAK). A decently performing classifier for this task would confirm the idea that people indeed write differently depending on the relative rank of the recipient.
To do this, language models are first trained for UPSPEAK and DOWNSPEAK, and these models are then enhanced using smoothing methods. Given an email from the updown test set, the probability that each of the two language models assigns to the email is computed. If the UPSPEAK model assigns the higher probability, the email is classified as UPSPEAK; otherwise it is classified as DOWNSPEAK. The complete implementation of the project can be accessed at the GitHub link.
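The classification rule above can be sketched with a small bigram model. This is an illustrative assumption about the setup, not the project's exact configuration: it uses add-one (Laplace) smoothing as the smoothing method and compares log-probabilities, which is numerically equivalent to comparing probabilities.

```python
import math
from collections import Counter

class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing.

    `sentences` is a list of token lists; class and method names here
    are hypothetical, chosen for the sketch.
    """
    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, tokens):
        padded = ["<s>"] + tokens + ["</s>"]
        lp = 0.0
        for prev, cur in zip(padded, padded[1:]):
            # Add-one smoothing: unseen bigrams still get nonzero mass.
            num = self.bigrams[(prev, cur)] + 1
            den = self.unigrams[prev] + self.vocab_size
            lp += math.log(num / den)
        return lp

def classify(email_tokens, up_model, down_model):
    """Label the email with whichever model assigns higher probability."""
    if up_model.log_prob(email_tokens) > down_model.log_prob(email_tokens):
        return "UPSPEAK"
    return "DOWNSPEAK"
```

In practice the two models would be trained on the UPSPEAK and DOWNSPEAK portions of the training emails; working in log space avoids underflow when multiplying many small bigram probabilities.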