Master Project (IA)

Posts

Showing posts from September, 2017

Extraction Model

September 24, 2017

Extraction Model: Answer sentence selection

As question answering tasks have grown more complex, deep learning has become a popular approach to solving them. Approaches to answer sentence selection include:
- Word Count: counts the number of non-stopwords in the question that also occur in the answer sentence.
- Weighted Word Count: re-weights the counts by the IDF values of the question words.
- LCLR: makes use of rich lexical semantic features, including word/lemma matching, WordNet and vector-space lexical semantic models.
- Convolutional Neural Network
- Recurrent Neural Network
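The two count-based approaches above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a regex tokenizer, a tiny hypothetical stopword list, and IDF computed over the candidate sentences themselves as IDF(w) = log(N / df(w)). The original baselines may tokenize, stem, or compute IDF over a different collection, and LCLR and the neural models are not shown. All function names are hypothetical.

```python
import math
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "is", "are", "was", "did", "do", "does",
             "how", "what", "who", "when", "where", "in", "on", "to", "and"}

def tokenize(text):
    """Lowercase the text and keep runs of letters/digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def content_words(text):
    """Set of non-stopword tokens in the text."""
    return {t for t in tokenize(text) if t not in STOPWORDS}

def word_count(question, sentence):
    """Word Count: non-stopword question words that also occur in the sentence."""
    return len(content_words(question) & content_words(sentence))

def idf_table(sentences):
    """IDF(w) = log(N / df(w)), treating each sentence as one document."""
    n = len(sentences)
    df = {}
    for s in sentences:
        for w in content_words(s):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / d) for w, d in df.items()}

def weighted_word_count(question, sentence, idf):
    """Weighted Word Count: sum of the IDF weights of the shared words."""
    shared = content_words(question) & content_words(sentence)
    return sum(idf.get(w, 0.0) for w in shared)

def select_answer(question, candidates, scorer):
    """Pick the candidate sentence with the highest score (first one on ties)."""
    return max(candidates, key=lambda s: scorer(question, s))
```

For example, with the question "How did Athenians make money?" and made-up candidates "Athens made money through trade and silver mining." and "Athens is the capital of Greece.", Word Count scores the first candidate 1 and the second 0. Without stemming, "Athenians" does not match "Athens", which is one reason the lexical-semantic features in LCLR help.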

Wikipedia Corpora

September 20, 2017

To do: create a table comparing WikiQA and SQuAD, and state that we are choosing WikiQA (for the moment).

WikiQA: WikiQA is a set of question and sentence pairs, collected and annotated for research on open-domain QA. It includes questions for which there are no correct sentences, enabling researchers to work on answer triggering. WikiQA introduced the task of answer triggering and was the only answer triggering dataset.

Question types
The questions were originally sampled from Bing query logs. The corpus has 3,047 questions (in the raw data file WikiQA.tsv). It contains both general and factoid questions:
- How: e.g. "How did Athenians make money?", "How does interlibrary loan work?" (no Why questions)
- What, How many, Who, When, Where, etc.

Date/Version
Version 1.0: August 25, 2015

Answer triggering
- We propose the answer triggering task, a new challenge for the question answering problem, which requires QA s...
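Because WikiQA contains questions with no correct sentence, a useful first step when exploring WikiQA.tsv is to count how many questions have at least one correct answer sentence. The Python sketch below assumes the file is tab-separated with a header row containing QuestionID, Question, Sentence and Label columns, where Label is 1 for a correct sentence and 0 otherwise (check this against the downloaded file). The function names and the sample rows are hypothetical, not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

def load_wikiqa(tsv_file):
    """Group (sentence, label) pairs by QuestionID.

    Assumes a header with QuestionID, Question, Sentence and Label columns.
    QUOTE_NONE keeps quote characters inside sentences from being parsed.
    """
    questions = defaultdict(lambda: {"question": None, "candidates": []})
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        entry = questions[row["QuestionID"]]
        entry["question"] = row["Question"]
        entry["candidates"].append((row["Sentence"], int(row["Label"])))
    return dict(questions)

def answer_triggering_stats(questions):
    """Return (total questions, questions with no correct sentence)."""
    total = len(questions)
    unanswerable = sum(
        1 for q in questions.values()
        if not any(label == 1 for _, label in q["candidates"])
    )
    return total, unanswerable

# Made-up rows in the assumed layout, for illustration only.
SAMPLE = (
    "QuestionID\tQuestion\tDocumentID\tDocumentTitle\tSentenceID\tSentence\tLabel\n"
    "Q1\thow does interlibrary loan work?\tD1\tInterlibrary loan\tD1-0\t"
    "Interlibrary loan lets a library borrow items from another library.\t1\n"
    "Q1\thow does interlibrary loan work?\tD1\tInterlibrary loan\tD1-1\t"
    "Many libraries charge a fee.\t0\n"
    "Q2\thow did Athenians make money?\tD2\tAthens\tD2-0\t"
    "Athens is the capital of Greece.\t0\n"
)

if __name__ == "__main__":
    qs = load_wikiqa(io.StringIO(SAMPLE))
    print(answer_triggering_stats(qs))
```

For the real file, open it with `open("WikiQA.tsv", encoding="utf-8", newline="")` and pass the file object to `load_wikiqa`. In this sample, Q2 has no correct sentence, so an answer triggering system should return no answer for it.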

Definitions & How to

September 20, 2017

- Extract summary section from Wikipedia (Python)
- Generating a Plain Text Corpus from Wikipedia
- What is TSV? What Opens a TSV?
- How to import TSV file in MS Excel
- What is a pickle file? "Pickling" is the process whereby a Python object hierarchy is converted into a byte stream, and "unpickling" is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
- Remove HTML from Wikipedia text

JAVA LIBS FOR PROCESSING WIKI MARKUP
- Remove Markup from wiki text
- Mylyn Wikitext example (plain Java, using Maven)
- de.tudarmstadt.ukp.wikipedia.parser
- A Java Wikipedia markup to plain text converter
- Eclipse link

- Python library to extract summary section of Wikipedia article
- SQuAD Reformatter (Python)
- Machine translation evaluation book
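The TSV and pickle definitions above can be illustrated with Python's standard `csv` and `pickle` modules. This is a minimal sketch: the helper names `rows_to_tsv` and `tsv_to_rows` and the sample rows are hypothetical. It writes rows as tab-separated values, parses them back, then pickles the result into a byte stream and unpickles it into an equal object.

```python
import csv
import io
import pickle

def rows_to_tsv(rows):
    """Serialize a list of rows to TSV text (fields separated by tabs)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

def tsv_to_rows(text):
    """Parse TSV text back into a list of string rows."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))

rows = [["QuestionID", "Question"], ["Q1", "how does interlibrary loan work?"]]
tsv_text = rows_to_tsv(rows)

# Pickling: convert the Python object hierarchy into a byte stream.
blob = pickle.dumps({"tsv": tsv_text, "rows": tsv_to_rows(tsv_text)})

# Unpickling: convert the byte stream back into an equal object hierarchy.
restored = pickle.loads(blob)
```

To save the byte stream to a file, use `pickle.dump(obj, f)` with a file opened in binary mode (`"wb"`), and `pickle.load(f)` to read it back. The Python documentation warns that unpickling can execute arbitrary code, so only load pickle files you trust.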

Links

September 19, 2017

- DISEQuA Multisix corpus
- Bitbucket Account
- JWPL DataMachine
- Book: Speech and Language Processing (IR, IE, linguistic definitions)
- An example of a bilingual corpus
- Datasets for Natural Language Processing
- RNN
- QCRI Live Translator
- QCRI MT
- QCRI MT Article
- Multilingual QAS (steps explanation)
- Link for QA English datasets
- TF-IDF
- Links From Dr. Alberto
- Book: Machine Translation evaluation (Likert Scale)
- Corpora references: QCRI, Memex
- "DISEQuA" and "Multisix" corpora

September 19, 2017