A Carnegie Mellon system designed to rapidly answer questions posed to the Yahoo! Answers website, including some that seem off the wall, received the highest score by far in the LiveQA evaluation track at the Text Retrieval Conference (TREC 2015).
"This is the question answering that really matters to people," said LTI Professor and MCDS Director Eric Nyberg, who heads the Open Advancement of Question Answering (OAQA) group. "These are real questions by real users answered in real-time."
Di Wang, an LTI Ph.D. student, Yahoo! InMind fellow and a member of Nyberg's OAQA group, used a deep learning approach to develop the system for the TREC evaluation, building on more than 10 years of question-answering research at the LTI. Question answering is one of the hottest areas of artificial intelligence today, as many companies see it as the next generation of online search.
In the TREC evaluation, the OAQA system answered more than a thousand questions, which were as varied as "Can lazy eyes fix themselves?" and "Pregnant cat? What to do?!"
Judges gave the OAQA answers an average score of 1.081 on a 3.0 scale. The score might not sound impressive — indicating that the system's answers were only "fair" on average — but the average score for all 21 teams in the evaluation track was just 0.467 and the second-ranked team had an average score of 0.677, evaluators announced at the conference this week in Gaithersburg, Maryland.
"This was a pretty tough challenge," Nyberg said. Unlike IBM's Watson, which Nyberg's group helped develop for 2011's famed Jeopardy! challenge, the system wasn't just retrieving factoids, but synthesizing information from multiple sources to create answers of up to 1,000 characters. It also had just one minute to generate an answer — far faster than humans could answer most of the questions.
"Jeopardy was fun, but putting together a website that answers Jeopardy clues is not really going to help people," Nyberg said. "People want information to help them live their lives. Answering a question such as 'What should I do in Munich?' requires more than just identifying a fact, because there is no one right answer. It requires summarizing multiple documents and then piecing together the most reasonable answer."
The LiveQA track was sponsored by Yahoo! and used questions submitted to the Yahoo! Answers website over 24 hours on Aug. 31. One question per minute was sent to each participant.
Wang used deep learning, a machine learning technique that automatically learns hierarchies of features, to train a statistical model that ranks candidate answers by how well they answer a question. Like the other participants, Wang used a corpus of past Yahoo! Answers question-and-answer pairs to train the system.
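For readers curious what "ranking candidate answers" looks like in practice, here is a minimal sketch in Python. It is not the OAQA system: a simple word-overlap (cosine similarity) score stands in for the trained deep learning model, the candidate passages are made-up placeholders, and the names `cosine_score` and `rank_candidates` are hypothetical. The 1,000-character cap reflects the answer length limit in the LiveQA track.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_score(question, candidate):
    """Stand-in relevance score: cosine similarity of word-count vectors.

    Assumption: a trained model would replace this function. It is NOT
    the scorer used in the TREC evaluation.
    """
    q, c = Counter(tokenize(question)), Counter(tokenize(candidate))
    dot = sum(q[w] * c[w] for w in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in c.values())))
    return dot / norm if norm else 0.0


def rank_candidates(question, candidates, score=cosine_score, max_chars=1000):
    """Return candidates sorted best-first, each trimmed to max_chars."""
    ranked = sorted(candidates, key=lambda c: score(question, c), reverse=True)
    return [c[:max_chars] for c in ranked]


question = "Can lazy eyes fix themselves?"
# Placeholder passages, invented for illustration only.
candidates = [
    "Placeholder passage about things to do in Munich.",
    "Placeholder passage about whether lazy eyes fix themselves.",
    "Placeholder passage about caring for a pregnant cat.",
]
best = rank_candidates(question, candidates)[0]
print(best)
```

Passing a different `score` function is where a learned model would plug in; the sorting and trimming steps stay the same.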
"This was an outstanding effort by Di, who carried out this project by himself and in the process outperformed teams from around the world," Nyberg said. In keeping with the OAQA's dedication to open sourcing, all of the software used in the TREC evaluation will be publicly released so that other groups can duplicate this effort and advance the science of question-answering.
Yahoo! sponsored the LiveQA track at this year's TREC, which is run by the National Institute of Standards and Technology. As it happens, Wang's work was also supported in part through CMU's InMind intelligent agents project, which is funded through a generous gift from Yahoo!