From 3M Health Information Systems
AI Talk: Conversation bots
Over the weekend I came across a nice summary of recent advances in conversational AI agents. I covered GPT-3 in an earlier blog post, but the other agents are interesting as well. So, here is a summary of the authors’ summary.
Meena, a Google bot, released earlier this year, has a quaint name. It’s a popular name in South India, where I hail from, and perhaps the Indian authors of this work came up with that name. They must have really liked it, as it was mentioned 228 times in their paper (2). The data for the bot was mined from social media conversations. The deep learning architecture used for building this chatbot utilizes a specialized transformer model, incorporating an encoder that captures the current context of the multiple turns of conversation. That then feeds a complex multi-layer decoder transformer which spits out the response. So, the question is: How good is this chatbot? The authors basically boiled it down to two questions:
- Does the response make sense? Remarkably they named that “Sensibleness.”
- Is the response specific to the context of the conversation? This was dubbed “Specificity.”
So, the combined metric is the Sensibleness and Specificity Average (SSA), derived by crowdsourcing the grading of each conversation (basically, human evaluation). As conversational chatbots go, Meena can pretty much talk about anything. One interesting tweet, by Professor Graham Neubig of CMU, stands out: he points out that the conversation can enter a “scary sociopath mode.” Well, I guess Google needs to fine-tune this a lot more!
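To make the metric concrete, here is a minimal sketch of how an SSA-style score could be computed from crowdsourced labels. The `Label` class, the sample labels and the rule that a response which does not make sense is never counted as specific are my assumptions for illustration, not code or data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """One human judgment of one bot response (hypothetical structure)."""
    sensible: bool   # Does the response make sense?
    specific: bool   # Is it specific to the context of the conversation?


def ssa(labels):
    """Return (first score, second score, their average) as fractions of responses."""
    if not labels:
        raise ValueError("need at least one label")
    n = len(labels)
    sensibleness = sum(l.sensible for l in labels) / n
    # Assumption: a response only counts as specific if it is also sensible.
    specificity = sum(l.sensible and l.specific for l in labels) / n
    return sensibleness, specificity, (sensibleness + specificity) / 2


# Four made-up judgments, purely for illustration.
labels = [
    Label(sensible=True, specific=True),
    Label(sensible=True, specific=False),
    Label(sensible=False, specific=False),
    Label(sensible=True, specific=True),
]
print(ssa(labels))  # (0.75, 0.5, 0.625)
```

In this toy example, three of four responses make sense and two of four are specific, so the combined score is the average of 0.75 and 0.5.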
Blender, a chatbot from Facebook AI Research released early this spring, does better than Meena in empirical evaluation. Its success comes from training a model that is 3.6 times larger than Google’s chatbot. The strategy the researchers adopted was to first train the model on 1.5 billion publicly available Reddit conversations. With this foundation, they then fine-tuned it on separate sets of conversations: ones that focused on emotions, ones that employed distinct personas and knowledge-intensive ones drawn from the Wizard of Wikipedia (WoW) conversation data.
In addition, they trained the model to switch between these different modes: exhibiting empathy one moment, drawing on wiki-knowledge in another turn or sharing some personal information from the persona provided. For this last training task, switching between different modes, they crowdsourced the data in an interesting way. They asked real people to engage in conversations in which one participant was unguided and the other was given suggestions on what the next topic could be, generated by one of the different engines, thereby blending different conversational styles. In spite of the impressive results obtained, the bot does poorly in long conversations and tends to hallucinate facts.
The latest entrant into this arena is GPT-3. I covered the impressive results obtained by this model in an earlier blog post. This model is quite versatile, and one of its capabilities is to carry on a conversation on its own! Deep learning pioneer Geoffrey Hinton tweeted, “Extrapolating the spectacular performance of GPT-3 into the future suggests that the answer to life, the universe and everything is just 4.398 trillion parameters.” The implication is that the steady march toward more powerful models appears to be a monotonic increase in the size of the models. But there is hope: Another model (PET) is reported to do even better than GPT-3, but with 99 percent fewer parameters! We’ll explore this new model in another blog post.
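A side note on the number in that tweet: 4.398 trillion is, to three decimal places, 2 raised to the 42nd power. Reading that as a nod to the “42” from The Hitchhiker’s Guide to the Galaxy is my assumption, not something the tweet spells out. The arithmetic is easy to check:

```python
# 2 to the 42nd power, the number that appears to be behind the quip.
params = 2 ** 42
print(f"{params:,}")            # 4,398,046,511,104

# Expressed in trillions and rounded to three decimal places.
trillions = round(params / 1e12, 3)
print(trillions)                # 4.398
```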
I am always looking for feedback and if you would like me to cover a story, please let me know. “See something, say something!” Leave me a comment below or ask a question on my blogger profile page.
V. “Juggy” Jagannathan, PhD, is Director of Research for 3M M*Modal and is an AI Evangelist with four decades of experience in AI and Computer Science research.