Language understanding has so far been the privilege of humans. That is why research in natural language processing (NLP) holds such promise for approaching the holy grail of artificial general intelligence (AGI). Many researchers have dived into the field and its subareas: machine translation, question answering, reading comprehension, natural conversation, and more.

Shining a spotlight on the latest research progress in language understanding, the Association for Computational Linguistics (ACL) conference this year honored “Know What You Don’t Know: Unanswerable Questions for SQuAD” as its best short paper. SQuAD, which stands for Stanford Question Answering Dataset, is widely regarded as a leading reading comprehension dataset. It has spawned some of the latest models to achieve human-level accuracy in the task of question answering.

Dr. Percy Liang is the brilliant mind behind SQuAD and the creator of core language understanding technology behind Google Assistant. He has been an assistant professor of Computer Science and Statistics at Stanford University since 2012, and is also a co-founder of and renowned AI researcher at Semantic Machines, a Berkeley-based conversational AI startup acquired by Microsoft several months ago.

A rising superstar in the machine learning and natural language processing community, Dr. Liang has received numerous academic distinctions over the years: the IJCAI Computers and Thought Award in 2016, an NSF CAREER Award in 2016, a Sloan Research Fellowship in 2015, and a Microsoft Research Faculty Fellowship in 2014.

This year at the three-day AI Frontiers Conference, which will bring together AI and big data professionals in Silicon Valley, Dr. Liang will speak about his latest research progress in language understanding. This article offers a glimpse of his academic career, his research focus, and his vision for AI.

Explore language understanding

“How do I understand the language?”

That is the question that puzzled Dr. Liang when he was still in high school. The idea of using some method to explore the mysterious and fascinating process of language understanding excited him.

In 2004, Dr. Liang received his Bachelor of Science degree from the elite Massachusetts Institute of Technology (MIT). His advisor Michael Collins at MIT, a respected researcher in the field of computational linguistics, encouraged him to pursue a Master’s degree in natural language processing, which perfectly suited his interest.

One year later, he was admitted to the University of California at Berkeley (UC Berkeley), where he studied under Dr. Dan Klein and Dr. Michael Jordan, top-tier experts in machine learning and language understanding. It is worth mentioning that many prominent AI figures today, including Andrew Ng, Yoshua Bengio, and Eric Xing, trained under Dr. Jordan.

“I am fortunate to have these two mentors. Not only did I learn a lot from them, but what I learned is complementary, and not just in the field of research (machine learning and NLP),” said Dr. Liang in an interview with Chinese media.

After spending a year as a post-doc at Google New York, where he developed language understanding technologies for Google Assistant, Dr. Liang joined Stanford University and started teaching AI courses. A Quora user, “Yushi Wang”, posted: “He’s young/relatable enough to listen to students, decent at speaking, and most importantly motivated enough to try and use these skills actually to make lectures worth going to.”

Meanwhile, Dr. Klein, Dr. Liang’s mentor at UC Berkeley, founded Semantic Machines in 2014. The company uses machine learning to let users discover, access, and interact with information and services in a much more natural way, and with significantly less effort.

Dr. Klein tried to get his talented young protégé on board. “Percy is one of the most extraordinary researchers I’ve ever worked with,” he commented. In 2016, Dr. Liang joined the company’s technical leadership team. This year, the company was acquired by Microsoft.

SQuAD 2.0 and interpretable machine learning

Much of Dr. Liang’s work has centered on the task of converting a user’s request into simple computer programs that specify the sequence of actions to be taken in response.

SQuAD is one of his standout innovations. It has spurred the creation of question-answering machines that can understand and respond to complex, nuanced, and out-of-context questions in natural language. SQuAD 1.0 was created in 2016 and includes 100,000 questions on Wikipedia articles, each of which can be answered by directly extracting a segment of text.

This year, the research team led by Dr. Liang released SQuAD 2.0, which combines the SQuAD 1.0 questions with over 50,000 new, unanswerable questions written adversarially by crowd workers to look similar to answerable ones. The goal is to help AI models recognize when a question cannot be answered from the provided text.
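To make the idea of unanswerable questions concrete, here is a minimal Python sketch, not the official evaluation script. The field names (`answers`, `answer_start`, `is_impossible`) follow the layout of the released SQuAD 2.0 JSON files; the passage, the two questions, the model's confidence scores, and the threshold are all made up for illustration, and the answer matching is much simpler than the official normalization.

```python
# Illustrative passage and questions (hypothetical, not taken from SQuAD itself).
CONTEXT = "Stanford University was founded in 1885 by Leland and Jane Stanford."

EXAMPLES = [
    {
        "id": "q1",
        "question": "When was Stanford University founded?",
        "answers": [{"text": "1885", "answer_start": CONTEXT.index("1885")}],
        "is_impossible": False,
    },
    {
        "id": "q2",
        "question": "When was Stanford University closed?",
        "answers": [],
        "is_impossible": True,  # the passage does not say
    },
]

# Hypothetical model output: best candidate span and a confidence per question.
MODEL_OUTPUT = {"q1": ("1885", 0.93), "q2": ("1885", 0.31)}


def answer_or_abstain(span, confidence, threshold):
    """Return the span, or an empty string (abstain) when confidence is low."""
    return span if confidence >= threshold else ""


def exact_match(prediction, example):
    """Score 1 for a correct prediction; "" means 'no answer'."""
    if example["is_impossible"]:
        return int(prediction == "")
    gold = {a["text"].strip().lower() for a in example["answers"]}
    return int(prediction.strip().lower() in gold)


def accuracy(threshold):
    """Fraction of examples answered (or correctly declined) at this threshold."""
    scores = []
    for ex in EXAMPLES:
        span, conf = MODEL_OUTPUT[ex["id"]]
        scores.append(exact_match(answer_or_abstain(span, conf, threshold), ex))
    return sum(scores) / len(scores)


print(accuracy(threshold=0.0))  # model always answers
print(accuracy(threshold=0.5))  # model abstains when unsure
```

A system that always extracts a span is penalized on the second question, while one that abstains when its confidence is low is not. That is the behavior the new questions are meant to reward.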

While SQuAD is designed for reading comprehension, Dr. Liang believes its impact is broader: the dataset encourages researchers to develop new general-purpose models. Neural machine translation, for example, produced the attention-based model, which is now one of the most common models in machine learning, and models trained on one dataset often prove valuable for other tasks.

Dr. Liang is also exploring agents that learn language interactively, or that can engage in a collaborative dialogue with humans. The purpose of language understanding is not merely to imitate humans. Systems that aim to interact with humans should fundamentally understand how humans think and act, at least at a behavioral level.

While Dr. Liang devotes most of his time and energy to language understanding, his interest in interpretable machine learning has continued in parallel. Interpretability is now a hot topic because the public is increasingly worried about the safety of AI applications such as autonomous driving, healthcare, and facial recognition for identifying criminals.

“Given our increasing reliance on machine learning, it is critical to build tools to help us make machine learning more reliable ‘in the wild,’” said Dr. Liang in an interview with the Future of Life Institute.

Recently his research team has made progress in explaining black-box machine learning models. One of his papers uses a statistical technique, influence functions, to trace a model’s prediction through the learning algorithm and back to its training data. Another of his papers introduces a method based on a semidefinite relaxation to defend against adversarial examples.
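The influence-function idea can be illustrated on a toy problem. The sketch below is only an illustration under simplifying assumptions, not the paper's implementation: it uses a one-parameter least-squares model with made-up data, so the Hessian is a single number. It estimates how the loss on a test point would change if one training point were removed, using the first-order formula (negative test gradient, times inverse Hessian, times training gradient), and compares that estimate with actually retraining without the point.

```python
# Toy data for a one-parameter model y ≈ theta * x (all values are made up).
XS = [1.0, 2.0, 3.0, 4.0]
YS = [1.1, 1.9, 3.2, 3.9]
X_TEST, Y_TEST = 2.5, 2.4


def fit(xs, ys):
    """Least-squares fit of y = theta * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def loss(theta, x, y):
    """Squared-error loss on a single point."""
    return 0.5 * (theta * x - y) ** 2


def grad(theta, x, y):
    """Derivative of the loss with respect to theta."""
    return (theta * x - y) * x


def influence_on_test_loss(xs, ys, i):
    """Estimated effect on the test loss of upweighting training point i:
    -grad_test * H^{-1} * grad_i, where H is the Hessian of the mean training loss."""
    theta = fit(xs, ys)
    hessian = sum(x * x for x in xs) / len(xs)
    return -grad(theta, X_TEST, Y_TEST) * grad(theta, xs[i], ys[i]) / hessian


def predicted_removal_effect(xs, ys, i):
    """Removing a point corresponds to upweighting it by -1/n."""
    return -influence_on_test_loss(xs, ys, i) / len(xs)


def actual_removal_effect(xs, ys, i):
    """Retrain without point i and measure the real change in test loss."""
    theta_full = fit(xs, ys)
    theta_loo = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    return loss(theta_loo, X_TEST, Y_TEST) - loss(theta_full, X_TEST, Y_TEST)


for i in range(len(XS)):
    print(i, predicted_removal_effect(XS, YS, i), actual_removal_effect(XS, YS, i))
```

The estimate is a first-order approximation, so on a four-point dataset it agrees with retraining in direction but not necessarily in magnitude. Its appeal is that it avoids retraining the model once per training point.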

Machine learning and language understanding are still at an early stage. Their road to becoming a mature engineering discipline is bound to be long and arduous. However, Dr. Liang is always up for a challenge.