Column | Shannon.AI's Exclusive Interview with Raymond J. Mooney, Professor of Computer Science at UT Austin and Fellow of AAAI, ACM, and ACL

A Machine Heart (機器之心) column

Produced by: Shannon.AI (香儂科技)

Raymond J. Mooney is a professor of computer science at the University of Texas at Austin and director of its Artificial Intelligence Laboratory, where his group has worked across many areas; his current research focuses on natural language processing and computational linguistics. He served as President of the International Machine Learning Society (the organization behind ICML) from 2008 to 2011, has repeatedly served as program chair or area chair for conferences including ICML, ACL, AAAI, EMNLP, and NAACL, and is a Fellow of the ACM, AAAI, and ACL. He has published over a hundred influential papers, with nearly 30,000 citations on Google Scholar; his 2007 paper on semantic parsing won the ACL 2007 Best Paper Award. He is currently working on DARPA's Explainable AI (XAI) program, developing an Explainable Question Answering System (EQUAS).

Shannon.AI: You have several papers on combining logical approaches and distributional semantics (e.g., Beltagy et al. 2016). There are obviously many advantages to this integrative approach. If you could say one thing to researchers who have worked on only one of these two approaches, to invite them to consider combining these two seemingly incompatible methods, what would it be?

Mooney: I would emphasize the distinction between "thinking fast" and "thinking slow" discussed by the Nobel-prize-winning psychologist Daniel Kahneman in his book Thinking, Fast and Slow. Logical approaches are really good at combining disparate facts using complex symbolic reasoning, which is required for answering intricate questions with quantifiers and logical connectives, such as "Was Barack Obama born before or after John F. Kennedy died?" or "Did Woody Allen make more movies with Diane Keaton or with Mia Farrow?" This is "thinking slow." Distributional and neural approaches are better at "pattern recognition," quickly and intuitively making decisions about preferred word meanings or syntactic structures in particular contexts. This is "thinking fast." Human reasoning and language understanding require a judicious mixture of both. If we are ever going to understand language as effectively as humans do, I think we need to build computing systems that are equally good at thinking both "fast" and "slow."
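To make the contrast concrete, here is a minimal, hypothetical sketch (not from the interview): the "slow" path answers a comparative, quantified question by symbolic counting over a tiny fact base, while the "fast" path resolves a word-sense preference by nearest-neighbor lookup in a toy embedding space. All facts and vectors below are invented for illustration.

```python
import numpy as np

# "Slow" path: symbolic reasoning over explicit facts.
# Toy fact base of (director, co_star, film) triples -- invented for illustration.
FACTS = [
    ("woody_allen", "diane_keaton", "annie_hall"),
    ("woody_allen", "diane_keaton", "manhattan"),
    ("woody_allen", "mia_farrow", "zelig"),
]

def made_more_films_with(director, a, b):
    """Answer a comparative question by counting matching facts."""
    count = lambda co_star: sum(1 for d, c, _ in FACTS if d == director and c == co_star)
    return a if count(a) > count(b) else b

# "Fast" path: distributional pattern recognition.
# Toy 3-d word vectors -- invented for illustration.
EMBEDDINGS = {
    "bank_river": np.array([0.9, 0.1, 0.0]),
    "bank_money": np.array([0.0, 0.2, 0.9]),
    "water":      np.array([0.8, 0.2, 0.1]),
}

def preferred_sense(context_word, senses):
    """Pick the sense whose vector is closest (cosine) to the context word."""
    ctx = EMBEDDINGS[context_word]
    cos = lambda v: v @ ctx / (np.linalg.norm(v) * np.linalg.norm(ctx))
    return max(senses, key=lambda s: cos(EMBEDDINGS[s]))

print(made_more_films_with("woody_allen", "diane_keaton", "mia_farrow"))  # diane_keaton
print(preferred_sense("water", ["bank_river", "bank_money"]))             # bank_river
```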

Shannon.AI: In recent years there has been a lot of attention on the interpretability of deep learning models, and various approaches have been proposed to explain the behavior of deep neural networks (explanatory heat-maps, natural language, etc.). In your own work, you and your colleagues have shown that using explanations can improve the ensembling of three state-of-the-art visual question answering (VQA) systems (Rajani and Mooney, IJCAI 2017 Workshop on XAI; Rajani and Mooney, NIPS 2017 ViGIL workshop). Do you think that, in general, enhancing the interpretability of deep neural networks can help us build better models? Why or why not?


Mooney: The "opaqueness" and "black box" quality of deep learning models are widely recognized as limiting their development and users' willingness to trust them. Because of this, a year ago DARPA started the Explainable AI (XAI) program to try to develop more transparent deep learning systems. Deep learning enthusiasts claim that they have removed "feature engineering" from machine learning since their models develop their own features; however, the "black art" has simply moved from feature engineering to "network architecture engineering": designing the number of layers, the number of neurons in each layer, their connectivity, and the combining function used for each (e.g., softmax, ReLU, or mean pooling). If deep networks could provide explanations, it might aid such engineering by allowing developers to better understand why a system is making errors. However, I think the real role of explanation is to support the user rather than the developer. By providing explanations, users can learn to trust a system more and understand how it makes decisions, and, more critically, learn which conclusions not to trust when the system cannot provide a comprehensible explanation.
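The VQA work cited above uses explanations to improve ensembling; the papers' exact method differs, but a hedged sketch of the general idea is to trust each model's answer more when its explanatory heat-map agrees with the other models'. Everything below (model outputs, heat-maps, confidences) is a fabricated placeholder, not the authors' implementation.

```python
import numpy as np

def explanation_weighted_vote(answers, confidences, heatmaps):
    """Combine VQA answers, up-weighting models whose explanation heat-maps
    agree with the rest of the ensemble (an illustrative sketch only)."""
    n = len(answers)
    flat = [h.ravel() / (np.linalg.norm(h) + 1e-9) for h in heatmaps]
    # Agreement of model i = mean cosine similarity to the other heat-maps.
    agreement = [
        np.mean([flat[i] @ flat[j] for j in range(n) if j != i]) for i in range(n)
    ]
    scores = {}
    for ans, conf, agr in zip(answers, confidences, agreement):
        scores[ans] = scores.get(ans, 0.0) + conf * max(agr, 0.0)
    return max(scores, key=scores.get)

# Three hypothetical models answering "What is on the table?"
rng = np.random.default_rng(0)
maps = [rng.random((7, 7)) for _ in range(3)]
print(explanation_weighted_vote(["mug", "mug", "bowl"], [0.6, 0.7, 0.9], maps))
```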

Shannon.AI: Much of your work has concentrated on active learning and transfer learning (e.g., Acharya et al., SDM 2014; Acharya, PhD thesis, 2015). This is clearly important for human-robot interaction, where the cost of labeled data is very high. Do the promising results obtained in laboratory experiments generalize easily to real-life scenarios? What are the biggest obstacles to applying active learning and transfer algorithms developed in the lab to everyday situations?

Mooney: Both active learning and transfer learning are useful techniques for reducing the amount of supervision needed to learn a new task. The particular variant of active learning we have recently been exploring in robotics applications is what we call "opportunistic active learning." In normal "pool-based" active learning, the system can ask at any time for a label for any example in the complete pool of unlabeled examples. In a robotics context, however, you are engaged in a particular task for a particular user and have to decide whether it is worth asking that user about one of the objects in the current environment, which may not be relevant to the current task. We have developed a reinforcement learning (RL) method that learns a policy for when and what active-learning queries to ask in a particular environment. The system gets a small negative reward for bothering the user with a question that is irrelevant to their current task; however, a good question lets the system learn something that allows it to solve a later user's task more quickly, earning a large positive reward. Using RL, the system can learn a query policy that maximizes its long-term reward, trading off small inconveniences now against larger potential gains later.
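As a hedged illustration of the reward structure just described (a toy sketch, not the group's actual system), the following tabular Q-learning loop learns whether asking pays off: each query costs a small immediate penalty, but a label that later helps complete a task yields a large delayed reward. All states, probabilities, and reward values are invented stand-ins.

```python
import random

# Toy setup: in each episode the robot may ASK about an unfamiliar object
# (small immediate penalty) or SKIP; knowing the label sometimes makes a
# later delivery task succeed (large reward). All numbers are toy values.
ASK, SKIP = 0, 1
ASK_PENALTY, TASK_REWARD = -1.0, 10.0
ALPHA, EPSILON, EPISODES = 0.1, 0.2, 5000

Q = {ASK: 0.0, SKIP: 0.0}  # single-state problem: value of each action

random.seed(0)
for _ in range(EPISODES):
    action = random.choice((ASK, SKIP)) if random.random() < EPSILON \
             else max(Q, key=Q.get)
    reward = 0.0
    if action == ASK:
        reward += ASK_PENALTY                 # bothering the user now
        if random.random() < 0.5:             # the label proves useful later
            reward += TASK_REWARD
    # one-step episodes, so the update target is just the observed return
    Q[action] += ALPHA * (reward - Q[action])

print(Q)  # asking wins: expected return -1 + 0.5 * 10 = 4.0 vs 0.0 for skipping
```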

Shannon.AI: A lot of your work has concentrated on connecting language with perception (such as vision and touch), i.e., grounded language learning (e.g., Thomason et al., AAAI 2018; Thomason and Mooney, IJCAI 2017). But there are also many NLP tasks that seem not to require grounding (language modeling, machine translation, etc.). Why do you think it is necessary to model grounded language learning in certain circumstances? What are the costs and benefits of doing so?


Figure 2. Inducing the semantics of noun phrases by clustering their observation feature vectors. From Thomason and Mooney, IJCAI 2017.

Mooney: I believe grounding is most important for applications and concepts that have a clear connection to perception or action in the world. Many applications involving robotics, vision, and graphics fit this requirement, such as instructing robots in natural language, captioning or answering questions about images and videos, and generating graphical images and video. However, many abstract concepts that are not directly grounded, such as "up" in "the number of PhD applicants has gone up dramatically" (where nothing physically rises), actually derive much of their semantics from the metaphorical use of grounded terms (see Lakoff and Johnson's Metaphors We Live By). Therefore, understanding the physically grounded senses of such terms can enable productive understanding of such metaphors, even in applications that do not directly involve perception or action.
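Figure 2 above describes inducing noun-phrase senses by clustering observation feature vectors. A minimal sketch of that general recipe follows, with fabricated features standing in for the multi-modal (visual, haptic, auditory) observations the real system collects; it is an illustration of the idea, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Fabricated observation vectors for occurrences of the noun phrase "glass":
# some refer to drinking glasses, some to eyeglasses (two latent senses).
drinking = rng.normal(loc=0.0, scale=0.3, size=(20, 8))
eyewear  = rng.normal(loc=2.0, scale=0.3, size=(20, 8))
observations = np.vstack([drinking, eyewear])

# Cluster the observations; each cluster is treated as one induced sense.
senses = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(observations)
print(senses[:20], senses[20:])  # the two usages separate into two clusters
```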

Shannon.AI: Many of your and your students' research projects involve humans interacting with robots (e.g., Thomason et al., CoRL 2017; Thomason et al., RoboNLP 2017). Is there any difference between modeling human-human interaction and modeling human-robot interaction? What unique challenges or opportunities have you found in working on language learning in human-robot interaction?


Mooney: Robots' ability to perform complex tasks is still very primitive compared to humans. One of the main problems we have had in working on robot language is finding tasks robots can perform that are complex and interesting enough to require language instruction of reasonable richness and variety, yet not so hard that they cannot be completed at all. We have mainly worked on instructions for finding a particular object and delivering it from one location to another, where the object and location can be described with compositional noun phrases, for example, "Get the heavy blue mug next to the refrigerator and bring it to the meeting room next to Ray's office." Even for this seemingly simple task, getting a robot to reliably grasp and deliver a wide range of objects is very challenging. Human-robot interaction (HRI) is a whole complex field, of which natural language is only one part. Getting robots to understand gesture and other non-verbal communication is also very challenging. Currently, communicating with robots is frustrating and very different from interacting with other people, and their inability to understand natural language is only part of the problem.

Shannon.AI: You have done a lot of influential work and published many widely cited papers. How has your approach as an NLP researcher changed over time? Is there anything you would like to share with students about developing good taste in research problems?

Mooney: I have partly been lucky in choosing problems to work on (e.g., the integration of language and vision) that became significantly more popular after I started working on them. But I do have a few recommendations about choosing problems. First, I don't like working on problems that are currently very popular: it becomes too hard to keep up with all the related work and to come up with truly new ideas that no one else is pursuing. So I avoid the current "hot topics."

Second, I significantly switch what I am working on about every six years, roughly the time it takes to advise a PhD student through a topic. After working on a problem for about six years, I feel my view of it becomes too fixed and "set in stone," and I am no longer able to think creatively about it, so I switch to something else. It is interesting to come back to a problem I worked on in the past, after enough time has passed for my thinking to have changed. For example, I worked on "script learning" for text processing in my PhD thesis in the 1980s, and recently returned to it with my PhD student Karl Pichotta from a completely new perspective based on deep learning.

Finally, I am a big believer in interdisciplinary work that crosses traditional boundaries. Back in the 1990s, when NLP and machine learning were quite different, disjoint sub-areas of AI, I worked on applying the latest learning ideas to challenging NLP problems such as semantic parsing and information extraction. Fortunately, these days there is a good flow of communication between ML and NLP, so almost everyone does this now. My interest in language grounding tries to cross the boundary between NLP, computer vision, and robotics. Two of my current interests are integrating NLP and computer graphics, particularly generating 3-D animated video from natural language descriptions, and integrating NLP and software engineering, particularly translating language to code and helping automate updates to comments when software changes. Most creativity involves combining and synthesizing previously disparate ideas, and looking at problems that cross traditional disciplinary boundaries opens up many possibilities for combining ideas from two separate areas.

Shannon.AI: There is obviously a wide variety of topics in NLP, and your own research has covered many of them. What do you think will be the biggest challenges for NLP over the next five years?

Mooney: I think many of the problems discussed above remain among the biggest challenges in NLP. Figuring out how to effectively combine symbolic, logical meaning representations with vector-space distributional representations is an important problem that few researchers are exploring. Developing computational architectures that can efficiently and effectively resolve inter-dependent ambiguities jointly across all levels of linguistic processing (phonetic, syntactic, lexical, semantic, and pragmatic) also remains a key challenge. I also believe there are many unexplored opportunities at the intersection of NLP and computer graphics, such as the automatic generation of complex images and videos from natural language descriptions. However, we are still a long way from robust, human-level understanding of natural language, and challenging problems remain in almost every area of the field.

Shannon.AI (http://shannon.ai/) is an artificial intelligence company focused on the financial sector. It uses machine learning and AI algorithms to extract, integrate, and analyze massive amounts of financial information, bringing AI to every corner of finance.


Shannon.AI was founded in December 2017 and raised tens of millions of RMB in exclusive funding from Sequoia Capital China. Co-founder Jiwei Li was the first person in the history of Stanford University's computer science program to complete a PhD in only three years. In a recent analysis published by Cambridge University researcher Marek Rei, Dr. Li ranked first among all AI researchers worldwide over the past three years in the number of first-authored papers at top conferences. All of the company's employees hold master's degrees or above, over 30% hold PhDs, and its members come from leading institutions including Stanford, MIT, CMU, Princeton, Peking University, Tsinghua University, Renmin University of China, and Nankai University.

References:

Thomason J, Sinapov J, Mooney R J, et al. Guiding exploratory behaviors for multi-modal grounding of linguistic descriptions [C]. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018.

Thomason J, Mooney R J. Multi-modal word synset induction [C]. Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17). 2017.

Lakoff G, Johnson M. Metaphors we live by [M]. University of Chicago Press, 2008.

Thomason J, Padmakumar A, Sinapov J, et al. Opportunistic active learning for grounding natural language descriptions [C]. Conference on Robot Learning. 2017: 67-76.

Corona R, Thomason J, Mooney R. Improving black-box speech recognition using semantic parsing [C]. Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2017, 2: 122-127.

Rajani N F, Mooney R J. Stacking with auxiliary features [C]. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, Australia, 2017: 2634-2640.

Rajani N F, Mooney R J. Ensembling visual explanations for VQA [C]. Proceedings of the NIPS 2017 Workshop on Visually-Grounded Interaction and Language (ViGIL), 2017.

Acharya A, Mooney R J, Ghosh J. Active multitask learning using both latent and supervised shared topics [C]. Proceedings of the 2014 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2014: 190-198.

Acharya A. Knowledge transfer using latent variable models [D]. PhD thesis, The University of Texas at Austin, 2015.

Beltagy I, Roller S, Cheng P, et al. Representing meaning with a combination of logical and distributional models [J]. Computational Linguistics, 2016, 42(4): 763-808.

Kahneman D. Thinking, fast and slow [M]. New York: Farrar, Straus and Giroux, 2011.

