Reasoning with Multimodal Knowledge: Towards Explainability and Robustness

Abstract

My Ph.D. research focuses on harnessing multimodal knowledge to enhance the explainability and robustness of AI reasoning processes. Commonsense reasoning, a vital capability for achieving human-like cognitive processes in AI, requires integrating information across diverse modalities. This integration helps AI models to better approximate the human reasoning process.
Conventionally, knowledge is perceived as a human concept—a projection of external world interpretations stored in our brains. Humans employ both structured representations (like graphs) and unstructured forms (such as concepts and sentences) to articulate this knowledge across various modalities. However, what proves clear and interpretable to humans may not align with the most effective formats for AI comprehension.
Throughout my doctoral studies, I have explored both structured and unstructured multimodal knowledge representations. The goal was to identify which forms are most effective for helping AI models learn to reason. This work has achieved state-of-the-art results while keeping the models self-explainable and robust in out-of-domain or resource-scarce scenarios.

Speaker: Mr Zhecan James WANG
Date: 22 May 2024 (Wednesday)
Time: 9:15am – 10:15am

Biography

Mr Zhecan James WANG is currently a fifth-year Ph.D. candidate at Columbia University. He is fortunate to be jointly supervised by Prof. Shih-Fu Chang (Dean of Columbia Engineering, ACM Fellow, Member of the National Academy of Engineering) and Prof. Kai-Wei Chang (UCLA, Amazon Scholar). His five-year Ph.D. research is supported by the DARPA Machine Commonsense project, and his research interests are centred on Multimodal Learning, Vision-Language, Commonsense Reasoning, Human-in-the-loop and Human-Centered AI. He is also interested in extending his research to Human-AI Interaction, learning with diverse modalities, and other applications of AI such as AI for health. He has engaged in collaborative research with Google DeepMind's multimodal team led by Dr. Quoc V. Le, Dr. Lu Yuan's team at Microsoft Research, Prof. Jiashi Feng's lab at NUS, MIT Media Lab, Panasonic AI Lab, and others. He has also worked at a Silicon Valley autonomous driving company under the guidance of Chief Scientist Dr. Yan Dong Guo (now VP at OPPO). He has published 23 papers in top-tier conferences, secured 8 AI-related patents, and amassed over 800 citations. At Columbia, he has contributed 14 papers to prestigious venues, serving as the first or co-first author on 8 of them, and his work has been acknowledged by organizations such as PaperWeekly, AI2, DARPA, and 新智元.