Towards AGI: Foundation Models for Language, Code, and Multimodality

Abstract

In this talk, I will present a summary of my research over the past five years on developing AI foundation models across language, code, and multimodal domains. I will begin by introducing our early work on multilingual pre-trained models, including the Unicoder series for natural language and the CodeBERT/CodeGPT series for programming languages, along with their applications in AI agent systems. Next, I will highlight our recent work on improving the training efficiency of large language models (LLMs), aimed at reducing reliance on expensive GPU resources—an effort recognized as the runner-up for Best Paper at NeurIPS 2024. I will then discuss our broader efforts to enhance the reasoning abilities of LLMs through techniques such as self-criticism, data synthesis, tool-augmented verification, and multimodal chain-of-thought reasoning. Following that, I will introduce our progress in building foundation models for multimodal understanding and generation—particularly in text-to-video and image-to-video generation—through the Unicoder-VL, NUWA, and Step-Video series. I will conclude by sharing my perspectives on promising directions toward artificial general intelligence (AGI) and look forward to an engaging discussion.

Speaker: Dr. Nan DUAN
Date: 16 May 2025 (Friday)
Time: 10:30am – 11:30am
Venue: LAU 6-209

Biography

Dr. Nan Duan is a Technical Fellow at StepFun, where he leads research on large-scale language-video foundation models. Prior to this, he was a Senior Principal Researcher and Research Manager of the Natural Language Computing Group at Microsoft Research Asia, where he drove research in natural language processing, code intelligence, multilingual multimodal foundation models, and AI agents. Dr. Duan has authored over 200 research papers in top-tier conferences and journals, with more than 25,000 citations (h-index 73), and holds over 20 patents. His work has been recognized with several honors, including the Runner-Up Best Paper Award at NeurIPS 2024 and the Best Demo Award at CVPR 2022. He is an adjunct professor and Ph.D. supervisor at the University of Science and Technology of China and Xi’an Jiaotong University, as well as an adjunct professor at Tianjin University. Dr. Duan received his B.S. and Ph.D. in Computer Science from Tianjin University in 2004 and 2011, respectively. He was named a CCF-NLPCC Distinguished Young Scientist in 2019 for his contributions to natural language processing (NLP) and was included in the list of DeepTech Intelligent Computing Innovators China in 2023 for his contributions to AI foundation models.