Shaoteng Liu

I am a 4th-year PhD student at CUHK, advised by Prof. Jiaya Jia.

I hold a B.Eng. from XJTU and previously worked as a research assistant at the BAIR Lab with Dequan Wang.

My research interests are in LLMs, VLMs, agents, and AIGC, including applications such as image/video generation, editing, and manipulation.

Email  /  Google Scholar  /  GitHub  /  Twitter

Work Experience
Adobe Research
Research Scientist Intern, 2024.5-
Advisors: Soo Ye Kim and Zhe Lin
The Chinese University of Hong Kong
PhD Candidate, 2021.7-
Advisor: Jiaya Jia
Berkeley Artificial Intelligence Research (BAIR)
Research Assistant, 2019-2020
Advisor: Dequan Wang
Selected Research / Full List
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li*, Yuechen Zhang*, Chengyao Wang*, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

Preprint, 2024.
arXiv / Project Page / 机器之心 / Demo / Model / Data / Code

Mining the potential of open-source VLMs! Mini-Gemini is a novel framework spanning VLMs from 2B to 34B for high-resolution image understanding. It has impressive OCR capability and can generate high-quality images powered by its multi-modal reasoning ability.

RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia

NeurIPS, 2024. Oral
arXiv / Project Page

The slow agent decomposes the task and determines "which actions" to learn. The fast agent writes code and RL configurations for low-level execution.

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu

ICLR, 2024.
arXiv / Project Page / Video / Data / Code

Rethinking the inversion process to boost diffusion-based editing with 3 lines of code.

Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

CVPR, 2024. Most Influential CVPR Papers (Paper Digest)
arXiv / Project Page / Twitter / Code

Add a 'Lego' attribute to the child, and an edited video is generated. This is powered by a novel video inversion process and cross-attention control. We also find that a Decoupled-Guidance strategy is essential for video editing.

Tent: Fully Test-time Adaptation by Entropy Minimization
Dequan Wang*, Evan Shelhamer*, Shaoteng Liu, Bruno Olshausen, Trevor Darrell

ICLR, 2021. Spotlight
arXiv / Code

Tent equips a model to adapt itself to new and different data during testing.

Selected Awards
  • Excellent Teaching Assistantship, CUHK, 2023

  • Hong Kong PhD Fellowship Scheme (HKPFS), 2021

  • Vice-Chancellor’s Scholarship, CUHK, 2021

  • Scientist Scholarship of China (top 1%), 2019

  • Top 10 Undergraduate of XJTU (top 0.1%), 2019

  • National Scholarship of China, 2018

Teaching
ENGG5104 | Image Processing and Computer Vision | 2023 Spring
ENGG2780A | Probability for Engineers | 2022 Spring
CSCI1540 | Computer Principles and C++ Programming | 2021 Fall

Last updated: Jun 2024
Web page design credit to Jon Barron and Julian