Shaoteng Liu

I am a 4th-year PhD student at CUHK, advised by Prof. Jiaya Jia.

I hold a B.Eng. from XJTU and previously worked as a research assistant at the BAIR Lab with Dequan Wang.

My research interests are in LLMs, VLMs, Agents, and AIGC, including applications such as image/video generation, editing, and manipulation.

Email / Google Scholar / GitHub / Twitter

Work Experience

	Adobe Research Research Scientist Intern, 2024.5- Advisors: Soo Ye Kim and Zhe Lin
	The Chinese University of Hong Kong PhD Candidate, 2021.7- Advisor: Jiaya Jia
	Berkeley Artificial Intelligence Research (BAIR) Research Assistant, 2019-2020 Advisor: Dequan Wang

Selected Research Full List

	Generative Video Propagation Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang, Qing Liu, Zhifei Zhang, Joon-Young Lee, Yijun Li, Bei Yu, Zhe Lin, Soo Ye Kim, Jiaya Jia CVPR, 2025. arXiv / Project Page / Video / Data / Twitter / Adobe Firefly / 机器之心 We demonstrate that through a careful design of a generative video propagation framework, various video tasks can be addressed in a unified way by leveraging the generative power of such models.
	Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia Preprint*, 2024. arXiv / Project Page / 机器之心 / Demo / Model / Data / Code Mining the potential of open-source VLMs! Mini-Gemini is a novel framework spanning VLMs from 2B to 34B for high-resolution image understanding. It has impressive OCR capability and can generate high-quality images powered by its multi-modal reasoning ability.
	RL-GPT: Integrating Reinforcement Learning and Code-as-policy Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia NeurIPS, 2024. Oral arXiv / Project Page The slow agent decomposes the task and determines "which actions" to learn. The fast agent writes code and RL configurations for low-level execution.
	Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu ICLR, 2024. arXiv / Project Page / Video / Data / Code Rethinking the inversion process. Boosting Diffusion-based Editing with 3 Lines of Code.
	Video-P2P: Video Editing with Cross-attention Control Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia CVPR, 2024. Most Influential CVPR Papers (Paper Digest) arXiv / Project Page / Twitter / Code Adding a 'Lego' attribute to the child produces an edited video. The method is powered by a novel video inversion process and cross-attention control. We also find that a Decoupled-Guidance strategy is essential for video editing.
	Tent: Fully test-time adaptation by entropy minimization Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, Trevor Darrell ICLR, 2021. Spotlight arXiv / Code Tent equips a model to adapt itself to new and different data during testing.