Shaoteng Liu
I am a 4th-year PhD student at CUHK, advised by Prof. Jiaya Jia.
I hold a B.Eng. from XJTU and previously worked as a research assistant at the BAIR Lab with Dequan Wang.
My research interests are in LLMs, VLMs, agents, and AIGC, including applications such as image/video generation, editing, and manipulation.
Email / Google Scholar / GitHub / Twitter
|
|
|
Adobe Research
Research Scientist Intern, 2024.5-
Advisors: Soo Ye Kim and Zhe Lin
|
|
The Chinese University of Hong Kong
PhD Candidate, 2021.7-
Advisor: Jiaya Jia
|
|
Berkeley Artificial Intelligence Research (BAIR)
Research Assistant, 2019-2020
Advisor: Dequan Wang
|
|
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li*, Yuechen Zhang*, Chengyao Wang*, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia
Preprint, 2024.
arXiv / Project Page / 机器之心 / Demo / Model / Data / Code
Mining the potential of open-source VLMs! Mini-Gemini is a novel framework supporting VLMs from 2B to 34B for high-resolution image understanding. It has impressive OCR capability and can generate high-quality images powered by its multi-modal reasoning ability.
|
|
RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia
NeurIPS, 2024. Oral
arXiv / Project Page
The slow agent decomposes the task and determines "which actions" to learn. The fast agent writes code and RL configurations for low-level execution.
|
|
Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
ICLR, 2024.
arXiv / Project Page / Video / Data / Code
Rethinks the inversion process to boost diffusion-based editing with 3 lines of code.
|
|
Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia
CVPR, 2024. Most Influential CVPR Papers (Paper Digest)
arXiv / Project Page / Twitter / Code
Adding a 'Lego' attribute to the child generates an edited video. The method is powered by a novel video inversion process and cross-attention control. We also find that a Decoupled-Guidance strategy is essential for video editing.
|
|
Tent: Fully Test-Time Adaptation by Entropy Minimization
Dequan Wang*, Evan Shelhamer*, Shaoteng Liu, Bruno Olshausen, Trevor Darrell
ICLR, 2021. Spotlight
arXiv / Code
Tent equips a model to adapt itself to new and different data during testing.
|
Excellent Teaching Assistantship, CUHK, 2023
Hong Kong PhD Fellowship Scheme (HKPFS), 2021
Vice-Chancellor’s Scholarship, CUHK, 2021
Scientist Scholarship of China (top 1%), 2019
Top 10 Undergraduate of XJTU (top 0.1%), 2019
National Scholarship of China, 2018
|
ENGG5104 | Image Processing and Computer Vision | 2023 Spring
ENGG2780A | Probability for Engineers | 2022 Spring
CSCI1540 | Computer Principles and C++ Programming | 2021 Fall