Hao Zhou - Homepage

Hao Zhou 周浩

Researcher, Ph.D.

Monetization GenAI, ByteDance Inc.

Email: zhouh156 (at) mail.ustc.edu.cn

About Me

I am currently a researcher at ByteDance, where I focus on developing generative AI for ad tech and the creative industry. I obtained my Ph.D. degree from the University of Science and Technology of China (USTC) in 2022, supervised by Prof. Wengang Zhou and Prof. Houqiang Li. Prior to that, I received my B.S. degree from Xidian University (XDU) in 2017.

My research interests are in computer vision, and I am currently working on video understanding, generation, and editing for creative ads.

Publications

*Equal contribution, †Corresponding author

Selected Publications

	JoVA: Unified Multimodal Learning for Joint Video-Audio Generation Xiaohu Huang, Hao Zhou, Qiangpeng Yang, Shilei Wen, Kai Han arXiv preprint, 2025 pdf project code
	PruneVid: Visual Token Pruning for Efficient Video Large Language Models Xiaohu Huang, Hao Zhou, Kai Han The Annual Meeting of the Association for Computational Linguistics (ACL), Findings, 2025 pdf code
	StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond Pengyuan Lyu, Yulin Li, Hao Zhou, Weihong Ma, Xingyu Wan, Qunyi Xie, Liang Wu, Chengquan Zhang, Kun Yao, Errui Ding, Jingdong Wang arXiv preprint, 2024 pdf
	Improving Sign Language Translation with Monolingual Data by Sign Back-Translation Hao Zhou, Wengang Zhou, Weizhen Qi, Junfu Pu, Houqiang Li IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 pdf

Other Publications

	Retrieval-Augmented Sign Language Translation Huijie Yao, Wengang Zhou, Hao Zhou, Hezhen Hu, Houqiang Li ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2026 doi dblp
	Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective Duowang Zhu, Xiaohu Huang, Haiyan Huang, Hao Zhou, Zhenfeng Shao IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Highlight, 2025 pdf code
	Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting Zhengqi Zhao, Xiaohu Huang, Hao Zhou†, Kun Yao, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng† International Journal of Computer Vision (IJCV), 2025 pdf
	Semi-Supervised Spoken Language Glossification Huijie Yao, Wengang Zhou, Hao Zhou, Houqiang Li The Annual Meeting of the Association for Computational Linguistics (ACL), 2024 pdf code
	FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition Xiaohu Huang, Hao Zhou, Kun Yao, Kai Han International Conference on Learning Representations (ICLR), 2024 pdf code
	HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception Junkun Yuan, Xinyu Zhang, Hao Zhou, Jian Wang, Zhongwei Qiu, Zhiyin Shao, Shaofeng Zhang, Sifan Long, Kun Kuang, Kun Yao, Junyu Han, Errui Ding, Lanfen Lin, Fei Wu, Jingdong Wang Conference on Neural Information Processing Systems (NeurIPS), 2023 pdf code
	Sign Language Translation with Iterative Prototype Huijie Yao, Wengang Zhou, Hao Feng, Hezhen Hu, Hao Zhou, Houqiang Li International Conference on Computer Vision (ICCV), 2023 pdf
	Graph Contrastive Learning for Skeleton-based Action Recognition Xiaohu Huang, Hao Zhou†, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng International Conference on Learning Representations (ICLR), 2023 pdf code
	Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition and Translation Hao Zhou, Wengang Zhou, Yun Zhou, Houqiang Li IEEE Transactions on Multimedia, 2021 pdf
	Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition Hao Zhou, Wengang Zhou, Yun Zhou, Houqiang Li AAAI Conference on Artificial Intelligence (AAAI), Oral, 2020 pdf
	Dynamic Pseudo Label Decoding for Continuous Sign Language Recognition Hao Zhou, Wengang Zhou, Houqiang Li IEEE International Conference on Multimedia and Expo (ICME), 2019 pdf