I am currently an algorithm researcher on the ByteDance Seed LLM team, where I work mainly on training algorithms and data mining. I received a Bachelor of Science degree in Computer Science from Nanjing University in 2019 and a Master of Engineering degree in Computer Science from the same university in 2022.
Research Interests
My research focuses on machine learning and data mining methods with demonstrable theoretical foundations. I am also passionate about efficient training approaches for deep learning. My main research areas cover the following topics:
- Exploration of training optimization and training methods
- High-quality training data mining
- The impact of data distribution shift on deep learning
- The integration of fundamental theories of machine learning with ultra-large-scale training
- Reasoning abilities of LLMs/VLMs and their generalization
Selected Publications
- Jian-Hui Duan, Wenzhong Li, Derun Zou, Ruichen Li, Sanglu Lu, Federated Learning with Data-Agnostic Distribution Fusion, The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, Jun 18-22, 2023.
- Jian-Hui Duan, Wenzhong Li, Sanglu Lu, FedDNA: Federated Learning with Decoupled Normalization-Layer Aggregation for Non-IID Data, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2021), Bilbao, Spain, Sep 13-17, 2021.
- Jian-Hui Duan, Wenzhong Li, Xiao Zhang, Sanglu Lu, Forecasting Fine-Grained City-Scale Cellular Traffic with Sparse Crowdsourced Measurements, Computer Networks, vol. 214, pp. 1-14, Sep 4, 2022.
- Wangxiang Ding, Wenzhong Li, Zhijie Zhang, Chen Wan, Jianhui Duan, Sanglu Lu, Time-varying Gaussian Markov Random Fields Learning for Multivariate Time Series Clustering, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 35, no. 11, Nov 2023.
- Derun Zou, Xusheng Liu, Lintan Sun, Jian-Hui Duan, Ruichen Li, Yeting Xu, Wenzhong Li, Sanglu Lu, FedMC: Federated Reinforcement Learning on the Edge with Meta-Critic Networks, IEEE International Performance, Computing, and Communications Conference (IPCCC’22), Austin, Texas, USA, November 11-13, 2022.