Jian-Hui Duan

却顾所来径,苍苍横翠微

LLM Researcher | ByteDance Seed LLM | djhbarca[at]163.com

🌱💼: We are currently recruiting interns for LLM research at Seed. If you are interested in contributing to this field, please reach out to me directly.

I am currently an algorithm researcher on the ByteDance Seed LLM team, where my work focuses on training algorithms and data mining.

Education

  • Master of Computer Science @ Nanjing University (2019–2022)
  • Bachelor of Computer Science @ Nanjing University (2015–2019)

Research Interests

My work centers on machine learning and data mining methods with provable theoretical foundations. Key research directions include:

  • 🔬 Training optimization algorithms and scalable training methodologies
  • 🕵️ High-quality training data mining & curation
  • ⚖️ Mitigating data distribution shift in deep learning
  • 🧠 Theoretical foundations of ultra-large-scale model training
  • 🤖 Understanding the paradigms and generalization of LLMs/VLMs

Selected Publications [Google Scholar]

  • Seed VLM&LLM Team, Seed1.5-VL Technical Report, arXiv:2505.07062, May 2025.
  • Seed LLM Team, Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning, arXiv:2504.13914, Apr 2025.
  • Haoran Zong, Xiao Zhang, Ruichen Li, Jian-Hui Duan, Derun Zou, Wenzhong Li, Convergence Guaranteed Federated Learning through Gradient Trajectory Smoothing with Triple-Objective Decomposition, ACM Transactions on Knowledge Discovery from Data (TKDD), DOI: 10.1145/3743142, Jun 2025.
  • Jian-Hui Duan, Wenzhong Li, Derun Zou, Ruichen Li, Sanglu Lu, Federated Learning with Data-Agnostic Distribution Fusion, The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, Jun 18-22, 2023.
  • Jian-Hui Duan, Wenzhong Li, Sanglu Lu, FedDNA: Federated Learning with Decoupled Normalization-Layer Aggregation for Non-IID Data, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2021), Bilbao, Spain, Sep 13-17, 2021.
  • Jian-Hui Duan, Wenzhong Li, Xiao Zhang, Sanglu Lu, Forecasting fine-grained city-scale cellular traffic with sparse crowdsourced measurements, Computer Networks, 39(2461-2475), Volume 214, pp 1-14, Sep 4, 2022.
  • Wangxiang Ding, Wenzhong Li, Zhijie Zhang, Chen Wan, Jian-Hui Duan, Sanglu Lu, Time-varying Gaussian Markov Random Fields Learning for Multivariate Time Series Clustering, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 35, no. 11, Nov 2023.
  • Derun Zou, Xusheng Liu, Lintan Sun, Jian-Hui Duan, Ruichen Li, Yeting Xu, Wenzhong Li, Sanglu Lu, FedMC: Federated Reinforcement Learning on the Edge with Meta-Critic Networks, IEEE International Performance, Computing, and Communications Conference (IPCCC 2022), Austin, Texas, USA, Nov 11-13, 2022.

Contact

LLM Algorithm Researcher | ByteDance Seed LLM | djhbarca[at]163.com

I am currently an algorithm researcher on ByteDance's large-model team (Seed), working mainly on training algorithms and data mining.

Education

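  • Master of Engineering (Computer Science and Technology @ Nanjing University) (2019–2022)
  • Bachelor of Science (Computer Science and Technology @ Nanjing University) (2015–2019)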

Research Interests

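My main research focus is machine learning and data mining methods with provable theoretical foundations, and I am also passionate about efficient training methods for deep learning. My main research topics are: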

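  • Training optimization and exploration of new training methods
  • High-quality training data mining
  • The impact of data distribution shift on deep learning
  • Combining machine learning theory with ultra-large-scale training
  • Understanding paradigms and generalization of large language models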

Contact