Academic Talk: Two Statistical Results in Deep Learning


Topic
Two Statistical Results in Deep Learning
Time
-
Venue
Room B312, School of Information Management
Speaker
Xiaoming Huo, Georgia Institute of Technology, USA
Host
Prof. Yao Lu

Title: Two Statistical Results in Deep Learning

Speaker: Xiaoming Huo, Georgia Institute of Technology, USA

Date: Friday, December 3, 2021

Time: 16:00 - 17:00

Venue: Room B312, School of Information Management

Host: Prof. Yao Lu

Abstract: This talk has two parts.

  1. Regularization Matters for Generalization of Overparametrized Deep Neural Networks under Noisy Observations. In part one, we study the generalization properties of overparameterized deep neural networks (DNNs) with ReLU activations. Under the non-parametric regression framework, it is assumed that the ground-truth function lies in a reproducing kernel Hilbert space (RKHS) induced by the neural tangent kernel (NTK) of a ReLU DNN, and that the dataset is observed with noise. Without a delicate adoption of early stopping, we prove that the overparametrized DNN trained by vanilla gradient descent does not recover the ground-truth function: the estimated DNN's L2 prediction error is bounded away from 0. As a complement to this result, we show that L2-regularized gradient descent enables the overparametrized DNN to achieve the minimax-optimal convergence rate of the L2 prediction error, without early stopping. Notably, the rate we obtain is faster than the one previously known in the literature.
  2. Directional Bias Helps SGD to Generalize. We study the Stochastic Gradient Descent (SGD) algorithm in kernel regression. Specifically, SGD with a moderate, annealing step size converges along the directions corresponding to the large eigenvalues of the kernel matrix, whereas Gradient Descent (GD) with a moderate or small step size converges along the directions corresponding to the small eigenvalues. For a general squared risk minimization problem, we show that a directional bias towards the large eigenvalues of the Hessian (which is the kernel matrix in our case) results in an estimator that is closer to the ground truth. Applying this result to kernel regression, the directional bias helps the SGD estimator generalize better. This result offers one way to explain how noise helps generalization when learning with a nontrivial step size, and may be useful for promoting further understanding of stochastic algorithms in deep learning.
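The message of part one — that interpolating noisy data without regularization hurts the L2 prediction error, while L2 regularization restores good recovery — can be illustrated in the simpler setting of kernel regression. The sketch below is not code from the talk; the RBF kernel, sample size, noise level, and regularization strength are all assumptions chosen for illustration.

```python
# Illustrative sketch: unregularized kernel interpolation vs.
# L2-regularized (ridge) kernel regression on noisy observations.
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(a, b, gamma=10.0):
    # K[i, j] = exp(-gamma * (a_i - b_j)^2) for 1-D inputs
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

n = 40
x_train = np.sort(rng.uniform(0, 1, n))
f = lambda x: np.sin(2 * np.pi * x)              # ground-truth function
y_train = f(x_train) + 0.3 * rng.normal(size=n)  # noisy observations

K = rbf_kernel(x_train, x_train)

# Unregularized interpolant: alpha = K^{-1} y (fits the noise exactly;
# a tiny jitter keeps the solve numerically feasible)
alpha_interp = np.linalg.solve(K + 1e-10 * np.eye(n), y_train)

# L2-regularized estimator: alpha = (K + n * lam * I)^{-1} y
lam = 1e-2
alpha_ridge = np.linalg.solve(K + n * lam * np.eye(n), y_train)

# Compare L2 prediction error against the ground truth on a dense grid
x_test = np.linspace(0, 1, 500)
K_test = rbf_kernel(x_test, x_train)
err_interp = np.mean((K_test @ alpha_interp - f(x_test)) ** 2)
err_ridge = np.mean((K_test @ alpha_ridge - f(x_test)) ** 2)
print(f"interpolant error: {err_interp:.4f}, ridge error: {err_ridge:.4f}")
```

The interpolant's error stays bounded away from zero because it chases the noise, while the regularized estimator's error is much smaller — a kernel-regression analogue of the talk's result for overparameterized DNNs in the NTK regime.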

Speaker Bio: Dr. Huo received the B.S. degree in mathematics from the University of Science and Technology of China in 1993, and the M.S. degree in electrical engineering and the Ph.D. degree in statistics from Stanford University, Stanford, CA, in 1997 and 1999, respectively. Since August 1999, he has been an Assistant/Associate/Full Professor with the School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta. He represented China in the 30th International Mathematical Olympiad (IMO), held in Braunschweig, Germany, in 1989, and received a gold medal. From August 2013 to August 2015, he served the US National Science Foundation as a Program Director in the Division of Mathematical Sciences (DMS).

Dr. Huo has presented keynote talks at major conferences (including the 2nd IEEE Global Conference on Signal and Information Processing, Atlanta, GA, and the IMA-HK-IAS Joint Program on Statistics and Computational Interfaces to Big Data, The Hong Kong University of Science and Technology, Hong Kong) and has given numerous invited colloquia and seminar presentations in the US, Asia, and Europe. He has been the Specialty Chief Editor of Frontiers in Applied Mathematics and Statistics - Statistics since April 2021.

Dr. Huo is now the Executive Director of TRIAD (Transdisciplinary Research Institute for Advancing Data Science), http://triad.gatech.edu, an NSF-funded research center located at Georgia Tech. He is an Associate Director of the Master of Science in Analytics program -- https://analytics.gatech.edu/ -- in charge of creating a new branch at the Shenzhen, China campus of the Georgia Institute of Technology. He is also the Associate Director for Research of the Institute for Data Engineering and Science (https://research.gatech.edu/data).