dc.contributor.advisor | SREEJITH, G. J. | en_US
dc.contributor.author | SINGH, DAYAL | en_US
dc.date.accessioned | 2021-07-08T04:46:41Z |
dc.date.available | 2021-07-08T04:46:41Z |
dc.date.issued | 2021-07 |
dc.identifier.citation | 74 | en_US
dc.identifier.uri | http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/6040 |
dc.description | TL;DR: ReLU networks initialized with asymmetric anti-correlated weights learn faster. | en_US
dc.description.abstract | Despite their successful application across science and technology, deep neural networks remain poorly understood. The ability of overparameterized neural networks to express complex functions (expressivity) is one of the major open theoretical questions. The expressivity of a deep neural network at initialization is particularly important because training relies on local (gradient-based) algorithms, and it can be analyzed by studying signal propagation in infinitely wide networks (the mean-field limit). This mean-field analysis suggests that deep neural networks have an ordered and a chaotic phase, and that they achieve exponential expressivity as a function of depth in the chaotic phase. However, in deep ReLU (Rectified Linear Unit) networks with uncorrelated weights, signals become highly correlated because no chaotic phase exists, which suggests that deep ReLU networks have low expressive power. Using the mean-field theory of signal propagation, we analyze the evolution of correlations between signals propagating through a ReLU network with correlated weights. We show that ReLU networks with anti-correlated weights avoid this low-expressivity outcome and have a chaotic phase in which the correlations saturate below unity. Consistent with this analysis, we find that networks initialized with anti-correlated weights train faster by exploiting the increased expressivity of the chaotic phase. Combining this with a previously proposed asymmetric initialization that reduces the dead-node probability (the probability that propagated signals reach the low-sensitivity domain of the ReLU activation function), we propose an initialization scheme that trains and learns faster than other initialization schemes on a variety of tasks. | en_US
dc.description.sponsorship | INSPIRE-SHE program of Department of Science & Technology, India | en_US
dc.language.iso | en | en_US
dc.subject | Signal propagation | en_US
dc.subject | Deep Neural Networks | en_US
dc.subject | Mean-field theory | en_US
dc.title | Signal propagation and Initialization in Deep Neural Networks | en_US
dc.type | Thesis | en_US
dc.type.degree | BS-MS | en_US
dc.contributor.department | Dept. of Physics | en_US
dc.contributor.registration | 20161127 | en_US
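The abstract describes two ingredients: anti-correlated weights (to obtain a chaotic phase where signal correlations saturate below unity) and an asymmetric initialization (to reduce the dead-node probability). A minimal NumPy sketch of one plausible realization of these ideas is given below. The function name anti_correlated_layer, the He-style variance 2/fan_in, the row-mean subtraction used to induce anti-correlation within each neuron's incoming weights, and the small positive bias_shift standing in for the asymmetric initialization are all illustrative assumptions, not the exact scheme used in the thesis.

import numpy as np

def anti_correlated_layer(fan_in, fan_out, sigma_w=np.sqrt(2.0), bias_shift=0.1, rng=None):
    """Return (W, b) for one ReLU layer with anti-correlated incoming weights.

    Subtracting the row mean from i.i.d. Gaussian draws gives each pair of
    weights feeding the same neuron a correlation of -1/(fan_in - 1); the
    positive bias is a simple stand-in for an asymmetric initialization
    that lowers the probability of dead ReLU units.
    """
    rng = np.random.default_rng() if rng is None else rng
    W = rng.normal(0.0, sigma_w / np.sqrt(fan_in), size=(fan_out, fan_in))
    W -= W.mean(axis=1, keepdims=True)   # anti-correlate weights within each row
    b = np.full(fan_out, bias_shift)     # asymmetric (positive) bias
    return W, b

# Example: track the cosine similarity of two inputs as depth grows --
# the correlation studied in the mean-field signal-propagation analysis.
rng = np.random.default_rng(0)
width, depth = 512, 20
layers = [anti_correlated_layer(width, width, rng=rng) for _ in range(depth)]
x1, x2 = rng.normal(size=width), rng.normal(size=width)
for ell, (W, b) in enumerate(layers, start=1):
    x1 = np.maximum(W @ x1 + b, 0.0)     # ReLU forward pass, input 1
    x2 = np.maximum(W @ x2 + b, 0.0)     # ReLU forward pass, input 2
    c = x1 @ x2 / (np.linalg.norm(x1) * np.linalg.norm(x2) + 1e-12)
    if ell % 5 == 0:
        print(f"layer {ell:2d}: correlation = {c:.3f}")

In this sketch, whether the printed correlation saturates below unity (chaotic-phase behavior) or approaches one (ordered behavior) depends on the chosen variance and bias; the values above are placeholders for experimentation rather than the thesis's tuned settings.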