Our understanding of modern neural networks lags behind their practical successes. This growing gap slows progress in machine learning, because it leaves designers of models and algorithms with fewer pillars of knowledge to build on. This workshop aims to close this understanding gap. We solicit contributions that view the behavior of deep nets as natural phenomena, to be investigated with methods inspired by natural sciences such as physics, astronomy, and biology. We call for empirical work that isolates phenomena in deep nets, describes them quantitatively, and then replicates or falsifies them.
As a starting point for this effort, we focus on the interplay between data, network architecture, and training algorithms. We seek contributions that identify precise, reproducible phenomena, as well as studies that test current beliefs such as “sharp local minima do not generalize well” or “SGD navigates out of local minima”. Through the workshop, we hope to catalogue quantifiable versions of such statements and determine whether they hold reliably.
Schedule & Videos
|Video: Morning session|
|8:45 - 9:00||Opening Remarks|
|9:00 - 9:30||Nati Srebro: Optimization’s Untold Gift to Learning: Implicit Regularization|
|9:30 - 9:45||Shengchao Liu: Bad Global Minima Exist and SGD Can Reach Them|
|9:45 - 10:00||Hattie Zhou: Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask|
|10:00 - 10:30||Chiyuan Zhang: Are all layers created equal? - Studies on how neural networks represent functions|
|10:30 - 11:00||Break and Posters|
|Video: Pre-lunch session|
|11:00 - 11:15||Niru Maheswaranathan: Line attractor dynamics in recurrent networks for sentiment classification|
|11:15 - 11:30||Karttikeya Mangalam: Do deep neural networks learn shallow learnable examples first?|
|11:30 - 12:00||Crowdsourcing Deep Learning Phenomena|
|12:00 - 1:30||Lunch and Posters|
|Video: Post-lunch session|
|1:30 - 2:00||Aude Oliva: Reverse engineering neuroscience and cognitive science principles|
|2:00 - 2:15||Beidi Chen: Angular Visual Hardness|
|2:15 - 2:30||Lior Wolf: On the Convex Behavior of Deep Neural Networks in Relation to the Layers’ Width|
|2:30 - 3:00||Andrew Saxe: Intriguing phenomena in training and generalization dynamics of deep networks|
|3:00 - 4:00||Break and Posters|
|Video: Afternoon session|
|4:00 - 4:30||Olga Russakovsky: Strategies for mitigating social bias in deep learning systems|
|4:30 - 5:30||Panel Discussion: Kevin Murphy, Nati Srebro, Aude Oliva, Andrew Saxe, Olga Russakovsky. Moderator: Ali Rahimi|
Accepted Papers
- Line attractor dynamics in recurrent networks for sentiment classification
Niru Maheswaranathan, Alex H. Williams, Matthew D. Golub, Surya Ganguli, and David Sussillo
- Do deep neural networks learn shallow learnable examples first?
Karttikeya Mangalam and Vinay Uday Prabhu
- The Difficulty of Training Sparse Neural Networks
Utku Evci, Fabian Pedregosa, Aidan Gomez, and Erich Elsen
- Channel Normalization in Convolutional Neural Network avoids Vanishing Gradients
Zhenwei Dai and Reinhard Heckel
- Batch Normalization is a Cause of Adversarial Vulnerability
Angus Galloway, Anna Golubeva, Thomas Tanay, Medhat Moussa, and Graham W. Taylor
- Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
Hattie Zhou, Janice Lan, Rosanne Liu, and Jason Yosinski
- A Systematic Framework for Natural Perturbations from Videos
Vaishaal Shankar, Achal Dave, Rebecca Roelofs, Deva Ramanan, Benjamin Recht, and Ludwig Schmidt
- Identity Crisis: Memorization and Generalization Under Extreme Overparameterization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, and Yoram Singer
- Scaling Characteristics of Sequential Multitask Learning: Networks naturally learn to learn
Guy Davidson and Michael C. Mozer
- Emergence of Implicit Filter Sparsity in Convolutional Neural Networks
Dushyant Mehta, Kwang In Kim, and Christian Theobalt
- In Support of Over-Parametrization in Deep Reinforcement Learning: an Empirical Study
Brady Neal and Ioannis Mitliagkas
- A Modern Take on the Bias-Variance Tradeoff in Neural Networks
Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, and Ioannis Mitliagkas
- Memorization in Overparameterized Autoencoders
Adityanarayanan Radhakrishnan, Mikhail Belkin, and Caroline Uhler
- The Effect of Network Depth on the Optimization Landscape
Behrooz Ghorbani, Shankar Krishnan, and Ying Xiao
- Does adversarial training hurt or help generalization?
Aditi Raghunathan, Michael Xie, Fanny Yang, and Percy Liang
- Bad Global Minima Exist and SGD Can Reach Them
Shengchao Liu, Dimitris Papailiopoulos, and Dimitris Achlioptas
- Sparsity Emerges Naturally in Neural Language Models
Naomi Saphra and Adam Lopez
- On the Convex Behavior of Deep Neural Networks in Relation to the Layers’ Width
Etai Littwin and Lior Wolf
- Using effective dimension to analyze feature transformations in deep neural networks
Kavya Ravichandran, Ajay Jain, and Alexander Rakhlin
- On Understanding the Hardness of Samples in Neural Networks
Beidi Chen, Weiyang Liu, Animesh Garg, Zhiding Yu, Bartley Richardson, Anshumali Shrivastava, and Anima Anandkumar
- Are All Layers Created Equal?
Chiyuan Zhang, Samy Bengio, and Yoram Singer
- Invariance-inducing regularization using worst-case transformations suffices to boost accuracy and spatial robustness
Fanny Yang, Zuowen Wang, and Christina Heinze-Deml
- Layer rotation: a surprisingly simple indicator of generalization in deep networks?
Simon Carbonnelle and Christophe De Vleeschouwer
- Predicting the accuracy of neural networks from final and intermediate layers
Chad DeChant, Seungwook Han, and Hod Lipson
- Sensitivity of Deep Convolutional Networks to Gabor Noise
Kenneth T. Co, Luis Muñoz-González, and Emil C. Lupu
Call for Papers
We solicit the following kinds of experimental work:
Interesting and unusual behavior observed in deep nets: Interactions between datasets, architectures, and optimization procedures can give rise to curious behaviors in deep learning. “Interesting” and “unusual” are subjective, but any carefully studied result is welcome, whether it violates or supports intuition and folklore. The phenomenon should be defined in a (mathematically) unambiguous way. For example, if the phenomenon concerns “internal covariate shift” or “sharpness of a local minimum”, the submission should present quantifiable candidate definitions of these quantities.
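As one illustration of a quantifiable candidate definition, the sketch below measures the sharpness of a minimum $w^*$ as the largest loss increase within a ball of radius $\rho$, $\max_{\lVert\delta\rVert \le \rho} L(w^* + \delta) - L(w^*)$, estimated by sampling random directions on the sphere of radius $\rho$. This definition is an assumption chosen for illustration, not one the workshop prescribes; other candidates, such as the largest Hessian eigenvalue, are equally legitimate. The function names and the toy quadratic losses are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Loss = Callable[[Sequence[float]], float]


def perturbation_sharpness(loss: Loss, w: Sequence[float], radius: float,
                           n_samples: int = 1000, seed: int = 0) -> float:
    """Estimate max_{||d|| = radius} loss(w + d) - loss(w) by random search.

    Hypothetical candidate definition; w is assumed to be a (local) minimum.
    Only sampled directions are checked, so the result is a lower bound on
    the true worst-case increase.
    """
    rng = random.Random(seed)
    base = loss(w)
    worst = 0.0
    for _ in range(n_samples):
        # Draw an isotropic Gaussian vector and rescale it onto the sphere.
        d = [rng.gauss(0.0, 1.0) for _ in w]
        norm = math.sqrt(sum(x * x for x in d))
        perturbed = [wi + radius * di / norm for wi, di in zip(w, d)]
        worst = max(worst, loss(perturbed) - base)
    return worst


def quadratic(curvatures: Sequence[float]) -> Loss:
    """Hypothetical toy loss 0.5 * sum_i c_i * w_i^2, minimized at w = 0."""
    return lambda w: 0.5 * sum(c * x * x for c, x in zip(curvatures, w))


if __name__ == "__main__":
    w_star: List[float] = [0.0, 0.0]
    flat = perturbation_sharpness(quadratic([1.0, 1.0]), w_star, radius=0.1)
    sharp = perturbation_sharpness(quadratic([100.0, 1.0]), w_star, radius=0.1)
    print(f"flat:  {flat:.4f}")
    print(f"sharp: {sharp:.4f}")
```

For the isotropic quadratic every direction gives the same increase (0.005 at radius 0.1), while the anisotropic one approaches its worst case of 0.5 along the high-curvature axis. A submission would also need to justify the choice of radius, since the estimate depends on it.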
Using sufficiently large and well-known models and datasets: We request that experiments be run on the largest model and dataset appropriate for the phenomenon of interest. We prefer well-known published models such as ResNet-50 or Inception V3 and large, standard datasets such as ImageNet or LibriSpeech. CIFAR-10 is appropriate for phenomena with heavy computational demands. It is also appropriate for studying how deep nets behave as the dataset size varies, provided it is paired with at least one other large dataset. We discourage results that have been demonstrated only on MNIST. Toy models (one-hidden-layer MLPs and the like) on synthetic datasets are appropriate for isolating a phenomenon once it has been observed in a realistic setting.
Reproducible: The phenomenon should be easy for the community to reproduce. We seek phenomena that recur when the experiment is run many times, across diverse models and datasets. We encourage a thorough statistical analysis of the experiments to demonstrate the extent and significance of a phenomenon. We also encourage authors to release their code and statistical analysis, for example as Jupyter notebooks.
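One possible way to report the extent and significance of a phenomenon is a confidence interval computed over independent runs. The sketch below computes a percentile-bootstrap interval for the difference in a statistic between two experimental conditions, where each input list holds one metric value per run (for example, per random seed). The bootstrap is only one option (a Welch t-test is another), and the function name and synthetic inputs are hypothetical.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple

Statistic = Callable[[Sequence[float]], float]


def bootstrap_diff_ci(a: Sequence[float], b: Sequence[float],
                      stat: Statistic = statistics.mean,
                      n_boot: int = 10_000, alpha: float = 0.05,
                      seed: int = 0) -> Tuple[float, float, float]:
    """Percentile-bootstrap confidence interval for stat(a) - stat(b).

    a and b each hold one metric value per independent run (e.g. per seed).
    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    point = stat(a) - stat(b)
    diffs = []
    for _ in range(n_boot):
        # Resample whole runs with replacement, independently per condition.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(stat(resampled_a) - stat(resampled_b))
    diffs.sort()
    lower = diffs[int(alpha / 2 * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


if __name__ == "__main__":
    # Synthetic stand-ins for per-seed test accuracies, not real results.
    gen = random.Random(1)
    condition_a = [gen.gauss(0.76, 0.01) for _ in range(10)]
    condition_b = [gen.gauss(0.75, 0.01) for _ in range(10)]
    est, lo, hi = bootstrap_diff_ci(condition_a, condition_b)
    print(f"mean difference {est:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

An interval that excludes zero suggests the difference is not merely run-to-run variation. Passing `stat=statistics.stdev` compares the spread across seeds instead of the mean (this requires at least two runs per condition).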
Isolated and analyzed: We prefer small and precisely described phenomena over complex ones with imprecise descriptions. Phenomena should be stated in a quantifiable way. For example, when stating that “residual connections reduce the sensitivity of training procedures to parameter initialization”, the terms “reduce”, “sensitivity”, and “residual connections” should be defined explicitly.
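To illustrate how the terms in this example could be made explicit, here is one hypothetical candidate definition, not a standard one: “sensitivity” is the spread of final test accuracy across random initializations, and “reduce” means a strictly smaller spread. The symbols $A$, $\theta_0$, $p_{\text{init}}$, $S$, and $n$ are introduced here for illustration only, with $A_{\text{arch}}(\theta_0)$ denoting the final test accuracy of an architecture trained from initialization $\theta_0$, and $\theta_0^{(1)}, \dots, \theta_0^{(n)}$ denoting $n$ independent seeds.

$$
S_{\text{arch}} = \operatorname{Std}_{\theta_0 \sim p_{\text{init}}}\big[A_{\text{arch}}(\theta_0)\big]
\approx \hat{S}_{\text{arch}} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\Big(A_{\text{arch}}\big(\theta_0^{(i)}\big) - \bar{A}_{\text{arch}}\Big)^2},
\qquad
\bar{A}_{\text{arch}} = \frac{1}{n}\sum_{i=1}^{n} A_{\text{arch}}\big(\theta_0^{(i)}\big)
$$

$$
\text{“residual connections reduce sensitivity”} \iff S_{\text{res}} < S_{\text{plain}}
$$

Here the two architectures would differ only in the presence of residual connections and share the same training procedure; the estimate $\hat{S}_{\text{res}} - \hat{S}_{\text{plain}}$ would then be reported with a confidence interval over the $n$ seeds.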
We specifically do not require the phenomenon to be novel. Instead, we value a formalization of the phenomenon, followed by reliable evidence that supports it or a thorough refutation of it. We especially welcome work that carefully characterizes the limits of an observed phenomenon and shows that it occurs only under specific conditions and settings. We do not require an explanation of why a phenomenon occurs, only a demonstration that it occurs reliably (or a refutation). We hope that the catalogue of phenomena we accumulate will serve as a starting point for a better understanding of deep learning.
Submissions are closed!
Please submit your paper through OpenReview.
The main part of a submission should be at most 4 pages long. These first four pages should contain a definition of the phenomena of interest and the main experimental results. References, acknowledgements, and appendices do not count toward this limit.
Papers should use a font size of at least 10 pt, standard line spacing, and 1-inch margins. We do not require a specific formatting style beyond these constraints.
We welcome unpublished results as well as papers published in 2018 or later. Submissions must be anonymized.
- Submission Deadline (extended): May 24, 2019 (11:59 pm PST)
- Notification: May 31, 2019
- Workshop: June 15, 2019
Organizers
- Samy Bengio (Google)
- Kenji Hata (Google)
- Aleksander Mądry (MIT)
- Ari Morcos (Facebook)
- Behnam Neyshabur (NYU)
- Maithra Raghu (Cornell University and Google)
- Ali Rahimi (Google)
- Ludwig Schmidt (UC Berkeley)
- Hanie Sedghi (Google)
- Ying Xiao (Google)
Please email firstname.lastname@example.org with any questions.