# C6.5 Theories of Deep Learning (2019-2020)

Only elementary linear algebra and probability are assumed in this course, though knowledge from the following Prelims courses is also helpful: linear algebra, probability, analysis, constructive mathematics, and statistics and data analysis. It is recommended that students have familiarity with some of the following: more advanced statistics, optimisation (B6.3, C6.2), networks (C5.4), and numerical linear algebra (C6.1). None of these courses is required, as the material is self-contained.

### Assessment type:

- Mini Project. Mini-projects will be available for collection from 12 noon on Friday of week 7, and the submission deadline will be 12 noon on Thursday of week 10.

A course on theories of deep learning.

Students will become familiar with the variety of architectures for deep nets, including the scattering transform and ingredients such as types of nonlinear transforms, pooling, convolutional structure, and how nets are trained. Students will focus their attention on learning a variety of theoretical perspectives on why deep networks perform as observed, with examples such as: dictionary learning and transferability of early layers, energy decay with depth, Lipschitz continuity of the net, how depth overcomes the curse of dimensionality, constructing adversarial examples, geometry of nets viewed through random matrix theory, and learning of invariance.

Deep learning is the dominant method for machines to perform classification tasks at reliability rates exceeding those of humans, as well as to outperform world champions in games such as Go. Alongside the proliferating application of these techniques, practitioners have developed a good understanding of the properties that make deep nets effective, such as initial layers learning weights similar to those in dictionary learning, while deeper layers instantiate invariance to transforms such as dilation, rotation, and modest diffeomorphisms. A number of mathematical theories are now being developed to account for these observations; this course will explore these varying perspectives.

- I. Goodfellow, Y. Bengio, and A. Courville, *Deep Learning*, Adaptive Computation and Machine Learning series.

Additional reading from recent publications will be provided as needed.

*Please note that e-book versions of many books in the reading lists can be found on SOLO and ORLO.*