# Machine Learning, Dynamical Systems, and All That

## Personal Take

**Deep learning is some kind of optimal control problem** (with the control parameters optimized, for a suitable objective, by gradient-descent-based algorithms and some randomization tricks) **for randomly initialized open dynamical systems** (deep architectures) **interacting with a noisy environment** (large amounts of typically noisy data)**, with the hope that the solution found can be applied successfully to new environments** (test data, possibly of poor quality)**.**
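Much of this dynamical-systems framing rests on one observation that recurs in the papers below: a residual block is a forward-Euler step of an ODE. A minimal sketch of that correspondence (all names here are illustrative, not taken from any particular paper):

```python
import numpy as np

# Sketch: a residual block  x_{k+1} = x_k + h * f(x_k, W_k)  is the
# forward-Euler discretization of the ODE  dx/dt = f(x, W(t)).

rng = np.random.default_rng(0)

def f(x, W):
    """A simple parameterized vector field (a tanh layer)."""
    return np.tanh(W @ x)

def resnet_forward(x0, weights, h):
    """Forward pass through a stack of residual blocks."""
    x = x0
    for W in weights:
        x = x + h * f(x, W)  # one residual block = one Euler step
    return x

def euler_integrate(x0, W, h, n_steps):
    """Forward-Euler integration of dx/dt = f(x, W) with fixed weights."""
    x = x0
    for _ in range(n_steps):
        x = x + h * f(x, W)
    return x

d, depth = 4, 50
x0 = rng.standard_normal(d)
W = 0.1 * rng.standard_normal((d, d))

# With shared (time-invariant) weights, the residual network's output is
# exactly the Euler approximation of the ODE solution at time T = depth * h.
h = 1.0 / depth
out_resnet = resnet_forward(x0, [W] * depth, h)
out_euler = euler_integrate(x0, W, h, depth)
assert np.allclose(out_resnet, out_euler)
```

Letting the step size shrink as depth grows is the heuristic limit behind the neural-ODE literature collected below.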

## Selected Papers

### Modern ML Must-Reads:

- Learning internal representations by error-propagation (1986)
- A theoretical framework for back-propagation (1988)
- Gradient-based learning applied to document recognition (1998)
- Random features for large-scale kernel machines (2007)
- Learning Deep Architectures for AI (2009)
- Understanding the difficulty of training deep feedforward neural networks (2010)
- ImageNet classification with deep convolutional neural networks (2012)
- Intriguing properties of neural networks (2013)
- Explaining and harnessing adversarial examples (2014)
- Identifying and attacking the saddle point problem in high-dimensional non-convex optimization (2014)
- Neural machine translation by jointly learning to align and translate (2014)
- Adam: A method for stochastic optimization (2014)
- Dropout: A simple way to prevent neural networks from overfitting (2014)
- Generative Adversarial Nets (2014)
- Batch normalization: accelerating deep network training by reducing internal covariate shift (2015)
- Deep residual learning for image recognition (2016)
- Train faster, generalize better: Stability of stochastic gradient descent (2016)
- Attention is all you need (2017)
- On large-batch training for deep learning: generalization gap and sharp minima (2017)
- Towards deep learning models resistant to adversarial attacks (2017)
- Train longer, generalize better: closing the generalization gap in large batch training of neural networks (2018)
- Automatic Differentiation in Machine Learning: a Survey (2018)
- Super-convergence: very fast training of neural networks using large learning rates (2019)
- Reconciling modern machine-learning practice and the classical bias–variance trade-off (2019)
- Fantastic Generalization Measures and Where to Find Them (2019)
- Language models are few-shot learners (2020)
- Hopfield networks is all you need (2020)
- Understanding deep learning requires rethinking generalization (2017, 2021)

### Sequence Modeling:

- A Field Guide to Dynamical Recurrent Networks (2001)
- On the difficulty of training Recurrent Neural Networks (2013)
- Unitary Evolution Recurrent Neural Networks (2016)
- RNNs without Chaos (2017)
- Recent Advances in Recurrent Neural Networks (2018)
- An empirical evaluation of generic convolutional and recurrent networks for sequence modeling (2018)
- Can recurrent neural networks warp time? (2018)
- Legendre memory units: Continuous-time representation in recurrent neural networks (2019)
- Do RNN and LSTM have Long Memory? (2020)
- Long Range Arena: A Benchmark for Efficient Transformers (2020)
- Lipschitz Recurrent Neural Networks (2021)
- UnICORNN: A recurrent model for learning very long time dependencies (2021)
- Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers (2021)
- Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting (2021)
- Efficiently Modeling Long Sequences with Structured State Spaces (2022)
- Long Expressive Memory for Sequence Modeling (2022)
- A Neural Programming Language for the Reservoir Computer (2022)
- Efficient Transformers: A Survey (2022)
- Block-Recurrent Transformers (2022)
- Transformer with Fourier Integral Attentions (2022)
- Deep Learning for Time Series Forecasting: Tutorial and Literature Survey (2022)
- Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models (2022)
- Mega: Moving Average Equipped Gated Attention (2022)

### Neural Differential Equations and All That:

- FractalNet: Ultra-deep neural networks without residuals (2016)
- A proposal on machine learning via dynamical systems (2017)
- Stable architectures for deep neural networks (2017)
- The reversible residual network: Backpropagation without storing activations (2017)
- Mean field residual networks: On the edge of chaos (2017)
- Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations (2018)
- Neural Ordinary Differential Equations (2018)
- Dynamical isometry and a mean field theory of RNNs: Gating enables signal propagation in recurrent neural networks (2018)
- ODE-inspired network design for single image super-resolution (2019)
- Invertible Residual Networks (2019)
- Learning differential equations that are easy to solve (2020)
- Score-based generative modeling through stochastic differential equations (2020)
- Continuous-in-Depth Neural Networks (2020)
- Optimizing neural networks via Koopman operator theory (2020)
- Momentum Residual Neural Networks (2021)
- Learning strange attractors with reservoir systems (2021)
- Neural Delay Differential Equations (2021)
- MALI: A memory efficient and reverse accurate integrator for Neural ODEs (2021)
- On Neural Differential Equations (2022)
- LyaNet: A Lyapunov Framework for Training Neural ODEs (2022)
- HyperMixer: An MLP-based Green AI Alternative to Transformers (2022)
- Do Residual Neural Networks discretize Neural Ordinary Differential Equations? (2022)
- Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules (2022)

### Understanding Modern ML + DS:

- Neural Tangent Kernel: Convergence and generalization in neural networks (2018)
- A mean-field optimal control formulation of deep learning (2018)
- Wide neural networks of any depth evolve as linear models under gradient descent (2019)
- Implicit regularization of discrete gradient dynamics in linear neural networks (2019)
- Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations (2019)
- Continuous-time models for stochastic optimization algorithms (2019)
- Finite depth and width corrections to the Neural Tangent Kernel (2019)
- High-dimensional dynamics of generalization error in neural networks (2020)
- Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function (2020)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks (2021)
- The heavy-tail phenomenon in SGD (2021)
- Gradient descent on neural networks typically occurs at the edge of stability (2021)
- SGD in the large: Average-case analysis, asymptotics, and stepsize criticality (2021)
- Scaling properties of deep residual networks (2021)
- The future is log-Gaussian: ResNets and their infinite-depth-and-width limit at initialization (2021)
- The high-dimensional asymptotics of first order methods with random data (2021)
- Interpolation and approximation via Momentum ResNets and Neural ODEs (2021)
- Phase diagram of SGD in high-dimensional two-layer neural networks (2022)
- Continuous-time stochastic gradient descent for optimizing over the stationary distribution of stochastic differential equations (2022)
- Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice (2022)
- On the Theoretical Properties of Noise Correlation in Stochastic Optimization (2022)
- Rigorous dynamical mean field theory for stochastic gradient descent methods (2022)

### Using ML to Study DS:

- PDE-Net: Learning PDEs from data (2017)
- Universal Differential Equations for Scientific Machine Learning (2020)
- Bridging physics-based and data-driven modeling for learning dynamical systems (2021)
- Machine learning prediction of critical transition and system collapse (2021)
- Using machine learning to predict statistical properties of non-stationary dynamical processes: System climate, regime transitions, and the effect of stochasticity (2021)
- Emergence of transient chaos and intermittency in machine learning (2021)
- An end-to-end deep learning approach for extracting stochastic dynamical systems with α-stable Lévy noise (2022)
- Scientific Machine Learning through Physics-Informed Neural Networks: Where we are and What’s next (2022)
- When Physics Meets Machine Learning: A Survey of Physics-Informed Machine Learning (2022)
- Auto-SDE: Learning effective reduced dynamics from data-driven stochastic dynamical systems (2022)
- Using Machine Learning to Anticipate Tipping Points and Extrapolate to Post-Tipping Dynamics of Non-Stationary Dynamical Systems (2022)
- Physics-informed neural networks, data-driven discovery of complex systems using neural networks, solving/simulating differential equations using neural networks, integrating ML with physics-based modeling, etc. (too many to list here)

### Model Robustness:

- Towards Deep Learning Models Resistant to Adversarial Attacks (2017)
- Sensitivity and Generalization in Neural Networks: an Empirical Study (2018)
- Adversarially Robust Generalization Requires More Data (2018)
- Large Margin Deep Networks for Classification (2018)
- Mixup: Beyond Empirical Risk Minimization (2017)
- Adversarial Examples Are Not Bugs, They Are Features (2019)
- Robustness May Be at Odds with Accuracy (2019)
- Relating Adversarially Robust Generalization to Flat Minima (2021)
- PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures (2021)
- On the robustness of randomized classifiers to adversarial examples (2021)
- Probabilistic robustness estimates for feed-forward neural networks (2021)
- Recent Advances in Large Margin Learning (2022)
- Robustness Implies Generalization via Data-Dependent Generalization Bounds (2022)
- RobustBench

### Generative Modeling:

- Deep Unsupervised Learning using Nonequilibrium Thermodynamics (2015)
- Pixel Recurrent Neural Networks (2016)
- Denoising Diffusion Probabilistic Models (2020)
- Score-Based Generative Modeling through Stochastic Differential Equations (2020)
- Diffusion Models Beat GANs on Image Synthesis (2021)
- Deep Generative Learning via Schrödinger Bridge (2021)
- Score-based Generative Modeling in Latent Space (2021)
- Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models (2021)
- DiffWave: A Versatile Diffusion Model for Audio Synthesis (2021)
- Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image (2021)
- Video Diffusion Models (2022)
- Flexible Diffusion Modeling of Long Videos (2022)
- A Continuous Time Framework for Discrete Denoising Models (2022)
- High-Resolution Image Synthesis with Latent Diffusion Models (2022)
- Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise (2022)
- Understanding Diffusion Models: A Unified Perspective (2022)
- Diffusion Models in Vision: A Survey (2022)
- Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions (2022)
- Poisson Flow Generative Models (2022)
- Flow Matching for Generative Modeling (2022)
- What’s the Score?
- Deep Equilibrium Approaches to Diffusion Models (2022)
- From Points to Functions: Infinite-dimensional Representations in Diffusion Models (2022)
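The diffusion-model papers above (e.g. Denoising Diffusion Probabilistic Models, 2020) share one closed-form forward noising process, q(x_t | x_0) = N(√ᾱ_t x_0, (1 − ᾱ_t) I). A minimal illustrative sketch, with the linear schedule values chosen for demonstration only:

```python
import numpy as np

# Sketch of the DDPM forward process: noise levels beta_t define
# abar_t = prod_{s<=t} (1 - beta_s), and x_t can be sampled from x_0
# in one shot rather than by iterating t noising steps.

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)     # cumulative signal coefficient

def q_sample(x0, t):
    """Draw x_t ~ q(x_t | x_0) in closed form."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

x0 = rng.standard_normal(64)             # a toy "data point"
x_mid = q_sample(x0, T // 2)
x_end = q_sample(x0, T - 1)

# By t = T the signal coefficient is negligible, so x_T is essentially
# pure Gaussian noise regardless of x0; generation learns to reverse this.
assert alphas_bar[-1] < 1e-3
```

Score-based and SDE formulations in the list replace this discrete chain with a continuous-time noising process, which is where the dynamical-systems connection reappears.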

### Complex Networks x Dynamical Systems:

- Collective dynamics of ‘small-world’ networks (1998)
- Emergence of Scaling in Random Networks (1999)
- Exploring complex networks (2001)
- Complex networks: Structure and dynamics (2005)
- Synchronization in complex networks (2008)
- Recurrence networks—a novel paradigm for nonlinear time series analysis (2010)
- Deep Reservoir Computing (2021)
- Constructing Neural Network-Based Models for Simulating Dynamical Systems (2022)

### Other Noteworthy Papers:

- The Loss Surfaces of Multilayer Networks (2014)
- Visualizing the Loss Landscape of Neural Nets (2017)
- Probabilistic supervised learning (2019)
- A critical analysis of self-supervision, or what we can learn from a single image (2019)
- Expressivity of Deep Neural Networks (2020)
- On Learning Rates and Schrödinger Operators (2020)
- Fourier Neural Operator for Parametric Partial Differential Equations (2020)
- Neural Operator: Learning Maps Between Function Spaces (2021)
- How Data Augmentation affects Optimization for Linear Regression (2021)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms (2021)
- Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization (2021)
- MLP-Mixer: An all-MLP Architecture for Vision (2021)
- Pre-training without Natural Images (2021)
- Learning to See by Looking at Noise (2021)
- Predicting deep neural network generalization with perturbation response curves (2021)
- The curse of overparametrization in adversarial training: Precise analysis of robust generalization for random features regression (2022)
- Improving generalization via uncertainty driven perturbations (2022)
- Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data (2022)
- Learning from Randomly Initialized Neural Network Features (2022)
- How to Learn when Data Reacts to Your Model: Performative Gradient Descent (2022)
- Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High Dimensions (2022)
- Anticorrelated Noise Injection for Improved Generalization (2022)
- How Do Vision Transformers Work? (2022)
- Patches are all you need? (2022)
- Learning by Directional Gradient Descent (2022)
- Gradients without Backpropagation (2022)
- Continuous-Time Meta-Learning with Forward Mode Differentiation (2022)
- General Cyclical Training of Neural Networks (2022)
- Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer (2022)
- Understanding Gradient Descent on Edge of Stability in Deep Learning (2022)
- Reconstructing Training Data from Trained Neural Networks (2022)
- Learning with Differentiable Algorithms (2022)
- Neural Tangent Kernel: A Survey (2022)
- The Mori-Zwanzig formulation of deep learning (2022)
- Neural Networks are Decision Trees (2022)

## Related Review Papers/Monographs/Textbooks/Lecture Notes

- The Fractal Geometry of Nature (1977)
- Coding The Matrix: Linear Algebra Through Computer Science Applications
- Solving Ordinary Differential Equations I (1993)
- Dynamical Systems and Numerical Analysis (1996)
- Information Theory, Inference, and Learning Algorithms (2003)
- All of Statistics (2004)
- Information, Physics, and Computation (2009)
- The Elements of Statistical Learning (2009)
- Probability and Stochastics (2011)
- Understanding Machine Learning: From Theory to Algorithms (2014)
- Deep Learning (2016)
- Brownian Motion, Martingales, and Stochastic Calculus (2016)
- Stochastic Calculus, Filtering, and Stochastic Control (2017)
- High-Dimensional Probability: An Introduction with Applications in Data Science (2018)
- High-Dimensional Statistics: A Non-Asymptotic Viewpoint (2019)
- Neural Networks and Deep Learning (2019)
- Statistical mechanics of deep learning (2019)
- Machine learning and the physical sciences (2019)
- A Course in Machine Learning by Hal Daumé III
- Dive Into Deep Learning
- Deep Learning (NYU), Spring 2020
- Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective (2019)
- Machine Learning, Dynamical Systems and Control (2020)
- Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don’t (2020)
- Theoretical issues in deep networks (2020)
- Mathematics for Machine Learning (2020)
- An Introduction to the Numerical Simulation of Stochastic Differential Equations (2021)
- Patterns, predictions, and actions: A story about machine learning (2021)
- Foundations of Deep Learning (Maryland), Fall 2021
- Dynamical Systems and Machine Learning (2021)
- The Principles of Deep Learning Theory (2021)
- Deep learning: A statistical viewpoint (2021)
- Fit without fear: Remarkable mathematical phenomena of deep learning through the prism of interpolation (2021)
- The Modern Mathematics of Deep Learning (2021)
- Rough Path Theory (ETH), Spring 2021
- Nonlinear Dynamics (Georgia Tech), Spring 2022
- Probabilistic Machine Learning (2022)
- Parallel Computing and Scientific Machine Learning (MIT)
- Ergodic Theory (Lecture Notes)
- Modern applications of machine learning in quantum sciences (2022)
- A high-bias, low-variance introduction to Machine Learning for physicists (2019)
- CS 231N Stanford
- Practical Deep Learning for Coders
- Transformer Circuits Thread
- Parallel Computing and Scientific Computing
- Deep Implicit Layers
- Probabilistic Numerics
- A course in time series analysis

## Software/Libraries

## Others (related articles/blogposts/tutorials and random cool stuff)

- Predicting tipping points in chaotic systems with ML
- Papers with Codes
- Windows on Theory
- Off the Convex Path
- Francis Bach’s ML Research Blog
- How to Differentiate with a Computer
- Almost Sure
- Fabrice Baudoin
- Terence Tao
- Deep Learning: Our Miraculous Year 1990-1991
- Universality for mathematical and physical systems (2006)
- One model to learn them all (2017)
- Winner’s curse? On pace, progress, and empirical rigor (2018)
- The science of deep learning (2020)
- The Dawning of a New Era in Applied Mathematics (2021)
- AI Accelerators
- AI Summer
- Full Stack Deep Learning
- Machine Learning Systems Design
- Physics-based Deep Learning
- A Recipe for Training Neural Networks
- Implicit Bias in Some Machine Learning Problems
- Sebastian Raschka’s Resources
- ML-University
- Differential Programming with JAX
- Teach Yourself Computer Science
- Software Carpentry
- First Contributions
- Software 2.0
- Lightning.ai
- Stability.ai
- Lil’Log
- Green AI
- Hugging Face
- Deep Learning Drizzle
- Chris Olah
- Sebastian Raschka
- Andrej Karpathy
- Tesla AI
- Google AI
- Web3 University
- Xanadu