What is a good book for math students to learn machine learning in depth?

I am a master's student in mathematics and have taken fundamental courses such as probability theory, measure theory, and linear algebra, and I know a little functional analysis. What is a good way for me to learn machine learning in depth?

I read the classic text Pattern Recognition and Machine Learning last summer; my impression was that reading it chapter by chapter, like a mathematical text, was very ineffective. The book does not go deep enough into many algorithms and skips too many steps that engineers consider too technical.

Is there a machine learning book that perhaps does not cover too many topics, but treats each one in depth and takes advantage of math when necessary? It would be great to be able to connect fundamental mathematical objects with machine learning (I am thinking of Lp spaces, Hilbert spaces, etc.).


Solution 1:

Hmm, that is a very good question!

I see a few avenues you could pursue:

  1. In general, once you start doing rigorous machine learning, the distinction from modern statistics really starts to vanish. So you could pick up a book from that literature. I would recommend, for example, the book by Peter Bühlmann and Sara van de Geer. It essentially considers only one model, the LASSO (as well as slight variations, such as L1-penalized logistic regression), and is a "math" book (lots of definitions, lemmata, theorems, proofs).

  2. Keeping with the idea of only studying one method, you could read the book on boosting by Schapire and Freund. This would give a flavour of rigorous results, but with more of a CS rather than stats perspective. [Caveat: I have not actually read this one, but have heard good things about it.]

  3. The two previous recommendations focus on understanding one specific method (L1 regularization and boosting, respectively); if instead you want a unifying framework in terms of a specific mathematical space, then there is a very nice review paper on RKHS (reproducing kernel Hilbert space) machine learning methods in the Annals of Statistics. It's not a textbook, but it seems to suit your needs perfectly.

  4. Larry Wasserman has been teaching an amazing course on Statistical Machine Learning. There he goes through a whole bunch of unrelated methods (which is what you complained about), but for each one he explains the main mathematical tools and results, and proves many of them. The website provides videos of the lectures, exercises, and lecture notes (which I think he is compiling into a textbook). Highly recommended.
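
To make the first and third suggestions a bit more concrete, here is a rough sketch of the objects they study. The notation is my own assumption, not taken from the book or the review paper: $X \in \mathbb{R}^{n \times p}$ is the design matrix, $y \in \mathbb{R}^n$ the response, $\lambda > 0$ a tuning parameter, and $\mathcal{H}$ an RKHS of functions with reproducing kernel $k$.

```latex
% LASSO: least squares with an L1 penalty on the coefficients
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1

% Kernel ridge regression: least squares with a squared Hilbert-norm penalty
\hat{f} = \arg\min_{f \in \mathcal{H}}
  \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
  + \lambda \lVert f \rVert_{\mathcal{H}}^2

% Representer theorem: the minimizer lies in the span of the kernel sections
\hat{f} = \sum_{i=1}^{n} \alpha_i \, k(\cdot, x_i),
\qquad \alpha = (K + n\lambda I)^{-1} y,
\qquad K_{ij} = k(x_i, x_j)
```

The second problem is an optimization over an infinite-dimensional Hilbert space that collapses to a finite linear system, which is the kind of connection to functional analysis you seem to be after.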

Solution 2:

I think your difficulty arises from being used to well-developed, unified theories (e.g., the theory of bounded linear operators on Hilbert spaces), whereas machine learning is no such thing. It is, rather, a collection of (classes of) techniques, most of them based on optimization of some sort.
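
As a toy illustration of what "based on optimization" means here, the sketch below fits a one-dimensional ridge regression (no intercept) in two ways: by the closed-form minimizer and by plain gradient descent on the same objective. The data, the function names, the step size, and the penalty `lam` are all made up for illustration.

```python
# Toy objective: (1/n) * sum((y_i - w * x_i)^2) + lam * w^2, with scalar w.

def ridge_objective(w, xs, ys, lam):
    """Penalized mean squared error for a scalar weight w."""
    n = len(xs)
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / n + lam * w * w


def ridge_closed_form(xs, ys, lam):
    """Set the derivative to zero: w = mean(x*y) / (mean(x^2) + lam)."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    sxx = sum(x * x for x in xs) / n
    return sxy / (sxx + lam)


def ridge_gradient_descent(xs, ys, lam, step=0.05, iters=2000):
    """Minimize the same objective iteratively, starting from w = 0."""
    n = len(xs)
    w = 0.0
    for _ in range(iters):
        grad = sum(-2 * x * (y - w * x) for x, y in zip(xs, ys)) / n + 2 * lam * w
        w -= step * grad
    return w


# Hypothetical data, roughly on the line y = x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.1, 2.9]
lam = 0.1

w_exact = ridge_closed_form(xs, ys, lam)
w_gd = ridge_gradient_descent(xs, ys, lam)
print(w_exact, w_gd)
```

Many of the techniques in this field follow the same pattern with a different loss, penalty, or function class, and usually without a closed form.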

So, I would start with reading the individual Wikipedia articles on the different techniques and areas of Machine Learning: regression, logistic regression, Principal Component Analysis, Support Vector Machines, Vapnik-Chervonenkis theory, deep learning, and nonlinear dimensionality reduction.

If you want to connect these to fundamental mathematical objects, then there are these articles, but the objects have more to do with differential geometry than with functional analysis:

*) Smale et al., "Finding the homology of submanifolds with high confidence"

*) R. Ghrist, "Three examples of applied and computational homology," Nieuw Archief voor Wiskunde 5/9(2).

Solution 3:

With a background in pure math you will surely enjoy these books:

Mohri's Foundations of Machine Learning, which is available for free

Shalev-Shwartz's Understanding Machine Learning: From Theory to Algorithms, also available for free

Devroye's A Probabilistic Theory of Pattern Recognition

Lattimore's Bandit Algorithms

You will also like the publications of Foundations and Trends in Machine Learning.