Mathematical introduction to machine learning

At first glance, this is once again a reference request for "How to start machine learning".

However, my mathematical background is relatively strong and I am looking for an introduction to machine learning using mathematics and actually proving things.

Most references are relatively imprecise and pad the text with verbiage where a simple formula and a single example would convey the same content. Moreover, proofs appear only in rare instances.

Starting from standard hand-waving literature (e.g. first Amazon results), I discovered Andrew Ng's Coursera course, then Bishop's book on pattern recognition and finally Smola's book on Machine Learning. The latter seems to be the first that suits my expectations. Unfortunately, the book is only in draft state.

Are there other references that provide a similar level of rigor as Smola's book? Potentially with different or additional content?

Maybe I should add a little bit more about my background:

I have a (German) PhD in mathematics (in the field of PDEs). In particular, I am used to applied analysis, optimal control theory, calculus of variations, some measure and probability theory, numerics, and differential geometry. During my diploma, my minor subject was computer science. Hence, somewhere inside my head, I still have some knowledge of algorithms, computational geometry, and geometric modelling.

Edit: Would it potentially be better to ask this question on Data Science Stack Exchange? I don't want to spam the boards with the same question, but if you think I have a higher chance of obtaining an answer there, I would post it there. Of course, I would cross-link the questions and answers. Any comment on that?


Solution 1:

My opinion is that it depends on which subarea of machine learning interests you. Unfortunately, at this point, much of the relevant literature (especially for theory) exists only in publications, rather than books. But this question is just about where to start, I suppose.

The more popular, "practically oriented" undergraduate-targeting books, such as Hastie et al's The Elements of Statistical Learning or Bishop's Pattern Recognition and Machine Learning, are essentially non-mathematical. Books that take the probabilistic-modelling point of view, such as Murphy's Machine Learning: A Probabilistic Perspective and Koller and Friedman's Probabilistic Graphical Models: Principles and Techniques, have a bit more mathematical content, mostly in the area of Bayesian modelling and applied probability (e.g., MCMC, variational inference). I think books in these categories are great introductions to ML, but perhaps not to its mathematics.

The most popular book as of writing, Goodfellow et al's Deep Learning, is also non-rigorous and generally mathematically light. However, it does cover more advanced subjects at the end and it is such a comprehensive introduction to the subject that I still recommend it as a starting point.

Classical ML theory is (to a decent extent) concerned with the Probably Approximately Correct (PAC) framework. Two lovely books that focus on the basic theory of introductory ML and are mathematically oriented are Shalev-Shwartz and Ben-David, Understanding Machine Learning, and Mohri et al, Foundations of Machine Learning. These are probably good starting points for people interested in ML theory, in terms of error bounds, sample complexities, etc., with plenty of theorems.
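To give a flavour of the kind of statement these two books prove rigorously: for a finite hypothesis class $\mathcal{H}$ in the realizable PAC setting, any empirical risk minimizer is probably approximately correct once the number of i.i.d. samples satisfies

$$m \;\ge\; \frac{\ln\lvert\mathcal{H}\rvert + \ln(1/\delta)}{\varepsilon},$$

i.e. with probability at least $1-\delta$ over the draw of the sample, the learned hypothesis has true error at most $\varepsilon$. Bounds of exactly this shape, with $\ln\lvert\mathcal{H}\rvert$ replaced by the VC dimension or a Rademacher complexity for infinite classes, are the bread and butter of both texts.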

Specialized books on particular ML topics can be mathematically demanding as well. Schölkopf and Smola's Learning with Kernels and Rasmussen and Williams's Gaussian Processes for Machine Learning are, in my opinion, examples of these. There is also Information Theory, Inference, and Learning Algorithms by MacKay, which covers neural networks from an information-theoretic and compression point of view, and Graphical Models, Exponential Families, and Variational Inference by Wainwright and Jordan.

One shortcoming of the ML literature, as of writing, is the lack of introductory books helping people access the more mathematically demanding advanced literature (e.g., the game theory, information theory, and optimal transport concepts used to analyze deep generative models; differential geometry and spectral methods in manifold learning; Riemannian optimization for deep learning). Hopefully one day there will be more expository material introducing these more mathematically intensive areas.


Incidentally, in my answer to this question I link to other questions on the same topic.

Solution 2:

You may be interested in Kevin P. Murphy's book: http://www.cs.ubc.ca/~murphyk/MLbook/

Good luck :)