Probability
General introductions
-
Pitman, Probability (1e)
Considered by many to be the best introduction to probability.
-
Ross, A First Course in Probability (6e)
Lots of great problems.
-
Morin. Probability: For the Enthusiastic Beginner (At Createspace, At Amazon)
An excellent-looking non-calculus introduction to probability. As a non-calculus approach, it focuses on discrete distributions, but it discusses the Gaussian distribution from the perspective of discrete approximation. (I think this is a pretty useful way to do it.) The author’s website has some sample chapters (including TOC) that you can view: http://www.people.fas.harvard.edu/~djmorin/book.html
-
Ash. The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (Revised paperback, Original hardcover)
-
Feller, An Introduction to Probability Theory and Its Applications (Vol I: 3e, 3e intl; Vol II: 2e, 2e intl)
The most classic entry in this section. Many still consider it to be the best. Vol I is introductory (though maybe it would go down smoother after another book in this list), while Vol II is considered grad-level as it involves measure theory.
-
Pishro-Nik. Introduction to Probability, Statistics, and Random Processes (1e, 1e solns)
-
Blitzstein and Hwang. Introduction to Probability (1e)
-
Bertsekas and Tsitsiklis. Introduction to Probability (2e, 1e) - Goes with MIT OCW course 6.041/6.431 “Probabilistic Systems Analysis and Applied Probability”.
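As a small illustration of Morin's discrete-approximation view of the Gaussian mentioned above: for large n, the Binomial(n, p) pmf near its mean closely tracks the normal density with mean np and variance np(1-p) (the de Moivre–Laplace theorem). A minimal sketch, with n and p chosen arbitrarily for illustration:

```python
import math

def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# 400 fair coin flips: mean 200, standard deviation sqrt(100) = 10.
n, p = 400, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

# Near the mean, the discrete pmf and the continuous density
# agree to about four decimal places.
for k in range(195, 206):
    print(k, round(binom_pmf(n, k, p), 5), round(normal_pdf(k, mu, sigma), 5))
```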
Further
- Ross, Introduction to Probability Models (9e)
Stochastic processes (without measure theory)
- Ross, Stochastic Processes (2e, 2e intl, 2e intl @AbeBooks, 1e @AbeBooks)
- Karlin and Taylor, A First Course in Stochastic Processes (2e) (used price fluctuates a lot but can sometimes be in the $15-30 range)
- Karlin and Taylor, A Second Course in Stochastic Processes (2e) (used price fluctuates a lot but can sometimes be in the $20-30 range)
- Karlin and Taylor, An Introduction to Stochastic Modeling (3e preferred: 3e)
- Gallager. Stochastic Processes: Theory for Applications (1e) - Goes with MIT OCW courses 6.262 “Discrete Stochastic Processes” and 6.432 “Stochastic Processes, Detection, and Estimation”.
Applied
- Trivedi. Probability and Statistics with Reliability, Queueing, and Computer Science Applications (2e)
- Mitzenmacher & Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis (1e)
- Ross, Applied Probability Models with Optimization Applications (Dover)
With measure theory
Modern probability theory builds its mathematical foundation on measure theory, which is generally regarded as an intermediate-to-advanced topic in real analysis. Some of these books assume prior exposure to it, while others aim to teach it as they go.
-
Rosenthal, A First Look at Rigorous Probability Theory (2e)
This is, in particular, an introduction to measure theory. It assumes a background in undergraduate-level probability (e.g. Ross or Feller) and analysis (e.g. Rudin’s Principles of Mathematical Analysis).
-
Schilling, Measures, Integrals and Martingales (1e)
Another book that teaches measure theory in the context of probability, assuming undergraduate-level probability and analysis.
-
Dudley, Real Analysis and Probability (2e)
A well-regarded introduction to measure theory from a probability perspective. From the introduction: “The first half of the book gives an exposition of real analysis: basic set theory, general topology, measure theory, integration, an introduction to functional analysis in Banach and Hilbert spaces, convex sets and functions, and measure on topological spaces. The second half introduces probability based on measure theory, including laws of large numbers, ergodic theorems, the central limit theorem, conditional expectations, and martingale convergence. A chapter on stochastic processes introduces Brownian motion and the Brownian bridge.”
-
Williams, Probability with Martingales (1e)
Very popular book at the lower end of measure-theoretic probability.
-
Ross and Peköz, A Second Course in Probability (1e) - Introduces measure theory. (Seems not very popular.)
-
Shiryaev, Probability (2e, 3e Vol I)
Considered one of the best textbooks for graduate students coming to grips with rigorous probability theory. The third edition splits the book into two volumes.
-
Chung, A Course in Probability Theory (3e, 2e)
Another of the best rigorous probability textbooks.
-
Billingsley, Probability and Measure (3e preferred)
Classic, very popular graduate text on measure-theoretic probability.
-
Durrett, Probability: Theory and Examples (4e)
Another very standard graduate text on measure-theoretic probability. This seems to be one of those books that a lot of people don’t like, but it’s so important that they have to read it anyway.
-
Kallenberg, Foundations of Modern Probability (2e)
Encyclopedic reference to probability theory at the advanced level.
Statistics
-
Freedman, Pisani and Purves, Statistics (4e, 3e, 4e intl)
Conceptual introduction to statistics with minimal math. Widely viewed as the best introduction to how to think about statistics.
-
Diez, Barr, Çetinkaya-Rundel. OpenIntro Statistics (f)
Get it free online (or order it) here: https://www.openintro.org/stat/textbook.php?stat_book=os
There are also a couple of other editions that de-emphasize math in order to teach students who have less background. (You can find them as well via the link above.)
-
OpenStax, Introductory Statistics (f)
-
DeGroot and Schervish, Probability and Statistics (2e, 3e, 4e, 4e intl at AbeBooks)
A popular introduction to mathematical statistics.
-
McElreath. Statistical Rethinking: A Bayesian Course with Examples in R and Stan (1e)
An overview of the philosophical and practical aspects of statistics from a modern Bayesian perspective. “The principal audience is researchers in the natural and social sciences, whether new PhD students or seasoned professionals, who have had a basic course on regression but nevertheless remain uneasy about statistical modeling.”
-
Casella and Berger, Statistical Inference (2e, 2e intl)
A standard grad-level introduction to mathematical/theoretical statistics.
-
Schervish, Theory of Statistics (1e)
More advanced and complete book on theoretical statistics.
-
Bickel and Doksum, Mathematical Statistics: Basic Ideas and Selected Topics (1e 1977 at AbeBooks; Vol I: 2e CRC 2015, 2e PH 2006 updated PB; Vol II: 2e CRC 2015, 2e PH 2006; Set 2e CRC 2015)
Standard grad-level text on mathematical statistics.
-
Gelman et al, Bayesian Data Analysis (3e, 2e)
Good first book on Bayesian analysis.
-
Gelman and Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models (1e)
Good first book on multilevel/hierarchical models.
-
Fleiss, The Design and Analysis of Clinical Experiments (x)
Standard book on clinical study design.
-
Mandel, The Statistical Analysis of Experimental Data (c)
-
Robert, The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation (2e)
-
Erich Lehmann books
These are classic standards, but somewhat old now (and, I think, out of print).
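The Bayesian books above (Gelman et al., McElreath, Robert) all build on posterior updating. A minimal sketch of the conjugate Beta-Binomial update that most of them open with (the prior and data here are made up for illustration):

```python
# Prior Beta(a, b) plus k successes in n trials gives
# posterior Beta(a + k, b + (n - k)).

def update(a, b, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Uniform prior Beta(1, 1); observe 7 heads in 10 flips.
a, b = update(1, 1, 7, 3)
print(a, b, beta_mean(a, b))  # posterior Beta(8, 4), mean 2/3
```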
Biostats
- Wassertheil-Smoller. Biostatistics and Epidemiology: A Primer for Health and Biomedical Professionals (3e)
- Glantz. Primer of Biostatistics (6e)
Machine Learning
AKA statistical learning, data mining, predictive modeling.
The free books
-
James, Witten, Hastie and Tibshirani, 2013. An Introduction to Statistical Learning: with Applications in R
AKA ISL or ISLR. Probably the most popular introduction to machine learning. http://www-bcf.usc.edu/~gareth/ISL/
-
Hastie, Tibshirani and Friedman, 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction
AKA ESL. The standard textbook for serious machine learning courses. http://statweb.stanford.edu/~tibs/ElemStatLearn/
-
Goodfellow, Bengio and Courville, forthcoming/online. Deep Learning
Due for publication in 2016 (or 2017?). This book isn’t even out in paper form yet (MIT Press will publish it soon), but you can read it for free online here: http://www.deeplearningbook.org/
-
Barber, 2012. Bayesian Reasoning and Machine Learning
AKA BRML. A popular machine learning textbook from a Bayesian viewpoint. http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage
-
MacKay, 2003. Information Theory, Inference, and Learning Algorithms
An older, but respected, introduction to ML from an information theory viewpoint. http://www.inference.phy.cam.ac.uk/itila/p0.html
-
Boyd and Vandenberghe, Convex Optimization
While this book is not exactly about machine learning, many (most?) ML techniques rely on the optimization techniques covered here. The book’s web page also links to a free online course.
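As a tiny example of the optimization machinery Boyd and Vandenberghe cover (and that underlies much of ML training): gradient descent on a convex function converges to its global minimizer. The function and step size here are made up for illustration:

```python
# Gradient descent on the convex function f(x) = (x - 3)^2 + 1,
# whose gradient is f'(x) = 2 * (x - 3) and minimizer is x = 3.

def grad_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step opposite the gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_star = grad_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_star)  # converges toward the minimizer x = 3
```

For a convex f with a suitably small step size this converges globally; for the non-convex losses common in deep learning it only finds a local stationary point, which is one reason the convex theory is still the standard starting point.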
Other big books
-
Kuhn and Johnson, 2013. Applied Predictive Modeling (1e)
This is a guide to machine learning at the level of detail necessary to implement techniques in R. Much attention is paid to how to make each method perform well. The body of each chapter is a description of the techniques involved, then at the end of the chapter is a “Computing” section which describes how to do what you just learned in R. The authors’ approach is to tell you just as much as you need to know to use the techniques, then point you to primary literature where you can read the details.
-
Reed and Marks, 1999. Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks (1e)
This one is old, not particularly in-depth, and covers only a limited subset of NN techniques, but it remains one of the better introductions to the topic of neural networks. It’s also relatively short and affordable.
-
Murphy, 2012. Machine Learning: a Probabilistic Perspective (1e)
-
Izenman, 2008. Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning (1e)
This book is statistics-oriented, but it largely covers ML techniques.
-
Bishop, 2006. Pattern Recognition and Machine Learning (1e)
AKA PRML. Bayesian viewpoint. This book used to be very influential but it’s getting a bit dated, and I get the impression that it’s generally regarded as not the best-written ML book around. http://research.microsoft.com/en-us/um/people/cmbishop/PRML/index.htm
-
Bishop, 1996. Neural Networks for Pattern Recognition (1e)
Old but still relevant (because there aren’t a lot of in-depth books about neural networks).
-
Abu-Mostafa, Magdon-Ismail, Lin, 2012. Learning From Data (1e)
Now out of print? Goes with an online course from Caltech: https://work.caltech.edu/telecourse.html
-
Mohri, Rostamizadeh, Talwalkar, 2012. Foundations of Machine Learning (1e)
-
Koller, Friedman, 2009. Probabilistic Graphical Models: Principles and Techniques (1e)
This is the reigning book on PGMs, but it demands more mathematical background (e.g. abstract algebra) than a lot of the other books listed here. It’s also a very physically imposing volume (1280 pages).
-
Korb and Nicholson, Bayesian Artificial Intelligence (2e)
Bayesian network techniques.
Natural language processing
-
Jurafsky and Martin, 2008. Speech and Language Processing (2e)
The main book on NLP. A 3rd edition is in progress, and the draft can be seen here: https://web.stanford.edu/~jurafsky/slp3/
-
Manning and Schütze, 1999. Foundations of Statistical Natural Language Processing (1e)
Older, but still the other big book in this field.
Information retrieval
-
Manning, Raghavan, Schütze, 2008. Introduction to Information Retrieval (1e)
Free on the web: http://nlp.stanford.edu/IR-book/
-
Büttcher, Clarke, Cormack, 2010. Information Retrieval: Implementing and Evaluating Search Engines (1e)