An Introduction to Statistical Data Science by Giorgio Picci
This graduate textbook on the statistical approach to Data Science describes the basic ideas, scientific principles and common techniques for the extraction of mathematical models from observed data. Aimed at young scientists, and motivated by their scientific prospects, it provides first-principles derivations of various algorithms and procedures, thereby supplying a solid background for their future specialization to diverse fields and applications. The beginning of the book presents the basics of statistical science, with an exposition of linear models. This is followed by an analysis of some numerical aspects and various regularization techniques, including LASSO, which are particularly important for large-scale problems. Decision problems are studied both from the classical hypothesis testing perspective and, particularly, from a modern support-vector perspective, in the linear and non-linear context alike. Underlying the book is the Bayesian approach and the Bayesian interpretation of various algorithms and procedures. This is the key to principal components analysis and canonical correlation analysis, which are explained in detail. Following a chapter on nonlinear inference, including material on neural networks, the book concludes with a discussion of time series analysis and the estimation of dynamic models for time series. Featuring examples and exercises partially motivated by engineering applications, this book is intended for graduate students in applied mathematics and engineering with a general background in Probability Theory and Linear Algebra.