This text provides deep and comprehensive coverage of the mathematical background for data science, including machine learning, optimal recovery, compressed sensing, optimization, and neural networks. In the past few decades, heuristic methods adopted by big tech companies have complemented existing scientific disciplines to form the new field of Data Science. This text embarks the readers on an engaging itinerary through the theory supporting the field. Altogether, twenty-seven lecture-length chapters with exercises provide all the details necessary for a solid understanding of key topics in data science. While the book covers standard material on machine learning and optimization, it also includes distinctive presentations of topics such as reproducing kernel Hilbert spaces, spectral clustering, optimal recovery, compressed sensing, group testing, and applications of semidefinite programming. Students and data scientists with less mathematical background will appreciate the appendices that provide more background on some of the more abstract concepts.