With the rise in popularity of distributed systems like Hadoop, more and more people are working in big data processing. A growing number of companies want to build dataflow systems that can churn through huge amounts of data to gain insights for their business. Hadoop was a first-generation open source distributed system, and there is now a need for a next-generation distributed system that takes data processing to the next level. Apache Spark is the next step in that direction. Spark brings great flexibility and a composable system to the big data world. In this book, the author takes a deep dive into Spark and the big data ecosystem. The author discusses and illustrates how different concepts of Spark are brought together to solve complex issues with a dataflow system. The reader will acquire an understanding of the next generation of distributed systems, Apache Spark architecture and abstractions, and the Spark ecosystem, including Spark SQL, GraphX, and MLlib.