This book highlights the different types of data architecture and illustrates
the many possibilities hidden behind the term "Big Data", from the use of NoSQL
databases to the deployment of stream analytics architecture, machine learning,
and governance.

Scalable Big Data Architecture covers real-world, concrete industry use cases
that leverage complex distributed applications, which involve web applications,
RESTful APIs, and high throughput of large amounts of data stored in highly
scalable NoSQL data stores such as Couchbase and Elasticsearch. This book
demonstrates how data processing can be done at scale, from the use of NoSQL
data stores to their combination with a Big Data distribution.

When the data processing is too complex and involves different processing
topologies, such as long-running jobs, stream processing, correlation of
multiple data sources, and machine learning, it’s often necessary to delegate
the load to Hadoop or Spark and use the NoSQL store to serve processed data in
real time.

This book shows you how to choose a relevant combination of Big Data
technologies available within the Hadoop ecosystem. It focuses on processing
long jobs, architecture, stream data patterns, log analysis, and real-time
analytics. Every pattern is illustrated with practical examples that use
different open source projects such as Logstash, Spark, and Kafka.

Traditional data infrastructures are built for digesting and rendering data
synthesis and analytics from large amounts of data. This book helps you
understand why you should consider using machine learning algorithms early in
the project, before being overwhelmed by the constraints imposed by dealing
with the high throughput of Big Data.

Scalable Big Data Architecture is for developers, data architects, and data
scientists looking for a better understanding of how to choose the most
relevant pattern for a Big Data project and which tools to integrate into that
pattern.