Learning Apache Spark 2

Learn about the fastest-growing open source project in the world, and find out how it revolutionizes big data analytics

Muhammad Asif Abbasi

Book Details

ISBN-13: 9781785885136
Paperback: 356 pages

Book Description

The Spark juggernaut keeps rolling and gains momentum every day. Spark provides key capabilities in the form of Spark SQL, Spark Streaming, Spark ML, and GraphX, all accessible via Java, Scala, Python, and R. Deploying these capabilities correctly is crucial, whether on a standalone cluster or as part of an existing Hadoop installation configured with YARN or Mesos.

After installation, the next part of the journey is using the key components and APIs: clustering, the machine learning APIs, data pipelines, and parallel programming. It is important to understand why each framework component matters, how widely it is used, how stable it is, and what its pertinent use cases are.

Once we understand the individual components, we will work through a couple of real-life advanced analytics examples, such as "Building a recommendation system" and "Predicting customer churn".

The objective of these real-life examples is to give the reader confidence in using Spark for real-world problems.

Table of Contents

Chapter 1: Architecture and Installation
Apache Spark architecture overview
Installing Apache Spark
Writing your first Spark program
Spark architecture
Apache Spark cluster manager types
Running Spark examples
Brain teasers
References
Summary
Chapter 2: Transformations and Actions with Spark RDDs
What is an RDD?
Operations on RDDs
Passing functions to Spark (Scala)
Passing functions to Spark (Java)
Passing functions to Spark (Python)
Transformations
Set operations in Spark
Actions
PairRDDs
Shared variables
References
Summary
Chapter 3: ETL with Spark
What is ETL?
How is Spark being used?
Commonly supported file formats
Commonly supported file systems
Structured data sources and databases
References
Summary
Chapter 4: Spark SQL
What is Spark SQL?
What is DataFrame API?
What is DataSet API?
What's new in Spark 2.0?
The SparkSession
Creating a DataFrame
Parquet files
Working with Hive
Spark SQL CLI
References
Summary
Chapter 5: Spark Streaming
What is Spark Streaming?
Steps involved in a streaming app
Architecture of Spark Streaming
Caching and persistence
Checkpointing
DStream best practices
Fault tolerance
What is Structured Streaming?
References
Summary
Chapter 6: Machine Learning with Spark
What is machine learning?
Why machine learning?
Types of machine learning
Introduction to Spark MLlib
Why do we need the Pipeline API?
How does it work?
Feature engineering
Classification and regression
Clustering
Collaborative filtering
ML tuning: model selection and hyperparameter tuning
References
Summary
Chapter 7: GraphX
Graphs in everyday life
What is a graph?
Why are graphs elegant?
What is GraphX?
Creating your first graph (RDD API)
Basic graph operators (RDD API)
Caching and uncaching of graphs
Graph algorithms in GraphX
GraphFrames
Comparison between GraphFrames and GraphX
References
Summary
Chapter 8: Operating in Clustered Mode
Clusters, nodes and daemons
Running Spark in standalone mode
Using the cluster launch scripts to start a standalone cluster
Running Spark in YARN
Running Spark in Mesos
References
Summary
Chapter 9: Building a Recommendation System
What is a recommendation system?
User specific recommendations
Key issues with recommendation systems
Recommendation system in Spark
References
Summary
Chapter 10: Customer Churn Prediction
Overview of customer churn
Why is predicting customer churn important?
How do we predict customer churn with Spark?
Exploring customer service calls
References
Summary

What You Will Learn

  • Get an overview of big data analytics and its importance for organizations and data professionals
  • Delve into Spark to see how it differs from existing processing platforms
  • Understand the intricacies of various file formats and how to process them with Apache Spark
  • Learn how to deploy Spark with YARN, Mesos, or a standalone cluster manager
  • Learn the concepts of Spark SQL, SchemaRDD, and caching, and work with Hive and the Parquet file format
  • Understand the architecture of Spark MLlib and some of the off-the-shelf algorithms that come with Spark
  • Get introduced to the deployment and usage of SparkR
  • Walk through the importance of graph computation and the graph processing systems available on the market
  • Work through a real-world Spark example by building a recommendation engine using ALS
  • Use a telco dataset to predict customer churn using random forests
