Taming Big Data with Apache Spark and Python - Hands On! [Video]

Frank Kane

More than 15 hands-on examples to help you analyze large data sets with Apache Spark

Video Details

ISBN 13: 9781787129931
Course Length: 5 hours 11 minutes

Video Description

Apache Spark has become a leading technology in the Big Data domain, rising from a newcomer to an established platform in just a few years. Spark lets you quickly extract actionable insights from large amounts of data in real time. This hands-on course starts with setting up Spark on a single system or on a cluster. It then covers analyzing large data sets with Spark RDDs and developing and running Spark jobs in Python. With more than 15 interactive examples drawn from real-world problems, the course helps you understand the Spark ecosystem and implement production-grade, real-time Spark projects.

Table of Contents

Getting Started with Spark
  • Introduction
  • How to Use This Course
  • Getting Set Up – Installing Python, a JDK, Spark, and its Dependencies
  • Installing the MovieLens Movie Rating Dataset
  • Run Your First Spark Program – Ratings Histogram Example
Spark Basics and Simple Examples
  • Introduction to Spark
  • The Resilient Distributed Dataset (RDD)
  • Ratings Histogram Walkthrough
  • Key/Value RDDs, and the Average Friends by Age Example
  • Running the Average Friends by Age Example
  • Filtering RDDs, and the Minimum Temperature by Location Example
  • Running the Minimum Temperature Example and Modifying It for Maximums
  • Running the Maximum Temperature by Location Example
  • Counting Word Occurrences Using flatMap()
  • Improving the Word Count Script with Regular Expressions
  • Sorting the Word Count Results
Advanced Examples of Spark Programs
  • Find the Most Popular Movie
  • Use Broadcast Variables to Display Movie Names Instead of ID Numbers
  • Find the Most Popular Superhero in a Social Graph
  • Run the Script – Discover Who the Most Popular Superhero Is!
  • Superhero Degrees of Separation – Introducing Breadth-First Search
  • Superhero Degrees of Separation – Accumulators, and Implementing BFS in Spark
  • Superhero Degrees of Separation – Review the Code and Run It
  • Item-Based Collaborative Filtering in Spark, cache(), and persist()
  • Running the Similar Movies Script Using Spark's Cluster Manager
  • Improve the Quality of Similar Movies
Running Spark on a Cluster
  • Introducing Elastic MapReduce
  • Setting Up Your AWS / Elastic MapReduce Account and PuTTY
  • Partitioning
  • Create Similar Movies from One Million Ratings – Part 1
  • Create Similar Movies from One Million Ratings – Part 2
  • Create Similar Movies from One Million Ratings – Part 3
  • Troubleshooting Spark on a Cluster
  • More Troubleshooting and Managing Dependencies
SparkSQL, DataFrames, and DataSets
  • Introducing SparkSQL
  • Executing SQL Commands and SQL-Style Functions on a DataFrame
  • Using DataFrames Instead of RDDs
Other Spark Technologies and Libraries
  • Introducing MLlib
  • Using MLlib to Produce Movie Recommendations
  • Analyzing the ALS Recommendations Results
  • Using DataFrames with MLlib
  • Spark Streaming and GraphX
You Made It! Where to Go from Here
  • Learning More about Spark and Data Science

What You Will Learn

  • Learn how to frame Big Data problems as Spark problems
  • Install and run Apache Spark on your computer or on a cluster
  • Analyze large data sets across many CPUs using Spark’s Resilient Distributed Datasets (RDDs)
  • Implement machine learning on Spark using the MLlib library
  • Process continuous streams of data in real time using the Spark Streaming module
  • Perform complex network analysis using Spark’s GraphX library
  • Use Amazon's Elastic MapReduce service to run your Spark jobs on a cluster
