Scala Data Analysis Cookbook

Navigate the world of data analysis, visualization, and machine learning with over 100 hands-on Scala recipes

Arun Manivannan

Book Details

ISBN 13: 9781784396749
Paperback: 254 pages

Book Description

This book will introduce you to the most popular Scala tools, libraries, and frameworks through practical recipes around loading, manipulating, and preparing your data. It will also help you explore and make sense of your data using stunning and insightful visualizations, and machine learning toolkits.

Starting with introductory recipes on utilizing the Breeze and Spark libraries, get to grips with how to import data from a host of possible sources and how to pre-process numerical, string, and date data. Next, you’ll get an understanding of concepts that will help you visualize data using the Apache Zeppelin and Bokeh bindings in Scala, enabling exploratory data analysis. Discover how to program quintessential machine learning algorithms using the Spark ML library. Work through steps to scale your machine learning models and deploy them into a standalone cluster, EC2, YARN, and Mesos. Finally, dip into the powerful options presented by Spark Streaming, and machine learning for streaming data, as well as utilizing Spark GraphX.

Table of Contents

Chapter 1: Getting Started with Breeze
Introduction
Getting Breeze – the linear algebra library
Working with vectors
Working with matrices
Vectors and matrices with randomly distributed values
Reading and writing CSV files
Chapter 2: Getting Started with Apache Spark DataFrames
Introduction
Getting Apache Spark
Creating a DataFrame from CSV
Manipulating DataFrames
Creating a DataFrame from Scala case classes
Chapter 3: Loading and Preparing Data – DataFrame
Introduction
Loading more than 22 features into classes
Loading JSON into DataFrames
Storing data as Parquet files
Using the Avro data model in Parquet
Loading from RDBMS
Preparing data in Dataframes
Chapter 4: Data Visualization
Introduction
Visualizing using Zeppelin
Creating scatter plots with Bokeh-Scala
Creating a time series MultiPlot with Bokeh-Scala
Chapter 5: Learning from Data
Introduction
Supervised and unsupervised learning
Gradient descent
Predicting continuous values using linear regression
Binary classification using LogisticRegression and SVM
Binary classification using LogisticRegression with Pipeline API
Clustering using K-means
Feature reduction using principal component analysis
Chapter 6: Scaling Up
Introduction
Building the Uber JAR
Submitting jobs to the Spark cluster (local)
Running the Spark Standalone cluster on EC2
Running the Spark Job on Mesos (local)
Running the Spark Job on YARN (local)
Chapter 7: Going Further
Introduction
Using Spark Streaming to subscribe to a Twitter stream
Using Spark as an ETL tool
Using StreamingLogisticRegression to classify a Twitter stream using Kafka as a training stream
Using GraphX to analyze Twitter data

What You Will Learn

  • Familiarize yourself with the Breeze and Spark libraries, set them up, and use their data structures
  • Import data from a host of possible sources and create DataFrames from CSV
  • Clean, validate and transform data using Scala to pre-process numerical and string data
  • Integrate quintessential machine learning algorithms using the Scala stack
  • Bundle and scale up Spark jobs by deploying them into a variety of cluster managers
  • Run streaming and graph analytics in Spark to visualize data, enabling exploratory analysis
