Mastering Spark with R: The Complete Guide to Large-Scale Analysis and Modeling, 1st Edition, by Javier Luraschi, Kevin Kuo, and Edgar Ruiz (ISBN 149204637X, 978-1492046370)

Product details:
ISBN 10: 149204637X
ISBN 13: 978-1492046370
Authors: Javier Luraschi, Kevin Kuo, Edgar Ruiz
If you’re like most R users, you have deep knowledge of, and love for, statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems.
Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.
Table of contents:
1. Introduction.
Overview
Hadoop
Spark
R
sparklyr
Recap
2. Getting Started.
Overview
Prerequisites
Installing sparklyr
Installing Spark
Connecting
Using Spark
Web Interface
Analysis
Modeling
Data
Extensions
Distributed R
Streaming
Logs
Disconnecting
Using RStudio
Resources
Recap
3. Analysis.
Overview
Import
Wrangle
Built-in Functions
Correlations
Visualize
Using ggplot2
Using dbplot
Model
Caching
Communicate
Recap
4. Modeling.
Overview
Exploratory Data Analysis
Feature Engineering
Supervised Learning
Generalized Linear Regression
Other Models
Unsupervised Learning
Data Preparation
Topic Modeling
Recap
5. Pipelines.
Overview
Creation
Use Cases
Hyperparameter Tuning
Operating Modes
Interoperability
Deployment
Batch Scoring
Real-Time Scoring
Recap
6. Clusters.
Overview
On-Premises
Managers
Distributions
Cloud
Amazon
Databricks
IBM
Microsoft
Qubole
Kubernetes
Tools
RStudio
Jupyter
Livy
Recap
7. Connections.
Overview
Edge Nodes
Spark Home
Local
Standalone
YARN
YARN Client
YARN Cluster
Livy
Mesos
Kubernetes
Cloud
Batches
Tools
Multiple Connections
Troubleshooting
Logging
Spark Submit
Windows
Recap
8. Data.
Overview
Reading Data
Paths
Schema
Memory
Columns
Writing Data
Copying Data
File Formats
CSV
JSON
Parquet
Others
File Systems
Storage Systems
Hive
Cassandra
JDBC
Recap
9. Tuning.
Overview
Graph
Timeline
Configuring
Connect Settings
Submit Settings
Runtime Settings
sparklyr Settings
Partitioning
Implicit Partitions
Explicit Partitions
Caching
Checkpointing
Memory
Shuffling
Serialization
Configuration Files
Recap
10. Extensions.
Overview
H2O
Graphs
XGBoost
Deep Learning
Genomics
Spatial
Troubleshooting
Recap
11. Distributed R.
Overview
Use Cases
Custom Parsers
Partitioned Modeling
Grid Search
Web APIs
Simulations
Partitions
Grouping
Columns
Context
Functions
Packages
Cluster Requirements
Installing R
Apache Arrow
Troubleshooting
Worker Logs
Resolving Timeouts
Inspecting Partitions
Debugging Workers
Recap
12. Streaming.
Overview
Transformations
Analysis
Modeling
Pipelines
Distributed R
Kafka
Shiny
Recap
13. Contributing.
Overview
The Spark API
Spark Extensions
Using Scala Code
Recap