Big Data Analytics Tools and Technology for Effective Planning 1st Edition by Arun Somani, Ganesh Chandra Deka (Ebook PDF), ISBN: 9781315391243, 1315391244
Product details:
ISBN 10: 1315391244
ISBN 13: 9781315391243
Authors: Arun Somani, Ganesh Chandra Deka
Big Data Analytics Tools and Technology for Effective Planning 1st Edition Table of contents:
Chapter 1 Challenges in Big Data
Introduction
Background
Goals and Challenges of Analyzing Big Data
Paradigm Shifts
Organization of This Paper
Algorithms for Big Data Analytics
k-Means
Classification Algorithms: k-NN
Application of Big Data: A Case Study
Economics and Finance
Other Applications
Salient Features of Big Data
Heterogeneity
Noise Accumulation
Spurious Correlation
Incidental Endogeneity
Impact on Statistical Thinking
Independence Screening
Dealing with Incidental Endogeneity
Impact on Computing Infrastructure
Literature Review
MapReduce
Cloud Computing
Impact on Computational Methods
First-Order Methods for Non-Smooth Optimization
Dimension Reduction and Random Projection
Future Perspectives and Conclusion
Existing Methods
Proposed Methods
Probabilistic Graphical Modeling
Mining Twitter Data: From Content to Connections
Recent Work: Location-Specific Tweet Detection and Topic Summarization in Twitter
Addressing Big Data Challenges in Genome Sequencing and RNA Interaction Prediction
Single-Cell Genome Sequencing
RNA Structure and RNA–RNA Interaction Prediction
Identifying Qualitative Changes in Living Systems
Acknowledgments
References
Additional References for Researchers and Advanced Readers for Further Reading
Key Terminology and Definitions
Chapter 2 Challenges in Big Data Analytics
Introduction
Data Challenges
Storing the Data
Velocity of the Data
Data Variety
Computational Power
Understanding the Data
Data Quality
Data Visualization
Management Challenges
Leadership
Talent Management
Technology
Decision Making
Company Culture
Process Challenges
Introduction to Hadoop
Why Not a Data Warehouse for Big Data?
What Is Hadoop?
How Does Hadoop Tackle Big Data Challenges?
Storage Problem
Various Data Formats
Processing the Sheer Volume of Data
Cost Issues
Capturing the Data
Durability Problem
Scalability Issues
Issues in Analyzing Big Data
HDFS
Architecture
MapReduce
Hadoop: Pros and Cons
Other Big Data-Related Projects
Data Formats
Apache Avro
Apache Parquet
Data Ingestion
Apache Flume
Apache Sqoop
Data Processing
Apache Pig
Apache Hive
Apache Crunch
Apache Spark
Storage
HBase
Coordination
ZooKeeper
References
Chapter 3 Big Data Reference Model
Introduction into Big Data Management Reference Model
Information Visualization Based on the IVIS4BigData Reference Model
Interaction with Visual Data Views
Interaction with the Visualization Pipeline and Its Transformation Mappings
Introduction to the IVIS4BigData Reference Model
Introduction to Big Data Process Management Based on the CRISP4BigData Reference Model
The CRISP4BigData Reference Model
Data Collection, Management, and Curation
Analytics
Interaction and Perception
Deployment, Collaboration, and Visualization
Data Enrichment
Insight and Effectuation
Potentialities and Continuous Product and Service Improvement
Data Enrichment
Knowledge-Based Support
Knowledge Generation and Management
Retention and Archiving
Preparatory Operations for Evaluation of the CRISP4BigData Reference Model within a Cloud-Based Hadoop Ecosystem
Architecture Instantiation
Use Case 1: MetaMap Annotation of Biomedical Publications via Hadoop
Use Case 2: Emotion Recognition in Video Frames with Hadoop
Hadoop Cluster Installation in the EGI Federated Cloud
MetaMap Annotation Results
Conclusions and Outlook
References
Key Terminology and Definitions
Chapter 4 A Survey of Tools for Big Data Analytics
Survey on Commonly Used Big Data Tools
Potential Growth Versus Commitment for Big Data Analytics Options
Potential Growth
Commitment
Balance of Commitment and Potential Growth
Trends for Big Data Analytics Options
Group 1: Strong to Moderate Commitment, Strong Potential Growth
Advanced Analytics
Visualization
Real Time
In-Memory Databases
Unstructured Data
Group 2: Moderate Commitment, Good Potential Growth
Group 3: Weak Commitment, Good Growth
Hadoop Distributed File System (HDFS)
MapReduce
Complex Event Processing (CEP)
SQL
Clouds in TDWI Technology Surveys
Group 4: Strong Commitment, Flat or Declining Growth
Understanding Internet of Things Data
Challenges for Big Data Analytics Tools
Tools for Using Big Data
Jaspersoft BI Suite
Benefits
Pentaho Business Analytics
Karmasphere Studio and Analyst
Direct Access to Big Data for Analysis
Operationalization of the Results
Flexibility and Independence
Talend Open Studio
Skytree Server
Tableau Desktop and Server
Splunk
Splice Machine
Cost-Effective Scaling and Performance with Commodity Hardware
Real-Time Updates with Transactional Integrity
Conclusions
References
Chapter 5 Understanding the Data Science behind Business Analytics
Introduction
Types of Big Data Analytics
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
Analytics Use Case: Customer Churn Prevention
Descriptive Analytics
Application of Descriptive Analytics in Customer Churn Prevention
Techniques Used for Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Techniques Used for Predictive Analytics
Machine Learning Techniques
Artificial Neural Networks
Supervised Learning
Artificial Neural Network Structure and Training
Back-Propagation Weight Adjustment Scheme
Prescriptive Analytics
Application of Prescriptive Analytics in the Customer Churn Prevention Use Case
Prescriptive Analytics Techniques
Big Data Analytics Architecture
Tools Used for Big Data Analytics
IBM InfoSphere
IBM SPSS
Apache Mahout
Azure Machine Learning Studio
Halo
Tableau
SAP Infinite Insight
@Risk
Oracle Advanced Analytics
TIBCO Spotfire
R
Wolfram Mathematica
Future Directions and Technologies
From Batch Processing to Real-Time Analytics
In-Memory Big Data Processing
Prescriptive Analytics
Conclusions
References
Online Sources
Chapter 6 Big Data Predictive Modeling and Analytics
Introduction
The Power of Business Planning with Precise Predictions
Predictive Modeling for Effective Business Planning: A Case Study
Effect of Big Data in Predictive Modeling
Predictive Modeling
Predictive Modeling Process
Selecting and Preparing Data
Fitting a Model
Feature Vectors
Estimating and Validating the Model
Types of Predictive Models
Linear and Nonlinear Regression
Types of Regression Algorithms
Decision Trees
Use of Decision Trees in Big Data Predictive Analytics
Inference
Random Forests
Use of Random Forests for Big Data
Support Vector Machines
Use of Support Vector Machines for Big Data Predictive Analytics
Unsupervised Models: Cluster Analysis
Cluster Analysis
Algorithms for Cluster Analysis
Use of Cluster Analysis for Big Data Predictions
Inference
Measuring Accuracy of Predictive Models
Target Shuffling
Lift Charts
ROC Curves
Bootstrap Sampling
Tools and Techniques Used for Predictive Modeling and Analytics
Data Mining Using CRISP-DM Technique
CRISP-DM Tool
Predictive Analytics Using R Open-Source Tool
Research Trends and Conclusion
References
Chapter 7 Deep Learning for Engineering Big Data Analytics
Introduction
Overview of Deep Learning as a Hierarchical Feature Extractor
Flow Physics for Inertial Flow Sculpting
Problem Definition
Design Challenges and the State of the Art
Broad Implications
Micropillar Sequence Design Using Deep Learning
Deep CNNs for Pillar Sequence Design
Action Sequence Learning for Flow Sculpting
Representative Training Data Generation
Summary, Conclusions, and Research Directions
References
Chapter 8 A Framework for Minimizing Data Leakage from Nonproduction Systems
Introduction
Nonproduction Environments
Legal, Business, and Human Factors
Existing Frameworks, Solutions, Products, and Guidelines
Limitations
Research for Framework Development
Research the Use and Protection of Data in Nonproduction Systems
Freedom of Information: An Organizational View
Questionnaire: Opinion
Simplified Business Model
Six Stages of the Framework, Detailing from Organization to Compliance
Know the Legal and Regulatory Standard
Know the Business Data
Know the System
Know the Environment
Data Treatment and Protection
Demonstrate Knowledge
Tabletop Case Study
Hypothetical Case Study Scenario
Discussions Using the Simplified Business Model and Framework
Know the Legal and Regulatory Standards
Know the Business Data
Know the System
Know the Environment
Data Treatment and Protection
Demonstrate Knowledge
Summary of the Impact of the Framework on the Case Study
Conclusions
Glossary
References
Chapter 9 Big Data Acquisition, Preparation, and Analysis Using Apache Software Foundation Tools
Introduction
Data Acquisition
Freely Available Sources of Data Sets
Data Collection through Application Programming Interfaces
Web Scraping
Create the Scraping Template
Explore Site Navigation
Automate Navigation and Extraction
Web Crawling
Data Preprocessing and Cleanup
Need for Hadoop MapReduce and Other Languages for Big Data Preprocessing
When to Choose Hadoop over Python or R
Comparison of Hadoop MapReduce, Python, and R
Cleansing Methods and Routines
Loading Data from Flat Files
Merging and Joining Data Sets
Query the Data
Data Analysis
Big Data Analysis Using Apache Foundation Tools
Various Language Support via Apache Hadoop
Apache Spark for Big Data Analytics
Case Study: Dimensionality Reduction Using Apache Spark, an Illustrative Example from Scientometric Big Data
Why SVD? The Solution May Be the Problem
Need for SVD
Relationship between Singular Values and the Input Matrix
SVD Using Hadoop Streaming with Numpy
Complexity of Using MapReduce
SVD Using Apache Mahout
Generating Sequence Files for Mahout
Lanczos Algorithm
Reading the Mahout-Generated Vectors
SVD Using Apache Spark
Data Analysis through Visualization
Why Big Data Visualization Is Challenging and Different from Traditional Data Visualization
Conclusions
References
Appendix 1
Spark Implementation and Configuration
Building Apache Spark Using Maven
Setting Up Maven’s Memory Usage
build/mvn
Required Cluster Configurations for Spark
Key Terminology and Definitions
Chapter 10 Storing and Analyzing Streaming Data: A Big Data Challenge
Introduction
Streaming Algorithms
Modeling Data Pipeline Problems
Vanilla Model
Sliding Window Model
Turnstile Model
Cash Register Model
Probabilistic Data Structures and Algorithms
Bloom Filter
Count-Min Sketch
HyperLogLog
T Digest
Filtered Space Saving
Suitability of Data Models for Storing Stream Data
Key Value Stores Model
Document Stores Model
Wide-Column Stores Model
Other Models
Challenges and Open Research Problems
Industrial Challenges
Open Research Problems
Conclusions
References
Further Readings
Chapter 11 Big Data Cluster Analysis: A Study of Existing Techniques and Future Directions
Introduction
Overview of Cluster Analysis
An Illustrative Example
Similarity
Termination
Types of Clustering Techniques
Hierarchical Clustering
Overview
Agglomerative Hierarchical Clustering
Divisive Clustering
ROCK Clustering Technique
Expected Effects of the Three Vs on the ROCK Clustering Technique
Density-Based Clustering
Overview
DBSCAN Clustering Technique
Expected Effects of the Three Vs on the DBSCAN Clustering Technique
Partitioning-Based Clustering
Overview
k-Means Clustering Technique
Expected Effects of the Three Vs on the k-Means Clustering Technique
Grid-Based Clustering
Overview
CLIQUE Clustering Technique
Expected Effects of the Three Vs on the CLIQUE Clustering Technique
Miscellaneous Clustering Techniques
Future Directions for Big Data Cluster Analysis Research
Distributed Techniques
Density-Based Distributed Clustering
Graph Partitioning-Based Clustering
MapReduce-Based k-Means-Type Clustering
Robust Techniques
Clustering Engines
Conclusions
References
Key Terminology and Definitions
Chapter 12 Nonlinear Feature Extraction for Big Data Analytics
Introduction
Traditional Methods: Feature Extraction with PCA
Algorithm
PCA for Big Data
Kernel Methods
Problems Associated with Kernel Methods
Kernel Functions as New Features
Kernel Properties
The Most Common Kernel Functions
Selecting Basis Elements
Random Sampling
k-Means
Sparse Greedy Matrix Approximation
Example of Semiparametric SVMs
Stochastic Gradient Descent
Iterative Reweighted Least Squares
Deep Feature Learning
Auto-Encoders
Measuring Discrepancies in AE
Architecture of AEs
Regularizing AEs
Learning
Constructing Deep Structures
Benefits for Big Data
Probabilistic Models
Learning
Contrastive Divergence
Constructing Deep Structures
Deep Distributed Learning
Data Parallelism
Model Parallelism
Data and Model Parallelism
Updating Parameters in Large-Scale Networks
Synchronous Mode
Asynchronous Mode
Ensemble Methods
Nontrainable Aggregation
Trainable Aggregation
References
Chapter 13 Enhanced Feature Mining and Classifier Models to Predict Customer Churn for an e-Retailer
Introduction
Enterprise Data Pipeline (Hadoop Stack)
Stages of the Hadoop Pipeline
Source
Collection
Process
Store
Extract
Existing Models for Customer Churn
Customer Churn Model
Phase 1: Feature Mining
Phase 2: Data Science Model Building
Phase 3: Cross-Validation, Business Action, and Performance Tuning
Feature Engineering
Feature Mining
Demographic Features
Customer Information
Customer Sales
Product Sales
Frequency
Behavioral
Experience
Feature Selection
Baseline Methods
Imbalance in Class Label Outputs
F-ANOVA
Regularization
Binomial Classifier
SVM
Logistic Regression with L1 Norm
Gradient-Boost Ensemble
Cross-Validation
k-Fold Strategy for Cross-Validation
Receiver Operating Characteristics Curve
Conclusions and Future Work
References
Key Terminology and Definitions
Chapter 14 Large-Scale Entity Clustering Based on Structural Similarities within Knowledge Graphs
Introduction
Background: Big Data and Analytics
Definitions of Big Data
Characteristics of Big Data
Gartner’s Three Vs
IBM’s Four Vs
Microsoft’s Six Vs
Analytics
Background: Knowledge Graphs
Motivation: Entity Clustering in Large-Scale Knowledge Graphs
Data Description
Overview
Freebase Data Model
Knowledge Representation in Freebase
Computing Clusters
Data Preprocessing
Formulation of the Similarity Metric
Selection of Clustering Algorithm
Clustering Experiments
Results and Discussion
Evaluation of Cluster Quality
Cluster Analysis
Future Work: Exploitation of Clusters for Topic Exploration
Related Work
Methodology Limitations
Conclusions
References
Chapter 15 Big Data Analytics for Connected Intelligence with the Internet of Things
Introduction
IoT Paradigm
Main Elements of IoT
Popular Applications of IoT
Security Challenges in the IoT Network
A Brief Description of the Intelligence Definition
A Model of Systematic Intelligence Formation from Natural Intelligence
Foundations of Human–Computer Interactions
Intelligent Frameworks in IoT
Big Data Analytics for Intelligence in the IoT
Data Storage and Processing Modules
The Key Devices of Intelligence in IoT
The Fundamental Mechanisms for Big Data Analytics
Big Data Analytics Methods for Connected Intelligence with IoT
Evaluation Results
Conclusions and Future Work
References
Key Terminology and Definitions
Chapter 16 Big Data-Driven Value Chains and Digital Platforms: From Value Co-creation to Monetization
Introduction
Big Data-Driven Value Chains
Creation (Data Capture)
Storage (Data Warehousing)
Processing (Data Mining and Fusion)
Consumption (Sharing)
Assembling Value via Heterogeneous Data Fusion and Interoperability
Intermediating Digital Platforms
Linking Big Data with Network Theory
Big Data-Driven Business Planning
Monetization Strategies
Conclusions
References
Chapter 17 Distant and Close Reading of Dutch Drug Debates in Historical Newspapers: Possibilities and Challenges of Big Data Analysis in Historical Public Debate Research
Introduction
Big Data and Digital Humanities
Public Debate Research
Dutch Historical Newspapers as Big Data
Discourse Analysis and Governmentality
A Leveled Approach
Research Demonstration: Public Perception of Amphetamine in Dutch Newspapers
Regulation and Public Perception of Amphetamine
Modularity and the Cross-Media Challenge
Conclusions
References
Index