Data Mining and Data Warehousing: Principles and Practical Techniques, 1st Edition, by Parteek Bhatia – Ebook PDF
Product details:
ISBN-10 : 1108585859
ISBN-13 : 9781108585859
Author: Parteek Bhatia
Written in lucid language, this valuable textbook brings together the fundamental concepts of data mining and data warehousing in a single volume. Important topics, including information theory, decision trees, the Naïve Bayes classifier, distance metrics, partitioning clustering, association mining, data marts and operational data stores, are discussed comprehensively. The textbook is written to cater to the needs of undergraduate students of computer science, engineering and information technology for a course on data mining and data warehousing. The text simplifies the understanding of the concepts through exercises and practical examples. Chapters on classification, association mining and cluster analysis are discussed in detail, with their practical implementation using the Weka and R language data mining tools. Advanced topics, including big data analytics, relational data models and NoSQL, are discussed in detail. Pedagogical features, including unsolved problems and multiple-choice questions, are interspersed throughout the book for better understanding.
Data Mining and Data Warehousing: Principles and Practical Techniques, 1st Edition – Table of Contents:
1. Beginning with Machine Learning
Chapter Objectives
1.1 Introduction to Machine Learning
1.2 Applications of Machine Learning
1.3 Defining Machine Learning
1.4 Classification of Machine Learning Algorithms
1.4.1 Supervised learning
1.4.2 Unsupervised learning
1.4.3 Supervised and unsupervised learning in real-life scenarios
1.4.4 Reinforcement learning
2. Introduction to Data Mining
Chapter Objectives
2.1 Introduction to Data Mining
2.2 Need for Data Mining
2.3 What Can Data Mining Do and Not Do?
2.4 Data Mining Applications
2.5 Data Mining Process
2.6 Data Mining Techniques
2.6.1 Predictive modeling
2.6.2 Database segmentation
2.6.3 Link analysis
2.6.4 Deviation detection
2.7 Difference between Data Mining and Machine Learning
3. Beginning with Weka and R Language
Chapter Objectives
3.1 About Weka
3.2 Installing Weka
3.3 Understanding Fisher’s Iris Flower Dataset
3.4 Preparing the Dataset
3.5 Understanding ARFF (Attribute Relation File Format)
3.5.1 ARFF header section
3.5.2 ARFF data section
3.6 Working with a Dataset in Weka
3.6.1 Removing input/output attributes
3.6.2 Histogram
3.6.3 Attribute statistics
3.6.4 ARFF Viewer
3.6.5 Visualizer
3.7 Introduction to R
3.7.1 Features of R
3.7.2 Installing R
3.8 Variable Assignment and Output Printing in R
3.9 Data Types
3.10 Basic Operators in R
3.10.1 Arithmetic operators
3.10.2 Relational operators
3.10.3 Logical operators
3.10.4 Assignment operators
3.11 Installing Packages
3.12 Loading of Data
3.12.1 Working with the Iris dataset in R
4. Data Preprocessing
Chapter Objectives
4.1 Need for Data Preprocessing
4.2 Data Preprocessing Methods
4.2.1 Data cleaning
4.2.2 Data integration
4.2.3 Data transformation
4.2.4 Data reduction
5. Classification
Chapter Objectives
5.1 Introduction to Classification
5.2 Types of Classification
5.2.1 Posteriori classification
5.2.2 Priori classification
5.3 Input and Output Attributes
5.4 Working of Classification
5.5 Guidelines for Size and Quality of the Training Dataset
5.6 Introduction to the Decision Tree Classifier
5.6.1 Building decision tree
5.6.2 Concept of information theory
5.6.3 Defining information in terms of probability
5.6.4 Information gain
5.6.5 Building a decision tree for the example dataset
5.6.6 Drawbacks of information gain theory
5.6.7 Split algorithm based on Gini Index
5.6.8 Building a decision tree with Gini Index
5.6.9 Advantages of the decision tree method
5.6.10 Disadvantages of the decision tree
5.7 Naïve Bayes Method
5.7.1 Applying Naïve Bayes classifier to the ‘Weather Play’ dataset
5.7.2 Working of Naïve Bayes classifier using the Laplace Estimator
5.8 Understanding Metrics to Assess the Quality of Classifiers
5.8.1 The boy who cried wolf
5.8.2 True positive
5.8.3 True negative
5.8.4 False positive
5.8.5 False negative
5.8.6 Confusion matrix
5.8.7 Precision
5.8.8 Recall
5.8.9 F-Measure
6. Implementing Classification in Weka and R
Chapter Objectives
6.1 Building a Decision Tree Classifier in Weka
6.1.1 Steps to take when applying the decision tree classifier on the Iris dataset in Weka
6.1.2 Understanding the confusion matrix
6.1.3 Understanding the decision tree
6.1.4 Reading decision tree rules
6.1.5 Interpreting results
6.1.6 Using rules for prediction
6.2 Applying Naïve Bayes
6.3 Creating the Testing Dataset
6.4 Decision Tree Operation with R
6.5 Naïve Bayes Operation using R
Acknowledgement
7. Cluster Analysis
Chapter Objectives
7.1 Introduction to Cluster Analysis
7.2 Applications of Cluster Analysis
7.3 Desired Features of Clustering
7.4 Distance Metrics
7.4.1 Euclidean distance
7.4.2 Manhattan distance
7.4.3 Chebyshev distance
7.5 Major Clustering Methods/Algorithms
7.6 Partitioning Clustering
7.6.1 k-means clustering
7.6.2 Starting values for the k-means algorithm
7.6.3 Issues with the k-means algorithm
7.6.4 Scaling and weighting
7.7 Hierarchical Clustering Algorithms (HCA)
7.7.1 Agglomerative clustering
7.7.1.1 Role of linkage metrics
7.7.1.2 Weakness of agglomerative clustering methods
7.7.2 Divisive clustering
7.7.3 Density-based clustering
7.7.4 DBSCAN algorithm
7.7.5 Strengths of DBSCAN algorithm
7.7.6 Weakness of DBSCAN algorithm
8. Implementing Clustering with Weka and R
Chapter Objectives
8.1 Introduction
8.2 Clustering Fisher’s Iris Dataset with the Simple k-Means Algorithm
8.3 Handling Missing Values
8.4 Results Analysis after Applying Clustering
8.4.1 Identification of centroids for each cluster
8.4.2 Concept of within cluster sum of squared error
8.4.3 Identification of the optimum number of clusters using within cluster sum of squared error
8.5 Classification of Unlabeled Data
8.5.1 Adding clusters to dataset
8.5.2 Applying the classification algorithm by using added cluster attribute as class attribute
8.5.3 Pruning the decision tree
8.6 Clustering in R using Simple k-Means
8.6.1 Comparison of clustering results with the original dataset
8.6.2 Adding generated clusters to the original dataset
8.6.3 Applying J48 on the clustered dataset
Acknowledgement
9. Association Mining
Chapter Objectives
9.1 Introduction to Association Rule Mining
9.2 Defining Association Rule Mining
9.3 Representations of Items for Association Mining
9.4 The Metrics to Evaluate the Strength of Association Rules
9.4.1 Support
9.4.2 Confidence
9.4.3 Lift
9.5 The Naïve Algorithm for Finding Association Rules
9.5.1 Working of the Naïve algorithm
9.5.2 Limitations of the Naïve algorithm
9.5.3 Improved Naïve algorithm to deal with larger datasets
9.6 Approaches for Transaction Database Storage
9.6.1 Simple transaction storage
9.6.2 Horizontal storage
9.6.3 Vertical representation
9.7 The Apriori Algorithm
9.7.1 About the inventors of Apriori
9.7.2 Working of the Apriori algorithm
9.8 Closed and Maximal Itemsets
9.9 The Apriori–TID Algorithm for Generating Association Mining Rules
9.10 Direct Hashing and Pruning (DHP)
9.11 Dynamic Itemset Counting (DIC)
9.12 Mining Frequent Patterns without Candidate Generation (FP Growth)
9.12.1 Advantages of the FP-tree approach
9.12.2 Further improvements of FP growth
10. Implementing Association Mining with Weka and R
10.1 Association Mining with Weka
10.2 Applying Predictive Apriori in Weka
10.3 Rules Generation Similar to Classifier Using Predictive Apriori
10.4 Comparison of Association Mining CAR Rules with J48 Classifier Rules
10.5 Applying the Apriori Algorithm in Weka
10.6 Applying the Apriori Algorithm in Weka on a Real World Dataset
10.7 Applying the Apriori Algorithm in Weka on a Real World Larger Dataset
10.8 Applying the Apriori Algorithm on a Numeric Dataset
10.9 Process of Performing Manual Discretization
10.10 Applying Association Mining in R
10.11 Implementing Apriori Algorithm
10.12 Generation of Rules Similar to Classifier
10.13 Comparison of Association Mining CAR Rules with J48 Classifier Rules
10.14 Application of Association Mining on Numeric Data in R
11. Web Mining and Search Engines
Chapter Objectives
11.1 Introduction
11.2 Web Content Mining
11.2.1 Web document clustering
11.2.2 Suffix Tree Clustering (STC)
11.2.3 Resemblance and containment
11.2.4 Fingerprinting
11.3 Web Usage Mining
11.4 Web Structure Mining
11.4.1 Hyperlink Induced Topic Search (HITS) algorithm
11.5 Introduction to Modern Search Engines
11.6 Working of a Search Engine
11.6.1 Web crawler
11.6.2 Indexer
11.6.3 Query processor
11.7 PageRank Algorithm
11.8 Precision and Recall
12. Data Warehouse
Chapter Objectives
12.1 The Need for an Operational Data Store (ODS)
12.2 Operational Data Store
12.2.1 Types of ODS
12.2.2 Architecture of ODS
12.2.3 Advantages of the ODS
12.3 Data Warehouse
12.3.1 Historical developments in data warehousing
12.3.2 Defining data warehousing
12.3.3 Data warehouse architecture
12.3.4 Benefits of data warehousing
12.4 Data Marts
12.5 Comparative Study of Data Warehouse with OLTP and ODS
12.5.1 Data warehouses versus OLTP: similarities and distinction
13. Data Warehouse Schema
Chapter Objectives
13.1 Introduction to Data Warehouse Schema
13.1.1 Dimension
13.1.2 Measure
13.1.3 Fact Table
13.1.4 Multi-dimensional view of data
13.2 Star Schema
13.3 Snowflake Schema
13.4 Fact Constellation Schema (Galaxy Schema)
13.5 Comparison among Star, Snowflake and Fact Constellation Schema
14. Online Analytical Processing
Chapter Objectives
14.1 Introduction to Online Analytical Processing
14.1.1 Defining OLAP
14.1.2 OLAP applications
14.1.3 Features of OLAP
14.1.4 OLAP benefits
14.1.5 Strengths of OLAP
14.1.6 Comparison between OLTP and OLAP
14.1.7 Differences between OLAP and data mining
14.2 Representation of Multi-dimensional Data
14.2.1 Data Cube
14.3 Implementing Multi-dimensional View of Data in Oracle
14.4 Improving Efficiency of OLAP by Pre-computing the Queries
14.5 Types of OLAP Servers
14.5.1 Relational OLAP
14.5.2 MOLAP
14.5.3 Comparison of ROLAP and MOLAP
14.6 OLAP Operations
14.6.1 Roll-up
14.6.2 Drill-down
14.6.3 Slice
14.6.4 Dice
14.6.5 Pivot
15. Big Data and NoSQL
Chapter Objectives
15.1 The Rise of Relational Databases
15.2 Major Issues with Relational Databases
15.3 Challenges from the Internet Boom
15.3.1 The rapid growth of unstructured data
15.3.2 Types of data in the era of the Internet boom
15.4 Emergence of Big Data due to the Internet Boom
15.5 Possible Solutions to Handle Huge Amount of Data
15.6 The Emergence of Technologies for Cluster Environment
15.7 Birth of NoSQL
15.8 Defining NoSQL from the Characteristics it Shares
15.9 Some Misconceptions about NoSQL
15.10 Data Models of NoSQL
15.10.1 Key-value data model
15.10.2 Column-family data model
15.10.3 Document data model
15.10.4 Graph databases
15.11 Consistency in a Distributed Environment
15.12 CAP Theorem
15.13 Future of NoSQL
15.14 Difference between NoSQL and Relational Data Models (RDBMS)