"Data Mining and Predictive Analytics Part I" Webpage

Data Mining and Predictive Analytics Part I - Class Notes
From Data Mining and Predictive Analytics, Second Edition, by Daniel Larose and Chantal Larose (John Wiley & Sons, 2015).

Copies of the class notes are available online in PDF format, as listed below. The notes and supplements may contain hyperlinks to posted webpages; the links appear in red font. These notes have not been classroom tested and may contain typographical errors.

"Data Mining and Predictive Analytics" is not yet a class at ETSU, but it is related to the M.S. program in Applied Data Science (MSADS). Details on this program can be found on the Masters in Applied Data Science Program Overview webpage (accessed 3/24/2024).

Preface. Preface notes

Part I. DATA PREPARATION.

CHAPTER 1. AN INTRODUCTION TO DATA MINING AND PREDICTIVE ANALYTICS.

1.1. What is Data Mining? What is Predictive Analytics? Section 1.1 notes
1.2. Wanted: Data Miners. Section 1.2 notes
1.3. The Need for Human Direction of Data Mining. Section 1.3 notes
1.4. The Cross-Industry Standard Process for Data Mining: CRISP-DM. Section 1.4 notes
1.5. Fallacies of Data Mining. Section 1.5 notes
1.6. What Tasks Can Data Mining Accomplish? Section 1.6 notes
Study Guide 1.

CHAPTER 2. DATA PREPROCESSING.

2.1. Why Do We Need to Preprocess the Data? Section 2.1 notes
2.2. Data Cleaning. Section 2.2 notes
2.3. Handling Missing Data. Section 2.3 notes
2.4. Identifying Misclassifications.
2.5. Graphical Methods for Identifying Outliers.
2.6. Measures of Center and Spread.
2.7. Data Transformation.
2.8. Min-Max Normalization.
2.9. Z-Score Standardization.
2.10. Decimal Scaling.
2.11. Transformations to Achieve Normality.
2.12. Numerical Methods for Identifying Outliers.
2.13. Flag Variables.
2.14. Transforming Categorical Variables into Numerical Variables.
2.15. Binning Numerical Variables.
2.16. Reclassifying Categorical Variables.
2.17. Adding an Index Field.
2.18. Removing Variables that are not Useful.
2.19. Variables that Should Probably not be Removed.
2.20. Removal of Duplicate Records.
2.21. A Word About ID Fields.
Study Guide 2.

CHAPTER 3. EXPLORATORY DATA ANALYSIS.

3.1. Hypothesis Testing Versus Exploratory Data Analysis.
3.2. Getting to Know the Data Set.
3.3. Exploring Categorical Variables.
3.4. Exploring Numerical Variables.
3.5. Exploring Multivariate Relationships.
3.6. Selecting Interesting Subsets of the Data for Further Investigation.
3.7. Using EDA to Uncover Anomalous Fields.
3.8. Binning Based on Predictive Value.
3.9. Deriving New Variables: Flag Variables.
3.10. Deriving New Variables: Numerical Variables.
3.11. Using EDA to Investigate Correlated Predictor Variables.
3.12. Summary of Our EDA.
Study Guide 3.

CHAPTER 4. DIMENSION REDUCTION METHODS.

4.1. Need for Dimension-Reduction in Data Mining.
4.2. Principal Component Analysis.
4.3. Applying PCA to the Houses Data Set.
4.4. How Many Components Should We Extract?
4.5. Profiling the Principal Components.
4.6. Communalities.
4.7. Validation of the Principal Components.
4.8. Factor Analysis.
4.9. Applying Factor Analysis to the Adult Data Set.
4.10. Factor Rotation.
4.11. User-Defined Composites.
4.12. An Example of a User-Defined Composite.
Study Guide 4.

ADDITIONAL PARTS OF THE BOOK

Part II. Statistical Analysis. (5 Chapters)
Part III. Classification. (9 Chapters)
Part IV. Clustering. (4 Chapters)
Part V. Association Rules. (1 Chapter)
Part VI. Enhancing Model Performance. (3 Chapters)
Part VII. Further Topics. (2 Chapters)
Part VIII. Case Study: Predicting Response to Direct-Mail Marketing. (4 Chapters)
Appendix A. Data Summarization and Visualization.

Return to Bob Gardner's home page