Data Mining and Predictive Analytics Part I - Class Notes
From Data Mining and Predictive Analytics, Second Edition, by Daniel Larose and Chantal Larose (John Wiley & Sons, 2015)
Copies of the class notes are available online in PDF format, as listed below. The notes and supplements may contain hyperlinks to posted webpages; the links appear in a red font. These notes have not been classroom tested and may contain typographical errors.
"Data Mining and Predictive Analytics" is not yet a class at ETSU, but it is related to the M.S. program in Applied Data Science (MSADS).
Details on this program can be found on the Masters in Applied Data Science Program Overview webpage (accessed 3/24/2024).
Preface. Preface notes
Part I. DATA PREPARATION.
CHAPTER 1. AN INTRODUCTION TO DATA MINING AND PREDICTIVE ANALYTICS.
CHAPTER 2. DATA PREPROCESSING.
- 2.1. Why Do We Need to Preprocess the Data? Section 2.1 notes
- 2.2. Data Cleaning. Section 2.2 notes
- 2.3. Handling Missing Data. Section 2.3 notes
- 2.4. Identifying Misclassifications.
- 2.5. Graphical Methods for Identifying Outliers.
- 2.6. Measures of Center and Spread.
- 2.7. Data Transformation.
- 2.8. Min-Max Normalization.
- 2.9. Z-Score Standardization.
- 2.10. Decimal Scaling.
- 2.11. Transformations to Achieve Normality.
- 2.12. Numerical Methods for Identifying Outliers.
- 2.13. Flag Variables.
- 2.14. Transforming Categorical Variables into Numerical Variables.
- 2.15. Binning Numerical Variables.
- 2.16. Reclassifying Categorical Variables.
- 2.17. Adding an Index Field.
- 2.18. Removing Variables that are not Useful.
- 2.19. Variables that Should Probably not be Removed.
- 2.20. Removal of Duplicate Records.
- 2.21. A Word About ID Fields.
- Study Guide 2.
CHAPTER 3. EXPLORATORY DATA ANALYSIS.
- 3.1. Hypothesis Testing Versus Exploratory Data Analysis.
- 3.2. Getting to Know the Data Set.
- 3.3. Exploring Categorical Variables.
- 3.4. Exploring Numerical Variables.
- 3.5. Exploring Multivariate Relationships.
- 3.6. Selecting Interesting Subsets of the Data for Further Investigation.
- 3.7. Using EDA to Uncover Anomalous Fields.
- 3.8. Binning Based on Predictive Value.
- 3.9. Deriving New Variables: Flag Variables.
- 3.10. Deriving New Variables: Numerical Variables.
- 3.11. Using EDA to Investigate Correlated Predictor Variables.
- 3.12. Summary of Our EDA.
- Study Guide 3.
CHAPTER 4. DIMENSION REDUCTION METHODS.
- 4.1. Need for Dimension-Reduction in Data Mining.
- 4.2. Principal Component Analysis.
- 4.3. Applying PCA to the Houses Data Set.
- 4.4. How Many Components Should We Extract?
- 4.5. Profiling the Principal Components.
- 4.6. Communalities.
- 4.7. Validation of the Principal Components.
- 4.8. Factor Analysis.
- 4.9. Applying Factor Analysis to the Adult Data Set.
- 4.10. Factor Rotation.
- 4.11. User-Defined Composites.
- 4.12. An Example of a User-Defined Composite.
- Study Guide 4.
ADDITIONAL PARTS OF THE BOOK
- Part II. Statistical Analysis. (5 Chapters)
- Part III. Classification. (9 Chapters)
- Part IV. Clustering. (4 Chapters)
- Part V. Association Rules. (1 Chapter)
- Part VI. Enhancing Model Performance. (3 Chapters)
- Part VII. Further Topics. (2 Chapters)
- Part VIII. Case Study: Predicting Response to Direct-Mail Marketing. (4 Chapters)
- Appendix A. Data Summarization and Visualization.
Return to Bob Gardner's home page