NOTES PERTAINING TO TITLE AND STATEMENT OF RESPONSIBILITY
Text of Note
Stephane Tuffery
ORIGINAL VERSION NOTE
Text of Note
1
CONTENTS NOTE
Text of Note
Machine generated contents note: Preface -- Foreword -- Contents -- Overview of data mining -- 1.1. What is data mining? -- 1.2. What is data mining used for? -- 1.3. Data mining and statistics -- 1.4. Data mining and information technology -- 1.5. Data mining and protection of personal data -- 1.6. Implementation of data mining -- The development of a data mining study -- 2.1. Defining the aims -- 2.2. Listing the existing data -- 2.3. Collecting the data -- 2.4. Exploring and preparing the data -- 2.5. Population segmentation -- 2.6. Drawing up and validating predictive models -- 2.7. Synthesizing predictive models of different segments -- 2.8. Iteration of the preceding steps -- 2.9. Deploying the models -- 2.10. Training the model users -- 2.11. Monitoring the models -- 2.12. Enriching the models -- 2.13. Remarks -- 2.14. Life cycle of a model -- 2.15. Costs of a pilot project -- Data exploration and preparation -- 3.1. The different types of data -- 3.2. Examining the distribution of variables -- 3.3. Detection of rare or missing values -- 3.4. Detection of aberrant values -- 3.5. Detection of extreme values -- 3.6. Tests of normality -- 3.7. Homoscedasticity and heteroscedasticity -- 3.8. Detection of the most discriminating variables -- 3.9. Transformation of variables -- 3.10. Choosing ranges of values of continuous variables -- 3.11. Creating new variables -- 3.12. Detecting interactions -- 3.13. Automatic variable selection -- 3.14. Detection of collinearity -- 3.15. Sampling -- Using commercial data -- 4.1. Data used in commercial applications -- 4.2. Special data -- 4.3. Data used by business sector -- Statistical and data mining software -- 5.1. Types of data mining and statistical software -- 5.2. Essential characteristics of the software -- 5.3. The main software packages -- 5.4. Comparison of R, SAS and IBM SPSS -- 5.5. How to reduce processing time -- An outline of data mining methods -- 6.1. A note on terminology -- 6.2.
Classification of the methods -- 6.3. Comparison of the methods -- 6.4. Using these methods in the business world -- Factor analysis -- 7.1. Principal component analysis -- 7.2. Variants of principal component analysis -- 7.3. Correspondence analysis -- 7.4. Multiple correspondence analysis -- Neural networks -- 8.1. General information on neural networks -- 8.2. Structure of a neural network -- 8.3. Choosing the training sample -- 8.4. Some empirical rules for network design -- 8.5. Data normalization -- 8.6. Learning algorithms -- 8.7. The main neural networks -- Automatic clustering methods -- 9.1. Definition of clustering -- 9.2. Applications of clustering -- 9.3. Complexity of clustering -- 9.4. Clustering structures -- 9.5. Some methodological considerations -- 9.6. Comparison of factor analysis and clustering -- 9.7. Intra-class and inter-class inertias -- 9.8. Measurements of clustering quality -- 9.9. Partitioning methods -- 9.10. Hierarchical ascending clustering -- 9.11. Hybrid clustering methods -- 9.12. Neural clustering -- 9.13. Clustering by aggregation of similarities -- 9.14. Clustering of numeric variables -- 9.15. Overview of clustering methods -- Finding associations -- 10.1. Principles -- 10.2. Using taxonomy -- 10.3. Using supplementary variables -- 10.4. Applications -- 10.5. Example of use -- Classification and prediction methods -- 11.1. Introduction -- 11.2. Inductive and transductive methods -- 11.3. Overview of classification and prediction methods -- 11.4. Classification by decision tree -- 11.5. Prediction by decision tree -- 11.6. Classification by discriminant analysis -- 11.7. Prediction by linear regression -- 11.8. Classification by logistic regression -- 11.9. Developments in logistic regression -- 11.10. Bayesian methods -- 11.11. Classification and prediction by neural networks -- 11.12. Classification by support vector machines (SVMs) -- 11.13. Prediction by genetic algorithms -- 11.14.
Improving the performance of a predictive model -- 11.15. Bootstrapping and aggregation of models -- 11.16. Using classification and prediction methods -- An application of data mining: scoring -- 12.1. The different types of score -- 12.2. Using propensity scores and risk scores -- 12.3. Methodology -- 12.4. Implementing a strategic score -- 12.5. Implementing an operational score -- 12.6. The kinds of scoring solutions used in a business -- 12.7. An example of credit scoring (data preparation) -- 12.8. An example of credit scoring (modelling by logistic regression) -- 12.9. An example of credit scoring (modelling by DISQUAL discriminant analysis) -- 12.10. A brief history of credit scoring -- Factors for success in a data mining project -- 13.1. The subject -- 13.2. The people -- 13.3. The data -- 13.4. The IT systems -- 13.5. The business culture -- 13.6. Data mining: eight common misconceptions -- 13.7. Return on investment -- Text mining -- 14.1. Definition of text mining -- 14.2. Text sources used -- 14.3. Using text mining -- 14.4. Information retrieval -- 14.5. Information extraction -- 14.6. Multi-type data mining -- Web mining -- 15.1. The aims of web mining -- 15.2. Global analyses -- 15.3. Individual analyses -- 15.4. Personal analyses -- Appendix: Elements of statistics -- 16.1. A brief history -- 16.2. Elements of statistics -- 16.3. Statistical tables -- Further reading -- 17.1. Statistics and data analysis -- 17.2. Data mining and statistical learning -- 17.3. Text mining -- 17.4. Web mining -- 17.5. R software -- 17.6. SAS software -- 17.7. IBM SPSS software -- 17.8. Websites -- Index