Best practices in data cleaning : a complete guide to everything you need to do before and after collecting your data

"Many researchers jump straight from data collection to data analysis without realizing how analyses and hypothesis tests can go profoundly wrong without clean data. This book provides a clear, step-by-step process to examining and cleaning data in order to decrease error rates and increase bot...

Bibliographic Details
Main Author:	Osborne, Jason W.
Format:	Book
Language:	English
Published:	Thousand Oaks, Calif. : SAGE, [2013]
Subjects:	Quantitative research. Social sciences > Methodology.

Table of Contents:

Machine generated contents note:
ch. 1 Why Data Cleaning Is Important: Debunking the Myth of Robustness
Origins of Data Cleaning
Are Things Really That Bad?
Why Care About Testing Assumptions and Cleaning Data?
How Can This State of Affairs Be True?
The Best Practices Orientation of This Book
Data Cleaning Is a Simple Process; However...
One Path to Solving the Problem
For Further Enrichment
SECTION I BEST PRACTICES AS YOU PREPARE FOR DATA COLLECTION
ch. 2 Power and Planning for Data Collection: Debunking the Myth of Adequate Power
Power and Best Practices in Statistical Analysis of Data
How Null-Hypothesis Statistical Testing Relates to Power
What Do Statistical Tests Tell Us?
How Does Power Relate to Error Rates?
Low Power and Type I Error Rates in a Literature
How to Calculate Power
The Effect of Power on the Replicability of Study Results
Can Data Cleaning Fix These Sampling Problems?
Conclusions
For Further Enrichment
Appendix
ch. 3 Being True to the Target Population: Debunking the Myth of Representativeness
Sampling Theory and Generalizability
Aggregation or Omission Errors
Including Irrelevant Groups
Nonresponse and Generalizability
Consent Procedures and Sampling Bias
Generalizability of Internet Surveys
Restriction of Range
Extreme Groups Analysis
Conclusion
For Further Enrichment
ch. 4 Using Large Data Sets With Probability Sampling Frameworks: Debunking the Myth of Equality
What Types of Studies Use Complex Sampling?
Why Does Complex Sampling Matter?
Best Practices in Accounting for Complex Sampling
Does It Really Make a Difference in the Results?
So What Does All This Mean?
For Further Enrichment
SECTION II BEST PRACTICES IN DATA CLEANING AND SCREENING
ch. 5 Screening Your Data for Potential Problems: Debunking the Myth of Perfect Data
The Language of Describing Distributions
Testing Whether Your Data Are Normally Distributed
Conclusions
For Further Enrichment
Appendix
ch. 6 Dealing With Missing or Incomplete Data: Debunking the Myth of Emptiness
What Is Missing or Incomplete Data?
Categories of Missingness
What Do We Do With Missing Data?
The Effects of Listwise Deletion
The Detrimental Effects of Mean Substitution
The Effects of Strong and Weak Imputation of Values
Multiple Imputation: A Modern Method of Missing Data Estimation
Missingness Can Be an Interesting Variable in and of Itself
Summing Up: What Are Best Practices?
For Further Enrichment
Appendixes
ch. 7 Extreme and Influential Data Points: Debunking the Myth of Equality
What Are Extreme Scores?
How Extreme Values Affect Statistical Analyses
What Causes Extreme Scores?
Extreme Scores as a Potential Focus of Inquiry
Identification of Extreme Scores
Why Remove Extreme Scores?
Effect of Extreme Scores on Inferential Statistics
Effect of Extreme Scores on Correlations and Regression
Effect of Extreme Scores on t-Tests and ANOVAs
To Remove or Not to Remove?
For Further Enrichment
ch. 8 Improving the Normality of Variables Through Box-Cox Transformation: Debunking the Myth of Distributional Irrelevance
Why Do We Need Data Transformations?
When a Variable Violates the Assumption of Normality
Traditional Data Transformations for Improving Normality
Application and Efficacy of Box-Cox Transformations
Reversing Transformations
Conclusion
For Further Enrichment
Appendix
ch. 9 Does Reliability Matter? Debunking the Myth of Perfect Measurement
What Is a Reasonable Level of Reliability?
Reliability and Simple Correlation or Regression
Reliability and Partial Correlations
Reliability and Multiple Regression
Reliability and Interactions in Multiple Regression
Protecting Against Overcorrecting During Disattenuation
Other Solutions to the Issue of Measurement Error
What If We Had Error-Free Measurement?
An Example From My Research
Does Reliability Influence Other Analyses?
The Argument That Poor Reliability Is Not That Important
Conclusions and Best Practices
For Further Enrichment
SECTION III ADVANCED TOPICS IN DATA CLEANING
ch. 10 Random Responding, Motivated Misresponding, and Response Sets: Debunking the Myth of the Motivated Participant
What Is a Response Set?
Common Types of Response Sets
Is Random Responding Truly Random?
Detecting Random Responding in Your Research
Does Random Responding Cause Serious Problems With Research?
Example of the Effects of Random Responding
Are Random Responders Truly Random Responders?
Summary
Best Practices Regarding Random Responding
Magnitude of the Problem
For Further Enrichment
ch. 11 Why Dichotomizing Continuous Variables Is Rarely a Good Practice: Debunking the Myth of Categorization
What Is Dichotomization and Why Does It Exist?
How Widespread Is This Practice?
Why Do Researchers Use Dichotomization?
Are Analyses With Dichotomous Variables Easier to Interpret?
Are Analyses With Dichotomous Variables Easier to Compute?
Are Dichotomous Variables More Reliable?
Other Drawbacks of Dichotomization
For Further Enrichment
ch. 12 The Special Challenge of Cleaning Repeated Measures Data: Lots of Pits in Which to Fall
Treat All Time Points Equally
What to Do With Extreme Scores?
Missing Data
Summary
ch. 13 Now That the Myths Are Debunked...: Visions of Rational Quantitative Methodology for the 21st Century