Best practices in data cleaning : a complete guide to everything you need to do before and after collecting your data
"Many researchers jump straight from data collection to data analysis without realizing how analyses and hypothesis tests can go profoundly wrong without clean data. This book provides a clear, step-by-step process to examining and cleaning data in order to decrease error rates and increase bot...
| Main Author: | |
|---|---|
| Format: | Book |
| Language: | English |
| Published: | Thousand Oaks, Calif. : SAGE, [2013] |
| Subjects: | |

Table of Contents:
- Machine generated contents note: ch. 1 Why Data Cleaning Is Important: Debunking the Myth of Robustness
- Origins of Data Cleaning
- Are Things Really That Bad?
- Why Care About Testing Assumptions and Cleaning Data?
- How Can This State of Affairs Be True?
- The Best Practices Orientation of This Book
- Data Cleaning Is a Simple Process; However...
- One Path to Solving the Problem
- For Further Enrichment
- SECTION I BEST PRACTICES AS YOU PREPARE FOR DATA COLLECTION
- ch. 2 Power and Planning for Data Collection: Debunking the Myth of Adequate Power
- Power and Best Practices in Statistical Analysis of Data
- How Null-Hypothesis Statistical Testing Relates to Power
- What Do Statistical Tests Tell Us?
- How Does Power Relate to Error Rates?
- Low Power and Type I Error Rates in a Literature
- How to Calculate Power
- The Effect of Power on the Replicability of Study Results
- Can Data Cleaning Fix These Sampling Problems?
- Conclusions
- For Further Enrichment
- Appendix
- ch. 3 Being True to the Target Population: Debunking the Myth of Representativeness
- Sampling Theory and Generalizability
- Aggregation or Omission Errors
- Including Irrelevant Groups
- Nonresponse and Generalizability
- Consent Procedures and Sampling Bias
- Generalizability of Internet Surveys
- Restriction of Range
- Extreme Groups Analysis
- Conclusion
- For Further Enrichment
- ch. 4 Using Large Data Sets With Probability Sampling Frameworks: Debunking the Myth of Equality
- What Types of Studies Use Complex Sampling?
- Why Does Complex Sampling Matter?
- Best Practices in Accounting for Complex Sampling
- Does It Really Make a Difference in the Results?
- So What Does All This Mean?
- For Further Enrichment
- SECTION II BEST PRACTICES IN DATA CLEANING AND SCREENING
- ch. 5 Screening Your Data for Potential Problems: Debunking the Myth of Perfect Data
- The Language of Describing Distributions
- Testing Whether Your Data Are Normally Distributed
- Conclusions
- For Further Enrichment
- Appendix
- ch. 6 Dealing With Missing or Incomplete Data: Debunking the Myth of Emptiness
- What Is Missing or Incomplete Data?
- Categories of Missingness
- What Do We Do With Missing Data?
- The Effects of Listwise Deletion
- The Detrimental Effects of Mean Substitution
- The Effects of Strong and Weak Imputation of Values
- Multiple Imputation: A Modern Method of Missing Data Estimation
- Missingness Can Be an Interesting Variable in and of Itself
- Summing Up: What Are Best Practices?
- For Further Enrichment
- Appendixes
- ch. 7 Extreme and Influential Data Points: Debunking the Myth of Equality
- What Are Extreme Scores?
- How Extreme Values Affect Statistical Analyses
- What Causes Extreme Scores?
- Extreme Scores as a Potential Focus of Inquiry
- Identification of Extreme Scores
- Why Remove Extreme Scores?
- Effect of Extreme Scores on Inferential Statistics
- Effect of Extreme Scores on Correlations and Regression
- Effect of Extreme Scores on t-Tests and ANOVAs
- To Remove or Not to Remove?
- For Further Enrichment
- ch. 8 Improving the Normality of Variables Through Box-Cox Transformation: Debunking the Myth of Distributional Irrelevance
- Why Do We Need Data Transformations?
- When a Variable Violates the Assumption of Normality
- Traditional Data Transformations for Improving Normality
- Application and Efficacy of Box-Cox Transformations
- Reversing Transformations
- Conclusion
- For Further Enrichment
- Appendix
- ch. 9 Does Reliability Matter? Debunking the Myth of Perfect Measurement
- What Is a Reasonable Level of Reliability?
- Reliability and Simple Correlation or Regression
- Reliability and Partial Correlations
- Reliability and Multiple Regression
- Reliability and Interactions in Multiple Regression
- Protecting Against Overcorrecting During Disattenuation
- Other Solutions to the Issue of Measurement Error
- What If We Had Error-Free Measurement?
- An Example From My Research
- Does Reliability Influence Other Analyses?
- The Argument That Poor Reliability Is Not That Important
- Conclusions and Best Practices
- For Further Enrichment
- SECTION III ADVANCED TOPICS IN DATA CLEANING
- ch. 10 Random Responding, Motivated Misresponding, and Response Sets: Debunking the Myth of the Motivated Participant
- What Is a Response Set?
- Common Types of Response Sets
- Is Random Responding Truly Random?
- Detecting Random Responding in Your Research
- Does Random Responding Cause Serious Problems With Research?
- Example of the Effects of Random Responding
- Are Random Responders Truly Random Responders?
- Summary
- Best Practices Regarding Random Responding
- Magnitude of the Problem
- For Further Enrichment
- ch. 11 Why Dichotomizing Continuous Variables Is Rarely a Good Practice: Debunking the Myth of Categorization
- What Is Dichotomization and Why Does It Exist?
- How Widespread Is This Practice?
- Why Do Researchers Use Dichotomization?
- Are Analyses With Dichotomous Variables Easier to Interpret?
- Are Analyses With Dichotomous Variables Easier to Compute?
- Are Dichotomous Variables More Reliable?
- Other Drawbacks of Dichotomization
- For Further Enrichment
- ch. 12 The Special Challenge of Cleaning Repeated Measures Data: Lots of Pits in Which to Fall
- Treat All Time Points Equally
- What to Do With Extreme Scores?
- Missing Data
- Summary
- ch. 13 Now That the Myths Are Debunked...: Visions of Rational Quantitative Methodology for the 21st Century.