Setting the pace : what is bad data? -- Is it just me, or does this data smell funny? -- Data intended for human consumption, not machine consumption -- Bad data lurking in plain text -- (Re)organizing the web's data -- Detecting liars and the confused in contradictory online reviews -- Will the bad data please stand up? -- Blood, sweat, and urine -- When data and reality don't match -- Subtle sources of bias and error -- Don't let the perfect be the enemy of the good : is bad data really bad? -- When databases attack : a guide for when to stick to files -- Crouching table, hidden network -- Myths of cloud computing -- The dark side of data science -- How to feed and care for your machine-learning experts -- Data traceability -- Social media : erasable ink? -- Data quality analysis demystified : knowing when your data is good enough
SUMMARY OR ABSTRACT
Text of Note
This practical handbook takes the reader through several real-world examples to demonstrate the theory and practice behind working with and cleaning up dirty data. Because no single tool solves all of the problems well, a polyglot approach is taken: most examples involve R and Python, with sed/awk utilities also appearing.
ACQUISITION INFORMATION NOTE
Source for Acquisition/Subscription Address
O'Reilly & Associates Inc, C/O Ingram Pub Services, 1 Ingram Blvd, La Vergne, TN, USA, 37086
TOPICAL NAME USED AS SUBJECT
Data editing
Database management-- Handbooks, manuals, etc.
Databases-- Quality control
Electronic data processing-- Handbooks, manuals, etc.