Synthesis lectures on information concepts, retrieval, and services, 1947-9468 ; #66
Title from PDF title page (viewed on June 4, 2019).
Includes bibliographical references (pages 105-127).
1. Introduction -- 1.1. Human computers -- 1.2. Basic concepts -- 1.3. Examples -- 1.4. Some generic observations -- 1.5. A note on platforms -- 1.6. The importance of labels -- 1.7. Scope and structure
3. Quality assurance -- 3.1. Quality framework -- 3.2. Quality control overview -- 3.3. Recommendations from platforms -- 3.4. Worker qualification -- 3.5. Reliability and validity -- 3.6. Hit debugging -- 3.7. Summary
4. Algorithms and techniques for quality control -- 4.1. Framework -- 4.2. Voting -- 4.3. Attention monitoring -- 4.4. Honey pots -- 4.5. Workers reviewing work -- 4.6. Justification -- 4.7. Aggregation methods -- 4.8. Behavioral data -- 4.9. Expertise and routing -- 4.10. Summary
5. The human side of human computation -- 5.1. Overview -- 5.2. Demographics -- 5.3. Incentives -- 5.4. Worker experience -- 5.5. Worker feedback -- 5.6. Legal and ethics -- 5.7. Summary
6. Putting all things together -- 6.1. The state of the practice -- 6.2. Wetware programming -- 6.3. Testing and debugging -- 6.4. Work quality control -- 6.5. Managing construction -- 6.6. Operational considerations -- 6.7. Summary of practices -- 6.8. Summary
7. Systems and data pipelines -- 7.1. Evaluation -- 7.2. Machine translation -- 7.3. Handwriting recognition and transcription -- 7.4. Taxonomy creation -- 7.5. Data analysis -- 7.6. News near-duplicate detection -- 7.7. Entity resolution -- 7.8. Classification -- 7.9. Image and speech -- 7.10. Information extraction -- 7.11. RABJ -- 7.12. Workflows -- 7.13. Summary
8. Looking ahead -- 8.1. Crowds and social networks -- 8.2. Interactive and real-time crowdsourcing -- 8.3. Programming languages -- 8.4. Databases and crowd-powered algorithms -- 8.5. Fairness, bias, and reproducibility -- 8.6. An incomplete list of requirements for infrastructure -- 8.7. Summary.
Many data-intensive applications that use machine learning or artificial intelligence techniques depend on humans providing the initial dataset, which lets algorithms process the rest or lets other humans evaluate the performance of such algorithms. Not only can labeled data for training and evaluation be collected faster, more cheaply, and more easily than ever before, but we now see the emergence of hybrid human-machine software that combines computations performed by humans and machines. There are, however, real-world practical issues with the adoption of human computation and crowdsourcing. Building systems and data processing pipelines that require crowd computing remains difficult. In this book, we present practical considerations for designing and implementing tasks that require the use of humans and machines in combination, with the goal of producing high-quality labels.