Crowdsourcing and human computation enable organizations to accomplish tasks that fully automated techniques cannot yet complete, or that require more flexibility and scalability than traditional employment relationships can provide. In data processing, companies have used crowd workers on platforms such as Amazon's Mechanical Turk or Upwork to complete tasks as varied as content moderation, web content extraction, entity resolution, and video, audio, and image processing.
Academic researchers from diverse areas, ranging from the social sciences to computer science, have embraced crowdsourcing as a research area, producing algorithms and systems that improve the quality, latency, and cost of crowd work. Despite the relative nascence of the field, the academic and practitioner communities have largely operated independently of each other for the past decade, rarely exchanging techniques and experiences.
Crowdsourced Data Management aims to narrow the gap between academics and practitioners. On the academic side, it summarizes the state of the art in crowd-powered algorithms and system design tailored to large-scale data processing. On the industry side, it surveys 13 industry users (such as Google, Facebook, and Microsoft) and four marketplace providers of crowd work (such as CrowdFlower and Upwork) to identify how hundreds of engineers and tens of millions of dollars are invested in various crowdsourcing solutions. It thus introduces academics to the real problems that practitioners encounter every day, while giving practitioners a survey of the state of the art to incorporate into their designs.
The surveys also show that crowd-powered data processing is a large and growing field. Over the next decade, most technical organizations are likely to benefit in some way from crowd work, and this monograph can help guide the effective adoption of crowdsourcing across these organizations.