Crowdsourcing in cultural heritage — a form of digitally-enabled participation that promises deeper, more engaged relationships with the public via meaningful tasks with cultural heritage collections — is moving from an experimental activity to something more embedded within institutional priorities.
As more cultural institutions ask members of the public to undertake tasks such as transcription, classification, and description of digitized collections, expectations from the public are rising about the quality of their experience, and about how the results will be used. We argue that crowdsourcing in cultural heritage should create benefits for organizers and participants. It is not merely a productive framework for enhancing cultural heritage collections, but also fundamentally a form of public engagement with collections and knowledge with the potential to support transformative encounters between institutions and their global publics.
This book is written for crowdsourcing practitioners who work in cultural institutions, as well as those who wish to gain experience with crowdsourcing. It provides both practical tips, grounded in lessons often learned the hard way, and inspiration from research across a range of disciplines. Case studies and perspectives based on our experience are woven throughout the book, complemented by information drawn from research literature and practice within the field. To supplement our voices, we have included quotes from small surveys of project stakeholders and volunteers that we ran before the Book Sprint.1
It is also written to honor the work of previous projects and teams. This book is in part a response to the relative invisibility of practitioners' documentation of and reflection on their work in formats legible to other researchers. Conferences for practitioners rarely ask for more than an abstract in advance, so knowledge is held in individual memories, presentations, and blog posts rather than in formal publications.
Many different versions of this book are possible, with different contributors and different examples to draw from. We are aware that while it represents the collective output of a range of people chosen for their expertise in crowdsourcing, citizen history, and citizen science; for work embedded in academic and cultural heritage institutions; and for grounding in community projects, many further perspectives remain to be drawn in. We know that many readers will have their own expertise as project organizers, stakeholders, funders, and participants to add to what we share here. We encourage you to share your comments, examples, counter-examples, and perspectives via our website.
This book was written at a particular moment in time — March to April 2021. As all documents do, it reflects the time in which it was written. In our case, it was written entirely remotely, and at a time when museums, libraries, archives, academic institutions, and the societies they are embedded in were examining how art, historical, and scientific collections spoke to wider issues. While some details will change — particularly technologies and perhaps institutional practices — the basic principles are more likely to be stable over time.
Crowdsourcing has helped to provide a framework for online participation with, and around, cultural heritage collections for over a decade. It will continue to change and we look forward to seeing where that journey takes us.
The structure of this book is designed around the key stages of planning, implementing, and running a crowdsourcing project. Threads woven through that structure include understanding and matching activities to participant motivations, articulating and embedding your values, planning for the eventual uses of the data, and recognizing the institutional, social, and technological context in which you are working. Each chapter contains perspectives and case studies that summarize or exemplify the ideas discussed in the text. These are listed in the overview below for reference.
In “What is crowdsourcing in cultural heritage?” we share our hope that a crowdsourcing project can create joy, inspire curiosity, foster new understanding, encourage wider social connections, promote learning, and help people cultivate new skills and expertise — while also producing results that improve cultural heritage collections and knowledge. To help illustrate those possibilities, this chapter offers an introduction that lays the groundwork for other chapters. We define and discuss where crowdsourcing in cultural heritage sits in relation to other fields. We provide an overview of key concepts, including types of crowdsourcing projects, and some of the complexities you will encounter.
“Why work with crowdsourcing in cultural heritage?” outlines the mutual benefits of this method for project teams and participants in working towards shared goals. It considers how power operates and whose voices are heard in the process, and the need to plan to reduce barriers to participation and increase inclusion and accessibility. This chapter also addresses situations in which crowdsourcing is not the answer, a theme picked up elsewhere in this book.
Included in this chapter:
Perspective: collective projects for unremembered histories of the Colored Conventions
Case study: In the Spotlight participant motivations and demographics
Perspective: designing for accessibility on FromThePage
Case study: self-organized Wikisource project for the staff at the National Library of Scotland
Perspective: the “virtuous circle” of sharing participant comments at the British Library
In “Identifying, aligning, and enacting values in your project,” we discuss how establishing values can empower decision-makers and others in upholding shared principles. We consider what happens if you do not consciously establish values, how to set shared values in dialog with project goals and objectives, and what they look like in practice. We close with a discussion of productive tensions, then reflect on how values were discussed in the process of writing this book.
Included in this chapter:
Case study: integrating values in the British Library’s In the Spotlight
Case study: communicating values in the Lone Rock Convict Stockade Project
Case study: communicating and integrating values in action: Wikimedia community
Case study: generative tensions in DigVentures and the Unloved Heritage? project
In “Designing cultural heritage crowdsourcing projects,” we provide a high-level guide to planning projects so that they meet broader values and strategic objectives. It includes questions and prompts designed to elicit clarity and anticipate practical and resourcing issues relevant to your particular context.
Included in this chapter:
Case study: the role of content specialists on the Great War Archive
Case study: the role of a communications specialist on Lockdown 2020
Case study: explore crowdsourcing with the Zooniverse Project Builder
Case study: building custom projects on the Zooniverse platform
Case study: tool customization at the Newberry Library
Case study: beta testing to support iterative design at the Boston Public Library
Case study: pilot testing and scaling up in the Davy Notebooks Project
“Understanding and connecting to participant motivations” highlights the reasons that people start and continue working with crowdsourcing projects, and how your project can link motivations to design choices and concrete activities. It also discusses issues around non-voluntary participation, publishing statistics, and enabling collective competition.
Included in this chapter:
Perspective: a matrix of motivations at the Smithsonian Transcription Center
Perspective: recruiting hundreds-at-a-time, rather than one-at-a-time, for WeDigFLPlants
Case study: bringing motivations and activities together
Case study: suffrage campaigns at the Library of Congress
Case study: collective goals for the community through WikiLovesMonuments
Case study: purposeful gaming and the Biodiversity Heritage Library project
“Aligning tasks, platforms, and goals” helps you map the components of your project onto available tools and platforms for crowdsourcing. We offer framing questions to help you focus on the platform features that will be essential for your project, discuss collecting, analyzing, and reviewing tasks and front- and back-end interfaces, and consider the role of pipelines or workflows in building your project.
Included in this chapter:
Case study: design principles: Concordia and By the People
Perspective: requirements for using the Zooniverse Project Builder
Perspective: the DigVentures collaborative platform for archaeology
Following that, “Choosing tasks and workflows” guides you through designing effective tasks, emphasizing the work of finding a match between your participants, tasks, data, and goals. We note the role of learning and developing skills and the benefits of interaction around tasks. We look at collecting, analyzing, and reviewing tasks in more detail, and consider workflow design and working with different languages. We also consider task size and selection, the use of AI, and how you can combine the unique abilities of humans and computers.
Included in this chapter:
Case study: discussion notes on FromThePage
Case study: hashing it out with the Smithsonian Transcription Center
Case study: uploading and mapping photos with Historypin
Case study: human and computer recognition at ArtUK Tagger
Case study: identifying unknown photos with crowdsourcing and AI at the Civil War Photo Sleuth
Case study: translating content on Wikimedia
Case study: transliterating Arabic script on the Jawi Transcription Project
“Supporting participants” helps you plan and deliver projects with transformative benefits for all parties in a crowdsourcing project. We focus particularly on how you can care and advocate for participants’ experiences when planning invitations and outreach activities. The chapter covers common questions from participants, issues around working with students, and creating spaces for discussion and connection, and introduces the 5Cs — care, convening, connection, communication, continuation — for supporting participants.
Included in this chapter:
Case study: providing hooks and excuses to celebrate with mini-projects
Case study: crowdsourcing in the classroom, student learning in the History Unfolded project
Case study: prototyping social interactions with paid crowds in Second Opinion
Perspective: community spaces as sources of spontaneous discovery
In “Working with crowdsourced data” we focus on planning to use the data that results from your project. We provide a framework for thinking about data management, including where it intersects with project values, legal requirements, and data ethics. We discuss data management and lifecycles, then delve into quality control and methods for evaluating and validating crowdsourced data. Finally, we discuss processing, accessing, and re-using data.
Included in this chapter:
Case study: making Zooniverse data more easily re-usable with ALICE
Case study: working with digital data in archaeological projects: DigVentures’ Born Digital project
Case study: Wikidata contributor community agreeing on data standards
Case study: automation for optimizing volunteer effort
Case study: testing the influence of Zooniverse platform tools on the quality of transcription results
Case study: alternatives to aggregation in the Mutual Muses project
Case study: principles around access to the Colored Conventions Project corpus
Case study: using public domain crowdsourced data to create the Newspaper Navigator application
In “Managing cultural heritage crowdsourcing projects” we look specifically at the challenges of managing digital heritage crowdsourcing projects once they have gone “live.” This chapter guides you through the considerations, planning, and strategic decisions needed to make your crowdsourcing project a success throughout its lifecycle. It includes organizational issues grounded in practice, including questions frequently asked by colleagues. This practical information is supported with examples showing how others have successfully overcome these challenges.
Included in this chapter:
Perspective: finding the right people: DigVentures’ Culture Deck
Perspective: building coalitions at the Smithsonian Institution and Library of Congress
Perspective: responding to participant feedback on the Zooniverse platform
People are the alchemy that creates a project. In “Connecting with communities,” we provide practical tips and proven strategies for organizing and engaging a community around a crowdsourcing project. We discuss sustainable and equitable community building and management and look at the role of governance, different types of partnerships, and building spaces for the community. We look at inclusive outreach and organizing at and beyond Predominantly White Institutions.
Included in this chapter:
Case study: community norms in the Wikimedia movement
Case study: governance and the Wikimedia movement
Case study: forging reciprocal partnerships for Douglass Day
Case study: fostering a community of practice around Douglass Day
Case study: cultural competency and Douglass Day
Case study: community responses to gender imbalances on Wikipedia
Online and in-person events can deliver, enhance, or complement other crowdsourcing activities. In “Planning crowdsourcing events,” we provide insights into reasons for running events and list common event formats that can create specific types of experiences, communities, or data results. We describe partnerships and distributed organizing, then close with practical tips for event planners.
Included in this chapter:
Case study: distributed transcribeathons with Early Modern Recipes Online Collective (EMROC)
Case study: online or offline: Wikimedia’s approach
Case study: Image du Monde challenge/La Sfera challenge
Case study: widening participation through events: Europeana community collections
Case study: distributed collecting events: Lest We Forget
Case study: expanding opportunities for participation with Making History: Transcribe
In our final chapter, “Evaluating your crowdsourcing project,” we share the benefits of evaluating your project at different points, and encourage you to define success, treat evaluation as a purposeful exercise, and be ready to respond to what you discover. We discuss planning, reporting, and the role of successful failures.
Included in this chapter:
Case study: valuing contributor feedback
Case study: an approach to the theory of change and evaluation at Wikimedia UK
This book is an important outcome of the Collective Wisdom2 project, which aimed to produce an open access book that provides the definitive guide to designing, managing, and integrating crowdsourcing activities for cultural heritage collections. Itself an example of “collective wisdom,” this book was written collaboratively with participants selected from responses to an open call for co-authors to share their knowledge and expertise. Like other online participatory projects, our contributions were structured by a framework — in this case, the Book Sprint3 format — that focused our efforts on specific tasks as needed throughout our two weeks of writing.
The Collective Wisdom project is led by Principal Investigator Dr. Mia Ridge (British Library) and Co-Investigators Dr. Meghan Ferriter (Library of Congress) and Dr. Samantha Blickhan (Zooniverse).
Our overarching goals are to:
Foster an international community of practice in crowdsourcing in cultural heritage
Capture and disseminate the state of the art and promote knowledge exchange in crowdsourcing and digitally-enabled participation
Set a research agenda and generate shared understandings of unsolved or tricky problems that could lead to future funding applications
Our funded project concludes with a workshop and final white paper,4 which will also be posted on our Collective Wisdom website. The workshop aims to identify gaps and emerging challenges that could be addressed by future collaborations. We will put additional material on our website throughout 2021. If there is sufficient interest, we may also produce a second edition of this book, updated in response to readers’ feedback.
These goals have shaped how we approached the writing and publication of this book. We welcome comments on this version of the book — particularly notes on additional projects or research we could include to expand our coverage — and on the future of crowdsourcing in cultural heritage or other issues relevant to our workshop via our Collective Wisdom Contact page.5
The importance of digital cultural heritage in its many forms has been underscored during the COVID-19 pandemic. While medics and carers, virologists and epidemiologists, and logistics experts worked towards saving lives, the arts and cultural heritage provided an escape from immediate and general trauma, the boredom of furlough, and the temptation to doomscroll. Many were happy to enjoy cultural experiences passively, but a large number of people wanted to engage actively; crowdsourcing appeared made for the moment. Evidence of this enthusiasm is clear in the numbers of people engaging in crowdsourcing projects — for example, FromThePage6 and Zooniverse saw two to five times the volume of classifications compared to 20197 — and in the increased interest in crowdsourcing within academic and cultural heritage institutions, as well as the media.
The pandemic also had a direct impact on the process of writing this book. The Book Sprint was originally planned as an in-person event, to be co-hosted at the Peale Center for Baltimore History and Architecture in April 2020. In response to our shared constraints, it became an entirely online process held over two five-day weeks in March and April 2021. To collaborate on and discuss shared texts, we used Zoom for video calls, Mural for ideation and planning via sticky notes, Google Docs for writing, Zotero for references, and Slack for chat.
Listed alphabetically by first name below, the authors of this book have provided some information about their perspective, experience, and inspiration for their work in this field.
Austin Mast is Director of Florida State University’s Robert K. Godfrey Herbarium, a collection of plant specimens collected over the past 150 years. He focuses on the creation and use of digital data by the world’s natural heritage collections, where an estimated 3 billion plants on sheets, insects on pins, fossils in drawers, fish in jars, and other specimens are curated. He dove into crowdsourcing in 2011 when he recognized the opportunities it offered his domain to simultaneously advance digital data creation, science literacy, and the establishment of new relationships between the collections and their stakeholders. He now serves as Director of the Digitization, Workforce Development, and Citizen Science Domain at iDigBio, the US National Science Foundation’s National Resource for Advancing Digitization of Biodiversity Collections.
Ben Brumfield is a partner at FromThePage, a collaborative platform for transcribing, translating, and indexing manuscripts. After early experiences editing Wikipedia and Pepys Diary Online, he was inspired to build crowdsourcing software to transcribe a set of family diaries in 2005 and released it as an open-source tool in 2009. FromThePage has since been adopted by libraries, archives, and researchers for material ranging from financial records to Aztec codices. He writes about crowdsourcing and textual encoding on the FromThePage project blog.8
Brendon Wilkins is co-founder and projects director of DigVentures, a collaborative platform enabling civic participation in archaeology and heritage projects. He has been a professional archaeologist for 25 years, and has been ‘digging the crowd’ with DigVentures for the last ten of those, pioneering a model of crowdfunded and crowdsourced archaeology implemented on 40 archaeological projects in the UK, Europe, and the US. These projects have raised approximately £1.5M for excavation through crowdfunding and matched grant funding, and are supported annually by over 1,000 participants. His Ph.D. research, based at the University of Leicester School of Museum Studies, is entitled ‘Digging the Crowd: the future for archaeology in the digital and collaborative economy.’
Daria Cybulska has worked at Wikimedia UK since 2012, initially focusing on events, outreach, partnerships, and community development, and now serving as the Director of Programmes and Evaluation. Wikimedia UK is an independent UK-based charity focusing on programs and advocacy for open knowledge, and is part of the global Wikimedia movement. Daria was initially drawn to Wikimedia because of its open, start-up character and the beauty of people building on each other’s contributions. Since then she has developed programs to increase Knowledge Equity by bringing diverse people and content onto Wikimedia projects. In the last few years, she has collaborated on Wikimedia’s 2030 global strategy with colleagues across the globe, a process that brought many reflections about accessibility, inclusion, and equity within collaborative knowledge projects.
Denise Burgher is a Senior Team Leader at the Colored Conventions Project — an award-winning Black DH project at the Center for Black Digital Research at Penn State University. As Chair of Community Engagement and Historic Churches and Co-chair of the Curriculum Committee, Denise has spent the majority of her time working on several issues this text explores. Brought on initially as a community organizer to create the successful strategy for the CCP’s “Transcribe Minutes” crowdsourcing initiative, she now helps lead Douglass Day alongside curriculum development work that invites classroom teachers and students into the world of transcribing.
Jim Casey is the managing director of the Center for Black Digital Research and an assistant professor of African American Studies, History, and English at Pennsylvania State University. While completing a Ph.D. in English at the University of Delaware, he co-founded the Colored Conventions Project and its “Transcribe Minutes.” In 2017, he launched the team behind Douglass Day, an annual transcribeathon that partners with library and community organizations for collective actions that generate new digital resources for the study of Black history and culture each year.
Kurt Luther is an associate professor of computer science and, by courtesy, history, at Virginia Tech, where he directs the Crowd Intelligence Lab. For his Ph.D. in human-centered computing at Georgia Tech, he built Pipeline, a software tool for leading crowdsourced animation projects. During his postdoc at Carnegie Mellon University’s Human-Computer Interaction Institute, he built CrowdCrit, a tool for designers to crowdsource feedback on their works-in-progress. At Virginia Tech, he has served as PI or Co-PI for several crowdsourced history projects, including the NSF-funded Civil War Photo Sleuth, a free website that combines crowdsourcing and face recognition to identify unknown soldiers in American Civil War-era photos.
Meghan Ferriter found cultural heritage crowdsourcing while researching with Smithsonian Institution Archives to determine how archives and museums could engage with audiences in digital knowledge repositories. In 2013, she helped launch the Smithsonian Transcription Center and served as its first Project Coordinator. She then joined the LC Labs team in 2017 to design and lead the implementation of By the People at the Library of Congress. Meghan blends a lifetime of coaching and teaching with an endless fascination with how people share information and learn together through media and other technologies, which she explored in her Ph.D. She remains inspired by the ways crowdsourcing transforms participants and organizations, and models her work on the generosity shown to her by cultural heritage and humanities colleagues and peers.
Mia Ridge came to crowdsourcing through her experience at the State Library of Victoria and Melbourne Museum, where online access and physical outreach (‘museum in a van’) were in the same team. Her fascination with the ability of crowdsourcing to encourage closer engagement with cultural heritage collections was deepened during her MSc research project, Museum Metadata Games (2010-11). Her work on crowdsourcing in the Digital Scholarship team at the British Library and as a Co-Investigator on Living with Machines draws on her Ph.D. research and background in UX (user experience design and research).
Michael Haley Goldman directs Future Projects for the United States Holocaust Memorial Museum where he helped develop the Citizen History projects Children of the Lodz Ghetto Research Project and History Unfolded: US Newspapers and the Holocaust. These projects and other experiments at the Museum have pursued improved techniques for shaping cultural crowdsourcing projects as environments for learning Holocaust history.
Nick White is the Web Developer/Librarian at the Newberry Library, where he develops and maintains the web architecture for the Newberry’s Digital Initiatives. Since 2019 he has been developing the Newberry’s crowdsourced transcription platform, which has so far produced over 65,000 pages of transcribed letters, diaries, journals, and other handwritten items from the Library’s collection. He has worked on numerous other large-scale data aggregation and normalization projects, from OCR to hand-keyed metadata records representing hundreds of thousands of items, as well as developing digital collections and their web access points for the Chicago Public Library.
Pip Willcox works with cultural heritage collections and digital technologies, and the people who use them. Her interest in crowdsourcing practice and theory is in enabling people to engage inclusively with library and archive collections. Previously she worked on the EPSRC-funded project, SOCIAM: The Theory and Practice of Social Machines, and directed the Digital Humanities at Oxford Summer School where a crowdsourcing workshop has been taught for many years by Zooniverse colleagues. She is currently working on the AHRC-funded Towards a National Collection project, Engaging Crowds: Citizen research and heritage data at scale, and is Head of Research at The National Archives (UK).
Samantha Blickhan is the Humanities Lead for Zooniverse and Co-Director of the Zooniverse team at the Adler Planetarium in Chicago, where she guides the strategic vision for Humanities efforts on the platform, collaborates with external teams to create projects, and manages the development of new tools and resources. Her interest in crowdsourcing began in grad school when she volunteered on Seafloor Explorer during breaks from writing her doctoral thesis on the paleography and notation of medieval song.
Sara Brumfield is a partner at FromThePage, where she builds software and helps state and national archives, research groups, public libraries, and universities run crowdsourcing projects.
Sonya Coleman is the Digital Engagement and Social Media Coordinator at the Library of Virginia, the state library and archives located in Richmond, Virginia. In addition to developing digital collections and online exhibits, she helped start the Making History: Transcribe manuscript crowdsourcing project in 2014 and has helped run it since. Crowdsourcing and open data projects continue to expand at the library. Sonya is happiest when working directly with the crowd and continues to run transcribeathons each month.
Ylva Berglund-Prytz unwittingly ran her first crowdsourcing project in 1999, creating a language corpus of texts shared by students at Uppsala University, Sweden. Since then she has been involved in many crowdsourcing initiatives, often using a combination of online and offline approaches to create digital collections of material shared by members of the public. She is based at the University of Oxford where she manages the RunCoCo service, offering advice, training, and support to those looking to use crowdsourcing to create collections and encourage engagement with cultural heritage.
We are grateful to the Arts and Humanities Research Council for their funding of Ridge, Blickhan and Ferriter’s proposal “From crowdsourcing to digitally-enabled participation: the state of the art in collaboration, access, and inclusion for cultural heritage institutions” (AH/T013052/1) through an AHRC UK-US Partnership Development Grant.9 This funding made it possible to support all the logistics of hosting an event and to draw on the expert services of the Book Sprint company.
We would like to thank our family, friends, and colleagues who supported us in the lead-up to and throughout the Book Sprint. We are further grateful to the wider, distributed community of researchers, practitioners, organizers, and participants working in crowdsourcing, citizen history, citizen science, volunteering, and online engagement and participation, who have shared their experiences in various forms over the past years. We would also like to thank the “behind the scenes” teams, from caterers to catalogers to camera and scanner operators, who make crowdsourcing in cultural heritage possible. From the Book Sprints team, we would like to thank Faith Bosworth and Karina Piersig for their facilitation of the sprint, Christine Davis and Raewyn Whyte for copy-editing, Henrik van Leeuwen and Lennart Wolfert for illustrations, and Agathe Baëz for HTML book design.