3. Why work with crowdsourcing in cultural heritage?

by Mia Ridge, Samantha Blickhan, Meghan Ferriter, Austin Mast, Ben Brumfield, Brendon Wilkins, Daria Cybulska, Denise Burgher, Jim Casey, Kurt Luther, Michael Haley Goldman, Nick White, Pip Willcox, Sara Carlstead Brumfield, Sonya J. Coleman, and Ylva Berglund Prytz

Published on Apr 29, 2021

Why work with crowdsourcing in cultural heritage?

A fundamental question for crowdsourcing projects in cultural heritage is: what are the benefits of this approach, and how do they differ from other approaches such as commercial outsourcing and automated solutions? Crowdsourcing in cultural heritage draws on the efficiency and productivity of the broader field of crowdsourcing, but also has its roots in public participation in science, arts, and history, and is closely related to online volunteering and digitally-enabled public engagement. At its best, it is a form of digitally-enabled participation that creates meaningful opportunities for the public to experience collections while undertaking a range of tasks that make those collections more easily discoverable and usable by others. This book provides a range of examples of how this dual purpose plays out in practice.

Mutual benefit — for participants, institutions, researchers, and communities — is fundamental to this definition. Participants deeply engage with their heritage and learn more about history, science, art, language, and an ever-expanding field of subjects. Institutions increase the reach of and access to their collections. Researchers leverage the passion and expertise of the public to accomplish tasks they could not tackle alone. Communities find new opportunities to share their perspective and stories.

As a methodology, crowdsourcing provides a context in which people outside an organization can actively engage with cultural heritage collections via carefully designed tasks and systems that create valuable tangible and intangible outcomes. Working within this framework means the capacity to draw on other practitioners, resources, and platforms for building participatory projects — no need to reinvent the wheel.

Crowdsourcing draws the interest of many institutions because it promises to improve discoverability and accessibility for collections through transcription, description, and other techniques. Projects have often lived up to this idea by providing digital content for large collections that would have been prohibitive to generate using traditional techniques. The Library of Congress’ By the People project, for example, started in 2018 and by 2021 had supported over 280,000 pages of transcription.1 By integrating the transcribed pages back into the Library’s online catalog, the material becomes searchable and can be read by accessibility technologies. Digital participation also ameliorates some of the issues faced by in-person volunteering projects such as physical access or preservation concerns. For example, the herbaria@home2 project began in 2005 to address the limited numbers of computers, specimens, and people that could work on sometimes-delicate physical specimen sheets simultaneously.

With crowdsourcing, the opportunities are huge. Never before has it been easier for groups of people to work together to tackle large-scale research challenges. Technologies ranging from collection digitization to image recognition to satellite imagery, combined with networked infrastructures and widespread access to them, have enabled people to share information, ideas, and insights, facilitating new ways of thinking about today’s problems and challenges through online collaboration. Combining these with the capacity of Artificial Intelligence to analyze large volumes of data, the potential to create a more complete picture of our world is hugely amplified, connecting people with their shared history, science, and culture. We have the opportunity, in these times of accelerated technological change and pronounced social upheaval, to deploy these approaches intentionally, creating human-centered projects that deliver benefits to all involved.

Why work with the public?

Crowdsourcing in cultural heritage should be considered a form of public engagement that contributes to the primary mission and/or research of projects and organizations. This, alone, is reason enough to invite the public to work with you.

Crowdsourcing also allows you to reach beyond the traditional physical limits of an organization, creating a huge opportunity to draw on expertise outside the institution. This ability to solve challenges with — rather than for — people is a mutually beneficial approach and can help to overcome a perceived relevance gap that often exists between cultural heritage or research institutions and their wider communities. Organizations and practitioners can benefit by including views and needs that they may otherwise neglect; communities can become empowered through equal involvement in work and decision making.

When done responsibly and ethically, crowdsourcing can generate or renew trust, and create fruitful relationships between institutions and user communities. By inviting “outsiders” to be part of the work of an organization, we demystify the processes and open them to both potential criticism and potential improvement. The actual work of institutions and staff is made visible to the public in new ways, which may spark debate and creativity. Especially in the case of hidden histories or previously unexplored subjects, crowdsourcing can operate as a way to augment the historical record and fill in knowledge and collection gaps. Institutions have often foregrounded certain histories; bringing a more inclusive and accurate understanding to collections can be an important step to correcting past wrongs.

“Why is this indexing project important? Because, for decades, the state (as did many other states after emancipation), unfairly targeted and imprisoned African Americans, then leased them out as forced labor to private businesses. The conditions in which they lived and how they were treated were horrendous. These records are important because they provide an opportunity for us to KNOW and LEARN these individuals’ names and begin to do the work to TELL THEIR STORIES. And possibly, to reconnect families (via descendants) that have been systematically ripped apart by this practice.”3

For more information about this project, see Case study: Lone Rock Convict Stockade Project in the “Identifying, aligning, and enacting values in your project” chapter.

Perspective: collective projects for unremembered histories of the Colored Conventions — Denise Burgher, Chair-Community Engagement and Historic churches, Co-chair Curriculum Team, Co-director Douglass Day — Colored Conventions Project

The Colored Conventions Project4 uses crowdsourcing for some of the reasons which have already been outlined in this section. As a Black Digital Humanities project which curates, transcribes and creates exhibits using texts that were created by Black intellectual activists in the nineteenth century, our work performs several interventions in historic as well as cultural discourses. Crowdsourcing allows us to recognize and extend some of the critical theories and ideas developed in the Colored Conventions Movement (CCM) archive into our current work. From the horizontal structure of our working teams, which emphasizes collective labor (thinking, planning, and execution), to the types of documents we curate (minutes, proceedings, and associated documents that record the history of the movement), we endeavor to respect and extend the work of the organizers and participants in the Colored Conventions Movement. The collective was the most important component of the CCM. Decisions were made collectively by vote after vigorous debate and discussion. These efforts meant that the documents which were produced deeply reflected the ideas, thoughts and labor of the people rather than an individual or committee. The power of this type of collective labor, which is fundamental to successful community building/organizing, allowed a broad cross section of people to participate in the shaping of policies, institutions, petitions and collective public responses to the varied issues which the Black community faced locally, regionally and nationally. Choosing to create a crowdsourced project to transcribe our documents meant that we were modelling our labor on that of the CCM. We invited the crowd, specifically Black publics, into the process of thinking and working with us as we transcribed. It also allowed us to explicitly connect our current work to the original communities which created these texts.
This created space for the descendants of the members of the CCM to learn, share, and make fully accessible for the future the very documents which were the foundation of the modern Civil Rights movement. These efforts allowed us to reveal what for most people is what we call unremembered history, which deepens the connections which descendant communities had and continue to have with the documents, the authors, and the enduring legacies of the ongoing fight for civil rights in the United States and the Diaspora.

Crowdsourcing can be useful in solving many different problems or tasks. It is amenable to any digital or digitized data. Tangible outputs include various forms of text, structured data, and metadata, collection items, increased knowledge, and research.5 Outputs are discussed further in the “What is crowdsourcing in cultural heritage?” chapter.

However, while the tangible results of the crowd’s work are important, they do not constitute the only outcome. Often, crowdsourcing projects bring together staff, volunteers, researchers, and other participants in new and exciting ways outside traditional power structures. We work toward a shared goal. We learn about each other, our interests and passions, our motivations and frustrations. Communities form and reform around projects and collections, drawing collaborators from different backgrounds who enrich projects with their own knowledge and experiences. Knowledge is shared between institutions and the communities who help create it.

Automated processes to transcribe or describe collection items are increasingly accurate for some tasks, but they cannot respond to collections with curiosity, expertise, and delight. For example, the comments that participants leave about the playbills and performances they encounter on the British Library’s In the Spotlight project help highlight interesting people, events, and other details that can be shared to make the collections more accessible to the general public. Other projects rely on people’s ability to recognize voices, faces, and concepts.6

Increasingly, human computation systems combine the particular abilities of humans with those of software, amplifying the value of individual contributions and pre-processing items to make the crowdsourced task more enjoyable and rewarding. Further discussion is available in the “Choosing tasks and workflows” chapter.

What counts as knowledge? Whose voices are heard?

We believe that without access to knowledge, we cannot build a common understanding of our heritage. Without diversity of content, this understanding is limited. A thoughtfully planned and designed project can help to create more complete information as well as a more informed society. We do this by supporting marginalized people to participate in and lead community efforts, and by uncovering and sharing knowledge created by and about underrepresented people and subjects. By engaging with marginalized people and enabling them to create knowledge relevant to them, we empower them and make our society more diverse.

This is central to the radical potential of crowdsourcing as a practice. Much of the world’s digital knowledge is contributed by only part of the world, and the views of that part are dominant online. As more people come online, addressing representation will be even more urgent.7 Otherwise, newcomers to the internet will not see their heritage represented in a way that is meaningful to them; yet another experience of a dominant culture framing things in a space that they have held for many years. Bringing new voices into heritage projects online may go some way in addressing this. A challenging, but very important potential benefit of engaging with voices not previously included in digital heritage is that it may push against and expand your organization’s notions of what counts as knowledge.

Who holds power in your project?

Engaging with the crowd can be incredibly impactful in terms of democratizing knowledge production and access to heritage. As other chapters discuss, designing participatory projects is an opportunity to consider how power and privilege function in the space you are creating. Key sections to refer to are “Organizing at and beyond Predominantly White Institutions” in the “Connecting with communities” chapter, and “Risks of replicating existing structures of power and privilege through your project” in the “Identifying, aligning, and enacting values in your project” chapter.

There is a lot of positive impact in addressing inequities through your project, but there are also potential harms in unintentionally replicating them. Therefore we invite you to consider how power plays out in your project:

Who decides the design, tasks, and participants?
How much influence do participants have over the direction of the project?
Who decides the goals for the project?
Who will the results of the project benefit and how?
Who decides what role crowdsourcing plays for an institution?

Additionally, consider your project within a broader ecosystem of knowledge:

How can you articulate the impact and reality of racist and colonizing practices in institutions such as galleries, libraries, archives, and museums in dialog with your project goals?
How can you design and use your project to help break the perpetuation of unhealthy legacy power dynamics?
How can you apply anti-racism work in the context of crowdsourced heritage projects?

Who is the crowd?

The crowd is not a homogeneous group. While crowdsourcing projects may be open to the public, the crowd does not always mean a large, anonymous group of individuals. It is worth taking some time to think about who your most likely participants are if you issue invitations using existing channels. Is the resulting crowd likely to represent the population as a whole, or will it reinforce systemic biases? As noted in the case study above, the lack of representation and diversity among Wikipedia editors (90% male, predominantly white, Western), for example, has created gaps in knowledge.8 9 Participant demographics and motivations are discussed further in other chapters.

Consider how your crowd will access your project. When facilitated online, many of the traditional barriers to engagement are removed — time and place, the size of tasks, and access to collections — only to be replaced with new and different barriers. Without giving thought to accessibility, inclusion, and how power plays out in your project, you run the risk of excluding people who would bring the richness of perspective and knowledge to your project. Not being intentional in this space increases the chance that you will replicate existing structures of power.

Crowdsourcing involves making choices about what to work on and how to work on it. Intentional decision-making can widen the scope of whose stories get told, who is interested in the project, and who can participate.

Understanding participant demographics

Who is in “the crowd?” Better understanding who your participants are will also help you think about their interests and motivations. Demographic information gives you some insight into your crowd and can challenge your assumptions about who takes part in a project. Some aspects depend on the project, with a Galaxy Zoo survey reporting in 2013 that over “80% of respondents to the gender question self-reported as male,”10 while nearly 70% of respondents to a 2014 ArtUK Your Paintings Tagger survey were women.11 65% of the Galaxy Zoo sample were from the US or UK; the rest were from 116 other countries or territories. Participants in both tended to be well educated; almost 70% of American Galaxy Zoo respondents aged over 25 had a bachelor’s degree or higher,12 and nearly 80% of Your Paintings Tagger respondents were educated to at least degree level.

The impact of the digital divide on demographics should also be considered. Many projects are designed for broadband internet access and laptops or desktop computers. If your target demographics are more likely to use mobile phones or tablets with limited data, you should carefully consider and test for this throughout the platform and task design process. When considering your participant demographics, it is key to consider the impact — even unintentional — of the design choices that you and your team have made, even down to the time of day or frequency with which new data is added to the project.13

Case study: In the Spotlight participant motivations and demographics

A small 2018 survey of In the Spotlight participants using some of the same questions as an earlier Zooniverse survey found that 77% of participants were female (5% preferred not to say), with the most common age groups being 36-41 or 60-65 years old (over 20% each). Over 50% had a Masters and/or Doctoral degree, and over half were in full-time employment, with some working in libraries, research, or information-management posts (and one actor!). Most heard about the project via social media or word of mouth and usually contributed when taking a quick break between tasks or in longer periods of spare time. Over 75% reported their main motivation as “contributing to historical or performance research.” Common answers to a question about why they might sometimes spend more time than they intended to on the project included “it’s so entertaining and/or easy to do that it’s very easy to get carried away.” One respondent said, “Every time an entry is completed you are presented with another item which is interesting and illuminating which provides a continuous temptation regarding what you might discover next.” Demotivating aspects included uncertainty about how to complete tasks correctly, tedium “when there is lots of missing data,” and being presented with duplicate titles to transcribe from different playbills.14

Potential barriers to participation

Understanding structural, technical, and other barriers to participation for different groups is important. It is all too easy to unintentionally exclude groups by failing to proactively address the questions and concerns they might have about contributing. Addressing the basic heuristics for the usability of digital resources is a key part in lowering the barriers to participation for all.15

Barriers such as the digital divide remain and can be addressed intentionally. Events during the COVID-19 pandemic have also highlighted all manner of inequalities, and without deliberate design, we risk adding to this exclusion, replicating historical problems of power and privilege in a new digital divide. We need to approach ethical crowdsourcing mindfully, as organizations and as practitioners, and seek to harness people’s amazing potential to create fairer organizations going forward. You can draw on local expertise to help you think about potential gaps in representation, and about ways in which your project might create unanticipated barriers or harms.

Anticipating and addressing potential barriers to participation might mean undertaking activities designed to reach specific demographic or other groups, discussed further in the “Connecting with communities” chapter, or evaluating interface and task designs and instructions with the needs of specific communities in mind, discussed further in the “Choosing tasks and workflows” chapter. Later chapters will include suggested approaches to ensure that your technical design and development process does not result in additional barriers, including Web accessibility standards,16 and ensuring participants have access to resulting data (see the “Working with crowdsourced data” chapter for further discussion). Designing with a wide range of participants in mind is vital to the success of any crowdsourcing project, and will ultimately benefit all participants.

The community itself can sometimes become a barrier to new participants. The intentional building and maintaining of the community shape the culture and norms of the project. Creating a welcoming environment broadens the opportunity for participation. Read more on community outreach in the “Connecting with communities” chapter, and more on how existing community values can themselves be a barrier to entry in the “Identifying, aligning, and enacting values in your project” chapter (in particular, the case study on values in action within the Wikimedia community).

Considering inclusion in your project

We understand the concept of inclusion to be about reducing exclusion and discrimination (e.g., regarding age, social class, ethnicity, religion, gender, sexual orientation, etc.) by both individuals and groups through modifying settings, policies, cultures, and structures, to create the proper conditions for the emergence of diverse perspectives. For crowdsourcing projects within the cultural heritage sector, considering inclusion is critical. These projects are participatory, so designing for your intended participants is key.

We invite you to consider the following recommendations, especially as you think about your project and task design, as well as your participants (discussed in greater detail in the “Designing cultural heritage crowdsourcing projects,” “Choosing tasks and workflows,” and “Supporting participants” chapters):

Intentionally engage diverse communities in ways and spaces that work for them
Work with marginalized communities — crowdsourcing projects can bring enormous value to documenting, preserving, and amplifying the heritage of marginalized communities. This comes with additional considerations about what barriers, oppression, and marginalization these communities experience, and what extra work you must be prepared to undertake to make a project inclusive and rewarding for them
Co-design tools with participants to make your project more inclusive
Make the content (and outputs of your project) inclusive for a diverse reader group. Consider how some groups may be put off by your content, feeling it is not for them. How can you reframe it to make it more inclusive?

Perspective: designing for accessibility on FromThePage — Ben and Sara Brumfield, partners at FromThePage

When designing tasks, we attempt to find ways for users to contribute at any level of technical skill. Text annotation projects can be broken into small steps, so that a first pass might be in plain text only, with more complex mark-up or annotation added later. This allows some users to focus on mark-up, while others focus on reading difficult handwriting; two very different tasks which users may gravitate towards. If possible, tasks should be designed to be completed independently, so that a user working on an Urdu-to-English translation project may translate handwritten Urdu text to English digital text without first needing to transcribe the text using an unfamiliar Urdu keyboard.17

Exposing your contact info on the crowdsourcing site allows people who are unable or unwilling to use the crowdsourcing platform to contribute by offering their expertise on the back-channel. In most cases, these contributions will be qualitatively different from the crowdsourcing task; this back-channel communication can offer a way for people to offer related material or personal anecdotes. It can also attract advice from technically sophisticated users, as when a volunteer transcribing on DigiVol offered suggestions on statistical sampling techniques for quality control.

When is the crowd not the crowd?

Crowdsourcing as a framework for internal tasks of your organization

Crowdsourcing can be adapted for other contexts, including internal work. During the COVID-19 pandemic, for example, some institutions used crowdsourcing platforms to provide work-from-home options to staff. Projects or collections were designated specifically for staff, as a closed community, so that they could continue to work remotely. Using crowdsourcing platforms for cataloging or transcription tasks meant staff could be productively employed instead of being put on furlough or let go. For example, Clements Library at the University of Michigan adjusted to the COVID-19 pandemic by enabling transcription:

“When library staff started working from home in mid-March [2020], we quickly upgraded our demo version of the crowdsourcing software FromThePage to allow staff and student employees to transcribe the manuscripts and printed annual reports of the Rochester Ladies’ Anti-Slavery Society Papers. This allowed students to finish their work hours for the semester remotely and gave staff a useful project to supplement their other work.”18

Another example of integrating staff into a crowdsourcing endeavor occurred in March 2020 at the National Library of Scotland:

Case study: self-organized Wikisource project for staff at the National Library of Scotland

The National Library of Scotland Wikisource project is an inspiring example of a staff self-organized crowdsourced project that took place across several months. The National Library of Scotland has a collection of over 3,000 Scottish chapbooks which have been digitized and published on the Library’s Digital Gallery. There was never a good time, though, to work on this content. In March 2020, the Library undertook a project to correct the OCR on this collection by uploading the chapbooks to Wikisource and proofreading them, then exporting the corrected OCR and loading it back into the Digital Gallery. The project began during the Coronavirus crisis and was a useful way of using staff time while they worked from home. In the first four weeks of the project, more than 60 members of staff took part. It also served as a good vehicle for engaging Library staff with the wider Wikimedia environment.19

The framework of crowdsourcing — such as the emphasis on scaffolded, rewarding tasks — makes it attractive for internal work, which often relies on tools with poor usability and sometimes awkward workflows, in contrast to participant-focused crowdsourcing interfaces. Remote work on crowdsourcing systems may also give staff a connection to the material aspect of their work, even while working at home:

“For those of us who miss the intimacy of handling the documents themselves, we’ve turned to transcription projects that call us to do close readings of page after page of manuscript letters. ... The familiarity of having to squint, tilt your head, and puzzle out an indecipherable handwritten word keeps us grounded in the collection work we love, even while we do it from our living rooms rather than our offices.”20

Crowdsourcing as a framework for institutional engagement

Institutions such as museums have similarly benefited from using crowdsourcing to keep their guests engaged while physical buildings were closed due to COVID-19.

The Mapping Historic Skies project, for example, asked Adler Planetarium guests and Zooniverse participants to join in on the work of building a collections database, in particular a database of constellation depictions that can show the way a single constellation has been represented across time and geographic location. The original project included an in-exhibit workflow and an online Zooniverse-hosted workflow. The former was designed to invite Adler Planetarium guests to take part in an onsite interactive experience to help segment images containing artistic depictions of multiple constellations, essentially cropping the individual constellations out of the parent images. The latter invited online participants to classify the individually cropped images of constellations. When the museum closed to the public in March 2020 due to COVID-19, the project pivoted exclusively to using the Zooniverse platform for both workflows, allowing visitors to continue engaging with Adler collections materials, as well as inviting new participants from around the world to take part in the project.21

How is crowdsourcing mutually beneficial?

Many institutions find that access to collections is one of the most tangible benefits of their crowdsourcing efforts. Staff immersed in collaborative projects also describe the less concrete benefits of increased bonds with participants who begin to see themselves as being part of an institution and its mission. Crowdsourcing can be a tool to build new relationships with audiences, wherein the traditional relationship between institution and user is blurred, inviting audiences to act as researchers, curators, and experts.

The Natural History Museum, which hosted a Wikimedia project in 2014 in collaboration with Wikimedia UK, described their institutional perspective:

“One of the great challenges of a 21st-century museum is how it embraces technological advances. Museums need to be conservative, to ensure that their holdings will be available (and useful) to many future generations. But museums must also maintain relevance to modern audiences, to scientists studying giant datasets, or to somebody Googling information about a species. As a museum we have a clear ambition: to be a voice of authority on the natural world. Technology allows us to advance this ambition, reaching audiences far beyond those who can visit us. The projects run by Wikimedia allow us to engage with a global community who will use, reuse, interpret and add value to our content.

In short: Wikimedia provides a platform that allows anyone to become a collaborator with one of the world’s great museums.”

– Ed Baker (Data Researcher, Natural History Museum, involved in the Wikimedian in Residence project), residency final report, May 2014.22

These new roles equally benefit project participants. Participants in GLAM crowdsourcing consistently list contributing to a bigger cause as a primary motivation for their work — something we feel is less common in commercial crowdsourced projects.23 Participants can also benefit from a conversation with each other, finding a sense of shared purpose and common interests. Many projects have ways for participants to comment on or tag a collection item they discover during a task. For example, contributors to the British Library’s In the Spotlight project can add a comment that is sent via email to the organizers, and contributors to Zooniverse projects can add a hashtag about an item or discuss it on project-specific message boards, known as Talk boards. Participants might also undertake independent research and report their results on social media, in blog posts24 or forum posts. There is some evidence that discussion leads to learning and deeper engagement with the relevant subject.25

In 2017, I started working on designs to revive the Library’s LibCrowds platform26 to help transcribe selected text on digitized sheets from our historical playbills collection. Based on previous experience, I was keen for participants to have some way to share interesting things they noticed on the historical playbills shown. There didn’t seem to be much interest in sharing notes on a forum or social media, so we built a comment box into the task page. Comments are emailed to a shared inbox along with a link to the original playbill. I’ve always felt awkward about promoting the project as a purely broadcast activity, but sharing these comments with a detail from or a link to the original playbill seemed to fit our goals much better. I hope it shows that we’re both reading and valuing participants’ comments and that that, in turn, draws other people to join (or rejoin) our project. In one case, a participant wrote a post about a phrase he’d noticed on a playbill, and we were delighted to share this on our Digital Scholarship blog.27 Over time, I came to think of this as a “virtuous circle” in which amplifying comments leads to more activity.

*Creating a “virtuous circle” around sharing participants’ comments*

Many crowdsourcing projects highlight the deep engagement with collections demonstrated by their participants. The University of Pennsylvania Libraries and their partners operate the Scribes of the Cairo Geniza project, which enlists the public in transcribing over 30,000 medieval text fragments.28 While the project message boards provide a place for participants to get help and ask questions of researchers, the most active section is the Notes/Subjects Discussion board, which contains passionate and thoughtful discussions of the source materials.29 This board has attracted over 28,000 posts, about 70 times as many as the roughly 400 posts on the board for help and technical issues.30

Academic researchers have established the benefits of working collaboratively with crowds. Not only have projects increased academic findings and publications, but they have also broadened perspectives and participation in research. The Zooniverse platform lists hundreds of academic publications based on the crowdsourced projects it has hosted.31 Many include participants as co-authors,32 publicly recognizing and crediting their contributions.

When is crowdsourcing not the answer?

For organizations grappling with resourcing challenges or contemplating new ways to engage with a digital audience, crowdsourcing can seem like a win-win solution. But it is not right for every project or organization, and in this section, we will reflect on some of the reasons why crowdsourcing may not be right for you.

Any organization seeking to work with the crowd should begin by defining its “Why.” What task or challenge does your organization face that is so big it can only be solved by working with a crowd? Are there tasks that would greatly benefit from involving more perspectives? Are you looking for a way to increase participation and inclusion? And, anticipating a question that potential participants may ask, why can a computer not complete this task to a high standard?

Another key question relates to those engaging with your project. Why should people give their time to your organization or your project? How will you value and respect their contributions?

A guiding principle for crowdsourcing in cultural heritage is to treat project participants with respect. This means soliciting their efforts judiciously, in ways that honor the knowledge and labor of participants and communities. Crowd-generated results should be useful and shared publicly. It is not just the polite thing to do (for more on this, see the “Identifying, aligning, and enacting values in your project” chapter); it also makes for better projects than any we could imagine alone.

Having a clear vision of why you are creating your project is the first step to understanding the value crowdsourcing can create for both your institution and your participants. Knowing that your organization understands this will be the fundamental reason people will choose to get involved with your work. If you do not have clear, positive answers to your “Why,” then crowdsourcing may not be appropriate.

When you want to save money

A common misconception is that crowdsourcing is a cheaper way of achieving your goals or mission, essentially tasking the crowd to undertake labor you would otherwise have to pay for. This could include, for example, the creation of training datasets for machine learning (ML) models, where the crowd might be misconstrued as providing free labor as a step on the path to automation (discussed further in the “Choosing tasks and workflows” chapter). Some participants may be excited about creating datasets for use in ML that will help automate and scale up processes that improve the quality and discoverability of collections or contribute to emerging fields of research such as human computation. For example, participating in data science for digital history was part of the call to participate in crowdsourcing on Zooniverse for the Living with Machines project.33 Another example is the Library of Congress Labs, which designed an experiment that aims at the intersection of engaging, ethical, and useful, setting up an enjoyable activity with informed participants to create and verify training data (with user-centered design and participant research being a critical part of the experiment).34

However, process-oriented goals do not absolve your team from undertaking the steps listed above to create an accessible and engaging project. Not-for-profit crowdsourcing should provide mutual benefits for participants and stakeholders, such as the opportunity to undertake meaningful and enjoyable tasks. Even if the underlying technology is free to use, organizations hosting projects should be prepared to invest resources into communication with participants, such as progress reports, answering questions, and so on.

Additionally, organizations should be prepared to consider the institutional resources necessary for a healthy and successful project, including staff time for engaging with volunteers during an active project, but also for data pre- and post-processing, which can require quite a lot of time and effort (discussed in more detail in the “Working with crowdsourced data” chapter).

If you do not have the resources or desire to invest in the communications, data processing, and day-to-day management of building and maintaining your own crowd, there are numerous commercial providers of data-labeling services to quickly train algorithms in new applications, many of which also include robust options for data output formats.

When you are skeptical of crowdsourcing

At face value, crowdsourcing is completely aligned with the determination of many public-facing organizations to realize a wider social benefit. However, far from welcoming this turn to the crowd, some practitioners have raised concerns that crowdsourcing is at best cynical and at worst exploitative. Just as commercial industrial giants might greenwash their business practices with some small-scale environmental initiatives, involving the crowd as a tokenistic bolt-on to an otherwise traditionally designed project is arguably a form of dressing up a project to look participatory. To ensure that your project does not fall into this trap and result in reputational risk, it is important to recognize that crowdsourcing is not just another form of outsourcing. Working with the crowd is a two-way street; it can be demanding, both logistically and ethically. It requires careful consideration of how the crowd will work on readily achievable tasks; attentiveness to how you will communicate with them;35 and thorough regard to your organizational capacity to build and manage your crowd. Before embarking on this journey, it is vitally important to make an honest assessment of whether your team has the logistical and intellectual capacity to open itself up to working with the crowd in a way that demonstrably values their contribution.

When your organization is not able to use the results

In launching a successful crowdsourcing project, one of the most difficult hurdles to overcome could be the cultural norms and professional expectations of your colleagues and organization. The related field of citizen science offers a good illustration of this difficulty: Hila Lifshitz-Assaf, a researcher embedded with NASA’s scientific community, conducted an in-depth three-year longitudinal field study of the effects of NASA’s experimentation with open innovation.36 As an experiment in crowdsourcing, NASA published fourteen strategic challenges on open innovation platforms, leading to “spectacular” results that far exceeded the work of their in-house Research and Development teams. Despite this success, enthusiasm for the initiative was far from unanimous within NASA’s scientific community, leading to “rising tensions, emotions, and fragmentation.” Lifshitz-Assaf identified two main groups based on their opposition to or support for working with the crowd. The first she called “problem solvers,” who self-identified with a bounded, scientific method adhering to notions of professional expertise and peer review; the second she called “solution seekers,” who were open to collaboration with the crowd and to dismantling professional boundaries. Irrespective of the results or strength of the project design, Lifshitz-Assaf noted that working successfully with the crowd would require a “mindset shift.”

This change towards a crowdsourcing mindset is the necessary first step you and your team need to make before embarking on this journey: what Lifshitz-Assaf calls “a shift in one’s professional role and one’s identity… changing the focus of ‘How’ we do our work, to pause, reflect and refocus on the bigger ‘Why.’”

Reality check: should you start a crowdsourcing project?

Summary

Crowdsourcing in cultural heritage has some decades behind it, but there is much more potential to emerge. Each new project opens another door — into collections, into communities, into institutions — and holds transformative opportunities for each. Collections may be transformed into searchable data that can be analyzed and understood in new ways. Communities may feel more connected to their cultural heritage or more able to directly interact with it. Institutions may build new relationships or deepen existing ones with interested communities. Institutions can transform from faceless, impenetrable monoliths to facilitators of open processes and authentic experiences. Organizational culture can shift away from expertise and authority and toward connection and shared knowledge. While all of these outcomes are possible, how the project is designed, staffed, communicated, and supported will determine what comes to fruition.

Often, the larger goals and potential of crowdsourcing will generate far more excitement than the detailed work of data exports. These larger goals can seem lofty and aspirational when you are just getting started. Do not be afraid to plant a flag and go for it! Revolutionize access to collections. Create participatory opportunities and culture change at your organization. Build inclusive communities that welcome all skill levels and nurture greater involvement. These “big picture” goals will keep you inspired, convince your administrators, and speak to your potential participants. Some goals may develop for your project as you work, too; it is okay to not know exactly what to expect. Aim high, adjust as needed, and be ready for surprises.

Why? Because crowdsourcing creates both tangible and intangible outcomes that are central to the mission of cultural heritage institutions. There are ethical, logistical, and technical considerations particular to this type of work, so tread carefully. Crowdsourcing enables individual and collective learning and discovery. Ideally, it generates positive experiences for all parties involved: contributors feel valued and appreciated, project organizers open their processes and share their ideas, institutions better serve communities of users and increase knowledge of and access to collections. Those of us who do this work will tell you that the benefits outweigh the challenges, the tangible outputs are not the only results, and that — done right — it is incredibly fun.

License

Creative Commons Attribution 4.0 International License (CC-BY 4.0)

Comments

Johan Oomen:

Could it be that this is mixing up ‘why’ and ‘what’ here? Shouldn’t the why be: ‘we want to make a difference in our users’ lives’?

Johan Oomen:

Perhaps add an example scenario where this is the case?

Johan Oomen:

Perhaps good to unpack ‘challenging’ - and perhaps refer to the following chapters on how to address them? I apologize if this comes across as a Reviewer 2 comment. But I think it would be good to expand: the challenges here are operational (curators giving up a bit of autonomy), technical (will my content management system allow for polyvocality to be added in a meaningful way, how can this be explored), quality, community (how to manage various viewpoints in a constructive manner), engagement (how to keep the momentum going), and certainly outreach (making sure target audiences feel safe and connected).

Johan Oomen:

In effect AI will become better (tbd: define better) by inserting a feedback loop. https://scholar.google.ca/citations?view_op=view_citation&hl=en&user=xF1LeyoAAAAJ&sortby=pubdate&citation_for_view=xF1LeyoAAAAJ:5nxA0vEk-isC

Johan Oomen:

Yes yes yes. To me, this is where we need to explore more.

Johan Oomen:

Depends a bit on the task. See earlier chapter, engagement whilst performing micro task can be quite casual.

Caitlin Haynes:

I know there is no one-size-fits-all answer for this, but is there a way to provide examples or even further detail about what appropriate staff time looks like for “engaging volunteers during an active project” and managing the day-to-day of your own crowd, including the skills, experience, education, etc. that may be needed (volunteer management, communications and marketing, organizational skills, etc.)? I have found that this (what do you actually need to be a successful community manager and how many of these positions does a program like this require) is one of the most commonly asked questions and one of the hardest answers to give (as well as one of the hardest things to get leadership to understand).

Caitlin Haynes:

GIANT CLAPPING HANDS ALL AROUND in this section (and below)- yes, yes, yes, acknowledgement, transparency, respect re: your participant community are so important and so often under-appreciated for the time, skill, and attention doing this successfully entails.

Caitlin Haynes:

or beyond this - how some content can further traumatize marginalized groups - either through the emotional labor of directly engaging with traumatic history or by the very act of including content about a marginalized group or community without that source community’s input and collaboration. I’d argue it’s not simply about being more inclusive, but also about considering what (not who) should be included at all / what is even appropriate as a task, content, etc. for a crowdsourcing project. I know you get to this later on as well and I LOVE the thoughtful, critical way you have all approached this entire section, particularly re: accessibility.

Samantha Blickhan:

Thank you for adding this nuance. It’s certainly something we can add here, and will act as a reminder that inclusivity doesn’t just mean what is included, but also sometimes what is intentionally left out.

Caitlin Haynes:

what about content? I think this falls into “design and tasks,” but the actual content of the materials/items/collections with which the “crowd” may engage - who decides that - could that be added more specifically here as well? If we’re thinking about colonial institutions that so often hold collections centered on the history of white men, the choice of what collections get an “extra spotlight” of sorts by going into a crowdsourcing project, impacts these larger points about diversifying the record, incorporating community voices, etc., and on the other side of things, collections that may be written by or about marginalized groups may not be ideal candidates for the increased access / open access of a crowdsourcing project. Who gets a say in that? And how does this power imbalance affect that?

Samantha Blickhan:

I really appreciate this! You’re right that it could be related to design/tasks, but could also overlap with influence — in either case, worth adding explicitly, as you suggest. Thank you!

Caitlin Haynes:

empower “them” how? This sentence comes across as a bit too generalized; and reads somewhat like our goal in the cultural heritage crowdsourcing community is to engage marginalized groups so they can “create” knowledge for us, rather than share their knowledge in an equitable exchange of information, learn together, collaboratively create, etc. I know what you’re getting at here, but I think some deeper exploration/fleshing out of this sentence / idea (as you do further on) would help.

Samantha Blickhan:

Just re-reading this comment, and am really grateful for your generous feedback. This is a valid critique and a really important point. We’ll be sure to dedicate some time in our editing process to reframe and include some more nuance here.

Caitlin Haynes:

I’m sure Meghan’s already included this somewhere :), but Smithsonian Transcription Center volunteers were able to discover (among many other accomplishments) the identities of many previously unacknowledged women botanists who contributed to Joseph Nelson Rose’s (and the Smithsonian’s) research, and created new Wikipedia articles for 19 of these women. (info here: https://www.atlasobscura.com/articles/how-the-smithsonian-is-crowdsourcing-history and in Effie Kapsalis’s article “Making History with Crowdsourcing,” in Collections: A Journal for Museum and Archives Professionals, Vol. 12 No. 2, Spring 2016.)

Caitlin Haynes:

We also often hear from Smithsonian Transcription Center volunteers that in-person volunteering requires a particular time commitment they are unable to meet or that the “digital space” allows them to contribute and engage in a way that is more comfortable than interacting with others in-person.

Alexandra Eveleigh:

More detailed, certainly, but is ‘complete’ really a (desirable) possibility?

Mia Ridge:

We’ll take a look at adding nuance to that, as it’s in a sentence alongside AI which tends to inflate expectations…

Vahur Puik:

I think the point of view/departure in the book seems to be very institutional. It’s mostly (at least I get the impression) about initiating crowdsourcing projects in house – the GLAM who owns collections/content is planning a campaign.
But there are many crowdsourcing initiatives (biggest of course being the Wikimedia movement) that stem from volunteers having open access to open digital heritage content. Our Ajapaik.ee is also such an initiative.

Vahur Puik:

Where’s the case study above?

Mia Ridge:

Good catch! It must have been moved below after this line was written.

Vahur Puik:

But “What is crowdsourcing in cultural heritage” is previous chapter?

Mia Ridge:

Only for our linear readers :) But also we meant ‘further’ as in ‘more’ rather than ‘further along’, if that makes sense…

Victoria Morris:

Should this include anti-discrimination more generally?

Mia Ridge:

Good point - this text reflects the time in which the text was written, but we should expand it

Victoria Morris:

Including the problem of concurrent access as well as conservation issues - we can’t all look at the same plant at the same time, but we can look at the same image of that plant.

Mia Ridge:

Thanks for teasing out what I meant by ‘physical access’ - we’ll tweak the text to reflect that

Victoria Morris:

And are able to tackle tasks at a scale that would not be feasible without the help of the crowd.

Mia Ridge:

Absolutely, and thanks - we’ll add something along those lines!

Siobhan Leachman:

I just love this diagram! I wish every crowdsourcing project proposal would go through this step by step process.

Mia Ridge:

Thank you so much for that ringing endorsement! It really helps to know what resonates and adds value.

Siobhan Leachman:

I so agree with these comments. I can remember how thrilled I was when Meghan Ferriter blogged about how I reached out to a third party organisation with data I’d found as a result of transcribing with the SI transcription centre. She encouraged and affirmed volunteer behaviour that I still practice to this day. https://web.archive.org/web/20180417100919/https://storify.com/meghaninmotion/charlotte-s-web

Mia Ridge:

Siobhan, that means so much coming from you. Thank you!

Siobhan Leachman:

With the COVID-19 epidemic and the resulting requirements that employees and in-house volunteers of organisations work from home, I would argue people inside organisations also participated in their organisations’ crowdsourcing projects. The “people outside an organisation” in this statement is possibly too narrow. As a volunteer on multiple projects I have also noticed that often employees of the organisation join in as volunteers in their own time. Although a project may intend to “engage people outside the organisation” it may not be accurate to describe all volunteers of a project as such.

I’ve just read on and realised you’ve dealt with this in “Crowdsourcing as a framework for internal tasks of your organization”.

Mia Ridge:

Thanks Siobhan - perhaps we can signpost that later discussion up here. Participating in crowdsourcing is an interesting continuation of in-person volunteering. Some of the most dedicated BL volunteers are former staff members who’ve continued to volunteer for years after retirement. I’ve also felt a little guilty when launching projects in the past as I could see productivity taking a brief hit in some sections of the institution as staff tried out new tasks.

I’m also delighted to see crowdsourcing methods used for internal participation, as the focus on a high quality user experience can be a pleasant change from working with ‘enterprise’ software.

Lauren Algee:

Is there a way to set off the end of the “Perspective” from the next section? A little confusing trying to figure out where it ends and general discussion resumes.

Mia Ridge:

Some of the styling from the supplied ePub didn’t carry over into the platform, which is a shame as in the original PDF version the sections were very visually distinct. It’s important that we find a way to redress that!

Mia Ridge:

Thanks Lauren (and for your other comments too).

Some formatting was lost when we transferred from one epub to another, but your comments are a good reminder of the importance of the styles that were lost and that we should prioritise resolving that issue somehow.

Lauren Algee:

The transition to this powerful quote is a little disorienting: I suggest introing with some context, such as “as the Lone Rock Stockade Project stated in its mission…”, so you don’t lose the reader and possibly the impact of the statement.

Kurt Luther:

Perhaps add a paragraph break here

Collective Wisdom Project:

A good suggestion - thanks Kurt!
