
2. What is crowdsourcing in cultural heritage?

Published on Apr 29, 2021

A crowdsourcing project can create joy, inspire curiosity, foster new understanding, encourage wider social connections, promote learning, and help people cultivate new skills and expertise. To help illustrate those possibilities, this chapter offers an introduction that lays the groundwork for other chapters. We define and discuss where crowdsourcing in cultural heritage sits in relation to other fields. We provide an overview of key concepts, including types of crowdsourcing projects, and some of the complexities you will encounter.

Any account of this large and growing field is necessarily broad. It encompasses a wide range of human knowledge and understandings. To reflect that diversity, we strive to be capacious in our definitions, discussion, and examples, although we have largely left aside the kinds of crowdsourcing projects in the sciences and for-profit sectors. We expand our discussions, when possible, with case studies and examples that might diversify, complicate, or add extra nuances to the larger narratives in this book.

Each crowdsourcing project is driven by a larger purpose. Those purposes can be as diverse as the people who build crowdsourcing projects. We might start with a research question, for example, as when the Zooniverse crowdsourcing platform began with a single project asking people to help classify distant galaxies.1 Along with collectively driven inquiries, many crowdsourcing projects seek to expand access to a given collection by enlisting people’s help with transcribing, tagging, or other tasks. Converting scans into searchable text alone can make items in large collections much easier to find for specialists and casual visitors alike. Crowdsourcing can open up new possibilities for research that could not have existed otherwise, fundamentally changing authoritative practices, definitions, and classifications.

Sometimes, when we talk about a crowdsourcing endeavor, colleagues or friends might ask “aren’t you just asking for free labor?” That is a common enough question, usually coming from people who have not had a chance to appreciate the sense of shared purpose and community that a crowdsourcing project can inspire.2

Crowdsourcing projects can be a lot more than just augmenting a digitized collection of materials. Crowdsourcing can be transformative. While plenty of sites on the Web 2.0 (or “Social Web”) invite users to contribute content, crowdsourcing sites differ in that there may be even greater value to be had in undertaking the task, rather than restricting value to the content produced. The process of starting and running a project can present the opportunity to invite people to contribute to the store of knowledge flowing through your organization or institution. Seeing crowdsourcing projects as more than the sum of their tasks or end products allows us to imagine new horizons for our institutions, organizations, and affiliated communities. At its best, crowdsourcing increases public access to cultural heritage collections while inviting the public to engage deeply with these materials.

We learn these lessons from the participants who join our projects. We hear that crowdsourcing projects provide valuable outlets for all kinds of interests and investments. A given project might provide opportunities to volunteer without leaving the house, or provide a way to pass the time while riding on public transit. Others might help a marginalized community reconnect with its scattered or previously inaccessible cultural materials. For stewards of cultural heritage collections, a crowdsourcing project can provide further benefits. When an expanding group of people immerse themselves in a digitized collection, the process can lead to new discoveries and fresh perspectives. Participants can teach us a great deal about our collections, especially in the passion and enthusiasm they bring to so many different kinds of materials.

Even so, we know that crowdsourcing projects do not start with a blank slate. Our institutions and organizations inherit the historical legacies of empire, slavery, prejudice, and other inequalities. Those inheritances determine what has been preserved and who gets represented in our collections. Such legacies, of course, shape the digital lives of our collections. Crowdsourcing projects are far from a cure-all. Digital exclusion persists around the world. Deeply set social structures do not change in a day, nor through a single project.

Yet clear commitments to certain values can help create spaces for more diverse voices. The British Library’s In the Spotlight project sought to offer spaces for engagement. The Colored Conventions Project’s principles invite people to reflect critically while browsing the recovered histories of Black lives and culture.3 Others, such as Zooniverse, ask project creators to agree to provide open access to the results of their crowdsourcing projects.4 As we discuss here and elsewhere in this book, these intentional approaches to centering certain values in our projects are invaluable. Such approaches do not just provide spaces for dialog — they make for stronger projects that can establish deep and lasting relationships with our communities.

Evolving ideas about crowdsourcing

Defining crowdsourcing in cultural heritage

Defining crowdsourcing in cultural heritage is an ongoing task. When applying for funding to write this book, Ridge, Ferriter, and Blickhan defined crowdsourcing in cultural heritage as “a form of digitally-enabled participation that promises deeper, more engaged relationships with the public via meaningful tasks with cultural heritage collections,” and said that, “at its best, digitally-enabled participation creates meaningful opportunities for the public to experience collections while undertaking a range of tasks that make those collections more easily discoverable by others.” Broadly speaking, crowdsourcing tasks may involve some kind of work on an item presented via a digital interface, such as transcribing text or audio, describing it through tags or longer texts, or collecting items for inclusion in a project.

In the quest for less unwieldy terms, acronyms such as GLAM (galleries, libraries, archives, and museums) and LAM (libraries, archives, and museums) may be used. Cultural heritage organizations may also be referred to as “memory organizations.”

Crowdsourcing has helped to provide a framework for online participation with, and around, cultural heritage collections for over a decade. It will continue to change, particularly as technologies such as Artificial Intelligence (AI) and Machine Learning are more closely integrated within systems that combine the work of humans and software into human-computation infrastructures.

Tensions and complexities in definitions

The term “crowdsourcing” can be complex. On one hand, “crowdsourcing” is a commonly used term. It is legible to a wide range of people, including across disciplines that may approach or define this area of work in different fashions. On the other hand, it has limits. A “crowdsourcing” project might be tremendously successful with only a handful of participants — hardly a crowd. The term also has roots in the word “outsourcing,” a word to describe an often-exploitive practice that takes advantage of cheap labor. We expect that debates will continue to seek suitable alternatives. In the meantime, it helps to refer to digitally-enabled participatory projects with the relatively more succinct term of “crowdsourcing.”

Projects in Europe and the sciences often use related terms such as “citizen science,” “citizen history,” or “citizen research.” One organization with roots in the natural sciences called the Citizen Science Association defines “citizen science” as “the involvement of the public in scientific research, whether community-driven research or global investigations.”5 When citizen science is used in the European context, “science” includes a wider range of disciplines than the STEM-linked term in English, but it can be read as excluding humanities and cultural heritage projects. We also recognize that “citizen” and “citizenship” are not neutral terms and can exclude or discourage participation from migrant and precarious communities.6 In a related manner, the word “heritage” can carry unproductive connotations in certain contexts, such as in the United States where it is a term used by white supremacist and nationalist groups, a more malevolent usage than the relatively more common and generally understood meaning of “cultural heritage” in the United Kingdom and Europe.

Other related terms include digital or online volunteering, online collaboration, and digitally-enabled participation. Niche-sourcing recognizes that many projects have a narrow appeal or may require specialist skills. Microtasking refers to dividing up complex tasks into smaller (“micro”) tasks suitable for crowdsourcing. Ultimately, we want to recognize the importance of intentionally choosing your project terminology and, crucially, recognizing where these choices can result in barriers for your project participants, whether intended or not.7

Key moments in the history of crowdsourcing in cultural heritage

While crowdsourcing is often framed as a novelty, its methods and sensibilities pre-date the advent of the Internet. Some people argue that crowdsourcing was born in the mid-2000s when the web made digitally-enabled participation more widely available, following Jeff Howe and Mark Robinson’s coining of the term in a 2006 Wired magazine article.8 For example, Brabham writes, “Although crowdsourcing rests on long-standing problem-solving and collaboration concepts, it is a new phenomenon that relies on the technology of the Internet.”9

In contrast, Ridge has argued that crowdsourcing in cultural heritage was transformed by networked digital technologies, but not created by them.10 As a field, it draws on antecedents from a range of different disciplines in which not-for-profit projects have asked people to collect and compile information and objects. These projects have their roots in the nineteenth century, if not earlier.11 Oft-cited examples include the Oxford English Dictionary’s mid-19th-century appeal to the reading public to find instances of specific words in early texts.12 This approach was so successful that appeals continue to this day on social media. Other early models include the Green Book, a guide compiled in the 1930s-40s by and for Black people traveling in the United States,13 arguably a proto-crowdsourcing wiki-style information-gathering effort, and the “co-operative indexing” of 19th-century censuses in the 1980s.14

Versions of crowdsourcing continue to multiply. The field of digital humanities has driven an increasing diversity of projects, as Melissa Terras has described.15 Local historical societies and small museums have devised their own approaches, such as holding gatherings to transcribe records held in local record offices.16 These examples serve as a reminder that, while crowdsourcing is often associated with digital technologies, it retains strong ties to in-person volunteering.

The growth of these varieties of crowdsourcing has advanced amid the shift of the wider Internet to a Web 2.0 era, characterized by websites hosting user-generated content. Along with the more widely known social media platforms, influential milestones in the advance of crowdsourcing include the experiments on “social tagging” through steve.museum17 and the launch of Flickr Commons with the Library of Congress in 2008.18 In the past decade, an increasing number of museums have explored the potential of co-curating exhibitions through public selections of artworks,19 asking the public to share their knowledge on web-based wikis,20 or inviting the public to contribute metadata to help enrich and remix online collections.21

What is unique about cultural heritage crowdsourcing?

Factors that make cultural heritage crowdsourcing projects different from other forms of crowdsourcing include the purposes behind their creation, motivations for participation, project content and data output, as well as theories of cultural heritage crowdsourcing. Many of these topics will be illustrated with examples in other chapters.

A broad overview of crowdsourcing project types

Crowdsourcing projects can be categorized in different ways.22 Other chapters of this book (including the “Aligning tasks, platforms, and goals” and “Choosing tasks and workflows” chapters) will approach task types in greater detail. This section will provide an overview and several options for how you might group task types together.

One approach is to focus on the activities performed by the contributors and look at the type of tasks performed. Other options include starting from the type of data processed (e.g., text vs images), size of the project (small, volunteer-led project vs large, institution-supported initiative), or the aim of the project (improve the accessibility of collections, generate or process research data, re-balance focus of collections).

Projects will typically be based around a particular set of data, and the tasks will relate to processing this in one way or another. Projects may also be focused on generating data, for example by collecting existing resources (images, stories) or creating new material (text, pictures, metadata). Many projects feature a range of tasks, perhaps moving from one to another as the project progresses, and many tasks involve more than one process, which could be performed simultaneously or in sequence, by one person or more than one. These combinations of processes, tasks, and people are often called crowdsourcing workflows.

Even though it may be difficult to classify projects by type of task alone, it can nevertheless be useful to consider the main data-centered processes employed in crowdsourcing projects. Common examples include:

Correcting/improving digital material — for example, proofreading texts that have been manually or automatically transcribed (OCR), or cropping pictures to remove blank margins, e.g., the New York Public Library’s Building Inspector.23

Improving the discoverability of material — for example by transcribing handwritten text or adding or enhancing information about the material by adding or correcting metadata, e.g., the Adler Planetarium’s Tag Along with Adler.24

Finding information about a source — for example, identifying an object in a picture, classifying a text, finding a proper name, e.g., the Conway Library’s World Architecture Unlocked.25

Generating data/material/collection — for example, by creating new material or collecting and bringing together existing resources, e.g., Wikipedia26 or The Great War Archive.27

What all the above processes have in common is that they rely heavily on the abilities of humans to identify and process information, for example, to quickly identify items in a picture, or read and understand the content of a text. Broadly described in this book as a spectrum of collecting-analyzing-reviewing, the individual contribution could be summarized28 as:

“type what you see” — transcription tasks, typing out handwritten text, e.g., the Library of Congress’ By the People project.29

“describe what you see” — adding tags that describe an image, e.g., Flickr Commons.30

“share what you know” — adding factual information based on your knowledge or research, e.g., Wikipedia.31

“share what you have” — uploading a picture or memory to a collection, e.g., Ford’s Theatre’s Remembering Lincoln.32 33

“validate other inputs” — for example, checking and correcting text that has been transcribed, e.g., OCR correction in Trove.34

Tasks can also be grouped by size. Ridge suggests that microtasks are small, rapid, self-contained tasks that can be completed in one or a few clicks, such as classifying items by a small set of categories, identifying pictures that contain a particular object, or adding tags to an image.35 Macrotasks are more complex and may involve more than one action, such as identifying a specific item on a page, marking, classifying, and transcribing it. As such, they will take longer to complete. In addition to micro- and macrotasks, projects can also include other activities, such as supporting contributors, taking part in analysis and evaluation, or developing and maintaining platform functionality. Such metatasks are not necessarily tasks that can be easily defined as having a particular size or scope but are important for the successful running of the project.

Summary

This chapter provided an overview of crowdsourcing in cultural heritage contexts — what it is, how it came to be, broad types of projects and tasks, and some of the complexities you will encounter.

The following chapters build on this to expand on why you might work with crowdsourcing in cultural heritage, how values related to the missions and motivations of cultural heritage organizations and contributors can or should inform your projects, who participants are and what motivates them to contribute, and provide practical advice on managing and running projects (tasks, platforms, project management, data, evaluation).

Comments
Mia Ridge: We know that somehow footnotes in this one chapter got screwed up in copying from the source to the online ePub. We’ll fix them when we update the text in response to comments. Computers are great, except when they aren’t…
Johan Oomen: Perhaps consider to broaden, also adding more subjective viewpoints to a collection object. Thus giving space for polyvocality.
Johan Oomen: Correcting audio transcripts: http://fixitplus.americanarchive.org
Johan Oomen: And Waisda for moving images https://www.taylorfrancis.com/chapters/edit/10.4324/9781315575162-15/waisda-making-videos-findable-crowdsourced-annotations
Johan Oomen: Please also add the Europeana collection days. They are awesome.
Johan Oomen: Awesome
Johan Oomen: yes, very good definition. Perhaps add that through crowdsourcing, the polyvocal and complex nature of collections can be explored. By allowing space for multiple perspectives to be added by users.
Johan Oomen: You might want to refer to cultural-ai.nl disclosure: I’m part of the research team.
Mia Ridge: Thanks Johan!
Johan Oomen: You can argue that crowdsourcing results in a more equitable relation between ‘organization’ and ‘users’. So a way to engage more deeply with an organisation. By experts that are not professionals working for an institution, but share their time and energy because they have an intrinsic motivation to do so. In other words, we could be adding a new meaning to the word ‘your’; more based on principles such as ‘commons’, and equality.
Mia Ridge: I love that!
Victoria Passau: A discussion on recolonising or othering the subjects of potential projects needs to be considered. Cultural Safety is not something that I have been able to find within this text (but I might have missed something). The biggest challenge for me is that this publication was written from a Western / Northern Hemisphere perspective and doesn’t include the nuances that are experienced by Indigenous peoples or those from the Global South. Happy to be proved wrong.
Mia Ridge: Victoria, thanks for your comment. The funding to write the book was a US-UK networking grant, so our participants reflect that. I’ll pass your comment onto the team to ask if there are any parts of the text they’d like to highlight in response. Our closing workshop has no limits on international attendance, and this would be a good topic to address there, especially as organisations are so much further advanced with anti-racism and de/recolonising work since even the start of the year.
Jim Hayes: footnote 4 goes to https://en.wikipedia.org/wiki/Category:User_warning_templates maybe https://blog.zooniverse.org/2013/08/02/many-zooniverse-papers-now-open-access/ would be better
Caitlin Haynes: I know this is where you’re mainly introducing this idea that you explore further in later parts of the book, but I wonder if it would be useful to also cite an example of a specific project (or even an aspect of one of these projects) that actually incorporated the ideas and feedback of community participants / volunteers - beyond reflection and discussion. Have any crowdsourcing programs pivoted or revised aspects of their projects based directly on user input? Not saying I have the perfect example in mind - we’ve done some of this in the Smithsonian Transcription Center - but not as well as I’d like in terms of including here; but I just wanted to throw it out there! How can crowdsourcing volunteers / communities actually play a role in developing and shaping the project itself?
Jez Nicholson: is there any research into the expectations and assumptions of participants in crowdsourcing? Personally, I assume that the fruit of my efforts would be released fully as Open Data and won’t be deliberately or accidentally (because the licence is tainted by use of maps, etc.) restricted. Other participants may not know to ask…or maybe they don’t mind?
Mia Ridge: It’s definitely something that some people look for, and we’ve included a comment from our volunteers survey along those lines at https://britishlibrary.pubpub.org/pub/understanding-and-connecting-to-participant-motivations/release/1#nkkt6sw478b
Vahur Puik: One distinction that I have made is about what kind of knowledge/competence is asked for from the participants. Whether we are talking about a generic knowledge (or maybe ‘skill’ is better term here) like being able to read handwriting (that can improve actually) or specific knowledge (‘information’) that the participants have and are asked to share. The important difference is that when we are dealing with skill-based tasks like transcription (or some basic categorisation) the limitation to participation is how much time the user can contribute whereas in the case of knowledge-based tasks we need more users with their specific knowledge to contribute (for instance geotagging images requires a user to recognize the place depicted and if there is no textual metadata it is the only way of identifying the location on the picture) and only will to contribute is not enough to successfully participate.
Mia Ridge: Vahur, thank you! That’s a really good summary of the logic underlying that grouping. I’ll look at incorporating it and credit you in the paragraph.
Alexandra Eveleigh: Do you mean ethical values? What are these values (openness seems to be one of them - are there others?) and are they universally applicable to all crowdsourcing initiatives?
Alexandra Eveleigh: …and shapes the opportunities (or lack thereof) to participate e.g. issues of connectivity in certain parts of the global south. I guess you go on to acknowledge this in the sentence starting ‘Digital exclusion…’ but this short paragraph didn’t really hang together and felt slightly tokenistic to me. And then the following one sounded abrupt: ‘but never mind, as long as you hold certain values (undefined) it’ll all be fine’. Appreciate this is very tricky to navigate successfully and succinctly though! Maybe explicitly acknowledging the biases and any gaps in experience of the co-authors here might help in some way?
Mia Ridge: Thanks for that really thoughtful comment. We’ll take a close look at this when re-editing and think about how to address the issues you’ve raised.
Victoria Morris: Linking? E.g. geo-referencing; matching to WikiData.
Mia Ridge: We could make that more explicit in ‘finding information’ as identifications and classifications would often come with links
Victoria Morris: Actually there are 17th-century examples. James Jurin & the Royal Society used crowd-sourcing to gather meteorological data from across the UK (and abroad, I think). See https://publishing.cdlib.org/ucpressebooks/view?docId=ft6d5nb455;chunk.id=d0e6189;doc.view=print
Mia Ridge: Awesome, thank you! There’s also the longitude rewards https://en.wikipedia.org/wiki/Longitude_rewards which might even put things back to the 16th century
Victoria Morris: The form of the verb “immerse” is incorrect. Should be “When an expanding group of people immerse themselves”, or “When an expanding group of people immerses itself”.
Mia Ridge: Thanks for spotting that!
Victoria Morris: Crowdsourcing can also be a learning opportunity for the crowd, e.g. identifying flowers in images as a way to improve one’s flower identification.
Mia Ridge: Excellent point - we’ll make sure that’s highlighted as it’s something we all agree on.
Victoria Morris: You don’t seem to have used Oxford commas elsewhere.
Mia Ridge: Good catch, thank you!
Siobhan Leachman: I’m wondering how Wikidata and other Linked Open Data crowdsourcing projects would be classified? Would those projects fall within either the “Generating data/material/collection” or the “share what you know” or would there be a separate type of “link what you know” project because of the type of outcome that results.
Mia Ridge: Personally I think of links as a form of description e.g. a Wikidata or ISNI link to Beethoven identifies and thereby describes the creator of a work. In this framework I was thinking about how the intellectual work relates to the item in front of you, rather than how that work is represented. The problem with simplification in an introduction is that it simplifies!
Siobhan Leachman: I believe this is a key mindset for those designing and running crowdsourcing projects. If all that is aimed for is a narrowly defined measure of success eg all the specimen labels transcribed as written, projects, volunteers and organisers can miss out on a whole range of rich experiences and data that can result from volunteers being encouraged to explore & engage more deeply with the project content.
Mia Ridge: Thank you for articulating that!
Lauren Algee: This appears to be the incorrect link (https://en.wikipedia.org/wiki/Wikipedia:Administration) - as noted above, something weird going on with all citations in this chapter.
Mia Ridge: Yes, for some reason they are weird in this chapter and apparently no other.
Lauren Algee: Love the illustration below!
Lauren Algee: a small thing, but wondering if this should be digital rather than digitized as not all CH crowdsourcing projects are for surrogates, for example crowdsourcing metadata or markup for a born-digital document. Digitized is also used once below.
Mia Ridge: True - I think we were trying not to over-think it, but generalisations aren’t always helpful
Christina Crowder: Um…. The link here goes to Wikimedia User Warning templates. I was hoping to get a link to the Zooniverse so I could see a sample of their contributor content agreement (or project creator agreement I guess is what you’re really describing).
Mia Ridge: Thanks Christina - we’ve discovered that the footnotes for this chapter are somehow out of whack but we haven’t had time to update the epub selectively
Mia Ridge: Here we could cite ‘From Crowdsourcing to Nichesourcing – The National Library of Finland Bulletin 2016’. Accessed 29 May 2021. https://blogs.helsinki.fi/natlibfi-bulletin/?page_id=206 for their explanation of nichesourcing for ‘complex tasks with high-quality product expectations’ and specific language skills.
Ben Brumfield: Is there a chance that these refer to the footnotes on a totally different chapter, in addition to the wrong number? This marker (9) pops up the footnote body for 8, but that body (a link to the Colored Conventions project) also doesn’t seem to apply to the text preceding footnote 8 (Howe’s Wired article, presumably)
Mia Ridge: We’ll need to check them against that particular epub version to see if it’s wrong in the original too. It’s hard to see how copying and pasting chapter text wholesale could have done that but I don’t know anything about how the epub versions are made.
Ben Brumfield: All the footnotes in this section seem scrambled. For example, this footnote marker 1 links to the body of footnote 10. I’m not sure what’s going on, but it seems to apply across the whole chapter.
Mia Ridge: I thought I’d replied to this, sorry! For some reason footnotes have gone awry in this chapter and apparently only in this chapter.
Jim Hayes: i like the discussion about heritage, but i might rephrase “In a related manner, the word “heritage” can carry unproductive connotations in certain contexts,” as “heritage is a problematic term, that is more about veneration of the past, used for both good and ill. we need to adopt methodical historical study, that embodies our values.” see also The Heritage Crusade and the Spoils of History and https://www.open.edu/openlearn/history-the-arts/history/what-heritage/content-section-2.1
Mia Ridge: Thanks Jim! I’ll ask the folk who focused on that section to take a look.
Chris Lintott: This tension is neatly encapsulated in the original definition of crowdsourcing, which was ‘the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.’ - ie it explicitly included the idea that the task involved was something that would need to be done anyway. Obviously your definition is broader, but worth pointing out?
Chris Lintott: Never mind, I read on…