Through the project design process, you may have developed some concrete ideas about your project’s scope and goals. Moving to the task design stage, those ideas will need to become even more concrete. During task design, the goal is to transform more abstract ideas about what needs to be done, and who is doing what, into specific user interface designs and workflows.
In crowdsourcing, it can be tempting to view participants as an undifferentiated crowd, but this temptation should be resisted. Individual participants have diverse expertise and motivations for participation. Furthermore, the crowd is not static: interests and abilities change over the life of a project, sometimes as a direct result of participation in it. Recognizing and leveraging this diversity during task design, and meeting participants where they are, is not only a more ethical approach but can also yield smoother workflows and higher productivity across the project in the long run. In this chapter, we guide you through designing effective tasks, with an emphasis on finding a match between your participants, tasks, data, and goals.
Before getting too deep into the weeds, here are a few key questions to consider:
With what source material and data will your participants be working? The format of your source material and its associated data — photos, scanned documents, digitized text, audio clips, etc. — can dramatically impact task design. A system for transcribing scanned documents will not adapt easily for transcribing audio clips. Connecting your source material considerations with the affordances of the platform or tools you would like to adopt will allow you to better determine which potential tasks can be accomplished. The sooner you can finalize the media formats which are in and out of scope, the faster you can make key decisions for task design.
What domain knowledge will your participants have? Some projects are geared towards a very broad audience with no expectation that participants have specific domain knowledge. Others assume that participants will be scholars, enthusiasts, or others who already have substantial knowledge of the domain. If the barrier to entry is low, you will be able to draw from a larger pool of potential participants. From a task design perspective, participants with little knowledge or more casual motivations also mean more work for the project team on usability: for example, generating workflows that separate complex tasks into simpler ones, creating appropriate tutorials or training materials, and monitoring quality through reviewing and other mechanisms.
What technical skills and resources do you have access to? Some projects rely heavily or exclusively on existing software and infrastructure for crowdsourcing, while others substantially modify existing technological resources or even create entirely new platforms if what they need does not already exist. Some platforms allow project teams to design tasks and workflows without any programming or technical knowledge but you may find they are limited to certain (albeit popular) task types. Some crowdsourcing software is free to use, but still requires funding and IT skill to host online. Even with ready-to-use platforms, converting crowd contributions into usable formats may require unanticipated technical skills. Substantially modifying or creating new platforms is an order of magnitude more complex, and typically requires some team members to have extensive skill and experience in software development.
What should the data look like in the end? As we will discuss below, a common mistake in task design is to think only about how the data or tasks should be divided up for participants, but fail to think enough about how the completed tasks will be recombined at the end. What you want the end product to look like heavily influences the design of the individual tasks and can make recombination much easier or harder.
The above considerations should provide you with some initial ideas about the data sources you are working with, the technical and domain knowledge of your project staff and your participants, and how the completed work will be shared at the end. These ideas can help you move towards more specific task designs, by making decisions about task types, task size, and task assignment.
Participation in crowdsourcing projects is often characterized by the Pareto principle (80-20 rule), where a small number of contributors generate a large proportion of the contributions. The research is mixed on whether these power users are born or made, i.e., if they tend to enter the projects with the preexisting goal of substantial participation, or if they develop that goal over time.1 Nevertheless, some portion of any crowd will likely seek to become highly involved, and consequently, their skill level and domain knowledge will increase over time. An effective project design will allow these participants to change the tasks they work on over time, either to challenge themselves and take advantage of learning opportunities or simply to try something new and different. Beyond retaining these highly-productive participants, it is beneficial to provide the alternatives below even for less involved participants to give them opportunities to become more central to the community.
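To check how concentrated participation is in your own project, you can compute the share of contributions made by the most active participants. The following is a minimal sketch; the function name `top_share` and the contribution counts are hypothetical, chosen only for illustration.

```python
def top_share(contribution_counts, top_fraction=0.2):
    """Return the fraction of all contributions made by the top `top_fraction` of contributors."""
    if not contribution_counts:
        return 0.0
    ranked = sorted(contribution_counts, reverse=True)
    # At least one contributor is always counted in the top group.
    top_n = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Hypothetical per-participant contribution counts for ten participants.
counts = [500, 300, 60, 40, 30, 25, 20, 15, 5, 5]
share = top_share(counts)
print(f"Top 20% of participants made {share:.0%} of contributions")
```

With these invented numbers, the two most active participants account for 800 of 1,000 contributions. Real projects may be more or less skewed than this.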
More complex crowdsourcing projects often have multiple types of tasks that need to be done. However, you may not want to provide all of these different options to participants at the outset. A more limited set of options can avoid overwhelming newcomers with a paradox of choice.2 This decision may allow them to build skills and knowledge that can better prepare them for more complex tasks later on. Csikszentmihalyi proposed the idea of the flow state, an ideal balance between challenge and reward when people often feel fully immersed and “in the zone.”3 The quintessential example of flow is a rock climber whose next handhold is just barely within reach, neither too close (which creates boredom) nor too far (which is frustrating). The concept of flow has been widely adopted in video games (e.g., via “leveling up”),4 and to a lesser extent, crowdsourcing, to keep players and contributors engaged. For example, the original Old Weather project — in which participants transcribed weather data from ships’ logbooks — allowed participants to be promoted up to the rank of Captain by inputting more data, gamifying the experience.5
Similar ideas have been applied to task design in crowdsourced cultural heritage projects, where participants build skill and familiarity with a narrow set of initial, simpler tasks, and then unlock more challenging or complex tasks over time. For example, the New York Times’ Madison project sought to crowdsource the identification, categorization, and transcription of advertisements in its newspaper archives.6 Initially, new participants were allowed only to select areas of a given newspaper page containing an ad. After completing a certain number of these tasks, the participant was allowed to classify the selected areas into different categories of ads. Finally, after a certain number of completed classification tasks, the participant was permitted to transcribe the ad — presumably the most complex of the three task types.
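A progression of this kind can be expressed as a simple rule over a participant's completed task counts. The sketch below is not the Madison project's implementation. The task names follow the description above, but the thresholds of ten tasks and the names `PATHWAY` and `unlocked_tasks` are assumptions made for illustration.

```python
# Hypothetical thresholds: number of completed tasks of the previous type
# required before the next task type unlocks.
PATHWAY = [
    ("select", 0),       # available to every newcomer
    ("classify", 10),    # unlocks after 10 completed "select" tasks
    ("transcribe", 10),  # unlocks after 10 completed "classify" tasks
]


def unlocked_tasks(completed):
    """Return the task types a participant may work on.

    `completed` maps task type -> number of tasks finished so far.
    """
    unlocked = []
    previous = None
    for task_type, required in PATHWAY:
        if previous is not None and completed.get(previous, 0) < required:
            break  # later steps stay locked until this one is earned
        unlocked.append(task_type)
        previous = task_type
    return unlocked


print(unlocked_tasks({"select": 12, "classify": 4}))
```

Keeping the pathway in one data structure makes it easier to show the same steps to participants as a visible learning pathway.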
These different task types, corresponding to skill or experience levels within a project, reflect a skill hierarchy or learning pathway. It is helpful to visualize or otherwise communicate these pathways to participants to help them set goals and expectations.
Roles are related to activities and permissions. Participants may be assigned different roles which correspond to different task designs or types. For example, a common role within a crowdsourcing project is that of a reviewer. After completing a certain number of regular tasks, the participant has established a track record and is promoted to “reviewer,” giving them access to tasks where they review, approve, or leave feedback on other contributors’ work. Other roles within a project may not be hierarchical, and simply reflect different types of work that need to be done. These roles could include linking subjects identified in texts, annotating documents, and responding to questions from other volunteers.
Many crowdsourcing projects use a hub-and-spoke model where the hub is the dataset and the spokes are participants. All of the participants interact with the dataset, but they rarely, if ever, interact with one another, except perhaps indirectly via reviewing. This model has the benefit of simplicity and can work well for many projects. However, interaction and collaboration between and among participants can also bring numerous benefits to your cultural heritage project.
Learning and accessibility. Some participants may not yet have the right skills, knowledge, or confidence to complete tasks on their own, or the task design might pose accessibility barriers that prevent some participants from fully taking part. In these cases, participants can work together to close the gaps.
Intentional redundancy. Despite the best efforts of both project teams and participants, not all contributions will be high quality (for more information, see the Quality Control section of the “Working with crowdsourced data” chapter). Participants can interact directly or indirectly with one another to review each other’s work, provide feedback, and make improvements.
Diverse knowledge and experiences. One of the greatest benefits of leveraging crowdsourcing for cultural heritage is the diversity of knowledge and experience that a large crowd of participants brings to your project. Allowing your crowd to interact and collaborate amplifies these strengths by drawing on complementary knowledge areas and perspectives.
Inviting deeper reflection. The act of discussing or reviewing project content in a collaborative setting can be as valuable for your participants as it is for your project. These interactions create opportunities for internal development, including deeper reflection, critical thinking, and empathizing to understand new perspectives.
Interaction between participants can take many forms. In addition to indirect interactions through reviewing or adding to one another’s work, participants might also interact more directly by pairing up to work on tasks, or by initiating or participating in a discussion about a tricky piece of content. To enable these types of interactions, the task design requires a mechanism for pairing up contributors inside or outside the platform, or dedicated conversation space for each item (e.g., the “Talk” page on Wikipedia, the “Talk” message boards on Zooniverse projects, or the “Notes” section on FromThePage).
In the FromThePage transcription platform, we try to follow the wiki model of transparency and discussion. Each participant can see the most recent edits on a project, as well as any notes which have been left — the notes being foregrounded to draw attention. Notes by participants or project team members are often discussions about the task and are on the transcription screen itself, so that participants have the context of the discussion. This provides a venue for productive collaboration and mentorship if done well.
As described above, participants may convene outside the platform. In some cases, going where people are, e.g., social networking platforms, may help gather participants and facilitate sharing of information about tasks and project goals.
Starting in 2014, the Smithsonian Transcription Center account (@TranscribeSI) began sharing views into tasks and goals on the social networking platform Twitter. Initially a means to showcase the project, it soon became a way to gather people together in a lightweight and asynchronous way. Convening people in this social network made sense for the project for a few reasons including cross-promotion with other Smithsonian Institution accounts and being where potential participants already sought and shared information. Using hashtags like #volunpeers or #WeAllLearnTogether or campaign-based #7DayRevChall and #FWTrueLove leveraged the affordances of the social network.7 Integrating hashtags in tweets created uniquely identifiable and searchable context, and made hashtags and tweets data, too. Hashtags were used 1) by the project organizers to focus attention on tasks that needed to be accomplished, new project releases, and updates and 2) by volunteers to perform metatasks such as sharing discoveries and coordinating with one another to take action around TC tasks during campaigns and challenges.
Crowdsourcing task types are theoretically only limited by your imagination: any type of work that can be completed online can potentially be crowdsourced. In practice, however, some crowdsourcing task types such as transcribing handwritten documents are much more common than others. Crowdsourcing tasks for cultural heritage can be roughly grouped into three types: collecting, analyzing, and reviewing. Depending on the scope of your project, you might have just one of these types, all three, or even multiple tasks within a type.
Crowdsourcing tasks are enacted upon different types of data. Some of the most common data formats for crowdsourcing tasks are documents or text, photos, videos, and audio. Primary source documents are typically stored as high-resolution photos created by cameras or (ideally) scanners, while the transcriptions are stored as text. Likewise, photos are typically captured and stored as digitized image files. It is relatively uncommon for participants to directly interact with video files. A notable exception is crowdsourced captioning projects seeking to make video recordings more searchable and accessible.
Collecting refers to tasks that involve participants collecting data or generating content. You might ask participants to add digitized documents from their own collection to your project archive, or otherwise gather material from elsewhere for consolidation in your project. For example, the Library of Virginia’s James I. Robertson, Jr. Civil War Sesquicentennial Legacy Collection was a free digitization effort between 2011 and 2015 to encourage Virginia residents to share items from their personal collections related to the Civil War era with the Commonwealth.8 Other humanities-based collecting examples include the Letters of 1916 project,9 which features letters from families as well as public institutions, and History Unfolded,10 which asks participants to collect and submit relevant articles from their local newspapers. The related field of Citizen Science also assigns collecting tasks to participants. See, for example, the National Museum of Natural History’s eMammal camera trapping project11 and the iNaturalist wildlife identification project from the California Academy of Sciences and the National Geographic Society.12
Beyond digitizing or sharing existing content, collecting tasks may also involve asking your participants to generate new, original content. Many oral history projects, such as the VT Stories project at Virginia Tech, encourage participants to create new recordings of their lives and experiences and share them with the project.13 The Great War Archive and projects following in its footsteps asked members of the public to share digital copies of objects from World War I together with their memories or stories relating to the people involved.14
In either case, participants are typically provided with some user interface that allows them to import their content into the crowdsourcing platform, such as an upload feature to directly upload the content to the server, a text box where they can directly paste textual content or provide a link (URL) to the content already hosted online elsewhere, or a recording feature where they can use their computer or smartphone’s webcam or microphone to directly capture the new content.
Analyzing refers to tasks where participants are asked to analyze or interpret content. The content may have been added to the project during an earlier collecting task, imported via automated techniques, or it might have already been seeded by project team members through partnerships with GLAM institutions. The range of potential types of analysis that crowdsourcing can provide is almost limitless but also depends on the interest, expertise, and background of the participants you plan to recruit. Common analyzing tasks include transcription, translation, transliteration, and annotation, all of which make it easier for users to search for and find material, as well as create meaningful links to related content.
One of the most common analyzing tasks is the transcription of primary source materials, including handwritten documents or audio recordings. For example, in document transcription, participants typically view a scanned (digitized) copy of the document on one side of the screen and type the letters and words they can read on the other. Transcription focuses on transforming visual or audio information into text, typically with a minimum of interpretation, so it may seem strange to call it analysis. However, while transcription can be an ideal task for crowds without much domain knowledge, it can be a mistake to assume such knowledge is not helpful or even necessary in some cases. Familiarity with cursive or period handwriting styles is valuable, and for ambiguous phrases, context becomes critical. The more a participant knows about the document, writer, culture, and period, the more likely they will be able to sort it out correctly. As projects such as Europeana’s Transcribathon15 demonstrate, it is not always necessary to speak the language being transcribed. Participants can engage with source material in a variety of languages, but lack of linguistic familiarity may result in the need for a robust quality control method.
Indexing is a special case of document transcription which is especially popular for government records and documents of genealogical interest. Here names and other entities are transcribed into structured fields. Audio transcription benefits from providing participants with a user interface that enables easily replaying short sections of a clip.
Some transcription platforms support work by multiple participants on the same document, allowing them to discuss difficult passages or interesting content.16 Many systems save each version of a transcript, allowing users to easily visualize differences between versions or revert to a particular point. Some of the more powerful transcription tools also allow manipulation of the item by rotating or changing the brightness of documents, or changing the speed or in- and out-points of audio clips.
As with other analyzing tasks, transcription can be enhanced by automated approaches. Optical Character Recognition (OCR) technology, which generates an automatic text transcription from images, has greatly improved in recent years, especially for printed and English text. Many cloud-based AI services provide standalone OCR services or integrate them with their computer vision services. Recent advances promise similar functionality for cursive text; for example, the Transkribus platform allows users to train AI models to automatically transcribe documents with handwriting or unusual typefaces. While automated processes have come a long way, they still often produce text with more errors than human transcription. As a result, some crowdsourcing platforms support OCR correction, in which participants edit raw text produced by OCR or handwritten text recognition (HTR)17 systems. Rose Holley’s documentation of the National Library of Australia’s Australian Newspapers Digitisation Program, including a well-designed text correction task for OCR transcription launched in 2008, was hugely influential in solidifying OCR correction as a crowdsourcing task (and crowdsourcing as a method suitable for libraries).18
Translation projects are less common now, in part because machine translation has improved. Text to be translated might be on collection items, or the task might involve translating parts of the software interface to support multilingual access. One example of translating historical collections is the Suda On Line project, launched in 1998, which aimed to translate the 10th century Byzantine Greek historical encyclopedia into English through a process open to anyone who “possessed the ability to translate ancient Greek, regardless of formal credentials and specialization.”19
Cross-linguistic tasks ask participants to translate text from one language to another or to transliterate texts written in one script to the same language written in a new script. Projects vary in the number of target languages requested; the Kriegestagebuch von Dieter Finzen20 project translated the war diaries of a German soldier into English, French, and Italian versions as well as transcribing the German text. Some projects combine transcription with translation, allowing translators to work from a transcribed text; others translate directly from a scanned image of the text. Transliteration of documents written in obsolete scripts serves the common crowdsourcing goal of making cultural heritage more accessible, and it is also often an easier task for participants who may be able to read older scripts but do not have access to keyboards supporting those scripts.
Another common analyzing task is annotation, which involves asking participants to enhance a piece of content with metadata. There are many flavors of annotation, depending on the precise type of metadata that the task seeks to generate, including classification, identification, and tagging.21 Examples of annotation tasks include the BBC World Service Archive, which invited participants to help review automatically-generated tags for archival radio broadcasts.22
Classification tasks involve participants putting content into categories, especially categories that have already been specified by project staff. A platform that effectively supports classification should allow project team members to specify the category options and potentially modify them throughout the project’s lifespan. You should consider whether it makes sense for your items to have multiple categories or whether they are mutually exclusive, and make sure the platform supports this decision. It is also helpful to provide examples of items in each category to participants, either directly in the classification interface, or nearby as a link to reference materials (e.g., Zooniverse’s “Field Guide” feature), and to provide options for unclassifiable items. If you are not sure exactly which categories you want to use at the start of the project (for example, because you do not yet know what is in the data well enough to guess them), another approach is to provide tagging features as described below, so that your contributors can describe what they find in their own words.
Identification is a special case of classification or annotation tasks, where the participant is asked to place the items into very specific categories, with very few or perhaps only one member. This could include the identification of unknown objects, people, or locations in a collection of photos. For example, the Library of Congress created a “What’s the story?” feature for its Flickr online photo galleries to crowdsource the identifications of unknown photos in its collections.23 Text boxes and tags can work here too, but for certain types of identification tasks, automated and AI-based tools can help.
Historypin began as a “then and now” site, encouraging people to upload and/or “pin” photos to a map,24 and later created functions for GLAMs to bulk upload images for identifying or locating on a map.25 Organizations with photographs or artworks could post requests for information such as dates and locations.26 For example, the Imperial War Museum asked for help identifying the locations of places depicted by official war artists27 and the Year of the Bay project asked people to identify locations around San Francisco.28
A more general type of annotation is tagging, where contributors apply multiple short textual labels to content across a range of topics. Tagging allows items to be classified into multiple categories simultaneously. It can also include an option for participants to assign their own tags beyond what the project staff may have originally conceived, which can be a powerful way to generate a bottom-up analysis of materials. Since tags and keywords work best when they are reused by multiple participants (and on multiple items), it is helpful if the platform can support an “autosuggest” or “autocomplete” feature where similar tags automatically appear by the textbox as the contributor is typing. The value of a tag is diminished when, for example, the singular and plural versions of a word are tagged separately. However, you might also value the enhanced discoverability that variants of a tag (including misspellings) provide, and so encourage unique tags.29
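To make the autosuggest idea concrete, here is a minimal sketch of prefix-based tag suggestion ranked by how often each tag has already been used. The function names `normalize` and `suggest` and the sample tags are hypothetical. The sketch only lowercases tags and collapses whitespace; it deliberately leaves singular and plural forms separate, since whether to merge them is a project decision.

```python
from collections import Counter


def normalize(tag):
    """Lowercase and collapse whitespace so trivially different tags match."""
    return " ".join(tag.lower().split())


def suggest(prefix, tag_counts, limit=5):
    """Return up to `limit` existing tags starting with `prefix`, most used first."""
    prefix = normalize(prefix)
    if not prefix:
        return []
    matches = [(tag, n) for tag, n in tag_counts.items() if tag.startswith(prefix)]
    # Sort by descending use count, then alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in matches[:limit]]


# Hypothetical tags already applied by participants.
applied = ["Ship", "ship", "ships", "Shipwreck", "harbour", "ship "]
existing = Counter(normalize(t) for t in applied)

print(suggest("Shi", existing))
```

Showing the most frequently used tags first nudges contributors towards reuse without preventing them from typing a new tag.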
Reviewing tasks, unlike collecting or analyzing tasks, do not seek to create new data or metadata. Instead, as the name implies, they focus on reviewing, validating, or improving existing data provided either by previous participants or automated techniques. In a typical reviewing task, a participant sees the content and (if applicable) any previously completed analysis. Then they are asked to review it and “approve” or “reject” the work. A more elaborate version of reviewing tasks may also allow participants to provide feedback if the original work was performed by a human, either by writing freeform comments or selecting from a list of common problems. However, feedback tends to be uncommon because it requires more time and effort. Additionally, many crowdsourcing workflows do not provide a mechanism for participants to receive feedback on work they have already completed. A final type of reviewing task asks participants to directly make edits or improvements to the content, rather than simply rejecting it or leaving feedback for someone else to correct.
In The Wealth of Networks,30 Yochai Benkler defined “peer production” as a genre of online collaboration where many individuals contribute towards a shared goal, citing key examples like Wikipedia and open-source software development. He identified several common task designs that allow these projects to succeed against all odds. The tasks were modular, granular, and easy to integrate, and community members could self-select the tasks they worked on. These same elements apply well to crowdsourcing cultural heritage.
Modular tasks can be easily divided up into multiple components. A key benefit of modularization is that it allows for parallelized crowd work. If the components are decoupled rather than interdependent, then multiple participants can work on different components simultaneously, greatly reducing the total amount of time necessary to complete them. Further, if one participant struggles or quits, others are not bottlenecked. The flip side of modularity is reintegration; it does not help to split up tasks into many components if it is difficult to bring the pieces together later. A well-designed task can be both divided and reconstructed with minimal effort by the project team members or participants.
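One way to keep reintegration cheap is to record, at the moment of splitting, everything needed to put the pieces back together. The sketch below assumes a multi-page document split into page-level transcription tasks. The function names, dictionary fields, and file names are hypothetical, not taken from any particular platform.

```python
def split_into_tasks(item_id, pages):
    """Turn one multi-page item into independent page-level tasks."""
    return [
        {"item_id": item_id, "page": number, "image": image, "transcription": None}
        for number, image in enumerate(pages, start=1)
    ]


def recombine(tasks):
    """Rebuild the full text from completed tasks, whatever order they finished in."""
    missing = [t["page"] for t in tasks if t["transcription"] is None]
    if missing:
        raise ValueError(f"pages not yet transcribed: {sorted(missing)}")
    ordered = sorted(tasks, key=lambda t: t["page"])
    return "\n\n".join(t["transcription"] for t in ordered)


tasks = split_into_tasks("letter-42", ["p1.jpg", "p2.jpg", "p3.jpg"])
# Participants may finish pages in any order; here the last page is done first.
for task, text in zip(reversed(tasks), ["three", "two", "one"]):
    task["transcription"] = text

print(recombine(tasks))
```

Because each task carries its item identifier and page number, the order in which participants complete pages does not affect the reassembled text.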
Granular tasks are tasks that can be modularized into very small pieces. These may also be referred to as microtasks. The key benefit of granularity is providing many different sub-tasks for participants to work on. It may be tempting to make all tasks fine-grained, to reduce the barriers to entry as much as possible. However, Benkler recommends designing crowd tasks that are heterogeneously granular, meaning the tasks have many sizes ranging from tiny to much larger. The idea is that fine-grained tasks are a great fit for newcomers or participants with just a few minutes to spare. However, as participants get more involved with the project, they often want to apply their growing skills to more challenging work. A project that can offer both fine- and coarse-grained tasks is better able to keep participants around longer and is more flexible to their changing ability and availability.
A classic example of heterogeneous granularity is editing Wikipedia articles. A newcomer may start with very fine-grained tasks like fixing a typo, uploading an image, or adding a reference. Over time, as they become more comfortable, the participant may advance to writing sentences, paragraphs, and eventually drafting entirely new articles. The task design and user interface of Wikipedia allow participants to find tasks of almost any granularity.
The self-selection of tasks enables participants to decide for themselves which tasks to work on within peer production environments. Self-selection is highly efficient because people are generally good at figuring out what tasks fit their particular skill level, interests, and time commitment. From a task design perspective, the implication is that crowdsourcing systems need to provide good ways for participants to find and discover appropriate tasks within all of the possibilities. Thus, the platform needs to have effective searching and browsing tools, and for tasks to be organized and modularized in intuitive ways. Empowering participants to select their own tasks, matched to their skills and interests, may also align with project values around agency and choice.
Note that self-selection of tasks is not necessarily at odds with serendipity or randomness. It is possible to have both. For example, a participant might be allowed to specify the size or time commitment of a task, and the general topic. This self-selection ensures the task will fit their interests and availability. But within those constraints, many different pieces of content could be shown to the participant. This setup strikes a balance between the individual participant’s needs and those of the overall project.
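The combination of self-selection and randomness described here can be sketched as a filter followed by a random draw. The function `pick_task`, the task fields, and the sample tasks are assumptions made for illustration.

```python
import random


def pick_task(tasks, topic, max_minutes, rng=random):
    """Pick a random open task matching the participant's stated preferences."""
    candidates = [
        t for t in tasks
        if t["topic"] == topic and t["minutes"] <= max_minutes
    ]
    # Within the participant's constraints, any matching item may be shown.
    return rng.choice(candidates) if candidates else None


# Hypothetical open tasks with an estimated time commitment in minutes.
open_tasks = [
    {"id": 1, "topic": "letters", "minutes": 5},
    {"id": 2, "topic": "letters", "minutes": 30},
    {"id": 3, "topic": "maps", "minutes": 5},
    {"id": 4, "topic": "letters", "minutes": 10},
]

print(pick_task(open_tasks, topic="letters", max_minutes=10))
```

A project could weight the random draw towards items that most need attention, which serves project needs while still respecting the participant's choices.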
Another key issue in task design is determining which parts of the task should be completed by humans, which should be fully automated, and/or which should be some combination of human and computational effort. A humanist approaching a crowdsourced cultural heritage project may initially assume the tasks will be mostly or entirely completed by humans, while a computer scientist approaching the same project may assume most or all of the tasks will be automated. Indeed, the most popular paid crowdsourcing platform, Amazon Mechanical Turk, carried the tagline “artificial artificial intelligence” for years, a tongue-in-cheek reference to computer scientists who turn to crowdsourcing only when AI approaches fail.
Both extremes bring ethical risks. On one extreme, a project’s community may hold that some tasks should not be automated, even if software already exists for the task. Software may contain unacceptable bias, such as worse performance on material related to underrepresented groups, and processing cultural heritage materials with the software could proliferate the impacts of those biases. For example, a text analytics technique like Named Entity Recognition (NER) can automatically extract person names from transcribed text. However, NER models are often trained on data from Western countries and may disproportionately fail to recognize non-Western names, making them inaccessible in the resulting data. Additionally, a fully automated approach may strip away clear accountability, or it may deprive participants of opportunities for learning or engagement. On the other extreme, human participants’ time is valuable, and it is potentially unethical to ask them to devote hours of work to tasks that are unnecessary or more appropriately automated.
OCR correction is one example of humans improving machine output. The example below shows one contributor’s response to being asked to correct OCR: "I am sad to report I have found numerous errors, too many to even begin to fix, within these pages... It will be much easier to completely and correctly transcribe from the beginning than try and fix ALL the typos. Would you like me to do this for the Library?"31 If you receive feedback like this from your participants, you should revisit your task. If the OCR output is not accurate enough to reduce the amount of human work, then transcribing the typewritten material from scratch would probably be a better use of participants’ time and effort.
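Before launching a correction task, a project team could estimate OCR quality on a small sample of pages that a team member has transcribed by hand. The sketch below uses Python's standard `difflib.SequenceMatcher` to compare OCR output with that trusted transcription. The function names and the 0.9 threshold are assumptions for illustration only; a suitable cut-off depends on your material and participants.

```python
from difflib import SequenceMatcher


def ocr_similarity(ocr_text, reference_text):
    """Return a 0-1 similarity ratio between OCR output and a trusted transcription."""
    # autojunk=False avoids the heuristic that ignores frequent characters in long texts.
    return SequenceMatcher(None, ocr_text, reference_text, autojunk=False).ratio()


def recommend_task(ocr_text, reference_text, threshold=0.9):
    """Suggest a task design for a sample page (the threshold is a hypothetical cut-off)."""
    if ocr_similarity(ocr_text, reference_text) >= threshold:
        return "correct_ocr"
    return "transcribe_from_scratch"


# Hypothetical sample: a hand-checked line and two OCR results of differing quality.
reference = "Dear Sir, I write to report the safe arrival of the vessel."
good_ocr = "Dear Sir, I write to report the safe arrival of the vesseI."
poor_ocr = "D3ar 5ir; l wr1te t0 rep0rt tbe s@fe arriva1 0f tne v3ssel,"

for label, text in [("good", good_ocr), ("poor", poor_ocr)]:
    print(label, round(ocr_similarity(text, reference), 2), recommend_task(text, reference))
```

A similarity ratio is only a rough proxy for correction effort, so it is worth pairing a check like this with feedback from a few participants who try the task.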
A third, better option is to consider the complementary strengths of humans and computers while designing the tasks. Some tasks are best performed by humans, others, by computers, and many projects combine both. Further, the choice between a human task and an automated one is often a false dichotomy. Within a workflow (discussed in detail below) or even within a task, it often makes sense to support a collaboration between humans and computation, allowing them to accomplish more together than either could alone.
The University of Oxford’s Visual Geometry Group trained and tested Convolutional Neural Networks (CNNs) with tags applied to oil paintings in the Your Paintings Tagger project run by Art UK (then called the Public Catalogue Foundation). An interface was created that showed dozens of paintings with an automatically assigned label, such as “moustache.”32 A human reviewer could then mark paintings that did not depict a moustache with a single click. This data could then be used to improve the underlying software model. Tags for 200,000 oil paintings were checked and imported back into the Art UK website as a result of this work.33
Taking advantage of these complementary strengths first requires understanding what they are. In an age of rapid advances in AI and machine learning, the question of what tasks computers can perform as well as or better than humans is a moving target. In areas of intense research and commercial interest in AI technologies like natural language processing and computer vision (areas which are also deeply relevant to cultural heritage applications), the state of the art can dramatically advance even within a year or two. Given the ever-evolving technology, we will not attempt to summarize the latest capabilities, an effort which is sure to become outdated almost as soon as the text is written. Instead, we will review some more fundamental and, hopefully, durable principles of human-computer collaboration that can help you make decisions long into the future.
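The review-by-exception pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the Visual Geometry Group's or Art UK's actual code; the `Suggestion` class, the `apply_review` function, and the painting identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    painting_id: str  # hypothetical identifier for a painting
    label: str        # label proposed by the model, e.g. "moustache"


def apply_review(suggestions, rejected_ids):
    """Split model suggestions into confirmed and rejected lists.

    The reviewer clicks only on the wrong suggestions; anything left
    unclicked is treated as confirmed.
    """
    confirmed = [s for s in suggestions if s.painting_id not in rejected_ids]
    rejected = [s for s in suggestions if s.painting_id in rejected_ids]
    return confirmed, rejected


batch = [
    Suggestion("p1", "moustache"),
    Suggestion("p2", "moustache"),
    Suggestion("p3", "moustache"),
]
# The reviewer clicked on one painting that does not show a moustache.
confirmed, rejected = apply_review(batch, rejected_ids={"p2"})

# Confirmed items could be published as tags; both lists could serve as
# positive and negative examples if the model is later retrained.
print([s.painting_id for s in confirmed])
print([s.painting_id for s in rejected])
```

Asking reviewers to flag only the errors keeps the human effort proportional to the model's mistakes, which is what makes a single click per painting practical.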
Crouser and Chang propose a framework that lists some of the core strengths of humans versus computers.34 The target audience of their paper is primarily computer scientists in fields such as human-computer interaction and visualization, but the framework is nicely applicable to designers of cultural heritage crowdsourcing tasks, as well. They identify six main affordances35 (i.e., strengths) for humans: visual perception (i.e., recognizing items in an image), visuospatial thinking (i.e., reasoning about spatial relationships of objects in an image), audiolinguistic ability, sociocultural awareness, creativity, and domain knowledge. For these affordances, humans — sometimes even without special training — outperform even the best computational efforts.
Cultural heritage crowdsourcing projects frequently make use of several of these affordances; for example, transcribing documents or audio clips and tagging photos all leverage aspects of visual perception, visuospatial thinking, sociocultural awareness, audiolinguistic ability, and domain knowledge. Perhaps only creativity has not yet seen much attention from task designers in cultural heritage domains, which seems to be a missed opportunity.
On the other hand, Crouser and Chang suggest that “machines” (i.e., computers) have at least four main affordances: large-scale data manipulation, collecting and storing large amounts of data, efficient data movement, and bias-free analysis. The first three affordances point to the advantages of computers for storing and rapidly making structured changes to very large datasets. They clarify that the last is “apart from human bias introduced during the programming of the system”: while computers and AI models exhibit their own biases based on their training data and algorithm design, they can also be programmed to counterbalance known biases of human contributors (e.g., confirmation bias).
Some of the best results can come from task designs that combine these affordances in effective ways. For example, NYPL Labs’ Building Inspector combined computer vision with crowdsourcing to digitize 19th-century fire insurance maps of New York City across the decades.36 The computer vision tool made an initial pass on the dataset attempting to detect building footprints, and then human crowds reviewed and corrected the automated outlines.
Civil War Photo Sleuth (CWPS)37 is a free website that combines crowdsourcing and AI-based face recognition to identify unknown people in photos from the American Civil War era. To identify an unknown photo, a participant uploads the photo, tags the visual evidence in the photo (e.g., military uniform details), and applies search filters (e.g., limiting to a certain US state). The software returns potential candidates, sorted by facial similarity, whose military records match the visual clues and search filters used by the participant. The site has a reference database of over 35,000 Civil War-era photos from public and private collections contributed by participants.
A common misconception among participants and the public is that face recognition technology is perfectly accurate and better than human judgment. The research is more mixed, and instead points to complementary strengths of humans and AI. In CWPS, face recognition is used to narrow down possibilities from thousands of candidates to a shortlist of promising candidates, including (typically) many false positives. Then, human users inspect and compare the shortlisted candidates to determine a potential match. Humans tend to be better than AI at fine-grained facial comparisons since they can examine hair and facial hairstyles, ear shape, skin imperfections, and other details that AI-based face recognition tends to ignore. Humans can also consider the broader context, such as whether it makes sense that the mystery photo’s subject would be photographed in a particular city.38
The CWPS website was designed intentionally to prevent facial recognition from automatically identifying any photos. Only human participants are permitted to link a face to a name. This task design recognizes the accuracy limitations of AI-based face recognition, but also provides accountability for data generated by the site, and ensures that person identification remains a fundamentally human-driven process.
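The division of labor described above, in which software narrows the field and a person makes the identification, can be sketched as follows. This is a simplified illustration under stated assumptions, not CWPS's actual implementation: the `Candidate` fields, the two-number "embeddings", the cosine similarity measure, and the function names are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    state: str        # stands in for a search filter drawn from military records
    embedding: list   # hypothetical face feature vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shortlist(query_embedding, candidates, state=None, top_k=2):
    """Filter by the participant's search filter, then rank by facial similarity."""
    pool = [c for c in candidates if state is None or c.state == state]
    ranked = sorted(pool, key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return ranked[:top_k]


def confirm_identification(candidate, confirmed_by):
    """Only a named human reviewer may link a face to a name."""
    if not confirmed_by:
        raise ValueError("an identification needs a named human reviewer")
    return {"name": candidate.name, "confirmed_by": confirmed_by}


reference = [
    Candidate("Soldier A", "NY", [1.0, 0.0]),
    Candidate("Soldier B", "NY", [0.6, 0.8]),
    Candidate("Soldier C", "PA", [1.0, 0.0]),
    Candidate("Soldier D", "NY", [0.0, 1.0]),
]
short = shortlist([1.0, 0.0], reference, state="NY")
record = confirm_identification(short[0], confirmed_by="participant_42")
print([c.name for c in short], record)
```

Note that `shortlist` never returns an identification on its own; the only route to a name is `confirm_identification`, which refuses to proceed without a human reviewer. That mirrors the accountability goal described above.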
Increasingly, using computation for crowdsourcing projects means artificial intelligence, deep learning, and machine learning. These techniques are becoming more accessible as well as more powerful, since many tech companies make their AI/machine learning (ML) services available to any customer via the cloud. The opportunity for cultural heritage projects to gain access to the latest advances in AI/ML is exciting but also presents some important caveats.
First, most commercial cloud AI services are not free, so their cost needs to be factored into the overall project budget, and typically, higher participation in your project leads to higher service fees. Second, unlike traditional computation and algorithms, which tend to be highly structured and predictable, a hallmark of AI/ML is that the results can be unpredictable. Indeed, the results generated by AI/ML models, such as suggestions of related documents to transcribe or an initial set of proposed keywords for an annotation task, might not be explainable even by a software engineer who developed the model. Thus, incorporating AI/ML into a cultural heritage crowdsourcing task is not always appropriate and requires careful consideration of the trade-offs. Some AI-generated results may be serendipitous, time-saving, and valuable. Others, however, have the potential to confuse or misrepresent content and offend or alienate contributors.
To help user interface designers consider these trade-offs, Kulkarni and Kery proposed a checklist of criteria for determining whether to use AI/ML for a project (see the table below). They conclude that an AI-based approach will probably be better for task designs where recommendation and personalization are important, when participants need to use natural language, and when the goal is to identify rare, constantly evolving examples, or to recognize a general class of items with many small variations. On the other hand, AI is probably not a good choice when the predictability of the user experience is critical, when the cost of errors is very high, when explainability of each AI output is important, or when contributors explicitly say they do not want AI to be used.
| AI probably better | AI probably not better |
| --- | --- |
| The core experience requires recommending different content to different users. | The most valuable part of the core experience is its predictability regardless of context or additional user input. |
| The core experience requires the prediction of future events. | The cost of errors is very high and outweighs the benefits of a small increase in success rate. |
| Personalization will improve the user experience. | Users, customers, or developers need to understand exactly everything that happens in the code. |
| User experience requires natural language interactions. | Speed of development and getting to market first is more important than anything else, including the value using AI would provide. |
| Need to recognize a general class of things that is too large to articulate in every case. | People specifically tell you they do not want a task automated or augmented. |
| Need to detect low occurrence events that are constantly evolving. | |
| An agent or bot experience for a particular domain. | |
| The user experience does not rely on predictability. | |
Workflows describe how people and tasks will be linked together, and in crowdsourcing this refers as much to the project team (or back end) experience as it does to the participant (or front end) experience. As discussed in the “Aligning tasks, platforms, and goals” chapter, you should give careful consideration to designing effective workflows for both these user groups, aiming to be clear, specific, and user-centered. Workflows should have a goal or purpose, with a clear beginning and end. They should be as specific as possible, supported with documentation enabling the reader to undertake the work without any additional explanation. A well-designed workflow will concern itself as much with the user experience (UX) of the work as with the work itself. While no workflow can account for every user and use case, potential difficulties can be avoided by questioning assumptions about the crowd or project participants. For example, telling a user to “click the blue button” may not be adequate for users with color vision deficiencies.
In designing the project team workflow, it is helpful to allocate specific roles with responsibility or ownership for particular workstreams. The first step to accomplishing this is to isolate tasks or a series of tasks in a workstream. As much as possible, each task should be an independent process. Then assign ownership of these tasks to specific roles or groups, such as contributors, engineers, communications, and managers. Additional information on roles can be found in the “Managing cultural heritage crowdsourcing projects” chapter.
Some of our authors suggest that if your project is really small, it may help to think of these roles as “hats,” such as “this task is a job for when I am wearing my Project Manager hat” or “this other task will be for when I am wearing my Public Relations hat.” While it may be tempting to relax these guidelines for clarity and specificity when you will be the one performing the tasks, it is valuable not to do so. Tasks that are kept isolated are more easily repeated and analyzed, and, in the case of mistakes, confusion, or failures, problems are more easily identified and repaired.
The most important workflow to consider is that of the participant or crowd in your crowdsourcing project. The more people who will be performing a particular task, the more opportunities there will be for misunderstandings, so this group would benefit from clearly written and specific documentation that outlines the work required from participants. Always imagine a participant’s experience of the workflow at every step, and question your assumptions. It is important to recognize that participants may come to the project with a wide variety of computer skills, and what is intuitive to you may not be intuitive to them (think “click the submit button” level of specificity). We strongly recommend using screenshots and even short screen recordings where appropriate.
Moving into the back end (project team roles), software developer workflows will probably be the least specific, as those skills are often more specialized, and the developers themselves are often the best equipped to understand the steps needed to complete their tasks. Still, they should be clear in their goals and expected outcomes. The expected functionality that crowdsourcing will enable can be defined through user stories. An example might be a detailed statement like “after crowdsourced transcripts are imported into the library system, a full-text search for ‘onion’ will return recipes that include ‘onion’ and ‘onions.’”
Another project team role that may need to adapt its typical workflows to the specifics of a crowdsourcing project is the communications team. As with developers, clearly described goals and expectations are keys to success, but step-by-step instructions are unlikely to be required. What are the team’s unique communications needs? If these needs are not clear, ask the team! For example, communications teams have a unique and important relationship to their timeframes: a 2-week “social media outreach” plan is fundamentally different from a 2-year version.
Finally, the project managers, who are often the high-level project designers as well, will have a diverse and broad list of duties (see the “Managing cultural heritage crowdsourcing projects” chapter). In addition to normal, day-to-day project management tasks and reviews, project managers should also maintain a document of the highest-level workflow: the puzzle whose pieces are the other workflows. The project management puzzle pieces could also include collating feedback and overseeing the evaluation (see the “Evaluating your crowdsourcing project” chapter). Quality control and assurance are important tasks that will also require workflow documents (see the “Working with crowdsourced data” chapter).
Careful workflow design should also be applied to the full lifespan of a project, including detailed thinking about what happens when the crowds disperse. Unique to crowdsourcing projects is the need to aggregate the results and publish them, which often means ingesting them into a discovery system (catalogs, finding aids). While the details of your finished product may not be known at the outset, the discovery system should be. Even if you cannot yet create an ingest plan, it is important to understand the needs of that system as you plan the outputs (the data objects) of your workflows.
How does crowdsourcing work for projects in languages other than a dominant language such as English? How can crowdsourcing work for speakers of Indigenous languages or material about marginalized communities? Working with different languages poses many challenges, both in terms of task design (how do you make your platform bilingual or able to handle different scripts?) and in finding and training participants. Language choice can impact who can perform a task, but also who feels welcome to join the crowd. Like the many other examples included throughout this book, language choice will signal values embedded in your project.
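A user story like the “onion” example can be written down as an executable acceptance check, which gives developers an unambiguous target. The sketch below is only an illustration: the `search` and `normalize` functions, the sample transcripts, and the very naive plural handling are all assumptions, and a real library system would rely on its own search engine and stemming rules.

```python
import re


def normalize(word):
    """Very naive plural folding ('onions' -> 'onion'); for illustration only."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def search(transcripts, query):
    """Return ids of transcripts containing the query term or its simple plural."""
    target = normalize(query)
    hits = []
    for doc_id, text in transcripts.items():
        tokens = {normalize(t) for t in re.findall(r"[A-Za-z]+", text)}
        if target in tokens:
            hits.append(doc_id)
    return hits


# Hypothetical crowdsourced transcripts after import.
transcripts = {
    "recipe-1": "Chop one onion and fry gently.",
    "recipe-2": "Add two onions to the broth.",
    "recipe-3": "Season the carrots with salt.",
}

results = search(transcripts, "onion")
# The user story as an acceptance check: both singular and plural match.
assert "recipe-1" in results and "recipe-2" in results
assert "recipe-3" not in results
print(results)
```

Writing the story as a check leaves the implementation to the developers while making the expected outcome testable.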
Language choice not only concerns which language(s) contributors can use; other aspects that can influence inclusion are the tone of voice, choice of terms, use of specialized vocabulary, or sentence complexity. You may wish to use a tool to evaluate the readability of your project text (e.g., a Flesch-Kincaid readability test40); many options for this are available for free online.41
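For readers who prefer to script a readability check, the Flesch-Kincaid grade level can be computed directly from word, sentence, and syllable counts using the standard formula 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) − 15.59. The sketch below is a rough approximation: the syllable counter is a simple vowel-group heuristic (an assumption, and it will miscount some English words), so its scores may differ from those of dedicated tools.

```python
import re


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Approximate Flesch-Kincaid grade level of an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59


simple_text = "Click the box. Type the word you see."
complex_text = (
    "Participants should transcribe every abbreviation exactly "
    "as it appears in the manuscript."
)

# Lower scores suggest text that is easier to read.
print(round(flesch_kincaid_grade(simple_text), 2))
print(round(flesch_kincaid_grade(complex_text), 2))
```

Comparing scores for alternative wordings of the same instruction is usually more informative than reading any single score in isolation.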
Projects working with language material may face various challenges that affect task design. Not all languages use the same alphabet, for example, and if contributors need a different keyboard to transcribe material in a particular language this may limit who can take part. It is possible to design language tasks that do not require language-specific keyboards, though, as shown by the Zooniverse Ancient Lives project. In the project, participants transcribed Ancient Greek text found on fragments of papyri. Rather than expecting everyone to have access to a Greek keyboard, the project platform provides an on-screen clickable display where the contributor simply chooses the relevant letter(s). The task is to match the character seen on the papyrus fragment to the relevant letter on the keyboard and does not require the transcriber to actually know Ancient Greek — an example of how a language-based task can be designed to allow anyone to take part, even those who do not know the language.
Even modern-language projects can benefit from on-screen clickable keyboards: Scribes of the Cairo Geniza presents a Hebrew character helper to participants.42 Participants can choose from modern characters (shown below) or many common script types found in the Geniza corpus. These keyboards were created by the project team to enable participants to choose the script-type keyboard that most closely resembles the text they are asked to transcribe.
Crowdsourcing projects can enable wider access to cultural heritage material. Transcribed text can be used for searching, making it easier to identify and retrieve relevant material. Text-to-speech applications such as screen readers can also make the content available to those who cannot see or read the original text. Translating textual material, including manuscripts or photograph captions, into more accessible formats can make these items available to a multilingual audience. Translation can be performed either as a post-transcription step or as an independent process. Transliteration of material from obscure or obsolete scripts also enhances access.
A wiki-specific functionality is creating a Wikipedia page in one language by translating a page from another language (often from English). There are over 300 language versions of Wikipedia, and since they are generally developed independently by speaker communities, the coverage of content can differ a lot from version to version. The Content translation tool supports this kind of page creation.43
The tool allows editors to reuse as much or as little content as they like for their initial version, and then keep editing it with their usual editing tools. It automates the more mundane steps of creating an article: copying text across browser tabs, looking for corresponding links and categories, etc.
Edinburgh University, supported by Wikimedia UK (a UK-based chapter of the global Wikimedia movement), has been running a Wikipedia in the Classroom project, where students create Wikipedia articles for course credit. The Translation Studies MSc Wikipedia project44 involves about 20-50 students a year all peer assessing one another’s work and then publishing their 1,500-2,000 word translations to a different language Wikipedia.45
In the early 20th century, writing of the Malay language was converted to a Roman-based script from Jawi, an earlier Arabic-based script. As a result, texts from the 19th century and earlier are not very accessible to modern Malay speakers. The Jawi Transcription Project run by Mulaika Hijas engages participants to transliterate Malay folk tales published in Jawi into modern orthography so that these materials are available to modern readers.46
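The idea behind the on-screen keyboards described above is that each click records a character, so participants need neither a special keyboard nor knowledge of the language. A minimal sketch follows, assuming a hypothetical set of key identifiers and a placeholder for unclear characters; it is not the code used by Ancient Lives or Scribes of the Cairo Geniza.

```python
# Hypothetical key identifiers mapped to Unicode Greek letters.
GREEK_KEYS = {
    "alpha": "α",
    "beta": "β",
    "gamma": "γ",
    "delta": "δ",
    "epsilon": "ε",
    "lambda": "λ",
    "omicron": "ο",
    "sigma": "σ",
}

UNCLEAR = "?"  # assumed placeholder for a character the participant cannot match


def transcribe_clicks(clicks):
    """Build a transcription from a sequence of clicked key identifiers."""
    return "".join(GREEK_KEYS.get(key, UNCLEAR) for key in clicks)


# A participant matches five shapes on a fragment to on-screen keys.
line = transcribe_clicks(["lambda", "omicron", "gamma", "omicron", "sigma"])
print(line)
```

Because the stored output is ordinary Unicode text, it can later be searched or compared across participants regardless of what keyboard each person used.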
When working with colonial material about Indigenous communities, it is important to provide space for the communities to question and criticize the material in addition to any crowdsourcing task. Something as simple as a comments field on a crowdsourcing platform has been used by community members to “talk back to the archives” by explaining what the source material got wrong about their community, or to redact sacred rituals which should not be shared outside the community.47
English-language projects working with material in languages other than English will be drawing on the expertise of a specific community of participants and may need to do specialized outreach and task design. The British Library’s Arabic Scientific Manuscripts transcription project48 began with an on-site event that attracted Arabic-speaking residents of London, introducing them to the physical collections, explaining the project, and training them on the task. This initially targeted outreach allowed them to engage participants who were fluent in both Arabic and English, motivated, and well qualified to work on the project.49
In some cases, the language of the platform and instructional materials may need to be translated for a community. Platforms should serve the expectations of the community, rather than match the language of the crowdsourced source material. The British Library’s Endangered Archives Program translated the Zooniverse platform interface into Russian to serve the likely participants in a Siberian Photograph collection.50 The Europeana 1914-1918 project created versions of their collection platform in 18 languages to reach communities across Europe.51 The Royal Archives of Cholula contain material written in Nahuatl, which is accessible to the Nahuatl-speaking people of Puebla. Serving this community required the crowdsourcing platform to be made available in Spanish, the language of most Mexican websites.52
Task design is where your vision for the crowdsourcing project becomes the most concrete. Designing effective tasks requires finding a match between your participants, tasks, data, and goals. As we highlighted in this chapter, knowing your participants’ backgrounds and expertise, and creating opportunities for them to learn, progress, and achieve their goals, can go a long way towards finding this match. Task types vary widely, but can generally be categorized as collecting (gathering or creating material), analyzing (e.g., transcribing, annotating, and translating material), or reviewing (reviewing what computers or other participants have done). Providing tasks of varying sizes (from small to large) can help participants self-select for work that fits their interests, skill level, and available time.
It is also important to consider which parts of a task should be completed by human participants and which should be automated. In this chapter we explored possible answers, looking at norms within your community and the capabilities and limitations of AI. Finally, since task designs in different languages pose unique challenges (different scripts, spellings, and cultural expectations that require careful attention), we took time to highlight these issues too.
From here, you could proceed to the “Managing cultural heritage crowdsourcing projects” chapter, or explore how design is influenced by participants in the “Understanding and connecting to participant motivations” chapter.