by Mia Ridge, Samantha Blickhan, Meghan Ferriter, Austin Mast, Ben Brumfield, Brendon Wilkins, Daria Cybulska, Denise Burgher, Jim Casey, Kurt Luther, Michael Haley Goldman, Nick White, Pip Willcox, Sara Carlstead Brumfield, Sonya J. Coleman, and Ylva Berglund Prytz
Published on Apr 29, 2021
7. Aligning tasks, platforms, and goals
This chapter will help you to formulate the questions that you will use to map the components of your project onto available tools and platforms for crowdsourcing. It is important to keep in mind that there may not be a single platform that perfectly fits your project. In this chapter, we suggest some ways to approach and define a series of needs.
In many cases, cultural heritage crowdsourcing projects have goals that require transforming source material into forms of data to be used toward the goals of the project, and through the efforts of a set of participants. So the project design considerations discussed in other chapters — including the selection of tools, platforms, and especially tasks — should be grounded in needs that speak to the overarching objectives of the project, its values, available resources, source material, and participant needs.
In this chapter, we list some common platform features for cultural heritage crowdsourcing projects and offer tips for evaluating platforms against the needs of both your team and your project participants. Ultimately, the goal is to find a solution that supports as many of your considerations as possible without sacrificing your project values. It may take some time to go through the process of evaluation, and the content in this chapter can help you to do this in a way that ensures your project will be beneficial to both you and your participants.
A good starting point for this decision-making process is to ask yourself some framing questions to determine the essential platform features that will allow you to enact your values and achieve your goals. These questions may include:
What results do you want from the source materials and tasks in your project?
What will your participants need during their experience? What will they need to contribute to the project goals?
What will you need to manage your project effectively?
Can your goals be met by designing a series of tasks?
What resources and skills are available to you throughout your project?
As you consider these questions, remember that more technology does not always lead to a better project, and that “the perfect is the enemy of the good.” Technological framing can be useful to identify the minimal components of your project, including the types of data you need and your overarching goals. Then you can allow those components to drive your search, rather than trying to identify a platform that will incorporate all of the potential features you are considering. If possible, aim for iterative design and consider that you may need to use a series of different tools to achieve your objectives. Features that seem desirable in theory may be less so in practice, and once you start a project you may discover the need for functions you had not previously considered.
The tasks available to your project are a combination of what is possible with your source material and chosen platform. If you are aware of the range of task types and how each may suit your goals, you might find the next sections of this chapter helpful to direct your next steps in exploring technologies. If not, you may wish to first consult the “Choosing tasks and workflows” chapter, which provides in-depth descriptions of common tasks and considerations in cultural heritage crowdsourcing.
The availability of platforms has dramatically reduced the overhead of starting a crowdsourcing project. It is also becoming easier to explore different platforms, either by trying their tasks as a participant or through talking to other project teams. Some platforms that are described in this chapter include FromThePage, Zooniverse, Omeka + Scripto, Pybossa, Concordia, Madoc, and Transkribus. Each presents its own set of functionalities and we encourage you to explore their suitability given the resources you have at hand and the needs of your project.
Choosing a platform for your cultural heritage crowdsourcing project means balancing many different factors, including its affordances — that is, the perceivable aspects of a device or interface that provide clues about how to use it, and therefore what it can do.1 This includes both the affordances available to project teams and administrators on the back end of a system and those available to participants and contributors using front end interfaces. Some platforms, such as Wikipedia, Wikidata, and Zooniverse, may also come with an existing community attached to them, who could additionally engage with your project, for example by generating open data relating to your content.2
Connecting project needs with technologies
In the “Designing cultural heritage crowdsourcing projects” chapter, we looked at ways you might plan your high-level project goals and objectives, consider your access to resources and technical expertise, and define your stakeholders. With those decisions in mind, you can focus on what you and your participants need to accomplish your shared goals, and how that might map onto specific tasks, platforms, and features for each dimension of the project. Key dimensions include your project’s values, source materials and data considerations, crowdsourcing tasks, customization, features for community management, resources, and pipelines.
In the “Identifying, aligning, and enacting values in your project” chapter, we discuss the importance of articulating values for your project and the ways values can be enacted. The technologies you choose to adopt or design for your project are an excellent example of how you can carry your values forward. Defining which tasks to perform and selecting platforms, however, are just part of the decisions you will make. Your values are also connected to your responsibilities as a project organizer. Your participants are not the only ones who have needs that will ideally be supported effectively by your chosen technologies; you and your project team also have needs that can be met through careful assessment of functionality, level of customization, and administrative tools.
As we have discussed in other chapters, your decisions — from selecting technologies, to committing to user testing, to sharing outcomes and data — reflect the power you hold concerning your project and participants. You should devote as much care to the online “user experience” (or UX) as you would to designing a building, exhibition, or any other public space — and even more so, given that the tasks and platforms you implement may be some people’s only experience with your institution or project.
Case study: design principles: Concordia and By the People
Before adopting or developing any technology that would form the foundation of the By the People program at the Library of Congress, the project team and stakeholders came together to define our design principles.3 The design principles focus on establishing an atmosphere of trust and approachability to complement the Library’s authoritative reputation. The key themes in the Concordia design principles are: Engage, Understand, Connect, and Grow. The design principles represented not only the needs of potential future participants in ways that could connect to functionality but also the essential dimensions that would guide the community management approaches. We decided to develop a platform that could continue to grow following its launch and accommodate the skills and expertise of the Library of Congress software development teams.4 We created Concordia, an open source platform that was developed using the design principles, user-centered research, and data needs defined by the source material. While it represents the values of the project team and practices of the Library of Congress, it remains open for use by other organizations.
Other ways you can bring your values into your platform and task choices include:
Creating methods for participants to share their feedback
Ensuring accessibility in your project
Reading the documentation of potential tools and platforms to understand the values they represent
Defining what types of data you will collect about participants and their activity, and who has access to that data
Auditing the partnerships of platforms before you adopt them
Maintaining iterative design that allows you to address inadvertent movement away from project values
Accessibility in the online context means making content available and accessible to those with disabilities; e.g., making content and software readable with screen readers for people with impaired vision. If your crowdsourcing project is launched on behalf of a government entity in the United States, there may be applicable laws (e.g., Section 508) requiring a baseline level of accessibility.5 In the UK, websites for public sector bodies must meet accessibility standards,6 and many other countries have their own specific requirements or draw on standards set by the W3C Web Accessibility Initiative (WAI).7 Crowdsourcing tasks by nature render inaccessible content more accessible by transcribing, describing, or identifying what appears in source materials. Ironically, this means that many crowdsourcing projects may have to work with images, descriptions, metadata, and source systems that are not themselves fully accessible.
We believe that accessibility, however, is broader than just legal standards and that crowdsourcing projects should seek to involve as many of the crowd as possible (this is discussed in more detail in the “Why work with crowdsourcing in cultural heritage?” chapter). There are many options for making your project more accessible. Some common starting points to ensure accessibility is foregrounded in your project include selecting platforms with clear and consistent navigation options, simplifying the design, and adding alternative or supplemental image content when possible. Incorporating easily identifiable feedback and avoiding the temptation to rely on color alone are other features to consider. You can also take steps to understand best practices and undergo training to learn how to advocate for accessibility.8
Source material and resulting data
Designing a crowdsourcing project means approaching your options for tasks, tools, and platforms with an eye to a set of intersecting factors. One pair of factors to weigh together is the source material and the types of data required to meet the project goals. Determining what you have (source material) and what form you would like to transform it into (data) will help define the types of pipelines (discussed in detail later in this chapter) or tools you will need to adopt, as well as the types of support your participants will need to help you get to your preferred outcomes. Where you fall between those needs will help determine what type of platforms and tasks are necessary for your project.
Points to weigh when planning the transformation from source material to resulting data include: the content and subject matter of your source material; forms, formats, and metadata; how you will approach communicating rights and use of data; whether a platform offers support for processing data; and any data requirements of platforms (pre- and post-project). Source materials for your project and the types of data into which the source material will be translated may factor significantly into what tools and platforms you select.
For example, transcription tasks may have very different user interface needs based on categories of texts. Examples include:
Free-form text (such as letters, diaries, field notes, or legal documents) is usually transcribed as full text, producing textual data that can be printed, incorporated into scholarly work, recognized by screen readers to assist blind or visually impaired users, or ingested into digital library systems to support full-text search.
Form-based text (such as questionnaires, marriage certificates, or typewritten index cards) contains structured information. To support these documents, crowdsourcing platforms may present the user with specific entry fields such as “Surname,” “Place of Birth,” or “Regiment,” producing structured data that allows institutions to index collections and lets researchers perform quantitative analysis of the documents.
Ledger-style or tabular documents (including census forms, ship logs, registration lists, or financial accounts) also contain structured data but require much more sophisticated interfaces to allow participants to transcribe many records per page, as well as header or footer fields which apply to all records on a page.
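The three categories above differ mainly in the shape of the data they produce. As a rough illustration only, the following Python sketch models each shape; the class names, field names, and sample values are all hypothetical and do not correspond to any particular platform's export format.

```python
from dataclasses import dataclass, field


@dataclass
class FreeFormTranscription:
    """Full text of a letter, diary page, or similar document."""
    page_id: str
    text: str


@dataclass
class FormTranscription:
    """One record per page, keyed by the labels printed on the form."""
    page_id: str
    fields: dict


@dataclass
class LedgerTranscription:
    """Many records per page, plus header fields shared by every record."""
    page_id: str
    header: dict
    rows: list = field(default_factory=list)

    def expanded_rows(self):
        """Copy the page-level header values into each row."""
        return [{**self.header, **row} for row in self.rows]


# Invented sample values, for illustration only.
letter = FreeFormTranscription("p1", "Dear Sir, I write to inform you")
certificate = FormTranscription(
    "p2", {"Surname": "Example", "Place of Birth": "Exampletown"}
)
census_page = LedgerTranscription(
    "p3",
    header={"District": "Example District", "Year": "1901"},
    rows=[
        {"Name": "A. Example", "Age": "34"},
        {"Name": "B. Example", "Age": "7"},
    ],
)

for row in census_page.expanded_rows():
    print(row)
```

The ledger case is the one that needs extra handling: header values entered once per page have to be copied onto every record before the rows can be indexed or analyzed on their own.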
A single transcription platform rarely supports all of these types of textual documents. Mis-matching your platform to your source material is likely to cause frustration for participants and less than ideal results for data outputs. A very good way to determine whether a platform or task is appropriate is to find an existing crowdsourcing project with a similar type of source material and work through the project as a participant. What do you like about the project and the tasks you are undertaking? What would you change about them? Keep this feedback in mind as you explore your options.
Most of the platform affordances you will consider for the crowdsourcing stage of your project focus on 1) the tasks you will ask participants to complete and their experiences completing those tasks, and 2) the kinds of data to which you will have access to enact your overarching project goals. Close attention should be paid to the forms and formats of data resulting from these tasks, including data from the source material, activity data, and data that can be used to improve your management of those participants.
In the “Working with crowdsourced data” chapter we will look in detail at the steps for managing data, thinking first about design and ethics (such as the rights and use of data for participants and the general public), followed by different crowdsourced collection strategies, then finally how the data will be processed and archived.
Tailoring your needs to each of these steps should be a major consideration in your choice of platform — not all platforms support project teams through each of these steps. Once the crowdsourced data has been collected, some platforms enable features to help you clean up the data submitted by participants and present the results for scholars, the public, or whoever your target audience is. Other platforms provide little or no functionality in this stage — their purpose is to help you collect the crowdsourcing data, assuming that displaying or visualizing the data would be achieved with another tool. For example, Zooniverse Project Builder data is exported as a CSV file (i.e., basic spreadsheet) with text transcription data contained as embedded JSON (a notation for describing content), which needs to be imported somewhere else to be usable. The Zooniverse Aggregate Line Inspector and Collaborative Editor (ALICE) was built as an option for teams working on text transcription projects, to allow them to review and edit data before exporting it from the platform with more robust file export options, including plain text.9 Other systems also provide bridging functionality to help transform and present data. For example, FromThePage exports crowdsourced text in several researcher-friendly formats (HTML, plain text, TEI-XML) as well as writing data directly into digital library systems via API access.
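To make the idea of a CSV export with embedded JSON more concrete, here is a minimal Python sketch that flattens such a file into one row per annotation. The column names (`classification_id`, `subject_ids`, `annotations`) and the JSON shape (a list of objects with `task` and `value` keys) are assumptions modeled on a typical classification export, and the sample rows are invented; check them against your own export before relying on this.

```python
import csv
import io
import json

# Invented sample standing in for an exported CSV file. A real export
# would be read with open(path, newline="", encoding="utf-8").
SAMPLE_EXPORT = '''classification_id,subject_ids,annotations
101,5001,"[{""task"": ""T0"", ""value"": ""Dear Sir, I write to""}]"
102,5001,"[{""task"": ""T0"", ""value"": ""Dear Sir I write to""}]"
103,5002,"[{""task"": ""T0"", ""value"": ""Received with thanks""}]"
'''


def flatten_export(csv_text):
    """Return one flat dict per annotation found in the JSON column."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # The annotations cell is a JSON string, so it needs a second parse.
        for annotation in json.loads(record["annotations"]):
            rows.append({
                "classification_id": record["classification_id"],
                "subject_id": record["subject_ids"],
                "task": annotation["task"],
                "text": annotation["value"],
            })
    return rows


flat_rows = flatten_export(SAMPLE_EXPORT)
for row in flat_rows:
    print(row["subject_id"], row["text"])
```

Note that the first two sample rows are slightly different transcriptions of the same subject. Reconciling such variants is a separate step that this sketch does not attempt.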
Some cultural heritage crowdsourcing projects aim to collect and analyze content by putting it directly into a collections management system, such as Omeka. These systems often allow for a variety of plug-ins geared towards presenting and interpreting digital cultural heritage projects generally, but they work just as well for crowdsourced projects — the system does not distinguish the source of the data. For example, the Neatline suite of plug-ins for Omeka allows project staff to tell stories with their crowdsourced results by creating interactive displays, such as visualizations, maps, and timelines, from visual and even textual data.10 In other cases, data may be documented and shared as a data paper11 or ingested into public access catalogs. This is a major consideration for many project organizers; designing processes and workflows to ensure easy import and export of data in portable formats is discussed further in the final section of the “Working with crowdsourced data” chapter.
Another essential decision-making factor will be the conditions of use of your data, as defined by the platform and its values. For example, project teams using the Zooniverse Project Builder are asked to share the data generated through their project after two years.
Perspective: requirements for using the Zooniverse Project Builder — Samantha Blickhan, Zooniverse Humanities Lead
Anyone can use the Zooniverse Project Builder to build a crowdsourcing project, free of charge. However, to be promoted to the wider Zooniverse community via e-newsletters and listed on our Projects page, teams need to agree to the Project Builder policies.12 Our policies include a requirement to “make the resulting data open after a proprietary period, normally lasting two years from project launch.” The policies also include a requirement for teams to communicate research findings to their communities and acknowledge Zooniverse in any resulting publications. We put these policies in place as a way to ensure that teams are building equitable partnerships with their participants and creating well-designed projects that have specific goals in mind, to avoid wasting participants’ time.
Considering crowdsourcing tasks
Crowdsourcing tasks most frequently provided by platforms can be roughly grouped into three types: collecting, analyzing, and reviewing. Below we provide an overview of what you might consider when determining which of the most common tasks make sense for your project. For a broad overview of project types, see the “What is crowdsourcing in cultural heritage?” chapter, and for a more thorough review of tasks and their use in cultural heritage crowdsourcing, see the “Choosing tasks and workflows” chapter. While the platforms that support these tasks are subject to change, we are documenting them in this way to help you map between tasks and platform features.
Toolkit: considerations for selecting crowdsourcing tasks
Ease of submission process for participant
Utility and completeness of resulting data
Need for quality control
Submission formats and size
Subjective, complicated source material
Safeguarding for illegal or irrelevant submissions
How to provide context
Levels of participation, expertise required
Communicating rights management
Preferred pace of completion
Defining metadata needs
Segmenting tasks and multiple steps
Collecting tasks involve participants collecting data or generating the content, then sharing it with your project. These tasks can also include asking participants to share what they have (items) or what they know (context).
Asking your participants to contribute content to your project as part of a collecting task can have a considerable impact on the type of platform you select. User-generated content (UGC) can require additional workflows, increased storage space, and careful legal considerations to handle the influx of materials from outside of your organization. The more you can plan for the possible issues raised by this collecting task, the more likely it is that your selected platform will be able to handle the needs you have identified for your project.
Ease of uploading workflows — when considering the technologies that would be most appropriate for your project’s goals, you will want to think about the experience of your participants. How easy will it be for them to submit data and other materials to you?
Impact of file format and size — some media files, such as photos, videos, and audio recordings, can be very large. Will the platform you select be able to automatically process larger files and compress or reformat as necessary, or will participants be required to resize files before submitting?
Relevance and legality of content — as UGC projects in cultural heritage discovered in the mid-2000s,13 whenever you create an opportunity for the public to upload media files, some of the uploaded content could be irrelevant to your project or even inappropriate or illegal. Should your platform support the accountability and monitoring of UGC?
Rights management — when contributors upload content, they should have a good understanding of what will happen to their content and how it will be used, and some authority over how the material will be used or attributed. Can your platform record and track these rights decisions?
Metadata — when contributors upload content, they should also provide some basic metadata to attach to that content. Metadata will also need to be able to track contributor information. Platforms can provide help text for forms at the time of upload to guide participants in documenting this metadata.
While no single platform may offer the ideal solution for all of these concerns, knowing in advance how different tools and procedures will manage these needs can guide your platform selection.
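Some of the checks described above (file format, file size, and required metadata) can be automated at the point of upload. The sketch below is one hypothetical way to express such a policy in Python; the allowed extensions, the 50 MB limit, and the metadata field names are invented for illustration and are not drawn from any particular platform. Relevance and legality still need human review and are not covered here.

```python
from pathlib import PurePath

# Hypothetical project policy; adjust to what your platform accepts.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".mp3", ".wav"}
MAX_BYTES = 50 * 1024 * 1024  # 50 MB
REQUIRED_METADATA = ("title", "contributor", "rights_statement")


def check_submission(filename, size_bytes, metadata):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    extension = PurePath(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"File type {extension or '(none)'} is not accepted.")
    if size_bytes > MAX_BYTES:
        problems.append("File is larger than the 50 MB limit; please resize it.")
    for key in REQUIRED_METADATA:
        # Treat missing keys and blank values the same way.
        if not str(metadata.get(key, "")).strip():
            problems.append(f"Missing required field: {key}.")
    return problems


ok = check_submission(
    "harbour.JPG",
    2_000_000,
    {"title": "Harbour", "contributor": "A. Example", "rights_statement": "CC BY 4.0"},
)
bad = check_submission("notes.exe", 80 * 1024 * 1024, {"title": "Notes"})
print(ok)
print(bad)
```

Returning every problem at once, rather than stopping at the first, lets a participant fix a submission in a single pass.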
Analyzing tasks involve participants analyzing or interpreting content. These include tasks where participants type in or describe what they see, or share what they know about the project content. Common analyzing tasks include annotation, tagging, classification, transcription, correction, and translation. Increasingly within projects, analyzing tasks are either the result of or intended to create data to be used for artificial intelligence and machine learning workflows. In that case, the analyzing tasks that create data include segmenting or marking out specific areas on an image.
Some core considerations for adopting analyzing tasks include:
What will be the utility of the resulting data?
To what extent does the project require complete data?
What is the necessary pace of completion for you to reach your goals?
What are potential points at which your goals could be broken into a series of tasks?
How will you help participants perform the tasks at hand?
Can you provide scaffolding within the task or workflow, such as autocomplete and controlled vocabularies?
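As an illustration of the last point, autocomplete backed by a controlled vocabulary can be as simple as a case-insensitive prefix match. This is a minimal sketch with an invented vocabulary and hypothetical function names; real platforms typically implement this in the browser and against much larger term lists.

```python
# Hypothetical controlled vocabulary of place names.
VOCABULARY = ["Liverpool", "Lisbon", "Lima", "London", "Los Angeles"]


def suggest(prefix, vocabulary=VOCABULARY, limit=5):
    """Return vocabulary terms that start with what the participant typed."""
    typed = prefix.strip().casefold()
    if not typed:
        return []
    matches = [term for term in vocabulary if term.casefold().startswith(typed)]
    return sorted(matches)[:limit]


def is_valid(term, vocabulary=VOCABULARY):
    """Check a submitted value against the vocabulary, ignoring case."""
    return term.strip().casefold() in {entry.casefold() for entry in vocabulary}


print(suggest("li"))
print(suggest("lo"))
print(is_valid("lisbon"))
```

Pairing suggestions with a validity check means that values typed without using the suggestions can still be flagged for review rather than silently accepted.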
Reviewing tasks focus on reviewing, validating, or improving other inputs, such as previous contributions or the results of automated techniques. If quality control is an important goal in your project (versus, e.g., engaging or educating participants), or if the data or tasks are especially subjective, complex, or challenging, then you will want to take a careful look at the reviewing options afforded by different platforms. A detailed discussion of quality control methods is provided in the “Working with crowdsourced data” chapter.
One question to ask is who you believe is qualified to perform a reviewing task: for some projects, the project team prefers to provide the final round of vetting, while in others, it is appropriate for participants to peer review one another’s work. Sometimes the scale of collections or tasks means that very little coverage by team members is possible. Platform functionality may also affect your decision, as some platforms allow project teams to create or assign a group of advanced users for reviewing tasks — this group may include volunteer participants or people employed by the project. Other platforms allow any participant to review completed tasks, while still others provide no reviewing features.
Reviewing tasks might also ask participants to correct or verify the work of software processes. Increasingly, “human in the loop” systems integrate reviewing tasks into a wider process. For example, Transkribus uses machine learning to transcribe handwritten text but relies on human corrections to train and refine its model. Similarly, the New York Public Library Labs’ Building Inspector used computer vision for the first pass at georectifying fire insurance maps of New York City but relied on human volunteers to review and correct the AI-generated building outlines. Leveraging the complementary strengths of human effort and AI is discussed in more detail in the chapter on “Choosing tasks and workflows.”
What is possible? The platform experience
Choosing a platform to create a crowdsourcing project is about balancing project needs with the quality of the participant experience. In this section, we offer some guidance around the competitive analysis of platforms from the perspective of project creators as well as project participants. Platform capacities are continually evolving, with new features being rolled out and older designs regularly refreshed, so be mindful of this shifting landscape before making your final choice.
Project team (back end) experience
Evaluating the team- or creator-side (back end) interface of a platform can be tricky, as these features are not as commonly discussed in publications about cultural heritage crowdsourcing or reviews of digital projects. Recalling the “Identifying, aligning, and enacting values in your project” chapter, aside from the basic functional aspects of the back end experience, the decisions made here should align with your project values. This could be a question of accessibility, such as whether or not the platform code is open source, or it could involve broader political concerns, such as whether the platform contracts ethically with partners and suppliers. Examples include GitHub’s controversial decision to contract with ICE,14 the agency responsible for enforcing US immigration policy, and FamilySearch’s relationship with prison labor.15
Once you have undertaken the necessary due diligence on which platforms best align with your values, the next step is to commence preparatory research to compare your best-fit platform options. The best place to begin is with a solid recommendation or referral from a friend or colleague who may have hosted a project on one or many different platforms. If this draws a blank, you may still be able to benefit indirectly from this first-hand experience by reviewing published case studies or conference presentations. You are looking for honest reviews of the challenges and constraints of platforms, so probe a little deeper if all you find are glowing reports of how everything went perfectly. Other methods that may help you to determine whether a platform’s creator interface is appropriate for you and your team include:
Reading platform or project reviews16
Reading documentation and user forums
Reading case studies or examples provided by the platform17
Attending a training workshop or webinar18
Watching online videos19
Testing via a platform’s trial or sandbox version20
Once you feel like you have a good sense of a platform’s organizer interface, you may want to consider additional factors such as:
Does the platform enable you to collaborate with other organizers as you create your project?
Is the platform customizable, and if not, are you happy to live with the color, branding, and design scheme?
Is the platform free to use or on a sliding scale of cost? Are there any potential hidden costs, such as technical support?
How well supported is the platform in terms of tutorials, development, or other resources?
Does the platform provide hosting or does it require your project to be hosted on your organization’s servers?
Is the platform regularly updated and maintained? Will it be around for the duration of your project or any subsequent phases that you may be planning?
Beyond the tasks performed by the crowd themselves, another set of platform features helps project staff monitor, manage, and support the efforts of participants. Some large crowdsourced cultural heritage projects attract thousands or even tens of thousands of unique participants, but even in much smaller projects, it can quickly become overwhelming to keep track of which items and tasks are completed or behind schedule, and which participants are succeeding or struggling.
Crowd management features to look for may include: basic descriptive statistics for numbers of unique participants involved, the number of crowd tasks completed (or attempted), elapsed time per task, tools for removing problematic content or banning malicious participants, and options for receiving bug reports or feature requests. These features can make it easier to report on progress to team members, institutional leadership, and participants alike. For example, projects built using the Zooniverse Project Builder come with a project stats page, which includes information on the number of participants, a daily classification count, completeness metrics for any live workflows, and the number of posts on the project message boards. To determine more granular metrics such as elapsed time per task, however, you will need to extract this information from a data export using Python or other code.21
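As a rough idea of what that extraction might involve, the Python sketch below computes elapsed time per task from a data export. It assumes, hypothetically, that each row has a `metadata` column containing JSON with `started_at` and `finished_at` ISO 8601 timestamps and a `workflow_name` column; the sample rows are invented, so confirm the field names in your own export first.

```python
import csv
import io
import json
from datetime import datetime
from statistics import median

# Invented sample rows standing in for a real export file.
SAMPLE_EXPORT = '''classification_id,workflow_name,metadata
201,Transcribe,"{""started_at"": ""2021-03-01T10:00:00.000Z"", ""finished_at"": ""2021-03-01T10:02:30.000Z""}"
202,Transcribe,"{""started_at"": ""2021-03-01T11:15:00.000Z"", ""finished_at"": ""2021-03-01T11:16:00.000Z""}"
203,Tag,"{""started_at"": ""2021-03-02T09:00:00.000Z"", ""finished_at"": ""2021-03-02T09:00:20.000Z""}"
'''


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def elapsed_seconds_by_workflow(csv_text):
    """Group per-task durations (in seconds) by workflow name."""
    durations = {}
    for record in csv.DictReader(io.StringIO(csv_text)):
        meta = json.loads(record["metadata"])
        elapsed = parse_timestamp(meta["finished_at"]) - parse_timestamp(meta["started_at"])
        durations.setdefault(record["workflow_name"], []).append(elapsed.total_seconds())
    return durations


durations = elapsed_seconds_by_workflow(SAMPLE_EXPORT)
for workflow, seconds in durations.items():
    print(workflow, "median seconds per task:", median(seconds))
```

The median is used here rather than the mean because a participant who leaves a task open while stepping away can produce very long durations that would skew an average.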
More advanced features may include tools for sorting or filtering stats by time frame or categories (such as task type or content type), visualizations of participant behavior patterns, and alerts about bottlenecks or system performance issues (e.g., unusually high bandwidth usage, server lags, or large file sizes). Tools that are designed for general website usage analytics, such as Google Analytics and Mouseflow, can often be repurposed for crowd management, with good results.
Participant (front end) experience
It is also important to consider the experience of your participants when evaluating platform options for your project. In this section, we will offer guidance around how to evaluate platforms based on the participant experience and interface. A good way to commence this research is to test out as many of these platforms as you can as a volunteer. Find diverse examples of projects that have been built with the platform you are considering. By looking at a range of projects, you can get a sense of what tools are available on a particular platform. Particularly consider the following:
All projects on a platform may be set up in the same way, or different tools may be available depending on the project goals. Some platforms are set up to support a specific task type (collecting, analyzing, or reviewing). For example, Scripto supports transcription projects only, while the Zooniverse Project Builder supports many different task types, including multiple options for text transcription. It is simply a matter of determining what level of customization your project actually requires, and making your platform decision based on your project goals.
You can conduct usability tests on existing platforms by recruiting volunteers to “think aloud” as they go from the project’s front page to completing a task. Krug provides templates for this process.22
Does the platform include a way to communicate with your volunteers, e.g., a forum or discussion board? Can volunteers communicate with one another in this space? Some conversations may happen on social media instead of project forums, but if engaging with your volunteers is part of your project values or goals, a forum may be a project requirement.
If the example projects you are reviewing include a message board or community forum, reading through the posts can help you to get a sense of the community’s response to the project. If the message boards seem to feature a lot of negative or critical feedback about the project tools or have high occurrences of bug reporting, you may want to avoid these options as you make your platform choice.
What kind of access will your participants have to the contributions they have made, and/or their contribution metrics? For example, can they download their data?
How much technical support will the platform provide while your project is active? If your participants are experiencing technical difficulties, is it clear who they will need to ask?
Does the platform meet legal accessibility standards? Does it meet any additional standards relevant to your organization or values?
Is the platform usable on mobile devices? Can tasks be accomplished in mobile views? If inclusion is a project value, this may be a concern for those accessing the internet primarily via mobile devices.
Preparation, pipelines, and customization
One more set of considerations before you choose your platform: do you need machine learning, artificial intelligence, or other tools to help you prepare the data that will be at the heart of your project?
Integrating computational analysis
Computational methods can help with both classification and identification. If your items include text, then Natural Language Processing (NLP) tools can use AI techniques to automatically categorize content in various ways. For example, Named Entity Recognition (NER) can quickly scan large quantities of text and automatically identify and categorize entities such as the names of people, places, and organizations, along with dates, amounts of money, and phone numbers, so that they can be easily extracted for closer inspection. These entities could also be repurposed as categories or tags. The power of this technique is that you do not need to specify the names you are looking for ahead of time; NLP tools will try to report all the names or other entities they find.
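As a minimal sketch of how extracted entities could be repurposed as tags, the Python below assumes that an NER tool has already returned (document id, entity text, label) triples. The sample data, the label names (PERSON, GPE, DATE), and the function name `entities_to_tags` are illustrative assumptions rather than the output or API of any particular tool.

```python
from collections import defaultdict

# Hypothetical NER output: (document id, entity text, entity label).
# A real tool would produce these; the values here are invented for illustration.
SAMPLE_ENTITIES = [
    ("letter-001", "Mary Smith", "PERSON"),
    ("letter-001", "Boston", "GPE"),
    ("letter-002", "Boston", "GPE"),
    ("letter-002", "mary smith", "PERSON"),
    ("letter-002", "4 July 1863", "DATE"),
]


def entities_to_tags(entities, keep_labels=("PERSON", "GPE")):
    """Group document ids under normalized 'label:text' tags."""
    tags = defaultdict(set)
    for doc_id, text, label in entities:
        if label not in keep_labels:
            continue  # e.g. skip dates if they are not useful as tags
        # Normalize whitespace and case so variant spellings share one tag.
        tag = f"{label.lower()}:{' '.join(text.split()).lower()}"
        tags[tag].add(doc_id)
    # Sort for stable, readable output.
    return {tag: sorted(docs) for tag, docs in sorted(tags.items())}


tag_index = entities_to_tags(SAMPLE_ENTITIES)
for tag, docs in tag_index.items():
    print(tag, "->", ", ".join(docs))
```

Lower-casing is a deliberately crude way to merge variants; a real project would probably want participants to confirm that two similar names refer to the same person or place before merging them.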
For identifying entities in photos, several large tech companies provide cloud-based computer vision services that attempt to automatically identify the contents of images. Google Cloud’s Vision service, for example, can rapidly analyze thousands of photos and automatically generate labels describing them, such as “street,” “snapshot,” “town,” and “alley.” These tend not to be very specific, but the same service can also suggest specific places and locations if it recognizes them; these are typically highly recognizable locations with familiar landmarks. If your photos have people in them, AI services can automatically detect (and count) faces, giving you a very rapid sense of which photos in a large collection have people in them. They can also attempt to assign characteristics such as gender, age, and emotion. A few platforms take this idea even further to support face recognition, allowing the user to compare a detected face to a reference database of hundreds or thousands of known faces. Most of these services require the project creator to provide their own reference database, so you can decide which photos are appropriate to include in the search.
However, as with any AI tool, you should be aware of potential biases, including a focus on safe-for-work/not-safe-for-work classifications, a reliance on social media data that reflects social inequalities and prejudices23 and a 20th-/21st-century view and definition of concepts worth identifying.24 Any information added by AI-based tools should be reviewed against your values to ensure that any violations of those values are not included in your results.
If you find that existing platform features seem like they might not perfectly fit your needs, do not be alarmed. One platform will rarely solve every problem. But it is not necessarily the case that you need to embark on the arduous journey of building a team and seeking funding to create a bespoke software platform (although some project creators do, and it can be an exciting and rewarding experience!).
More often, you can make what you need through appropriation. Appropriation involves creative reuse and remixing of existing tools, such as adapting a tool for a slightly different purpose from what its designers intended or combining multiple tools to fill gaps that any one platform cannot address.25 For example, your team may design a pipeline or workflow where one tool is used to collect items from contributors, another is used to annotate and translate them, and a third is used to present the results to the public. Taking time to understand the landscape of platforms and their affordances and limitations, as we have outlined in this chapter, can help you figure out where these opportunities for appropriation lie.
In many cases, you will already have some data or image processing tasks that precede your crowdsourcing task as the first parts of your pipeline. Processing data for re-use in other systems may conclude your pipeline. Platforms built with pipelines in mind include FromThePage and Madoc.26
Remember the advice from the introduction of this chapter: more technology is not necessarily better, and simple and flexible tools can be surprisingly powerful. The utility of basic web-based productivity tools, such as Google Docs or Microsoft Office 365, should not be underestimated for providing the glue and duct tape to allow different, otherwise incompatible tools to work together.27 For example, in The American Soldier in World War II project, the team set up an Omeka and Scripto instance to allow participants to review and validate transcripts generated from a Zooniverse project.28 There was no easy way for the team to divide up the validation tasks among dozens of participants, so they simply created a shared Google Sheet with a list of URLs to each document, grouped by participant. Likewise, the humble tagging feature can be immensely powerful and flexible, and make up for many “missing” features in other platforms. If you can add tags to documents, then you can track their status in a data pipeline, enhance them with metadata to enable new types of searches, assign them as tasks to participants, and much more. It can be a helpful exercise, when seeking out a new feature, to first ask, “could I appropriate tags for that function?”
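To make the tagging idea concrete, here is a small Python sketch, under stated assumptions, that uses tags to find documents awaiting validation, shares them among participants in turn, and produces CSV text that could be pasted into a shared spreadsheet. The URLs, participant names, and tag vocabulary (`status:…`, `assigned:…`) are hypothetical placeholders, and this is not a reconstruction of how The American Soldier in World War II team built their sheet.

```python
import csv
import io
from itertools import cycle

# Hypothetical documents with free-form tags; URLs and names are placeholders.
DOCUMENTS = [
    {"url": "https://example.org/doc/1", "tags": {"status:transcribed"}},
    {"url": "https://example.org/doc/2", "tags": {"status:validated"}},
    {"url": "https://example.org/doc/3", "tags": {"status:transcribed"}},
    {"url": "https://example.org/doc/4", "tags": {"status:transcribed"}},
]
PARTICIPANTS = ["participant-a", "participant-b"]


def assign_by_tag(documents, participants, wanted_tag):
    """Share documents carrying wanted_tag among participants, round-robin."""
    pending = [d for d in documents if wanted_tag in d["tags"]]
    rows = []
    for doc, person in zip(pending, cycle(participants)):
        # Record the assignment as a tag too, so the tags track who has what.
        doc["tags"].add(f"assigned:{person}")
        rows.append((person, doc["url"]))
    return sorted(rows)  # sorting groups the rows by participant


def to_csv(rows):
    """Render assignments as CSV text suitable for a shared spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["participant", "url"])
    writer.writerows(rows)
    return buffer.getvalue()


assignments = assign_by_tag(DOCUMENTS, PARTICIPANTS, "status:transcribed")
print(to_csv(assignments))
```

Because the assignment is itself stored as a tag, a platform that offers nothing more than tagging and tag-based search could still show each participant their own queue.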
Another way to think about pipelines for platform selection is to consider the complementary strengths of automated techniques and human participants — a topic we explore in more depth in the chapter on “Choosing tasks and workflows.” AI technologies can often reduce or streamline the work of participants, but they can rarely complete the task on their own. You need not see this as an insurmountable limitation. Instead, it can be helpful to think about how AI or automated techniques can provide a “first pass” on the data, allowing humans to correct the results.
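One way to picture a “first pass” is as a triage step: the sketch below, in Python, splits automated results into a queue that participants only spot-check and a queue that they correct in full. The item names, the confidence scores, and the 0.90 threshold are all assumptions for illustration; real tools report confidence in different ways, and some do not report it at all.

```python
# Hypothetical machine-generated transcriptions with a confidence score (0 to 1).
FIRST_PASS = [
    {"item": "page-01", "text": "Dear Sir,", "confidence": 0.97},
    {"item": "page-02", "text": "I recieved yr lettr", "confidence": 0.58},
    {"item": "page-03", "text": "", "confidence": 0.10},
]

REVIEW_THRESHOLD = 0.90  # assumed cut-off; tune it against a hand-checked sample


def triage(results, threshold=REVIEW_THRESHOLD):
    """Split automated results into spot-check and full-correction queues."""
    spot_check, needs_correction = [], []
    for result in results:
        queue = spot_check if result["confidence"] >= threshold else needs_correction
        queue.append(result["item"])
    return spot_check, needs_correction


spot_check, needs_correction = triage(FIRST_PASS)
print("Spot-check:", spot_check)
print("Correct by hand:", needs_correction)
```

Both queues still go to people: the automated pass changes how much attention each item needs, not whether a human sees it.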
Perspective: the DigVentures collaborative platform for archaeology — Brendon Wilkins, Co-founder at DigVentures
Archaeology is a niche discipline, with specific ethical and logistical challenges around building meaningful participation into research projects. When it came to choosing a platform to crowdsource projects for DigVentures,29 a social enterprise with a mission to expand civic engagement with archaeological research, we decided to take the pipeline approach. Rather than take an off-the-peg solution, we developed a suite of digital tools to harness the crowd’s passion for archaeology to help fund, identify, dig, record, and research sites. As opposed to a generalized platform that serves all kinds of different projects in the cultural heritage or citizen science space, we have niched on a single discipline, archaeology, to build and deepen our community around a common interest. This approach is not without its challenges, not least of which is ensuring a high degree of technical expertise on your team, as well as the financial resources to maintain and update your platform. But the potential to create a unique solution for you and your crowd can be hugely impactful, building digital participation into your organisational DNA.
Having considered your project goals and objectives and read through these lists of tasks and features, you might find there is an existing tool or platform that already does just what you need. Fantastic! Many prospective project creators have similar high-level goals for basic crowdsourcing projects, and many of the tools and examples we have discussed above were built with the explicit goal of making the most common scenarios more turn-key. If you are in that group, you might be ready to move on to the “Choosing tasks and workflows” chapter. Otherwise, it is now time to consider the options available to you. Weigh them against your project values, goals, and objectives. Consider your resources and whether you will use or adapt existing tools, or if customization is right for you.
We hope this chapter provided useful dimensions to consider as you continue to design your project. Remember to keep your values and project objectives at the forefront as you map your technical choices to the resources at hand. The usability and experience of your participants and the utility of the resulting data from these tasks and platforms are key dimensions. We encourage you to design iteratively and return to these considerations regularly. This process will contain a fair amount of trial and error. It is likely to be frustrating at times, but the results can be transformative for you and your community of participants.
While this chapter focused on tasks and platforms, we emphasize that they are affordances for human creativity and experiences. Our focus can and should remain on the experience of participants and project team members. In seeking the right selection of tools for your project, we recommend you consider whether there are ways to compromise on the technology, rather than on the experience of the participants. You are invited to dig deeper into participants’ motivations in “Understanding and connecting to participant motivations,” or continue on to “Managing cultural heritage crowdsourcing projects.”