First Round Transcription Results

Now that we have completed our first round of transcription assignments, I wanted to take stock of the progress we have made. Crowdsourcing in digital humanities projects is risky business—often leaving little or nothing in the way of useful results—so we owe it to the field and our supporters to be transparent about our experiment and our results. Thankfully, I am proud to be able to report truly astonishing success! Thanks to the generous contributions from our volunteer transcribers, we have made excellent progress already in this first round.

By way of background, I divided Psalms 1-50 into five sections that are roughly equal in size: Pss 1-15, 16-25, 26-34, 35-40, and 41-50. For each section of each manuscript, we require two independent transcriptions, which makes for 10 assignment slots per manuscript (unless it is lacunose for parts, like the Aleppo Codex [MA]). By tracking hours during our Psalm 22 pilot project, I was able to estimate that each of these assignments should take approximately 20 hours of transcriber time. Factoring in about 5 hours for setup and training, each first assignment should take about 25 hours to complete. The transcribers would have about 1.5 to 2 months to finish the assignments at their own pace.
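
For anyone who wants the arithmetic spelled out, here is a minimal sketch of how those per-manuscript figures come together (illustrative only, not project code):

```python
# Illustrative arithmetic only -- not project code.
sections = ["Pss 1-15", "Pss 16-25", "Pss 26-34", "Pss 35-40", "Pss 41-50"]
transcriptions_per_section = 2  # two independent transcriptions per section

# 5 sections x 2 transcriptions = 10 assignment slots per manuscript
slots_per_manuscript = len(sections) * transcriptions_per_section

hours_transcribing = 20  # estimated from the Psalm 22 pilot
hours_setup = 5          # setup and training for a first assignment
hours_first_assignment = hours_transcribing + hours_setup

print(slots_per_manuscript, hours_first_assignment)  # -> 10 25
```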

With project management help from Caleb Punt, we distributed these assignments on our forum, matching them to transcribers’ reported competencies. It was critical to ensure we had enough material from Pss 1-15 to begin processing these psalms immediately, so I built in redundancy by assigning more transcribers than necessary to the slots for Pss 1-15 (and a few others). We also selected a team of experienced scholars to transcribe the very difficult Berlin Codex (MZ)—which presents the Babylonian Masorah and is badly damaged—and planned some redundancy there as well. The response from the transcribers was amazing!

 

Indexing

We were able to complete the full indexing for every one of the manuscripts, identifying which verses are on which pages.

 

Completed Assignments

In all, we distributed a total of 120 first-round assignments to transcribers who had set up accounts in the Virtual Manuscript Room (VMR). 68 of these (57%) were completed on time and according to project standards! 50 out of 88 (57%) assignments for Hebrew manuscripts were completed, as were 18 out of 32 (56%) Greek assignments. At the estimated average of 25 hours per assignment (68 x 25 = 1700 volunteer hours), the transcribers together completed roughly a year’s worth of full-time work for an individual researcher in just a couple of summer months!
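
Spelled out explicitly (an illustrative recap of the figures above, not project code):

```python
# Completion figures from the first round -- illustrative recap only.
completed_hebrew, assigned_hebrew = 50, 88
completed_greek, assigned_greek = 18, 32

completed_total = completed_hebrew + completed_greek    # 68
assigned_total = assigned_hebrew + assigned_greek       # 120
completion_rate = completed_total / assigned_total      # ~0.57

hours_per_assignment = 25                                # estimated average
volunteer_hours = completed_total * hours_per_assignment # 1700

print(f"{completion_rate:.0%} completed, ~{volunteer_hours} volunteer hours")
# -> 57% completed, ~1700 volunteer hours
```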

Of these completed assignments, only 5 Hebrew and 1 Greek (= 9%) turned out to be redundant. But even these are of value, because reconcilers can select the best two transcriptions to use in cases of redundancy.

 

Manuscript Coverage

For the 11 major manuscripts we chose to include, there were a total of 100 assignment slots required for full coverage, 62 Hebrew and 38 Greek. All of the Hebrew slots were assigned to transcribers, as were 29 out of the 38 required Greek slots. For the Hebrew assignments (which had greater redundancy built in), 45 distinct slots (73%) were filled with completed transcriptions. The Greek transcribers filled 17 slots (59% of assigned slots; 45% of total slots). This makes for a grand total of 62 completed assignment slots, accounting for 62% of the total project need. Not bad for a first round! Significantly, all of the manuscripts except GA had at least two completed assignments for Pss 1-15, which allowed us to begin reconciling transcriptions right away.
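
Since the Greek percentages use two different denominators (assigned slots versus total slots), here is the same arithmetic laid out explicitly (again, just an illustrative recap, not project code):

```python
# Coverage figures from the first round -- illustrative recap only.
hebrew_slots, greek_slots = 62, 38        # slots needed for full coverage
total_slots = hebrew_slots + greek_slots  # 100
greek_assigned = 29                        # Greek slots actually assigned

hebrew_filled, greek_filled = 45, 17

print(f"{hebrew_filled / hebrew_slots:.0%}")   # 73% of Hebrew slots filled
print(f"{greek_filled / greek_assigned:.0%}")  # 59% of assigned Greek slots
print(f"{greek_filled / greek_slots:.0%}")     # 45% of all Greek slots
print(f"{(hebrew_filled + greek_filled) / total_slots:.0%}")  # 62% overall
```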

 

Quality Control

This progress was indeed remarkable and exceeded even my own optimistic expectations. But it raises the question of quality control. How useful and accurate are the data that were produced? There were several steps we took to ensure accuracy and compliance with project standards:

  1. Training - All transcribers underwent training in working with manuscripts and in electronic transcription (live sessions were recorded for later viewing).
  2. Transcription manual - I wrote a detailed transcription manual for reference with instructions defining all project standards.
  3. Checking first pages - Transcribers had the option of requesting a more experienced transcriber to check their first transcription page upon completion to provide feedback and guidance for going forward.
  4. Forum - I set up a forum category for posting and answering questions regarding the transcription process, project standards, difficult passages, and website bugs.
  5. Checks and revisions - When transcribers reported their assignments as complete, I checked their work quickly and requested revisions where project standards were not yet met. After revisions, these assignments were marked as complete.
  6. Reconciliation - We have now begun the reconciliation process, which provides another critical quality control measure. A select team of experts has the job of comparing two completed transcriptions using the VMR’s reconciliation tool and adjudicating wherever the two transcriptions disagree (a toy illustration of the idea follows this list). This process is intended to correct any remaining errors in the individual transcriptions and to ensure adherence to project standards. So, unless both transcribers independently made the same mistake, the resulting approved project transcription should reflect the text and layout of the manuscript accurately. These project transcriptions are what will be published on the website and used for the digital edition.
  7. Proofreading and editing - In the future, we hope to have specialists proofread the project transcriptions against the manuscripts to identify any remaining errors and inconsistencies in meeting project standards. Additional problems may be flagged in the collation editing process. In these cases, we can edit existing project transcriptions directly.
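
To make the reconciliation idea in item 6 concrete, here is a toy sketch of the underlying principle. This is not the VMR’s reconciliation tool, and the variant shown is a hypothetical example of a spelling difference between two transcriptions of Ps 1:1:

```python
# Toy illustration of reconciliation: flag every point where two independent
# transcriptions of the same verse disagree, so a reconciler can adjudicate
# against the manuscript. Not the VMR tool; the variant below is hypothetical.
from difflib import SequenceMatcher

transcription_a = "אשרי האיש אשר לא הלך בעצת רשעים".split()
transcription_b = "אשרי האיש אשר לוא הלך בעצת רשעים".split()

matcher = SequenceMatcher(a=transcription_a, b=transcription_b)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print("disagreement:", transcription_a[i1:i2], "vs", transcription_b[j1:j2])
# -> disagreement: ['לא'] vs ['לוא']
```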

In a project of this magnitude, there will inevitably be errors and inconsistencies in the data. But these quality control measures have proven both manageable and effective in ensuring a high-quality data set of manuscript transcriptions.

 

Feedback

In order to further evaluate our process, I sent a Google form to the transcribers soliciting feedback. We received 18 responses from transcribers, all of whom had completed their first assignments. These responses indicated a very high level of volunteer satisfaction. 

 

Overall, the transcribers were very satisfied with the tools and process as well. However, we received useful critical feedback in the following areas:

  1. Timely feedback - Because we struggled to manage the massive influx of volunteers at first, it took longer than expected to distribute the assignments and to provide feedback to transcribers (e.g., checking first pages, replying to forum posts). We have brought Caleb Punt on board to help with project management going forward, and we expect subsequent rounds to be more manageable.
  2. Technical glitches - Some users experienced technical glitches on the website. In rare cases, this required redoing previously completed work. We have invited volunteers with software development skills to join a debugging team to help alleviate these problems in the future.

 

Retention

Of the 68 transcribers who completed their first assignments, 42 (62%) actively volunteered for a second round of assignments. This nicely matches the US national average for volunteer retention of about 65%. It reflects a high level of volunteer satisfaction (evident in the feedback), based on the perceived value of the work as an enjoyable learning experience and as meaningful service to the field. Furthermore, in the two months since the end of the first round of transcription, 13 new volunteers have registered to do transcriptions, which suggests that our initial push did not exhaust the pool of willing volunteers. In addition, several repeat transcribers have signed up to do more extensive transcription work for independent study credit, which considerably increases our capacity for further transcription.

 

Next Steps

We have recently released a second round of transcription assignments that aims to fill the remaining gaps in our coverage. Almost all remaining slots are currently assigned, so if the transcribers complete their tasks again this round, then we will essentially have full coverage for all of the originally selected manuscripts.

Our continued volunteer capacity has simultaneously allowed us to expand the corpus of manuscripts included. With help from Vince Beiler, we were recently able to add 8 new model Masoretic codices from the Firkovich collections in St. Petersburg that date from around the 10th-11th centuries. These are very substantial manuscripts that are just as old as the Aleppo and Leningrad codices, but have never before been transcribed or thoroughly studied for purposes of a critical edition. Ben Outhwaite and Kim Phillips have also helped us identify a selection of the most important fragments from the Cairo Genizah that we hope to include in the future. Thus, this crowdsourcing experiment has not only helped us achieve our project goals more quickly and efficiently, but also opened up new possibilities to make the edition even more robust.

In parallel with the ongoing second round of transcriptions, we have already started reconciling transcriptions for Pss 1-15 with the help of a select team of skilled and experienced volunteer transcribers. And I plan to begin editing a preliminary edition of Pss 1-15 in the coming weeks based on these transcriptions. Troy Griffitts and I will report on our progress and tools at an SBL Digital Humanities session on November 25, so please do join us for further updates and reflections.

 

Of course, none of this would have been possible without the support of our amazing volunteers, who have made this crowdsourcing experiment such a success. So I want to conclude this post with a big “Thank you!” and “Congratulations!” to all of you!
