Wrecked Beach to Recognized Speech: AI Transcription Correction

It’s an oft-repeated anecdote that when demonstrating the first vocoders (devices that compress audio signals containing speech), Manfred Schroeder, a German physicist at Bell Laboratories, noted that the machines and the people hearing them repeatedly tangled themselves in homophonic webs: the word “relationship,” clipped and compressed for transmission, sounded like “real Asian ship.” Ironically, the phrase “How to recognize speech,” fed into a vocoder and heard from the other end, disassembled into the nonsensical “How to wreck a nice beach.”

As stewards and creators of digitized archival material, our first challenge lies in capturing original objects as they are. The second challenge is presenting that object through the screen to scholars in a way that provides as much context and clarity as possible. One area where natural language processing (NLP) and machine learning have flourished is transcription and captioning, and DLTS recently partnered with the service Konch AI to provide captioning and transcriptions for select archival collections. Properly storing a 50-year-old audio cassette is good; digitizing that cassette for modern playback methods is better; using machine learning software to provide an accurate transcript and ADA-compliant captioning for that audio file is better still. After the initial machine processing, manual human corrections help Konch bridge the clarity gap between “wreck a nice beach” and “recognize speech.”

In May of this year, with the lockdown in effect and our usual photographic services on hold, my student photographers and I were drafted into an effort to help correct Konch-processed transcripts from the University Archive’s “Soul of Reason” collection. “Soul of Reason” was a half-hour radio show that aired from 1971 to 1986, broadcast jointly on WNBC and WNYU. Dr. Roscoe C. Brown, the host of the show, was the director of NYU’s Institute of Afro-American Affairs, and the topics and guests covered the entire fabric of New York’s cultural, educational, social and political organizations. The guest list includes jazz visionaries such as Cecil Taylor and Mary Lou Williams, the brilliant playwright and director Bill Gunn (“Ganja & Hess”, “Black Picture Show”), as well as members of New York’s business and social networks. Dr. Brown’s conversational style ranges from chummy to interrogative, and his mellifluous New York accent and wry asides proved difficult for Konch to transcribe with 100% accuracy. Interviews with multiple guests, such as the spirited discussion between Dr. Brown, George Sparks and William Alston about the role of Black Mason lodges in their communities and in the larger Freemason network, were particularly difficult for the automated processing.

Our task: correct the transcripts for grammar and clarity, while preserving the conversational feeling from inside the studio. Of paramount importance was making sure that the captions correctly identified who was speaking, and editing the transcript to make sure identifiers, business names and organizations, and the titles of works of art were accurate.

One of the first things we noticed was that Konch wasn’t recognizing when a question was asked, one of the hallmarks of an interview program!

Screenshot of Konch editing interface.

Every program begins and ends with a short theme song and an introductory bumper, which had to be hand-corrected each time. As for proper nouns, Konch correctly identified the names of national organizations, but was often stymied by references to plays, books and films. In the interviews from the late 1970s, Dr. Brown frequently mentions Sidney Lumet’s 1978 film “The Wiz,” but we never saw it captured accurately in the transcript. In the example below, director Bill Gunn is talking about his independent cult-favorite vampire film, “Ganja & Hess.”

Screenshot of Konch editing interface.

The difference between an automatic and a hand-corrected transcript is vast.

There will be more posts from our colleagues from Archives & Special Collections about other aspects of the project, including the crowd-sourced transcription events that they spearheaded. In the meantime, read more about the collection and ongoing efforts to amplify the “Soul of Reason” collection.
