Xinran Shen: Midterm Portfolio – #5 Audio Descriptions of Captions – Accessibility As Creative Practice: Spring 24

Project Description:

In this piece, I tried to partially abandon logic and use a more emotional, intuition-based approach to remix the caption texts with a new audio voiceover. I took some key parts from the original captions for my assignment 1 (Furina’s character demo video), cut and rephrased some of them, repeated certain words, and collected stock videos that could interpret these words in different ways. Then I assembled them with a child’s voice reading the caption texts. I would say the final draft is not a cohesive, narrative-driven story but a collage-like stream of consciousness that describes a painful yet dumb mental state.

Documentation:

Transcript:

AD: A woman’s eyes gaze into the camera, and in a moment of closure, a tear trickles down from the corner of her eye.

Child voiceover: vocal cords ready, blood sugar replenished. let the show begin.

AD: Switching scenes, on a grassy field, a round, intact mirror reflects the sky, as a hand holding a wildflower slowly reaches into the frame from one side of the mirror.

AD: In the broken mirror, a woman gazes at and caresses her fragmented reflection.

AD: Returning to the previous scene, in the intact mirror, the wildflower sways in the hand.

AD: Switching back to the broken mirror scene, the woman’s hand touches the shattered glass.

Child voiceover: It was a lively theme that sounds like musical.

AD: Switching scenes, time flows slowly through an hourglass.

AD: A close-up of the hourglass.

Child voiceover: Here the melody becomes slightly melancholic, yet the motif is not yet resolved.

AD: Switching scenes, a woman’s hand holds broken glass.

AD: Switching scenes, two toys lie by the bedside: a teddy bear

Child voiceover: Gentilhomme Usher

AD: and a toy dog.

Child voiceover: Surintendante Chevalmarin.

AD: A damaged doll holds the teddy bear, and the scene fades out.

Child voiceover: Mademoiselle Crabaletta.

AD: A sequence of scenes is about to play. Switching scenes, a hand holds a red marker and writes Bye XX on a mirror.

Child voiceover: The world is but a stage, why cry when you can laugh instead

Child voiceover: laughter is humanity’s reserve.

AD: Lipstick draws an exaggerated smile.

AD: Broken mirrors.

AD: Hands trying to breach a thin film.

AD: A woman dances.

Child voiceover: laugh it all off, fret not, let’s just enjoy the moment

AD: Eyes shedding tears. (yet the motif is not yet resolved)

AD: Peeking through curtains. (not yet resolved)

AD: A woman gazes at herself in a shattered mirror.

AD: Continuing to peek. Party scenes.

AD: A woman looks upward to the left.

AD: A drowning woman. (World is a stage.)

AD: Numerous laughing phantoms. (World is a stage.)

AD: The drowning woman again.

AD: A hunched saint angel statue.

AD: The dance continues.

Child voiceover: Welcome to the most spectacular show.

AD: Hands emerge amidst psychedelic lights.

AD: The dancing woman reaches straight out towards the camera.

Reflection questions:

Artistically, I tried to make the text “sing” with the background music, which is the cadenza movement of Shostakovich’s Cello Concerto No. 1. For example, I intentionally let “Not yet resolved” and “World is a stage” become the lyrics for the 4-note motif and its variation in this music, so that they could still match the somewhat dissonant and irregular melody and rhythm.

In terms of accessibility, I think there is still room for improvement, and the current version also excludes BLV audiences. I spent a long time exploring and experimenting with the idea in the beginning phase and only got the chance to refine the artistic idea at the end. However, I still added captions for the voiceover and the background music. If I had more time, I would make audio descriptions for the visuals as well, and also add sound effects from freesound.org to go with each clip, so that more context could be given to BLV audiences. The latter might also enhance this piece artistically if done properly.

While experimenting with converting visual captions into audio voiceovers, I felt lost midway because the visual captions had mostly been converted from audio in the first place (in my case, most of the captions in assignment 1 were for character lines and background music). It was challenging to think about how to make this remix more than a simple copy of the original video. Thus, I chose to leave behind much of the Furina-specific content in the original caption texts and picked out only the vague parts, so that these texts could convey new meanings.

Additional Modality:

For better accessibility for BLV audiences, I added audio descriptions to this video. I generated a normal-speed AI female voice for these because I want people who are not used to listening at 2x speed to be able to understand them as well. However, I realized that I would have to freeze frames to leave room for the audio descriptions, which would disrupt the rhythm of the background music and its synchronization with the voiceover and the visuals. So I decided to make the audio descriptions very concise and place them between the individual scenes, so that they become part of the music, which gets furious near the end, while still maintaining legibility.