Voices Replacing Visuals
The other day I was getting ready to go out. As I prepared to take a shower, I realized that I was out of shampoo, but I remembered I had a new bottle under the sink. As I opened the cupboard door, a loud voice issued from the depths of the cupboard demanding, “Are you ready?”
One might think that this voice heralded the beginning of a horror movie scene in which some alien creature burst from beneath the counter and inhabited my body, or a Narnia-esque scenario in which I was invited to enter a fairy-tale land through the sink’s plumbing, but in fact this occurrence was much more mundane. The voice actually originated from my bathroom scale, and my response was to snarl, “No, I am not ready,” slam the cupboard door shut, and wait for the defeated voice to reply, “powering down.”
My life is full of voices issuing from everyday devices. My iPhone, computer, iPod, voice recorder, bathroom scale, thermometer, embosser (braille printer), Apple Watch, Apple TV, and electronic dictionary all speak. They have a range of voices, from my embosser’s pre-recorded voice, which has an outrageous Swedish accent that I mock every time I turn it on, to my computer’s synthesised voice, which sounds like a robot, to my phone’s voice, which is synthetic but sounds almost human. I also experience pre-recorded human voices in the form of audio books and audio description tracks on movies and TV shows. These recorded and programmed voices enable me to navigate and interact with our increasingly visually oriented world alongside my sighted peers.
For instance, VoiceOver, the screen-reading program on my iPhone, reads the text on my phone’s screen to me. I simply drag my finger around the screen and it speaks whatever is under my finger. Through the use of other modified touch-screen gestures, I can completely navigate my phone’s interface and many common apps, including Facebook, Twitter, and YouTube. If you possess an iPhone, you can give this a try, too. Simply ask Siri to turn on VoiceOver. Try moving your finger around the display or flicking left and right with your finger to move from item to item. When you’re finished, just ask Siri to turn VoiceOver off.
I do not often consider how these voices must seem to sighted people who do not need to interact with or utilize them. They have been an integral part of nearly every facet of my existence for much of my life. From the moment I wake up and reach for my phone to check Twitter, to the moment I crawl back into bed at night and navigate my iPod to turn on an audiobook, my life is permeated with voices that most people do not experience emanating from devices. I imagine these voices probably provide some quick amusement and entertainment for sighted people when they try them out. However, while sighted people view them as a novelty, I require them to survive in our world.
Richard Marsh’s 1898 short story “The Adventure of the Phonograph” contains a somewhat similar situation. While Tress and Pugh view the phonograph as “the latest thing in playthings” (Marsh 1), an object of curiosity (Marsh 1), a “pretty sort of plaything” (Marsh 2), and a “pleasing novelty” (Marsh 3), the original owners of the phonograph utilize it to earn their livelihood (Marsh 8). Thus, a certain classism exists around objects that speak. For those who have the privilege of not needing them, they are an entertainment that can be easily put aside, just as Tress quickly forgets about the phonograph and the voice by the next day (Marsh 4). However, for those who need the voice to survive for whatever reason, the voice does not exist as a plaything, but instead as a necessity.
I am reminded of an experience I had a few years ago. My family had settled down to watch a movie. Luckily for me, the movie contained a descriptive video track which described the action of the characters, the scenery, and any other relevant visual aspects of the movie. As soon as the movie started, and a pleasant voice started describing the opening scene of the movie, my stepbrother immediately started mocking the voice. His enjoyment at hearing the voice lasted about thirty seconds before he started whining, “Do we really have to listen to this?” Even though I needed the description, I understood his complaints. Because the descriptive audio tracks must be placed in breaks in the dialogue, often events are described slightly before or after they visually occur in the movie, which I am sure creates an uncanny viewing experience for those with sight.
Despite the uncanny nature of the descriptive video voice, I continue to utilize it because it enables me to experience a movie or TV show at nearly the same level as my sighted peers: the descriptions paint images of the action in my mind. Tress has an experience like this when he first listens to the phonograph: “Heard it! I should think I did. I seem to have seen it, too. I almost feel as if I had been the cold-blooded witness of a murder, or of something very like one” (Marsh 3). Tress’s mere act of listening to a recorded voice “almost” provides him with a visual experience. This is what drew me to this story. It is a story where sound deeply impacts people. It is a mystery of voices, not of visual evidence. It involves a person clutching listening tubes to their ears while they listen deeply for meaning in recorded voices. I frequently have headphones trailing from my ears, and I can relate to Pugh’s startled reaction when Tress walks in on him listening deeply to the phonograph, as people frequently startle me when I am concentrating intensely on what my phone or computer is telling me (Marsh 1).
It is a rare thing for characters in a story to have an auditory stimulus move them as deeply as the voice recording affects Pugh. Our society tends to place the most value on sight and visual stimuli. In fact, while Pugh and Tress both experience intense emotions while listening to the phonograph record, they immediately revert to visual means to sort through and quantify their experiences. Tress dictates the contents of the phonograph record to Pugh who writes them down (Marsh 3). When Pugh confronts John Clinch, he shows him the sheet of paper with the transcript of the record instead of requesting him to listen to the phonograph itself (Marsh 12). As well, Pugh’s “final attempt to prove that murder had been done” involves him confronting Jane Clinch with the record of her words (Marsh 16-17), not the auditory record of her voice. A possible explanation for this rapid reversion to visual technology, beyond society’s automatic prioritization of visuals over the other senses, is that the phonograph is a new technology to Tress and Pugh. Tress mentions that he counts himself among the “decent, respectable people—who never have seen a phonograph” (Marsh 1).
Since listening to recorded and synthesised voices is so normal for me, I cannot even fathom how uncanny it would be to listen to a voice emanating from a machine for the first time, let alone a voice crying out in torment and fear. As Tress states, “reading the written words was sufficiently gruesome; listening to them had been much worse. No mere description could give an adequate idea of the horror of them, as they proceeded from the interior of that thing of wood and wax and metal” (Marsh 4).
A similar emotion-provoking first-time experience with a phonograph appears in Bram Stoker’s 1897 novel Dracula. The first time Mina Harker listens to Dr. Seward’s phonographic diary she states, “I have been more touched than I can say by your grief. That is a wonderful machine, but it is cruelly true. It told me, in its very tones, the anguish of your heart. It was like a soul crying out to Almighty God. No one must hear them spoken ever again!” (261). Like Tress and Pugh, Mina also transcribes the spoken words into a visual form with the purpose of utilizing it as evidence (Stoker 261).
Because Tress and Pugh are experiencing the medium of the phonograph for the first time, and because the other recordings Pugh bought are faithful recordings of musical numbers (Marsh 3), it is no wonder that Pugh takes the Jane Clinch recording at face value and believes a murder has been committed (Marsh 5). This situation evokes the infamous “War of the Worlds” broadcast in 1938, which caused some listeners to believe an alien invasion had occurred due to the radio play’s format as official-sounding news bulletins interrupting a music program. These occurrences demonstrate how certain mediums evoke implicit trust, until people discover that those mediums can exist as purveyors of misinformation. As our modern-day experiences with click-bait headlines on social media have demonstrated, this problem with people implicitly trusting voices (mostly in text form) stored and presented on machines still exists to this day (in the form of memes and angry social media rants).
A final interesting point from the story for me is how Tress and Pugh humanize the phonograph because of the recorded voice, and by extension dehumanize Jane Clinch. This humanizing is evident in Pugh’s referring to the “voice of the phonograph” (Marsh 1), and his proclaiming that the “phonograph has confided to us its terrible secret” (Marsh 5). Tress’s dehumanizing of Jane Clinch is most evident at the end of the story when he refers to Jane Clinch’s voice on the phonograph record as “the woman’s voice” when he knows Jane’s name (Marsh 18). I frequently find myself humanizing my technology because of the voices that I require to utilize it. I have held arguments and conversations with my screen-reading software, and I have also yelled at it to be quiet. I tend to think of my screen-reading software as the voice of my computer or phone, and thus explain to people that my technology “speaks” to me. Because we tend to view voices as originating from people, experiencing a voice emanating from a device can humanize the device, especially if the voice sounds human. In our modern world, people talk to virtual personal assistants on their phones as if they were human. These assistants tend to have natural-sounding voices and natural-sounding responses. Most of these assistants also have human names (Siri, Alexa, Cortana, etc.), which create a closer personal bond with the device and allow the end user to forget that they are dealing with a corporation’s cloud-based programming instead of a real person.
Overall, I found “The Adventure of the Phonograph” to be an entertaining and relatable story. The other works we have dealt with this semester have been incredibly visual in their incorporation of aesthetic and decadent themes, which I can intellectually understand but with which I struggle to emotionally connect, whereas Marsh’s story focusses on voices and auditory technology.
As I wrote this post utilizing my robotic, inflectionless screen-reader set to a speed beyond most people’s level of comprehension, I realized just how uncanny some of my experiences must seem to people with sight. However, I have also realized that I have not often had to think about the uncanny nature of my experiences because the rapid advancement of technology has placed voices into mainstream machines, which has somewhat normalized my specialized assistive technology, and has made the experiences of Tress and Pugh outdated to some extent (though aspects are still relatable to me, as I have explained). However, as recent news stories have reported, Amazon’s virtual assistant, Alexa, has started randomly laughing. This is undoubtedly a highly uncanny thing to experience. In all likelihood a simple bug in the programming lies behind the laughter, much like the simple explanation Jane Clinch offers for the phonograph recording (Marsh 17). Yet there also exists a feeling that there is something more behind the voice, just as Tress senses that “The whole thing—those screams in particular—did sound so very much like earnest!” (Marsh 18).
Marsh, Richard. “The Adventure of the Phonograph.”
Stoker, Bram. Dracula, edited by Glennis Byron, Broadview Press, 1997.