Dr. Michelle Cohn's Talk on Speaking Style Differences Towards Humans and Devices
Introduction:
Dr. Michelle Cohn is a Postdoctoral Scholar in the UC Davis Phonetics Lab, associated with the Department of Linguistics. She received her Ph.D. in Linguistics at UC Davis in 2018. Her postdoctoral training includes a 2.5-year Social, Behavioral and Economic Sciences (SBE) Postdoctoral Fellowship through the National Science Foundation. Since 2022, Dr. Cohn has also been a Visiting Researcher with the Google Responsible AI and Human-Centered Technologies group. Dr. Cohn's research program aims to uncover the cognitive sources of variation in how people produce and perceive phonetic details in speech.
Dr. Cohn gave a talk for the Cluster on Language Research entitled “Voice assistant- vs. human-directed? Speech style differences as a window to social cognition” on Monday, April 10, 2023! We interviewed Dr. Cohn, asking her to discuss some of the topics from her talk.
Interview Questions
1. In your talk, you discussed how differences in speaking style towards voice-AI assistants (e.g., Siri) and humans provide a “window to social cognition”. In your past work, what acoustic differences have you found between voice-assistant-directed speech and human-directed speech? More broadly, why is studying acoustic-phonetic variation important for understanding social cognition?
Interactions with voice assistants (e.g., Siri, Alexa, Google Assistant) -- and language technology more broadly -- are increasingly widespread. What does it mean to use language with non-human entities? What can it reveal about the people using the technology, such as how they reason about others (i.e., social cognition)? Along with Dr. Georgia Zellou (Associate Professor of Linguistics, UC Davis) and graduate and undergraduate students in the UC Davis Phonetics Lab, I design psycholinguistic experiments to get at these questions. In the talk, I discuss how we generate identical interactions -- the same content, error rate, and types of errors -- to probe speakers’ style differences in human- and voice assistant-directed speech. Across studies, we find that speech directed toward a voice assistant is often louder and slower, suggesting speakers anticipate a barrier in intelligibility. By and large, however, speakers produce similar acoustic adjustments based on the local communicative context, such as decreasing effort when communication goes smoothly and making prosodic and segmental adjustments in response to an error. In some cases, we see evidence of targeting, such as making vowels more distinct when Siri mishears a word, compared to when a human mishears a word. Acoustic-phonetic features in speech are cues to cognition: they reveal the barriers speakers experience (or assume are present) for their addressee, and the ways speakers can target and enhance certain types of contrasts to be better understood.
Cohn, M., Segedin, B. F., & Zellou, G. (2022). Acoustic-phonetic properties of Siri- and human-directed speech. Journal of Phonetics, 90, 101123.
Cohn, M., & Zellou, G. (2021). Prosodic differences in human- and Alexa-directed speech, but similar local intelligibility adjustments. Frontiers in Communication, 6, 675704.
2. From 2019 to 2022, you were the principal investigator of a National Science Foundation grant entitled “Device-DS as a window into mechanisms of speech production: An investigation of adults and children”. What have you found in the course of your investigation?
In this project, we tested college-age adults and school-age children (ages 7-12) from California in identical psycholinguistic experiments: participants completed the same task with a voice assistant (Amazon Echo; Alexa) and a real person (research assistant) in a Zoom call. As in our other projects, we included ‘staged’ errors: planned misrecognitions by the human and voice assistant that would require the participant to repeat what they had said. For example, if the participant says “The word is pig”, a misrecognition might be “I think I misunderstood, I heard pick or pig”. Then we examine the acoustic-phonetic adjustments speakers make in their error repair when repeating “The word is pig” again. At the prosodic level, we’ve found that utterances are longer and higher pitched toward the voice assistant addressee. Additionally, children produce even higher pitch to the voice assistant, and increase pitch *even more* when Alexa mishears them, compared to when a person mishears them. These results suggest that children make adjustments in anticipation of an intelligibility barrier, and are not simply talking to voice assistants “like people”. The differences across adults and children can also shed light on developmental trajectories in speech style adjustments, and social cognition more broadly.
3. What other research projects are you involved in?
On the production side, I have a series of projects examining vocal accommodation toward the acoustic-phonetic properties of human and text-to-speech (TTS) voices (i.e., synthesized voices used in voice technology). I’m also working on a range of projects investigating speech perception, including how people learn language from a human or a device interlocutor, and the extent to which listeners perceive socio-emotional features in TTS voices compared to human voices.
4. What advice do you have for students interested in pursuing a career in academia (linguistics or otherwise)?
Find areas of research that interest you and pursue them! Develop research questions that intersect with multiple subfields; often the boundaries between disciplines are underexplored. You would be amazed at how much you can learn about your own “home” field by working with other fields. Read E.O. Wilson’s *Letters to a Young Scientist* for inspiring (and approachable) advice on a meaningful research career. And perhaps most importantly, find collaborators and research mentors you get along well with personally and professionally.
To learn more about future events held by the Cluster on Language Research, you can visit our website or e-mail us at ucd (dot) langsymposium (at) gmail (dot) com.