Liliya Tsirulnik: Giving Computers a (Singing) Voice
Update to this story, Feb. 13, 2013: The Singing Voice Synthesis Database is now available to the public.
San Diego, Calif., Aug. 15, 2012 — Anyone who has ever suffered through a particularly excruciating karaoke rendition of Neil Diamond’s “Sweet Caroline” will appreciate Professor Liliya Tsirulnik’s research.
Tsirulnik, who is a visiting scholar at the University of California, San Diego division of the California Institute for Telecommunications and Information Technology (Calit2), is working to improve what’s known as Singing Voice Synthesis (SVS), software that uses text and musical notes to synthesize an (ideally on-key) singing voice.
Tsirulnik conducts her research in collaboration with Music Professor Shlomo Dubnov in Calit2’s Computer Audition Laboratory, which is led by UC San Diego electrical and computer engineering Prof. Gert Lanckriet.
Although Tsirulnik originally developed the experimental software for text-to-speech synthesis for the Russian and Belarusian languages (she hails from the Academy of Sciences in Belarus), she expanded the system to include new signal processing algorithms for synthesizing and changing the pitch of phonemes (speech-based sounds) to create natural-sounding singing voices.
“Text-to-speech synthesis is not adequate for producing synthesized singing voices, especially professional singing voices, which are capable of making a very broad spectrum of sounds,” says Tsirulnik. “It was necessary to develop a new voice database that consists of the entire range of musical notes that a singer can produce.”
The SVS database now features vocal as well as glottal recordings of professional singers affiliated with the UC San Diego Department of Music, including professors Susan Narucki and Philip Larson, as well as graduate student Bonnie Lander. To capture the glottal recordings, the researchers place a microphone near the singer’s glottis, the opening between the vocal folds in the larynx, and record the singer’s performance.
“We do this to see what kinds of special singing effects are produced in the vocal folds,” explains Tsirulnik. “We have also augmented the SVS database with a set of ‘singing expressions,’ like ‘natural,’ ‘sweet’ or ‘deep.’ This will allow us to research and compare more complex voice characteristics, such as vibrato, amplitude, duration and other voice parameters to see how we might re-create these sounds synthetically.”
The software creates synthetic sounds by choosing the appropriate phonemes from the database and applying prosodic processing to change the fundamental frequency (heard as pitch) of the recorded voice, as well as the duration and amplitude of the sound, thus creating a realistic-sounding singing voice.
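For readers who want a concrete picture of the prosodic step described above, here is a minimal Python sketch. It is not Tsirulnik's software: the `Unit` record, the `prosodic_factors` function and the use of MIDI note numbers are assumptions made for illustration. It only computes how much a recorded phoneme's pitch, duration and amplitude would have to change to match a target note, using the standard equal-temperament relation in which A4 (MIDI note 69) is 440 Hz.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One recorded phoneme (hypothetical record layout, for illustration only)."""
    phoneme: str
    f0_hz: float           # fundamental frequency of the recording
    duration_s: float      # length of the recording in seconds
    peak_amplitude: float  # peak level of the recording, 0..1


def midi_to_hz(note: int) -> float:
    """Equal-temperament frequency of a MIDI note (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def prosodic_factors(unit: Unit, target_note: int,
                     target_duration_s: float, target_peak: float) -> dict:
    """Return the scaling factors needed to move the unit onto the target note."""
    return {
        "pitch_ratio": midi_to_hz(target_note) / unit.f0_hz,
        "time_stretch": target_duration_s / unit.duration_s,
        "gain": target_peak / unit.peak_amplitude,
    }


# Example: an "a" recorded at 220 Hz, to be sung one octave higher and twice as long.
recorded = Unit(phoneme="a", f0_hz=220.0, duration_s=0.5, peak_amplitude=0.5)
factors = prosodic_factors(recorded, target_note=69,
                           target_duration_s=1.0, target_peak=0.25)
```

A real synthesizer would then pass these factors to a signal-processing routine that shifts pitch without changing duration; that routine is beyond the scope of this sketch.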
The database recordings, which will eventually be open-source, are also annotated at the phoneme and pitch levels to make it easier to find specific notes.
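To illustrate why phoneme- and pitch-level annotation makes notes easier to find, the sketch below looks up the recording of a given phoneme whose annotated pitch is closest to a requested note. The `Recording` fields, the file names and the `nearest_recording` helper are hypothetical; the actual layout of the SVS database is not described here.

```python
from typing import List, NamedTuple


class Recording(NamedTuple):
    """One annotated database entry (hypothetical fields)."""
    phoneme: str
    midi_note: int  # annotated pitch, as a MIDI note number
    path: str       # made-up file name, for illustration only


def nearest_recording(db: List[Recording], phoneme: str, midi_note: int) -> Recording:
    """Pick the recording of `phoneme` whose annotated pitch is closest to `midi_note`."""
    candidates = [r for r in db if r.phoneme == phoneme]
    if not candidates:
        raise KeyError(f"no recording for phoneme {phoneme!r}")
    return min(candidates, key=lambda r: abs(r.midi_note - midi_note))


# A toy database with three entries.
db = [
    Recording("a", 60, "a_c4.wav"),
    Recording("a", 64, "a_e4.wav"),
    Recording("o", 60, "o_c4.wav"),
]
match = nearest_recording(db, "a", 61)
```

Choosing the closest annotated pitch keeps the later pitch change small, which is one plausible reason to record each phoneme across a singer's whole range.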
Adds Tsirulnik: “It’s basically a way of creating ‘automatic karaoke,’ or music using only a database and algorithms. Eventually we will have realistic computer music with [simulated] human singing voices.”
So will it one day be possible to create a digital person who also synthetically sings? Previous research conducted by Tsirulnik, in addition to her current focus on SVS, suggests this might be the case.
“Another interesting direction in my research is audiovisual synthesis – a means of creating a ‘talking head’ whose articulatory organs change based on the phonemes that have been entered,” continues Tsirulnik. “But to create an avatar who can synthetically sing, we will need to include more ‘visemes,’ or pictures of the articulatory organs, because in singing these organs are moving in more varied ways.”
SVS software, combined with audiovisual synthesis, could therefore eventually be used to train amateur singers or – when all else fails – to digitally enhance a person’s real singing voice (good news for people who insist on belting out Guns N’ Roses’ “Night Train” during Friday night karaoke).
Tiffany Fox, (858) 246-0353, email@example.com