The definition of vocal work is the most important aspect of the voice acting industry. Vocal work involves live performance, broadcast, or recording of human vocal narration. The industry of vocal acting, voiceover, or voice-over reaches billions of people across the globe, and there are millions of voice actors in the English language alone.
We often hear human voices in various media without seeing the speaker. The voices we hear on the radio and in commercials, television broadcasts, story films, and documentaries, and those that give remarkable human expression to illustrations, animations, and immersive games, go unnoticed most of the time.
In drafting the Standard for Voice Actors, we understand the uniqueness of this craft. Voiceover work is not writing, as the narrations and scripts performed by voice actors are usually not written by them. Most voiceover work begins in homes, cars, and other places where professionals rehearse and practice. It is then recorded in a studio and often combined with other types of media, including visual and musical media.
Voiceover must be heard to be experienced, and in cases where the audience cannot hear the actor's voice, closed captions are sometimes provided. However, these can only attempt to capture the writer's words and make written reference to the vocal inflections of the voice actor.
There is no evidence of crossover or mixed use between AI-simulated and human voices in the industry. Therefore, we have not attempted to address how AI is combined with voiceover work; we simply distinguish the human voice from any non-human simulated voice.
We acknowledge the hard work of skilled professionals - women and men who have honed their skills and mastered this craft in various languages. We appreciate your human voices and are grateful to be able to help you mark them as appropriately yours.
We also express our sincere support for voice actors who rely on AI in ways not addressed here or find it challenging to agree to the standard for various reasons.
DEFINITIONS IN THE STANDARD
Here are definitions of words and ideas used in the Standard for Voice Actors.
A specific description of human behavior
Voice Actor/Voiceover Artist
A person who narrates (or performs) vocal work
Present, share, or show work with others
Use of human vocal narration for live performance, broadcast, or recording.
A group of people working together
A work or invention that is the result of creativity, such as a manuscript or a design, to which one has rights and for which one may apply for a patent, copyright, trademark, etc.***
The fundamental elements or characteristics of something
Uttering, making a sound, or using one's voice aloud to create vocal work (see Essential Vocalization below)
Essential elements of the human voice
The sound made by the vocal tract of a human person which can be heard as talking, singing, laughing, crying, screaming, humming, mumbling, or shouting.
Noun: a human person; adjective: from a human person
Shorter definition–machines that create novel (new) content | Longer definition–generative artificial intelligence (AI) describes algorithms (such as ChatGPT) that can be used to create new content, including audio, code, images, text, simulations, and videos.****
"Systems that act like humans"*
Other generative processes
Other processes involving AI or machine learning to create novel (new) content
*IBM | What is artificial intelligence (AI)?
**Microsoft | Learn | LLM
Other definitions of words and ideas related to the Standard for Voice Actors are here.
The prominence of a syllable; characteristic pronunciations of one spoken language that can be heard in another
Artificial Intelligence (AI)
A field that combines computer science and robust datasets to enable problem-solving. It also encompasses machine learning and deep learning sub-fields, frequently mentioned in conjunction with artificial intelligence. These disciplines comprise AI algorithms that seek to create expert systems that make predictions or classifications based on input data.*
AI language modeling (or large language models (LLMs))
Systems that can use natural language text from large amounts of data. Large language models use deep neural networks, such as transformers, to learn from billions or trillions of words and to produce texts on any topic or domain. Large language models can also perform various natural language tasks, such as classification, summarization, translation, generation, and dialogue. Some examples of large language models are GPT-3, BERT, XLNet, and EleutherAI.** Some LLM-based systems can also translate text into visual images and vice versa.
The production or reproduction of sound, live or recorded
Special and significant stress of voice laid on particular words or syllables; stress laid on particular words by means of position or repetition
Way of understanding or explaining the meaning of something
Communication by voice in the distinctively human manner, speech; a body of words and the systems for their use common to people who are of the same community or nation, the same geographical area, or the same cultural tradition
Morphological vocal processing
When the features and structures of a vocal sound's characteristics are described and processed as words
A temporary stop or rest
Personal, original idea
An idea representing a specific human being's unique insight or experience in the world
Height or depth of a tone or sound, depending upon the relative rapidity of the vibrations by which it is produced
Producing the sounds of speech, including articulation, stress, and intonation, often with reference to a standard of correctness or acceptability
A measure, quantity, or frequency, typically one measured against some other quantity or measure; rate of motion or progress
Representation and description
Representing a sound in a way that can be analyzed by a computer, like numbers or words that have been assigned to specific sounds and features of recorded sounds
Vibrations transmitted through the air or other medium experienced through hearing
Also known as text-to-speech (TTS), using written or spoken words to prompt an AI generator to create vocal work
The characteristic quality of sound, independent of pitch or volume, from which the manner of production can be inferred, dependent on the relative components of resonant frequencies
A sound considered with reference to its quality, pitch, strength, and source; the quality or character of a sound
Principles or standards of behavior
Relating to principles, values, or ethical assumptions that motivate human behavior
The voice of an offscreen narrator, announcer, speaker, or reader, as in a commercial; the use of such a voice
*IBM | What is artificial intelligence (AI)?
**Microsoft | Learn | LLM
****McKinsey & Co. | What is generative AI?
*****Adyasha Dash and Kat R. Agres | AI-Based Affective Music Generation Systems: A Review of Methods and Challenges
In vocal performance, artists use their minds and distinctive physical and verbal traits to bring human expressions to life, crafting unique soundscapes. These come alive in movies, radio, audiobooks, and more. We establish essential human vocalization with a simple question.
The essential question of vocalization is: Who (or what) gave voice to this sound?
We hear and experience spoken vocalizations, either human voices or voices simulated by a machine.
When a human's voice can be experienced live or in recorded media, the result is vocal work.
ASSUMPTION OF ESSENTIAL HUMAN VOCALIZATION
If a human (and not a machine) gives voice to words and sounds, then a human is the essential vocalizer.