On behalf of the Utrecht School of the Arts – School of Music & Technology I visited the Game Developers Conference (GDC) 2012 audio tracks. For colleagues and peers, I’ve made an overview that can be found below.
There were two simultaneous audio tracks and I tried to attend lectures within a broad range of topics such as programming, composition, sound design, practical examples as well as more academic topics. It must be said though that making the choice which session to attend was rather difficult because of the large number of very interesting and relevant topics.
The day before the regular audio tracks start, the Audio Boot Camp is held to present an overview of the professional field. It was mainly attended by sound designers and composers and the rather large room was nearly full. Various speakers with ‘a shared expertise of over 140 years’ introduced the field anno 2012.
Scott Selfon (Development lead – Microsoft Corporation) and Garry Taylor (Audio Director, SCE) gave an introduction to Audio Boot Camp Volume 11. Right at the beginning, Scott stressed the importance of the integration (also referred to as implementation) of audio: content creators need to know what the audio programmers do. Garry wants his team to know what else goes on, and he wants his team to have a good grounding in other areas of game design. In contrast to the past, sound designers and composers interested in working for games now need to have depth as well as breadth. This includes not only other fields of expertise, but also an awareness of how to run a business.
Martin Stig Andersen (Composition & Sound Design at Playdead) started off with a highly interesting talk about the audio design for the game Limbo. Martin has a background in electronic computer music and explained why this is a good background for game audio design: it’s about making decisions in realtime based upon what you hear. I found two topics in this talk particularly interesting: the sonic design of sound effects and the integration techniques.
Concerning the sonic design of sound effects, Martin wanted to accomplish a ‘bleak sound’ that fitted with the graphic design of the game and decided to sonify only the environment of the game (‘Zone’ & ‘Effect’ ). By doing this, the game would stand in contrast to the enormous number of loud games already out there. The bleak sound was created by re-recording sound effects on a Webcor wire recorder. In addition to sound effects from various sources, Martin used operation noise and sounds from a tape recorder to generate sound effects. This is Limbo:
Concerning the implementation of sound assets in this game, there was a relatively tight file size ‘budget’ of about 50 MB, so all sorts of short sound files were creatively reused, layered and combined in the sound implementation tool (Wwise) to cater for various actions. According to Martin, audio design & implementation can be regarded as ‘both sides of the same coin’, and his work on Limbo exemplifies this statement. He got involved with the audio design for this game when it was already completed – which is not ideal – but still there was room for ‘audio ideas’ and in the actual implementation, a lot of innovation could still be accomplished. One thing that I find extremely well done in this game is the adaptive or prioritized mixing. Considering that this is a ‘silent game’, the developers wanted to use a very balanced set of sounds that do not become repetitive or annoying. The prioritized mixing in this game consists of attenuating sounds by default and temporarily lifting that attenuation when a sound deserves extra focus. To give an example:
The footsteps of the avatar – the walking boy – are played with -6 dB attenuation. Once the boy walks on a different terrain surface, the footsteps change according to the new terrain and are also played with 0 dB attenuation, thus louder. The louder footstep sounds are considered relevant for the player, so he knows the surface has changed. Gradually, the footsteps become softer until the -6 dB level has been reached again. This prevents the footsteps from becoming overly present.
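The footstep behaviour described above can be approximated in a few lines of Python. This is only a sketch under assumptions, not how Limbo or Wwise actually implements it: the class name `FootstepMixer`, the linear fade and the constant `STEPS_TO_SETTLE` are hypothetical. Only the 0 dB and -6 dB levels come from the talk.

```python
from dataclasses import dataclass

BASE_DB = -6.0       # resting attenuation mentioned in the talk
FOCUS_DB = 0.0       # level right after a terrain change
STEPS_TO_SETTLE = 8  # hypothetical: steps needed to fade back to BASE_DB


@dataclass
class FootstepMixer:
    surface: str = ""
    steps_on_surface: int = 0

    def gain_db(self, surface: str) -> float:
        """Return the gain in dB for the next footstep on `surface`."""
        if surface != self.surface:
            # New terrain: restart the fade so the change is clearly audible.
            self.surface = surface
            self.steps_on_surface = 0
        # Assumed linear fade from FOCUS_DB down to BASE_DB.
        t = min(self.steps_on_surface / STEPS_TO_SETTLE, 1.0)
        self.steps_on_surface += 1
        return FOCUS_DB + t * (BASE_DB - FOCUS_DB)


def db_to_linear(db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)
```

Calling `gain_db("grass")` repeatedly yields 0 dB first and settles at -6 dB after eight steps. A call with a new surface name jumps back to 0 dB.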
Another example of prioritized mixing was the dynamically changing ambience according to the state of the puzzles. Martin gave the example of approaching a spider, fighting the spider and then proceeding after the spider has run away (see the example above that starts at 5m33s and listen to the Zone). These three states were reflected in the background ambience sound. In this way, by using ‘subjective mixing’ in order to stimulate player anticipation, the sound designer can use sound in a similar way as he would do in film sound design.
Jobs / profession
Composer Jason Graves mentioned that earning lots of money is not a valid justification to step into the business of game music. Many novice game audio designers mentioned the catch-22 that seems to be present in most job opportunities: because 10-15 years of experience is required in order to qualify as a game sound designer, it seems impossible to gain experience in this field. According to Jason, 10,000 hours of practice is needed before one can be considered a professional, and the best way to gain this experience is to get involved in indie game development. Networking is #1 when it comes to getting work. Still, it is worth it: compared to other contexts (e.g. commercials), making music for games is creatively liberating.
Another important thing is to have an understanding of the industry standard of awesome, instead of a student standard of awesome. In other words: do you compare yourself to your classmates or the industry? (Mike Caviezel).
Sound Design vs. Composition
More apparent than in my previous visits to the GDC (2006, 2007) was the separation between composition and sound design. Sound designers are usually required to be more strongly involved with the game development team, and in studios that work on AAA titles, sound design and audio programming tasks are mostly fulfilled by in-house audio experts. Smaller studios frequently outsource audio tasks, but generally ‘WAV-slavery’ (delivering the sound assets on a list without any contact with the game) is no longer common practice.
Composers are almost always working solitarily in their own studios and because of this, communication is extremely important. Jason Graves said: “The composer is the single point of failure, defending 50% of the audio and most of the time he’s not there”. He stated that it is extremely important to stick with the music direction, as audio directors ‘put their butt on the line by hiring you and are generally nervous’. That being said, he also noticed the trend of being asked to develop something original and more unique, to make the game stand out from what’s already been done.
We see that composers with a certain reputation sometimes get a ‘carte blanche’. According to Jason Graves, the fun games are games where they let you innovate, be yourself. It helps if audio leads know what they can expect and having a clear identity as a composer might contribute to this. Jason clearly works on his own musical identity: ‘It’s better to be a first-rated version of yourself than a second-rated version of someone else.’
The eye-opener concerning the identity of a composer was Darren Korb. As opposed to buying a studio with expensive orchestral libraries, Darren created the whole soundtrack for the game Bastion in his apartment in NYC using a simple computer with Logic, two microphones (one for talkback) and guitars (‘some of them are not real’). He configured the game audio around his talents and had only a very limited amount of money, but made it sound as if he had a much larger budget. He emphasised the importance of early involvement in the process so he could shape the sound of the game and exploit his talents. He defined the musical style as Acoustic Frontier Trip-Hop. Now that’s a unique identity for sure…
According to Mike Caviezel (Audio Director, Microsoft) you should work with a target in mind when creating sound effects. “It’s not good enough to just play an instrument, you have to play it in a manner that makes me want to keep listening.” Experimentation is good, but focused experimentation is better! Microphones, software and sound libraries are just tools, which can distract. You need to build a vocabulary to be able to express your ideas and concepts to game designers. Mike Caviezel mentioned words such as airy, vicious and ominous to get close to what the client is expecting. Offline mockups are extremely valuable to make explicit what you mean. Collaborate and keep asking other people for their opinion.
Brandi House and Marty O’Donnell presented the results of user research into the specific reactions of players with respect to certain types of music. By quantifying emotional responses to music and gameplay video footage (making an ‘emotional footprint’ of user responses) and comparing the numerous combinations between the music and the video footage, they showed that 92% of the emotional footprints had been influenced by the fingerprint of the music. For the other 8%, the users could not accept the combination as it was impossible to connect the two.
Mona Mur (vocalist, composer, audio artist) talked about her experiences with creating ‘non-music’ for Kane & Lynch 2. She drew some of the inspiration for this job from her activities in the Industrial Music Culture in the early 80s.
Garry Schyman (BioShock) and Jason Graves (Dead Space) talked about incorporating aleatoric music in their scores. Aleatoric music is basically about writing expression into a score instead of notes. The instructions used were for instance:
- Rat-like rodent features.
- Your lowest yet loudest note.
- Ad lib, as fast as you can.
- Contours of a line or a block of notes.
- Timing is fixed, notes are free.
This is a fragment of a typical ‘aleatoric’ orchestral sound (Mark Snow – X-Files: Fight the Future):
Garry found that it took the musicians a while to get used to the liberty he gave them but they really loved the creativity involved. Sometimes his initial suggestions were not clear enough: “the highest note on your lowest string” did not exactly deliver the result, but mentioning “like a dying cow” was spot-on. Also making a choir sing a score that is impossible to sing with the briefing: “try to sing it as best as you can” worked extremely well to get the result he was after.
The biggest limitation is that aleatoric techniques only work with live instruments (and good musical performers!) and not at all with virtual orchestration. This means that you need to gain some trust from the other stakeholders, as you cannot present a demo before actually stepping into an expensive recording session. Another limitation is that these techniques can only be used for a short period as “the only thing worse than boring tonal music is boring atonal music.” You’re advised to alternate the aleatoric parts with well-written musical scores…so aleatoric is not the lazy composer’s ideal friend.
AAA, Indie and…
In addition to the AAA-studios (BIG) and the Indie developers (SMALL) there’s another context that seems to be having specific properties concerning audio design: social games (BIG, possibly even BIGGER…but different). Dren McDonald explained how he got into the business of developing audio for social games. Most social games used to be silent and he got involved early in the cycle of developing CityVille. Meet CityVille:
Dren explained how CityVille compares to the Call of Duty (CoD)-franchise. The complete CoD franchise sold 70 million copies in total. CityVille had reached 60 million players after 50 days, and soon they had a higher total amount (90m+) of players than CoD. Even after critically taking these numbers into account (what exactly is called a ‘player’?), you must conclude that these audiences are indeed H-U-G-E.
Dren explained there was a lot of room for innovation, as the other social games hardly featured audio and he told the developer that audio actually was a way to stand out from the rest (or: ‘elevate your title’).
It comes as no surprise that supplying audio for millions of daily players actually generates quite a lot of data traffic. Therefore, SWF-files (pronounced: ‘swiff’) containing MP3-compressed audio are used. Usually, developers choose to preload one big SWF containing the basic sounds and load separate SWFs depending on what is needed for specific players. Quite a lot of smart preloading is required as optimising only a small amount of data per player might dramatically alter the server load for millions of players.
Another big challenge is preventing repetition; a limited amount of audio data in combination with hours of stat-driven, repetitive gameplay might cause problems. In CityVille, music is primarily used for giving a good first impression at the beginning of the game, and many sounds can only be replayed after 4 minutes to prevent annoyance.
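The replay restriction can be illustrated with a small cooldown gate. This is a minimal sketch, not CityVille’s actual code: the class `CooldownPlayer` and its method names are made up for illustration. Only the 4-minute interval is taken from the talk.

```python
class CooldownPlayer:
    """Refuses to replay a sound until its cooldown has elapsed."""

    def __init__(self, cooldown_s: float = 240.0):  # 4 minutes, as in the talk
        self.cooldown_s = cooldown_s
        self._last_played = {}  # sound id -> time of last playback in seconds

    def try_play(self, sound_id: str, now_s: float) -> bool:
        """Return True if the sound may play now, and record the playback."""
        last = self._last_played.get(sound_id)
        if last is not None and now_s - last < self.cooldown_s:
            return False  # still cooling down: stay silent
        self._last_played[sound_id] = now_s
        return True
```

Each sound keeps its own timer, so a blocked sound does not stop other sounds from playing.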
Dren concluded that there are many innovations coming up in this field and mentioned that CastleVille has a 15-minute full-orchestra soundtrack. Listen to the main theme here.
SFX and integration
According to John Byrd (Gigantic Software) the game mix problem is getting bigger and bigger. The key is to make all the thousands of sound effects sound good and balanced, while it is in many cases difficult to predict which and how many sounds play at the same time. The biggest problem (perhaps from a functional perspective -SH) is masking. Zombies that are further away have an audio source that is further away and therefore softer in comparison to your gun sound, so they are not heard, even though hearing these sounds can be quite important for a player. John gave many solutions for overcoming these kinds of problems (refer to the video in the GDC Vault for more details).
John also pointed to a commonly made mistake when using periodic sounds in games (usually sound loops). Sound designers should listen to periodic cycles in sound effects before they crop loops (e.g. for engines or automatic weapons) and try to understand how the sound is actually produced (in this case cylinders or the loading mechanism of a rifle). To give a simplified example: one would loop a periodic sound this way: ♫♫♫- ♫♫♫ and not like this ♫♫♪ – ♫♫♪.
Wwise is working on testing tools for developers that publish for a wide range of platforms. If you have a game that is ported to Xbox 360, PlayStation 3 and iOS and want to roll out an update, a lot of performance testing has to be done on all these platforms. This usually takes a lot of time, and how does one know that users hear what they should be hearing? Wwise presented the software they use to test the simulated audio output of various game platforms faster than realtime on test servers, which enables output analysis in order to find deviations. This is done by generating the diff output: the difference between the reference audio file and the daily test output. Large deviations might suggest audio dropouts or missing audio files. The Wwise profiler explanation video gives more detail.
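To give an idea of what such a diff could look like, here is a minimal Python sketch that compares a reference signal with a test render window by window. It is not the Wwise tooling. The function names, the window size and the threshold are assumptions chosen for illustration, and samples are assumed to be floats between -1 and 1.

```python
import math


def rms_diff_per_window(reference, test, window=1024):
    """RMS of the sample-by-sample difference for each window."""
    n = max(len(reference), len(test))
    # Pad the shorter signal with silence so missing audio shows up as a difference.
    ref = list(reference) + [0.0] * (n - len(reference))
    out = list(test) + [0.0] * (n - len(test))
    result = []
    for start in range(0, n, window):
        diffs = [r - t for r, t in zip(ref[start:start + window],
                                       out[start:start + window])]
        result.append(math.sqrt(sum(d * d for d in diffs) / len(diffs)))
    return result


def find_deviations(reference, test, window=1024, threshold=0.05):
    """Indices of windows whose difference exceeds the (assumed) threshold."""
    rms = rms_diff_per_window(reference, test, window)
    return [i for i, value in enumerate(rms) if value > threshold]
```

A dropout in the test render, for example one window of silence where the reference contains a tone, is reported as the index of that window.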
Concerning the available hardware and software resources of audio during runtime, Alistair Hirst (Omni Audio) mentioned that we should focus more on intelligent implementation of audio as the resources for audio are limited and will always be as we’re working in realtime and want to accomplish decent framerates. Tradeoffs have to be made so we want the biggest bangs for the buck. However, each and every one of these tradeoffs is platform (and even genre) specific. To give examples of such tradeoffs:
CPU cycles vs RAM:
- Decompressing MP3 on the fly
- Decompressing MP3 into RAM
PCs have more RAM than consoles, so decompressing MP3 into RAM is more suitable for PC than consoles. Decompressing on the fly on consoles does require more CPU cycles but might prevent taking too much RAM.
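A quick calculation shows why this matters. The sketch below compares the size of an MP3 file with the size of the same audio decompressed to PCM. The default values (48 kHz, stereo, 16-bit, 128 kbps) are assumptions for illustration and were not mentioned in the talk.

```python
def pcm_bytes(duration_s, sample_rate=48_000, channels=2, bytes_per_sample=2):
    """RAM needed to hold the fully decompressed audio."""
    return int(duration_s * sample_rate * channels * bytes_per_sample)


def mp3_bytes(duration_s, bitrate_kbps=128):
    """Approximate size of the compressed data at a constant bitrate."""
    return int(duration_s * bitrate_kbps * 1000 / 8)


one_minute_pcm = pcm_bytes(60)  # 11,520,000 bytes
one_minute_mp3 = mp3_bytes(60)  # 960,000 bytes
```

With these assumed settings, one minute of audio takes twelve times more RAM once it is decompressed. That is the price paid for saving CPU cycles during playback.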
Streams versus buffer size versus latency:
- Streaming: read head needs more time
- More streams require bigger buffers
- Bigger buffers mean higher latency
In this example, the designer wants to find the ideal tradeoff between buffer size, the number of streams and latency. All this is related to the realtime performance that is needed for the game concerned. This is what game audio programming is also about (and I can tell you this from my own experience as well: every platform has its own quirks, especially concerning realtime audio!).
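The relation between these three can be sketched with a deliberately simplified model: the read head visits every active stream in turn, and each stream’s buffer has to hold enough audio to last until the next visit. The function names, the seek and read times and the uncompressed PCM format are all assumptions. Real streaming systems are more sophisticated.

```python
def service_round_s(num_streams: int, seek_s: float, read_s: float) -> float:
    """Time for the read head to visit every active stream once."""
    return num_streams * (seek_s + read_s)


def min_buffer_bytes(num_streams, seek_s, read_s,
                     sample_rate=48_000, channels=2, bytes_per_sample=2):
    """Smallest per-stream buffer that survives one full service round."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return round(service_round_s(num_streams, seek_s, read_s) * bytes_per_second)
```

In this model, doubling the number of streams doubles both the minimum buffer per stream and the worst-case wait before a newly requested stream gets its first data.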
Alistair concluded that there are many things that can be done smarter. In racing games, you can safely assume that players with a specific velocity will keep driving forward, so sound events can be preloaded based on the distance in relation to the velocity. Non-dynamic usage of DSP-effects such as EQ or distortion, where the effect is only used as a static effect, should be avoided: audio middleware is not a DAW (Digital Audio Workstation) and using these effects is just too costly in his opinion. That being said: changing realtime effects are good, as they are an effective method of making the sounds more dynamic. You get it: again…a tradeoff!
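The racing game example boils down to comparing the time until the car reaches a sound event with the time needed to load it. A minimal sketch, in which the function names and the safety margin are assumptions:

```python
import math


def seconds_until_trigger(distance_m: float, speed_mps: float) -> float:
    """Time until the car reaches the sound event at its current speed."""
    if speed_mps <= 0:
        return math.inf  # standing still or reversing: never arrives
    return distance_m / speed_mps


def should_preload(distance_m, speed_mps, load_time_s, margin_s=0.5):
    """Start loading once arrival is closer than load time plus a margin."""
    return seconds_until_trigger(distance_m, speed_mps) <= load_time_s + margin_s
```

For example, at 50 m/s a sound that takes one second to load is requested once the car is within 75 metres, and not earlier.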
There are many unique technological properties of all platforms and technologies involved. The best way to detect the relevant issues and tradeoffs is to deploy on the actual platform as early in the process as possible. To give an example: switching layers on a dual-layer DVD is slow. This introduces latency, so one is better off running DVD emulation tests early in the process to prevent this from becoming a serious problem when you’re working with a tight deadline.
Scott Selfon ran through all the compression formats and raised a very valid question: we have to use compression, but why not use something smarter than general compression settings (i.e. adaptive compression settings)? OK, we have tens of thousands of audio assets, but using for instance 120 kbps for every asset is not optimal, as some sound files still sound good at a lower bitrate, while others deserve higher bitrates. Obviously, this is impossible to do manually, so tools that handle the tradeoff between compression and audio quality would be very useful!
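Such a tool could, in its simplest form, try a list of bitrates per asset and keep the lowest one that still sounds good enough. The sketch below assumes a function `quality_at(asset, kbps)` that returns a quality score between 0 and 1. That function is hypothetical (a perceptual metric or listening test would have to supply it), as are the bitrate list and the threshold.

```python
BITRATES_KBPS = (48, 64, 96, 128, 192)  # hypothetical candidate bitrates


def pick_bitrate(asset, quality_at, min_quality=0.9, bitrates=BITRATES_KBPS):
    """Return the lowest bitrate whose quality score meets the threshold."""
    for kbps in sorted(bitrates):
        if quality_at(asset, kbps) >= min_quality:
            return kbps
    # No candidate is good enough: fall back to the highest available bitrate.
    return max(bitrates)
```

Run over tens of thousands of assets, a loop like this spends bits only where the quality score demands them.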
Darren Korb explained that the narrator in Bastion never interrupts himself (‘line stomping’) and never repeats the same sentence. They integrated the assets with variables such as queue=neverqueue (just don’t play this file if something else is playing) or queue=always (play this file after the current file has completed).
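The two queue settings can be sketched as follows. This is an illustration of the described behaviour, not Bastion’s actual code. The class `Narrator` and its methods are hypothetical, and only the values `neverqueue` and `always` come from the talk.

```python
from collections import deque


class Narrator:
    """Plays one line at a time, never interrupts and never repeats a line."""

    def __init__(self):
        self.current = None      # line that is playing right now
        self.pending = deque()   # lines waiting for their turn
        self.used = set()        # lines that have already been started

    def request(self, line_id, queue="neverqueue"):
        """Return True if the line starts playing or is queued."""
        if line_id in self.used:
            return False                     # never repeat the same sentence
        if self.current is None:
            self._start(line_id)
            return True
        if queue == "always":
            if line_id in self.pending:
                return False                 # already waiting
            self.pending.append(line_id)     # play after the current line
            return True
        return False                         # neverqueue: drop the line

    def on_line_finished(self):
        """Call when the current line has finished playing."""
        self.current = None
        if self.pending:
            self._start(self.pending.popleft())

    def _start(self, line_id):
        self.current = line_id
        self.used.add(line_id)
```

A `neverqueue` line requested while the narrator is talking is simply dropped, which suits flavour comments. An `always` line is saved for later, which suits lines that carry the story.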
The Music Technology Group in Barcelona presented a system using automatic classification of the user-generated content of the Freesound.org library for automatic sound generation. Jordi Janer gave a demonstration of the sound classification method (solid, liquid, aerodynamic, etc., see the slides for the overview). The Music Technology Group also presented a plugin for Unity3d that allows direct integration of Freesound content in the Unity editor, see https://github.com/jorgegarcia/UnityFreesound
Damian Kastbauer shared preliminary results of the Racing Game Sound Study and described various parameters of the sound design. The study is about to be published on http://blog.lostchocolatelab.com/. Have a look at this entertaining video showing racing game sound design throughout the years:
John Rodd explained what mastering (for games) is about and as opposed to most mastering engineers who keep mastering a very mysterious activity, John gave a surprising amount of tips. Refer to the vault if you’re interested.
Leonard Paul presented his work on the game Vessel. All sound effects were recorded with a Zoom portable recorder and implemented in FMOD. Leonard has created quite a ‘living’ game environment by using all sorts of techniques, such as combining samples, procedural audio, manual segmentation (to prevent transients from being pitch shifted), layering sounds and granular synthesis.
Music and integration / Interactive-Adaptive Music
In many cases, nonlinear music is used, and I’ve seen quite some examples of it. The technical implementation of the music, and the parameters, events or actions triggering the music were frequently not explained in the attended sessions. Most applications of nonlinear music seem to be using stems (vertical) in combination with horizontal looping or triggering.
One technique from Bastion was explained: the music of Zia, a game character who performs on a harp, is heard with a reverb that depends on your distance to her. Approaching Zia makes the reverb disappear and focuses your attention on the music. Listen to this technique in this walkthrough (8m08s is the moment when the magic happens):
Jason Graves showed how he used several layered loops in combination with stingers to build up tension in a particular game scene (‘stem mixing’). See this example video in the Vault.
Austin Wintory (Journey) explained the development of music throughout the game Journey: the progression in music could be seen as ‘Darwinian composing’ (first drums with flute, later harp, then full orchestra) while creating emotional arcs. How this was actually implemented and triggered was not explained.
Bobby Arlauskas of GL33k showed a prototype that illustrated the concept of synchronizing game events to a metronome in order to have an audio-driven design instead of a reactive design. This is basically the same kind of approach that is found in Gluddle, the game developed by Richard van Tol and yours truly, where several actions such as enemy reaction time are synchronized to the musical beat and flow. Why would you wait for a time counter to trigger events if you can use the timing of the music as well? It is my belief that we’ll see game events triggered in musical timing more frequently in the near future.
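The core of this idea is postponing an event to the next beat of the music. A minimal sketch, assuming a constant tempo and a music clock that starts at zero (the function name and parameters are hypothetical and do not come from the prototype or from Gluddle):

```python
import math


def next_beat_time(now_s: float, bpm: float, subdivision: int = 1) -> float:
    """Time of the first beat (or beat subdivision) at or after `now_s`."""
    interval = 60.0 / bpm / subdivision  # seconds between grid points
    return math.ceil(now_s / interval) * interval
```

At 120 BPM, an event requested at 1.3 seconds is scheduled for 1.5 seconds. An event that happens to fall exactly on a beat fires immediately.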
Music? SFX? VO!
It seems like we’re always talking about music and sound effects, leaving speech in games completely behind. In my PhD research, it was found that many players regard good voice acting as a very effective constituent for immersion while unbelievable voice acting usually completely disrupts any form of immersion. Fortunately there’s DB Cooper to defend this important yet often overlooked side of game audio! DB ran through the process of making exertion sounds more realistic and showed that it’s quite important to have some insight into the action. If you’re for instance hit by a large object, your jaw probably falls open and it’s more logical to say “aaa” instead of “iii”. Furthermore, it’s wise to simulate what is actually happening, so when you’re acting like you’re wrestling, have someone hold you, or use some Pilates workout material to simulate the tension in your body. The examples DB and some session attendees gave clearly illustrated that the difference between ‘regular’ voice acting and voice acting ‘under pressure’ or simulated stress is actually enormous. At the end of the session, which was probably just as entertaining as it was informative, DB was asked what her favorite exertion sound was and she responded: person dying in a fire. Of course, she was willing to illustrate this one as well: Goosebumps! Fortunately, the GDC organisation had been warned beforehand, so there was no panic breakout in the other rooms like there was last year.
Darren Korb (Bastion) certainly took voice acting seriously and recorded 3000 lines in his closet with voice-actor Logan Cunningham. This took about 2 minutes per line, as we can see in the video on the following page:
[go to 29 minutes or click narration video demo at the left.]
Here’s an interview with Logan Cunningham:
In 2006 and 2007, the new game consoles with their apparently unlimited resources, and the adventitious ‘fight’ for those resources, dominated the attention of game audio professionals. I would summarize the main developments of the professional field as two main things. In game sound design the integration (also: implementation) of sound assets has more priority than in the past years. We see that sound designers find many creative opportunities in the integration of assets: making combinations of sound files, reusing sound files for various sound instances and changing the response to interactivity can make the game world sound alive.
In game composition we see that originality and the contribution to a unique identity of the game has become more and more common. Many game composers try to search for unique combinations, new styles and new techniques instead of looking at other media or falling back on already established musical identities. They want the music to make the game stand out from the crowd, which could be regarded as a motivational choice (reinventing yourself as a composer and your creative process) as well as a commercially valid reason (why would players buy your game and why would media publish on your work if it sounds the same as every other game?).
Immersion still is the most commonly found word to define a gratifying and absorbing gameplay experience. It still is a major point of interest for sound design as well as composition.
Silence is still mentioned as being underutilised in games (e.g. Austin Wintory, John Byrd).
Besides game audio and adaptive music systems, I’m also involved in the Creative Design Practices research program at the Utrecht School of the Arts. The focus in this programme is the collaboration in multi-disciplinary projects. I’ll summarise the above briefly from this perspective.
Most composers state that they want to get involved as early in the process as they can. This way it is possible to follow the iterations of the game design process and, possibly, influence the style of the game as well. Martin Stig Andersen got involved later in the process – not ideal in his opinion – but stated that he managed to be innovative in the integration instead. Darren Korb was involved early (just after early prototyping) and ‘drove the tone of the game’: he helped the audio feel organic and an integral part of the game, and also influenced the overall art style.
Jason Graves mentioned that in comparison to other contexts, making game music is more satisfactory but also a completely different process. According to Jason, it’s about making many iterations, never giving up and the only valid reason should be that it’s your life. He also gave specific information about how to deal with the many, many iterations that have to be done in order to get to the final result: “Act like it’s Christmas Day” or in other words: be happy to change everything over and over again – ‘What doesn’t kill you makes you stronger.’
Austin Wintory (composer for Journey) mentioned that the game designers’ perception of the game differs from the view of the composer.
Concerning the constitution of game development teams, AAA-studios usually work with a partly in-house, partly outsourced audio team. The audio programmers, audio designers and sound designers usually work in-house while composers usually work separately from the game development studio. Indie developers generally want ‘the audio guy’ (girl? – SH) that ‘does everything’.
Concerning the game audio tools we see a general focus on cross-platform capabilities (e.g. compatibility with mobile devices, game consoles, PCs, Macs etc., which allows the game to be easily ported to another platform), more capabilities for realtime processes such as DSP-effects, and a more DAW-like user interface and functionality in audio middleware.
iZotope demonstrated their special bundle of realtime DSP-effects, which are optimised for in-game sound processing while keeping the CPU-usage low. Examples of these effects are a peak limiter for realtime ‘game audio mastering’ and a hybrid reverb which utilises convolution for the early reflections and an algorithm for the reverb tail. The hybrid reverb combines the sound quality of convolution with the efficiency of algorithmic reverb. The iZotope effects are also integrated in FMOD Studio, see below.
FMOD demonstrated their new ‘studio’ – more like a DAW and less like game middleware.
See the demonstration of Audiokinetic Wwise:
GDC was a great experience thanks to all the speakers and I would like to compliment them on their great contributions. Particular care has been taken in summarising their experiences and opinions. Please contact me if there’s anything that is incorrect. Readers are recommended to visit the GDC Vault and watch the available videos to get a complete impression.
I would like to thank the Utrecht School of the Arts – School of Music and Technology for giving me this opportunity to visit the GDC!
GDC Vault links
- 80,000 Lines, Three Lessons Learned – Ariel Gross.
- AI-driven Dynamic Dialog through Fuzzy Pattern Matching. …– Elan Ruskin.
- Audio Boot Camp – Scott Selfon, Garry Taylor, Jason Graves, Martin Stig Andersen, Alistair Hirst, Sergio Pimentel, John Byrd, Bernard Rodrigue, Mike Caviezel.
- How To Ship a Game With Voices In 10 Languages? …On the same day? …And Keep It Consistent? – Alexandre Piche.
- Journey vs Monaco: Music is Storytelling – Austin Wintory.
- Orchestral Recording at Abbey Road for Lord of the Rings: War in the North – Craig Duman, Inon Zur, John Kurlander.
- Racing Games: A Semi-Formal Sound Study – Damian Kastbauer.
- Real-time Sound Propagation in Video Games – Jean-Francois Guay.
- Spot the Difference: AAA vs Indie VO Techniques – Michael Csurics, David Gilbert.
- Authoring Soundscapes with User Generated Content and Automatic Audio Classification – Jordi Janer.
- Digital Orchestration for the Video Game Composer – Fletcher Beasley.
- Build That Wall: Creating the Audio for Bastion – Darren Korb.
- From Minsk to London: How to make a live orchestra production in Europe happen – Pierre Langer.
- Squeeze Play: The State of Ady0 Cmprshn – Scott Selfon.
- The Art of Non-Music: Crime Shooter Kane & Lynch 2: Dog Days – Mona Mur.
- The Dynamic Audio of Vessel – Leonard Paul.
- The Emotional Puppeteer: Uncovering the Musical Strings that Tie Our Hearts to Games – Marty O’Donnell, Brandi House.
- The Weight of the World: creating massive destruction audio for Red Faction: Armageddon – Stephen Hodde.
- What We Learned About Practical Audio By Going To Disneyland – Dwight Okahara, Chris Olander.
 But it must be said that solving a puzzle in this particular case is based on events or actions that happen in a particular order. The player only controls the time between the events, but the events must happen in the order the designers have defined. One could refer to this as a linear sequence of events within a variable amount of time. This might explain why this game is suitable for sound design that is closely related to film sound design.
 I asked around for the right pronunciation of the name of the middleware Wwise as I’ve heard all sorts of varieties: it’s been said that the employees say wise without any stuttering or an accentuated -w. From now on, let’s stick with wise. Nice!
 Huiberts, S., Captivating Sound: the Role of Audio for Immersion in Games. Doctoral Thesis. University of Portsmouth and Utrecht School of the Arts, Portsmouth, 2010. http://bit.ly/audio-immersion
 Actually, John presented more very specific solutions. If you’re specifically interested in this, please refer to the video in the GDC Vault for more details.