Game Sound & Immersion – Annotations

Gijs Driesenaar sent me his annotations of Captivating Sound: the Role of Audio for Immersion in Games. If you’re a little short on time but would like to know what’s in the thesis, you can consider this the essence of the book. The numbers refer to page numbers.

6. This thesis explores the relation between game audio and (computer) game immersion. Audio is studied using the IEZA-model (Interface, Effect, Zone, Affect) and the SCI-model (Sensory, Challenge-based and Imaginative immersion), and several design issues are described. The result is a conceptual framework of audio design issues that can be used to reflect upon conceptual decisions in the design of audio in relation to immersion.

8. While playing games, players often become immersed, and immersion is considered an important component of the game experience. Yet it is still not well understood how players become immersed and how audio can contribute to immersion.

9. In the beginning, when the first game platforms started featuring sound, the engineers who built the arcade systems were also the sound designers of the games. As standards for both quantity and quality rose, it was in most cases no longer feasible for one person to design all the sound files and music, and at present most commercial game developers employ large audio teams to produce the necessary quantity of sound and music files.

10. Although the audio teams are becoming substantially larger, many companies in the game industry still focus strongly on the visual aspects of games, or in other words, that video is dominant over audio. As the game industry continues to search for more pervasive experiences, sound offers many possibilities that have only marginally been investigated.

11. This thesis attempts to contribute to the field of game audio design by focusing on issues relevant for conceptual audio design, i.e. the decisions made before the actual design of the assets takes place. The focus is on conceptual decision-making, in order to provide a resource for reflecting upon design: analysis before, during and after synthesis, rather than the act of design itself (synthesis). This study only deals with single-player computer games.

12. The model is intended as a tool for the conceptual design of game audio and for reflecting upon that design. In this thesis, its main function is to provide a coherent vocabulary for defining and classifying game audio, and a conceptual model for understanding how game audio functions in relation to immersion. Through the general discussion sections of non-genre-specific gaming forums, regular gamers were asked to take part in a user survey about game audio. Ten gamers were interviewed about five different games; after ten minutes of gameplay, a digital questionnaire was presented.

13. As a second way of involving game players, the website Pretty Ugly Gamesound Study (henceforth: PUGS) was created for this study. It has yielded several interesting cases for understanding how players experience game audio.

14. To describe clearly the relation between sound and immersion that is central to this thesis, a coherent and usable typology that offers insight into the conceptual structure and organisation of game audio is essential.

15 – 20. Generally, five types of distinction are used to categorise game sound, and some typologies combine these types:
– production-based, which often relates to the three types of audio: speech, sound and music. Game music composer Folmann (2004) extends this classification and discerns four dimensions, vocalisation, sound-FX, ambient-FX and music, which form ‘the four main dimensions of Game Audio’.
– the organisation of sound assets in a game (system). This concerns how different groups of sounds are implemented in a specific game: avatar sounds, object sounds, (non-player) character sounds, ornamental sounds and instructions.
– a combination of sound type and the origin of the sound within the game’s soundscape, linked to the diegesis. This categorisation contains five categories of ‘sound objects’ that together form the game environment: score, effect, interface, zone and speech.
– the meaning of sound for the player, which involves four main types of signal-referent relationship: diegetic sounds, symbolic sounds, masking sounds and non-diegetic sounds.
– different types of interactivity. Three categories of diegetic audio can be distinguished: non-dynamic, adaptive and interactive diegetic audio. The non-diegetic part of the soundscape likewise consists of non-dynamic linear sounds and music, adaptive non-diegetic sounds and interactive non-diegetic sounds.

20. The IEZA model incorporates two conceptual dimensions that describe how meaning is communicated with game sound. Unlike the frameworks discussed above, IEZA links two dimensions that both relate to what the soundscape communicates, and their intersection yields four different domains.

21. Four different occurrences of game audio can be distinguished, which together form the current scope of game audio:
– Audio during the interactive gameplay.
– Audio during other interactive moments, or when the game is paused, for instance in the pause screen, different menus and save dialogues.
– Audio that is part of the game when the game is active but the player is not interacting, for instance during an intro or cutscenes.
– Audio outside the context of the game.
For immersion, the interactive gameplay itself is clearly the most important, because by participating the player can concentrate fully on the experience.

22, 23. IEZA is primarily applicable to game audio during interactive gameplay. Looking at how game audio communicates, the game environment produces sounds linked to sources that exist in the fictional game world. These sounds convey the world within the game and the objects present in that world; they are the diegetic sounds. Opposed to them are sounds whose sources lie outside the fictional game world. These sources, such as the background music, communicate on a different level of the game environment and are not meant to be perceived by the avatar. These are the non-diegetic sounds.

24. While the diegetic dimension distinguishes domains that belong to the game world (diegetic) from those that do not (non-diegetic), the dimension of ‘interdependence’ has two poles: the Activity (Interface and Effect) and the Setting (Zone and Affect) of the game. The Activity communicates events occurring in the game environment, while the Setting provides a background or context for the Activity.

25 – 28. The two intersecting dimensions of the model establish four domains.
– The first domain, Effect (diegetic, Activity), contains sound objects that are perceived as being produced by, or attributed to, sources that exist within the game world.
– The second domain, Zone (diegetic, Setting), consists of sounds that clearly originate from the diegetic part of the game and are linked to the environment in which the game is played. Zone corresponds to what game designers often call ambient, environmental or background sound. Although Zones started out as static background layers that did not respond to the player, possibly because of limited resources, they can also react to the player.
– The third domain, Interface (non-diegetic, Activity), consists of sound whose sources lie outside the fictional game world and that expresses the Activity of the game.
– The fourth domain, Affect (non-diegetic, Setting), consists of sound linked to the non-diegetic part of the game environment, specifically the part that expresses the non-diegetic setting of the game.
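To make the two intersecting dimensions concrete, here is a small sketch that encodes IEZA as a lookup from a (diegesis, interdependence) pair to a domain. It only illustrates the structure of the model; the class names, the `ieza_domain` function and the example sounds are hypothetical and do not come from the thesis.

```python
from enum import Enum


class Diegesis(Enum):
    DIEGETIC = "diegetic"          # source exists in the fictional game world
    NON_DIEGETIC = "non-diegetic"  # source lies outside the game world


class Interdependence(Enum):
    ACTIVITY = "Activity"  # communicates events in the game environment
    SETTING = "Setting"    # gives background or context to the Activity


# Each combination of the two dimensions yields exactly one IEZA domain.
IEZA_DOMAINS = {
    (Diegesis.DIEGETIC, Interdependence.ACTIVITY): "Effect",
    (Diegesis.DIEGETIC, Interdependence.SETTING): "Zone",
    (Diegesis.NON_DIEGETIC, Interdependence.ACTIVITY): "Interface",
    (Diegesis.NON_DIEGETIC, Interdependence.SETTING): "Affect",
}


def ieza_domain(diegesis: Diegesis, interdependence: Interdependence) -> str:
    """Return the IEZA domain of a sound classified on both dimensions."""
    return IEZA_DOMAINS[(diegesis, interdependence)]


if __name__ == "__main__":
    # Hypothetical example assets, classified by hand.
    examples = {
        "avatar footsteps": (Diegesis.DIEGETIC, Interdependence.ACTIVITY),
        "wind in a forest level": (Diegesis.DIEGETIC, Interdependence.SETTING),
        "low-health beep from the HUD": (Diegesis.NON_DIEGETIC, Interdependence.ACTIVITY),
        "orchestral background score": (Diegesis.NON_DIEGETIC, Interdependence.SETTING),
    }
    for name, dims in examples.items():
        print(f"{name}: {ieza_domain(*dims)}")
```

Classifying an asset thus comes down to two questions: does its source exist in the game world, and does it communicate the Activity or the Setting?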

29 – 31. As far as active gameplay is concerned, there are two main perspectives on the expression of in‐game audio aimed at the player’s experience.
– On the one hand, audio is used to optimise gameplay by helping the player play the game, providing necessary gameplay information.
– On the other hand, sound in games is used to dynamise gameplay, in other words, to make the gameplay experience more intense and thrilling.
Both optimisation and dynamisation can have a positive influence on the game experience. The optimising role is more often fulfilled by the Activity side of the IEZA model, and the dynamising role by the Setting side, although not exclusively.

32, 33. So far, the communicative properties of the domains have been addressed. With IEZA, it is also possible to address general design properties within the domains, for example:
– Affect that refers to the Activity. Especially in action games, Affect is designed to react to the Activity by letting the Setting respond to the player’s actions.
– Affect that refers to the diegesis. Designers achieve this by adding atmospheric sounds that are not clearly diegetic but are also less clearly recognisable as music.
– Audio that is recognisable as diegetic but designed in a way that refers to Affect.
– Zone that responds more clearly to the Activity.
– Effect sounds that have a connection with the Setting.
– Effect sounds that are purely reactive. The sound of hitting a stone with a wooden stick refers only to the action that produces it.
– Non-diegetic sounds that refer to the diegesis, for instance when the sounds of the HUD have diegetic properties.
– Interface sounds that have a slight connection with Affect.

36. Most people who have ever played a computer game are likely to have experienced feelings of being absorbed by it. In this state, most commonly called ‘immersion’, it can be rather difficult, or at least undesirable, to react to stimuli from the real world. An important reason why immersion is attractive is that it makes players less aware of themselves and of the real world around them, which reinforces the experience of playing a game.

37 – 39. Game audio designers are also confronted with immersion, yet hardly any theory has been developed on the connection between audio and immersion. Although the word ‘immersion’ is often used, definitions are rather sparse. One reason is that immersion describes a state that is also relevant outside the context of interactive media or games: to give its two most common meanings, it covers both being submerged in a liquid and being deeply engaged with an activity.
A few descriptions of an immersive experience are:
– ‘one in which a person is enveloped in a feeling of isolation from the real world’
– ‘the experience of losing a sense of embodiment in the present whilst concentrating on a mediated environment.’
– ‘lose track of immediate physical surroundings’
– ‘the player’s sense of actually being in the game world.’
– ‘the state of mind where a person is completely absorbed in what he is doing’
– ‘the pleasure of being in a different environment than usual, the pleasure of living a different life’
While the ‘loss of sense of self’ is likely to occur in many forms of media consumption, the feeling of being immersed in a game world in combination with intense concentration is a distinct property of the experience of gameplay.

40. The three basic aspects of immersion are:
– transportation into the game world
– absorption in the activity
– identification with the situation

41 – 44. Over the past decade, several attempts have been made to classify what immersion consists of.
– One classification is based on ‘audiovisual quality and style’, ‘level of challenge’ and ‘imaginary world and fantasy’. These correspond with three dimensions of immersion: sensory immersion, challenge-based immersion and imaginative immersion (the SCI-model).
– A second classification distinguishes three types. The first type, tactical immersion, is immersion in the ‘moment-by-moment act of playing the game, and is typically found in fast action games.’ The second type, strategic immersion, is a ‘cerebral kind of involvement with the game.’ Lastly, narrative immersion in games concerns ‘absorption in a narrative when a player starts to care about the characters and wants to know how the story is going to end’.
– A third classification distinguishes spatial immersion (extensive manoeuvring in the game world in real time); emotional immersion (narrative, similar to books); cognitive immersion (abstract reasoning, complex problem solving); sensory-motoric immersion (the result of feedback loops between repetitious movements); and psychological immersion (immersion outside of the game, confusing the real world with the game world).
– A fourth classification speaks of involvement. Tactical involvement relates to all forms of plan formulation and on-the-spot decision making. Performative involvement relates to all modes of avatar or game-piece control, ranging from learning the controls to the fluency of internalised movement. Affective involvement relates to the cognitive, emotional and kinaesthetic feedback loop formed between the game process and the player. Shared involvement concerns controlling an avatar in a represented environment: ‘Anchors the player firmly to the location, both spatially and socially. Covers all aspects of communication with and relation to other agents in the game world.’ Narrative involvement concerns ‘narrative elements like a game world’s history and background, or the back-story of a current mission or quest (designed narrative) and the player’s interpretation of the game-play experience (personal narrative).’ Spatial involvement ‘is related to locating oneself within a wider game area than is visible on the screen. It can take the form of mental maps, directions from other players or referral to in-game or out of game maps and covers aspects such as exploration and exploitation of the game-space for strategic purposes.’

The three basic aspects mentioned earlier (transportation, absorption and identification) are present in most of these classifications.

45 – 47. In the SCI-model there are three dimensions of immersion.
– The first dimension of immersion, sensory immersion, concerns engagement with the sensorially rewarding aspects of a game. In games that feature a game world, the sensory features often stimulate the feeling of being there: the game world becomes a new reality for the player and the real world moves to the background. In games without a virtual world, such as puzzle games, sensory appeal can also make the virtual experience a new reality for the player, for example when attractive physics and beautiful sounds involve the player.
– The second dimension, challenge-based immersion, concerns engagement with a competitive process, problem solving, interacting with the game and competing or cooperating with others. This dimension of immersion occurs when players experience a balance between challenge and skill, are succeeding and advancing, and are immersed in the ‘overall suspense of playing.’ It concerns ‘sensomotor abilities’, such as using the controls and reacting quickly to stimuli, but cognitive challenges are also involved.
– The third dimension, imaginative immersion, concerns engagement with the ‘imaginary world and fantasy, game characters, worlds and story line’. This dimension concerns becoming immersed in the story or world, or identifying with a game character.

48 – 49. Besides the classifications, which describe immersion as a multi-dimensional phenomenon, immersion also has a time-based character. Players become immersed over time and eventually stop being immersed after a certain period of gameplay; linked to this, players frequently mention losing track of time due to immersion. Gameplay evolves and progresses in time, and because sound also exists in time, it is very important for game audio design to acknowledge this time-based character. Three stages of immersion can be distinguished, Engagement, Engrossment and Total Immersion, which describe the successive process of a player becoming immersed over time.

50, 51. For sensory immersion, audio positioned on the diegetic side of IEZA is likely to be used to give the player a feeling of presence in the game world, because it is mainly the diegetic side that builds the game world. Challenge-based immersion is mainly connected to the gameplay activity, so audio positioned on the Activity side of IEZA is mainly used to enhance this dimension. As imaginative immersion has a strong connection with the narrative aspects of games, it is mainly induced by audio positioned on the Setting side of IEZA.
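These tendencies can be combined with the position of each IEZA domain on the two dimensions. The sketch below derives which SCI dimensions each domain is most likely to support, assuming each side of the model maps to one dimension as stated above. The names are hypothetical, and since the text says ‘mainly’, the result is a tendency rather than a rule. The non-diegetic side has no stated mapping, so it contributes nothing here.

```python
from enum import Enum


class Side(Enum):
    DIEGETIC = "diegetic"
    NON_DIEGETIC = "non-diegetic"
    ACTIVITY = "Activity"
    SETTING = "Setting"


# Position of each IEZA domain on the two dimensions.
DOMAIN_SIDES = {
    "Effect": {Side.DIEGETIC, Side.ACTIVITY},
    "Zone": {Side.DIEGETIC, Side.SETTING},
    "Interface": {Side.NON_DIEGETIC, Side.ACTIVITY},
    "Affect": {Side.NON_DIEGETIC, Side.SETTING},
}

# Tendencies from the annotation: which side mainly serves which SCI dimension.
# The non-diegetic side has no stated tendency, so it is deliberately absent.
SIDE_TO_IMMERSION = {
    Side.DIEGETIC: "sensory",
    Side.ACTIVITY: "challenge-based",
    Side.SETTING: "imaginative",
}


def likely_immersion_dimensions(domain: str) -> set[str]:
    """SCI dimensions a domain is most likely to support (a tendency, not a rule)."""
    return {SIDE_TO_IMMERSION[side] for side in DOMAIN_SIDES[domain]
            if side in SIDE_TO_IMMERSION}


if __name__ == "__main__":
    for domain in DOMAIN_SIDES:
        print(domain, sorted(likely_immersion_dimensions(domain)))
```

Read this way, the diegetic Activity sounds (Effect) serve two dimensions at once, while Interface and Affect each lean towards a single one.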

52, 53. Just as components of the game can enhance the immersive experience, immersion can also be disturbed when its dimensions are hindered, whether by audio or by other game components. Sensory immersion is presumably hindered or diminished by audio features that lower the general appreciation on the sensory level or keep the player from feeling present in the game world. Challenge-based immersion is likely hindered or diminished by audio features that disturb the player’s feeling of flow. For imaginative immersion, audio features that disturb the player’s ability to empathise with the game or the situation might disturb immersion. In addition, it is assumed that immersion can be hampered by the absence of audio elements that players expect in the game.

54, 56. Audio has undergone an analogous development. The sensory side of games has consequently become more in line with the real world and thus more convincing, contributing to sensory immersion. Two main aspects concerning the enhancement of sensory immersion with audio can be defined: feeling of presence and sensory gratification.

57 – 62. Three topics can be distinguished in relation to the enhancement of the feeling of presence:
– Stimulating the feeling of presence with details in world design. Detailed worlds can be accomplished, for instance, by implementing many sound sources in the diegetic side of IEZA (Effect and Zone).
– Stimulating the feeling of presence with spatial audio. In order to induce a feeling of presence in the game world, audio can be used to surround and thus immerse the player with sound.
– Stimulating the feeling of presence with audio-only assets. Effect is, for instance, used to convey the presence of game characters, objects or other instances in the Activity of the game that can be interacted with but are out of the avatar’s sight, for example opponents behind the avatar or in other rooms. Zone can be used to sonify what surrounds the player and is a very useful category for adding references that are not present in the graphics. Interface can serve as an alternative channel for information that is difficult to see on screen at that moment (e.g. health status) while the player has to keep their eyes on the virtual world, thus helping the player to concentrate fully on the game world.
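The thesis does not prescribe how spatial audio should be implemented. As one illustration of how an Effect sound can reveal a source the player cannot see, the sketch below uses constant-power stereo panning plus inverse-distance attenuation, a common simplified technique chosen here as an assumption. The function `stereo_gains` and its parameters are hypothetical. Note its limitation: plain stereo panning cannot tell front from back, so a source directly behind the avatar sounds centred; conveying such positions needs multichannel surround or binaural (HRTF) rendering.

```python
import math


def stereo_gains(listener_x, listener_y, facing_rad, source_x, source_y,
                 ref_distance=1.0):
    """Constant-power pan plus inverse-distance attenuation (simplified sketch).

    Angles follow the usual math convention (counter-clockwise from +x), so a
    source to the listener's left has a positive relative azimuth.
    Returns (left_gain, right_gain).
    """
    dx, dy = source_x - listener_x, source_y - listener_y
    distance = math.hypot(dx, dy)
    # Direction of the source relative to where the listener is facing.
    azimuth = math.atan2(dy, dx) - facing_rad
    # Left/right component of that direction: +1 = hard left, -1 = hard right.
    # Front and back give the same value, which is the stereo limitation.
    lateral = math.sin(azimuth)
    # Map lateral in [-1, 1] to a pan angle in [0, pi/2]; cos/sin keep L^2 + R^2 = 1.
    pan = (1 - lateral) * math.pi / 4
    left, right = math.cos(pan), math.sin(pan)
    # Inverse-distance rolloff, clamped so sources closer than ref_distance are not boosted.
    attenuation = ref_distance / max(distance, ref_distance)
    return left * attenuation, right * attenuation


if __name__ == "__main__":
    # Listener at the origin facing +x; hypothetical opponent two units to the left.
    print(stereo_gains(0.0, 0.0, 0.0, 0.0, 2.0))
```

The constant-power law is used so that a sound moving across the stereo field keeps roughly the same loudness instead of dipping in the centre.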

63 – 67. Three topics will be distinguished in relation to sensory gratification:
– Stimulating sensory gratification with dynamics. Dynamics in the auditory soundscape keep the game interesting. Especially when many hours of gameplay are required, dynamics and the accentuation of intense moments can help the player enjoy the game for longer.
– Stimulating sensory gratification with spatial audio. In games that do not feature a world enabling players to experience a feeling of presence, surround sound can still make the experience more intense.
– Stimulating sensory gratification with appealing audio. Generally, high‐quality audio contributes to the sensory gratification of players.

68. The second dimension, challenge‐based immersion, chiefly comprises engagement with gameplay, where the player is triggered by challenges. Audio in games can be a valuable constituent for supporting and challenging the player in this process, since both activity and sound progress in time.
Four topics will be examined in relation to the enhancement of challenge‐based immersion:
– Audio and the tempo of gameplay
– Audio and the structure of the game
– Audio‐driven gameplay
– Audio‐based gameplay

69, 70. The tempo of gameplay varies amongst games. In some games the player’s actions are continuous and rapid, while in others the player has to interact more cautiously or reflect upon decisions. Two kinds of demand can be distinguished here: motor skills (such as reacting rapidly to specific events) and cognitive challenges (strategic thinking or logical problem solving in a puzzle game). Music in particular can alter the perceived duration of levels, making them seem longer or shorter, and an important factor in the perception of time is the musical tempo.

71, 72. Regarding the tempo of gameplay, many games that mainly require motor skills offer fast music accompanied by very direct sound signals, which supports the player in focusing on the activity of gameplay. Games that principally require cognitive skills, on the other hand, tend to feature music with a more relaxed mood and more subtle sound design, making the experience more reflective and allowing the player to concentrate on strategic planning.

73, 74. Most games have a certain structure, for instance a division into levels or a storyline. Emphasising changes in the gameplay with audio can increase challenge-based immersion: when a game’s levels show a clear progression, players appreciate music that follows that progression.

75 – 78. Whereas the previous two sections discussed audio that supports the tempo of gameplay or the structure of the game, music can also actively drive the player’s actions. The numerous ‘music games’ (also: rhythm games) developed in recent years are good examples of audio-driven gaming, as they use audio as a basic constituent of the gameplay activity. In these cases audio drives gameplay directly, meaning that actions must be performed in the rhythm of the music. When music is synchronised with the gameplay, the pace of the gameplay is consequently dictated by the tempo of the music.
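How the tempo dictates the pace can be made concrete with a little arithmetic: at a tempo of b beats per minute one beat lasts 60/b seconds, so at 120 BPM the player has to act every 0.5 s. The sketch below computes beat times and judges whether a button press lands within a timing window of the nearest beat. It is a hypothetical illustration, not taken from any particular game; the function names and the 0.1 s window are assumptions.

```python
def beat_times(bpm: float, offset_s: float, n_beats: int) -> list[float]:
    """Times (in seconds) of the first n_beats beats at a constant tempo."""
    seconds_per_beat = 60.0 / bpm
    return [offset_s + i * seconds_per_beat for i in range(n_beats)]


def judge_input(press_time_s: float, bpm: float, offset_s: float,
                window_s: float = 0.1) -> tuple[bool, float]:
    """Check whether a press lands on the beat.

    Returns (hit, error_s), where error_s is the signed distance to the nearest
    beat (negative = early, positive = late). window_s is an assumed tolerance.
    """
    seconds_per_beat = 60.0 / bpm
    # Index of the nearest beat, clamped so early presses map to the first beat.
    nearest = max(0, round((press_time_s - offset_s) / seconds_per_beat))
    error = press_time_s - (offset_s + nearest * seconds_per_beat)
    return abs(error) <= window_s, error


if __name__ == "__main__":
    print(beat_times(120, 0.0, 4))
    print(judge_input(1.04, 120, 0.0))  # slightly late, inside the window
    print(judge_input(1.25, 120, 0.0))  # halfway between beats: a miss
```

Doubling the tempo halves `seconds_per_beat`, and with it the time the player has between required actions.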

79 – 81. Besides audio-driven gameplay, in which the player reacts to visual stimuli synchronised with the rhythm or other aspects of the music, games also feature interaction based directly and solely on audio. This kind of interaction is less common than interaction based on purely visual stimuli or on visual combined with auditory stimuli, so the number of available cases is limited. Compared with gameplay dictated by visual stimuli, which is found in most games, interaction dictated by auditory cues can add to the challenges for the player. It provides a different experience and usually changes the way the player listens.

82, 83. A key aspect of the imaginative dimension of immersion is the player’s empathy with the game. Empathy with the game character is very important during the deepest stage of immersion, Total Immersion. Game music in the Affect domain of IEZA is often used with a similar intention: to strengthen the player’s connection to the imaginative dimension.
Three main topics of imaginative immersion will be discerned in this section. Firstly, audio can contribute to the player’s empathy with the game. Secondly, audio can enhance the setting of the game world. Thirdly, audio can be used to enhance the player’s empathy with the story of the game.

84. The non-player characters in games enhance the story or the world and help the player empathise with the game. Their sounds are mostly recorded by human voice actors. When players find the voice acting believable, it is easier for them to empathise with the characters, which contributes to imaginative immersion. Three aspects of voice acting can be distinguished: the verbal meaning, the intonational meaning and the timbre. For the verbal meaning, the character should convey the right information. The intonational meaning mainly depends on the skills of the voice actor, who has to interpret the setting. The timbre of the voice chiefly depends on the vocal casting: even if the verbal and intonational meaning are coherent, the voice could still be wrong for a specific game character.

85. Besides voice acting, sounds belonging to the Activity side of IEZA can also be significant for imaginative immersion, mainly because of their emotional impact on gameplay. Players mainly point to sounds that matter for the state of the avatar, either because the sound scares the player (it signals a threat to the avatar) or because it is associated with vitality (weapons make the avatar stronger). Just as the sounds of tools should fit the characters using them, the sounds of the avatar’s opponents can make players empathise with the situation.

86. The Affect domain can be used to make the player empathise with a game. Often, it is used to induce a mood that is perceived rather unconsciously and conveys the Setting instantly.
The Affect is often linked to other contexts and at least four general categories can be distinguished:
– Affects based on the computer game genre
– Affects based on films
– Affects based on pop music
– Semi‐diegetic Affects
These four categories, all frequently found in games, have distinct properties and various implications for empathy.

87. The computer game music genre is formed by the repertoire of music found in classic games, such as arcade and action games. The limited technical resources of classic consoles dictated the sonic qualities of this music, and the simple tunes played on basic tone generators or sound chips form part of the identity of classic games. This style has now (almost) become a musical genre in itself.

88. A second category of Affect is based on the film music style, originating from the use of music in Hollywood films. When MIDI files and on-board MIDI synthesisers were implemented in game systems, the game composer had a larger number of instrument tracks at his disposal, offering new timbres for creating different types of scores. Mostly, these Affects try in some way to correspond with the narrative setting and story of the game, much as film scores do. While films are traditionally linear, the score in games often requires a more flexible system of fragments to match the interactive character of games; interactive and adaptive systems make the Affect correspond with events in the game or the behaviour of the player.
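One common way to build such an adaptive system is horizontal re-sequencing: the score is split into fragments, and the next fragment is chosen from the current game state. The sketch below assumes a single hypothetical ‘intensity’ value between 0 and 1 (for instance derived from nearby threats) and switches only at fragment boundaries, so the music follows the game without cutting off mid-phrase. The fragment names and thresholds are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    min_intensity: float  # lowest game intensity (0..1) this fragment suits


# Hypothetical fragments, ordered from calm to intense.
FRAGMENTS = [
    Fragment("explore_calm", 0.0),
    Fragment("tension_build", 0.4),
    Fragment("combat", 0.7),
]


def next_fragment(intensity: float) -> Fragment:
    """Pick the most intense fragment whose threshold the intensity reaches."""
    chosen = FRAGMENTS[0]
    for fragment in FRAGMENTS:
        if intensity >= fragment.min_intensity:
            chosen = fragment
    return chosen


def play_adaptive(intensities: list[float]) -> list[str]:
    """Simulate a score with one intensity reading per fragment boundary."""
    return [next_fragment(i).name for i in intensities]


if __name__ == "__main__":
    print(play_adaptive([0.1, 0.5, 0.9, 0.2]))
```

Waiting for a fragment boundary trades some responsiveness for musical continuity; a game needing instant reactions might instead cross-fade or add and remove layers.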

89, 90. When recorded audio tracks started to be implemented in games, a different type of Affect entered the soundtrack: pop music. Inherent to pop music is the reference to a social or cultural group, which can be used to make players identify with an identity or subculture. This type of Affect can be very appealing for specific target audiences, but can also exclude users. One difficulty is that such music, in a sense, brings the attached artist or subculture from the real world into the non-diegetic part of the game. Generally, players state that if the artist of the song matches the identity of the game world, the use of pop music has a positive or neutral influence on immersion, provided that the musical style fits the situation or narrative.

91, 92. A fourth type of Affect that can be distinguished is the semi-diegetic Affect. It is often found in games where the experience of the diegetic world is important and a clearly non-diegetic Affect is felt to intrude upon that experience. This often concerns first-person shooters, and this type of sound corresponds with film soundtracks that use layers of sound rather than (orchestral) music in tense moments. Distinctive of this type of Affect is that its effect is less direct and often blurs with the Zone category, so that the complete Setting forms a background atmosphere.

93, 94. Besides the use of Affect, the player’s empathy with the setting can be induced by enhancing the Diegetic Setting (world setting) with audio. An ambience mostly conveys a mood by incorporating elements that refer to settings in the real world, in games or in movies. In addition to these world elements, reverberation can be used for mood induction. A very important task for the designer is to answer the question ‘where am I?’ that arises after starting up a game. Carson writes that for optimal engagement of the visitor, it is best to answer this question within 15 seconds; thereafter it is important to give more information about the user’s relationship to the place, which players need in order to know their role in the setting. Presenting musical clues at the very beginning of the game will help most players catch the setting instantly.

95. Relevant for the functioning of sound and music in relation to the imaginative dimension of immersion is the concept of the magic circle, the frame in which a game exists, where the rules of a game create a special set of meanings for the player. Audio is present in this frame and forms, in a sense, an ‘imaginary contract’ with the player. This contract consists of the expectations of the player in combination with the properties of the game, for example certain video game conventions or the style. By participating, the player agrees to the contract that is offered and consequently has specific expectations about how things sound in that game. A sound asset that fundamentally goes against the contract is liable to disrupt immersion, while auditory components in accordance with the contract support immersion.

96 – 98. A component that contributes to imaginative immersion is the story of a game. Audio can be used to enhance the story and this is recognised by players as positively influencing immersion. When sound is used to enhance imaginative immersion, it is important to acknowledge two types of emotional response to the game: primary (character) and secondary (audience) emotions. The primary emotions concern the character, the secondary the player, who experiences the primary emotions but is able to feel differently from the reflected emotions of the game character. Concerning these two emotional settings, there are two types of usage of sound: creating empathy with the avatar in the setting of the narrative, and supporting the secondary emotions of the player. Typically, Affect for challenge‐based immersion relies more often on secondary emotions (the player has won) while for imaginative immersion the aim is to couple the primary and secondary emotions (the story has ended and the player is happy).
Designers can use Affect to refer to the emotions belonging to the dimensions of IEZA. For the Affect domain, this gives the following options:
– Emotions belonging to the Activity: expressing the emotions tied to how the player is performing at a specific moment
– Emotions belonging to the Setting: expressing the emotions tied to the feel of the game
– Emotions belonging to the Diegetic: expressing, for instance, the emotional responses to the world of a level
– Emotions belonging to the Non‐diegetic: primarily representing the status of the game, i.e. the feel of the game

99 – 105. Affect is often used to evoke a feeling or cultural setting at the beginning of levels or during gameplay. Music is also used to signal special events that prompt the player to react in a specific manner. Especially in an imaginative context, Affect offers another method for enhancing empathy: acting as an evaluative component of the game. Over a longer narrative structure, for instance, music can add impact to a very important event by emphasising the significance of what has happened.
A narrative may contain an abstract, orientation, complicating action, evaluation, result or resolution, and coda. The abstract is optional but is often used at the very beginning of a narrative to briefly summarise the whole story. The orientation establishes the setting; it allows identification with the time, place and situation. Evaluation is used by the narrator to indicate the point of the narrative, clarifying why it was told and giving context to the listener. The coda bridges the gap between the end of the story and the present. When Affect is used to elaborate evaluation in the narrative structure, it does more than indicate what is occurring in the game at that moment or what is about to happen. It aims to attribute meaning to what is happening or has just happened, in the hope that the player will empathise with the situation.
Sound can also keep the player in the mood and help concentration during gameplay by overcoming some of the barriers to immersion caused by disruptions to the flow of the game, for instance loading screens.

106 – 108. Music in games is most often mentioned as having a negative influence on immersion. In many cases, this occurs when no relation is felt to exist between the music and the activities or events in the game. In other cases, the relation is felt to be too obvious, so the player becomes aware that walking across a specific trigger causes the music to play. Repetitive music is also mentioned, often because there is no relation between gameplay and musical structure. Short, repetitive and continuous musical loops, or excessive repetition of musical fragments, become obtrusive and easily decrease the player’s immersion.
In‐game speech fragments are also frequently mentioned as negatively influencing immersion. This mostly involves unconvincing voice acting: voice fragments that do not seem to convey the drama of the storyline. Regarding speech recordings in games, the main finding is that incoherence, repetition or a mismatch with the setting makes game characters seem unrealistic, which negatively influences immersion.
Sound effects are also mentioned as having a negative influence on immersion. Often, these are sounds that fail to convince the player or do not match the player’s interpretation of the game world. Sounds can also be intrusive, often when they are perceived as too loud or ‘ugly’. Players describe these as annoying and often want to mute the sound output or stop playing.

109. This section explains the negative influence of audio on sensory immersion. Audio can diminish players’ feeling of presence. Often, this occurs when the level of detail is too low and visually different objects have the same sound. Players also describe reduced sensory gratification caused by audio, which often occurs when the feedback is unpleasant, irritating or disappointing.

110. Cases were found that confirm that audio can negatively influence challenge‐based immersion by disturbing the feeling of flow. Flow is often interrupted when the response of audio does not suit the gameplay. The audio may be overly reactive (expressing the Activity too obviously) or non‐responsive (not responding to the Activity).

111. Audio can negatively influence imaginative immersion by diminishing the player’s empathy with the character, setting and story. When audio disturbs imaginative immersion, there is often a discrepancy between the audio that occurs in the game and the player’s interpretation of the game.

112. Some players mention that an absence of sound can negatively influence immersion, for instance when they see a character talk without any corresponding audio. Most players value audio as an important factor in playing games. A silent game soundscape or missing sound objects are therefore likely to diminish immersion, unless the user’s environment does not allow for audio playback.

113 – 116. To improve understanding of IEZA and SCI, audio that diminishes the dimensions of immersion has been mapped to the domains of IEZA. For sensory immersion, the non‐diegetic side of the model (Interface and Affect) is mainly mentioned in connection with unpleasant‐sounding instances, while the diegetic side is often mentioned in relation to the game world. For challenge‐based immersion, the issues mainly concern how audio responds to the player’s actions. For imaginative immersion, issues were found where the occurrence of specific sounds or music conflicts with the characters, setting or story.
Some general quality issues can be distinguished across the corners of IEZA. When the quality of sound, speech and music in the domains of IEZA is insufficient, these instances are likely to annoy the player, often diminishing immersion.
