Audio-only menus

This post is about an old thesis written in 2002 for the Utrecht School of the Arts, School of Music and Technology.[1] It contains guidelines for the usability of audio-only menus. It’s written in Dutch and I’d like to share some insights that might be useful for designing audio menus or audio games.

In the past years, I’ve designed quite a few audio menus for audio games and supervised projects that used audio-only interaction for blind users. Below I share some of my experiences concerning these menus, and include the original recommendations of the thesis.

illustration by zkukkuiz

An updated summary of Audio Menus (2002).

An audio menu is a menu that uses sound as its primary feedback mechanism. The user can interact with the menu system in various ways, for example with keys or speech control. Audio menus are mostly found in applications where the user cannot use a visual menu. They can also be valuable when the menu is out of sight (controlling a ‘smart home’ system from elsewhere in the building) or when the user’s eyes are occupied by another task (e.g. while driving a car). At present, audio menus are most frequently found in audio games [2] or games for the blind.

In general, these menus do not seem to be as pleasant and user-friendly as most visual menus. Menus currently found in phone systems often read out all the options before letting the user make a choice. The user is continuously waiting, which makes this way of interacting feel very passive.
When developing an audio menu, one could instead choose a ‘more active’ interaction model, in which the user browses with two keys and listens to each option at his own pace. In that case, he can interrupt options and confirm his choice. With this model the waiting time decreases, because the menu system does not have to tell the user which option corresponds to which key every time; this prevents listener fatigue when the menu is used frequently. The user may also feel ‘in control’ instead of enslaved to an application that dictates instructions. Yet the more active interaction model requires more explanation, and some users have difficulties with it because they have no prior expectations of how such a menu works.
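The ‘more active’ model can be sketched as a small state machine: two keys move between options, each keypress cuts off the currently playing sample, and a third key confirms. This is a minimal illustration, not code from the thesis; the option labels and key names are invented, and `play_current` stands in for a real, interruptible audio player.

```python
class AudioMenu:
    """Sketch of an 'active' audio menu: browse with two keys, confirm with a third."""

    def __init__(self, options):
        self.options = options
        self.index = 0

    def play_current(self):
        # In a real menu this would start an interruptible speech sample;
        # here we simply return the text that would be spoken.
        return self.options[self.index]

    def press(self, key):
        if key == "next":
            self.index = (self.index + 1) % len(self.options)
            return self.play_current()   # previous sample is cut off
        if key == "previous":
            self.index = (self.index - 1) % len(self.options)
            return self.play_current()
        if key == "confirm":
            return ("chosen", self.options[self.index])

menu = AudioMenu(["new game", "load game", "settings", "quit"])
menu.press("next")      # speaks "load game"
menu.press("next")      # speaks "settings"
menu.press("confirm")   # returns ("chosen", "settings")
```

Because every keypress interrupts playback, a user who already knows the menu never has to wait for a sample to finish.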

One key issue to acknowledge is that the auditory domain fundamentally differs from the visual domain: information is presented in time, instead of all at once on a screen. This makes it rather difficult for the user to maintain an overview, which has implications for the number of options that can be presented in one menu layer: after about seven options, most users have forgotten what the first options were. A good source on this aspect is Buxton, Gaver & Bly (1991) [3]. The following overview, adapted from their article, shows the strengths and weaknesses of using audio in the interface:

Audio, in time:
  • sound exists in time
  • good for display of changing events
  • available for a limited time

Audio, in space:
  • sound exists over space
  • need not face source
  • a limited number of messages at once

Vision, in time:
  • vision exists over time
  • good for display of static objects
  • can be sampled over time

Vision, in space:
  • vision exists in space
  • must face source
  • messages can be spatially distributed

Other considerations to improve usability are:

  • decrease the number of choices within one level of the menu, to ease navigation and improve overview;
  • arrange the options in a logical order so the desired option is found faster, preferably adapted to the presumed task of the user by means of user profiles;
  • use background sound to give the user an impression of the menu layers and the structure of the menu [4]. In a prototype, most users understood the function of an ambient sound layer that changed when a different layer of the menu was accessed. The main menu can easily be accentuated with sound;
  • choose the correct terminology in the explanation that informs the user about the menu interaction. Visual references can be misleading when interacting in the auditory domain: what is ‘back to top’ when using sound? What is the rightmost option?
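The background-sound idea above amounts to a simple mapping from menu layer to ambient loop, switched (in practice, crossfaded) whenever the user enters a different layer. A minimal sketch, with invented layer names and file names:

```python
# Each menu layer has its own ambient loop; entering a different layer
# switches the ambience. Layer names and files are illustrative assumptions.
AMBIENCE = {
    "main": "ambience_main.ogg",
    "settings": "ambience_settings.ogg",
    "audio": "ambience_audio.ogg",
}

def ambience_for(layer_path):
    """Return the ambience file for the deepest layer the user is in,
    falling back to the main-menu ambience for unknown layers."""
    return AMBIENCE.get(layer_path[-1], AMBIENCE["main"])

ambience_for(["main"])               # main-menu ambience
ambience_for(["main", "settings"])   # settings-layer ambience
```

In a real implementation the switch would be a crossfade rather than a hard cut, so the change of layer is audible but not jarring.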

One should also take into account that the user goes through a learning process. Few users have ever encountered an audio menu, and they have no experience with audio menus on regular computers. The presence of a computer screen strongly shapes the expectations of the user. It can be desirable to replay the spoken explanation for users who do not interact with the menu, and to use an alternative, more complete explanation when they do not seem to understand at all how to interact with the system.
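The replay-and-escalate behaviour described above can be modelled as a small decision function driven by idle time and the number of repeats the user has already heard. The thresholds below are assumptions for illustration, not values from the thesis:

```python
def next_prompt(seconds_idle, repeats_heard,
                replay_after=10.0, escalate_after=2):
    """Decide which help prompt (if any) to play for an inactive user.

    seconds_idle:   time since the user's last interaction
    repeats_heard:  how many times the short explanation has been replayed
    """
    if seconds_idle < replay_after:
        return None                    # user may still be thinking
    if repeats_heard < escalate_after:
        return "short_explanation"     # replay the normal explanation
    return "complete_explanation"      # user seems lost: play the full help

next_prompt(3.0, 0)    # None: too early to interrupt
next_prompt(12.0, 0)   # replay the short explanation
next_prompt(12.0, 2)   # fall back to the complete explanation
```

The key design choice is that the more complete explanation only appears after repeated inactivity, so it never burdens users who simply take their time.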

A difficulty with audio menus is that experienced users as well as novice users have to be able to work with the menu. In graphical menus this is not an issue: the experienced user knows the location of the button and does not have to wait for the options to appear.
One way of handling this is to assume that experienced users interact at higher speeds: they know the options and are already acquainted with the controls. A designer can choose to wait a short period before playing the explanation files, so experienced users do not have to hear “This is an audio menu. Use these keys to browse…” every time. Auditory icons can also meet this ‘need for speed’ when they take over the function of the speech samples. In the audio game Drive (2002) [5], these techniques were applied to let the player navigate at high speed while hearing only the short and subtle icons (i.e. only the beginning of each file). A short silence between the icon and the speech fragment prevents clicks in the audio. Novice users simply listen to the complete sound file and make their choice in their own time.
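The Drive-style option file can be thought of as three segments: a short auditory icon, a brief silence (preventing clicks), then the spoken label. Which segments the user actually hears depends only on how quickly the next keypress interrupts playback. The durations below are invented for illustration:

```python
ICON = 0.2   # seconds of auditory icon at the start of each option file
GAP = 0.05   # silence between icon and speech, preventing clicks

def heard_parts(time_until_next_keypress):
    """Which parts of the option file play before the next keypress cuts it off?"""
    parts = []
    if time_until_next_keypress > 0:
        parts.append("icon")
    if time_until_next_keypress > ICON + GAP:
        parts.append("speech")
    return parts

heard_parts(0.15)   # experienced user browsing fast: icon only
heard_parts(2.0)    # novice listening fully: icon and speech
```

The same sound file serves both audiences: the fast user navigates by icons alone, while the novice simply waits and hears the full spoken label.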

The short auditory icon at the beginning of the sound file allows fast navigation for experienced users.

A more frequent use of audio menus and a (perhaps spontaneously arisen) standardization are likely to have a positive effect on the expectations and experience of users.


[1] The author would like to express his gratitude to Jan IJzermans, Richard van Tol, Hugo Verweij and Kees Went.


[3] Buxton, W., Gaver, W. & Bly, S. (1991). The use of non-speech audio at the interface. Tutorial no. 8. In: CHI’91 Conference proceedings, Human Factors in Computing Systems, ‘Reaching through technology,’ New Orleans, ACM Press: Addison-Wesley.

[4] This technique was implemented in the main menu of the Audio Game Maker by The Bartiméus Accessibility Foundation.

[5] Huiberts, Van Tol, Verweij. Utrecht School of the Arts. Review and research report at


Huiberts, S. (2002). Audio menus. Unpublished thesis. (Utrecht School of the Arts, The Netherlands). Updated abstract, 2008, Retrieved 15 December, 2008, from