Explorations in Sonic Browsing

M. Fernström, L. Bannon

Interaction Design Centre, Foundation Building, University of Limerick, Ireland

Email: mikael.fernstrom@ul.ie, liam.bannon@ul.ie

 

Abstract

This paper describes a novel browser prototype that has been designed and implemented on PCs and soundcards. Our focus has been on the development of a usable and engaging interface that exploits both visual and aural features of the data space. The project involves state-of-the-art work in human-computer interaction and multimedia development. We are working on a data set of musical compositions, and are designing and testing the prototype with a group of musicians. This paper provides some detail on the development process and the current architecture of the system, and describes some of the problems encountered.

Keywords: browsing, multimedia, visualisation, sonification, spatial audio.

Introduction

The BROWSE project is a two-year research project on the development of Novel Multimedia Browsing Mechanisms. We are particularly interested in investigating how sound can be used more fully to assist users in comprehending complex data sets. Initially, we conducted a survey of browsing methods and techniques currently available in various software applications. This was followed by a number of exploratory interviews and observations on the everyday browsing behaviours and strategies of specific user groups. We then decided to concentrate on browsing in particular domains, in order to focus our prototype development. Our work to date has centred on a musical data set, the well-known Fleischmann collection of Irish traditional music [Ó Súilleabháin 1997]. Our work with this data set has been assisted by our close collaboration with musicians - our user community - in the Irish World Music Centre at the University of Limerick. We have developed a prototype system running on a standard multimedia PC with Windows software, in which we are investigating a star-field type display [Shneiderman 1994] of data with a number of user-controllable display parameters: shape, colour, size, and location of objects. As already mentioned, we have been particularly interested in developing the audio output in the prototype, as this is a seriously neglected area in current systems. We have thus had to re-design a number of device drivers so that we can now support multiple-stream audio. From the users' perspective, they are now browsing a soundscape as well as a screenscape, and as they move the cursor in the star-field display, they receive audio streams of the nearest neighbouring objects. We are currently conducting user trials on the prototype in order to determine the most effective cues for browsing in this domain.
We believe that the software environment that we have created can be used for a wider field of application especially within the field of sonification. The low-level drivers and application program interface can be applied in a multitude of ways and we intend to further develop these functions. In what follows, we provide a more detailed account of the issues that we have investigated, the data set we have been working with, and the development process that we have conducted in order to construct the prototype. We provide a snapshot of its current status, and finally discuss our plans for further work in the coming final year of the project.

Remarks on current systems

Browsing is an iterative and interactive activity, often with an opportunistic or serendipitous attitude. Initially facing an unknown space with unknown symbols, users investigate items, noting similarities and differences in the properties of objects and drawing conclusions about links between the objects, thus becoming familiar with the information-space. They can then either be "lucky" or skilled enough to find an object worth a more detailed inspection, or be in a position to apply more formal search methods [Jerke 1990, Marchionini 1995 pp 100-138]. In this context it is important to note that information objects have both external and internal properties. When people browse, they start by taking the external properties into account. When they have located objects of interest, they start to peruse the internal properties of the objects. Sometimes this exercise will force them to re-evaluate their understanding of the external properties, and a new browsing session is initiated, studying the external properties again but now with a revised understanding of the mapping between external and internal properties of the objects.

The problem we are addressing in this project is how to make it easier for users to find items in complex data sets and how to develop appropriate domain metaphors and representational mappings. These should help the user to gain an understanding of the information-space available in specific domains, with a high level of virtuality, insofar as they enable users to understand and manipulate information in new ways that are not possible on paper or with other media [Erickson 1990, Hutchins 1987, Marchionini 1995 p.45, Nelson 1990]. The development of multimedia has started to provide a richer user interface, with sound, images, video, graphics and text on a single platform, the computer. Multimedia in itself is thus not a product or solution; it is merely a method with the potential to create more effective communication with the user [Bannon 1993]. The common WIMP ideas and the use of poor domain metaphors have become restrictive in allowing access to information [Nelson 1990], and new models and methods are required. It is also important to note that users' requirements and abilities change over time, with different tasks, and between individuals [Bannon 1991].

Most existing browsing systems deal only with visual representation, but some also provide audio representation. Systems such as Albers' Audible Web [Albers 1995B], Blattner's digital interactive map [Blattner 1994], and LoPresti and Harris' loudSPIRE [LoPresti & Harris 1996] show how sound can be added and closely integrated with the interaction, with emphasis on ease of use and with sound cues applied discreetly to provide a richer user experience. Brewster has demonstrated that sounds are easier to differentiate if more complex waveforms are used (as in MIDI-controlled synthesizers with instrumental timbres), and that such sounds enhance users’ performance [Brewster 1994]. This is not surprising, as timbre is a strong cue for perceptual grouping of sounds and stream segregation [Bregman 1990 pp 478-490].

As a starting point for the visual aspects of browsing, we investigated Shneiderman and Ahlberg's work with a star-field representation and dynamic queries [Shneiderman 1994]. The main feature of their system is that it forms a domain metaphor of high virtuality by creating a graphic mapping of many different kinds of domains, such as Filmfinder. With Ahlberg’s latest implementation, Spotfire [IVEE 1996], the user gets only visual browsing support, at least until an object is clicked and an information box opens. In later realisations they have demonstrated browsing of, for example, demographic, financial and environmental information. In Spotfire, the information objects are represented as coloured dots displayed in a two-dimensional diagram, and in the case of demographic and environmental data this 2-D space can be mapped onto an image of a geographical map. By means of sliders within the display area the user can build dynamic queries: the coloured dots become visible or invisible (or highlighted or greyed-out) depending on the query formed by the slider settings. When information objects of interest have been located, the user can click an object (a coloured dot) to inspect the information it contains, i.e. the internal properties of the object.
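The slider-driven dynamic query described above can be illustrated with a minimal sketch. This is not Spotfire's implementation; the function name `dynamic_query`, the record fields and the slider ranges are all hypothetical and chosen only for illustration. Each slider is assumed to define a closed numeric range on one attribute, and an object remains visible only if it satisfies every range.

```python
def dynamic_query(objects, slider_ranges):
    """Return one visibility flag per object.

    An object stays visible only if every queried attribute lies inside
    the (low, high) range currently set on the corresponding slider.
    """
    return [
        all(low <= obj[attr] <= high for attr, (low, high) in slider_ranges.items())
        for obj in objects
    ]


# Hypothetical records; the attribute names are illustrative only.
records = [
    {"title": "A", "year": 1948, "length_min": 95},
    {"title": "B", "year": 1975, "length_min": 130},
    {"title": "C", "year": 1991, "length_min": 88},
]

# Two sliders: one restricting the year, one restricting the length.
sliders = {"year": (1970, 1995), "length_min": (80, 120)}
visible = dynamic_query(records, sliders)
```

In an interactive display the function would be re-run whenever a slider moves, and the flags would decide whether each dot is drawn, hidden or greyed out.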

Browsing sound objects

Most people who have worked with, for example, multimedia production have eventually faced the challenge of having hundreds or thousands of sound files (or other media files) and having to navigate, browse and choose between them to fit the production. With traditional WIMP interfaces, files or objects have to be clicked or double-clicked. Alternatively, some sort of media player that involves pull-down menus and clicking can be used to activate sound playback.

A metaphor that came to mind while working on the early phases of this project was the old longwave radio, where the user could move a "cursor" along a dial, often with several radio stations heard at the same time, trying to turn the dial in such a way that a desired station would become stronger or clearer.

The selected data set

From a complexity point of view, an ideal domain and data set would contain all the elements of multimedia: text, images, sound and video. Another important point is that, because our design philosophy involves close links with user communities and a rapid prototyping methodology, we required close contact with and direct access to suitable domains, data sets and individuals with domain expertise [Bannon 1991B, Marchionini 1995B].

We found a suitable domain and data set in the Irish World Music Centre at UL: the so-called Fleischmann collection, or "Sources of Irish Traditional Music c. 1600-1855". This data set contains 11,734 records of high complexity, including musical score notation and a number of different classifications and keys (Figure 1). As music is a very complex art form in itself, the data set also contains cross-references between records for tunes that are similar, a feature that can be explored in terms of hyperlinks. The musical score itself contains very rich annotation, which unfortunately is mainly accessible to musicians with classical training. One important possibility is to make the data set accessible to a larger population (including many traditional musicians) by offering notation in, for example, tin-flute tablature, numerical tablature or alphabetic notation. With the score data available in electronic form, it is entirely possible to let the user select among a number of different score representations as well.

Figure 1: An excerpt from the Fleischmann Collection.

Using Sound

We set out to investigate how to get a tighter coupling of the interaction with the data set and to provide users with richer support from the system. The application of sound in the user interface has been neglected for a long time, and some of the important early work by Blattner et al. and Gaver [Blattner 1989, Gaver 1989] is still not available on everybody’s PC, although attempts and trials have been running for several years.

In evolutionary terms, sound provides both an "early warning system" that eventually might guide our vision and attention to locations important for our survival, as well as having a strong influence on our mood and how we interpret what we see. Here, it is also interesting to note systems like the ARKola simulation [Gaver 1991], which improved the users' performance by making them aware of surrounding activities by means of sound.

In the visual interface, we began to explore how to add and combine more dimensions. Relative to the Spotfire system, we added geometrical shape to the range of visual representation of information objects. In the sonic interface, we investigated multiple stream spatial audio representations. For the integration of multimedia in the browser interface we attempted to combine the visual and sonic representations, to provide consistent interaction and to increase the user's feeling of direct engagement [Gaver 1997].

What also makes our browsing prototype different is the fact that our data set contains music, and music is best perceived by listening. In our secondary data set, which contains environmental data with no natural or direct auditory representation or meaning, other strategies will have to be applied, such as auralisation to represent multivariate data [Bly 1982, Albers 1995, Slaney et al 1996].

Mapping and meaning

With a musical data set such as the Fleischmann collection, it is quite easy and natural for users to learn the mapping between visual and aural representations. If, for example, the user has chosen to map the type of tune (jig, reel, hornpipe, polka, etc.) to shape and then moves around the screenscape and soundscape, the musical meaning associated with each visual representation is learnt as the user hears the respective type of tune played whenever the cursor is on, or in the neighbourhood of, that representation.

Another aspect of browsing a musical data set in this manner is that the musical content itself has its own semantics and syntax such as tonality, rhythm, timbre, etc., which will assist the user to form streams and to keep each stream together in the general cacophony when moving around the soundscape between and among tunes. The kind of sounds used also seems to be quite important for the user’s ability to discriminate between tunes, i.e. the timbres of the different tunes.

When the user moves the cursor in the screenscape, the soundscape is also active and changes. Up to eight of the objects closest to the cursor will be playing. Each tune is represented by its own sound source placed in 3-D audio space. The screenscape is mapped to the soundscape so that an object above the cursor on the screen is heard in front of the user and an object below the cursor is heard behind the user. From initial user testing, we have observed that users seem to be able to navigate the soundscape quite well. Some researchers have reported problems with spatial sound representation, and have noted that small movements of the head seem to enhance the ability to localise sound [Kobayashi and Schmandt 1997]. In our system, this seems to be compensated for by the fact that the user can move the cursor and the soundscape will then change. This tight coupling could be similar to small head movements, which are emulated by small movements of the cursor, which in turn is controlled by the mouse. This would, of course, not fulfil exactly the same function for the user, because a normal cursor cannot be turned around in a single location, but the dynamics of the interaction can make it easier for users to localise the sound sources. The current version of the spatialization library that we are using does not support turning around on the spot, but this function can be emulated by moving all sound sources around the listener, an approach with which we are experimenting.
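The coupling between screenscape and soundscape can be sketched as follows. This is an illustration under stated assumptions, not the prototype's code or the API of the spatialization library: the function names are hypothetical, screen coordinates are assumed to grow downwards, the left/right mapping is our assumption (the text only specifies front and behind), and rotation is emulated by moving every source around a listener fixed at the origin.

```python
import math

MAX_VOICES = 8  # the text states that up to eight objects play at once


def nearest_objects(cursor, objects, limit=MAX_VOICES):
    """Return the `limit` objects closest to the cursor (Euclidean distance)."""
    cx, cy = cursor
    return sorted(objects, key=lambda o: math.hypot(o["x"] - cx, o["y"] - cy))[:limit]


def to_listener_space(cursor, obj):
    """Map a screen position to (right, front) coordinates around the listener.

    Screen y grows downwards, so an object above the cursor (smaller y)
    gets a positive `front` value and one below gets a negative value.
    The left/right mapping is an assumption; the text only states front/behind.
    """
    cx, cy = cursor
    return (obj["x"] - cx, cy - obj["y"])


def rotate_sources(positions, angle_rad):
    """Rotate every source about the listener, anticlockwise seen from above."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return [(r * cos_a - f * sin_a, r * sin_a + f * cos_a) for r, f in positions]


# Ten hypothetical tunes on a row, plus one above and one below the cursor.
tunes = [{"id": i, "x": 100 + 20 * i, "y": 100} for i in range(1, 11)]
tunes += [{"id": "above", "x": 100, "y": 60}, {"id": "below", "x": 100, "y": 130}]

cursor = (100, 100)
playing = nearest_objects(cursor, tunes)
positions = [to_listener_space(cursor, t) for t in playing]
```

Recomputing `playing` and `positions` on every cursor movement gives the tight coupling discussed above; applying `rotate_sources` to `positions` emulates the listener turning on the spot by the opposite angle.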

Implementing the Prototype

Based on a traditional WIMP approach, as much as possible of the active screen area is made available to display a visual representation of the data set. In addition, a number of function areas are needed: menus for general control of the browser, a tool bar for widgets and controls, and a status area to provide working memory support for the user for the visual representations in the main viewing area (Figure 2).

Objects in the data set are displayed as different geometrical shapes in colours, sizes and horizontal and vertical locations. This level of symbolic representation would primarily reflect the external properties of the objects.

When the cursor enters the main window area, the objects closest to the cursor reveal something about their internal properties, and this level of representation is of course highly dependent on what kind of objects we are dealing with. The status area has icons that the user can select to control aspects of the representation of the objects in the main window. When the user has made an association, a descriptive text is displayed above the respective icon. If the user clicks a specific icon, a dialogue box opens that allows the user to modify the mapping of the particular property.

Figure 2: Screenshot of the prototype browser

Initial default values for the mapping are provided, and the user can arbitrarily modify the settings. Mapping to colour presents problems because colour does not naturally define an ordered continuum and colour ordering does not have a strong population stereotype. It is also important to note that in a browsing task our ability to differentiate between colours is limited to around five different colours [Marcus 1995, p.430, Wickens 1992, p.102]. Assuming that the user can set the general default order of colours (for example red, yellow, green, blue and violet in 16 steps), the hue will then be mapped accordingly. By selecting the property/colour icon the user can modify the mapping to fit the individual and the task.
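One possible reading of the default colour order can be sketched as below. The anchor hue angles, the piecewise-linear interpolation between them and the function names are our assumptions for illustration; the prototype's actual colour table is not specified in the text. A property value is first quantised into one of 16 steps, and the step is then converted to a hue along the red, yellow, green, blue, violet sequence.

```python
import colorsys

# Assumed hue angles in degrees for red, yellow, green, blue and violet.
ANCHOR_HUES = [0.0, 60.0, 120.0, 240.0, 270.0]
STEPS = 16  # number of colour steps mentioned in the text


def step_hue(step, anchors=ANCHOR_HUES, steps=STEPS):
    """Interpolate linearly between neighbouring anchor hues for a given step."""
    pos = step / (steps - 1) * (len(anchors) - 1)  # position along the anchor list
    i = min(int(pos), len(anchors) - 2)            # index of the lower anchor
    frac = pos - i
    return anchors[i] + frac * (anchors[i + 1] - anchors[i])


def value_to_rgb(value, low, high):
    """Quantise a property value into one of STEPS hues and return 8-bit RGB."""
    t = (value - low) / (high - low) if high > low else 0.0
    t = min(max(t, 0.0), 1.0)                      # clamp out-of-range values
    step = round(t * (STEPS - 1))
    r, g, b = colorsys.hsv_to_rgb(step_hue(step) / 360.0, 1.0, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)


hues = [step_hue(s) for s in range(STEPS)]
lowest_colour = value_to_rgb(0, 0, 10)
```

Letting the user edit `ANCHOR_HUES` would correspond to modifying the default colour order through the property/colour icon.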

Similar problems apply to the mapping of shape, as there is no universal understanding of the meaning of different abstract geometrical shapes. For the geometrically minded it might make sense to regard a circle as 1, a line as 2, a triangle as 3, a square as 4, etc., but this only makes sense to the individual who makes the choices and defines the mapping. Our ability to differentiate between shapes is also limited, and in general it is in the same range as the ability to differentiate between colours. In the prototype work on the browser we tried a number of algorithms to generate shapes, but because we wanted to make it possible for the user to override all defaults and create a mapping that suits the individual user and the task at hand, we decided to use Windows TrueType fonts for a set of shapes, as most basic geometric shapes are available in the Wingdings character set. This worked quite well, because the fonts are scalable, which made the mapping of size easy to implement, and still all mappings can be modified by the u