Scene Perception
Ronald A. Rensink, Cambridge Basic Research, Nissan Research & Development, Inc., Cambridge MA, USA.

In A. E. Kazdin (Ed.), Encyclopedia of Psychology, Vol. 7 (pp. 151-155). New York: Oxford University Press, 2000.

Introduction

Scene perception is the visual perception of an environment as viewed by an observer at any given time. It includes not only the perception of individual objects, but also such things as their relative locations and expectations about what other kinds of objects might be encountered.

Because scene perception is so effortless for most observers, one might suppose that it is easy to understand. However, the amount of effort a process seems to require often bears little relation to its underlying complexity. A closer look shows that scene perception is a highly complex activity, and that any account of it must deal with several difficult issues: What exactly is a scene? What aspects of it do we represent? And what are the processes involved? Finding the answers to these questions has proven to be extraordinarily difficult.

However, answers are being found, and a general understanding of scene perception is beginning to emerge. Interestingly, this emerging picture shows that much of our subjective experience as observers is highly misleading, at least with regard to the way scene perception is carried out. In particular, the impression of a stable, picture-like representation somewhere in our heads turns out to be largely an illusion.

To see how this comes about, imagine a seashore where there is a sailboat, some rocks, some clouds, and perhaps a few other objects. How do we perceive this scene? Intuitively, it seems that the set of objects in the environment would give rise to a corresponding set of representations in the observer. Thus, there would be detailed representations of the sailboat, clouds, etc., with each representation describing the identity, location, and 'meaning' of the item it refers to. In this view, the goal of scene perception is to form a literal re-presentation of the world, with all of its visible structure represented concurrently and in great detail everywhere. This representation then serves as the basis for all subsequent visual processing.

As it turns out, however, memory for visual detail is generally quite short-lived (perhaps about 100 ms; Irwin, 1996). Since successive eye fixations are usually separated by at least 150-200 ms, it follows that their contents cannot be integrated into a complete, detailed representation. At the same time, such a representation has been found to be unnecessary: the meaning of a scene (e.g., whether or not it is a seashore) can be determined within 100-120 ms (Biederman, 1981; Potter, 1976), a time that allows recognition of only a few objects. Evidently, a small set of object and scene properties is enough to give us the impression of a scene that is complete and detailed everywhere.

This realization causes a shift in perspective: scene representations are no longer structures built up from eye movements and attentional shifts, but rather rapidly formed structures that can guide such activities. More generally, the goal of scene perception appears to be the establishment of an immediate context for various aspects of visual processing, as well as for visuo-motor operations such as reaching or locomotion.

How might this be done? Scene perception is a special case of visual perception, and so likely involves the same processing levels as vision generally. [See VISION.] The first of these is low-level processing, which uses the incoming light to recover simple properties of the environment visible to the observer, such as the color of the sky or the texture of the clouds. The second is mid-level processing, concerned with more complex tasks, such as separating the sailboat from its background, and representing it as a distinct object with its own size, shape, and colors. Finally, there is high-level processing, concerned with issues of meaning. For example, high-level processes might identify a mid-level object as a sailboat and a scene as a seashore, and so allow us to expect such things as seagulls, whitecaps, and fishing vessels.

The exact nature of the processes involved in scene perception is largely unknown. However, at least some understanding, summarized in the following sections, has been obtained of the kinds of operations carried out at each processing level and of their interactions with each other.

