The intentionality and vision project, funded by a grant from the National Science Foundation, is an interdisciplinary collaboration between researchers in the Department of Psychology and Human Development and the Center for Intelligent Systems at Vanderbilt University. We are exploring how people think about the capabilities of other people, computers, and robots; how these cognitions affect interactive behavior; and the perceptual basis for action parsing.
As computers become increasingly powerful, they will become progressively more integrated into the real world. This trend is especially salient for the humanoid robots currently being developed to fill real-world roles ranging from household chores to elder care. Among the challenges these devices pose, perhaps the most difficult is the need for a two-way understanding between robots and their human users: humans need to understand robot capabilities and representational states, and robots require the same understanding of humans. This is particularly true if robots are to have productive and flexible interactions with humans, a process that requires a careful alignment of understanding dynamic enough to coordinate a complex flow of changing circumstances, beliefs, desires, and intentions. The research proposed here is an attempt to understand A) how people construe the representational states of robots, particularly with respect to vision, and B) the cognitive and perceptual basis for this construal.
Recent research by the PI suggests that humanoid robots will invoke a powerful anthropomorphism that organizes people’s basic understanding of the device’s representations of the visual world. This research suggests that people make systematic mispredictions about visual experience, vastly overestimating their own and others’ ability to see visual changes, and that these overestimates extend to mechanical representational systems such as computers when those systems are described as having anthropomorphic beliefs, goals, and intentions. Although this work has little precedent in the adult cognitive literature, a rich tradition of research within developmental psychology has explored children’s emerging understanding of the representations inherent to people. This research has also begun to identify specific perceptual cues that may serve to bootstrap knowledge about representations. Dr. Megan Saylor, a co-PI on this project, has been exploring these cues, testing the degree to which infants segment ongoing action in preparation for associating it with human representational states.
We plan to explore people’s beliefs about ISAC, a highly anthropomorphic humanoid robot that co-PIs Kazuhiko Kawamura and Mitch Miller have been using as a testing environment to simulate a wide range of cognitive functions. In the most basic experiments, we will ask whether subjects overestimate ISAC’s ability to see visual changes. Follow-ups will explore the degree to which these misunderstandings affect assumptions that might underlie on-line human-robot interactions, and will explore the perceptual basis for invoking an anthropomorphic model. Our goal is both to understand intentional vision in the human users of robotic systems and, ultimately, to use this understanding as the basis for the AI underlying the robots’ processing of human users’ intentions. In particular, Drs. Kawamura and Miller have been exploring the need to structure ongoing events for long-term storage. The representational understanding we will explore may provide a set of heuristics that can generate this structure.
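As a purely illustrative sketch (not part of the ISAC architecture), one family of such heuristics segments a continuous stream of sensed events at points of abrupt feature change, yielding discrete chunks suitable for long-term storage. The feature representation, distance measure, and threshold below are all assumptions introduced for illustration:

```python
# Hypothetical sketch: split a stream of feature vectors into event
# segments wherever frame-to-frame change exceeds a threshold.
# The Euclidean distance and threshold value are illustrative choices.

def segment_events(frames, threshold=0.5):
    """Return a list of segments (lists of frames), cut at change points."""
    segments = []
    current = [frames[0]]
    for prev, curr in zip(frames, frames[1:]):
        change = sum((a - b) ** 2 for a, b in zip(prev, curr)) ** 0.5
        if change > threshold:          # abrupt change: close the segment
            segments.append(current)
            current = []
        current.append(curr)
    segments.append(current)
    return segments

# A stream with one abrupt shift midway is split into two segments.
stream = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1), (2.0, 2.0), (2.1, 2.0)]
print(len(segment_events(stream)))  # 2
```

A perceptually grounded version of this idea would replace the fixed threshold with cues of the kind studied in the infant action-parsing work, such as pauses or completed motions.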
Research in this area documents subjects’ apparent inability to detect anomalous changes to objects, even when these changes occur in objects they are attending to. These experiments began with short edited motion pictures in which each new shot included an anomalous change, referred to as a “continuity error” by filmmakers. For example, in one shot an actor might be wearing a scarf, and in the next the scarf might be gone. We found that subjects were strikingly unable to detect the vast majority of continuity errors, even when they were intentionally looking for mistakes.
This research went on to ask whether subjects would notice changes in objects they were directly attending to. In these experiments, subjects viewed short motion pictures in which the sole actor in a scene changed to another person across an edit. Subjects also missed this change in a majority of cases. Finally, we found the same effect in naturalistic real-world interactions. Even when we unexpectedly substituted a subject’s conversation partner (in the middle of a conversation between a first experimenter and the subject, two other experimenters carry a door between them, and while the first experimenter is occluded from the subject, one of the experimenters carrying the door takes his place), approximately 50% of subjects failed to notice the change.
In all of these cases, we are interested in what information is used to combine different views of a scene. More recent follow-ups to this work have explored the degree to which changes can be missed even when people have represented the changing objects, and have tested the degree to which change blindness is sometimes caused by overwriting. In addition, we have been exploring the degree to which scene jumbling affects detection of different kinds of changes, and the degree to which beliefs and concepts affect change detection.
Intuitively, and according to some models of visual perception, the visual properties of objects in each view are matched to determine the relative displacement between views. The change blindness findings described above suggest, however, that this kind of visual detail is unlikely to underlie the integration of views. Further, these findings conflict with metacognitive beliefs that our sense of personal continuity of experience is based on the on-line availability of such details. Therefore, one of our projects has been to explore these beliefs and to document not only how they diverge from reality, but also how they might be related to more basic kinds of reasoning about intentional representations. Thus, this research explores reasoning about vision in adults while making close contact with developmental research on children’s emerging theory of mind.
This research is an attempt to understand how the visual features that define different face categories (such as race, gender, or possibly age) are processed. I was initially interested in the contrast between same-race (SR) and cross-race (CR) face categories because they appeared to represent a clear case in which differences in perceptual expertise cause interesting qualitative differences in object perception. This interest was based on research (along with long-standing folk knowledge) showing that people recognize CR faces less accurately than SR faces. Most researchers believe that this deficit is caused by reduced perceptual expertise with CR faces, and further that this reduction is marked by an inability to code subtle variations in the configuration of facial features in CR faces.
As this project progressed, however, it began to appear that something else might account for the CR recognition deficit. In a number of studies, I have shown that we seem to code category-specifying information in CR faces that we do not code in SR faces, and further that individuals who show no CR recognition deficit also do not appear to code much of this group-marking information. I have therefore argued that the CR recognition deficit might result from a feature selection process in which group-specifying information is coded as a feature in CR faces at the expense of individuating information. On this account, the CR recognition deficit is caused not by reduced expertise, but by a perceptual frame that gives high priority to race-specifying information in CR faces.