An integrated development environment (IDE) is a set of user interfaces with which the programmer can write, compile, execute and debug a program. The most commonly used IDEs, including Visual Studio, Eclipse and Xcode, are all text-based and used for development of any type of program. Most programmers these days use text-based programming languages and text-based development environments to create programs.
Programmers are people, too. Therefore, it is important to consider human factors in development environments to improve the productivity of programming activities. One way of taking human factors of computer systems into account is to consider the gulfs of execution and evaluation [121]; i.e., the gaps between the user’s intention and the results of their actions. This concept also applies to development environments [81]: when a programmer develops a program using a text-based programming language, there may be a gulf of execution, as shown in Figure 1.1. Translating the intent into text-based code correctly is not straightforward. Code completion and interface builders aim to bridge this gap.
Furthermore, when a programmer debugs the program using a text-based debugger, there is a gulf of evaluation. It is not easy to understand the program's dynamic behavior from textual information presented by the debugger. A debugger that visualizes program execution [94] and one that helps the programmer reason about errors [80] may assist with this. As seen in these examples, a typical approach to bridging the gulfs of execution and evaluation in programming is to provide an appropriate graphical user interface (GUI), which provides visual cues in development environments and connects the static description of the program and its resulting behavior.
Figure 1.1: Development of the programs with conventional input and output. [Figure labels: development environment; runtime environment; intermittent and simple input.]
Figure 1.2: Examples of programs with real-world input and output. [Panels: image processing; posture data processing; mobile robot control; robot posture control.]
Compared with the relatively conservative development of IDEs over the past several decades, the variety of computer applications has grown considerably, and this evolution has been accompanied by new input and output modalities. The computer was invented as a machine to automate calculations, and the programs were stored as punched cards and executed in a batch manner without user interaction. The keyboard and display later provided scope for more interactivity through character-based user interfaces (CUIs). Computers subsequently became personal devices, following the realization of GUIs. In addition to the keyboard, a mouse was used to provide information on the status of several buttons and movement in two-dimensional space. The two-dimensional movement was explicitly bound to the movement of a pointer, which is part of a windows, icons, menus and pointer (WIMP) environment. Numerous efforts have been made to improve user interface tools [116], and event languages represent a successful standardization of the user input, and map physical actions to GUI operations. Event information is provided to the programs intermittently, each time user input occurs. Post-WIMP paradigms include recognition-based interfaces [101] with new devices, such as the Freeform User Interface with pen input [67], gesture-based touch interface and voice recognition with a microphone. While these early attempts made use of new input modalities to control GUI applications, recent trends place more focus on physical interactions in the real world.
Interactive programs that deal with real-world input and output (real-world I/O) are growing in popularity. Such applications include camera-based interactions, augmented reality, tangible user interfaces, physical computing, and user interfaces for robots. Examples are shown in Figure 1.2. In these applications, there are no standardized I/O events. Raw I/O data are received from sensors and sent to actuators, and have to be processed by the program continuously in real time. In particular, this dissertation focuses on a certain kind of real-world I/O whose data is best represented visually by photos and videos. Potential application of our method to other kinds of real-world I/O such as audio, tactile sensation and smell will be discussed in Subsection 7.2.2 but is not our main contribution.
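The difference between intermittent, standardized events and continuous, raw data can be illustrated with a minimal sketch. All names here (`EventDispatcher`, `brightest_pixel`, `run_continuous`) are hypothetical and the "camera frames" are tiny synthetic arrays; the sketch does not reproduce any system discussed in this dissertation.

```python
from typing import Callable, Dict, List, Tuple

# All names below are illustrative assumptions, not part of any cited system.


class EventDispatcher:
    """Conventional GUI-style input: handlers run only when an event occurs."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def on(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, event_type: str, payload: dict) -> None:
        # Events without a registered handler are silently ignored.
        for handler in self._handlers.get(event_type, []):
            handler(payload)


def brightest_pixel(frame: List[List[int]]) -> Tuple[int, int]:
    """Return (row, col) of the brightest pixel in a grayscale frame."""
    best = (0, 0)
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if value > frame[best[0]][best[1]]:
                best = (r, c)
    return best


def run_continuous(frames, process):
    """Real-world style input: every raw sample must be processed."""
    return [process(frame) for frame in frames]


# Intermittent, standardized input: discrete events with a fixed vocabulary.
clicks = []
dispatcher = EventDispatcher()
dispatcher.on("click", lambda e: clicks.append((e["x"], e["y"])))
dispatcher.dispatch("click", {"x": 10, "y": 20})
dispatcher.dispatch("keypress", {"char": "a"})  # no handler registered

# Continuous, raw input: three synthetic 2x2 "camera frames".
frames = [
    [[0, 9], [1, 2]],
    [[0, 1], [9, 2]],
    [[0, 1], [2, 9]],
]
tracked = run_continuous(frames, brightest_pixel)
```

In the event-driven half, the program is idle between events and each event carries a small, self-describing payload. In the continuous half, there is no event vocabulary at all: the program itself must extract meaning (here, a bright spot's position) from every frame.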
Figure 1.3: Development of the programs with real-world input and output. [Figure labels: development environment; runtime environment; continuous and complex input.]
When the program deals with real-world I/O, the gulfs of execution and evaluation become wider. This results from the use of different development and runtime environments, as shown in Figure 1.3. When the program uses a conventional CUI or GUI, the development environment shares the same I/O devices as the runtime environment, as shown in Figure 1.1. The programmer uses the mouse and keyboard to both develop and run the program. In this case, input to and output from the program can be intuitively represented and reproduced by the user interface, using either a CUI or GUI. For instance, a keystroke can be represented by a character code and the movement of a mouse can be represented by a change in two-dimensional coordinates. The primary difficulty in bridging this gulf was how to visualize and provide intuitive navigation over complex data structures and the dynamic behavior of the program.
However, when the program deals with real-world I/O, the development environment typically employs a conventional CUI or GUI with a mouse, keyboard and display, yet the runtime environment may involve physical movements of one or more users, objects or robots, which cannot be represented well by the existing user interfaces of IDEs. Therefore, it is difficult to develop and debug such programs. Please note that the scope of this dissertation is to aid development of interactive programs by filling the gap between the I/O modalities. While there has been much work on remote debugging tools that aim to fill the gap between the computer on which the program is developed and the other computer (typically a microcomputer) that runs it, our work assumes that the program is developed and run on the same computer but with a different set of I/O devices.
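To make the difference in representation concrete, the following sketch prints a keystroke, a mouse movement and a camera frame as text, which is roughly what a text-based debugger can offer. The helper names and the synthetic 8x8 frame are assumptions made purely for illustration.

```python
# Hypothetical helpers; they only illustrate how well each kind of input
# survives being shown as text in a conventional development environment.


def describe_key(char: str) -> str:
    """A keystroke is fully captured by its character code."""
    return f"key {char!r} (code {ord(char)})"


def describe_mouse_move(before, after) -> str:
    """A mouse movement is a change in two-dimensional coordinates."""
    dx, dy = after[0] - before[0], after[1] - before[1]
    return f"mouse moved by ({dx:+d}, {dy:+d}) to {after}"


def describe_frame_as_text(frame) -> str:
    """A camera frame dumped as text is just a long run of numbers."""
    return " ".join(str(value) for row in frame for value in row)


key_text = describe_key("j")
move_text = describe_mouse_move((100, 40), (112, 35))

# Synthetic 8x8 grayscale frame standing in for real camera input.
frame = [[(r * 8 + c) * 4 % 256 for c in range(8)] for r in range(8)]
frame_text = describe_frame_as_text(frame)

print(key_text)
print(move_text)
print(frame_text)
```

The first two lines of output are short and immediately meaningful. The third is 64 numbers for a frame far smaller than any real image, and a reader cannot tell from it what the camera saw.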
Existing approaches to address this gulf include programming by example (PbE), in which the user demonstrates operations to the system and the system guesses the program [33, 93]. When it is applied to the development of programs that deal with real-world I/O, it can eliminate the need for explicit programming, and the user does not require prior knowledge of programming. In PbE systems, the program is specified using the runtime environment; in other words, there is no distinction between the development environment and the runtime environment. Therefore, the user does not need to alternate between different modalities, and the gulfs of execution and evaluation of programming can be removed. The drawback is that it does not allow the user to precisely describe the logic of the program. While it may be sufficient for end users, another gulf of execution and evaluation arises for the programmer who wants complete control over the resulting program. It is difficult for the programmer to infer what kind of examples, and how many, are sufficient to realize the intent. It is also difficult to test the outcome logically. The programmer has to give more and more examples to test whether the program functions as intended.
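This drawback can be seen in a toy example. The sketch below stands in for a PbE system with a 1-nearest-neighbor lookup, which is a deliberately simple substitute and not the method of the cited systems [33, 93]. The hand positions, action names and the intended threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple

# (demonstrated input, desired action); a stand-in for a PbE demonstration.
Example = Tuple[Sequence[float], str]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def guess_action(examples: List[Example], observed: Sequence[float]) -> str:
    """Return the action of the closest demonstration (1-nearest neighbor)."""
    nearest = min(examples, key=lambda ex: squared_distance(ex[0], observed))
    return nearest[1]


# Hypothetical demonstrations: a 2-D hand position mapped to a robot command.
examples: List[Example] = [
    ((0.1, 0.5), "turn_left"),
    ((0.9, 0.5), "turn_right"),
]

# Intent, which the system is never told: "turn left only when x < 0.3".
before = guess_action(examples, (0.4, 0.5))    # guesses "turn_left"

# The programmer adds another demonstration to correct the guess.
examples.append(((0.4, 0.5), "stop"))
after = guess_action(examples, (0.4, 0.5))     # now "stop"

# The learned boundary still differs from the intended threshold of 0.3.
boundary = guess_action(examples, (0.26, 0.5))  # "stop", intent was "turn_left"
```

With explicit code, the intent would be a single comparison against 0.3. With demonstrations alone, the boundary lands wherever the examples happen to place it, and the only way to check it is to probe with further inputs.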