1.4 Research Papers
2.1.1 Programming by Demonstration
In the early 1990s, Cypher published the book Watch What I Do [25], which compiles early Programming by Example (PbE) and Programming by Demonstration (PbD) systems developed up to 1993. The works collected in that book paved the way for the design and implementation of PbE and PbD systems. Notable systems in the book include Pygmalion, the first PbD system, which introduced the concept of creating a script by observing a user demonstrate the steps of a task instead of having the user write abstract logic in a programming language; TELS, a system that predicts the next text-editing action, much as Microsoft Excel's Autofill does today; Eager, a PbD system that learns from a set of user-demonstrated examples to produce more generalized scripts; and Triggers, a system that uses visual information from the display screen to trigger keyboard and mouse macros. These systems introduced concepts, some of which are still used in current PbD systems. Well-known current PbD systems are discussed below.
2.1. Traditional Approaches to End-user Program Synthesis 31
Sheepdog [19, 53] is a PbD system designed specifically for IT support tasks. After observing IT experts demonstrate variations of the procedure for a specific task, it uses an Input/Output Hidden Markov Model to learn how to complete the task from the set of demonstrated action sequences. At execution time, Sheepdog combines user interaction with inference to guide the system at each step. Similarly, Familiar [77] is a PbD system that automates iterative (looping) GUI tasks by learning from a few performed examples of the task. However, the target applications of both systems must conform to OS accessibility APIs, which confines them to closed environments, whereas HILC, proposed in Chapter 3, and RecurBot, proposed in Chapter 4, are not restricted by these APIs.
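To make the modelling step more concrete, the following is a minimal sketch of the scaled forward (filtering) pass of a discrete Input/Output HMM, assuming small integer-coded inputs (for example, a screen context) and outputs (user actions). It is not Sheepdog's implementation: Sheepdog's actual state structure, features and training procedure are not reproduced, parameter learning (for example, by EM) is omitted, and the function names and toy matrices are hypothetical. What distinguishes an IOHMM from a plain HMM is that both the transition matrix `A[u]` and the emission matrix `B[u]` are selected by the current input `u`.

```python
import numpy as np


def iohmm_filter(pi, A, B, inputs, outputs):
    """Scaled forward pass of a discrete Input/Output HMM (illustrative sketch).

    pi: (S,) initial hidden-state distribution (assumed input-independent here)
    A:  (U, S, S) with A[u, i, j] = P(z_t = j | z_{t-1} = i, u_t = u)
    B:  (U, S, Y) with B[u, j, y] = P(y_t = y | z_t = j, u_t = u)
    inputs, outputs: equal-length sequences of integer symbols.
    Returns the filtered state distribution P(z_T | u_1..T, y_1..T)
    and the log-likelihood log P(y_1..T | u_1..T).
    """
    alpha = pi * B[inputs[0], :, outputs[0]]
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale
    for u, y in zip(inputs[1:], outputs[1:]):
        # Propagate through the input-conditioned transition, then weight by the emission.
        alpha = (alpha @ A[u]) * B[u, :, y]
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale  # rescale each step to avoid numerical underflow
    return alpha, float(log_lik)


def predict_next_action(alpha, A, B, next_input):
    """Most likely next action given the filtered state and the next input."""
    next_state = alpha @ A[next_input]          # P(z_{T+1} | history)
    action_probs = next_state @ B[next_input]   # P(y_{T+1} | history)
    return int(np.argmax(action_probs)), action_probs


if __name__ == "__main__":
    # Hypothetical toy task: two hidden steps that alternate, each emitting its own action.
    pi = np.array([1.0, 0.0])
    A = np.array([[[0.0, 1.0], [1.0, 0.0]]] * 2)   # same transitions for both inputs
    B = np.array([[[1.0, 0.0], [0.0, 1.0]]] * 2)   # state j emits action j
    alpha, ll = iohmm_filter(pi, A, B, inputs=[0, 0, 0], outputs=[0, 1, 0])
    action, _ = predict_next_action(alpha, A, B, next_input=0)
    print(action)  # -> 1
```

Keeping the running log-likelihood allows several candidate models to be compared on the same demonstration, while the predicted next action is the kind of per-step suggestion a guided execution could offer the user.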
CHINLE [22] automatically generates PbD functionality for applications built with the SUPPLE framework [29]. The system takes as input SUPPLE's functional specification of the application's interface. Although CHINLE is restricted to SUPPLE applications, the paper studies in depth an important problem: recovering from user mistakes made during the demonstration phase. CHINLE lets users modify the generated script before the execution phase. Chapter 4 likewise confirms that user demonstrations are brittle and that users are willing to help prevent a script from failing. In contrast to CHINLE, RecurBot addresses this issue automatically with its proposed motif discovery algorithm.
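The general idea behind motif discovery, finding a recurring action subsequence in a demonstration even when one repetition contains a slip, can be illustrated with the brute-force sketch below. It is a generic approximate-repeat search based on Hamming distance, not the RecurBot algorithm of Chapter 4; the action names, the parameters `min_len`, `max_len` and `max_mismatch`, and the "most actions covered" criterion are all assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of positions at which two equal-length action windows differ."""
    return sum(x != y for x, y in zip(a, b))


def occurrences(seq: Sequence[str], motif: Tuple[str, ...], max_mismatch: int) -> List[int]:
    """Greedy left-to-right start indices of non-overlapping near-copies of motif."""
    length, starts, i = len(motif), [], 0
    while i + length <= len(seq):
        if hamming(seq[i:i + length], motif) <= max_mismatch:
            starts.append(i)
            i += length  # skip past this copy so copies do not overlap
        else:
            i += 1
    return starts


def find_motif(seq: Sequence[str], min_len: int = 2, max_len: Optional[int] = None,
               max_mismatch: int = 0) -> Tuple[Tuple[str, ...], List[int]]:
    """Brute-force search for the repeated window that covers the most actions."""
    max_len = max_len if max_len is not None else len(seq) // 2
    best_cov, best_motif, best_starts = 0, (), []
    for length in range(min_len, max_len + 1):
        for i in range(len(seq) - length + 1):
            motif = tuple(seq[i:i + length])  # candidate taken from the demonstration
            starts = occurrences(seq, motif, max_mismatch)
            cov = len(starts) * length
            if len(starts) >= 2 and cov > best_cov:
                best_cov, best_motif, best_starts = cov, motif, starts
    return best_motif, best_starts


if __name__ == "__main__":
    # Hypothetical demonstration of a three-iteration loop with one slip.
    demo = ["click_cell", "copy", "switch_app", "paste",
            "click_cell", "copy", "switch_app", "right_click",  # slip in iteration 2
            "click_cell", "copy", "switch_app", "paste"]
    print(find_motif(demo))                  # exact matching drops the "paste" step
    print(find_motif(demo, max_mismatch=1))  # tolerant matching keeps all four steps
```

With exact matching the slip hides the final `paste` step, and the search returns only the three-step prefix; allowing one mismatch recovers the full four-step loop body. The exhaustive search is only practical for short demonstrations.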
Sikuli Slides [5] extends Sikuli [95], described in detail in Section 2.1.2, to simplify the creation of GUI automation scripts so that users with less programming skill can produce them. Instead of coding in Sikuli Script, the user represents a process as a PowerPoint slide presentation, with each sequential step on a separate slide. The system can later execute the resulting slides as a script. Sikuli Slides also provides an action recorder, with which a user can record a process by demonstrating it and save the recording as a starting draft of a Sikuli Slides script. Although the system processes screen-captured GUI images instead of requiring access to the Accessibility API, as HILC in Chapter 3 and RecurBot in Chapter 4 also do, its action recorder remains very limited. For example, it can interpret only simple user events and linear tasks, and the scripts it generates are often incomplete and must be modified by the user before they work.
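Sikuli Slides, like Sikuli, identifies GUI targets from screen-captured images. A minimal sketch of that kind of visual lookup, assuming grayscale NumPy arrays and zero-mean normalized cross-correlation as the matching score, is shown below. The helper `run_linear_script`, its `(name, template)` step format and the 0.9 threshold are hypothetical and only mirror the idea of a linear, one-step-per-slide script; optimized implementations of the same score exist, for example OpenCV's `cv2.matchTemplate` with `TM_CCOEFF_NORMED`.

```python
import numpy as np


def match_template(screen: np.ndarray, template: np.ndarray):
    """Exhaustive zero-mean normalized cross-correlation (NCC) search.

    Returns ((row, col) of the best top-left corner, score in [-1, 1]).
    """
    H, W = screen.shape
    h, w = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    best_pos, best_score = (0, 0), -np.inf
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            p = screen[r:r + h, c:c + w]
            p = p - p.mean()
            denom = np.sqrt((p * p).sum()) * t_norm
            if denom == 0:
                continue  # flat patch or flat template: NCC is undefined here
            score = float((p * t).sum() / denom)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


def run_linear_script(screen, steps, threshold=0.9):
    """Locate each step's target image in order and return the click points.

    steps: list of (name, template) pairs, roughly one per slide (hypothetical format).
    Raises LookupError when a target cannot be found with enough confidence.
    """
    clicks = []
    for name, template in steps:
        (r, c), score = match_template(screen, template)
        if score < threshold:
            raise LookupError(f"step {name!r}: target not found (score={score:.2f})")
        h, w = template.shape
        clicks.append((name, (r + h // 2, c + w // 2)))  # centre of the match
    return clicks


if __name__ == "__main__":
    screen = np.zeros((20, 20))  # synthetic grayscale "screenshot"
    close_icon = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]], dtype=float)
    menu_icon = np.array([[1, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
    screen[5:8, 7:10] = close_icon
    screen[12:15, 2:5] = menu_icon
    print(run_linear_script(screen, [("close", close_icon), ("menu", menu_icon)]))
    # -> [('close', (6, 8)), ('menu', (13, 3))]
```

Like the recorder output discussed above, this sketch handles only a fixed, linear sequence of steps: it has no loops or branches, and it stops with an error as soon as one target cannot be found.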
Smartphones have become part of everyday life, and there have been many attempts to let users create automation scripts that operate their smartphones [9, 2, 3, 1, 67, 58, 10]. Of those works, Keep Doing It [67] and Sugilite [58] follow the Programming by Demonstration concept, and both rely on the Android Accessibility API to listen to user-demonstrated events. Keep Doing It [67] analyzes users' mobile interaction logs to generate an automation script for the task the user intended to perform. Li et al. [58] recently proposed Sugilite. The system gives users considerable flexibility: for example, users can later modify the generated scripts when the GUI changes, and they can create forks for conditional tasks. The system generalizes tasks well thanks to the Android Accessibility API, which gives it access not only to the demonstrated events but also to the UI hierarchy of the involved applications. Unfortunately, like other existing PbD systems, these two systems rely on accessibility APIs, so they also struggle with tasks involving web-based applications and applications with poorly labeled alternative text.
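The benefit of access to the UI hierarchy, and the weakness caused by poorly labeled applications, can be illustrated with a simplified UI tree. In the sketch below, which assumes a hypothetical `Node` type and a hypothetical recorded-step format rather than Android's actual accessibility classes, a recorded click is replayed by matching the target's role and label instead of its screen coordinates.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical, simplified accessibility node (not the Android API)."""
    role: str                                          # e.g. "button"
    label: Optional[str] = None                        # accessible text; None if unlabeled
    bounds: Tuple[int, int, int, int] = (0, 0, 0, 0)   # left, top, right, bottom
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal of the UI hierarchy."""
    yield node
    for child in node.children:
        yield from walk(child)


def replay_click(root: Node, step: Dict[str, str]) -> Optional[Tuple[int, int]]:
    """Re-target a recorded click by (role, label) instead of screen coordinates."""
    for node in walk(root):
        if node.role == step["role"] and node.label == step["label"]:
            left, top, right, bottom = node.bounds
            return ((left + right) // 2, (top + bottom) // 2)
    return None  # target cannot be identified, e.g. because it has no label


if __name__ == "__main__":
    step = {"role": "button", "label": "Order"}  # recorded during the demonstration
    old_ui = Node("window", children=[Node("button", "Order", (0, 0, 100, 40))])
    new_ui = Node("window", children=[Node("list", children=[
        Node("button", "Order", (200, 500, 300, 540))])])
    unlabeled_ui = Node("window", children=[Node("button", None, (0, 0, 100, 40))])
    print(replay_click(old_ui, step))        # -> (50, 20)
    print(replay_click(new_ui, step))        # -> (250, 520)
    print(replay_click(unlabeled_ui, step))  # -> None
```

The same recorded step survives a layout change, but an unlabeled button leaves the lookup nothing to match, which corresponds to the failure mode described above for applications with poorly labeled alternative text.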
In summary, most existing PbD systems rely heavily on accessibility APIs; the main exception is the action recorder of Sikuli Slides, which is still at a primitive stage. This thesis proposes two PbD systems that successfully analyze visual data to observe user demonstrations instead of relying on accessibility APIs. This allows the systems to work across applications without depending on any specific application.