
Email has also been an attractive area of study for artificial intelligence, in particular natural language processing and machine learning. Many established tasks in these areas have been explored in email domains. Document classification has been applied as a way to sort emails into folders, and approaches have tried popular linear classification techniques [10], TFIDF centroid classifiers [213,214], rule induction [49] and pattern detection from graphs [1]. Another document classification task for email is call center classification, whereby incoming emails to support centers are automatically routed to necessary specialists or replied to automatically [28,137,178]. Perhaps the most common document classification task for email is spam detection, of which there are many hundreds of methods and studies [205]. Information extraction extracts free text from emails to populate information structures. Sarawagi and Cohen [207] used Markov models for extracting person names mentioned in emails. Information extraction from emails has also been part of larger email processing systems, such as automated help desk answering [137].

Topic models have also been adapted for email. McCallum et al. extended the author-topic model to include recipients, as well as inferring the role a sender might have in an email conversation [158]. Probabilistic latent semantic indexing (PLSI) has also been studied in the context of email [232]. Similar Bayesian graphical models have been applied for other applications, such as recipient prediction [185]. Other standard machine learning techniques have been tried on email for various applications, including co-training [132] and clustering [121, 169].

Established natural language processing tasks have been adapted for email, such as classification of speech acts to express actions like “propose” and “commit” [35, 50].

There has been a large amount of work on developing summarization systems for email. Some efforts have applied standard techniques for extractive sentence summaries, like centroid methods [248] and sentence classification using RIPPER [195]. Others have developed approaches based on properties unique to email threads, like question-answer summarization [165,248] and summarization using the quotation structure of a thread [30,31]. Some summarization tasks are unique to email, like indicative summaries of threads in information retrieval systems [179], summarizing topics in an email corpus [182, 183] or task-focused summarization of emails [52].

Understanding large collections of email data requires analyzing social structures in email. Social network analysis and relational learning techniques can discover these patterns. As an example, identifying significant relationships in email, often according to known relationship types, is important for identifying relevant social communities in an organization [62, 74]. In addition to learning from communication patterns, latent concept models, such as LDA and PLSI, have been used to associate semantic concepts with learned relationships [158, 257, 258]. Parties in the discovered relationships can be classified according to role (director, manager, associate, etc.) [177] or more general roles like the leader of a group [38, 241]. Learning in these email networks is useful for other tasks, such as resolving named entity references [73, 119, 168].

1.2.1 Intelligent Agents

Clearly, there is no shortage of papers applying established learning techniques or adapting known applications to the email domain. However, the critical difference between these applications and the ones we consider in this thesis is the learning goal. We define learning problems to support intelligent user interfaces that give users better information by which to make email decisions. In contrast, the primary contribution of the above work concerns the learning method or how it can be extended to email. The learning goals are typically email analysis and understanding while ours are user focused.

Several recent major projects have developed intelligent assistants for the email and desktop environments. The goal of the Personalized Agent that Learns (PAL) project¹ is the development of cognitive assistants that improve the way computers support human information management. One aspect of this project, CALO (Cognitive Assistant that Learns and Organizes), has been brought to fruition through the IRIS client, which handles a wide variety of rich applications supported by intelligent assistants [47]. One such tool is LAPDOG, which acquires fully or partially automated procedures by observing a user’s interaction with PIM applications [104]. These procedures can span multiple applications and include tasks like setting up a meeting or tracking a requisition. Another example is TaskTracer, a learning system integrated with the user’s desktop environment that infers the current task of the user [78, 230]. Once TaskTracer learns user activities, it can support user actions by suggesting relevant webpages [144] or appropriate folders for finding and saving documents [7]. Finally, the RADAR project, also part of PAL, has focused on supporting users engaged in managing tasks like organizing conferences. RADAR developed components to manage user attention, learn important information from emails and learn to carry out tasks from user examples [101]. An automated assistant guides the user through a task based on previously observed behaviors.

These systems all developed learning technologies to support user actions. However, the key difference between these cognitive agents and our intelligent email applications is the nature of interaction between the user and the system. These applications manage information and applications for the user, relying on a cognitive agent to plan an event or carry out a task. While this may be useful for some tasks, users are wary of trusting complex learning systems. Glass et al. [107] found that users lacked confidence in the actions of systems like CALO because of their complexity and lack of transparency. In contrast, our goal is to enhance user actions rather than automate or supplant them. Our tools enhance the current email environment without replacing or redesigning it. Our tools provide users information; the user still completes the task. This style requires less trust from the user. We note that there are some previous systems that fall under our goal of intelligent email, including MailCat (later SwiftFile), which suggests folders to a user [214], and CutOnce, which suggests recipients for outgoing messages [5, 36]. However, this thesis is the first to define the goal of intelligent email as supporting smarter user actions and to define the design and methodology of intelligent email systems.

¹ http://www.darpa.mil/IPTO/programs/pal/pal.asp