Hipikat: Recommending Pertinent
Software Development Artifacts
Davor Čubranić and Gail C. Murphy, ICSE 2003
Hipikat?
◼
A tool
▪ Recovers a development group's memory, which is not explicitly recorded
▪ Recommends artifacts related to a newcomer's task
Motivation
◼
A large, varied amount of information in an existing software project
◼
Complicated!
◼
Mentoring? Not possible in open-source
projects
◼
How can a newcomer joining an existing software
project become productive more quickly?
Goals
•
Recommending artifacts relevant to a
newcomer’s tasks
– Artifacts
• The versions of the source, the bugs, archived electronic communication (mailing lists or newsgroups)
• No explicit relationship between artifacts
An implicit group memory
•
Validating if Hipikat is useful for this
– Inferring links between the artifacts?
– Suggesting relevant parts of the group memory for a newcomer’s task (mentoring)
Outline
◼
Understanding open-source projects
◼
Approach
◼
About Hipikat
◼
How to validate
Understand Open-source projects
◼
Four electronic artifacts
▪ Revision control system
▪ E.g. CVS source repository
▪ Issue-tracking system
▪ E.g. Bugzilla
▪ Communication channels
▪ E.g. Mailing lists, newsgroups
▪ Online documentation
▪ E.g. Reference manual, programming guides
Understand Open-source Projects
◼
Example: Eclipse.org
▪ CVS
▪ Bugzilla
▪ A web site with documentation
▪ Wiki: Sharing knowledge
▪ Public newsgroups and mailing lists
Wow!
◼
Really large and varied
information!
◼
How to recommend useful
artifacts to newcomers?
Approach
◼
Two parts to recommend artifacts
▪ Form the implicit group memory from artifacts and communications
▪ Present to the developer artifacts selected from the
implicit group memory
Hipikat!
Approach – How possible?
◼
There is an implicit group memory
Approach – How possible?
[Figure: mailing lists, newsgroups, BugZilla, CVS, design documents, and project members feed into the implicit group memory]
Approach – How possible?
◼
But…
▪ Missing artifacts or links not contained in the linkage schema?
▪ How?
▪ Reverse engineering for missing artifacts
▪ E.g. design documents generated from source code
▪ Using project artifacts and meta-information
▪ E.g. using CVS check-in comments that reference the relevant bug reports
▪ E.g. comparing the revision time with the closing time of a bug report
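The check-in comment heuristic above can be sketched as a regular-expression scan. This is only an illustration, not Hipikat's actual matcher: the pattern `BUG_REF`, the function `find_bug_ids`, and the sample comments are assumptions made for this example.

```python
import re

# Hypothetical pattern: matches references such as "bug 11419" or "Bug #11419".
BUG_REF = re.compile(r"\bbug\s*#?\s*(\d+)", re.IGNORECASE)

def find_bug_ids(comment: str) -> list[int]:
    """Return the bug ids referenced in a check-in comment, in order of appearance."""
    return [int(number) for number in BUG_REF.findall(comment)]

# Each id found would become a candidate link between the revision and the bug report.
ids = find_bug_ids("Fixed bug 11419: add version to repository view")
```

A real project would need a pattern tuned to its own commit conventions; the word boundary keeps words such as "debug" from matching.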
Approach – Yes possible!
◼
It seems possible to form the implicit group memory
Approach
◼
Hipikat, how to present?
▪ Three distinct functions of Hipikat
▪ Identification
▪ Implicit group memory with inference of missing links and artifacts
▪ Selection of relevant artifacts by queries
▪ Update of the implicit group memory
▪ Additions and changes?
Hipikat prototype
◼
Eclipse plug-in
◼
A client-server system
◼
Client? For what?
▪ Request for suggestions
▪ Three parameters for a request
▪ Users: for future extensions
▪ Artifact (Artifact type and its identifier, e.g. CVS revision#)
▪ Additional description (optional)
▪ Display results from a server
◼
Server?
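The three request parameters listed above could be modeled as a small record. This is a hedged sketch: the class name `HipikatRequest` and its field names are hypothetical and do not come from the Hipikat implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class HipikatRequest:
    """Hypothetical shape of a client request for suggestions."""
    user: str                          # reserved for future extensions
    artifact_type: str                 # e.g. "bug" or "cvs-revision"
    artifact_id: str                   # identifier within that artifact type
    description: Optional[str] = None  # optional additional description

# Example: ask for artifacts related to a bug report.
request = HipikatRequest(user="newcomer", artifact_type="bug", artifact_id="11419")
```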
Hipikat Client
◼
How to query in Eclipse
▪ E.g.
▪ Selecting a class in a Java package browser
▪ “Query Hipikat” from a pop-up context menu
◼
Display results
▪ Recommendations grouped by artifact type and selection criteria
▪ Manage recommendation lists
▪ Delete or move items
▪ For future use to rate recommendation items
Hipikat Server
◼
Implementation of:
▪ Update function
▪ Identification function
Hipikat Server
◼
Update module
▪ Four sub modules for BugZilla, CVS, newsgroups and the web site.
▪ Monitoring each artifact
▪ Insert data from new and changed artifacts into DB
▪ Metadata (CVS revision, the author, check-in time,…)
▪ Text (bug description or check-in comments) for indexing
Hipikat Server
◼
Identification
▪ Four sub modules
1. Check-in comment matcher (log-matcher)
▪ Infer ‘implements’ link
2. Check-in time matcher (activity-matcher)
▪ Monitor all activities in BugZilla
▪ Look for check-ins close in time to the activities in BugZilla (within six hours)
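The time-based matching described above can be sketched as a window comparison. Only the six-hour window comes from the slide; the function name `match_by_time`, the tuple layout, and the sample timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=6)  # window size stated on the slide

def match_by_time(bug_events, checkins, window=WINDOW):
    """Pair bug activities with check-ins whose timestamps lie within `window`.

    bug_events: list of (bug_id, datetime)
    checkins:   list of (revision, datetime)
    Returns (bug_id, revision, gap) tuples, smallest gap first.
    """
    links = []
    for bug_id, event_time in bug_events:
        for revision, checkin_time in checkins:
            gap = abs(checkin_time - event_time)
            if gap <= window:
                links.append((bug_id, revision, gap))
    return sorted(links, key=lambda link: link[2])

# Hypothetical data: one bug activity and three check-ins.
events = [(11419, datetime(2002, 3, 1, 12, 0))]
revisions = [
    ("1.5", datetime(2002, 3, 1, 12, 3)),
    ("1.6", datetime(2002, 3, 1, 17, 0)),
    ("1.7", datetime(2002, 3, 2, 9, 0)),
]
links = match_by_time(events, revisions)
```

Sorting by gap puts the closest check-ins first, so a smaller gap can be read as a stronger candidate link.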
Hipikat Server
• Check-in comment matcher
• Check-in time matcher
• Text similarity matcher
• Newsgroup thread matcher
Hipikat Server
◼
Identification
▪ Four sub modules (Contd.)
3. Text similarity matcher
▪ Indexing the text of new artifacts by turning each artifact into a document vector, measuring:
▪ The term’s global weight (overall importance)
▪ The term’s local weight (frequency of the term in each document)
▪ Weights combined with the log-entropy combination
▪ Use of a standard information retrieval vector-space cosine similarity measure
4. Newsgroup thread matcher
Log-entropy combination (terms of the formula)
▪ The number of times term i occurs in document j
▪ The number of documents
▪ The number of documents containing term i
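The weighting and similarity steps can be sketched in a few lines. The formula itself is not preserved in these slides, so the code below uses the common textbook log-entropy weighting (local weight log(1 + tf), global weight 1 plus the normalized entropy of the term across documents); it is an assumption that this matches the slide's formula in detail. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def log_entropy_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)
    # Global weight: 1 + sum_j p_ij * log(p_ij) / log(n), with p_ij = tf_ij / gf_i
    global_weight = {}
    for term, gf in global_freq.items():
        entropy = sum((c[term] / gf) * math.log(c[term] / gf) for c in counts if c[term])
        global_weight[term] = 1.0 + entropy / math.log(n) if n > 1 else 1.0
    # Local weight log(1 + tf_ij), scaled by the term's global weight.
    return [{t: math.log(1 + tf) * global_weight[t] for t, tf in c.items()} for c in counts]

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = [
    ["cvs", "tag", "version"],
    ["cvs", "tag", "repository"],
    ["cvs", "newsgroup", "thread"],
]
vectors = log_entropy_vectors(docs)
```

In this toy collection "cvs" occurs evenly in every document, so its global weight is close to zero and it contributes almost nothing to similarity, while the shared rarer term "tag" makes the first two documents more similar to each other than to the third.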
Hipikat Server
◼
Selection
▪ By following links from the artifact specified in client’s request
▪ Sub modules for making recommendations
▪ By artifact types and their links
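Selection by following links can be pictured as a lookup over stored links. This is a minimal sketch, assuming links are kept as (source, target, link type) triples; the `select` function, the identifier format, and every artifact other than Bug 11419 are hypothetical.

```python
from collections import defaultdict

def select(links, start):
    """Return artifacts directly linked to `start`, grouped by link type.

    links: iterable of (source, target, link_type) triples, treated as undirected.
    """
    grouped = defaultdict(list)
    for source, target, link_type in links:
        if source == start:
            grouped[link_type].append(target)
        elif target == start:
            grouped[link_type].append(source)
    return dict(grouped)

# Hypothetical link store.
stored_links = [
    ("bug:11419", "cvs:1.5", "log-match"),
    ("bug:11419", "bug:9000", "text-similarity"),
    ("news:42", "bug:11419", "text-similarity"),
    ("bug:1", "cvs:9.9", "log-match"),
]
recommendations = select(stored_links, "bug:11419")
```

Grouping by link type mirrors the way the client presents recommendations by artifact type and selection criteria.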
Hipikat Server
◼
How to implement
▪ As a web application with Tomcat
▪ SOAP (Simple Object Access Protocol) for communication
How to validate
◼
Initial Qualitative Study
▪ Questions?
▪ Helpful to developers working on a change task?
▪ Used recommendations?
▪ Required recommendations not suggested by Hipikat?
▪ Building the implicit group memory of a medium-sized software system, AVID, manually
▪ Mock-up client
▪ Giving students an assignment to make two changes to AVID with the mock-up client or other tools (Rigi, Chava, jRM)
How to validate
◼
Initial Qualitative Study
▪ Results
▪ Possible to make reasonable suggestions
▪ Suggestions were useful to developers
▪ Based on the results, Hipikat prototype for Eclipse was implemented
▪ Two enhancements
▪ Returning the reason for each recommendation to the user
▪ Making suggestions based on a query from the user
How to validate
◼
Case Study: An Eclipse Change Task
▪ Change task
▪ Automatic update of Eclipse’s CVS repository browser when a new version is created
▪ Use of Hipikat
▪ Open the Bugzilla item describing this change in Eclipse
▪ A Hipikat query from the context menu of the opened item
▪ Choose the most relevant item in Hipikat search result window (Bug 11419)
“Automatically add version to repository view when tagging”
▪ Another Hipikat query for Bug 11419
▪ Choose the CVS revision of Bug 11419 and request ‘a diff’ of it
▪ Apply changes based on the fix of Bug 11419
[Screenshots: “Query Hipikat” and “Request a Diff”]
Advantages
◼ Good for finding a similar bug fix and relevant artifacts
◼ Tells me what is similar to what I’m doing
◼ Applicable to other software projects
◼ Eclipse's development process is similar to that of other open-source projects
Limitations
◼
Hipikat is only text-based
▪ Text similarity and use of meta-data
◼
Accuracy of recommendations by Hipikat?
▪ Activity-matcher
▪ Low confidence in a link when the time distance is over five minutes
◼
Validation is not enough
Discussion
◼ How to improve identification of links between
artifacts?
▪ Collaborative filtering: e.g.
▪ High priority of newsgroup articles posted by developers
▪ User modeling
▪ Remember / reuse recommendations
▪ Anything else?
◼ How to validate a tool such as Hipikat?
▪ Measure performance (precision, recall…)
▪ User study
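Precision and recall for a set of recommendations can be computed against a set of artifacts judged relevant. This is a generic sketch of the standard definitions, not an evaluation from the paper; the function name and the sample identifiers are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for recommended items against relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: four recommendations, three artifacts judged relevant.
precision, recall = precision_recall(
    ["bug-1", "bug-2", "rev-3", "post-4"],
    ["bug-1", "rev-3", "doc-9"],
)
```

Here two of the four recommendations are relevant and two of the three relevant artifacts were found, which shows why the two measures need to be reported together.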
Discussion
◼
Lessons from validation
▪ Good for finding a similar bug fix
▪ Required to evaluate code and understand how it works
▪ The quality and the number of recommended items
Discussion
◼
Eclipse
▪ A rich history of project artifacts is available
▪ Development process similar to that of other open-source projects
Discussion
◼ Accuracy
▪ Approximation
▪ Inference algorithms
▪ Recommendations
◼ Beyond search
▪ Limited search functionality on eclipse.org
▪ High amount of noise
▪ What to search?
Discussion
◼
Role of Task
◼
Collaborative filtering and user modeling
▪ Hipikat: content-based
▪ Collaborative-based techniques
▪ Enhance link identification
▪ E.g. newsgroup postings from authoritative sources
▪ Refined schema-based selection