Hipikat: Recommending Pertinent Software Development Artifacts

Academic year: 2020

(1)

Davor Čubranić and Gail C. Murphy, ICSE 2003

(2)

Hipikat?

A tool for:

▪ Recovering a development group's memory that is not explicitly recorded

▪ Recommending artifacts related to a newcomer's task

(3)

Motivation

A large and varied amount of information about an existing software project

Complicated!

Mentoring? Not possible in open-source projects

How can a newcomer joining an existing software project become productive more quickly?

(4)

Goals

Recommend artifacts relevant to a newcomer's task

– Artifacts

• Source code versions, bug reports, and archived electronic communication (mailing lists or newsgroups)

• No explicit relationship between artifacts

An implicit group memory

Validate whether Hipikat is useful for:

– Inferring links between the artifacts

– Suggesting relevant parts of the group memory for a newcomer’s task (mentoring)

(5)

Outline

Understanding open-source projects

Approach

About Hipikat

How to validate

(6)

Understanding Open-Source Projects

Four kinds of electronic artifacts

▪ Revision control system

▪ E.g. CVS source repository

▪ Issue-tracking system

▪ E.g. Bugzilla

▪ Communication channels

▪ E.g. Mailing lists, newsgroups

▪ Online documentation

▪ E.g. Reference manual, programming guides

(7)

Understanding Open-Source Projects

Example: Eclipse.org

▪ CVS

▪ Bugzilla

▪ A web site with documentation

▪ Wiki: Sharing knowledge

▪ Public newsgroups and mailing lists

(8)

Wow!

Really large and varied information!

How can useful artifacts be recommended to newcomers?

(9)

Approach

Recommending artifacts has two parts:

Form the implicit group memory from artifacts and communications

Present to the developer artifacts selected from the implicit group memory

Hipikat!

(10)

Approach – Is it possible?

There is an implicit group memory

(11)

Approach – Is it possible?

[Figure: sources of the implicit group memory (mailing lists, newsgroups, Bugzilla, CVS, design documents, project members)]

Implicit group memory!

(12)

Approach – Is it possible?

But…

What about missing artifacts, or links not captured by the linkage schema?

▪ How?

▪ Reverse engineering for missing artifacts

▪ E.g. design documents generated from source code

▪ Using project artifacts and meta-information

▪ E.g. using CVS check-in comments that reference the related bug report

▪ E.g. comparing the revision time with the closing time of a bug report

(13)

Approach – Yes, it is possible!

It seems possible to form the implicit group memory

(14)

Approach

Hipikat: how does it present artifacts?

▪ Three distinct functions of Hipikat

Identification

▪ Forms the implicit group memory, inferring missing links and artifacts

Selection of relevant artifacts by queries

Update of the implicit group memory

▪ Handles additions and changes to artifacts

(15)

Hipikat prototype

Eclipse plug-in

A client-server system

Client: what does it do?

▪ Requests suggestions

▪ Three parameters per request

▪ User (reserved for future extensions)

▪ Artifact (artifact type and identifier, e.g. a CVS revision number)

▪ Additional description (optional)

▪ Displays results returned by the server
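A minimal Python sketch of the three-parameter request described above. The class and field names (`HipikatRequest`, `artifact_type`, `artifact_id`) are hypothetical illustrations, not the prototype's actual API (the real client is an Eclipse plug-in); the sketch only shows that the artifact is mandatory while the user and the description are optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HipikatRequest:
    """Hypothetical model of a suggestion request; field names are illustrative."""
    user: Optional[str]                # reserved for future extensions, may be None
    artifact_type: str                 # e.g. "bug" or "cvs_revision" (illustrative values)
    artifact_id: str                   # e.g. a bug number or a CVS revision number
    description: Optional[str] = None  # optional free-text description

    def __post_init__(self) -> None:
        # A request is meaningless without an artifact to start from.
        if not self.artifact_type or not self.artifact_id:
            raise ValueError("a request must identify an artifact (type and identifier)")


# Usage: ask for suggestions related to the bug from the case study.
req = HipikatRequest(user=None, artifact_type="bug", artifact_id="11419")
```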

Server?

(16)

Hipikat Client

How to query in Eclipse

▪ E.g.

▪ Selecting a class in a Java package browser

▪ “Query Hipikat” from a pop-up context menu

Display results

▪ Recommendations grouped by artifact type and selection criteria

▪ Manage recommendation lists

▪ Delete or move items

▪ Rating recommended items (planned for future use)
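The display behaviour above (grouping by artifact type and selection criterion, then deleting or moving items) can be sketched as a small data structure. This is an assumption about how such lists might be held, not the plug-in's code; `Recommendation`, `group_for_display`, `delete_item`, and `move_item` are hypothetical names, and the titles are placeholders.

```python
from collections import defaultdict
from typing import NamedTuple


class Recommendation(NamedTuple):
    artifact_type: str  # illustrative values: "bug", "cvs_revision", "news_article"
    reason: str         # selection criterion, e.g. "implements" or "similar text"
    title: str


def group_for_display(recs):
    """Group recommendation titles by (artifact type, reason), keeping input order."""
    groups = defaultdict(list)
    for rec in recs:
        groups[(rec.artifact_type, rec.reason)].append(rec.title)
    return dict(groups)


def delete_item(groups, key, title):
    """Remove one item from a group, dropping the group once it is empty."""
    groups[key].remove(title)
    if not groups[key]:
        del groups[key]


def move_item(groups, src, dst, title):
    """Move one item from group `src` to group `dst` (created if missing)."""
    delete_item(groups, src, title)
    groups.setdefault(dst, []).append(title)


recs = [
    Recommendation("bug", "similar text", "bug X"),
    Recommendation("cvs_revision", "implements", "revision Y"),
    Recommendation("bug", "similar text", "bug Z"),
]
groups = group_for_display(recs)
```

Keying groups on the pair rather than on the type alone keeps, say, "similar text" bugs apart from bugs reached through an explicit link.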

(17)
(18)

Hipikat Server

Implementation of:

▪ Update function

▪ Identification function

(19)
(20)

Hipikat Server

Update module

▪ Four submodules: Bugzilla, CVS, newsgroups, and the web site

▪ Each monitors its artifact source

▪ Inserts data from new and changed artifacts into the database

▪ Metadata (CVS revision, the author, check-in time,…)

▪ Text (bug description or check-in comments) for indexing
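A minimal sketch of what an update submodule stores, using Python's built-in `sqlite3` in place of whatever database the prototype used. The table layout, column names, and `store_artifact` helper are hypothetical; the point is that each artifact's metadata and its indexable text land in one row, and a changed artifact overwrites its earlier copy.

```python
import sqlite3

# Hypothetical schema: one row per artifact, keyed by its source and identifier.
SCHEMA = """
CREATE TABLE IF NOT EXISTS artifact (
    source    TEXT NOT NULL,  -- 'bugzilla', 'cvs', 'newsgroup' or 'website'
    ident     TEXT NOT NULL,  -- e.g. a bug number or a CVS file plus revision
    author    TEXT,
    timestamp TEXT,           -- check-in or modification time (ISO 8601)
    body      TEXT,           -- free text kept for indexing
    PRIMARY KEY (source, ident)
)
"""


def store_artifact(conn, source, ident, author, timestamp, body):
    """Insert a new artifact, or replace the stored row if the artifact changed."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO artifact (source, ident, author, timestamp, body) "
            "VALUES (?, ?, ?, ?, ?)",
            (source, ident, author, timestamp, body),
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Illustrative data: the same bug seen twice, the second time after an edit.
store_artifact(conn, "bugzilla", "bug-1", "someone", "2002-01-01T10:00:00", "original description")
store_artifact(conn, "bugzilla", "bug-1", "someone", "2002-01-02T09:30:00", "edited description")
```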

(21)
(22)

Hipikat Server

Identification

▪ Four submodules

1. Check-in comment matcher (log-matcher)

▪ Infers an ‘implements’ link from a CVS revision to the bug report its check-in comment mentions

2. Check-in time matcher (activity-matcher)

▪ Monitors all activity in Bugzilla

▪ Looks for check-ins close in time to Bugzilla activity (within six hours)
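The two matchers above can be sketched in a few lines of Python. The bug-reference regular expression is an assumption (real check-in comments reference bugs in many styles), and the function names are hypothetical; only the ‘implements’ link type and the six-hour window come from the slide. The activity matcher returns the time gap so a caller can assign lower confidence to more distant pairs.

```python
import re
from datetime import datetime, timedelta

# Assumed pattern for bug references such as "bug 11419" or "Bug #11419".
BUG_REF = re.compile(r"\bbug\s*#?\s*(\d+)", re.IGNORECASE)

WINDOW = timedelta(hours=6)  # time window taken from the slide


def log_matcher(revision_id, checkin_comment):
    """Infer (revision, 'implements', bug) links from bug numbers in a check-in comment."""
    return [(revision_id, "implements", bug) for bug in BUG_REF.findall(checkin_comment)]


def activity_matcher(checkins, bug_events, window=WINDOW):
    """Pair check-ins with Bugzilla activity that happened within `window` of them.

    checkins:   list of (revision_id, datetime)
    bug_events: list of (bug_id, datetime), e.g. a bug being closed
    Returns (revision_id, bug_id, gap) triples; a simple O(n*m) scan.
    """
    links = []
    for rev, t_rev in checkins:
        for bug, t_bug in bug_events:
            gap = abs(t_rev - t_bug)
            if gap <= window:
                links.append((rev, bug, gap))
    return links
```

Usage: `log_matcher("rev-1", "Fixed bug 11419: add version on tagging")` yields one ‘implements’ link to bug 11419.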

(23)

Hipikat Server

• Check-in comment matcher

• Check-in time matcher

• Text similarity matcher

• Newsgroup thread matcher

(24)

Hipikat Server

Identification

▪ Four submodules (cont'd)

3. Text similarity matcher

▪ Indexes the text of new artifacts by turning each artifact into a document vector, weighting each term by:

The term’s global weight (its overall importance)

The term’s local weight (its frequency in each document), combined using log-entropy weighting

▪ Compares artifacts with the standard information-retrieval vector-space cosine similarity measure

4. Newsgroup thread matcher
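The text similarity matcher described above can be sketched in Python as follows. This assumes a standard log-entropy formulation (local weight log(1 + tf), global weight 1 plus the normalized entropy of the term's distribution) and plain cosine similarity over sparse vectors. It is an illustration, not Hipikat's actual indexer, and the whitespace tokenization is deliberately naive.

```python
import math
from collections import Counter


def log_entropy_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Turn tokenized documents into sparse log-entropy weighted vectors."""
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    gf = Counter()  # global frequency: total occurrences of each term
    for c in counts:
        gf.update(c)

    global_w = {}
    for term, total in gf.items():
        if n == 1:
            global_w[term] = 1.0  # entropy normalization needs at least two documents
            continue
        entropy = 0.0
        for c in counts:
            tf = c.get(term, 0)
            if tf:  # documents without the term contribute 0 (0 log 0 = 0)
                p = tf / total
                entropy += p * math.log(p)
        global_w[term] = 1.0 + entropy / math.log(n)  # 0 if spread evenly, 1 if in one doc

    return [{t: math.log(1 + tf) * global_w[t] for t, tf in c.items()} for c in counts]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


docs = [
    "automatically add version to repository view when tagging".split(),
    "repository view not refreshed after tagging".split(),
    "compiler error in parser".split(),
]
vecs = log_entropy_vectors(docs)
```

Natural logarithms are used throughout; the base cancels in the global weight and only rescales the local weights, which does not change cosine similarity.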

(25)

Log-entropy combination

The number of times term i occurs in document j

The number of documents

The number of documents containing term i
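For reference, one standard formulation of log-entropy weighting built from these quantities is given below; it is stated as an assumption, not as the exact formula Hipikat uses. Here $\mathit{tf}_{ij}$ is the number of times term $i$ occurs in document $j$, $n$ is the number of documents ($n > 1$), and $\mathit{gf}_i$ is the total number of occurrences of term $i$ across all documents.

$$
\begin{aligned}
\ell_{ij} &= \log\left(1 + \mathit{tf}_{ij}\right) \\
p_{ij} &= \frac{\mathit{tf}_{ij}}{\mathit{gf}_i}, \qquad \mathit{gf}_i = \sum_{j=1}^{n} \mathit{tf}_{ij} \\
g_i &= 1 + \frac{\sum_{j=1}^{n} p_{ij} \log p_{ij}}{\log n} \\
w_{ij} &= g_i \, \ell_{ij}
\end{aligned}
$$

Because documents with $\mathit{tf}_{ij} = 0$ contribute nothing (taking $0 \log 0 = 0$), the sum effectively runs over the documents containing term $i$. The global weight $g_i$ is 1 for a term confined to a single document and 0 for a term spread evenly over all $n$ documents.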

(26)

Hipikat Server

Selection

▪ Follows links from the artifact specified in the client's request

▪ Submodules make recommendations

▪ By artifact types and their links
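Selection by following links can be sketched as a bounded breadth-first walk over the inferred links. The hop limit, the `select` function, and the link types other than ‘implements’ are assumptions for illustration; the slides do not say how far links are followed. Each selected artifact carries the type of the link that reached it, which the client could show as the reason for the recommendation.

```python
from collections import deque


def select(links, start, max_hops=2):
    """Breadth-first walk over inferred links, starting at the requested artifact.

    links maps an artifact to a list of (neighbour, link_type) pairs.
    Returns {artifact: (hops, link_type)}, where link_type is the type of the
    link through which the artifact was first reached.
    """
    seen = {start}
    found = {}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour, link_type in links.get(node, []):
            if neighbour in seen:
                continue
            seen.add(neighbour)
            found[neighbour] = (hops + 1, link_type)
            queue.append((neighbour, hops + 1))
    return found


# Illustrative artifacts are (type, identifier) pairs; the links are made up.
links = {
    ("bug", "11419"): [(("cvs_revision", "rev-A"), "implements"),
                       (("bug", "bug-B"), "similar text")],
    ("cvs_revision", "rev-A"): [(("news_article", "post-C"), "mentions")],
}
nearby = select(links, ("bug", "11419"), max_hops=1)
wider = select(links, ("bug", "11419"))
```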

(27)

Hipikat Server

Implementation

▪ As a web application running on Tomcat

▪ SOAP (Simple Object Access Protocol) for client-server communication
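To make the client-server exchange concrete, here is a sketch of a SOAP 1.1 request envelope built with Python's `xml.etree.ElementTree`. Only the envelope namespace is standard SOAP 1.1; the service namespace `urn:example:hipikat`, the `getSuggestions` operation, and its child elements are hypothetical, since the slides do not describe Hipikat's actual service interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:hipikat"                           # hypothetical service namespace

ET.register_namespace("soapenv", SOAP_ENV)
ET.register_namespace("hip", SVC_NS)


def build_request(artifact_type, artifact_id, description=None):
    """Serialize a suggestion request as a SOAP 1.1 envelope (hypothetical operation)."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}getSuggestions")
    ET.SubElement(op, f"{{{SVC_NS}}}artifactType").text = artifact_type
    ET.SubElement(op, f"{{{SVC_NS}}}artifactId").text = artifact_id
    if description is not None:  # the description parameter is optional
        ET.SubElement(op, f"{{{SVC_NS}}}description").text = description
    return ET.tostring(env, encoding="unicode")


xml_text = build_request("bug", "11419")
```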

(28)

How to validate

Initial Qualitative Study

▪ Questions

▪ Is Hipikat helpful to developers working on a change task?

▪ Which recommendations were used?

▪ Were needed artifacts missing from Hipikat's suggestions?

▪ Manually built the implicit group memory of AVID, a medium-sized software system

▪ Used a mock-up client

▪ Students were assigned to make two changes to AVID using the mock-up client or other tools (Rigi, Chava, jRM)

(29)

How to validate

Initial Qualitative Study

▪ Results

▪ It was possible to make reasonable suggestions

▪ Suggestions were useful to developers

▪ Based on these results, the Hipikat prototype for Eclipse was implemented

▪ Two enhancements

▪ Returning the reason for each recommendation to the user

▪ Making suggestions based on a query from the user

(30)

How to validate

Case Study: An Eclipse Change Task

▪ Change task

Automatically update Eclipse’s CVS repository browser when a new version is created

▪ Use of Hipikat

▪ Open the Bugzilla item describing this change in Eclipse

▪ Issue a Hipikat query from the context menu of the opened item

▪ Choose the most relevant item in the Hipikat results window (Bug 11419)

“Automatically add version to repository view when tagging”

▪ Issue another Hipikat query on Bug 11419

▪ Choose the CVS revision associated with Bug 11419 and request a diff of it

▪ Apply changes based on the fix for Bug 11419

[Figure: screenshot showing the “Query Hipikat” and “Request a Diff” actions]

(31)

Advantages

◼ Good for finding a similar bug fix and relevant artifacts

◼ Tells me about work similar to what I’m doing

◼ Applicable to other software projects

◼ Eclipse’s development process is similar to that of other open-source projects

(32)

Limitations

Hipikat is only text-based

▪ Relies on text similarity and metadata

How accurate are Hipikat's recommendations?

▪ Activity-matcher

▪ Low confidence in a link when the time between the activities exceeds five minutes

Validation is not yet sufficient

(33)

Discussion

◼ How to improve identification of links between artifacts?

▪ Collaborative filtering, e.g.

▪ Give higher priority to newsgroup articles posted by developers

▪ User modeling

▪ Remember and reuse recommendations

▪ Anything else?

◼ How to validate a tool such as Hipikat?

▪ Measure performance (precision, recall…)

▪ User study
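For the first option, precision and recall can be computed per query once the set of truly relevant artifacts has been judged. A minimal sketch, assuming such a judged set exists; `precision_recall` is a hypothetical helper, not part of Hipikat.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against a judged-relevant set."""
    rec, rel = set(recommended), set(relevant)
    hits = len(rec & rel)
    precision = hits / len(rec) if rec else 0.0  # share of suggestions that were relevant
    recall = hits / len(rel) if rel else 0.0     # share of relevant artifacts suggested
    return precision, recall


# Illustrative: 4 suggestions, 3 relevant artifacts, 2 in common.
p, r = precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
```

Here `p` is 0.5 and `r` is 2/3: half of the suggestions were relevant, and two of the three relevant artifacts were found.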

(34)
(35)

Discussion

Lessons from validation

▪ Good for finding a similar bug fix

▪ Developers still had to evaluate the code and understand how it works

▪ The quality and the number of recommended items

(36)

Discussion

Eclipse

▪ A rich history of project artifacts is available

▪ Its development process is similar to that of other open-source projects

(37)

Discussion

Accuracy

▪ Approximate by nature

▪ Inference algorithms

▪ Recommendations

◼ Beyond search

▪ Search on eclipse.org has limited functionality

▪ High amount of noise

▪ What to search for?

(38)

Discussion

Role of Task

Collaborative filtering and user modeling

▪ Hipikat: content-based

▪ Collaborative techniques

▪ Enhance link identification

▪ E.g. newsgroup postings from authoritative sources

▪ Refined schema-based selection
