Spontaneous Code Recommendation based on Open Source Code Repository

Academic year: 2021

(1)

Spontaneous Code Recommendation based on Open Source Code Repository

Hidehiko Masuhara

Tokyo Tech

joint work with Takuya Watanabe, Naoya Murakami, Tomoyuki Aotani

(2)

Do you program with Google?

What are you looking for with Google?

(3)

We can’t remember all the APIs and usages

• Too many big class libraries

• Non-trivial usages: what should we write, e.g.,
  after creating a JFrame?
  to format a Date in HH:MM style?
  to match a regexp to a string?
  to download from an FTP server?

(4)

JavaDoc and Google aren’t sufficient

• API documents and IDE assistance:
  do not tell non-trivial usages

• Keyword-based web search tools:
  can show examples
  thought interrupting

[Figure: "how to format Date?" → think keywords → type-in keywords → browse results]

(5)

as you write in the editor… similar programs are displayed and serve as examples

Selene: a spontaneous

(6)

Let’s make a window displaying the current time

[Figure: a window showing 12:45]

(7)

Key Ideas behind Selene

• Entire editing text as a query
  no need to think about keywords

• Textual search
  fast enough with a large repository
  language independent

• A large code repository (2 million files)

(8)

Architectural overview

[Diagram: Client (Eclipse plugin) and Server (frontend, keyword search); components: keyword extraction, snippet selection]

1. query extraction
2. find similar files
3. snippet selection & ranking

(9)

Snippet selection

[Diagram: editing program (cursor pos.) → similar files → similar fragments → snippets to display]

code after similar frag. = what to do next
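
The idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not Selene's implementation: the function names, the fixed-size sliding window, and the Jaccard token overlap used as the similarity measure are all hypothetical choices made for brevity.

```python
import re


def tokens(lines):
    """Collect identifier-like tokens from a list of source lines."""
    return set(re.findall(r"[A-Za-z_]\w*", "\n".join(lines)))


def select_snippet(editing_above_cursor, similar_file, window=3, snippet_len=2):
    """Return the lines that follow the fragment most similar to the editing text.

    Assumption: similarity is plain Jaccard overlap of token sets.
    """
    query = tokens(editing_above_cursor)
    best_score, best_end = -1.0, 0
    for start in range(max(1, len(similar_file) - window + 1)):
        frag = tokens(similar_file[start:start + window])
        union = query | frag
        score = len(query & frag) / len(union) if union else 0.0
        if score > best_score:
            best_score, best_end = score, start + window
    # The code after the similar fragment is "what to do next".
    return similar_file[best_end:best_end + snippet_len]


editing = ["JFrame frame = new JFrame();"]
similar = [
    "import javax.swing.*;",
    "JFrame f = new JFrame();",
    "f.setSize(300, 200);",
    "f.setVisible(true);",
]
snippet = select_snippet(editing, similar, window=1, snippet_len=2)
```

Here the second line of `similar` matches the editing text best, so the two lines after it are returned as the snippet.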

(10)

Research challenges

• Huge design & parameter space
  Query extraction: entire text / lines around cursor / term weighting / ...
  Snippet selection: similarity algorithms / # of displayed snippets & lines

• Code clone problem

(11)

Query extraction & snippet selection

(12)

Basic algorithm & variability

• Query extraction
  extracts keywords in the editing text
  weighting by distance from cursor

• Snippet ranking
  against the lines above the cursor
  compute vector similarity / LCS
  weighting by inverse term freq.

• Snippet display
  # of snippets x # of lines
  comments

Open parameters (callouts in slide order): how much? / how many? / which? / how much? / how much? / show/remove?
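
The three ingredients named above (distance weighting, vector similarity with inverse term frequency, and LCS) might look roughly like the following. This is a sketch only: the exponential decay, its value, the smoothed IDF formula, and all function names are assumptions and are not taken from Selene.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z_]\w*")


def extract_query(lines, cursor_line, decay=0.5):
    """Weight each token by decay ** (line distance from the cursor).

    The decay value is an arbitrary placeholder, not a tuned parameter.
    """
    weights = Counter()
    for i, line in enumerate(lines):
        w = decay ** abs(cursor_line - i)
        for tok in TOKEN.findall(line):
            weights[tok] += w
    return weights


def idf(token, doc_freq, n_docs):
    """Smoothed inverse document frequency (one common variant)."""
    return math.log((1 + n_docs) / (1 + doc_freq.get(token, 0))) + 1.0


def cosine(query, fragment_lines, doc_freq, n_docs):
    """IDF-weighted cosine similarity between a query and a code fragment."""
    frag = Counter(t for line in fragment_lines for t in TOKEN.findall(line))
    q = {t: w * idf(t, doc_freq, n_docs) for t, w in query.items()}
    f = {t: c * idf(t, doc_freq, n_docs) for t, c in frag.items()}
    dot = sum(q[t] * f[t] for t in q.keys() & f.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in f.values())
    )
    return dot / norm if norm else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

Either `cosine` or `lcs_length` could serve as the ranking score; which one works better is exactly the kind of parameter the slide leaves open.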

(13)

Parameter optimization through mechanical evaluation

• Evaluation method:
  randomly chosen program → extract query → Selene → recommended snippets → precision / recall
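
One way such a mechanical evaluation could be set up is sketched below. The slide does not say what the recommendations are compared against, so this sketch assumes that the lines after a cut point in the chosen program are hidden and that precision and recall are measured on token overlap with them. The `recommend` callback stands in for Selene, and all names are hypothetical.

```python
import re

TOKEN = re.compile(r"[A-Za-z_]\w*")


def token_set(lines):
    """Identifier-like tokens appearing in the given lines."""
    return {t for line in lines for t in TOKEN.findall(line)}


def precision_recall(recommended_lines, hidden_lines):
    """Token-level precision and recall of recommendations vs. hidden code."""
    rec, truth = token_set(recommended_lines), token_set(hidden_lines)
    hit = len(rec & truth)
    precision = hit / len(rec) if rec else 0.0
    recall = hit / len(truth) if truth else 0.0
    return precision, recall


def evaluate(program_lines, cut, recommend):
    """Treat lines before `cut` as the editing text and score against the rest."""
    visible, hidden = program_lines[:cut], program_lines[cut:]
    return precision_recall(recommend(visible), hidden)


program = [
    "JFrame f = new JFrame();",
    "f.setSize(1, 2);",
    "f.setVisible(true);",
]


def fake_recommender(visible_lines):
    # Stand-in for the recommender: always returns the same two lines.
    return ["f.setSize(300, 200);", "f.pack();"]


precision, recall = evaluate(program, 1, fake_recommender)
```

Repeating this over many randomly chosen programs and parameter settings gives numbers that can be compared without human judges.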

(14)

How many snippets?

• total 120 lines for all snippets

• more snippets, fewer lines/snippet

• Best: 6 snippets x 18 lines

[Figure: snippets window]

[Chart: recall (%), from 0 to 25, against # snippets, from 1 to 9]

(15)

Optimal parameters

overall recall ratio: 17.6%, 25.3%

• Query extraction
  extracts keywords in the editing text
  weighting by distance from cursor

• Snippet ranking
  for each fragment in a window
  compute vector similarity / LCS
  weighting by inverse term freq.

• Snippet display
  # of snippets x # of lines
  comments

Parameter values (callouts in slide order): 0.5 / 0.11 / 10 lines / LCS / 0.2 / 0.2 / 6x18 / remove comments

(16)
(17)

Problem: Code Clones!

(18)

Problem: Code Clones!

(19)

How to remove clones

• Offline vs online

Offline: 2 M files (or billion lines)

Online: 50 files (or 100-10,000 lines)

• Clone detection algorithms

Matrix based (cf. CCFinder)

Clustering (e.g., k-means)

Freshness count

requires a

(20)

Freshness count algorithm

“known tokens”: panel, JFrame, setDefaultCloseOperation, WindowConstants

similarity | tokens
74.0 | setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout
64.0 | setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout, setPaintBorderLines
63.0 | setDefaultCloseOperation, TabelPanel, JPanel, TableLayout, setPaintBorderLines, setSize
60.0 | JPanel, contentPane, BorderLayout, setBackground, Color, WHITE, fieldPanel, TableLayout

(21)

Freshness count algorithm

Freshness count: remove known tokens; count # of remaining tokens

“known tokens”: panel, JFrame, setDefaultCloseOperation, WindowConstants

similarity | freshness | tokens
74.0 | 3 | setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout
64.0 | 4 | setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout, setPaintBorderLines
63.0 | 5 | setDefaultCloseOperation, TabelPanel, JPanel, TableLayout, setPaintBorderLines, setSize
60.0 | 8 | JPanel, contentPane, BorderLayout, setBackground, Color, WHITE, fieldPanel, TableLayout

(22)

Freshness count algorithm

show the snippet with max. (similarity + freshness)

“known tokens”: panel, JFrame, setDefaultCloseOperation, WindowConstants

similarity | freshness | sum | tokens
74.0 | 3 | 77.0 | setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout
64.0 | 4 | 68.0 | setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout, setPaintBorderLines
63.0 | 5 | 68.0 | setDefaultCloseOperation, TabelPanel, JPanel, TableLayout, setPaintBorderLines, setSize
60.0 | 8 | 68.0 | JPanel, contentPane, BorderLayout, setBackground, Color, WHITE, fieldPanel, TableLayout

(23)

Freshness count algorithm

add shown tokens to “known”;

“known tokens”: panel, JFrame, setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout

similarity | tokens
74.0 | setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout
64.0 | setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout, setPaintBorderLines
63.0 | setDefaultCloseOperation, TabelPanel, JPanel, TableLayout, setPaintBorderLines, setSize
60.0 | JPanel, contentPane, BorderLayout, setBackground, Color, WHITE, fieldPanel, TableLayout

(24)

Freshness count algorithm

show the snippet with max. (similarity + freshness)

“known tokens”: panel, JFrame, setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout

similarity | freshness | sum | tokens
74.0 | (shown) | 77.0 | setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout
64.0 | 1 | 65.0 | setDefaultCloseOperation, WindowConstants, setSize, JPanel, TableLayout, setPaintBorderLines
63.0 | 2 | 65.0 | setDefaultCloseOperation, TabelPanel, JPanel, TableLayout, setPaintBorderLines, setSize
60.0 | 6 | 66.0 | JPanel, contentPane, BorderLayout, setBackground, Color, WHITE, fieldPanel, TableLayout
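
The greedy loop walked through on the freshness count slides can be sketched as follows. The function name, its return format, and the tie-breaking rule (the earlier candidate wins) are assumptions. The similarity scores and token lists are the ones from the slides, as far as they can be read from the extracted text.

```python
def rank_by_freshness(candidates, known_tokens):
    """Greedy ordering by similarity + freshness.

    candidates: list of (similarity, token_list) pairs.
    Returns a list of (candidate_index, score) in display order.
    Freshness = number of distinct tokens not yet in the known set.
    """
    known = set(known_tokens)
    remaining = list(range(len(candidates)))
    order = []
    while remaining:
        def score(i):
            similarity, toks = candidates[i]
            return similarity + len(set(toks) - known)

        best = max(remaining, key=score)  # ties: earliest candidate wins
        order.append((best, score(best)))
        remaining.remove(best)
        known |= set(candidates[best][1])  # shown tokens become known
    return order


known = ["panel", "JFrame", "setDefaultCloseOperation", "WindowConstants"]
candidates = [
    (74.0, ["setDefaultCloseOperation", "WindowConstants", "setSize",
            "JPanel", "TableLayout"]),
    (64.0, ["setDefaultCloseOperation", "WindowConstants", "setSize",
            "JPanel", "TableLayout", "setPaintBorderLines"]),
    (63.0, ["setDefaultCloseOperation", "TabelPanel", "JPanel",
            "TableLayout", "setPaintBorderLines", "setSize"]),
    (60.0, ["JPanel", "contentPane", "BorderLayout", "setBackground",
            "Color", "WHITE", "fieldPanel", "TableLayout"]),
]
order = rank_by_freshness(candidates, known)
```

With this data the 74.0 snippet is shown first (score 77.0). The 60.0 snippet then overtakes the two near-clones (score 66.0 against 65.0 each), which is the clone-suppressing effect the slides illustrate.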

(25)
(26)

Result

~ 5% increase in recall ratio
(w/ our mechanical evaluation [RSSE’12])

[Bar chart: recall, from 0% to 40%, on the Butler's and Random problem sets; series: Original, Duplication removal]

(27)

Query extraction (again & ongoing)

• Do lines around cursor represent the programmer’s thought?

[Figure: extract query, marked with "?"]

(28)

Programs have no linear structure

• Common actions are often refactored into another method
  e.g., GUI component initialization

• ...or into another class (e.g., factory)

• Two classes interact for one concern

(29)

Recently visited locations

programmer’s thought

• We look at the refactored code fragments if they are related to what we want to do
  IDEs offer plenty of support

• Hypothesis:
  recently visited code fragments are a better query than the text around the cursor
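
If the hypothesis were to be tried out, a query could be built from a short history of visited fragments instead of from the cursor neighbourhood. The sketch below is purely illustrative: the class, the history capacity, and the recency decay are hypothetical and are not described in the slides.

```python
import re
from collections import Counter, deque

TOKEN = re.compile(r"[A-Za-z_]\w*")


class VisitHistory:
    """Keeps the most recently visited code fragments (newest last)."""

    def __init__(self, capacity=5):
        # Older fragments fall out automatically once capacity is reached.
        self.fragments = deque(maxlen=capacity)

    def visit(self, fragment_lines):
        self.fragments.append(list(fragment_lines))

    def query(self, decay=0.5):
        """Token weights, halved (by default) for each step back in history."""
        weights = Counter()
        for age, frag in enumerate(reversed(self.fragments)):
            w = decay ** age
            for line in frag:
                for tok in TOKEN.findall(line):
                    weights[tok] += w
        return weights


history = VisitHistory(capacity=2)
history.visit(["initComponents"])
history.visit(["createPanel"])
history.visit(["setVisible"])
q = history.query()
```

The oldest fragment has been evicted, so only the two most recent visits contribute to the query, with the latest one weighted highest.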

(30)

Conclusion

• Selene is a spontaneous code recommendation tool supported by
  a large code repository
  a fast textual search engine

• Interesting research questions
  keyword extraction, code clones, programmers’ concerns, GUI, ...
