Exception Context Representation

5.3.3 Exception Context Representation

In our research, we apply not only the density metrics but also the relevance estimates for main (i.e., noise-free) and relevant content extraction. Each page in the dataset is relevant to a particular exception, and we exploit the technical details (e.g., stack trace and context code) of the corresponding exception to estimate the relevance of different sections in the page. We analyze the stack trace (e.g., Listing 5.2) and extract different tokens such as package name, class name and method name from each of the method call references. We also analyze the context code (e.g., Listing 5.1) of the exception and collect class name and method name tokens. We use Javaparser for compilable code and an island parser for non-compilable code in order to extract the tokens [72]. We then combine the tokens from both the stack trace and the context code, and append the exception name along with the exception message (highlighted in Listing 5.2) to the combined set. Thus we get a simplified context for the exception of interest, which is used in text relevance estimation (Section 5.2.2). For example, Listing 5.3 shows the context for an EOFException with the stack trace in Listing 5.2 and the context code in Listing 5.1.
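The token collection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the parser used in this work: the regular expression for call references, the function names, and the sample stack trace are all hypothetical, and the context code tokens are passed in as a ready-made list rather than being extracted with Javaparser or an island parser.

```python
import re

# One call reference, e.g. "at java.io.DataInputStream.readInt(DataInputStream.java:392)".
# This pattern is an assumption about the trace layout, not the parser used in the thesis.
FRAME = re.compile(r"^\s*at\s+([\w$.]+)\.([\w$<>]+)\(")


def stack_trace_tokens(trace):
    """Return package, class and method name tokens in order of first appearance."""
    tokens = []
    for line in trace.splitlines():
        match = FRAME.match(line)
        if match is None:
            continue  # e.g. the header line that only names the exception
        qualified_class, method = match.groups()
        package, _, class_name = qualified_class.rpartition(".")
        for token in (package, class_name, method):
            if token and token not in tokens:
                tokens.append(token)
    return tokens


def exception_context(trace, code_tokens, name, message=""):
    """Combine stack trace tokens, context code tokens, exception name and message."""
    context = stack_trace_tokens(trace)
    for token in [*code_tokens, name, *message.split()]:
        if token not in context:
            context.append(token)
    return context


# Hypothetical inputs for illustration only.
TRACE = """java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at test.Reader.main(Reader.java:14)"""

context = exception_context(TRACE, ["FileInputStream", "readInt"], "EOFException")
print(context)
```

Duplicates are dropped so that a token shared by the stack trace and the context code appears once in the combined context; whether the original approach keeps or removes such duplicates is not stated here.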

5.3.4 Performance Metrics

Our proposed approach is aligned with the research areas of information retrieval and recommendation systems, and we thus use a list of performance metrics from those areas in order to evaluate our approach as follows [51, 72, 79]:

Mean Precision (MP): Precision determines the percentage of the retrieved content that is expected (i.e., main content, relevant content) from a web page. In our research, we compare the content retrieved by our approach with the manually prepared gold sets. As Sun et al. [79] suggest, we use the longest common subsequence (LCS) of tokens between the retrieved content and the gold content. Precision can thus be determined as follows, where a refers to the token sequence of the retrieved main or relevant content and b refers to that of the corresponding gold content.

$$P = \frac{|LCS(a, b)|}{|a|}, \qquad MP = \frac{\sum_{i=1}^{N} P_i}{N} \tag{5.9}$$

Mean Precision (MP) averages the precision measures for all the web pages (N) in the dataset.

Mean Recall (MR): Recall determines the percentage of the expected content (i.e., main content, relevant content) of a web page that is retrieved by a system. We calculate the recall of a system as follows:

$$R = \frac{|LCS(a, b)|}{|b|}, \qquad MR = \frac{\sum_{i=1}^{N} R_i}{N} \tag{5.10}$$

Mean Recall (MR) averages the recall measures for all the pages (N) in the dataset.
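As an illustration of the per-page precision and recall defined above, the following Python sketch computes the LCS length of two token sequences with standard dynamic programming. The function names and the two token sequences are hypothetical, and whitespace tokenization is assumed purely for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def precision(retrieved, gold):
    """P = |LCS(a, b)| / |a|, with a the retrieved tokens."""
    return lcs_length(retrieved, gold) / len(retrieved) if retrieved else 0.0


def recall(retrieved, gold):
    """R = |LCS(a, b)| / |b|, with b the gold tokens."""
    return lcs_length(retrieved, gold) / len(gold) if gold else 0.0


# Hypothetical retrieved content (a) and gold content (b) for one page.
a = "readInt throws EOFException click here".split()
b = "readInt throws EOFException at end of stream".split()

print(lcs_length(a, b), precision(a, b), recall(a, b))
```

Here three of the five retrieved tokens occur, in order, among the seven gold tokens, so the page has a precision of 3/5 and a recall of 3/7.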

Mean F1-measure (MF): While precision and recall each focus on a particular aspect of the performance of a system, F1-measure is a combined and more meaningful metric for evaluation. We calculate F1-measure as the harmonic mean of precision and recall as follows [79]:

$$F_1 = \frac{2 \times P \times R}{P + R}, \qquad MF = \frac{\sum_{i=1}^{N} F_{1i}}{N} \tag{5.11}$$

Mean F1-measure (MF) averages the F1-measures for all the pages (N) in the dataset.
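The averaging in the three mean metrics can be sketched as follows. The per-page precision and recall values are made up for illustration, and the helper names are hypothetical. The sketch also shows that MF, being the mean of the per-page F1-measures, is in general not the same as the harmonic mean of MP and MR.

```python
def f1(p, r):
    """Harmonic mean of precision p and recall r (0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def mean(values):
    """Arithmetic mean over all pages (0 for an empty dataset)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical (precision, recall) pairs for N = 2 pages.
pages = [(0.9, 0.9), (0.5, 1.0)]

mp = mean(p for p, _ in pages)          # Mean Precision
mr = mean(r for _, r in pages)          # Mean Recall
mf = mean(f1(p, r) for p, r in pages)   # Mean F1-measure: average of per-page F1

print(mp, mr, mf, f1(mp, mr))
```

This is why the reported MF values need not coincide with the F1-measure computed from the reported MP and MR.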

5.3.5 Experimental Results

We conduct experiments with two datasets, the main set and the relevant set, and extract main (i.e., noise-free) content and relevant content respectively from their pages. We then check the extracted content against the carefully prepared main-gold-set and relevant-gold-set respectively, and evaluate the performance of our approach. Table 5.2 and Table 5.3 summarize the findings of our evaluation.

Table 5.3: Experimental Results for Different Aspects of Page Content

| Content Type | Score Combination | Metric | SO Pages (DSO) | Non-SO Pages (¬DSO) | All Pages (D) |
|---|---|---|---|---|---|
| Main Content | {Content Density (CTD)} | MP | 90.89% | 88.86% | 89.71% |
| | | MR | 89.38% | 86.20% | 87.53% |
| | | MF | 89.85% | 85.75% | 87.45% |
| | {Content Relevance (CTR)} | MP | 89.80% | 75.40% | 81.39% |
| | | MR | 25.66% | 37.83% | 32.76% |
| | | MF | 33.82% | 45.31% | 40.53% |
| | {Density (CTD), Relevance (CTR)} | MP | 91.27% | 88.90% | 89.88% |
| | | MR | 89.27% | 86.20% | 87.48% |
| | | MF | 90.00% | 85.76% | 87.53% |
| Relevant Content | {Content Density (CTD)} | MP | 50.91% | 49.50% | 50.07% |
| | | MR | 91.74% | 75.71% | 82.18% |
| | | MF | 62.32% | 53.76% | 57.22% |
| | {Content Relevance (CTR)} | MP | 86.63% | 69.17% | 76.23% |
| | | MR | 52.17% | 57.66% | 55.44% |
| | | MF | 61.07% | 55.88% | 57.98% |
| | {Density (CTD), Relevance (CTR)} | MP | 89.91% | 74.12% | 80.50% |
| | | MR | 74.90% | 80.76% | 78.39% |
| | | MF | 80.07% | 73.91% | 76.40% |

Table 5.2: Results of Experiments on Main (i.e., noise-free) and Relevant Content

| Content Type | Metric | SO Pages (DSO) | Non-SO Pages (¬DSO) | All Pages (D) |
|---|---|---|---|---|
| Main content | MP | 91.27% | 88.90% | 89.88% |
| | MR | 89.27% | 86.20% | 87.48% |
| | MF | 90.00% | 85.76% | 87.53% |
| Relevant content | MP | 89.91% | 74.12% | 80.50% |
| | MR | 74.90% | 80.76% | 78.39% |
| | MF | 80.07% | 73.91% | 76.40% |

From Table 5.2, we note that our proposed approach extracts main content with a mean precision of 89.88% and a mean recall of 87.48%, which are highly promising. In main content extraction from a web page, both precision and recall are important, and our approach also performs well in terms of the combined metric, F1-measure (i.e., 87.53%). About 41.60% of the pages in the main set are from StackOverflow. During gold set preparation, we notice that the pages from StackOverflow (hereafter SO) follow a consistent structure with relatively less noise, and that their relevant content sections are more precise and persistent than those in the pages from other web sites, which helps content extraction. We were therefore interested in checking the performance of our approach against three different subsets of the main set: SO Pages, Non-SO Pages and All Pages (Table 5.1). In Table 5.2, we note that the approach performs almost equally well for all the subsets of different types (i.e., websites) and different sizes (i.e., number of pages), which demonstrates the robustness and generality of our approach.

In the case of relevant content, our approach extracts content with a mean precision of 80.50%, a mean recall of 78.39%, and a mean F1-measure of 76.40%, which are also promising. We use the relevant set (Table 5.1) for the experiments, in which about 40.40% of the pages are from StackOverflow. We conduct experiments with different subsets of pages, and find that the approach provides the most precise recommendation, with a mean precision of 89.91%, for StackOverflow pages. The aforementioned structure of StackOverflow pages might partially help our approach perform better; however, the approach also achieves a mean precision of 74.12% with non-StackOverflow web pages, which is promising and significantly better than that of the existing approaches (Table 5.4).

Table 5.3 investigates the effectiveness of the two aspects, content density and content relevance, that we propose for content extraction from a web page. We consider each of those aspects in isolation as well as in combination, and evaluate our approach with different sets of web pages for the two types of extraction: main content and relevant content. With content density alone, the proposed approach performs well in terms of all three performance metrics for main content extraction with all three subsets of the main set: StackOverflow pages (DSO), Non-StackOverflow pages (¬DSO) and all pages (D). However, the metric is not very effective for relevant content extraction with any of the subsets of the relevant set, and the approach provides imprecise results. For example, it extracts the relevant content from a web page of any of the three subsets (DSO, ¬DSO and D) with a maximum mean precision of 50.91% and a maximum mean F1-measure of 62.32%. With the content relevance metric alone, the proposed approach extracts relevant content from a web page with relatively better precision (e.g., 86.63%); however, the recall rates are poor in both main content and relevant content extraction. On the other hand, when we combine the density and relevance metrics, we obtain the highest mean F1-measure for both types (i.e., main and relevant) of extraction with each of the sets of web pages, and the improvement is substantial for relevant content. For example, the main (i.e., noise-free) and relevant sections of a page are extracted by our approach with a mean F1-measure of 87.53% and 76.40% respectively.

It should be noted that in the case of main (i.e., noise-free) content extraction, the combination of metrics yields little improvement over the density metric alone. This finding contradicts our initial intuition about the effectiveness of the relevance metric in main content extraction. However, the findings from relevant content extraction clearly show the effectiveness of combining the density and relevance metrics, which is one of the primary objectives of this work.

In document Exploiting Context in Dealing with Programming Errors and Exceptions in the IDE (Page 72-75)