
4 Methodology

4.1 Integration with Google

4.1.2 Web Scraping

The Google Project Hosting web interface provided another possible source for Google Partner Bugs information. Bug information on the website is displayed in two ways: a list of all bugs in the project, and a page of detailed information for each individual bug. Our bug synchronization service used web scraping techniques to draw on both views. More information about web scraping techniques can be found in Section 3.3.

4.1.2.1 Google Authentication

The chrome-os-partner project on Google Project Hosting is a private project with a specific list of users who have permission to view it. Our bug synchronization service provided user credentials in order to retrieve information from the site. The bug synchronization service email address, “bug- [email protected]”, was added to the list of permitted users for the chrome-os-partner project. When the service ran, it used the Requests module [78] to start a session and get the Google Login page. The service then filled in the Email and Password fields and submitted the login form. The “Session” object in the Requests module allows certain parameters to be persisted across requests [77]. We used the Session object to create an authenticated session that persisted our login credentials. All requests made to Google during bug synchronization used the authenticated session.
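The login flow described above can be sketched roughly as follows. This is an illustrative outline rather than the service's actual code: the helper names `LoginFormParser` and `log_in` are hypothetical, the form field names `Email` and `Password` are taken from the description above and may not match the real form, and `session` is assumed to behave like the Requests `Session` object (only `get`, `post`, `text`, and `raise_for_status` are used).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collects the first form's action URL and its named input fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action", "")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def log_in(session, login_url, email, password):
    """Fetch the login page, fill in the form, and submit it.

    `session` is assumed to behave like requests.Session, so cookies set
    by the login response are reused on every later request.
    """
    page = session.get(login_url)
    page.raise_for_status()

    parser = LoginFormParser()
    parser.feed(page.text)

    form = dict(parser.fields)   # keep hidden fields already in the form
    form["Email"] = email        # field names assumed from the text above
    form["Password"] = password

    target = urljoin(login_url, parser.action or "")
    response = session.post(target, data=form)
    response.raise_for_status()
    return session
```

Passing a `requests.Session()` instance as `session` would give the persistent, authenticated session described above; the function itself depends only on the standard library so that the form-handling logic can be exercised without network access.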

4.1.2.2 Retrieving Comma Separated Value Files

The Google Project Hosting interface provided a view of a list of all bugs in the project, which could be filtered based on search criteria. The interface also provided a request URL to retrieve Comma Separated Value (CSV) files of all bugs in the list. The request URL contained two major parts: the query string, and the specified columns string. The query string specifies search criteria entered by the user. Only bugs that meet the search criteria are included in the CSV. The specified columns string specifies which fields to include in the CSV. All CSVs included the following values by default: ID, Pri, M, ReleaseBlock, Cr, Status, Owner, Summary, AllLabels. We could customize this request URL and use it to request the CSV files directly from our bug synchronization service. Our service queried for all open bugs in the chrome-os-partner project. In addition to the default columns, we included the ‘TimeModified’ field, which provided the last time the bug had been modified. We also included the ‘Proj’ field, which provided the project the bug corresponded with.
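As a minimal sketch of how such a request URL might be assembled, the function below joins a query string and a columns string onto a base URL. The parameter names `q` and `colspec`, the base URL, and the search expression `is:open` are assumptions made for illustration; only the column names come from the text above.

```python
from urllib.parse import urlencode

# Columns included in every CSV by default, as listed above.
DEFAULT_COLUMNS = ["ID", "Pri", "M", "ReleaseBlock", "Cr", "Status",
                   "Owner", "Summary", "AllLabels"]


def build_csv_url(base_url, query, extra_columns=()):
    """Return a CSV request URL for the given search query and columns.

    The parameter names "q" and "colspec" are assumed, not confirmed.
    """
    columns = DEFAULT_COLUMNS + list(extra_columns)
    params = {"q": query, "colspec": " ".join(columns)}
    return base_url + "?" + urlencode(params)


# Hypothetical base URL and search expression, for illustration only.
url = build_csv_url("https://example.test/issues/csv", "is:open",
                    extra_columns=["TimeModified", "Proj"])
```

Keeping the default columns in one list and appending the extra fields makes it easy to request ‘TimeModified’ and ‘Proj’ without repeating the defaults.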

Google Partner Bugs truncated each CSV file to 100 bugs. The ‘start’ parameter could be provided to the request URL to specify at which row to start in the CSV. If the parameter was omitted, the default value was 1. The first CSV would contain bugs numbered 1 to 100. To retrieve the next set of 100 bugs, we set the ‘start’ value equal to 101, and so on. We continued this process until we retrieved all bugs matching the given search criteria.
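The paging loop described above can be sketched as follows. The helper `fetch_all_bugs` and its `fetch_csv` callback are hypothetical names; the sketch assumes that every CSV page begins with a header row and that a page with fewer than 100 rows marks the end of the results.

```python
import csv
import io

PAGE_SIZE = 100  # each CSV is truncated to 100 bugs, as noted above


def fetch_all_bugs(fetch_csv, page_size=PAGE_SIZE):
    """Collect every row by requesting successive CSV pages.

    `fetch_csv(start)` is a caller-supplied function that returns the CSV
    text beginning at the 1-based row number `start`.
    """
    bugs = []
    start = 1  # default value when the 'start' parameter is omitted
    while True:
        rows = list(csv.DictReader(io.StringIO(fetch_csv(start))))
        bugs.extend(rows)
        if len(rows) < page_size:
            break  # a short (or empty) page means nothing is left
        start += page_size  # 1, 101, 201, ...
    return bugs
```

If the total number of bugs is an exact multiple of 100, the loop makes one additional request that returns no rows and then stops.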

This process was used to retrieve a CSV file of all recently closed bugs in the chrome-os-partner project as well. We looked specifically for any bugs in the project that had been closed within the last two days. By including recently closed bugs, we ensured that any final changes up until being closed were captured and shown on the corresponding NVIDIA bug.

4.1.2.3 Synchronizing the Bugs

Once we retrieved the list of all open bugs in addition to the recently closed bugs in the chrome-os-partner project, we needed to determine which bugs had been modified since the last time they had been synchronized. A simple option would have been to run the service on a fixed interval, for example once every hour, and to use that interval as the assumed window for modifications. If the bug synchronization service ran once per hour, it would look for bugs that had been modified in the past hour. However, this approach would not suffice for our system. Each bug with new updates was saved individually. Therefore, some bugs could be saved successfully while others failed, all within the same synchronization period. The service needed to know when each individual bug had last been successfully synced. The GetSyncAudit method in the Partner Integration API was made available for this purpose (Section 4.2.3.3). It provided an audit of all changes to a bug made specifically by our bug synchronization service.

Using the GetSyncAudit method, we could get the last time our synchronization service successfully updated an NVIDIA bug. If the bug had never been synchronized, we would create a new NVIDIA bug. If the bug had previously been synchronized, we compared the last time it had been synchronized to the last time the corresponding Google Partner Bug had been modified. If the Google Partner Bug had been modified more recently, we then fetched all available information on the bug via web scraping (Section 3.3). We used this information to update the corresponding NVIDIA bug. A complete list of information retrieved can be found in Appendix C.
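The per-bug decision described above reduces to a comparison of two timestamps. The sketch below is illustrative only: `decide_sync_action` is a hypothetical helper, and it assumes that the last successful sync time has already been obtained from GetSyncAudit (whose actual interface is not shown here) and is passed as `None` when the bug has never been synchronized.

```python
from datetime import datetime


def decide_sync_action(last_synced, last_modified):
    """Return "create", "update", or "skip" for one Google Partner Bug.

    last_synced:   datetime of the last successful sync, or None if the
                   bug has never been synchronized.
    last_modified: datetime from the bug's 'TimeModified' CSV field.
    """
    if last_synced is None:
        return "create"   # no NVIDIA bug exists yet
    if last_modified > last_synced:
        return "update"   # scrape the detail page and update the bug
    return "skip"         # nothing has changed since the last sync
```

Because the comparison uses each bug's own last successful sync time rather than a fixed interval, a bug whose save failed in one run is picked up again in the next.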

Closed bugs could have one of several statuses: Assigned, Duplicate, Moved, Verified, or WontFix. Bugs with the status Moved would no longer be in the chrome-os-partner database, which meant that we would no longer have access to their corresponding web pages. We updated Moved bugs with only the basic information found in the CSV.
