
4.2 Design and Implementation of Domain Feature Extraction Phase

4.2.1 Populating the Semantic Knowledgebase

The aim of populating the knowledgebase is to construct semantically structured information about the problem domain, which is valuable for opinion mining at the domain feature level. Thus, for each movie review, we populate the movie-review knowledgebase with the relevant ground facts (the movie’s name, release date, running time, country and language; its stars, directors, writers, editors, cinematographers, producers, etc.) gathered from public data sets such as DBpedia and the Internet Movie Database.

As the problem domain is the movie domain, we chose to use the Internet Movie Database in addition to DBpedia, but only for gathering the names of a movie’s stars. Relying on DBpedia alone for this purpose is sometimes insufficient: DBpedia derives its Resource Description Framework data mainly from Wikipedia infoboxes and, according to our observation, a Wikipedia infobox lists only the leading stars of a movie, whereas the Internet Movie Database lists all stars of each movie. Moreover, in our research, gathering ground facts from DBpedia for a specific movie requires the movie’s Uniform Resource Identifier (i.e. a key used to look up a resource in a knowledgebase over the World Wide Web), which we obtained via the Google search engine and the Wikipedia website. We noticed that the Google search engine sometimes returns no results when a review writes the movie title in a format different from the one used on Wikipedia. For example, the title of the movie “THE ADDICTION_1995” is sometimes written in the review as “ADDICTION, THE 1995”. By contrast, according to our observation, the Internet Movie Database provides advanced search tools that can retrieve the movie even when its title is written in a different format.

In general, the population process extracts a movie’s title from a review and then gathers the relevant ground facts about this movie (the movie’s name, release date, running time, country and language; its stars, directors, writers, editors, cinematographers and producers) from DBpedia and the Internet Movie Database. The process was performed automatically by following the steps illustrated in Algorithm 1.

Algorithm 1 Knowledgebase Population

Input: Reviews R, movie-review Knowledgebase

1. Do for i=1:|R|,
2. MovieName=Extract (Review[i])
3. /* Populating via DBpedia */
4. MovieWikiURI=Search (MovieName)
5. MovieDBpediaURI=MovieWikiURI.Replace(“https://en.wikipedia.org/wiki”, “http://dbpedia.org/resource”)
6. MovieGroundFacts=Retrieve (MovieDBpediaURI)
7. movie-review Knowledgebase=Insert (MovieGroundFacts)
8. /* Populating via Internet Movie Database */
9. MovieIMD-URI=Search (MovieName)
10. Movie’sStars=Retrieve (MovieIMD-URI)
11. movie-review Knowledgebase=Insert (Movie’sStars)
12. End for

Output: Populated movie-review Knowledgebase
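The control flow of Algorithm 1 can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the callables `extract_title`, `fetch_dbpedia_facts` and `fetch_imdb_stars` are hypothetical placeholders for the extraction and retrieval steps, and the knowledgebase is modelled as a plain dictionary rather than an RDF store.

```python
from typing import Callable, Dict, Iterable, List

Facts = Dict[str, List[str]]


def populate_knowledgebase(
    reviews: Iterable[str],
    extract_title: Callable[[str], str],        # hypothetical: step 2
    fetch_dbpedia_facts: Callable[[str], Facts],  # hypothetical: steps 4-6
    fetch_imdb_stars: Callable[[str], List[str]],  # hypothetical: steps 9-10
) -> Dict[str, Facts]:
    """Follow the loop of Algorithm 1 using injected helper functions."""
    knowledgebase: Dict[str, Facts] = {}
    for review in reviews:
        title = extract_title(review)
        entry = knowledgebase.setdefault(title, {})

        # Steps 4-7: insert the ground facts gathered from DBpedia.
        for prop, values in fetch_dbpedia_facts(title).items():
            stored = entry.setdefault(prop, [])
            for value in values:
                if value not in stored:
                    stored.append(value)

        # Steps 9-11: add the star names gathered from the Internet Movie Database.
        stars = entry.setdefault("stars", [])
        for name in fetch_imdb_stars(title):
            if name not in stars:
                stars.append(name)
    return knowledgebase
```

Passing the helpers as arguments keeps the sketch independent of any particular search or retrieval mechanism; values already present are skipped, so a movie reviewed several times is stored once.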

To gather ground facts from DBpedia, steps 2-7 of the above algorithm are executed. The target movie’s URI (i.e. Uniform Resource Identifier) in the DBpedia knowledgebase is obtained by searching for the Wikipedia page of the target movie (i.e. the movie’s Wiki-URI) and replacing the first part of the Wiki-URI with the DBpedia prefix “http://dbpedia.org/resource”. For example, the Wiki-URI of the movie THE ADDICTION (1995), “https://en.wikipedia.org/wiki/The_Addiction_1995”, is changed to “http://dbpedia.org/resource/The_Addiction_1995”, which is the DBpedia URI of the target movie.
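The URI rewriting described above can be sketched as a small Python function. The function name and the validation rules are illustrative assumptions; only the two prefixes are taken from the text.

```python
from urllib.parse import urlsplit

WIKIPEDIA_HOST = "en.wikipedia.org"
WIKI_PATH_PREFIX = "/wiki/"
DBPEDIA_RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def wiki_uri_to_dbpedia_uri(wiki_uri: str) -> str:
    """Map an English Wikipedia article URI to the matching DBpedia resource URI."""
    parts = urlsplit(wiki_uri)
    if parts.netloc != WIKIPEDIA_HOST or not parts.path.startswith(WIKI_PATH_PREFIX):
        raise ValueError(f"not an English Wikipedia article URI: {wiki_uri!r}")
    # Keep only the article name, i.e. the part after "/wiki/".
    page_name = parts.path[len(WIKI_PATH_PREFIX):]
    if not page_name:
        raise ValueError("the URI does not name an article")
    return DBPEDIA_RESOURCE_PREFIX + page_name


print(wiki_uri_to_dbpedia_uri("https://en.wikipedia.org/wiki/The_Addiction_1995"))
```

Parsing the URI instead of replacing a fixed string makes the mapping independent of whether the Wiki-URI starts with “http” or “https”.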

The obtained DBpedia URI is then used to retrieve the ground facts about the target movie from the DBpedia knowledgebase and to insert them into the movie-review knowledgebase. The retrieval and insertion steps are performed together via composed SPARQL CONSTRUCT queries, as shown in Figure 4.2. SPARQL is a language for performing semantic queries over a semantic knowledgebase; its CONSTRUCT form returns the retrieved data as a Resource Description Framework graph (Prud and Seaborne 2006).
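As an illustration of how such a query could be composed for each movie, the following Python sketch assembles the query text from a DBpedia URI and the title extracted from the review. The prefixes and property names are those appearing in the example query of Figure 4.2; the function names and the tabular property mapping are assumptions made for this sketch, and executing the query against an endpoint is left out.

```python
PREFIXES = """\
prefix owl: <http://www.movie-review-ontology.owl#>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix dbpprop: <http://dbpedia.org/property/>
"""

# (property in the movie-review ontology, source property in DBpedia, variable name)
PROPERTY_MAP = [
    ("owl:movie_Title", "dbpprop:name", "name"),
    ("rdfs:label", "rdfs:label", "label"),
    ("owl:hasLanguage", "dbpprop:language", "language"),
    ("owl:hasCountry", "dbpprop:country", "country"),
    ("owl:has_Starring", "dbpedia-owl:starring", "star"),
    ("owl:has_Writer", "dbpedia-owl:writer", "writer"),
    ("owl:directed_by", "dbpedia-owl:director", "director"),
    ("owl:edited_by", "dbpedia-owl:editing", "editor"),
]


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def build_construct_query(dbpedia_uri: str, review_title: str) -> str:
    """Compose a CONSTRUCT query for one movie (sketch, not executed here)."""
    if any(ch in dbpedia_uri for ch in '<>"{}|^`\\ '):
        raise ValueError("character not allowed in an IRI reference")
    template = [f"  ?subject {target} ?{var} ." for target, _, var in PROPERTY_MAP]
    # The title as written in the review is kept as an additional label.
    template.append(f"  ?subject rdfs:label {sparql_string(review_title)} .")
    template.append("  ?subject rdf:type owl:Movie .")
    where = [
        f"  VALUES ?subject {{ <{dbpedia_uri}> }}",
        "  ?subject a dbpedia-owl:Film .",
    ]
    where += [
        f"  OPTIONAL {{ ?subject {source} ?{var} . }}"
        for _, source, var in PROPERTY_MAP
    ]
    return (
        PREFIXES
        + "CONSTRUCT {\n" + "\n".join(template) + "\n}\n"
        + "WHERE {\n" + "\n".join(where) + "\n}\n"
    )


print(build_construct_query(
    "http://dbpedia.org/resource/The_Addiction_1995", "ADDICTION,THE (1995)"
))
```

Each source property is wrapped in OPTIONAL so that a movie with a missing fact still yields the remaining triples.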


Figure 4.2 Example of SPARQL CONSTRUCT query

Although the movie reviews are collected from crowd-sourced data that provides extensive information with a high level of accuracy, some reviews are likely to contain incorrect or inconsistently formatted information due to human error. For example, the title of the movie THE ADDICTION (1995) is sometimes written in the review as “ADDICTION, THE”. Therefore, for disambiguation, the title extracted from the review is inserted into the movie-review knowledgebase in addition to the movie’s name retrieved from the DBpedia knowledgebase.
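The relationship between the two title formats in this example can be illustrated with a short Python sketch that moves a trailing article back to the front of the title. Such a step is not part of Algorithm 1; the function name and the set of articles handled are assumptions made purely for illustration.

```python
import re
from typing import List

# A title such as "ADDICTION, THE 1995": body, comma, article, optional remainder.
_TRAILING_ARTICLE = re.compile(
    r"^(?P<body>[^,]+),\s*(?P<article>the|an|a)\b\s*(?P<rest>.*)$",
    re.IGNORECASE,
)


def title_variants(extracted_title: str) -> List[str]:
    """Return the title as written plus, if applicable, its article-first form."""
    title = " ".join(extracted_title.split())  # collapse repeated whitespace
    variants = [title]
    match = _TRAILING_ARTICLE.match(title)
    if match:
        reordered = " ".join(
            part
            for part in (match["article"], match["body"].strip(), match["rest"])
            if part
        )
        if reordered not in variants:
            variants.append(reordered)
    return variants


print(title_variants("ADDICTION, THE 1995"))
```

Keeping the title as written alongside the reordered form mirrors the decision above to store both the extracted title and the retrieved name.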

To gather ground facts from the Internet Movie Database website, steps 9-11 are performed. The result of these steps is the list of star names, which is retrieved from the page source of the target movie’s Internet Movie Database page. The obtained list of star names was injected into the movie-review knowledgebase using SPARQL CONSTRUCT queries. Figure 4.3 presents a snapshot of the semantic information about the movie THE ADDICTION (1995) populated into the movie-review knowledgebase.

prefix owl: <http://www.movie-review-ontology.owl#>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix dbpprop: <http://dbpedia.org/property/>

CONSTRUCT {
  ?subject owl:movie_Title ?name .
  ?subject rdfs:label ?label .
  ?subject rdfs:label "ADDICTION,THE (1995)" .
  ?subject rdf:type owl:Movie .
  ?subject owl:hasLanguage ?language .
  ?subject owl:hasCountry ?country .
  ?subject owl:has_Starring ?star .
  ?subject owl:has_Writer ?writer .
  ?subject owl:directed_by ?director .
  ?subject owl:edited_by ?editor .
}
WHERE {
  VALUES ?subject { <http://dbpedia.org/resource/The_Addiction_1995> }
  ?subject a dbpedia-owl:Film .
  OPTIONAL { ?subject rdfs:label ?label . }
  OPTIONAL { ?subject dbpprop:name ?name . }
  OPTIONAL { ?subject dbpprop:language ?language . }
  OPTIONAL { ?subject dbpprop:country ?country . }
  OPTIONAL { ?subject dbpedia-owl:starring ?star . }
  OPTIONAL { ?subject dbpedia-owl:writer ?writer . }
  OPTIONAL { ?subject dbpedia-owl:editing ?editor . }
  OPTIONAL { ?subject dbpedia-owl:director ?director . }
}
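The injection of the star names gathered from the Internet Movie Database (steps 9-11) could be expressed in a similar way. The sketch below composes a CONSTRUCT query that attaches a list of names to a movie through the `owl:has_Starring` property. It assumes that star names are stored as plain string literals and that the movie is identified by its DBpedia URI; the text does not specify either point, and the function names are hypothetical.

```python
from typing import Sequence

MOVIE_ONTOLOGY_PREFIX = "prefix owl: <http://www.movie-review-ontology.owl#>"


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def build_star_query(movie_uri: str, star_names: Sequence[str]) -> str:
    """Compose a CONSTRUCT query producing one has_Starring triple per name."""
    if not star_names:
        raise ValueError("no star names to insert")
    if any(ch in movie_uri for ch in '<>"{}|^`\\ '):
        raise ValueError("character not allowed in an IRI reference")
    # Assumption: each star is represented by a string literal.
    values = " ".join(sparql_string(name) for name in star_names)
    return (
        f"{MOVIE_ONTOLOGY_PREFIX}\n"
        "CONSTRUCT {\n"
        f"  <{movie_uri}> owl:has_Starring ?star .\n"
        "}\n"
        "WHERE {\n"
        f"  VALUES ?star {{ {values} }}\n"
        "}\n"
    )


print(build_star_query(
    "http://dbpedia.org/resource/The_Addiction_1995",
    ["Lili Taylor", "Christopher Walken"],
))
```

The VALUES clause supplies the names inline, so the query needs no data from a remote source and yields exactly one triple per distinct name.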


Figure 4.3 A snapshot of the semantic information about the movie THE ADDICTION (1995) populated into the movie-review knowledgebase

4.2.2 Pre-processing the Domain Reviews Using Natural