5.4 Towards a distributed architecture
5.4.3 Distributed Data Storage
This component is responsible for storing, querying and indexing trajectory and PoI data. It is composed of a database management system, which efficiently provides information to the “TripBuilder Engine” component, and a distributed file system, which provides distributed data storage to support the Stream and Batch layers. The database component relies on a well-defined schema that provides flexibility in integrating other data sources. Geo-spatial indexes are used to search for spatial objects, such as PoIs and tourist traces, within a given region (e.g. a polygon). The system also takes advantage of indexes over PoI categories and tourist traces, both represented as arrays, to efficiently retrieve the PoIs relevant to the user's preferences. Moreover, the distributed file system is built on the Apache Hadoop Distributed File System (HDFS), jointly with HBase and MongoDB. We chose HDFS because it is a mature solution for storing data in distributed environments; for example, it provides effective and efficient fault-tolerance mechanisms that prevent data loss in case of hardware failures.
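The query pattern described above (find PoIs inside a polygon, then keep those whose category array matches the user's preferences) can be illustrated with a small in-memory sketch. It is not TripBuilder's code: a production system would delegate this to the database (MongoDB, for instance, offers 2dsphere geospatial indexes and multikey indexes over array fields), whereas here a hypothetical `GridIndex` prunes candidates coarsely, a ray-casting test checks polygon membership exactly, and a set intersection matches categories. All names, coordinates and PoIs are synthetic.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PoI:
    name: str
    lon: float
    lat: float
    categories: list = field(default_factory=list)  # array-valued field, as in the text


def point_in_polygon(lon, lat, polygon):
    """Ray casting: inside if a ray to the right crosses the boundary an odd number of times."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


class GridIndex:
    """Hypothetical toy spatial index: buckets PoIs into square grid cells."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, lon, lat):
        return (int(lon // self.cell_size), int(lat // self.cell_size))

    def insert(self, poi):
        self.cells[self._cell(poi.lon, poi.lat)].append(poi)

    def candidates(self, polygon):
        """Yield PoIs stored in every cell overlapping the polygon's bounding box."""
        lons = [x for x, _ in polygon]
        lats = [y for _, y in polygon]
        cx0, cy0 = self._cell(min(lons), min(lats))
        cx1, cy1 = self._cell(max(lons), max(lats))
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                yield from self.cells.get((cx, cy), [])


def pois_in_region(index, polygon, wanted_categories):
    """Coarse grid filter, exact point-in-polygon test, then category match."""
    wanted = set(wanted_categories)
    return [
        p for p in index.candidates(polygon)
        if point_in_polygon(p.lon, p.lat, polygon) and wanted & set(p.categories)
    ]


# Synthetic PoIs on a planar coordinate grid (hypothetical data).
grid = GridIndex(cell_size=4)
for poi in [
    PoI("poi_a", 2, 3, ["museum"]),
    PoI("poi_b", 5, 5, ["church"]),
    PoI("poi_c", 7, 8, ["museum", "park"]),
    PoI("poi_d", 15, 2, ["museum"]),
]:
    grid.insert(poi)

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
result = pois_in_region(grid, square, ["museum"])
print([p.name for p in result])  # ['poi_a', 'poi_c']
```

The grid only narrows the search to nearby cells; correctness comes from the exact polygon test, which is the same division of labour a real geospatial index performs between its index lookup and its final geometry check.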
5.5 Discussion
In this chapter we presented the TripBuilder tool, a web application to create personalized sightseeing tours. We detailed the main components of the system, which include data collection, processing, storage and the tour creation engine. We presented the user-friendly interface of the web application and its main functionalities, including the creation of personalized sightseeing tours, mechanisms that help users explore the city, and the sharing of tours with friends from their social network.
We finally presented an augmented version of the TripBuilder architecture, focusing on the modules responsible for collecting and processing data. We detailed how we designed a distributed and scalable architecture using open-source Big Data tools, which allow us to distribute streaming and batch computation across several computation nodes in a cloud environment.
Figure 5.6: Popular places of the city are mined from the collected Flickr photos, providing tourists with important insights.
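The spout and bolt components shown in Figure 5.8 use the terminology of Apache Storm topologies, in which a spout emits tuples and several parallel instances of a bolt process them. The single-process simulation below is only a hypothetical sketch of that idea, not TripBuilder's topology: a city spout emits cities, a configurable number of "wiki bolt" instances look up PoIs in a local dictionary (standing in for Wikipedia), and an "HDFS bolt" appends results to a list (standing in for an HDFS file). Tuples are assigned round-robin, a deterministic stand-in for Storm's random shuffle grouping.

```python
from itertools import cycle


def city_spout(cities):
    """Hypothetical spout: emits one tuple per city to be processed."""
    for city in cities:
        yield {"city": city}


def make_wiki_bolt(bolt_id, poi_lookup):
    """Hypothetical bolt: 'discovers' PoIs for a city via a local dict instead of Wikipedia."""
    def bolt(tup):
        return [{"city": tup["city"], "poi": poi, "bolt": bolt_id}
                for poi in poi_lookup.get(tup["city"], [])]
    return bolt


def hdfs_bolt(tuples, sink):
    """Hypothetical sink bolt: appends results to a list standing in for an HDFS file."""
    sink.extend(tuples)


def run_topology(cities, poi_lookup, parallelism=2):
    bolts = [make_wiki_bolt(i, poi_lookup) for i in range(parallelism)]
    sink = []
    # Round-robin assignment of tuples to bolt instances: a deterministic
    # stand-in for the random shuffle grouping of Storm-like systems.
    for tup, bolt in zip(city_spout(cities), cycle(bolts)):
        hdfs_bolt(bolt(tup), sink)
    return sink


lookup = {"Pisa": ["Leaning Tower"], "Florence": ["Uffizi", "Duomo"], "Rome": ["Colosseum"]}
out = run_topology(["Pisa", "Florence", "Rome"], lookup, parallelism=2)
print([(t["poi"], t["bolt"]) for t in out])
# [('Leaning Tower', 0), ('Uffizi', 1), ('Duomo', 1), ('Colosseum', 0)]
```

In a real cluster each bolt instance would run as a separate task, possibly on a different node, which is what lets the stream layer scale with the number of cities and photos to process.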
The results from Chapters 4 and 5 lead to the following observation: people usually visit a city in the company of others, such as friends or family. This means that some applications have a strong collective appeal. Based on this, in the next chapter we present a framework whose objective is to form groups by taking into account both the users' preferences and how they are related to each other in the social network.
Figure 5.7: Users can save and retrieve the tours they created in order to share them with other users (e.g. friends), who might use them to plan their visit to the city.
Figure 5.8: Layers of the distributed and scalable architecture of TripBuilder for collecting and processing data.
[Figure panels: Stream Layer (Wikipedia PoI Discovery: City Spout, Wiki Bolts, HDFS Bolts; Photo Discovery: BBox Spout, Photo Bolts, HDFS Bolts), Batch Layer (PoI Visiting Time Estimation, Trajectories Creation, Trajectory Split Estimation), Distributed Data Storage (HDFS).]
GroupFinder framework for group formation problem
The work carried out in the area of group recommendation has demonstrated the importance of this problem in real applications, due to the need to find relevant and significant items for a group of users instead of for individual users, as presented in Chapter 2. Advances in these methodologies have led us to a complementary view of the group recommendation problem, known as the group formation problem in the context of recommender systems, also presented in Chapter 2. In the group formation problem, the goal is to find or form a group of users considering the recommendations generated for them by a recommendation system. In this chapter we investigate the research question RQ3: How can we find the best groups of users (e.g. friends) who can together enjoy a given item?
This chapter presents a novel framework called GroupFinder that encompasses algorithmic solutions to a novel group formation problem, taking into account the recommendations for users as well as their friendships as represented in social networks. This chapter is based on the published works [36, 32].
6.1 Introduction
Nowadays, we are witnessing a pervasive use of recommendation systems to support choices in our daily activities, from the most traditional recommendations of books, music and movies, like Amazon1 and Netflix2 discussed in Chapter 2, to mention just well-known examples, to the mobile recommendations of attractions to visit and tour itineraries to follow, like TripAdvisor3, Gogobot4 and TripBuilder5 presented in Chapters 4 and 5. We observe that these last activities are usually better enjoyed with travel companions, thus shifting the problem from recommending a single item to a single user (as is typical in the traditional cases) to a new paradigm of recommendation that takes into account a group of users. Traditional recommendation systems primarily focus on identifying items relevant to individual users, using well-known techniques like collaborative filtering [136, 103, 124] or matrix factorization [86]. When the recommendation targets groups of users, it is referred to as “group recommendation”, and the main goal is to identify items that are likely to reach a large consensus among a previously known group of users [14]. The group recommendation problem is typically hard to solve, since a group can be characterized by a diverse mixture of preferences, and finding a trade-off among these preferences may lead to unsatisfactory recommendations for some of the users. In this work we address a complementary perspective of the group recommendation problem: given a user and a recommended item, we want to suggest the “best” group of friends with whom to enjoy the recommended item. In the TripBuilder system, for example, we want to find a group of friends for a given user with whom to enjoy a recommended city, or even a generated sightseeing tour in a city.
1 http://www.amazon.com
2 http://www.netflix.com
3 http://www.tripadvisor.com
4 http://www.gogobot.com
5 http://tripbuilder.isti.cnr.it
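The trade-off mentioned above can be made concrete with two aggregation strategies that are common in the group recommendation literature: average satisfaction (score an item by the mean of the members' ratings) and least misery (score it by the lowest rating). The sketch below uses hypothetical ratings and is not claimed to be the method of [14] or of this thesis; it only shows how an item with a high consensus on average can still leave one member unsatisfied.

```python
from statistics import mean

# Hypothetical ratings (1-5) of two items by a previously known group of three users.
ratings = {
    "alice": {"x": 5, "y": 3},
    "bob": {"x": 5, "y": 3},
    "carol": {"x": 1, "y": 3},
}


def group_scores(ratings, aggregate):
    """Aggregate each item's individual ratings into a single group score."""
    items = next(iter(ratings.values())).keys()
    return {i: aggregate([r[i] for r in ratings.values()]) for i in items}


def recommend(ratings, aggregate):
    """Return the item with the highest aggregated group score."""
    scores = group_scores(ratings, aggregate)
    return max(scores, key=scores.get)


avg_pick = recommend(ratings, mean)  # average satisfaction
lm_pick = recommend(ratings, min)    # least misery
print(avg_pick, ratings["carol"][avg_pick])  # x 1
print(lm_pick, ratings["carol"][lm_pick])    # y 3
```

Averaging favours item x (mean 3.67 against 3.0) even though carol rates it 1, while least misery picks y, which nobody loves but everybody accepts.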
Basu Roy et al. made an important contribution in [15] by investigating the group formation problem from a group recommendation perspective. However, we believe that an essential ingredient of group formation is missing from that analysis: the social network. In the last decade we have witnessed the popularization of social network-based applications and the development of techniques to capture how users are related to each other through their interactions (e.g. comments, chats, likes, tags). Therefore, we believe that the social aspect of the users is an essential feature for group formation, one that may help find groups of users that better fulfill the users' expectations. Consider, for example, a user who has been recommended to visit Paris: we want to be able to suggest travel companions who can join her in visiting Paris. Such a group should ideally be interested in visiting Paris, and its members should also be friends with each other so that they enjoy staying together. Thus we need to balance the strength of the friendship within the group with the group members' interest in traveling to Paris. Considering this last scenario, we design a recommendation technique that suggests the “best” group of k friends for a pair <user, item>, taking into account both the social relations and the preferences of the user and of the group. Since this approach focuses on the formation of the group based on an item and a user, we refer to it as the User-Item Group Formation problem. In the remainder of the chapter we often refer to it as UI-GF, or simply group formation, for the sake of readability.
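One way to make this balance explicit is a weighted objective. The formula below is only an illustrative assumption, not the formalization of UI-GF adopted in this chapter (which is given in Section 6.2 and may differ). Here N(u) is the set of friends of user u, s(v,i) is the relevance score of item i for user v, s_max is the maximum possible score, E(G) is the set of friendship links between members of G, k ≥ 2 is the group size, and α ∈ [0,1] is a hypothetical parameter trading relevance against cohesion. Since every candidate is already a friend of u, only the links inside G need to be counted.

```latex
G^{*} = \operatorname*{arg\,max}_{G \subseteq N(u),\; |G| = k}
  \; \alpha \, \frac{1}{k} \sum_{v \in G} \frac{s(v,i)}{s_{\max}}
  \; + \; (1-\alpha) \, \frac{|E(G)|}{\binom{k}{2}}
```

With α = 1 the objective reduces to picking the k friends with the highest relevance for i; with α = 0 it only rewards groups whose members are all friends with each other.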
Let us consider the simple example with 7 users and 3 items depicted in Figure 6.1. In this example the items represent destinations suggested by a recommender system for tourism. We are interested in finding the best group of 3 users who can enjoy visiting Florence together with user u0. Figure 6.1 (a) reports the relevance score s of each city for each user (ranging from 1 to 5; the higher the value, the greater the relevance), while Figure 6.1 (b) shows the social network of user u0 (i.e. her ego network), where links represent friend relationships. A trivial solution would be to choose the users with the highest relevance scores for Florence: users u3, u4 and u2. However, when we look at social relationships the perspective changes: the network in Figure 6.1 (b) shows that u0's friend u2 is not a friend of u3 and u4. Indeed, a better group of u0's friends to enjoy Florence should include u3, u4 and u5, since these three users are all friends with each other while still having a good overall relevance score for Florence. This simple example illustrates the advantage of considering both user-item relevance and the strength of interpersonal relations in a solution addressing the group formation problem. To the best of our knowledge, this work is the first one that considers both social relations and user-item relevance in group formation.
(a)
s          u0  u1  u2  u3  u4  u5  u6
Pisa        2   3   1   2   2   1   3
Florence    2   1   4   5   5   2   2
Rome        2   4   3   1   1   3   1
(b) [graph: ego network of user u0]
Figure 6.1: Toy instance of our group formation problem. Table (a) reports the relevance scores of three items for seven users, while the graph in (b) shows the ego network of user u0 having the same set of users.
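The toy instance of Figure 6.1 can be checked with a brute-force search over all groups of 3 of u0's friends. The sketch below rests on two assumptions of our own. First, the objective: a hypothetical weight `alpha` balances the normalized average relevance for Florence against the density of friendship links inside the group. Second, the edge set: the text only states that u3, u4 and u5 are pairwise friends and that u2 is not a friend of u3 or u4, so every other link among u1 to u6 is assumed absent, and all six users are assumed to be friends of u0. The relevance scores are those of Table (a).

```python
from itertools import combinations

# Relevance scores s for Florence, from Figure 6.1 (a); u0 is the querying user.
florence = {"u1": 1, "u2": 4, "u3": 5, "u4": 5, "u5": 2, "u6": 2}
S_MAX = 5  # maximum relevance score

# HYPOTHETICAL friendship links among u0's friends (only those implied by the text).
edges = {frozenset(e) for e in [("u3", "u4"), ("u3", "u5"), ("u4", "u5")]}


def score(group, alpha):
    """Hypothetical objective: alpha * avg normalized relevance + (1 - alpha) * link density."""
    k = len(group)
    relevance = sum(florence[v] / S_MAX for v in group) / k
    links = sum(frozenset(pair) in edges for pair in combinations(group, 2))
    density = links / (k * (k - 1) / 2)
    return alpha * relevance + (1 - alpha) * density


def best_group(k, alpha):
    """Exhaustively evaluate every k-subset of u0's friends."""
    return max(combinations(sorted(florence), k), key=lambda g: score(g, alpha))


print(best_group(3, alpha=1.0))  # ('u2', 'u3', 'u4')
print(best_group(3, alpha=0.5))  # ('u3', 'u4', 'u5')
```

With `alpha=1.0` only relevance counts and the search returns the trivial solution; with `alpha=0.5` the three links of the u3, u4, u5 triangle outweigh u2's higher score. Exhaustive enumeration evaluates every k-subset of the ego network, so its cost grows combinatorially and it is only practical for toy instances like this one.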
The main topics covered by this chapter can be summarized as follows:
• we formalize the user-item group formation problem, aimed at recommending the best group of friends for a <user, item> pair. We address this novel problem by combining user-item relevance information with the user's social network, trying to balance the satisfaction of all group members with the item against the strength of intra-group relationships;
• we propose two different solutions, which are integrated into a framework called GroupFinder that combines the needed components and information sources;
• we instantiate the problem in the location-based recommendation domain and evaluate GroupFinder on four publicly available Location-Based Social Network (LBSN) datasets, showing that our solution is effective and outperforms strong baselines.
The rest of the chapter is organized as follows. In Section 6.2 we formalize the UI-GF problem. Section 6.3 discusses the algorithmic solutions and Section 6.4 describes the components of the GroupFinder framework. The results of the experiments conducted to assess GroupFinder are reported in Section 6.6, whereas in Section 6.7 we present a discussion.