Chapter 7: Conclusions and Future Work
7.1. Research summary and contributions
This research has introduced a novel approach to caching queries that retrieves previously accessed data efficiently, without needing any intermediary peer nodes between the data-source peer and the querying peer. Algorithms have been developed that show how a query is reconstructed from the information held in the cached query. The reconstructed query is built so that it can be used for subsequent query routing at the same peer node. With this approach, a concise query message is ready for direct communication between the querying peer and the data-source peer. This new approach to query routing has shown that cached queries can reduce routing time by reducing the number of messages passed between peers. The reduction in message passing is demonstrated by the analysis in the previous chapter, which showed a reduction in the number of peers participating in query routing. Furthermore, our simulation has shown that query routing can be carried out without depending entirely on the super-peer index. Consequently, the processing time for query routing is also reduced.
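As a rough illustration of the cached-query idea summarized above, the following Java sketch keeps a map from a query key to the identifier of the data-source peer that previously answered it. The class and method names (`QueryCacheSketch`, `remember`, `nextHop`), the string peer identifiers, and the example query are assumptions made for this illustration; they are not taken from the thesis implementation or from the JXTA API. On a cache hit the query can be addressed directly to the data-source peer; on a miss it falls back to the super-peer.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: remembers which data-source peer answered a query. */
class QueryCacheSketch {
    // Maps a query key to the identifier of the peer that last answered it.
    private final Map<String, String> sourceByQuery = new HashMap<>();

    /** Records the data-source peer for a query that has been answered. */
    void remember(String queryKey, String dataSourcePeerId) {
        sourceByQuery.put(queryKey, dataSourcePeerId);
    }

    /** Returns true if a data-source location is cached for this query. */
    boolean isCached(String queryKey) {
        return sourceByQuery.containsKey(queryKey);
    }

    /**
     * Chooses where to send the query next: the cached data-source peer
     * on a hit, otherwise the super-peer (assumed fallback).
     */
    String nextHop(String queryKey, String superPeerId) {
        return sourceByQuery.getOrDefault(queryKey, superPeerId);
    }

    public static void main(String[] args) {
        QueryCacheSketch cache = new QueryCacheSketch();
        String query = "SELECT title FROM books"; // made-up example query

        // First request: nothing cached, so the query goes to the super-peer.
        System.out.println(cache.nextHop(query, "super-peer-1")); // super-peer-1

        // After an answer arrives, the source location is cached locally.
        cache.remember(query, "peer-42");

        // Repeated request: routed directly to the data-source peer.
        System.out.println(cache.nextHop(query, "super-peer-1")); // peer-42
    }
}
```

The sketch deliberately stores only the location of the data source, not the query result, so a repeated query still fetches current data from the source peer.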
To demonstrate the impact of the proposed approach on super-peer query routing, a performance measurement equation was created. The equation consists of several parameters that can be varied to model different routing activities, especially message passing through the network. The proposed equation is not limited to super-peer networks, because the parameters involved can be replaced to simulate various network structures and P2P platforms. A mathematical equation is used to avoid the bias that generally occurs in some P2P simulators: normally, a P2P simulator is built around the specific routing approach that its author intends to prove. The principle of the proposed measurement equation is to identify the component(s) involved in query routing tasks. Every component is then arranged by task according to the approaches being compared.
The results of the comparative performance evaluations provide some evidence for identifying the most suitable routing approach for a specified peer network connection. This evidence is obtained by determining the average cost of each component of the task. Graphs are then plotted by varying the variables in the equation, and the impact of the proposed component is analyzed from the plotted graphs.
In summary, this research revealed the following:
Computer taxonomy for P2P systems.
The proposed taxonomy classifies P2P systems by representing various computer system architectures in taxonomic form. It covers the differences between architectural structures that lead to different query routing strategies. This contribution is discussed in Chapter 2 and has been published in (Mohamed 2007).
Comparative study of resource discovery mechanisms.
This study has led to a list of challenges to be met in providing a discovery mechanism in P2P networks. Issues relating to the matrix for measuring the costs and benefits when choosing a suitable resource discovery mechanism for a P2P system have been presented in (Mohamed and Satari 2009). In conjunction with this comparative study, Chapter 2 of this thesis compares several query routing approaches in P2P systems and finds that query messages need to be directed rather than routed freely. The best routing assistance is that which is contained locally.
The use of a materialized view for query processing in P2P applications.
The feasibility of using materialized views in P2P query processing was explored in Chapter 3, building on ideas that were published in (Mohamed, Basel-Al-Mourad et al. 2006). The materialized view is commonly used in integrated database systems. In the P2P environment, saving query results at the peer or super-peer could lead to obsolete data. Consequently, what is materialized is the data source location rather than the query result: the ‘materialization’ of data source locations in a database is a ‘view’ and is highly important in an integrated database environment. Thus, a similar concept of materialization of data source locations could also provide benefits in a P2P environment. Therefore, an architectural design for using a materialized data source location for query processing is discussed in Chapter 5, and was previously published in (Mohamed, Buckingham et al. 2007). This feasibility study served as background work to support the use of query caching.
A new query caching mechanism.
This cached query mechanism is a novel approach to query caching over super-peer networks. The design and implementation of query caching to assist the query routing process are discussed in Chapter 4. This leads to the use of query caching in the JXTA platform proposed in (Mohamed and Buckingham 2008), where the query caching mechanism keeps the history of queries executed by the local peers. The performance impact of this query caching mechanism has been tested under different peer-group configurations and messaging patterns.
An amendment of query routing in super-peer networks.
An architectural design for query routing is proposed. The design, a service for pre-processing query routing on a JXTA P2P platform, has been implemented and evaluated. The implementation uses Java and is piggy-backed onto the JXTA platform for a P2P super-peer network, while the evaluation is done at the client and super-peer levels. The algorithms for the architectural concept are described in detail in Chapter 4 and published in (Mohamed and Buckingham 2008), while the implementation of the proposed architecture on JXTA is described in Chapter 5. The implementation of pre-processed query routing, which uses the cached query list holding the data source locations, was also presented in (Mohamed and Buckingham 2010).
A new approach to comparing query routing performance.
The method for collecting query routing performance results revealed a number of performance issues. The experimental results were analyzed and discussed in Chapter 6. The analyses produced an equation that allows a comparative assessment of query routing time. The results show that query routing performance is influenced by the cache mechanism and by the cache location in the network. Although the comparative assessment used a UML sequence diagram to illustrate a super-peer network, the main idea is to compare the effect of every component involved in the query routing process without bias towards any P2P platform or P2P simulator. In the sequence diagram, each component represents a peer that is involved in query routing, and each message sent between components is represented by an arrow. The sequence of messages gives a step-by-step account of the interaction between peers and so illustrates the routing behavior. Based on this analysis of routing behavior, a mathematical equation was created. The variables in the equation represent the parameters involved in query routing or the contributing factors that affect it. The equation was shown to be effective in comparing the performance of different routing strategies.
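To make the idea of a component-based cost comparison concrete, the Java sketch below sums, for each routing strategy, the number of times each step occurs multiplied by an assumed average cost for that step. This is only an illustrative form, not the equation derived in Chapter 6; the class `RoutingCostSketch`, the step names, and all cost values are hypothetical and chosen purely for the example.

```java
/** Hypothetical sketch: compares routing strategies by summing per-step costs. */
class RoutingCostSketch {

    /** One step of a routing sequence, e.g. one arrow in a sequence diagram. */
    static final class Step {
        final String component;       // descriptive label only
        final int count;              // how many times the step occurs
        final double avgCostPerStep;  // assumed average cost, arbitrary units

        Step(String component, int count, double avgCostPerStep) {
            this.component = component;
            this.count = count;
            this.avgCostPerStep = avgCostPerStep;
        }
    }

    /** Total cost of a routing sequence: sum of count * average cost per step. */
    static double totalCost(Step... steps) {
        double total = 0.0;
        for (Step step : steps) {
            total += step.count * step.avgCostPerStep;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up costs: routing through the super-peer index.
        double viaSuperPeer = totalCost(
                new Step("querying peer -> super-peer", 1, 2.0),
                new Step("super-peer index lookup", 1, 3.0),
                new Step("super-peer -> data-source peer", 1, 2.0));

        // Made-up costs: routing with a locally cached data-source location.
        double viaLocalCache = totalCost(
                new Step("local cache lookup", 1, 0.5),
                new Step("querying peer -> data-source peer", 1, 2.0));

        System.out.println("via super-peer:  " + viaSuperPeer);
        System.out.println("via local cache: " + viaLocalCache);
    }
}
```

Because each step is a separate parameter, a different network structure can be modelled by swapping the list of steps, which mirrors the platform-independent intent described above.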