A novel reputation-based model for e-commerce


ORIGINAL PAPER


Panayotis Fouliras

Received: 12 October 2010 / Revised: 1 March 2011 / Accepted: 21 April 2011

© Springer-Verlag 2011

Abstract Reputation systems are very important in e-markets, where they help buyers to decide whether to purchase a product. Since a higher reputation often represents higher profit, malicious users may try to deceive them to increase their reputation. This may be aggravated through fear of retaliation to bad ratings or collusion of multiple users so that they can perform a fraud of high value before being denounced by their victims. In this paper we provide a new reputation management system (RMS) which prevents retaliation and minimizes fake ratings due to collusion. Furthermore, it combines the price and the sales track record of the seller with the charges he has already paid to the system. In this way, it becomes too expensive both in terms of time and money for the malicious users who may collude to launch an organized attack discrediting an e-commerce platform. We test our proposals with several simulation scenarios, verifying the validity of our claims.

Keywords E-commerce · Reputation · Collusion · Ratings · Performance · Simulation

1 Introduction

With the growth of the Internet and the World Wide Web, e-commerce has evolved at an explosive rate over the last decade, allowing the trade of a vast variety of goods and services. Accordingly, many related services have sprung in support of e-commerce. Nevertheless, the basic goal remains the same: to complete a transaction without the need for the buyer, seller and often the object of the transaction to come into physical contact with each other.

P. Fouliras (✉)

Department of Applied Informatics, University of Macedonia, 156 Egnatia str, 54006 Thessaloniki, Greece

e-mail: [email protected]


Still, e-commerce is problematic: buyers want privacy to perform transactions without revealing their identity or other personal details, while they typically need to know the reputation of the seller they intend to buy from. Reputation is nothing but a combination of the opinions of other people who have had experience with the particular seller. Nevertheless, in a vast marketplace where the real identity of the participants is hidden, this is not easy to achieve. Previous buyers may be truthful and accurate, inaccurate or even malicious, deliberately providing false opinions about sellers. A malicious seller may even force a buyer to provide a positive opinion by stating that he will otherwise retaliate with a negative opinion about the buyer. It is also possible to set up multiple bogus user accounts to organize such actions. This is more prevalent in the case of Business-to-Consumer (B2C) commerce, where there are many customers with little or no relation at all among themselves and with diverse requirements.

In this paper we address these issues by proposing several novel but simple ideas that can diminish retaliation, improve the rating mechanism and enhance reputation calculation so that malicious behaviour can be readily reduced. Furthermore, since collusion for organized attacks cannot be completely eradicated, we propose a mechanism which makes such attacks particularly expensive to launch.

The remainder of this paper is organized as follows. In Sect. 2, we present a survey of existing e-commerce models. This is followed by related work on reputation and rating systems in Sect. 3. We present our proposals in Sect. 4. This is followed by an evaluation of our proposed reputation management system in Sect. 5, with our conclusions in the last section.

2 Related work

Commerce refers to the exchange of goods and services all the way from the point of production to the point of consumption; e-commerce is the same process, performed over electronic means of communication such as the Internet. Typically, there are three participants, namely the buyer, the seller and the goods or services being traded between the seller and the buyer. Since the buyer normally pays with money, which can be in electronic form and thus its associated transfer cost and time are minimal, we only need to examine the nature of what the seller provides. If the latter is a service or object which can also be transferred electronically (e.g., an e-book, software, etc.), then the cost both in terms of money and time is again minimal. However, if it is not in electronic form, then the packaging and shipping costs, as well as the time necessary to carry it to the buyer’s premises, can be considerable.

There are many e-commerce companies which have established very profitable businesses since the e-commerce traders Amazon (2010) and eBay (2010) pioneered this field more than a decade ago. In the case of eBay, there is no single seller; hence the return policy is more complex, depending on the individual seller, and is complemented by a refund for the whole transaction through eBay’s resolution centre, especially for non-delivered or inaccurately advertised goods.


All the above can play an important role in the buyer’s overall satisfaction and the related costs to the seller, but they become important only after the buyer has shown interest in buying the particular good or service. To reach that point, the buyer must first feel confident about the item as well as the terms of the transaction and the integrity of the seller. At the same time, a typical user often requires privacy for e-commerce transactions: others may find what he has bought or sold, his reputation and other information, but not his real name or other sensitive details. Consequently, privacy combined with a seller’s reputation is a big concern for buyers prior to placing an order or making a payment. Hence, we need to investigate the issues of reputation and trust.

Not all researchers agree on the definition of trust and reputation. According to Wang and Lin (2008), who examined eBay, trust is the degree by which a target object (such as software, a device, a server, or any data they deliver) is considered secure. It is also the probability by which party i expects that another party j will perform a given action. A simple but incompetent trust management system could let service providers selectively victimize customers (for example, by providing good services for many low-value transactions but deceiving customers in high-value transactions for large, dishonest profit) and trust management authorities (for example, by launching collusion attacks in trust evaluations). All such attacks will lead to service quality degradation and monetary loss among customers. Thus, the e-commerce industry must have effective trust management.

The potential buyer can also opt to use separate recommender sites, which often publish reviews on particular goods or even service providers (Lenzini et al. 2010). However, this is an additional cost (at least in terms of time) for the average buyer and often does not provide the necessary information on the trustworthiness of a seller. Some e-commerce systems, like eBay, use a mutual rating system: after a buyer rates a seller, the seller can rate the buyer as well. A service gains a good reputation after it has demonstrated good quality over a long time period, usually as determined by customer ratings. According to Wang and Vassileva (2007), reputation is a subjective assessment of a characteristic or attribute one entity ascribes to another based on observations or past experiences. Their study of eBay’s reputation system suggests that almost 99% of raters in the system are honest and give providers positive ratings.

Nevertheless, a mutual rating system could cause buyers to worry about receiving bad ratings if they themselves give a seller a bad rating. A seller, who has the “last move”, can punish a buyer who leaves negative feedback; the seller may respond with negative feedback of his own about the buyer (Feigenbaum et al. 2009). Hence, to compute the final reputation value correctly, studies on relationships among raters and ratees are necessary, and might help reduce the rating noise and obtain more objective trust results.

Artz and Gil (2007) point out that reputation is an assessment based on the history of interactions with or observations of an entity, either directly with the evaluator (personal experience) or as reported by others (recommendations or third party verification). Furthermore, they state that a trust decision can be a transitive process. E.g., one might trust a book and its author because of the publisher, and the publisher may be trusted only because of the recommendation of a friend.


But the biggest difference between reputation and trust is that reputation is an objective concept, whereas trust is a subjective concept. We can use one sentence to describe them: “I trust you because you have good reputation; I trust you despite your bad reputation” (Jøsang et al. 2007).

Given the importance of reputation, it is no surprise that a substantial body of research work has appeared recently, studying not only the various aspects of reputation, but also many related problems such as malevolent behaviour and rating systems. Hughes et al. (2008) even argued that Peer-to-Peer (P2P) research is really an interdisciplinary endeavour and needs to address issues that may fall outside the traditional core of Information Systems (IS) research. This may mean that IS researchers need to collaborate more with colleagues from other disciplines when working on the larger issues of P2P network environments. We examine all this in the following section.

3 Reputation and rating systems

Since reputation is so important and typically there are many sellers to select from, it is not surprising that there have been many proposals to quantify it, so that a comparative evaluation of the sellers is possible. Our presentation takes into consideration the following aspects of a Reputation Management System (RMS):

• The entity or entities that maintain and provide the rating scores of the users.

• Effectiveness and reliability. This is important in the case of malicious users who follow particular strategies to achieve higher ratings.

• Egalitarian. Whether new sellers get a chance to achieve adequately high ratings so that buyers may trust them.

• Usability. The rating system is neither complex for the RMS to calculate and maintain, nor difficult for the users to understand.

Ratings are typically maintained and provided by a single, centralized entity. Nevertheless, it is also possible to maintain them in a decentralized manner, something especially useful in P2P systems, where malicious users may provide different files from the real ones. Decentralized reputation systems do not make use of a central repository to process reputation ratings and are considered safer (Gudes et al. 2009). Here, both the reputation of users and the ratings they give may be stored locally and known only to the corresponding user. The challenge is to compute reputation while maintaining private data. In general, a centralized approach for an RMS has the advantage that there is no added complexity regarding reputation data emanating from various nodes dynamically.

The actual combination of the individual ratings in respect of the effectiveness of an RMS is the focus of substantial research work. A reputation system is a type of collaborative filtering algorithm which attempts to determine ratings for a collection of entities, given a collection of opinions that those entities hold about each other (Remondino and Boella 2010).

In the case of Amazon (2010), there is no need for rating the seller, since it is Amazon itself (except for the Amazon marketplace, where other sellers may also sell used goods and are rated by customers). Instead there are rate-based reviews (1–5 stars) for books and other items on sale. As an enhancement, the system allows potential buyers to rate the reviews themselves, thus offering a ranking of these reviews according to their perceived usefulness. This is useful when there are many reviews and the potential buyer wants to read them selectively. To avoid malpractice, only registered users can participate, and each reviewer’s identity (either the real name or the registered one) appears together with the capability to read all his reviews on diverse goods sold.

In the case of eBay, after each transaction a buyer can provide feedback to the system about the seller’s service quality, which can be positive, neutral, or negative. eBay stores this rating at a centralized management location. It calculates the feedback score $S = P - N$, where $P$ is the number of positive ratings and $N$ the number of negative ratings left by customers. eBay displays the $S$ value on the seller’s Web page. Another value, $R = (P - N)/(P + N)$, where $1 \ge R \ge 0$, is the positive feedback rate, based on which eBay will reward the seller as a power seller if $R \ge 98\%$ (the current threshold). eBay also provides a table with a seller’s rating data for the past 12 months, divided into the most recent 1, 6, and 12 month columns. Thus, eBay’s mechanism is fairly simple and supplies raw data to buyers for their own judgment (Wang and Lin 2008).
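The two eBay quantities above can be illustrated with a minimal Python sketch. The function names and the zero-ratings check are assumptions made for illustration; the formulas are the ones quoted in the text, and neutral ratings are ignored because they enter neither formula.

```python
def feedback_score(positive: int, negative: int) -> int:
    """Feedback score S = P - N."""
    return positive - negative


def feedback_rate(positive: int, negative: int) -> float:
    """Feedback rate R = (P - N) / (P + N), as quoted in the text.

    R stays within [0, 1] only while P >= N.
    """
    total = positive + negative
    if total == 0:
        # Assumption of this sketch: R is undefined without any ratings.
        raise ValueError("no positive or negative ratings yet")
    return (positive - negative) / total


def is_power_seller(positive: int, negative: int, threshold: float = 0.98) -> bool:
    """True when R reaches the 98% threshold mentioned in the text."""
    return feedback_rate(positive, negative) >= threshold
```

For instance, a seller with 990 positive and 10 negative ratings has S = 980 and R = 0.98. Note that neither value depends on the price of the underlying transactions.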

This mechanism is also used for a seller to rate a buyer after a transaction, both to allow the evaluation of a buyer and to present a reputation record in case the latter assumes the role of a seller at some point. Separate aggregated ratings are kept for the two possible roles (i.e., seller and buyer). Several studies (e.g., Sundaresan 2007) show that buyers provide ratings about sellers 51.7% of the time, and sellers provide ratings about buyers 60.6% of the time. Of all ratings provided, less than 1% is negative, less than 0.5% is neutral and about 99% is positive. It was also found that there is a high correlation between buyer and seller ratings, suggesting that there is a degree of reciprocation of positive ratings and retaliation of negative ratings or even fear of retaliation, thus not properly reflecting the trustworthiness of the respective parties.

Apart from the issues above, there are further problems that may arise from various strategies that malicious users may adopt.

A typical attack occurs when a malicious seller obtains a good reputation by selling low-cost products, and later begins to deceive buyers by selling expensive products, possibly disappearing afterwards (Wang and Lin 2008). In general, it is possible for a malicious user to display fraudulent behaviour until the RMS clearly evaluates him as such; then he simply assumes a new identity with no ill track record. This is called whitewashing, and an RMS should make some effort to differentiate between genuine new users and old ones who try to pass as new ones. A related issue is the reputation-bootstrapping problem, where a new user does not have many chances of being selected for a transaction due to his nonexistent track record (Malik and Bouguettaya 2009).

A combination of the above attacks in P2P networks is the so-called Sybil attack (Marmol and Perez 2010):


• A user joins the community with a new identity, appearing to be a trustworthy peer.

• At a certain moment (probably after gaining some reputation in the system) this user becomes malicious, thus obtaining a greater self-profit.

• Once the system has detected his behavioural change and has identified him as a malicious participant, this user leaves the network.

• Finally, he generates a new identity and re-enters the system, repeating the process indefinitely.

Collusion is another, coordinated form of attack, described in the same paper. A collusion consists of several malicious nodes (providing a bad service or performing tasks inadequately) joining forces in order to increase their reputation values by giving each other fake ratings and, on the other hand, to decrease the reputation of benevolent peers by giving negative recommendations about the latter.

There are even variants of this attack, like a set of nodes providing good services but rating positively other malicious peers and negatively other benevolent ones. Those models which do not distinguish between the reliability of a user when providing a service or carrying out a task and when supplying recommendations cannot effectively tackle this attack.

EigenTrust (Kamvar et al. 2003) is pointed out as an example of a system that can successfully withstand such types of attack. It is based on the notion of transitive trust: if a peer i trusts any peer j, it would also trust the peers trusted by j. Each peer i calculates the local trust value for all peers that have provided it with authentic or inauthentic downloads, based on the satisfactory or unsatisfactory transactions that it has had (Remondino and Boella 2010). It also provides each peer in the network with a unique global trust value based on the peer’s history of uploads and thus aims to reduce the number of inauthentic files in a P2P network. Knots is another recent example of this approach, enhanced in a recent work (Gudes et al. 2009). The main problems of EigenTrust are summarized as follows (Tang 2009):

• The precondition of iteration convergence is unreasonable, since no guarantee for convergence is provided.

• It lacks punishment for bad behaviour.

• Iteration brings about enormous network traffic in terms of exchanged messages.

• It does not take security into consideration.
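The transitive-trust idea behind EigenTrust, described before the list above, can be sketched as a repeated aggregation of normalized local trust values. This is a simplified, centralized illustration: the function names, the uniform fallback for peers with no positive experience, and the iteration cap are assumptions of this sketch, and the pre-trusted peers and distributed message exchange of the original algorithm are omitted.

```python
def normalize_local_trust(local):
    """Row-normalize local trust scores so that each row sums to 1."""
    n = len(local)
    normalized = []
    for row in local:
        clipped = [max(value, 0.0) for value in row]  # ignore negative scores
        total = sum(clipped)
        if total == 0:
            # Assumption: a peer with no positive experience trusts all equally.
            normalized.append([1.0 / n] * n)
        else:
            normalized.append([value / total for value in clipped])
    return normalized


def global_trust(local, max_iterations=100, tolerance=1e-9):
    """Iterate t <- C^T t, where C holds the normalized local trust values."""
    c = normalize_local_trust(local)
    n = len(c)
    trust = [1.0 / n] * n
    for _ in range(max_iterations):
        # Peer j's global trust is the trust-weighted sum of opinions about j.
        updated = [sum(c[i][j] * trust[i] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(updated, trust)) < tolerance:
            return updated
        trust = updated
    # The cap guards against inputs for which the iteration does not settle.
    return trust
```

The iteration cap relates directly to the first criticism in the list: nothing in this plain iteration guarantees convergence for an arbitrary trust matrix.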

SepRep (Chen et al. 2008) is a similar system for the P2P environment. A peer keeps a reputation value for every other peer. Every time a peer X has received a service from peer Y, X updates the reputation value of Y based on the quality of this service. In the course of gathering reputation values from the P2P network, some malicious peers may collude to provide fraudulent recommendations for the purpose of unfairly increasing or decreasing someone’s reputation. To resolve this issue, SepRep deploys an auxiliary trust system that measures the trustworthiness of peers when propagating reputation values. Hence, reported reputation ratings carry different weights, depending on the local reputation of the reporting peer. However, this approach requires each peer to keep a vast amount of information to function properly.
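The weighting idea can be illustrated with a short sketch. The text does not give SepRep's actual formulas, so the trust-weighted average below, the function name and the input layout are assumptions chosen only to show how a recommender's own trustworthiness can scale the influence of its report.

```python
def weighted_reputation(reports):
    """Combine reported reputation values, weighted by recommender trust.

    reports: list of (reported_reputation, recommender_trust) pairs,
    both assumed to lie in [0, 1].
    """
    total_weight = sum(trust for _, trust in reports)
    if total_weight == 0:
        return None  # no trusted recommender, so no opinion can be formed
    return sum(value * trust for value, trust in reports) / total_weight
```

With reports `[(1.0, 0.9), (0.0, 0.1)]` the result is close to 0.9: the poorly trusted recommender's negative report barely moves the outcome.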


Gutowska and Sloane (2010) present a similar proposal, with reputation calculated as the weighted average of compulsory reputation and (if available) optional reputation. Time is also taken into consideration, with more recent ratings considered more important and valued higher compared to older ones. In addition, the context of price, i.e. the maximum price of sold goods/services in the marketplace to which the reputation system is applied, is taken into consideration. The combination of transaction history and amount as the principal factors for reputation ratings was first proposed in Wang et al. (2008), using a complex set of evaluation functions.

Cormier and Tran (2009) focus on the problem of initializing autonomous agents which are to be used as advisors from whom to solicit reputation rankings. They propose two new parameters: agent lifetime and total transaction count. Agent lifetime is defined as the amount of time that an agent has been part of a multi-agent system. The number of transactions during this period is then defined as the transaction count, representing the agent’s activity in the system. A short lifetime is an indication of a malicious agent if combined with a high number of transactions. Also, an agent that has been active for a long time is considered a senior agent, whose recommendation carries more weight. Each time a new agent enters the system, it gathers or is provided with a collection of existing agents which it ranks according to the parameters above, thus building an initial list of advisors.
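A minimal sketch of such an advisor selection follows. The two parameters (lifetime and transaction count) come from the description above; the thresholds, the ordering rule and all names are hypothetical choices for illustration and are not taken from Cormier and Tran (2009).

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    lifetime: int      # time steps spent in the system
    transactions: int  # transactions completed during that lifetime


def is_suspicious(agent, min_lifetime=50, max_rate=0.5):
    """Short lifetime combined with high activity (hypothetical thresholds)."""
    rate = agent.transactions / max(agent.lifetime, 1)
    return agent.lifetime < min_lifetime and rate > max_rate


def initial_advisors(agents, count):
    """Prefer long-lived, active agents and skip suspicious ones."""
    candidates = [a for a in agents if not is_suspicious(a)]
    candidates.sort(key=lambda a: (a.lifetime, a.transactions), reverse=True)
    return candidates[:count]
```

Under these assumptions, an agent with 40 transactions in 10 time steps is excluded, while a quiet newcomer is still eligible but ranks below senior agents.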

The authors performed a simulation to evaluate their proposal. Each agent buys at each simulation step depending upon his activity level. There are five such types, ranging from Constant (always buy) to Very Low (buy every 10th simulation step). If the initial decision is to buy, the buyer first receives reputation values from its advisors about the seller in question and compares them against its own trust threshold, thus deciding either to buy or to avoid doing so. The raw rating it provides to the seller is a discrete value in the range [0, 10], depending on its type. In their simulation they assumed 100 sellers, 1,000 time steps and 50 buyers initially, adding 25 new buyers every 100 time steps (a total of 300 buyers in the end). They assumed 15% of bad sellers, 15% of good, 60% of average and 10% of erratic ones. Advisors were distributed as honest at 70%, with the remaining three types, one of them malicious, at 10% each.

The simulation results compare the case of selecting 10 advisors at random with the case of selecting 10 advisors according to the authors’ proposal. These results show that under the first case the percentage of good sellers selected ranges from 60% to a peak of 70%, before settling at approximately 55%. Under their proposal the percentage of good sellers selected ranges from 85% to a peak of 95% before settling at slightly under 80%. As the authors point out, the results show that their proposal is vulnerable to collusion by agents who conspire for a long time to bloat their transaction counts.

GDRep (Tang 2009) is another reputation calculation mechanism designed for P2P e-commerce. The main idea is that individual users may be combined into dynamic groups. Reputation is calculated not only for individual peers (users), but also for groups; each peer is inclined to join the group of higher reputation through an application mechanism, and the application is granted on the basis of several criteria. There is a central peer in each group which inspects existing peers at periodic inspection cycles, dismissing peers who no longer satisfy the minimum requirements for group membership, and examines new peers’ applications.

Simulation involved a total population of 1,000 peers with 50 as the maximum group size, under two scenarios conducted 3 times each. The results show that GDRep is twice as good as EigenTrust in terms of the reputation attributed to colluding users when 30% of the peers are malicious, providing conspiring evaluation, or have a trading strategy that allows for ‘good’ behaviour for low price range transactions.

Nevertheless, this approach has many potential problems: the number of exchanged messages necessary for it to operate is still quite large, and the central peer needs to select an ‘heir’ to take his place before leaving a group or quitting altogether (i.e., there is no provision for the case of central peer failure). Also, the simulation scenarios were run only 3 times each.

Li et al. (2010) propose a P2P e-commerce reputation model based on fuzzy logic, with multiple factors taken into consideration. These include reward and punishment for good and malicious users, respectively, calculation of reputation in both a direct and an indirect way (i.e., also based on the opinions of others with the respective weight), and the decay of a rating over time.

In order to validate the proposal, a simulation is performed with 100 users, 10 of which are malicious. The model is simulated for 12 successive transactions in three different experiments, with the claim that the proposal is superior to EigenTrust. Nevertheless, the authors do not provide more details regarding the actual behaviour of the colluding malicious users. This matters because a malicious user who receives a negative rating early on can be easily identified and punished in a realistic RMS. For example, colluding users could behave as ‘good’ ones for many transactions in order to build an exceptional reputation before performing a fraudulent action at the highest possible price range.

Wu (2010) studies Taobao, an e-commerce system popular in the Chinese market with a rating system similar to eBay’s. It stores the buyer’s rating after each transaction at a centralized management location and calculates the feedback score via $S = P - N$, where $P$ is the number of positive ratings and $N$ the number of negative ratings. It displays it on the seller’s web page, together with his positive feedback rate $R = (P - N)/(P + N)$, where $1 \ge R \ge 0$. The author identifies cloud computing systems as major computing entities in the future and proposes the investigation of cloud-based trust services, but provides nothing more than a general framework without case studies or evaluation to justify his proposal.

Wu et al. (2010) propose six trust factors as metrics towards trust/reputation evaluation systems: the sum of ratings provided by the participants in each transaction, the ratio of positive ratings over the sum of all ratings for each participant, the credibility of the feedback by each participant, the transaction price for each transaction, the decay of time of a transaction, and the ‘savviness’ of the participants. The third and last factors are important but difficult to calculate, as the authors acknowledge. The credibility of the feedback by each participant is, therefore, approximated by the current trust value of the participant, provided by the auction website to all the participants. The last factor essentially puts more weight on the ratings offered by ‘experts’. Since the participants do not appear by their real names, ‘experts’ are automatically considered as those users who have participated in many transactions within the same price range. Therefore, a ‘savviness vector’ is maintained for each participant, denoting the number of products the participant has bought or sold in each price range. The ‘savviness vectors’ of participants involved in a new transaction are thus matched (i.e., the seller’s and each of the buyers’) so that ratings from participants with similar savviness carry more weight.
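The matching of savviness vectors can be illustrated as follows. The text does not specify the matching function, so cosine similarity is used here purely as an assumption; the function name and the vector layout (one transaction count per price range) are likewise hypothetical.

```python
import math


def savviness_similarity(a, b):
    """Cosine similarity between two per-price-range transaction counts."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # A participant with no history matches nobody in this sketch.
    return dot / norm if norm else 0.0
```

Two participants active only in the same price range obtain a similarity of 1, while participants active in disjoint price ranges obtain 0, so a rating from the latter would carry little weight.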

Several of the metrics proposed need a long time for effective data to be gathered, which the authors estimate at one to two years. The simulation performed involves 1,000 participants divided into five groups: cheat, indignant, normal, nice and elegant, plus 3 more malicious participants who collude every five transactions to increase their ratings. The number of participants in each group was chosen randomly with probability 2.5, 7.5, 80, 7.5 and 2.5%, respectively. Simulation results are then compared to those in the eBay reputation system, concluding that the proposed scheme is better in providing indications for malicious participants, especially if run for a long period. The drawbacks in this study are that the malicious participants constitute a low percentage of the total population (2.5–10%), the group of colluding participants is very small (just three), and the data required together with the complexity involved are rather high.

Deviating from the P2P paradigm, Du et al. (2009) present an e-commerce trust model called ECTMPR, which is based on the notion of perceived risk. The latter is the result of combining six factors: the user’s direct trust degree, the recommendation trust degree (i.e., trust dependent on the recommendation of someone else whom the user trusts), the trust degree of the recommender in reference to the user, the value of the transaction, the sum of previous ratings and the time since the establishment of the seller.

The authors present the results of the comparative simulation of their proposal, together with Average, Bate and eBay models, in relation to the percentage of malicious users in the overall user population of 200 users, in which they show that their model is better than the rest, apart from eBay where the successful transaction percentage is approximately the same. The probability of a user performing a transaction during any simulation run is fixed at the typical value of 80%. Unfortunately, the authors do not present many details, such as the models they used to simulate the other policies; moreover the number of simulation cycles is very small.

In another paper (Dong et al. 2009), the authors propose PTME (Probability-based Trust Management model for E-commerce). The main idea is for a buyer b to calculate the trust value for a seller s, using the weighted combination of opinions of others together with the opinion of b from past experience from direct transactions with s (if they exist), but in a probabilistic manner. The sale price and the age of past evaluation (‘forgetting’ factor) are also incorporated in the proposal. Essentially, it is a combination of direct and indirect reputation where not only the probability for the trust value, but also its variance is calculated and taken into consideration. The latter is used because a good seller with few transactions may present the same trust value as a malicious seller who has many good and some bad transactions, but the variance differs in these two cases. The authors present a small summary from a numerical analysis claiming that their proposal defeats collusion and false accusations (ratings) attacks, after approximately 25 transactions, with no further details.
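The role of the variance can be shown with a small numerical sketch. The text does not state which distribution PTME uses, so a Beta distribution over the counts of good and bad transactions is assumed here only for illustration; the function name is hypothetical.

```python
def beta_trust(good, bad):
    """Mean and variance of a Beta(good + 1, bad + 1) trust estimate.

    Assumption of this sketch: the paper's own probabilistic model may differ.
    """
    alpha, beta = good + 1, bad + 1
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, variance


# A seller with 4 good transactions and one with 49 good and 9 bad ones
# share the same mean (5/6) but differ clearly in variance.
few_mean, few_var = beta_trust(4, 0)
many_mean, many_var = beta_trust(49, 9)
```

Under this assumption the trust value alone cannot separate the two sellers, whereas the variance is several times larger for the seller with the short track record.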

A quantitative model based on game theory has also been proposed (Alpcan et al. 2010). The model formalizes community interactions in the context of trust in online environments. Other models based on game theory do exist, but this one takes into account community influences and interactions explicitly in a non-cooperative game setting. The main characteristic is a cost function used by the participating agents in order to calculate the trust for each active participant. The main factors quantified in this cost function are the timidness of each agent with an opinion, the peer pressure on the agent (i.e., the deviation from the mean value of others’ opinions) and how steadfast an agent is regarding his initial opinion. The main goal is to achieve a Nash equilibrium as a solution, in the sense that no agent has any incentive for modifying his own opinion for a particular seller while the other agents keep their opinion fixed. A short algorithm, together with a faster variant, is proposed that achieves the Nash equilibrium.

The proposal is validated with a numerical analysis in a game model with 20 agents with scenarios that allow each of the factors in the aforementioned cost function to vary. As such, there is little interest since the population is small and the scenarios involved do not take into consideration malicious users or collusion.

Chang and Wong (2010) propose a multi-attribute reputation management (MARM) support tool to assist users when using auction sites. The factors they use are: category similarity, value of the traded good, time delay (newly obtained reputation records are of higher weight than older ones), and credibility (different weight on the rating provided by different users). As the authors admit, it is difficult to obtain an appropriate value for credibility in sites with many users, since the users considered more credible by someone may have had no prior transactions with the seller in question.

The authors present a prototype of their proposal and a comparative evaluation using data from eBay for the ‘Lenovo/IBM’ class of goods, in November 2009, with values between 100 and 500. They also used the respective data from the ‘Digital Cameras/Canon’ class. In total, 1,000 random buyer–seller pairs were chosen for their study and the respective transaction details were analysed and then compared with two earlier versions (‘BIN’ and ‘DET’) of their proposal and another (‘FuzzyRep’) based on fuzzy logic. Although the results show that ‘MARM’ outperforms ‘BIN’, it is only marginally better than ‘DET’ and ‘FuzzyRep’ (approximately 7%). Also, the cases where ‘MARM’ is reported as wrong range from 14.7 to 22.2%.

An interesting study of an existing e-commerce system was published recently (Maranzato et al. 2010), with a novel approach to the problem of reputation relative to attacks by malicious users. The authors base their approach on a detailed study (June 2007–July 2008) of real transactions in a popular Brazilian e-commerce system (TodaOferta), together with lengthy interviews of specialists in fraud detection in this system.

TodaOferta uses a rating system similar to eBay’s with three possible values per transaction: 1 (positive), 0 (neutral) and -1 (negative). However, it considers a rating from the same user only once to prohibit malicious users from colluding to offer false high ratings. Furthermore, it employs the escrow feature, under which the seller will receive his payment only after the buyer has received the item bought or after 15 days.

As a result of their study, the authors identified a set of characteristics that define fraudulent users with high probability and used it to rank sellers accordingly. As a further enhancement, they propose the usage of logistic regression producing a better ranking of the sellers according to the probability of fraudulent behaviour. They report that they validated their results using the same market place, and that the latter enhancement increased the precision of their results by 112%, by achieving a 55% detection of fraudsters with 100% precision.

Nevertheless, they state that most of the work of the fraud specialists is about reaction to denunciations from affected buyers and they do not provide important details such as the total number of users in the system, computational aspects to gather and process the plethora of data required and the estimated cost in the case of fraudulent transactions for both buyers and malicious sellers.

In general, trust and reputation management in e-commerce will remain challenging for some time because the issues and solutions surrounding it are so complex. Not only do technical solutions require effectiveness and efficiency, but we must take into account cultural, psychological, and social factors and their impact on such evaluations (Wang and Lin 2008).

4 Novel proposals in this work

We make several novel proposals in this work. The first proposal has to do with the rating submission process, where we propose that ratings remain hidden until either both parties submit their rating to the system or a certain time elapses. The second proposal has to do with the maximum number of ratings allowed per buyer for the same seller, so that collusion or retaliation can be minimized. We also propose a novel, yet simple RMS, which depends more on RMS charges. Moreover, we present simulation results in order to evaluate our RMS.

4.1 Hidden ratings with simultaneous revelation

As shown earlier, the ratings that buyers currently give to sellers may not be sincere, for fear of retaliation. In eBay, for example, there are cases of sellers who openly state that they will provide a positive rating to buyers, provided the buyer has done the same first. Although they do not state that they will provide a negative rating (leaving room for a possible neutral rating), this is some form of tampering with the rating rules.

In a recent paper (Gudes et al. 2009) the authors propose to compute reputation in a distributed manner while preserving private information. The degree of trust (and therefore the actual rating by any single buyer) is considered private information. We believe that this approach is unnecessarily complex since it involves keeping various levels of trust (e.g., experts) and a wide range of information that must be maintained and used to calculate reputation in a distributed manner.


Instead, we propose that the rating a buyer or a seller provides for the other party on a specific transaction should be kept private until both parties have submitted their ratings. In this case, the seller cannot retaliate, since he does not know in advance the rating he has been given by the buyer. Of course, the seller may consider not providing any rating for the buyer. For this reason, we extend our idea into keeping a rating private either until the other party provides a reciprocal one, or until a certain fixed time period determined by the e-commerce system has elapsed (e.g., 30 days). Under this scheme, once the period has elapsed the other party loses the right to submit a rating and thus cannot retaliate. Needless to say, if the pre-specified time period elapses without either party having submitted a rating, no update of the respective reputation ratings should take place. Also, if payment is completed through the same system, then a buyer’s rating is not taken into consideration unless he pays within a predetermined amount of time.
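The revelation rule above can be illustrated with a minimal sketch. The class name, field names and the use of integer day counters are assumptions made for illustration only; the three rating values (-1, 0, 1) and the 30-day period follow the text, and the payment condition for buyers is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Optional

REVEAL_AFTER_DAYS = 30  # assumed platform setting (the text's "e.g., 30 days")


@dataclass
class TransactionRatings:
    """Ratings exchanged for one transaction (hypothetical structure)."""
    closed_day: int
    buyer: Optional[int] = None   # rating given by the buyer to the seller
    seller: Optional[int] = None  # rating given by the seller to the buyer

    def submit(self, role: str, value: int, today: int) -> bool:
        """Store a rating; return False if the right to rate has lapsed."""
        if role not in ("buyer", "seller") or value not in (-1, 0, 1):
            raise ValueError("unknown role or rating value")
        if getattr(self, role) is not None:
            raise ValueError("a submitted rating cannot be changed")
        if today - self.closed_day >= REVEAL_AFTER_DAYS:
            return False
        setattr(self, role, value)
        return True

    def visible(self, today: int) -> Dict[str, int]:
        """Ratings that may be published and counted on the given day."""
        both = self.buyer is not None and self.seller is not None
        expired = today - self.closed_day >= REVEAL_AFTER_DAYS
        if not (both or expired):
            return {}  # still hidden, so the other party cannot retaliate
        pairs = (("buyer", self.buyer), ("seller", self.seller))
        return {role: value for role, value in pairs if value is not None}
```

In this sketch a seller who waits for the period to elapse in order to see the buyer's rating finds that `submit` no longer accepts his own, which is the property the scheme relies on.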

This proposal is simple and easy to implement. Furthermore, it forces a ‘lazy’ seller to provide his rating for the respective buyer, since the seller’s reputation score cannot be updated before both parties provide a rating or the specific time period has elapsed. Moreover, it encourages buyers to be more sincere in their ratings.

Can our proposal be used by malicious buyers (or by multiple new accounts of whitewashed buyers) to organize an attack on a seller? We first remark that if the transaction is satisfactory the seller must normally provide a positive rating; otherwise a negative one. This may damage the reputation of the buyer(s), although not as much as the attack damages the reputation of the seller. Then, there is always the option for the seller who is under attack to sell only to buyers with a certain minimum level of reputation. This is common practice at eBay today. Finally, the reputation calculation should also be modified. We propose a novel calculation scheme below.

4.2 Accepting specific ratings for reputation calculation

The reputation strategy adopted by an e-commerce platform is particularly important for its success. This is also true for other networks such as p2p as shown by the volume and quality of research work in this field, some of which was presented earlier. From this large body of work we can summarize the vital aspects for a successful RMS:

• Recent ratings should carry more weight than older ones.

• A good sales record for low price range transactions should not carry the full weight when the seller sells an item at a higher price range. We assume that a higher price range represents a higher profit for the seller, hence a stronger incentive to turn malicious.

• Entities must be long lived; ratings should reflect the life of an entity.

• When a party reaches a certain threshold m of malicious incidents the reputation metric should be reduced to the minimum (Gutowska and Sloane 2010).

• The amount of details for each transaction should be kept to a minimum, reducing the possibility of ‘information explosion’.

• The raw rating given by an entity to the associated party should be simple and easy to understand (e.g., the 3-value system of eBay).


• The calculation of the reputation of each user after a completed transaction should be as simple as possible to perform and understand. This reduces the possibility of dissatisfaction since the rules are well known and their application straightforward.

Apart from taking all the above into consideration, we propose that all ratings between the same parties (one for each separate transaction) are recorded, but at most two (at random) from the recent ones (within time period d) are taken into consideration. This is similar to the scheme in TodaOferta, but the difference is that from these two ratings at most one negative rating can be taken into consideration, if such a negative rating exists. It is thus possible that if after a series of successful transactions the seller provides bad service, the buyer can give him a negative rating. However, the number of negative ratings taken into consideration from the same buyer is limited to at most one. The rationale behind this arrangement is that if a buyer provides a negative rating for a seller, it would be at least suspicious to attempt again to buy from the same seller and provide an additional negative rating. Also, once a rating is submitted, it cannot be changed by the rater.

The possible values for a single rating are those used at eBay, namely ‘positive’, ‘neutral’ and ‘negative’. The reputation score is recalculated every time a new transaction takes place or a certain time threshold h elapses (e.g., 1 month). As an example, let us assume that the ‘recent period’ d corresponds to the last 12 months. Then a seller with many different buyers will typically be awarded a higher reputation score compared to one who had the same amount and type of individual ratings, but from fewer buyers, provided he receives the same amount of positive ratings.

Intuitively, one might tend to trust a seller who has built a reputation by selling to many people more than one who has sold to few. Our approach makes cooperation between malicious users (or fake accounts created for this purpose by a malicious seller) more difficult; if a large group of entities deliberately provides good ratings to a seller, it will be more costly to provide him with a very good reputation. It also inhibits such entities from deliberately giving bad ratings to other sellers in order to destroy their reputation.

Of course, a malicious user may set up a plethora of accounts to use them as attacking agents. Hence, our proposal should also take whitewashing into consideration. As stated earlier the typical defence of an established, reputed seller is to sell only to reputed buyers. If we combine it with our proposal we see that it becomes more costly to the attacker to be successful.

To make it even more difficult for the malicious user(s), we augment our proposal by demanding a registration fee for each new user, refundable after he pays for one or more transactions within the specified time d. This is similar to the one existing at eBay (registration fee, refundable after the first transaction). With a carefully selected registration fee, the average new user would not be annoyed, but it could cost more to a malicious one who tries to present himself through multiple accounts. Intuitively, we believe that this approach is better than other, more complex approaches where the rating provided by a user with higher reputation carries more weight compared to the one provided by another with lower reputation.


Another important defence against such acts is the existence of a ‘Resolution Center’ as in eBay. If one of the two parties in a transaction provides a negative rating, he must specify the reason(s) for this rating. If the other party does not accept that rating, he can always take the matter to the ‘Resolution Center’. This can be used so that unfounded or even deliberate negative ratings can be annulled and the reputation be calculated again without taking the particular rating into consideration.

However, most of these issues cannot be answered in a satisfactory way unless an evaluation is performed with different scenarios. We attempt to answer these in the evaluation section below.

4.3 Formulating our proposals for reputation calculation

As stated in the previous section, we propose the use of a simple 3-value rating system as in eBay, namely ‘negative’, ‘neutral’ and ‘positive’, corresponding to the arithmetic values [-1, 0, 1]. Each party in a transaction can place his rating, which remains hidden until either the other party provides his or a pre-specified time period (threshold) h elapses. We propose that h is normally set at 30 days, which is an ample period of time for an item to be shipped all over the world. In this context, it is possible for a transaction to produce two, one or even no ratings at all. The ratings cannot be changed once provided. This is to avoid any negative feedback extortion, where one party maliciously demands something in return for not providing a negative rating. Furthermore, the party which gives a negative rating must justify it. This is important both for feedback reasons and for a ‘Resolution Center’ to examine such cases in order to identify any malicious behaviour. If it is finally decided that a rating was wrongfully given, the ‘Resolution Center’ can strike it out and update the respective reputation score as if that rating was never given.

We also take a different approach from other researchers: we assume that it is not possible to eliminate fraudulent behaviour. Hence, our objective is not only to make it more difficult with our RMS, but also more costly through transaction charges. In the case of eBay there is a scaled charge to the seller, decreasing with the actual sale price. As an example, the charge to a seller for a sale worth 1,500 $US is slightly above 4%. We, therefore, combine our RMS with seller charges to make collusion attacks very hard. Intuitively, a seller may get a negative rating for several types of actions, but this is not as severe if occurring sparingly compared to a denunciation for not mailing an expensive item and disappearing altogether. By clearing the buyer’s funds only after a certain period and by carefully selecting the seller charges, the RMS can gain more than enough funds to compensate for such cases. Our proposed RMS keeps a record of all transactions and ratings for at least a predetermined period d (here we assume 12 months), which is long enough to accumulate a sizeable amount of transaction history for a seller. We also propose the following:

• There are Q reputation scores calculated, representing each of the price ranges available. As we can see in the evaluation section we normally assume Q = 3, representing the price ranges: (a) 1–10 units, (b) 11–100 units, (c) 101–1,000 units.

• For each seller: get all ratings in the last period d, with at most two from the same rater, of which at least one is negative (if one exists), but no more than one negative from the same rater. This selection gets updated: (a) at the beginning of each month, (b) when a new transaction rating appears, (c) when the ‘Resolution Center’ annuls a related rating. This is in contrast to eBay’s scheme, where multiple ratings from the same buyer count as a single one for transactions performed within a single week.

• Let M be the total number of malicious incidents for the same entity within the last period d. We define a certain threshold m; if it is exceeded (i.e., M > m), the user account is given a special value of -Max to indicate that this entity should no longer be trusted in general. Obviously, the RMS administrator may decide to impose more severe penalties or even suspend/delete that account. Otherwise, it is still possible that some risky (or malicious) buyer may nevertheless decide to perform a transaction with him.
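The per-rater selection rule in the list above can be sketched as follows. This is only one possible reading: the function name, the tuple layout and the day-based timestamps are assumptions, and the sketch always keeps one negative rating from a rater when one exists within period d, filling the second slot with a randomly chosen non-negative rating.

```python
import random


def select_ratings(ratings, today, d=360, rng=random):
    """Return the rating values that count towards a seller's score.

    ratings: iterable of (rater_id, day, value) tuples, value in {-1, 0, 1}.
    At most two ratings per rater from the last d days are kept, and at
    most one of them may be negative.
    """
    recent_by_rater = {}
    for rater, day, value in ratings:
        if today - day <= d:  # older ratings are discarded
            recent_by_rater.setdefault(rater, []).append(value)

    selected = []
    for values in recent_by_rater.values():
        negatives = [v for v in values if v < 0]
        others = [v for v in values if v >= 0]
        if negatives:
            # One negative is kept, plus at most one other rating at random.
            picked = [negatives[0]] + rng.sample(others, min(1, len(others)))
        else:
            picked = rng.sample(others, min(2, len(others)))
        selected.extend(picked)
    return selected
```

For instance, a buyer who left two positive and two negative ratings within the period contributes exactly one negative and one positive rating, however many times he repeats the purchase.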

The actual reputation score for each entity is in fact a vector containing 2*Q entries. The first Q show the reputation score for each price range as a seller and the latter Q as a buyer. That is because a user may assume both roles in an e-commerce platform, something that the RMS must support. We now focus on the calculation of the reputation score for a seller.

The RMS should calculate a different reputation score, depending on the price range of a new sale. In order to encourage a transaction with a new seller at a particular price range, however, we propose that all reputation scores appear; the RMS should clearly mark them with the price range they refer to, as well as additional information on request by the prospective buyer (the total number of past transactions, positive, negative, etc.). In this way, the buyer is provided with the right information, but depending on his profile, he may opt to risk buying from a seller with low reputation.

We also note that with our proposal there is no need for any special arrangement regarding the ‘age’ of ratings; all ratings older than d are automatically discarded, while the rest are given equal weight. This may seem inappropriate at first. As an example let us consider a seller who has been active only for one month at the beginning of a year and sells nothing for almost 11 months. Should his reputation score be of equal weight to that of another seller who has a similar total gross of sales, but spanning the whole of the same year? In the latter case we have a seller who has performed some sales during the last month. We believe that there are certain cases where a non-even distribution of sales is quite normal within a calendar year. For example, if the seller sells leather products his sales are typically few, if any, during the summer; supply and financial problems may be other sources of inactivity.

The discerning reader may note another issue, pertinent to our proposal, namely the possibility of two sellers who present the same amount of ratings, but from a different amount of transactions, since we only take into consideration up to two ratings from each buyer to the same seller (and vice versa). For example, it is possible for seller A to accumulate 100 ratings from 50 buyers, whereas seller B accumulates 100 ratings from 100 buyers, all within the same, valid period. An evaluation coming from a lot of users is, on average, more reliable than the one coming from few. On the other hand, a new seller is bound to have fewer separate buyers; and it is also possible to have two ‘power’ sellers with excellent reputation, with the first one having sold to 600 buyers and the second to 500 separate buyers. It is also possible that a group of malicious users cheat by appearing as having many successful transactions among them in order to increase their reputation. What weight should these figures carry into the reputation calculation (if any)? We propose an approach that takes these cases into consideration as well as the possibility of malicious users who collude to improve the reputation of one or more malicious sellers. More specifically, let us assume:

• $c_i$, the maximum RMS charge for each transaction at price level $i \in [1, Q]$.

• $s$, the new sale price at price level $k$, where $k \in [2, Q]$.

• $\mathrm{max\_price}_i$, the maximum price at price level $i$.

• $rep_k$, the old reputation score at price level $k$.

• $wr_i$, the weighted sum of ratings at price level $i$.

• $\mathrm{new\_rep}_i$, the new reputation score regarding the new sale at price $s$, which belongs to price level $i$.

From the discussion above the main reputation score (recall that the reputation scores for each of the other price ranges also appear to assist the buyer) should be a combination of the reputation built from the previous price range and the one from sales at the current price range, including the risk that the RMS charges paid by the seller so far may not cover a single sale at the present price range:

$$\mathrm{new\_rep}_i = c_i \, wr_i + c_{i-1} \, rep_{i-1} - s \tag{1}$$

This means that the reputation score is the sum of all ratings for transactions in the price range i, with up to two ratings allowed to emanate from the same user over period d, subtracting the price s of the new item under sale. Hence, a seller needs to have paid a significant amount to the RMS in order to acquire an adequate reputation score. The only exception exists for the lowest price range, where relation (1) becomes:

$$\mathrm{new\_rep}_1 = wr_1 \tag{2}$$

The reason for (2) is that the lowest price range represents a very low risk in terms of possible compensation by the RMS system to affected buyers. Therefore, we can allow for a more relaxed reputation score calculation in order to attract sellers who would otherwise be hesitant to participate. The charges of the RMS do play a very important role as a form of deposit that can be used by the platform to compensate the buyer with most or all of the amount of a sale in case of a problematic seller. The problem with (1), however, is that initially the reputation score for price range i > 1 will depend exclusively on the reputation based on the lower price ranges, since no sales at the current price range have occurred. This is also a handicap for the new sellers compared to the established ones. We need this reputation to increase adequately fast for a new seller to build a reasonable reputation score so that he is practically allowed to sell at price range i. But we also need to protect the RMS from malicious sellers who build a good reputation score with little cost based only on the lower price ranges. At the same time, something based only on RMS charges may deter sellers from using the system. Therefore, we introduce two more parameters based on the time a seller has been registered with the system, namely a and b. Neither of them is used at the lowest price range.

Parameter a is a factor that attempts to promote the reputation gained from lower price range sales taking time into consideration. A seller who has been long enough with the system is given a boost, so that he can present an adequate reputation to sell at a higher price range. At the same time, after a certain amount of time a becomes constant so that very old sellers do not have an advantage over other old sellers. Also a hasty malicious seller has to wait for a significant amount of time, increasing his reputation, before commencing his intended activity. Using a we get the following equation for i > 1:

$$\mathrm{new\_rep}_i = c_i \, wr_i + a \, c_{i-1} \, rep_{i-1} - s \tag{3}$$

Still, this does not prohibit collusion. A group of malicious users may collude to perform fake transactions at a lower price range in order to increase their ratings. To address this issue, we introduce parameter b, leading to the following relation:

$$\mathrm{new\_rep}_i = c_i \, wr_i + \min\left[\, \mathrm{max\_price}_i \cdot b,\; \left(a \, c_{i-1} \, rep_{i-1}\right) - s \,\right] \tag{4}$$

Effectively, relation (4) denotes that a seller at price range i > 1 cannot expect to increase his reputation by more than b of the maximum price at that range simply by selling at lower price ranges. Hence, lower price range sales cannot contribute to more than a factor of b towards the final score at the current price range.

Of course, both a and b are parameters that must be set by the RMS manager, depending on how strict or loose a policy is desired; a stricter policy intuitively reduces malicious incidents but deters new sellers from participating, whereas a looser policy attracts new sellers at the cost of a higher risk of malicious incidents. We suggest b = 0.10.
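To make relations (2) and (4) concrete, the following sketch evaluates them for a single seller. The function name, the dictionary-based inputs and the charge values in the usage note are illustrative assumptions; only b = 0.10 and the structure of the two relations are taken from the text.

```python
def seller_reputation(i, c, wr, rep_lower, s, max_price, a=1.0, b=0.10):
    """Raw seller score at price level i (1-based), after relations (2) and (4).

    c, wr, max_price: dicts keyed by price level.
    rep_lower: the seller's existing score at level i - 1 (ignored for i == 1).
    s: price of the new sale; a, b: the two policy parameters.
    """
    if i == 1:
        return wr[1]  # relation (2): no charges or sale price involved
    carried = a * c[i - 1] * rep_lower - s  # contribution from the lower range
    cap = b * max_price[i]                  # upper bound on that contribution
    return c[i] * wr[i] + min(cap, carried)
```

As a usage example with hypothetical numbers, take c = {1: 0.2, 2: 2.0}, max_price = {2: 100}, wr = {2: 10}, a lower-range score of 100, a = 5 and a new sale at s = 50. The carried term is 5 * 0.2 * 100 - 50 = 50, which the cap reduces to 0.10 * 100 = 10, giving a score of 2.0 * 10 + 10 = 30. A seller with no history at all (wr = 0 and a lower-range score of 0) starts at -s, that is, with a negative score at the higher ranges.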

The case for a is more complex. Since a represents a boost for the old seller up to a certain point, it cannot be constant; it must also increase up to a certain maximum value a_max relative to the expected number of sales per seller. Consequently, assuming ns the total number of sellers in the system, nb the number of potential buyers, pb the probability that a buyer considers buying something each day, pr1 the probability he opts for an item at the lowest price range (the most common one), h = 180 days and d = 360 days, we suggest the following value for a:

$$a_{\max} = \frac{n_s}{n_b \cdot p_b \cdot pr_1} \tag{5a}$$

$$a = \begin{cases} 1.0, & \text{for } t < h \\ a_{\max} \, t \, \frac{1}{d}, & \text{for } h \le t \le d \\ a_{\max}, & \text{for } t > d \end{cases} \tag{5b}$$
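A small sketch of (5a) and (5b) follows. It assumes that t is the number of days since the seller registered and that the middle branch grows linearly as a_max * t / d; both are readings of the text rather than confirmed details, and the function names are hypothetical.

```python
def a_max(ns, nb, pb, pr1):
    """Relation (5a): maximum boost, from the population and buying probabilities."""
    return ns / (nb * pb * pr1)


def a_factor(t, amax, h=180, d=360):
    """Relation (5b): boost for a seller registered t days ago (assumed meaning of t)."""
    if t < h:
        return 1.0           # no boost for recently registered sellers
    if t <= d:
        return amax * t / d  # boost grows with seniority
    return amax              # constant afterwards, so old sellers are treated alike
```

With the values used in the evaluation section (ns = 200, nb = 400, pb = 0.20, pr1 = 0.5) the denominator is 40, so a_max is 5; under the linear reading a seller registered for 180 days gets a boost of 2.5, and one registered for more than 360 days gets the full 5.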

We must also stress that the reputation scores calculated above are raw reputation scores; it is up to the e-commerce platform to present them in an appropriate way for the potential buyer. For example, one simple way would be to present the triplet of reputation scores (per seller) after normalizing them according to the minimum price at that price range. Assuming the three price ranges we have used for experimentation, a raw reputation score value of 500 at the top price range would be represented as 5.

The reputation score calculation for a buyer is different, since there is normally no charge for each successful transaction. Furthermore, a buyer has to pay in full before his rating is taken into consideration. Hence, it is more difficult to gain directly from a malicious behaviour compared to a seller. Therefore, if S_i is the total number of transactions within the last period d, for each of the Q price range classes, then the reputation score for each price range class for a buyer is simply the respective aggregated average:

$$\mathrm{buyer\_new\_rep}_i = \frac{\text{Sum of all ratings in class } i}{S_i} \tag{6}$$

5 RMS evaluation

In order to evaluate our reputation calculation proposal we performed simulations based on several scenarios. We focused only on the reputation calculation for sellers, since it is their overall performance (and reputation) that is the main factor for the success of an e-commerce system. After all, an honest seller wants a safe environment where dishonest sellers are pointed out and punished for their behaviour, allowing for healthy competition, which obviously is desirable for a typical buyer, too.

We assume a period d = 360 days (one year) and we run each simulation for a period of 1,080 days (3 years), setting the time threshold h for ratings to appear during the same day of the transaction for simplicity (instead of 30 days at most). We also assume that each seller has many items for sale in each of the Q price range classes with Q = 3. Therefore, no seller is removed from consideration by a buyer solely on the basis of not selling a good at a certain price range.

Each buyer may choose to buy an item at a specific price range with a different probability pr_i. We set pr1 = 0.5, pr2 = 0.4 and pr3 = 0.1, reflecting the fact that the average buyer is more likely to buy something ‘cheap’ than something at the highest price range. Of course, the fact that a buyer decides at first to buy something at a certain price range does not mean that he does this every day or that he finds what he wants. We reflect this by assuming that a buyer decides, in principle, to perform up to one transaction per day for a single item, with a probability pb = 0.20, similar to what is assumed in (Gutowska and Sloane 2010). Therefore, in our simulation, on every single day there are pb*pr3*nb potential transactions, on average, for items at the highest price range, prior to reputation consideration, where nb is the number of buyers. For example, with nb = 400, we can expect 40, 32 and 8 potential transactions, on average, for the respective price ranges during each day. The initial reputation score for all parties is 0 and a seller sells to any buyer without restriction.
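The expected daily volumes quoted above follow directly from the product pb * pr_i * nb. A minimal check, with a hypothetical helper name:

```python
def expected_daily_transactions(nb, pb, pr):
    """Average number of potential transactions per day for each price range."""
    return [nb * pb * p for p in pr]


# Values from the text: 400 buyers, pb = 0.20, pr = (0.5, 0.4, 0.1).
volumes = expected_daily_transactions(400, 0.20, [0.5, 0.4, 0.1])
print([round(v) for v in volumes])  # low, medium and high price range
```

The three results are 40, 32 and 8, matching the figures in the text; they are averages before any reputation-based filtering of sellers takes place.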

Once a buyer decides to initiate a transaction with a certain seller (at random), he bases his final decision on the reputation score of the seller, which obviously depends on the actual item price. As in (Gutowska and Sloane 2010) a buyer examines 20 sellers at random, comparing their reputation scores and finally selects a seller according to the following rules:

• A buyer always prefers the seller with the highest reputation score, provided it is a positive number.

• If more than one seller appears with the same score, then one of the sellers is selected at random.

• If none of the above sellers has a positive reputation score, then one is selected at random among those with the highest reputation score, with a probability pr. This is the probability for a buyer to be willing to take a risk and is different for each price range. We set this probability to 0.20, 0.10 and 0.05 for the lowest, medium and highest price ranges, respectively.
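The three selection rules can be sketched as a single function. The function name, the dictionary of scores and the convention of returning None when the buyer declines to take the risk are assumptions for illustration; the sample size of 20 and the risk probabilities are those given in the text.

```python
import random

# Probability that a buyer accepts a seller without a positive score,
# per price range (1 = lowest, 3 = highest), as set in the text.
RISK_PROBABILITY = {1: 0.20, 2: 0.10, 3: 0.05}


def choose_seller(scores, price_level, sample_size=20, rng=random):
    """Pick a seller id from scores (seller id -> score at this price level).

    Returns None when no examined seller has a positive score and the
    buyer is not willing to take the risk.
    """
    if not scores:
        return None
    examined = rng.sample(list(scores), min(sample_size, len(scores)))
    best = max(scores[seller] for seller in examined)
    top = [seller for seller in examined if scores[seller] == best]
    if best > 0:
        return rng.choice(top)  # ties are broken at random
    if rng.random() < RISK_PROBABILITY[price_level]:
        return rng.choice(top)  # risk taken with the best non-positive seller
    return None
```

Since all scores start at 0, every early transaction in a simulation built this way goes through the risk branch, which is why activity at the highest price range starts slowly.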

5.1 Scenario 1: The ‘‘Good’’ one

Here we assume all buyers and sellers are honest (although they do not know it). There are nb = 400 buyers and ns = 200 sellers, all registered during the first day and all following the same pattern as described above. Since this is the ‘‘good’’ scenario, there are no malicious users and all transactions are completed without a problem. Therefore, for each completed transaction the buyer rewards the respective seller with a positive rating.

This scenario is useful because it can be used as the reference point against which other, more realistic ones may be compared. Furthermore, it provides a measure of the RMS convergence over time, allowing for possible fine tuning of additional parameters like RMS charges, a and b, turning the system towards a stricter or more relaxed direction.

In Fig. 1a, we see the average reputation scores for all sellers and the respective score for the top seller over a total period of 3 years for the lowest price range on a per month basis. For comparison, we also plot the highest reputation score achieved by the top seller at the end of the 3 year period. We also present the respective results for the medium and high price range, in Fig. 1b, c. Note that the top seller for each price range is not necessarily the same.

We observe that the reputation score for the two highest price ranges is initially negative as expected, passing point 0 after nine months for the medium and after one year for the highest price range. Furthermore, we observe that the average seller score levels off after 12 months for the low price range and after approximately 18 months for the top two price ranges. Hence, a new seller cannot expect to quickly gain the same reputation as the old ones; a significant amount of time has to elapse first.

5.2 Scenario 2: The ‘‘Bad’’ one

This is similar to the previous scenario, but designed to test the validity of our proposal in the case of the most organized of attacks, namely a group of users (buyers and sellers) colluding to defraud the system. There are now m% of malicious sellers and buyers, up to a total of 20 malicious sellers (always up to 10% of the total seller population). The malicious buyers collude with the malicious sellers in a way more difficult to identify. More specifically, the malicious sellers behave in an optimal way, trying to increase their reputation. The malicious buyers behave as the rest of the buyers with the exception that they only select to buy among the malicious sellers (at random).

With this scenario we try to determine what happens up to a possible point of a serious incident by a malicious seller (e.g., he receives the money without sending the goods). Therefore, we associate the reputation score gained by the malicious sellers over time with the cost in terms of RMS charges they had to pay up to that point.

We try this scenario for m = 10, 20 and 30%. Therefore, in the worst case there are 120 malicious buyers, colluding with the 20 malicious sellers. The results are depicted in Figs. 2, 3 and 4, for the respective price ranges.

As we can see, in all cases the top malicious seller achieves a higher reputation score than the top good seller, who nevertheless manages to gain a reputation score close to the total average of all sellers, including the malicious ones. Furthermore, the top malicious seller gains no further advantage once his reputation has peaked after the 14th month.

Fig. 1 Average and top seller reputation score over 3 years: (a) low price range, (b) medium price range, (c) high price range [plots of reputation score per month; series: Avg, Top]

The results presented in Figs. 2, 3 and 4 above show several important points in the case of colluding malicious users:

• A malicious seller cannot expect to gain more in terms of reputation score after a certain point (e.g., after the 18th month for the highest price range with m = 30% of users being malicious).

• A malicious seller cannot expect to gain a top reputation quickly with ‘whitewashing’ (by starting a new account).

• On average, most of the colluding malicious sellers do not gain a higher reputation score compared to the best ‘good’ seller.

• The average malicious seller takes a long time before reaching a positive reputation at the highest price range (between the 9th and the 10th month). Hence, our proposal does deter fraudulent behaviour for the average case, in terms of reputation score.

Nevertheless, there are two possible cases for a collusion attack to be successful at the highest price range, where the expected gains are higher after a fraud:

• In the first case, the top malicious seller may perform a fraud immediately after gaining a positive reputation at the highest price range, hoping that the respective funds will be cleared for one or two of these before his account is suspended.

• In the second case, any malicious seller may perform a fraud immediately after gaining a positive reputation at the highest price range.

Fig. 2 Seller reputation scores over 3 years, with m = 10%: (a) low price range, (b) medium price range, (c) high price range [plots of reputation score per month; series: Average, MaliciousTop, GoodTop]

To appreciate the outcome of such attacks, we need the simulation results that show the RMS charges paid by the colluding malicious users. For this purpose, we sum the charges of all malicious sellers, since they act (and have to pay) together to achieve their purpose. In our simulation we have assumed a flat 2% RMS charge on each sale, imposed on the seller, which is less than half of the corresponding eBay charge. Furthermore, we assume no other expenses for real sales to good users by malicious sellers who try to build their reputation. The results are depicted in Fig. 5 and should be read together with the respective reputation scores presented above.
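The way the colluding sellers' charges are pooled can be illustrated with a minimal sketch. It assumes only the flat 2% charge per sale stated above; the function name `collusion_charges`, the seller labels and the sale prices are hypothetical and are not taken from the simulation.

```python
RMS_RATE = 0.02  # flat 2% charge per sale, as assumed in the simulation


def collusion_charges(sales_by_seller, rate=RMS_RATE):
    """Total RMS charges paid by a group of colluding sellers.

    sales_by_seller maps a seller label to the list of sale prices
    that seller completed; the charges of all sellers are summed
    because the group acts (and pays) together.
    """
    total_sales = sum(price for sales in sales_by_seller.values() for price in sales)
    return rate * total_sales


# Hypothetical sales, for illustration only (not simulation data).
example_sales = {
    "seller_a": [1000, 1000, 500],
    "seller_b": [250, 1000],
}
total = collusion_charges(example_sales)
print(f"Total charges for the colluding group: {total:.2f}")
```

Because the charge is a flat percentage, the pooled cost grows in proportion to the total value of the sales the group makes while building its reputation.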

The first observation from Fig. 5 is that the malicious sellers’ charges increase almost linearly as time passes. Therefore, to commit a fraud successfully they need to strike early. Let us reconsider the two possible attacks we outlined earlier:

• The top malicious seller commits a fraud immediately after gaining a positive reputation. This happens at the end of the third month for the highest price range. At this point the malicious sellers’ cost is 1,573, 3,138 and 4,588 for m = 10, 20 and 30%, respectively. Hence, the top malicious seller has to achieve from two to five successful frauds (successful in the sense that the funds are cleared and collected by him) at the highest possible price (1,000 units) to be marginally successful. However, this is very difficult to achieve, since the respective funds will be blocked after the first denunciation.

• Any malicious seller attempts a fraud as soon as his reputation becomes positive at the highest price range and a genuine buyer buys from him. By looking at the reputation for the average seller we observe that this occurs after the 12th month. At this point, the malicious sellers’ charges are 8,945, 13,802 and 18,351 for m = 10, 20 and 30%, respectively. Therefore, for such an attack to be marginally successful, at least 45% of the malicious sellers (9 out of a total of 20 in our simulation) for m = 10% must have achieved a positive reputation and managed to sell to genuine buyers at the highest possible price (1,000 units). But since with m = 10% there are 360 honest buyers in our population, only 7 such transactions can take place per day, on average. Hence, this attack can only be marginally successful, if at all. Trying daily for 12 months to build up a good reputation with the possibility of a marginal profit in the end is obviously not worthwhile.

[Figure: seller reputation scores, panels (a) to (c), each plotting Reputation Score against Month for the Average, MaliciousTop and GoodTop series.]

[Figure: seller reputation scores over 3 years, with m = 30%, panels (a) to (c); panel (c) shows the high price range. Each panel plots Reputation Score against Month for the Average, MaliciousTop and GoodTop series.]

By increasing the seller charges per transaction to the same level as eBay, it is evident that such attacks cannot be successful at all, since the total charges would then be doubled.
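The break-even arithmetic behind the two attack scenarios can be checked with a short sketch. The charge totals, the 1,000-unit top price and the 20 malicious sellers for m = 10% are taken from the text. The function name `frauds_to_break_even` and the `charge_factor` parameter are illustrative assumptions, and so is the simplification that each cleared fraud yields the full top price and that nothing else counts as cost or revenue.

```python
import math

FRAUD_VALUE = 1_000          # highest possible price, in currency units
MALICIOUS_SELLERS_M10 = 20   # malicious sellers in the simulation for m = 10%

# Total charges paid by the colluding sellers, keyed by m (% malicious users).
charges_month_3 = {10: 1_573, 20: 3_138, 30: 4_588}      # first attack
charges_month_12 = {10: 8_945, 20: 13_802, 30: 18_351}   # second attack


def frauds_to_break_even(total_charges, fraud_value=FRAUD_VALUE, charge_factor=1.0):
    """Smallest number of cleared frauds whose proceeds cover the charges.

    charge_factor scales the charges, e.g. 2.0 for doubled charges.
    """
    return math.ceil(total_charges * charge_factor / fraud_value)


attack_1 = {m: frauds_to_break_even(c) for m, c in charges_month_3.items()}
attack_2 = {m: frauds_to_break_even(c) for m, c in charges_month_12.items()}
share_m10 = attack_2[10] / MALICIOUS_SELLERS_M10
doubled = {m: frauds_to_break_even(c, charge_factor=2.0)
           for m, c in charges_month_3.items()}

print("Attack 1, frauds needed:", attack_1)
print("Attack 2, frauds needed:", attack_2)
print("Attack 2, share of malicious sellers for m = 10%:", share_m10)
print("Attack 1 with doubled charges:", doubled)
```

Under these assumptions the first attack needs between two and five cleared frauds, and the second needs 9 of the 20 malicious sellers (45%) to succeed for m = 10%, matching the figures quoted above. Doubling the charges doubles the amount that has to be recovered.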

Consequently, the cost for the malicious sellers and buyers who work together is considerable, both in terms of time (to build their reputation) and money. Hence, our proposal effectively neutralizes collusion attacks.

6 Conclusion

There are many platforms available today offering e-commerce services on the Internet with high popularity and revenues. In this context, reputation plays an important role, mainly for protecting buyers from malicious sellers. Reputation is essentially an indication of how trustworthy a seller is, based on his previous transactions. Since fraud detection is usually performed by experts after denunciation by users, an RMS must not only assist a potential buyer in selecting the most appropriate seller, but also deter a malicious seller from launching a fraud. The collusion attack is considered one of the most difficult attacks today: a group of malicious users (either real or fake) conspire to boost the reputation of malicious sellers (and sometimes to lower the reputation of honest sellers), so that the malicious sellers can make sales and disappear with the money before being denounced by the victims.

We propose a novel, yet relatively simple RMS which combines honest ratings and a good sales track record with time and RMS charges to sellers. In this way, it is difficult and costly for malicious sellers to act in collusion with malicious buyers and gain substantially from it. In our work, ratings are not revealed until either both parties provide them or a certain time period has elapsed, thus avoiding direct retaliation. Also, ratings from the same buyer to the same seller are restricted to two positive ratings or at most one negative rating, thus avoiding both retaliation and collusion. Finally, the reputation is based on the price range, the number of different satisfied buyers and time. We simulated our proposed RMS with various populations of malicious users (both buyers and sellers) under optimal conditions for them, verifying the validity of our claims.


References

Alpcan T, Örencik C, Levi A, Savas E (2010) A game theoretic model for digital identity and trust in online communities. In: Proceedings of the 5th ACM symposium on information, computer and communication security (ASIACCS ’10). pp 341–344

Amazon (2010) http://www.amazon.co.uk. Accessed Aug 2010

Artz D, Gil Y (2007) A survey of trust in computer science and the semantic web. J Web Semant, Elsevier: 58–71

Chang J, Wong H (2010) Selecting appropriate sellers in online auctions through a multi-attribute reputation calculation method. Electron Commer Res Appl, Elsevier. doi:10.1016/j.elerap.2010.05.003

Chen X, Zhao K, Chu X (2008) SepRep: a novel reputation evaluation model in peer-to-peer networks. ATC, Springer-Verlag LNCS 5060: 86–99

Cormier C, Tran T (2009) Improving trust and reputation modeling in e-commerce using agent lifetime and transaction count. MCETECH 2009, Springer-Verlag LNBIP 26: 184–195

Dong P, Wang H, Zhang H (2009) Probability-based trust management model for distributed e-commerce. In: Proceedings of the IC-NIDC’09. pp 419–423

Du R, Ma X, Wang Z (2009) E-commerce trust model based on perceived risk. In: Proceedings of the second symposium international computer science and computational technology (ISCSCT ’09). pp 175–178

eBay (2010) http://www.amazon.com. Accessed Aug 2010

Feigenbaum J, Parkes D, Pennock D (2009) Economic and social sciences will drive the internet protocols and services into the future. Commun ACM 52(1):70–74

Gudes E, Gal-Oz N, Grubshtein A (2009) Methods for computing trust and reputation while preserving privacy. In: Gudes E, Vaidya J (eds) Data and applications security, IFIP LNCS 5645, pp 291–298

Gutowska A, Sloane A (2010) Modelling the B2C marketplace: evaluation of a reputation metric for e-commerce. WEBIST 2009, Springer-Verlag, LNBIP 45: 212–226

Hughes J, Lang K, Vragov R (2008) An analytical framework for evaluating peer-to-peer business models. Electron Commer Res Appl 7:105–118

Jøsang A, Ismail R, Boyd C (2007) A survey of trust and reputation systems for online service provision. Decis Support Syst 43(2):618–644

Kamvar S, Schlosser M, Garcia-Molina H (2003) The eigentrust algorithm for reputation management in p2p networks. In: 12th international conference on world wide web. pp 640–651

Lenzini G, van Houten Y, Huijsen W, Melenhorst M (2010) Shall I trust a recommendation? towards an evaluation of the trustworthiness of recommender sites. Springer-Verlag, LNCS 5968: 121–128

Li J, Liu L, Xu J (2010) A P2P e-commerce reputation model based on fuzzy logic. In: Proceedings of the 10th IEEE international conference on computer and information technology (CIT 2010): 1275–1279

Malik Z, Bouguettaya A (2009) Reputation bootstrapping for trust establishment among web services. IEEE Internet Comput January/February: 40–47

Maranzato R, Pereira A, Neubert M, do Lago A (2010) Fraud detection in reputation systems in e-markets using logistic regression and stepwise optimization. ACM SIGAPP Appl Comput Rev 11(1):14–26

Marmol F, Perez G (2010) State of the art in trust and reputation models in P2P networks. In: X Shen et al. (ed) Handbook of peer-to-peer networking, Springer Science+Business Media, LLC

Remondino M, Boella G (2010) How users’ participation affects reputation management systems: the case of P2P networks. Simul Model Pract Theory 18:1493–1505

Sundaresan N (2007) Online trust and reputation systems. In: Proceedings of the 8th ACM conference on electronic commerce. pp 366–367

Tang L (2009) Grouping-based mechanism driven by reputation in P2P E-commerce. In: Proceedings of the 2009 international symposium on web information systems and applications (WISA ’09). pp 510–515

Wang Y, Lin K (2008) Reputation-oriented trustworthy computing in e-commerce environments. IEEE Internet Comput July/August: 55–59

Wang Y, Vassileva J (2007) Toward trust and reputation based web service selection: a survey. Int Trans Syst Sci Appl 3(2):118–132

Wang Y, Wong D, Lin K, Varadharajan V (2008) Evaluating transaction trust and risk levels in peer-to-peer e-commerce environments. ISeB 6:25–48


Wu M (2010) Cloud trust model in e-commerce. In: Proceedings of the second international symposium on networking and network security (ISNNS ’10). pp 271–274

Wu F, Li H, Kuo Y (2010) Reputation evaluation for choosing a trustworthy counterparty in C2C e-commerce. Electron Commer Res Appl, Elsevier. doi:10.1016/j.elerap.2010.09.004