Security and Privacy Risks of Using E-mail Address as an Identity

Lei Jin, Hassan Takabi, James B.D. Joshi School of Information Sciences

University of Pittsburgh Pittsburgh, PA, US

[email protected], {hatakabi, jjoshi}@sis.pitt.edu

Abstract—More and more websites are allowing or requiring users to input their e-mail addresses to be used either as identities or for other purposes. Although username-based identity and password problems resulting from user behaviors have been a research focus for quite some time, the serious issues related to using e-mail address as an identity and the associated online behaviors of users have not been well investigated in the literature. In this paper, we discuss and analyze security and privacy problems resulting from the use of e-mail address as identity via a well-designed user behavior survey and an investigation of websites’ design schemes. Our results illustrate that using e-mail address as an identity poses high security and privacy risks. This is mainly because of the multiple usages of e-mail addresses and users’ improper online habits. Moreover, we discuss the drawbacks of existing solutions for e-mail address as identity and related password problems, and present two potential solutions that may secure online identity management systems in the future.

Keywords-e-mail address identity; authentication; security

I. INTRODUCTION

The rapid development of Internet-based applications and services has made websites a crucial part of our lives. More and more users are using online office services (e.g., Google Docs) instead of desktop office software, online social services (e.g., Facebook, LinkedIn) to engage in social activities, online information boards (e.g., Twitter) to post their business products, and online shopping (e.g., Amazon) to do their purchasing. However, security and privacy problems have also grown along with these newer and open Internet applications.

In order to use online services and applications, users typically need to create accounts including usernames and passwords. Username-based identity and the related password problems caused by online user behaviors have been a focus of research studies for quite some time [1-4].

Florencio et al. report the results of a large-scale study of password usage habits, where they discuss and analyze the password sharing rate among sites, password usage and strength [1]. They also emphasize that it is better to increase the strength of usernames rather than passwords as it is more effective in defending against bulk guessing attacks than attacks that focus on compromising passwords [2]. Perlman et al. summarize the existing problems of username-based identity and password schemes, including the problems related to use of the same usernames and passwords in multiple websites [3]. Just et al. investigate users’ behaviors related to how they select or design secret questions and answers when they register accounts [4]. Their results show that poorly chosen questions and answers weaken the security of user accounts.

However, the serious issues related to using e-mail address as an identity and related users’ behaviors have not been well investigated. E-mail address as identity is becoming more common and may soon replace username-based identity as it has the following advantages:

• It is easier for a user to remember his e-mail addresses than different usernames. A user may easily apply one of his e-mail addresses as identity for multiple websites; hence, he only needs to remember one identity rather than multiple username-based identities.

• Web service providers find it more convenient to contact users using e-mail addresses that are collected during the account registration phase. For example, e-mail communication as a confirmation and tracking method is important for online shopping websites, such as Amazon and eBay.

• Users are able to get or reset their passwords via e-mail when they forget them; this approach is regarded as convenient and safe by online service providers.

Because of the benefits mentioned above, many websites allow and some even require users to use their e-mail addresses as identities in their login pages. However, using e-mail address as an identity introduces severe security and privacy risks. In fact, an e-mail address identity can potentially expose more private information to the public than the username-based identities, thus introducing significant privacy concerns. Such concerns arise because an e-mail address identity, in addition to representing a unique ID, also reveals the user’s e-mail address details (e.g., login name, domain name, etc.). For instance, if a user has an e-mail address identity [email protected] in one website, it reveals the fact that she has an account named mary in the Gmail service. Therefore, once a user’s e-mail addresses or e-mail address identities have been compromised, it may put the user’s other related accounts at a higher risk.

In this paper, we focus on the security and privacy issues related to the use of e-mail address as identity. We illustrate how serious they are becoming, identify potential attacks on e-mail address identity systems and analyze the extent of the damage such attacks may cause. We present the results of our survey, which was carried out to investigate users’ behaviors in using e-mail address as an identity. We also investigate popular websites’ account design schemes to see how they address security and privacy issues related to users’ e-mail addresses. Based on our user survey and investigation of online sites, we show that the security and privacy problems of using e-mail address as identity are caused not only by user behaviors but also by the design schemes used by the websites. We demonstrate that the use of e-mail address as identity poses a higher risk to users’ online accounts and may result in more breaches of security than username-based identity. This is mainly due to the multiple usages of e-mail addresses, such as for identification and communication. We also argue that using e-mail address as an identity cannot be a secure solution for users because users share their identities and passwords in multiple websites; such sharing behavior can put all of a user’s accounts at a higher risk when any one of his e-mail accounts used as identities is compromised. The key contributions of this paper are as follows:

• To the best of our knowledge, this paper is the first investigation of the use of e-mail address as identity for websites.

• Based on our survey and investigation results, we demonstrate the high security and privacy risks of using e-mail address as an identity by showing different threats resulting from such behaviors.

• We present the drawbacks of existing solutions for e-mail address identity (and the related password problems), and propose suggestions for users to better protect their e-mail address identities. We also discuss potential solutions that may secure identity management systems.

The rest of the paper is organized as follows. Section 2 introduces our methodology for the user behavior survey and website investigation. In Sections 3 and 4, we analyze the results of our user behavior survey and present the investigation of websites, respectively. In Section 5, we discuss existing solutions for the identified security and privacy problems, their drawbacks and introduce possible approaches. In Section 6, we present our conclusions.

II. METHODOLOGY

Our survey is focused on investigating users’ behaviors related to the use of their e-mail addresses. The survey includes 74 participants who responded to the survey via e-mail. The participants are graduate students in the School of Information Sciences at the University of Pittsburgh (USA) and the School of Software in Tsinghua University (China).

In order to collect information about users’ behaviors in using e-mail addresses, we designed a questionnaire and asked each participant to carefully read the questions and answer them. The questionnaire included 27 questions about using e-mail addresses in general, using e-mail addresses when registering in different websites, using the same passwords for different e-mail address identities, etc. The questions were in the form of multiple choice and yes/no questions. Particularly, we were interested in the following user’s behavioral information:

• Use of e-mail addresses: we focus on the following questions: how many e-mail addresses each participant has; how many participants’ e-mail addresses, divided by primary and non-primary, are used during website registrations; do the participants use the same passwords for their e-mail accounts and e-mail address identities in other websites; etc.

• Username preferences for e-mail accounts: we gather information of participants’ preferences with regards to using the same username for their e-mail accounts.

• Capability of remembering passwords for e-mail address identities: we examine the pairs of e-mail accounts/identities and passwords participants can remember.

We collected quantitative data through the survey and used it to analyze the relationships between users’ behaviors and potential security and privacy problems related to these behaviors. In order to analyze the data, we designed a table that includes a code for each question and the corresponding number of possible answers.

Note that the survey participants are graduate students in the computer science field. Hence, they represent a fairly expert IT group and can be assumed to have a deeper understanding of security and privacy issues than general users. Although our participants are not completely representative of the community in general, this group has its own advantages. First, these participants can be expected to be more involved in online services. For example, they are more likely to share their information with their friends online. As a result, we can obtain more detailed data about users’ behaviors. Second, because of their background, we believe that they are better positioned to answer our survey questions.

While the results of our analysis are focused on participants who might be more aware of IT issues, it is reasonable to assume that any bad behaviors would potentially be much worse in the general population, who might be less aware of or less educated about security issues.

In addition to users’ behaviors, another factor that may cause security and privacy issues when using online services and applications is how websites are designed. To this end, we have investigated popular websites to examine the security and privacy risks caused by account design schemes used by the websites. We have tried to infer how users’ private information is stored in these websites. Based on the inferences, we have demonstrated that users’ accounts in different websites have a high risk of being compromised when their e-mail accounts or e-mail address identities in some sites are compromised and passwords are exposed, because users may have used the same e-mail address identities and passwords in multiple sites.

In the study of websites, we examined 94 popular websites that can be categorized into different types of online services, such as shopping websites (e.g., Amazon and Newegg), social networking websites (e.g., Facebook and LinkedIn), financial websites (e.g., PayPal), etc. In our study, we specifically examined the registration, login and password recovery processes. We collected information related to the following three components of each website:

• Website's registration scheme: we focus on what types of users’ information is requested during registration.

• Types of identities: we look at what kinds of identities are supported by the websites.

• Password recovery schemes: we examine what types of password recovery schemes are applied by the websites.

The questionnaire, the user behavior statistics reports and the analysis report of websites are available at the URL given in [5].

III. USE OF E-MAIL ADDRESS AND RELATED SECURITY AND PRIVACY ISSUES

In this section, we illustrate and analyze the result of our user behavior survey and discuss various security and privacy problems related to the use of e-mail address.

A. Use of E-mail Address

Number of E-mail Accounts and Popular E-mail Services

The survey results show that all of the participants adopt e-mail as their main medium to communicate with others. More than half of them have between 1 and 3 e-mail accounts; the rest have more than 4 e-mail accounts. However, most of the participants (nearly 95%) only use 1 to 3 e-mail accounts. The survey also shows that 82.4% of the participants use Gmail as one of their primary e-mail accounts; 13.5% and 5.4% of them use Hotmail and Yahoo Mail, respectively, as one of their primary e-mail accounts. In addition, 55.4% of the participants use other e-mail addresses, such as their company e-mail addresses, as their primary e-mail accounts.

Primary E-mail Addresses

Figure 1 demonstrates that 67.6% of the participants use their primary e-mail addresses while registering on other websites. These e-mail addresses are used by online services as identities to contact users for various purposes such as notifications, shopping confirmations and password recovery.

Figure 2 illustrates the distribution of the websites where the primary e-mail addresses of the participants are used. 43.2% of the participants use their primary e-mail addresses on shopping websites (e.g., Amazon, eBay and Newegg); 51.4% of them employ their primary e-mail addresses on financial websites (e.g., PayPal and Citibank); 66.2% of them use their primary e-mail addresses on social networking websites (e.g., Facebook and Twitter) and 25.7% of them adopt their primary e-mail addresses on other websites.

Figure 1. Use of Primary E-mail Address

Florencio et al. have shown that participants use the same passwords on average in 3 different websites [1]. In our survey, we also investigate how many participants use their e-mail accounts’ passwords on other websites where their e-mail addresses are used as identities or for other purposes. As shown in Figure 1, of the 67.6% of the participants using their primary e-mail addresses during registration, 66.0% (20.0% + 46.0%) share at least one of their primary e-mail accounts’ passwords with the e-mail address identities on other websites. In particular, 20.0% of these participants use one password for their primary e-mail accounts and e-mail address identities on multiple websites. Only 34.0% of these participants claim they use their primary addresses for website registration processes, but never use the same passwords of their primary addresses for the e-mail address identities on multiple websites.

Figure 2. Distribution of Primary E-mail Addresses Used in the Websites

Non-Primary E-mail Addresses

Although 32.4% of the participants (see Figure 1) never use their primary e-mail accounts on other websites, all of these participants employ at least one of their other e-mail addresses for website registrations; this may be because some websites, such as Amazon and Facebook, only support e-mail address identity. Compared to the impact of sharing the same e-mail passwords between primary e-mail addresses and the corresponding e-mail address identities, our survey results (see Figure 3) demonstrate that participants are at a higher risk in this case than in the case of primary e-mail addresses. A significant proportion of the participants share either fully (29.7%) or partially (39.2%) their non-primary e-mail accounts’ passwords with the corresponding e-mail address identities on other websites. Only 31.1% of the participants never use their non-primary e-mail accounts’ passwords for the corresponding e-mail identities on other websites.

Figure 3. Passwords Sharing Rate among Non-Primary E-mail Addresses

E-mail Account Settings

We also collected participants’ behavioral information regarding how they configure the settings in their e-mail accounts. 56.8% of the participants connect at least one of their primary e-mail addresses with their other e-mail addresses for some purposes, perhaps to receive e-mails from other e-mail accounts and/or to receive password reset links when the passwords of their e-mail accounts are forgotten. In particular, 52.7% of the participants ask their primary e-mail accounts to receive e-mails from their non-primary accounts.

Based on the results related to participants’ behaviors with regard to e-mail account settings, we infer that: 1) participants’ e-mail accounts may be easily compromised when their other e-mail accounts, which are used to receive password reset links of these accounts, are compromised; 2) participants may expose their non-primary e-mail accounts more when their primary e-mail accounts are compromised; 3) participants may lose their accounts on other websites once their e-mail accounts used on these websites are compromised. This is because the e-mails containing plaintext passwords or password recovery links of these accounts can be assumed to be exposed. For instance, assume a user allows his Gmail account to receive the e-mails of his Yahoo Mail account and uses his Yahoo Mail address as the identity on Amazon. Once this Gmail account is compromised, his Amazon account will likely be compromised as well, because an adversary can find the e-mail containing Amazon’s password reset link (originally sent to his Yahoo Mail account by Amazon) in his Gmail account.
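The chain of exposure in this example can be read as reachability in a directed graph of accounts. The following Python sketch is only an illustration of that reading: the account names and edges are hypothetical and mirror the Gmail, Yahoo Mail and Amazon example, not any survey data.

```python
from collections import deque

# Hypothetical edges: "A -> B" means that whoever controls A can read the
# mail of B or obtain a password reset link for B.
ACCESS_EDGES = {
    "gmail": ["yahoo_mail"],   # Gmail is configured to receive Yahoo Mail's e-mails
    "yahoo_mail": ["amazon"],  # the Yahoo Mail address is the identity on Amazon
    "amazon": [],
}

def exposed_accounts(start, edges):
    """Return the set of accounts reachable from one compromised account."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in edges.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

print(sorted(exposed_accounts("gmail", ACCESS_EDGES)))
print(sorted(exposed_accounts("amazon", ACCESS_EDGES)))
```

Starting from the Gmail account, the search reaches all three accounts, while starting from the Amazon account it reaches only that account, which matches the asymmetry described in the example.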

Analysis and Discussion of Security Problems

Based on the results related to the use of primary e-mail addresses, we can infer that the percentage of participants who use the same passwords in both their primary e-mail accounts and websites where these e-mail addresses are used as identities is 67.6% × 20.0% ≈ 13.5%, where 67.6% is the portion of the participants using their primary e-mail addresses on other websites, and 20.0% is the portion of the participants using the same password both for their primary e-mail accounts and for accounts of websites where these e-mail addresses are used as identities.

Combining the results related to the popular e-mail services, we can infer that the percentage of participants who use their primary e-mail addresses (e.g., Gmail) as identities on the social websites is approximately 82.4% × 66.2% ≈ 54.5%, where 82.4% is the portion of the participants using Gmail as one of their primary e-mail addresses and 66.2% is the portion of the participants employing their primary e-mail addresses as identities on social websites. Assuming that 30% of the participants use their Gmail accounts in Facebook, the portion of the participants having the same passwords for their Gmail and Facebook accounts is approximately 13.5% × 54.5% × 30% ≈ 2.2%. We believe that these participants will have both accounts compromised if their Gmail accounts or Facebook accounts are compromised and then the passwords are exposed.
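The estimates in the two paragraphs above are products of survey proportions. The short Python sketch below reproduces the arithmetic; the variable names are ours, and multiplying the proportions implicitly treats the behaviors as independent, which is an assumption of the estimate rather than something the survey establishes.

```python
# Survey proportions quoted in the text, written as fractions.
primary_used_on_sites = 0.676      # primary address used for website registration
one_password_everywhere = 0.200    # same password for e-mail account and identities
gmail_is_primary = 0.824           # Gmail is one of the primary accounts
primary_on_social_sites = 0.662    # primary address used on social websites
assumed_gmail_on_facebook = 0.30   # the assumption stated in the text

same_password = primary_used_on_sites * one_password_everywhere
gmail_on_social = gmail_is_primary * primary_on_social_sites

# The text multiplies the rounded intermediate values (13.5% and 54.5%).
gmail_and_facebook = (
    round(same_password, 3) * round(gmail_on_social, 3) * assumed_gmail_on_facebook
)

print(f"{same_password:.1%}")       # 13.5%
print(f"{gmail_on_social:.1%}")     # 54.5%
print(f"{gmail_and_facebook:.1%}")  # 2.2%
```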

Through the analysis above, we can infer that adversaries only need to compromise a user’s primary account, e.g., his Gmail account, in order to potentially compromise many of his other accounts on multiple websites. For example, adversaries may compromise the user’s other e-mail addresses/identities on multiple websites by trying the same password used in his Gmail account, checking Gmail account settings and searching for registration confirmation and password reset e-mails. Moreover, adversaries can aim to compromise the user’s account on some poorly designed website where the user employs his Gmail address as identity and shares the password of this account with his Gmail account. After the password of this account is exposed to the adversaries, they can compromise his Gmail account and further compromise his other accounts on multiple websites in ways similar to those mentioned above. We analyze the issues related to website design in Section 4.

B. Username Diversity and Retention Capability for E-mail Accounts/E-mail Address Identity

Username Diversity

We also investigate the issue of participants’ preferences regarding usernames in different e-mail services. The results show that 35.1% of the participants use one username for all of their e-mail accounts in different e-mail services; 64.9% of the participants apply one username for several e-mail services but not for all of the e-mail services for which they are registered. Among the participants who do not use the same username for all of their e-mail accounts, 20.8% use two different usernames, 58.4% apply three different usernames and 20.8% adopt more than three usernames for their e-mail accounts.

The investigation of username diversity in e-mail accounts reveals that it is not difficult to obtain all of the e-mail addresses that belong to one user, because most users only use between 1 and 3 usernames for different e-mail services. This user behavior also shows that adversaries can easily guess users’ e-mail identities on different websites.

Retention Capability

Our survey also shows the pairs of e-mail accounts/identities and corresponding passwords that the participants feel that they are able to remember. As shown in Figure 4, most of the participants indicate that they can remember 2 to 5 pairs of e-mail addresses/identities and passwords. Only 8.1% of the participants remember one pair, while 9.5% of them indicate a high retention capability and claim they can remember more than 5 pairs.

Figure 4. Memorizing Capability For E-mail Accounts/Identities

Compared to the study on retention capability with regards to passwords in [1] where the authors show that their participants can usually remember 5 or 6 different passwords for username-based identities, our results indicate that the participants may have less retention capabilities for e-mail addresses/identities and passwords. Therefore, we may infer that some of the participants may tend to use same passwords of their e-mail accounts for their e-mail address identities on multiple websites. This inference may validate our previous analysis that using the same passwords among users’ e-mail accounts and corresponding e-mail address identities on multiple websites is a common practice.

New Opportunity for Brute Force Attack

Based on the analysis of username diversity and retention capability, we infer that there are a large number of users who use not only the same passwords but also the same usernames in e-mail addresses and e-mail address identities on various websites. For instance, one user may apply one username, e.g., mary, both in the Gmail and Hotmail systems and employ the e-mail addresses [email protected] and [email protected] as identities on different websites, such as Amazon, Facebook and PayPal. However, adversaries can try many different passwords simultaneously for one username on different e-mail services and websites without being stopped by per-site restrictions, such as the limit on the number of incorrect attempts on a site in one day. For example, an adversary can try many different passwords simultaneously for [email protected], [email protected] and the websites where these e-mail addresses are used as identities, such as Amazon, Facebook and PayPal. If these e-mail addresses and e-mail address identities belong to one user and if he uses the same password in these different systems, which is likely based on our previous discussion, these accounts may be easier to compromise by brute force attacks. Furthermore, when adversaries use their botnets and proxies to launch such brute force attacks, they can defeat more security mechanisms and compromise users’ accounts more easily.

[Figure 4 values, pairs of e-mail accounts/identities and passwords remembered: 1: 8.1%; 2: 16.2%; 3: 41.9%; 4 to 5: 24.3%; > 5: 9.5%]
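To make the effect of per-site attempt limits concrete, the following Python sketch counts how many distinct candidate passwords can be tested per day against one shared password when it is reused across several services. The limit of 5 attempts per day and the service names are hypothetical values chosen for illustration; they are not taken from the survey or from any real service.

```python
def distinct_guesses_per_day(per_site_limit, targets):
    """Distinct candidate passwords testable per day against one shared
    password, when each target enforces its own independent attempt limit."""
    return per_site_limit * len(targets)

# Hypothetical services on which the same username and password are reused.
targets = [
    "mail_service_a",
    "mail_service_b",
    "shopping_site",
    "social_site",
    "payment_site",
]

print(distinct_guesses_per_day(5, ["shopping_site"]))  # 5
print(distinct_guesses_per_day(5, targets))            # 25
```

Because each service counts failures independently, the effective guess budget grows linearly with the number of services that share the same identity and password.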

IV. WEBSITE ACCOUNT DESIGN SCHEMES AND SECURITY AND PRIVACY ISSUES

Security and privacy problems of using e-mail address as identity arise not only from users’ habits but also from websites’ insecure designs and mechanisms. In this section, we discuss these issues and illustrate that additional risks are created when poorly designed websites are compromised, exposing the private information stored in these websites.

A. Websites’ Design Investigation

In this section, we discuss our findings based on the investigation of 94 popular websites with regards to their schemes related to account registration, types of identities and password recovery. The details of the investigation are shown in [5].

Website Registration Schemes

TABLE I. CATALOG OF WEBSITE REGISTRATION SCHEMES

Scheme | Description

A | Websites require users to input unique usernames

B | Websites require users to input unique usernames and select or customize secret questions and answers

C | Websites only require users’ unique e-mail addresses

D | Websites require users to input unique usernames and unique e-mail addresses

E | Websites require users to select secret questions and set the corresponding answers, or ask users to define customized questions and provide the answers. The e-mail addresses are also required by the websites

F | Websites require users to input unique usernames, unique e-mail addresses, secret questions and the corresponding answers

Our findings on the registration schemes that websites generally adopt (Table I) are illustrated in Figure 5. The proportion of the websites that request users to input e-mail addresses as identities or use e-mail addresses for other purposes (e.g., for password recovery) is 92.4% (Scheme C + Scheme D + Scheme E + Scheme F); 10.1% of the websites (Scheme B + Scheme E + Scheme F) request users to select or define their secret questions and answers. In particular, 2.5% of the websites support two kinds of registration schemes (Scheme B and Scheme D): when users forget their passwords, they can get or reset passwords via their e-mail addresses or by answering their secret questions. Only 5.0% of the websites (Scheme A + Scheme B) do not request users’ e-mail addresses when registering.

Figure 5. Distribution of Website Registration Schemes
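The combined shares quoted above are sums over the per-scheme values shown in Figure 5. The Python sketch below reproduces them; the per-scheme percentages are the bar labels recovered from the figure, and the dictionary and function names are ours.

```python
# Share of websites per registration scheme (percent), as labeled in Figure 5.
SCHEME_SHARE = {
    "A": 2.5,
    "B": 2.5,
    "C": 34.2,
    "D": 50.6,
    "E": 1.3,
    "F": 6.3,
    "B or D": 2.5,
}

def combined_share(schemes):
    """Sum the shares of the given schemes, rounded to one decimal place."""
    return round(sum(SCHEME_SHARE[s] for s in schemes), 1)

requests_email = combined_share(["C", "D", "E", "F"])
uses_secret_questions = combined_share(["B", "E", "F"])
no_email_requested = combined_share(["A", "B"])

print(requests_email)         # 92.4
print(uses_secret_questions)  # 10.1
print(no_email_requested)     # 5.0
```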

This result confirms our argument that most of the websites (92.4%) request users to input their e-mail addresses for some purpose, such as to send passwords or password reset links. This fact also reveals a high risk for e-mail accounts because users utilize their e-mail addresses on many different websites.

Types of Identities

We also investigate the types of identities that websites support. The result shows that 55.3% of the websites only support e-mail address as identity, 29.8% of the websites employ username-based identity and 14.9% of the websites support both of these identities. Based on these data, we find that e-mail address identity is widely adopted: 70.2% (55.3% + 14.9%) of the websites support it.

Based on the data on website registration schemes, we can infer that the proportion of websites that request users’ e-mail addresses as identities is approximately 92.4% × 70.2% ≈ 64.9%. This indicates that e-mail address as identity is becoming popular. However, our analysis and discussion in Section 3 show that e-mail address as identity poses huge security and privacy risks.

Password Recovery Schemes

Based on the password recovery schemes of the websites that we studied (Table II), our collected data related to password recovery schemes is shown in Figure 6. When users forget their passwords, 62.8% of the websites send password reset links to users’ e-mail addresses and 6.4% of them (Scheme C1 + Scheme C2) ask users to answer their customized secret questions. Of the remaining websites, 14.9% send users’ original passwords as plaintext and 5.3% send new passwords as plaintext to users’ e-mail addresses. Also, a small set of websites send temporary links displaying new passwords, or links showing secret questions to users, or send password reset links after users answer secret questions correctly. In particular, 5.3% of the websites support two kinds of password recovery schemes: sending password reset links to users via e-mail and asking users to answer their pre-defined secret questions.

[Figure 5 values, share of websites per registration scheme: A: 2.5%; B: 2.5%; C: 34.2%; D: 50.6%; E: 1.3%; F: 6.3%; B or D: 2.5%]

TABLE II. CATALOG OF PASSWORD RECOVERY SCHEMES

Scheme | Description

A1 | Users input their usernames or e-mail addresses. Services send the original passwords in plaintext to the e-mail addresses recorded

A2 | Users input their usernames or e-mail addresses. Services send the new passwords set by services in plaintext to the e-mail addresses recorded

A3 | Users input their usernames or e-mail addresses. Services send temporary passwords (valid for a limited time) to users’ e-mail addresses in plaintext. After users log in using these temporary passwords, they can modify their passwords

B1 | Users input their usernames or e-mail addresses. Services send password reset links (valid for a limited time) to users’ e-mail addresses

B2 | Users input their usernames or e-mail addresses. Services send the links (valid for a limited time) of pages displaying new passwords to users’ e-mail addresses

C1 | Users input their usernames or e-mail addresses. Services provide users’ selected or customized secret questions. After answering correctly, users can reset their passwords

C2 | Users input their usernames or e-mail addresses. Services provide users’ selected or customized secret questions. After users answer the secret questions correctly, services send password reset links (valid for a limited time) to users’ e-mail addresses

D | Users input their usernames or e-mail addresses. Services send links that show users’ selected or customized secret questions to users’ e-mail addresses. After answering correctly, users can reset their passwords

Figure 6. Password Recovery Schemes

We also observe that more than 90% of the websites (Scheme A and Scheme B) send plaintext passwords or password reset links to users’ e-mail accounts. This fact confirms the high risk related to e-mail address identity discussed in the previous section: a user’s accounts on different websites may be easier to compromise once some of his e-mail accounts have been compromised. This is because adversaries can acquire or reset the victim’s passwords for his other accounts on multiple websites via the compromised e-mail account.

Some users may believe that recovering passwords via secret questions is a secure mechanism. However, we argue that this is still not safe because users may typically apply the same secret questions and answers for multiple websites.

Moreover, adversaries may obtain users’ passwords through other means even though they cannot compromise their e-mail accounts in secure e-mail services. To do this, they can compromise poorly designed websites and get the private data, such as users’ e-mail addresses, passwords, secret questions and answers. After that, they may compromise accounts for other websites where users share e-mail address identities, passwords, or secret questions and answers. In the following, we analyze and discuss these issues.

B. What Adversaries Obtain After Compromising Websites

Generally, users’ private data, including passwords and answers to secret questions, are stored on the websites. When adversaries have compromised the websites, they may obtain this information. In the following, we discuss these issues based on our study of websites.

Storage for Users’ Passwords

We can infer that some websites store users’ passwords in plaintext because they send users’ original passwords in plaintext when the passwords are forgotten. These websites, such as Pizza Hut (pizzahut.com) and Conduit (conduit.com), comprise at least 14.9% of the websites in our study (Scheme A1 in Figure 6). Although this is a small portion of the websites studied, we cannot ignore them because adversaries may obtain users’ passwords easily after they compromise these websites.

Other websites may store users’ passwords as hashed values in order to protect users’ privacy. However, it is not difficult for adversaries to recover passwords from these hashes if salt mechanisms [16], which complicate dictionary attacks and defend against them effectively, are not applied or are exposed. This is mainly because users usually choose simple passwords that cannot withstand a brute-force or dictionary attack given the computing resources currently available. Florencio et al. show that most users choose passwords that are 6-13 characters long [1]. They report that more than 60% of such passwords are composed only of lowercase letters; at least 10% of the 7-13 character long passwords are composed only of digits; and only 10% of the passwords are composed of four different types of characters, including lowercase/uppercase letters, digits and symbols. Based on these data, we can determine the volume of calculations necessary to compromise users’ hashed passwords. For example, in order to recover the 8-character long passwords that are composed only of lowercase letters or only of digits, the adversary only needs to do $26^8 + 10^8 \approx 2.1 \times 10^{11} + 10^8 \approx 2.1 \times 10^{11}$ calculations. It is not difficult for an adversary to search and compare the hashed values in this space to compromise approximately 70% (60%+10%) of all 8-character long passwords.
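The search-space estimate above can be reproduced with a short Python sketch. The guessing rate `ASSUMED_HASHES_PER_SECOND` is a hypothetical figure chosen only for illustration and does not come from the study; real rates depend on the hash function and the hardware.

```python
import string


def search_space(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length


# 8-character passwords drawn from a single character class.
lowercase_only = search_space(len(string.ascii_lowercase), 8)  # 26^8
digits_only = search_space(len(string.digits), 8)              # 10^8
total = lowercase_only + digits_only

# Hypothetical guessing rate (an assumption, not a measured value).
ASSUMED_HASHES_PER_SECOND = 1_000_000_000
seconds = total / ASSUMED_HASHES_PER_SECOND

print(f"lowercase-only candidates: {lowercase_only:,}")
print(f"digits-only candidates:    {digits_only:,}")
print(f"total candidates:          {total:.2e}")
print(f"exhaustive search at the assumed rate: {seconds:.0f} s")
```

The digits-only term is negligible next to the lowercase-only term, which is why the total stays at roughly 2.1 x 10^11.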

As a result, hashed-password systems without salt mechanisms can no longer be considered impervious to attacks. Adversaries can first target the websites that are easier to compromise. After getting users’ hashed passwords, they can recover the plaintext passwords by the method discussed above. Finally, they can try these real passwords on different websites to compromise users’ other accounts.
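To make the role of the salt concrete, the following Python sketch contrasts an unsalted hash with a salted, iterated one. It is a minimal illustration under stated assumptions, not a description of how any website in the study stores passwords: the function names, the choice of PBKDF2 with SHA-256, and the iteration count are all assumptions made for this example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor, for illustration only


def unsalted_hash(password: str) -> str:
    """Unsalted hash: equal passwords always produce equal digests."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Salted, iterated hash; a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = salted_hash(password, salt)
    return hmac.compare_digest(digest, expected)
```

With the unsalted function, one precomputed table of digests can be compared against every stolen record at once. With per-user salts, the same password yields different digests, so each record has to be attacked separately.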

Users’ Secret Questions and Answers

Some well-designed websites (about 6%, Scheme C in Figure 6) ask users to select or customize secret questions and answers. However, we strongly believe that users also reuse their secret questions and answers on multiple websites, similar to their behavior of sharing their e-mail address identities and passwords. One reason for this behavior may be that users can remember only a limited number of questions and answers.

For the secret questions, most of these websites (except Scheme D) expose them to the public. For the answers to the secret questions, some of the websites may store them as hashed values while some store them in plaintext. After an adversary compromises a website, he may obtain the user’s secret questions and the answers (if the answers consist of simple characters that are hashed without salts, adversaries may recover them via the brute-force approach, as described above). Then, adversaries can check and compare secret questions on multiple websites, and try the answers they get to compromise more accounts. Even if adversaries cannot recover the hashed answers, they can find or infer some of the users’ private information from the data exposed on multiple social websites [15] to guess the answers to the secret questions.

[Figure 6 bar chart values, percentage of studied websites per scheme: A1 14.9%; A2 5.3%; A3 2.1%; B1 62.8%; B2 1.1%; C1 5.3%; C2 1.1%; D 2.1%; B1 or C1 5.3%]

V. EXISTING SOLUTIONS AND THEIR DRAWBACKS

In this section, we discuss several existing solutions to deal with security and privacy issues related to using e-mail address as an identity and demonstrate that these approaches are still far from providing strong protections. We only consider the identity mechanisms which can be applied by websites that are open, dynamic and have a large number of users. The solutions for identity management systems based on cryptography may not be suitable for these websites because these mechanisms are suitable typically only for a controlled environment with well-organized key management schemes in place [6].

A. Password Solutions

There is an argument that complex textual passwords can protect users’ accounts effectively. Although this might be true, the argument overlooks usability issues. Complex passwords, composed of lowercase letters, uppercase letters, digits and symbols, are difficult for users to remember and are not easy to input when users log into the websites. Furthermore, even though some websites require users to set more complex passwords, these passwords will likely be reused by users on other websites. Therefore, the complex password scheme cannot solve the security problems of password usage we discussed earlier.

Some researchers believe that graphical passwords [7] or biometric passwords [8] can significantly improve the security of users’ accounts as these password schemes increase the length and complexity of passwords and may be more convenient to remember. With graphical passwords, a user needs only to remember a special image as a password, while biometric passwords do not require users to remember anything at all. However, we argue that these password schemes may not be widely adopted by websites. For graphical passwords, users may have to remember a different image for each website in order to achieve higher levels of security. If users cannot remember enough images and adopt only a few images as passwords, the problem is similar to that of sharing textual passwords as discussed earlier. For biometric passwords, adversaries may get a user’s biometric features from compromised websites; thus, they may be able to compromise a user’s accounts on other websites as well. Furthermore, some biometric features, such as a user’s fingerprints, are easy to obtain in the course of daily life.

The one-time password scheme, which is a dynamic password scheme and can achieve randomization, is becoming popular and has been proved to be more secure. However, users need a device, such as a smart card [9] or a mobile phone [10], to receive the active password. Users have to carry these devices and use them to receive passwords when they log into the websites, which may not be convenient for some users. Furthermore, the security issues of such devices, such as loss, need to be considered.
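As an illustration of how a one-time password can be generated on a device, the sketch below implements the counter-based HOTP algorithm of RFC 4226 in Python. This is a swapped-in, widely deployed scheme used here only as an example; it is not the smart-card scheme of [9] or the QR-code scheme of [10]. The function name and the 6-digit default are choices made for this sketch.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time password following RFC 4226."""
    # HMAC-SHA1 over the 8-byte big-endian counter.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select an offset.
    offset = mac[-1] & 0x0F
    # Take 4 bytes at that offset and clear the top bit.
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep the requested number of decimal digits, left-padded with zeros.
    return str(code % (10 ** digits)).zfill(digits)


# The device and the server share `secret` and advance the counter together.
shared_secret = b"12345678901234567890"  # test secret from RFC 4226, Appendix D
first_code = hotp(shared_secret, 0)
second_code = hotp(shared_secret, 1)
```

Because each code depends on a counter that changes after every login, a captured code is of no use for a later session. The shared secret stored on the device remains sensitive, which is the device-loss concern raised above.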

B. Password Management Tools and Single Sign-on Systems

Password management tools, such as the Firefox extensions proposed in [11], are considered to be potential solutions to the problem of using the same identities and passwords for multiple websites. Generally, a password management tool helps users to organize usernames and corresponding passwords. Such a tool typically has a local database or file that holds the encrypted passwords for different websites. Users have to create profiles and set master passwords to manage their accounts’ passwords. When users log into different websites, these tools can retrieve the corresponding passwords and input them into the login pages automatically. As such, users do not need to remember many passwords, so they can set unique and complex passwords for every website. We believe that these tools are convenient but they pose a high risk for all of the users’ accounts. This is because all accounts recorded in these tools will be compromised if the master passwords for these tools are compromised.
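The single point of failure can be illustrated with a minimal Python sketch. Instead of the encrypted local database described above, the sketch assumes a simpler design in which each site password is derived from the master password with PBKDF2. The function name `site_password`, the iteration count, and the example domains are hypothetical.

```python
import base64
import hashlib

ITERATIONS = 200_000  # assumed work factor, for illustration only


def site_password(master: str, site: str, length: int = 16) -> str:
    """Derive a site-specific password from the master password.

    The site name is used as the PBKDF2 salt, so every site gets a
    different password while the user remembers only the master password.
    """
    raw = hashlib.pbkdf2_hmac(
        "sha256", master.encode("utf-8"), site.encode("utf-8"), ITERATIONS
    )
    # 32 raw bytes encode to 44 base64 characters; keep the first `length`.
    return base64.urlsafe_b64encode(raw).decode("ascii")[:length]


# Each website receives a unique password...
shop_pw = site_password("correct horse", "shop.example")
mail_pw = site_password("correct horse", "mail.example")

# ...but anyone who learns the master password can regenerate all of them.
stolen_shop_pw = site_password("correct horse", "shop.example")
```

The derivation gives password diversity across websites, yet the last line shows the risk discussed above: knowledge of the master password is sufficient to reproduce every site password.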

Single Sign On (SSO) systems allow users to log into their different accounts through one portal where the users are authenticated. Currently, there are two kinds of SSO schemes: 1) one password for different applications, such as Google applications and Microsoft online applications; 2) one master password to manage SSO systems. These SSO systems record users’ identities and passwords in their databases. In SSO systems, users only need to use one account and a password to log in; then, they can access all the websites listed without inputting identities and passwords again. For example, users just need to log into their Gmail account, and then they can access other services in Google such as Calendar, Doc, and Voice. SSO systems indeed improve the level of convenience for users, but the security of all of the users’ accounts depends on the security of the SSO system; hence, the SSO system becomes a single point of attack. Once an SSO system is compromised, all the accounts in it will be compromised.

C. Recommendations for Users

In previous sections, we discussed that users’ behaviors, such as using the same e-mail address identities and passwords on multiple websites, put users at higher risk and pose increasing security and privacy problems. There are some suggestions in the literature to overcome these related problems [1, 2], such as using diverse identities and applying password diversity. In the following, we provide some additional recommendations for users based on our findings:

• Avoid registering for an account using e-mail addresses on websites which have not been validated as secure and avoid using the same e-mail address as identity on many websites.

• Avoid using the passwords of e-mail accounts for accounts of other websites where these e-mail addresses have also been used as identities. Users should try to find their own password pattern, which is easy for them to remember, to achieve password diversity for multiple websites.

• A user should try to make his e-mail addresses and e-mail address identities independent of (or less dependent on) each other. For example, one improvement is to remove the e-mail account setting that allows forwarding from multiple accounts to just one. This change can reduce the number of accounts compromised when a subset of the user’s accounts is compromised.
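The effect of the last recommendation can be sketched as a reachability computation in Python. The model is an assumption made for illustration: an edge from X to Y means that control of X lets an adversary take over Y, for example because Y’s password reset mail arrives in mailbox X. All account names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def reachable_accounts(start: str, controls: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search over 'controls' edges from a compromised account."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in controls.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# With forwarding: reset mail for every site ends up in one hub mailbox.
with_forwarding = {
    "mail:hub": ["site:bank", "site:shop", "site:forum"],
}

# Without forwarding: each site is tied to a separate mailbox.
without_forwarding = {
    "mail:hub": ["site:bank"],
    "mail:alias1": ["site:shop"],
    "mail:alias2": ["site:forum"],
}

exposed_with = reachable_accounts("mail:hub", with_forwarding)
exposed_without = reachable_accounts("mail:hub", without_forwarding)
```

In this toy model, compromising the hub mailbox exposes all three site accounts when forwarding is enabled, but only one when the mailboxes are kept independent.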

D. Identity Management in Future

According to our analysis and discussion in the previous sections, a key issue in handling the problems related to e-mail address as identity is to protect users’ private information on the websites, such as their passwords and answers to secret questions. A feasible solution is for users not to store their e-mail addresses, passwords and answers to secret questions on the websites. Instead, they should be able to provide anonymous credentials when they log into various websites. A potential system employing this mechanism is the Federated Identity system, where a trusted and secure third party is used to store users’ personal information, and cryptographic methods, such as Zero Knowledge Protocol [12], are used to guarantee anonymity and hence protect users’ privacy [13]. However, the high risk of exposing all of the accounts still exists. When a Federated Identity system is compromised by an adversary, all of the users’ accounts on different websites may be compromised. Therefore, such a Federated Identity system must be well designed to achieve a higher level of security.

Another promising identity management system is introduced by Schechter et al. in [14]. They employ a mechanism called social-authentication for identifying users. This system asks trustees (previously appointed by the account holder) to verify the account holder’s identity. The main problem here is how to find appropriate trustees in a timely fashion when users want to log into the service. We believe that this may not be a significant problem in the future due to the rapid development of smart phones and other Internet-connected devices, which may allow the selected trustees to stay connected to the Internet at all times.

VI. CONCLUSION

Using e-mail address as an identity on different websites is becoming a common practice. We believe that this popularity is due to the convenience for users of having to remember only one identity. In addition, it is more convenient for websites to require an e-mail address that will later be used to contact users for services. However, the use of an e-mail address as an identity poses more security and privacy threats than username-based identity. We have demonstrated that the risks are due to several factors: the multiple purposes that e-mail addresses serve, users’ behaviors in using their e-mail address identities and passwords on multiple websites, and the account design schemes used by websites, which, based on our survey and investigation, might not be secure. We have also analyzed existing password management solutions and discussed two potential identity management systems. In future work, we plan to focus on improving and extending these two potential solutions to achieve more practical and more secure identity management systems for dynamic large-scale websites.

ACKNOWLEDGMENT

This research has been supported by the US National Science Foundation award IIS-0545912.

REFERENCES

[1] D. Florencio and C. Herley, “A large-scale study of web password habits”, In Proceedings of the 16th international conference on World Wide Web, 2007, pp. 657-666.

[2] D. Florêncio, C. Herley, and B. Coskun, “Do strong web passwords accomplish anything?”, In Proceedings of the 2nd USENIX workshop on Hot topics in security, 2007, pp. 1-6.

[3] R. Perlman and C. Kaufman, “User-centric PKI”, In Proceedings of the 7th symposium on Identity and trust on the Internet, 2008, pp. 59-71.

[4] M. Just and D. Aspinall, “Personal choice and challenge questions: a security and usability assessment”, In Proceedings of the 5th Symposium on Usable Privacy and Security, 2009, pp. 1-11.

[5] L. Jin, “Investigations of Users' and Websites' Behaviors”, www.sis.pitt.edu/~leijin/investigation.htm.

[6] DeadMan's Handle Ltd, “DeadMan's Handle and Cryptography”, http://www.deadmanshandle.com/papers/DMHAndCryptology.htm

[7] K.M. Everitt, T. Bragin, J. Fogarty and T. Kohno, “A comprehensive study of frequency, interference, and training of multiple graphical passwords”, In Proceedings of the 27th international conference on Human factors in computing systems, 2009, pp. 889-898.

[8] J.L. Wayman, “Biometrics in Identity Management Systems”, IEEE Security and Privacy, 2008. 6(2): pp. 30-37.

[9] S. Lee and K.M. Sivalingam, “An efficient One-Time Password authentication scheme using a smart card”, Int. J. Secur. Netw., 2009. 4(3): pp. 145-152.

[10] K.C. Liao, W.H. Lee, M.H. Sung and T.C. Lin, “A One-Time Password Scheme with QR-Code Based on Mobile Phone”, In Proceedings of the 5th International Joint Conference on INC, IMS and IDC, 2009, pp. 2069-2071.

[11] J.A. Halderman, B. Waters and E.W. Felten, “A convenient method for securely managing passwords”, In Proceedings of the 14th international conference on World Wide Web, 2005, pp. 471-479.

[12] E. Bertino, F. Paci, R. Ferrini and N. Shang, “Privacy-preserving Digital Identity Management for Cloud Computing”, IEEE Data Eng, 2009. 32(1): pp. 21-27.

[13] J. Camenisch and E. V. Herreweghen, “Design and implementation of the idemix anonymous credential system”, In Proceedings of the 9th ACM conference on Computer and communications security, 2002, pp. 21-30.

[14] S. Schechter, S. Egelman, and R.W. Reeder, “It's not what you know, but who you know: a social approach to last-resort authentication”, In Proceedings of the 27th international conference on Human factors in computing systems, 2009, pp. 1983-1992.

[15] E. Zheleva and L. Getoor, “To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles”, In Proceedings of the 18th international conference on World Wide Web, 2009, pp. 531-540.

[16] Salt (Cryptography), http://en.wikipedia.org/wiki/Salt_cryptography.