

3.6 Data Store

The Data Store has two main functions. The first is to store information about a user’s namespace, their ASIs, and their contacts’ ASIs. The second is to act as the point at which a user’s contacts can resolve the user’s SGN to its result set. This section deals primarily with the first function; the second is discussed in the subsequent section.

The identifier exchange outlined in Section 3.5 results in the exchange of Sobriquet Global Names and in the addition of each party to the other’s namespace. When two people communicate, there is a step in which each participant in the communication session authenticates the other. This authentication step requires knowledge of private key values, which the user needs on the machine they are communicating from. Since this can be any machine they use, and may change frequently, they need access to this information and to their contacts’ ASIs from wherever they communicate.

This information is divided into two tables in the Data Store’s database. The first is the user table, which contains the information tied to a specific SGN: the SGN itself, the group manager’s private key that allows new members into a Contact Group, a privacy flag, and the result set for that SGN. The group manager’s private key and the encryption keys are stored in encrypted form. Section 3.6.1 details how these records are encrypted and how keys are managed.

Clients can access this information via the name resolution mechanism. Access is controlled through the "Private" flag, as shown in Fig. 3.5. If this flag is set, a user must authenticate themselves using the protocol outlined in Section 3.4.1 in order to gain access to this information. This authentication is possible because the Data Store can verify messages signed by clients using their group member private key. However, the Data Store will be unable to open the signature and obtain the identity of this client.

Fig. 3.5: The data stored by the Data Store. PK signifies a primary key, and E signifies an encrypted record.

The second database table of significance used by the Data Store is the contacts table. There is a has-many relationship between users and contacts; that is, a user has many contacts. This relationship is depicted in Fig. 3.5. The information stored in this table is as follows: the SGN of the contact, their Data Store URI, the group member’s private key that allows the user to prove to the SGN that they are a member of the corresponding Contact Group, the Sobriquet Memorable Name that has been assigned to the contact, and the contact’s result set. Of this information, the tuple of the private key and the memorable name is stored in encrypted form. The contact table is available only to the user, and only once they have sufficiently authenticated themselves as outlined in Section 3.6.1.
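The two tables described above can be pictured with the following minimal sketch. The class and field names are hypothetical, the choice of the SGN as primary key is an assumption based on the indexing described in the text, and the authentication step of Section 3.4.1 is reduced to a boolean for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ContactRecord:
    sgn: str                  # the contact's SGN (assumed key within a user's contacts)
    data_store_uri: str       # where the contact's own Data Store lives
    encrypted_secret: bytes   # E: (group member private key, memorable name), opaque here
    result_set: List[str]     # the contact's ASIs


@dataclass
class UserRecord:
    sgn: str                      # the user's own SGN (assumed primary key)
    encrypted_manager_key: bytes  # E: group manager private key, opaque here
    private: bool                 # the "Private" flag
    result_set: List[str]         # the ASIs this SGN resolves to
    contacts: Dict[str, ContactRecord] = field(default_factory=dict)  # has-many


def resolve(user: UserRecord, authenticated: bool) -> Optional[List[str]]:
    """Return the result set, or None when the Private flag blocks access."""
    if user.private and not authenticated:
        return None
    return list(user.result_set)
```

The Data Store never needs to interpret the encrypted fields, which is why they are modelled as plain byte strings.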

On the client side the user will have a Data Store Browser, a piece of software that communicates with the Data Store, or with multiple Data Stores if the user so wishes. It is responsible for synchronising the state on a device with the most up-to-date state in each of the Data Stores. The most recent state is indicated by a counter value that increments each time an update takes place and which is stored in the Data Store alongside each SGN to ASI set mapping. The Data Store Browser will maintain a long-term cache on devices the user owns and a temporary cache on devices the user does not own. To access the Data Store the user will supply the Data Store Browser with a username and password. The username will be in the form of an email address, which the software will parse into username and host components by splitting the string on the "@" character.
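As a minimal sketch of the parsing step, the function below splits a login identifier into its username and host components. The function name is hypothetical, and splitting on the last "@" (rather than the first) is an assumption made here so that a username containing "@" does not corrupt the host part.

```python
from typing import Tuple


def parse_login(identifier: str) -> Tuple[str, str]:
    """Split 'user@host' into (username, host)."""
    # rpartition splits on the last "@"; sep is empty when no "@" is present.
    username, sep, host = identifier.rpartition("@")
    if not sep or not username or not host:
        raise ValueError("expected an identifier of the form user@host")
    return username, host
```

For example, `parse_login("alice@store.example.org")` yields the username `alice` and the host `store.example.org`, which the Browser could then use to locate the Data Store.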

Each Data Store Browser maintains a persistent connection to the Data Store. When this is not possible it may poll periodically. When a connection to the Data Store is opened, the Browser will send the last timestamp it received from the Store. If there are any updates, the Store will send a new timestamp along with the updated information. If there are no updates, the Store will reply with the timestamp it was sent. The Data Store will then push information to the Browser as updates occur. Since each item stored in the Data Store is indexed according to an SGN, each update will be also. The full new record will be sent with each update, and the cached record will be overwritten if the timestamp of the received update is greater than the one present in the cache. The Data Store treats encrypted data as a binary string. This string will be encoded according to a suitable encoding scheme, such as Base64, before it is sent between the Store and the Browser.
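The overwrite rule and the Base64 encoding can be sketched as follows. This is an illustration under assumptions: the `Update` and `BrowserCache` names are hypothetical, the timestamp is modelled as an integer, and the wire format is not specified by the text.

```python
import base64
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Update:
    sgn: str           # updates are indexed by SGN, like the stored items
    timestamp: int     # assumed to be an integer that only increases
    payload_b64: str   # Base64 text of the full, opaque (encrypted) record


def encode_update(sgn: str, timestamp: int, record: bytes) -> Update:
    """Store side: wrap the binary record for transmission."""
    return Update(sgn, timestamp, base64.b64encode(record).decode("ascii"))


class BrowserCache:
    def __init__(self) -> None:
        self.records: Dict[str, Tuple[int, bytes]] = {}  # sgn -> (timestamp, record)

    def apply(self, update: Update) -> bool:
        """Overwrite the cached record only if the update is strictly newer."""
        current = self.records.get(update.sgn)
        if current is not None and update.timestamp <= current[0]:
            return False  # stale or duplicate update; keep the cached record
        record = base64.b64decode(update.payload_b64)
        self.records[update.sgn] = (update.timestamp, record)
        return True
```

Because the full record is sent each time, applying an update is a simple replacement and needs no merging logic on the Browser.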

3.6.1 Authentication and Encrypted Storage

The password the user enters into the Data Store Browser will be put through a key derivation function, such as PBKDF2 [77], from which an authentication token and an encryption key will be derived. Passwords should be chosen to be of sufficient length and randomness. While remembering longer passwords is difficult for most people, we believe that this is an acceptable trade-off given that it is the one piece of state that the user needs to replicate themselves. The authentication token will be used to authenticate the user at the Data Store and the encryption key will be used to encrypt the records stored there. To obtain these two values the key derivation function will be applied twice. The encryption key will be derived from a value obtained by iterating the key derivation function m times and the authentication token will be derived by iterating n times. So long as m < n, the encryption key cannot be derived from the authentication token, assuming the key derivation function uses an adequate one-way function for each iteration. The values of m and n should be defined as global constants; their values are an implementation issue that depends on the key derivation function used. The salt for the key derivation function is stored at the Data Store and obtained during the authentication process. We assume that the authentication procedure takes place over a secure transport protocol such as TLS. We advocate, though do not mandate, that the Secure Remote Password protocol [122] is used for authentication over this channel, with the authentication token used as the SRP password. With SRP, even if the TLS channel is compromised the attacker will learn nothing of interest from the login exchange.
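One way to read the m < n construction is as a chain of key derivation rounds, with the encryption key taken from the state after m rounds and the authentication token from the state after n rounds. The sketch below follows that reading and is not a definitive implementation: the round counts, the PBKDF2 iteration count, the HMAC labels, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import hmac
from typing import Tuple

M_ROUNDS = 2                # hypothetical value of m
N_ROUNDS = 4                # hypothetical value of n; must satisfy m < n
PBKDF2_ITERATIONS = 10_000  # hypothetical work factor per round


def _kdf_round(state: bytes, salt: bytes) -> bytes:
    """One application of the key derivation function (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", state, salt, PBKDF2_ITERATIONS)


def derive(password: str, salt: bytes) -> Tuple[bytes, bytes]:
    """Return (encryption_key, authentication_token) for a password and salt."""
    if not M_ROUNDS < N_ROUNDS:
        raise ValueError("m must be strictly less than n")
    state = password.encode("utf-8")
    enc_key = b""
    for i in range(1, N_ROUNDS + 1):
        state = _kdf_round(state, salt)
        if i == M_ROUNDS:
            # Key taken from the earlier point in the chain.
            enc_key = hmac.new(state, b"encryption", hashlib.sha256).digest()
    # Token taken from the later point; reversing the rounds in between
    # would require inverting the one-way function.
    auth_token = hmac.new(state, b"authentication", hashlib.sha256).digest()
    return enc_key, auth_token
```

In this reading only the authentication token (or an SRP verifier derived from it) ever reaches the Data Store, while the encryption key stays on the user's device.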

3.6.2 Data Store Portability

Since we do not want the Data Store to become a component that the user is tied to, we allow them to change to a new Data Store at any time. This process requires that they put a special record in their user table under each of their SGNs, which points to their new Data Store. This record is signed with their group manager private key. This is done only once the new Data Store’s tables have been populated with the existing data from one of the user’s devices. Since this process requires user intervention, there is no need to ensure that multiple devices are not updating the Data Store at the same time.

3.6.3 Scalability

Data Store usage should scale linearly. Entries in the Data Store are indexed and queried by SGN. This allows the data to be sharded easily along that database column with no cross-shard queries, so capacity should grow roughly linearly with the number of shards. Additionally, the amount of state stored per user is not very large. Since each query ought to be carried out at roughly the same frequency per user, it is unlikely that intelligent caching could keep just the most heavily queried data in memory. However, storing a subset of the data, namely the vector clock value and a hash of the SGN, would heavily reduce the amount of data that needs to fit in memory. A 16 or 32 bit integer value should suffice for the vector clock value. Alongside a hash of the corresponding SGN (assuming 160 bits) this comes to at most 24 bytes per user, so the values for 1 billion users (roughly 24 GB) could easily fit in memory on a single machine. By adding more machines this would scale linearly. Reads and writes could additionally benefit from the addition of solid state storage and the use of database replication. Expecting updates to happen at a frequency of about one per day per user should overestimate the requirements, since pushing an update requires user involvement in changing their identifiers or adding a new contact, which seems unlikely to happen on a daily basis. To support 10^9 users performing one update a day, assuming an even spread, would require an infrastructure capable of handling about 11,600 requests per second. This figure should be easy to support with a manageable number of web servers and databases on current hardware.
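The figures in this estimate can be worked through directly from the stated sizes. The short calculation below assumes the larger 32-bit counter, a 160-bit hash, and decimal gigabytes; the variable names are chosen here for illustration.

```python
USERS = 10**9                  # one billion users
COUNTER_BITS = 32              # upper choice for the vector clock value
HASH_BITS = 160                # hash of the SGN
SECONDS_PER_DAY = 24 * 60 * 60

# In-memory index: counter plus hash for every user.
bytes_per_user = (COUNTER_BITS + HASH_BITS) // 8   # 24 bytes
total_gb = USERS * bytes_per_user / 10**9          # 24.0 GB (decimal)

# One update per user per day, spread evenly across the day.
updates_per_second = USERS / SECONDS_PER_DAY       # about 11,574

print(bytes_per_user, total_gb, round(updates_per_second))
```

With a 16-bit counter the per-user figure drops to 22 bytes, so the 32-bit case is the upper bound for the memory estimate.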
