TRUSTED ARCHIVE OVERVIEW

Enterprise communications have experienced explosive growth, driven by the continuous shift to digital workflows and the pervasive use of social media, mobile and web channels for customer engagement. As increasingly complex record-keeping requirements are introduced by legal authorities, staying compliant across all communication channels continues to be a challenge for enterprises.

One of the main components of compliance is record keeping, the process of ensuring that data is accessible in an archive for future use. To preserve the integrity of archived data, a mechanism must be in place to guarantee that data is not corrupted or lost. This mechanism is reconciliation, i.e., the process of establishing consistency among data from a source to a target store and vice versa.

Reconciliation ensures that a set of records is authentic and correct versus a “golden copy,” i.e., that they match in digital signatures (e.g., checksums, fingerprints) and in record counts. The process must be efficient enough to handle the hundreds of terabytes of data and hundreds of millions of records typically found in a compliance archive.
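As a minimal sketch of what "matching in digital signatures and in record counts" can mean, the following compares a set of records against a golden copy. The function names and the choice of SHA-256 are assumptions made for illustration; the source does not say which fingerprint algorithm is used.

```python
import hashlib


def fingerprint(record: bytes) -> str:
    """Digital signature of one record (SHA-256 is an assumed choice)."""
    return hashlib.sha256(record).hexdigest()


def matches_golden_copy(golden: list[bytes], archived: list[bytes]) -> bool:
    """True if both sets agree in record count and in every fingerprint."""
    if len(golden) != len(archived):
        return False  # record counts differ
    # Sort the fingerprints so that storage order does not affect the result.
    return sorted(map(fingerprint, golden)) == sorted(map(fingerprint, archived))
```

This direct comparison needs every fingerprint from both sides, so its cost grows with the number of records. That is the cost a hash tree is meant to reduce.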

The challenge is to find or develop a mechanism that both detects discrepancies in large datasets in a space- and time-efficient way and offers a means of resolving such discrepancies in a timely manner.

The figure below describes the Bloomberg Vault reconciliation process. This process employs a sophisticated data sequence and hash tree (Merkle Tree) technology to efficiently compare and synchronize a range of data objects between client and cloud.

1. Transmit Data: data local to the client is transmitted to the cloud-based archive, where it is ingested and stored.
2. Determine Range: a range of data, determined by a count or time threshold, is selected from the local store by the client to be reconciled with the cloud.
3. Create Sequence: the client uses the sequence embedded in the objects in the data range to bring the objects into a deterministic order.
4. Calculate Hash Tree: a hash signature is calculated for each object. The hash signatures are used as leaf nodes to generate a tree of hashes.
5. Retrieve Data: the cloud retrieves the objects for a given data range from its archive.
6. Determine Range, Use Sequence and Retrieve Data: the cloud performs the same sequencing and hash tree calculation tasks as described in steps 2–4.
7. The Reconciliation Process: reconciliation starts with a comparison of the root values of the hash trees generated for a given data range. A match for a data range digitally certifies that the corresponding dataset is the same between client and cloud. If the values don’t match, the process recursively traverses the hash tree to find the corrupted objects, i.e., those for which the hash signatures don’t match. These objects are retransmitted in a subsequent step.
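The sequencing, hash tree and comparison steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not Bloomberg's implementation: SHA-256, the pairing rule for odd-sized levels, and the names `build_tree` and `find_mismatches` are all hypothetical, and both sides are assumed to hold the same number of objects for the range.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(objects: list[tuple[int, bytes]]) -> list[list[bytes]]:
    """Return the tree as a list of levels: leaves first, root last.

    Each object is a (sequence number, payload) pair.
    """
    if not objects:
        raise ValueError("data range is empty")
    ordered = sorted(objects, key=lambda obj: obj[0])    # deterministic order
    level = [sha256(payload) for _, payload in ordered]  # leaf hashes
    levels = [level]
    while len(level) > 1:
        # Hash adjacent pairs; an unpaired last node is hashed on its own.
        level = [sha256(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def find_mismatches(client, cloud, depth=None, index=0) -> list[int]:
    """Return leaf positions (in sequence order) whose hashes differ."""
    if len(client[0]) != len(cloud[0]):
        raise ValueError("record counts differ; compare counts first")
    if depth is None:
        depth = len(client) - 1  # start at the root
    if client[depth][index] == cloud[depth][index]:
        return []                # the whole subtree is identical
    if depth == 0:
        return [index]           # a corrupted object
    mismatches = []
    for child in (2 * index, 2 * index + 1):
        if child < len(client[depth - 1]):
            mismatches += find_mismatches(client, cloud, depth - 1, child)
    return mismatches
```

When the roots match, a single comparison certifies the whole range. When they differ, only the subtrees that contain a differing leaf are visited.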

[Figure: The Bloomberg Vault Reconciliation Process. Panels: Within Client, Within Cloud. Labels: Data, Archive, Compare Root Node, Compare Sub-Nodes, Retransmit Missing Data.]

Bloomberg Vault reconciliation uses an efficient protocol between a client and the cloud-based archive to execute the reconciliation process. The protocol is implemented by a client component that runs as part of an enterprise communication or collaboration service and by a hosted service offered as part of Bloomberg’s cloud.

Clients that participate in the reconciliation process maintain a copy of all data that needs to be reconciled. This local copy can be a file share, a mail server inbox or the local store of an SMTP server; it is often called the “golden” copy. The size of the golden copy depends on the resiliency requirements of the customer, but once the data has been reconciled, it can be destroyed and the space reused.

The example below depicts the implementation and protocol using Microsoft Exchange Server® and a plugin.

The Reconciliation Client Plugin in Exchange maintains the hash values and sequence numbers for all email communications that are transmitted to the cloud for archiving purposes. At the end of a configurable period (e.g., end of day), the Plugin starts the reconciliation process for the range of data transmitted.

The first step is to sequence the hash values for each of the individual emails in the data range to be reconciled. After all emails have been sequenced, digital signatures and the hash tree are generated, and the Plugin sends a Start Reconciliation message to the cloud service.
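One way to picture the Plugin's bookkeeping is a small ledger that records a sequence number and hash for each transmitted email and later returns the hashes for a time range in sequence order. The `Ledger` class, its fields and the use of SHA-256 are hypothetical; the source does not describe the Plugin's internal data structures.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Hypothetical record of what the Plugin has transmitted."""
    entries: list[tuple[int, float, bytes]] = field(default_factory=list)  # (seq, sent_at, hash)
    next_seq: int = 1

    def record(self, message: bytes, sent_at: float) -> int:
        """Store the hash and sequence number of a transmitted email."""
        seq = self.next_seq
        self.entries.append((seq, sent_at, hashlib.sha256(message).digest()))
        self.next_seq += 1
        return seq

    def range_hashes(self, start: float, end: float) -> list[bytes]:
        """Hashes of emails sent in [start, end), ordered by sequence number."""
        selected = [(seq, h) for seq, sent_at, h in self.entries if start <= sent_at < end]
        return [h for _, h in sorted(selected)]
```

The ordered list returned by `range_hashes` would serve as the leaf level of the hash tree for that data range.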

The cloud service, once it receives the Start Reconciliation message, retrieves all email messages that have been ingested, stored and indexed on behalf of the client for a given data range. It performs the same sequencing and hash tree generation as the Plugin and notifies the Plugin when it is ready to reconcile for a given data range.

After the Start Reconciliation steps are completed, the Client Plugin and Hosted Service exchange hash tree nodes within a reconciliation process that efficiently compares data between the two systems. The process either digitally certifies that the transmission was correct—the root hash nodes match—or it progresses until the corrupted or missing email messages are identified.

[Figure: Implementation and Protocol Example. The Reconciliation Client Plugin (Microsoft Exchange Server, local store) talks to the Reconciliation Service (Bloomberg Vault: ingestion, storage, data management, search, reporting, analytics) over the Reconciliation Protocol. Both sides show a Protocol Engine, Hash Tree Generator, Reconciliation Engine and Sequence Generator; further labels: Archiving, Range Generator, Admin. Protocol messages: Start Reconciliation, Data Range, Get Node Hash, Node Hash Value, Stop Reconciliation.]
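The message names in the figure suggest a request/response exchange driven by the client. The sketch below simulates such an exchange in memory. The message encoding, the `Service` class and the `reconcile` function are assumptions for illustration only; the actual wire protocol is not specified in the source. Both sides are assumed to hold the same non-zero number of leaf hashes for the range.

```python
import hashlib


def build_levels(leaf_hashes: list[bytes]) -> list[list[bytes]]:
    """Hash tree as a list of levels, leaves first and root last."""
    levels = [leaf_hashes]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([hashlib.sha256(b"".join(prev[i:i + 2])).digest()
                       for i in range(0, len(prev), 2)])
    return levels


class Service:
    """Stand-in for the hosted side; answers protocol messages from memory."""

    def __init__(self, stored: dict[str, list[bytes]]):
        self.stored = stored  # data range id -> leaf hashes in sequence order
        self.levels = None

    def handle(self, message: str, *args):
        if message == "START_RECONCILIATION":  # args: (data range id,)
            self.levels = build_levels(self.stored[args[0]])
            return "READY"
        if message == "GET_NODE_HASH":         # args: (depth, index)
            depth, index = args
            return ("NODE_HASH_VALUE", self.levels[depth][index])
        if message == "STOP_RECONCILIATION":
            self.levels = None
            return "STOPPED"
        raise ValueError(f"unknown message: {message}")


def reconcile(client_leaves: list[bytes], service: Service, data_range: str) -> list[int]:
    """Return the leaf positions that must be retransmitted."""
    local = build_levels(client_leaves)
    service.handle("START_RECONCILIATION", data_range)
    bad, pending = [], [(len(local) - 1, 0)]  # begin with the root node
    while pending:
        depth, index = pending.pop()
        _, remote = service.handle("GET_NODE_HASH", depth, index)
        if remote == local[depth][index]:
            continue                          # subtree certified as identical
        if depth == 0:
            bad.append(index)
        else:
            pending += [(depth - 1, child) for child in (2 * index, 2 * index + 1)
                        if child < len(local[depth - 1])]
    service.handle("STOP_RECONCILIATION")
    return sorted(bad)
```

An empty result corresponds to the case where the root hashes match; a non-empty result lists the emails to retransmit or to report to the operator.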

Based on its configuration, the process either alerts the operator to the error condition or starts a retransmission of the missing or corrupted data. At the end of a successful reconciliation process, the client Plugin can safely remove all data stored locally.

We have used the Trusted Archive technology to secure the ingestion of hundreds of terabytes of client data; it has proven invaluable in maintaining our 24x7 operation with automatic end-to-end reconciliation and in helping our customers stay compliant.

We can build a hash tree at a rate of about 2,840 documents/second on each side (enterprise and cloud); the actual tree comparison across a private line takes less than 10 seconds, including wire delay. For a typical dataset (500,000 messages across 2 machines at 3,000 messages/second), the reconciliation process takes under 100 seconds.
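The "under 100 seconds" figure can be checked with a rough calculation. This sketch assumes that 3,000 messages/second is a per-machine rate, that the client and cloud build their trees in parallel, and that the comparison takes the full 10-second upper bound; none of these readings is confirmed by the source, and the function name is hypothetical.

```python
def estimate_seconds(messages: int, machines: int,
                     rate_per_machine: float, compare_seconds: float) -> float:
    """Tree-building time across all machines plus tree comparison time."""
    build_seconds = messages / (machines * rate_per_machine)
    return build_seconds + compare_seconds


# 500,000 / (2 * 3,000) is about 83.3 s of tree building, plus at most 10 s to compare.
typical = estimate_seconds(500_000, 2, 3_000, 10)
```

Under these assumptions the total comes to roughly 93 seconds, which is consistent with the stated bound.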

This compares favorably with the typical process employed in the industry, i.e., transferring counts, IDs and fingerprints to the archive and looping over the stored objects to compare them, a process that, at best, takes days to complete and involves manual steps.

The data included in these materials are for illustrative purposes only. ©2014 Bloomberg L.P. All rights reserved. S535654726 1214 DIG

TAKE THE NEXT STEP

Learn more about Bloomberg Vault and archive reconciliation at massive scale using Merkle Trees. Contact us at:

BEIJING +86 10 6649 7500
DUBAI +971 4 364 1000
FRANKFURT +49 69 9204 1210
HONG KONG +852 2977 6000
LONDON +44 20 7330 7500
MUMBAI +91 22 6120 3600
NEW YORK +1 212 318 2000
SAN FRANCISCO +1 415 912 2960
SÃO PAULO +55 11 2395 9000
SINGAPORE +65 6212 1000
SYDNEY +61 2 9777 8600
TOKYO +81 3 3201 8900

bloomberg.com/vault
