To meet the needs of most compliance situations, it is a fundamental requirement that a copy of every mail is passed to the archive before it reaches the recipient. This ensures that documents cannot be tampered with or deleted before they have been stored in the archive. In compliance situations, this is a legal requirement.
To provide the journaling facility with Enterprise Vault, journaling must first be enabled within Exchange. For a general description of Exchange journaling, see the Exchange Journaling section.
The Enterprise Vault journaling task then works by ingesting items from the Exchange Journal Mailbox into the archive.
Figure 9. Process flow
Journaling is also a prerequisite for discovery accelerator. In any implementation where compliance is a consideration, an Enterprise Vault journal archiving server and associated infrastructure (including an Exchange Journal Mailbox server) must be implemented. For small implementations, where resources permit, the journaling task may be run on an existing Enterprise Vault server that handles other workloads, but this is not recommended. In larger implementations where journaling is required, a dedicated Enterprise Vault journal archiving server is needed, and this is the standard configuration for the HP M&C Archiving offering. For more information, see the Estimating journaling resource requirements section.
Note
The HP M&C Archiving solution is designed to be a “best fit” for most situations. In small implementations, depending on usage and customer requirements, the Enterprise Vault journal archiving components and the SQL Server can all be installed on one physical server. Such solutions may be appropriate on the grounds of cost but must be considered custom.
It is a best practice to centralize the Exchange Journal Mailbox servers and the Enterprise Vault journal archiving servers within the Enterprise Vault environment, even if there is more than one Enterprise Vault server or location. This strategy makes the best use of storage and allows for better deduplication of journaled mails. The HP M&C Archiving offering is always predicated on a centralized environment, so only one Enterprise Vault journal archiving server is required.
However, when there is heavy load, additional Enterprise Vault journal archiving servers might be needed, and the spreadsheet in the section Estimation tool for journaling can be used as a “rule of thumb” estimator for the number of Enterprise Vault journal archiving servers that will be required.
Within Enterprise Vault, an Exchange Journaling task performs the archiving. One of these tasks can service multiple Exchange Journal Mailboxes, located on multiple Exchange Journal Mailbox servers, although a single Exchange Journal Mailbox is recommended. Exchange Journaling tasks run under the control of the Task Controller Service.
When properly configured, the Exchange system journals messages for selected users (custodians) and sends the journal reports to the journal mailbox. The journal report combines a message wrapper (called an envelope), which retains the sender and recipient information, including BCC, with a copy of the original message. Items in journal mailboxes are deleted after they are archived and replicated onto the HP StoreAll Storage, and no shortcuts are created.
The archiving of journaled emails takes place when the Enterprise Vault server polls the Exchange Journal Mailbox server.
This takes place continually as the journaling task is running on the server. There is a default five-minute delay to help Exchange deduplicate journal reports. Journaling is a real-time activity, so limiting journaling to a few hours during the day is not recommended. The journaling tasks must only be paused for scheduled backups in line with the Enterprise Vault backup procedure and for regular maintenance such as index defragmentation. For more information, see the Backup and recovery functional component detail section. During this period, journaled emails accumulate on the Exchange Journal Mailbox server, so regular maintenance is best scheduled at a time of low user activity.
Note that in situations where journaling is being heavily used in a 24-hours-per-day environment, a regular maintenance window of at least four hours per day must still be allowed for backups and similar daily maintenance tasks. During this period, the Enterprise Vault servers are in a read-only state so that a consistent backup can be taken. During this “maintenance time,” journaled email accumulates on the Exchange Journal Mailbox server. For more information on backup strategy, see the Backup and recovery functional component detail section.
Administrators with access permissions to the journal archives can search for messages. Care must be taken to ensure that proper control is placed over who has access to search the archived data. Built-in auditing enables Enterprise Vault to track all access and changes, so enable this feature.
As the HP M&C Archiving offering is predicated on Exchange 2010 SP2 servers, envelope journaling is always enabled.
The Enterprise Vault Exchange Journaling task processes journaled emails from Exchange Journal Mailbox servers.
Exchange Journal
Estimating journaling resource requirements
Within the HP M&C Archiving offering, the recommendation is to use a single Exchange Journal Mailbox server, with a single Exchange Journal Mailbox. The advantages of this approach are:
• Processing load is minimized because journaled messages only need to be archived from one location.
• Discovery accelerator searches do not return duplicates that can occur due to fan out.
Spare or additional servers may be considered as part of a more resilient design, but this is beyond the scope of the standard offering and would be considered a custom solution.
When a single Exchange Journal Mailbox server with a single Exchange Journal Mailbox is implemented, the usage calculation is relatively simple, because no account needs to be taken of duplicate mails on multiple journal mailboxes. As a simple rule of thumb, the following calculation can be used:
Rule of thumb throughput estimation (based on a dedicated Enterprise Vault journal archiving server)
In the absence of actual data on the number and size of messages to process, the following rules of thumb can be used to estimate the Enterprise Vault journal archiving server requirement.
• The standard HP M&C Archiving server specification (see Table 3: Enterprise Vault Mailbox Archiving server specifications) can process approximately 5 GB per hour into the archive.
• Number of users * Number of mails per user per day * average mail size * deduplication factor gives a rough estimation of the daily data volume to archive.
• Convert this figure into gigabytes and divide by five (the nominal 5 GB per hour throughput) to determine how many hours it will take a single Enterprise Vault journal archiving server to process the data.
• Remember that four hours are required for backup and maintenance so you only have a maximum of 20 hours per day to process data.
• Remember to leave some room for growth over the contract period.
So, in summary, the amount of data to archive per day = number of journaled users * number of messages per user per day * average message size * deduplication factor (0.67).
A standard Enterprise Vault journal archiving server can process a nominal 5 GB per hour.
Calculation on throughput of the Enterprise Vault journal archiving server is more accurate than the calculation based on the number of messages because message size changes over time.
The amount of data archived per day is the most important variable when attempting to calculate the Enterprise Vault journal archiving server requirement.
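The rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not part of the offering: the function names are hypothetical, the example inputs are invented, and the conversion assumes decimal units (1 GB = 1,000,000 KB). The 0.67 deduplication factor, the nominal 5 GB per hour, and the 20-hour processing window are taken from the text.

```python
import math

KB_PER_GB = 1_000_000  # assumption: decimal units (1 GB = 1,000,000 KB)


def daily_archive_volume_gb(users, messages_per_user, avg_message_kb, dedup_factor=0.67):
    """Daily data volume to archive, in GB, using the rule of thumb."""
    total_kb = users * messages_per_user * avg_message_kb * dedup_factor
    return total_kb / KB_PER_GB


def journal_servers_needed(volume_gb, gb_per_hour=5.0, hours_available=20.0):
    """Return (hours needed on one server, number of servers for the daily window)."""
    hours_on_one_server = volume_gb / gb_per_hour
    servers = max(1, math.ceil(hours_on_one_server / hours_available))
    return hours_on_one_server, servers


# Hypothetical inputs, not taken from this guide.
volume = daily_archive_volume_gb(users=10_000, messages_per_user=80, avg_message_kb=75)
hours, servers = journal_servers_needed(volume)
print(f"{volume:.1f} GB per day, {hours:.1f} hours on one server, {servers} server(s)")
```

A growth allowance over the contract period can be applied by multiplying the daily volume by a chosen factor before calling `journal_servers_needed`.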
Example scenario 3:
Back at ACME Widgets, the finance department has discovered that it needs to comply with Sarbanes-Oxley and has made the decision to implement journal archiving and discovery accelerator:
• They have determined that average message size is 75 KB.
• Although users keep 20 mails per day, they actually send 20 and receive 60, so a total of 80 mails per day for journaling.
• Journaling can run for 20 hours per day.
In this scenario, we can now calculate that 45 GB of data is archived per day. At a nominal 5 GB per hour, we can easily process the journal load using one dedicated Enterprise Vault journal archiving server in 9 hours (45 GB / 5 GB per hour), well within the 20 hours available.
Additionally a dedicated discovery accelerator server will be required.
So, in order to meet this new compliance requirement, ACME Widgets has to purchase the following additional hardware:
To gain a more accurate estimation of the throughput for journaling, and in circumstances where there are multiple journal mailboxes, the following approach must be used. There are several key pieces of information to consider. These are:
• The average number of messages to journal archive per workday
• The average size of messages currently (for example a recent three-day sample)
• The number of journal mailboxes that will exist in the Exchange environment
• An estimate of the typical number of internal addresses on a message (this relates to the deduplication factor)
The average number of messages to journal archive can be extrapolated by using the Exchange 2010 SP2 cmdlet Get-MessageTrackingLog. An alternative analysis can be obtained by using the results from Exchange Store Reporter (see symantec.com/business/support/index?page=content&id=TECH60472). However, it is important to remember that these results are not always accurate, because users can move messages into PST files or aggressively delete messages due to low quotas.
To accommodate this, 30 percent is usually added to the number obtained from the Exchange Store Reporter results. If Exchange Store Reporter results are not available, then a general rule of thumb of 100 messages sent or received per day per user will be used. The number of journal mailboxes is important because multiple copies of messages generated within Exchange can be sent to multiple journal mailboxes from which Enterprise Vault archives them.
Although Enterprise Vault has the potential to single-instance the item in the Enterprise Vault Store, the processing power required to archive from multiple journal mailboxes must still be factored in. The average number of internal addresses on a message is important because it allows for the calculation of the number of unique messages to be archived by Enterprise Vault. The multiplier that accounts for multiple journal mailboxes is known as the fan-out factor. The typical number of shared internal addresses on a message is commonly three. Using this value, several fan-out factors have been precalculated based on the potential number of journal mailboxes in a customer’s environment. Table 8 lists common fan-out factor values found in Exchange environments.
Table 8. Common fan-out factor values
Number of journal mailboxes    Number of journal messages generated per message (fan-out factor)
1                              1
2                              1.75
3                              2.11
4                              2.31
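The guide does not state how these values were precalculated. As an assumption only, they are consistent with a simple model in which each of the three internal sharers is equally and independently likely to be journaled to any one of the journal mailboxes, so that the fan-out factor is the expected number of distinct journal mailboxes that receive a copy. The sketch below (hypothetical function name) reproduces the four values in Table 8 under that assumption and can be used to explore other sharer counts.

```python
def fan_out_factor(journal_mailboxes, sharers=3):
    """Expected number of distinct journal mailboxes that receive a copy.

    Assumption (not stated in the guide): each sharer is journaled to one
    mailbox chosen uniformly and independently of the other sharers.
    """
    n = journal_mailboxes
    return n * (1 - (1 - 1 / n) ** sharers)


for n in range(1, 5):
    print(n, round(fan_out_factor(n), 2))
```

Real environments rarely distribute users evenly across journal mailboxes, so measured values should take precedence over this model.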
By gathering the above statistics, you can determine the number of journal messages to archive per day. With the throughput of 5 GB per hour that an Enterprise Vault journal archiving server can process, we can estimate the number of Enterprise Vault journal archiving servers and storage required for a given environment.
Journal archiving runs throughout the day, so the archiving rate must keep up with the message flow during the day. Some backlog is acceptable as long as the journal archiving task can catch up during off-hours and weekends. To determine the number of Enterprise Vault journal archiving servers required, start by dividing the estimated number of messages sent or received per day by the estimated number of internal addresses per message. The resulting number is the unique number of messages to be archived. If there are multiple journal mailboxes, multiply the unique number of messages to be archived by the fan-out factor for the number of journal mailboxes in the environment. This gives you the number of messages to journal archive per day.
Best practice shows that 90 percent of these messages fall within a 16-hour window. Multiply the number of messages to journal archive per day by 90 percent then divide by 16 hours. This is the hourly throughput required for this environment.
Divide the hourly throughput required by the known throughput of the Enterprise Vault journal archiving servers used for the environment to determine the number of EV journal archiving servers required. The formula written out is:
Number of unique messages to archive per day = Number of messages sent or received per day / Estimated number of internal sharers per message
Number of messages to journal archive per day = Number of unique messages to archive per day * Fan-out factor due to multiple journal mailboxes
Throughput required = Number of messages to journal archive per day * 90 percent / 16
Number of EV journal archiving servers required = Throughput required / Known throughput of a single EV server
• Estimated number of internal sharers per message is the average number of internal addresses on a message. Best practice in the absence of customer data is three.
• The fan-out factor is based upon the total number of sharers across all the journal mailboxes that participate in sharing within a Vault Store Group. Read the “Exchange Journaling” section of the Symantec Enterprise Vault 10.0 Performance Guide (symantec.com/docs/DOC4553).
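The formula above can be expressed as a short Python function. This is a sketch with hypothetical names; the defaults (three sharers, 90 percent of traffic in a 16-hour window) come from the text, and the default server throughput of 90,000 items per hour is the 16-core EV server benchmark quoted in the examples of this guide. The input values in the usage lines are invented.

```python
import math


def journal_servers_by_message_count(
    messages_per_day,
    sharers_per_message=3,
    fan_out=1.0,
    server_items_per_hour=90_000,
    peak_share=0.90,
    peak_hours=16,
):
    """Return (items per hour required, fraction of one server, servers to deploy)."""
    unique_messages = messages_per_day / sharers_per_message
    messages_to_archive = unique_messages * fan_out
    required_per_hour = messages_to_archive * peak_share / peak_hours
    fraction = required_per_hour / server_items_per_hour
    return required_per_hour, fraction, max(1, math.ceil(fraction))


# Hypothetical environment: 2,000,000 messages per day, two journal mailboxes.
required, fraction, servers = journal_servers_by_message_count(2_000_000, fan_out=1.75)
print(f"{required:,.0f} items per hour, {fraction:.2f} of one server, deploy {servers}")
```

Passing a lower `server_items_per_hour` (for example, 75 percent of the benchmark) models a more conservative ingestion rate or a larger average message size.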
Example 1: If a customer has 1,000,000 messages sent or received per day (assuming a standard message size of 70 KB) and will be implementing one journal mailbox, the number of Enterprise Vault journal archiving servers is calculated as:
Number of unique messages to archive per day = 1,000,000/3 = 333,333
Divided by 3 as the number of internal addresses per message, in the absence of a customer estimate.
Number of journaled messages to archive per day = 333,333 * 1 = 333,333
Multiply the number of unique messages to archive per day by a fan-out factor of 1, which correlates to one journal mailbox in the table above. If there are two Exchange Journal Mailboxes, the fan-out factor will be 1.75.
Throughput required = 333,333 * 90%/16 = 18,750
You could also add a growth factor to account for increased mail size and increased number of mails over a period of years that corresponds to the life of the hardware (normally three years).
Number of Enterprise Vault journal archiving servers required = 18,750/(90,000 * 75%) = 0.28
This is based on an ingestion rate of 67,500 items per hour, which represents 75 percent of 90,000 for an average message size of 70 KB. If the message size increases, this rate gets lower.
Number of Enterprise Vault journal archiving servers required = 1 (rounded up from 0.28)
The throughput required is divided by 67,500 items per hour, which is 75 percent of the 90,000 items per hour benchmark for a 16-core EV server.
Example 2: The customer has 5,000 users and has not been able to run Exchange Store Reporter, and does not have any other means to determine the number of messages sent or received per day per user. The rule of thumb value of 100 messages sent or received per day per user is applied.
The total number of messages sent or received per day is equal to the number of users multiplied by the rule of thumb value.
Total number of messages sent or received per day = 5,000 *100 = 500,000 messages sent or received per day.
The number of unique messages is calculated by taking the total number of messages sent or received per day divided by the average number of sharers (rule of thumb is 3).
Total number of unique messages to archive per day = 500,000/3 = 166,667
As a rule of thumb, we estimate that 90% of these messages are received within a 16-hour window. Therefore, we can estimate the required throughput of the journal archiving server.
Throughput required = 166,667 * 90%/16 = 9,375 items per hour
Number of Enterprise Vault journal archiving servers required = 9,375/90,000 = 0.10
The 90,000 rate is based on an average message size of 70 KB. If the message size increases, this rate gets lower.
Number of Enterprise Vault journal archiving servers required = 1 (rounded up from 0.10)
The throughput required is divided by 90,000 items per hour, the benchmark for a 16-core EV server.
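The choice of starting figure used in Example 2 can also be sketched in code. The function name is hypothetical; the 30 percent uplift on Exchange Store Reporter results and the fallback of 100 messages sent or received per user per day are the rules of thumb given earlier in this section.

```python
def estimate_messages_per_day(users, store_reporter_count=None,
                              uplift=0.30, fallback_per_user=100):
    """Estimate messages sent or received per day.

    Uses the Exchange Store Reporter count plus an uplift when it is
    available; otherwise falls back to a per-user rule of thumb.
    """
    if store_reporter_count is not None:
        return round(store_reporter_count * (1 + uplift))
    return users * fallback_per_user


# Example 2: 5,000 users and no Exchange Store Reporter results.
print(estimate_messages_per_day(5_000))
```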
The HP M&C Archiving recommendation is to always use a dedicated Enterprise Vault Journaling server where journaling is a requirement.
For the purposes of this guide, the journaling archiving requirement is treated as a separate entity. In a production environment, Exchange Journaling must be considered in conjunction with other archiving requirements such as Mailbox Archiving and discovery accelerator.
Complex environments
Example scenario 4:
ACME Widgets is expanding. They have just acquired a smaller competitor that makes components for the car industry and the US military, and they want to implement Enterprise Vault Mailbox Archiving and journal archiving to meet compliance requirements for the new company.
• The data from the new company must be kept separate from ACME Widgets, as they are separate legal entities.
• They have 2,000 mailboxes, 800 in Detroit, 700 in the UK, and 500 in Germany.
• They have centralized data centers in Detroit and Frankfurt.
• They want journaling for all UK and US users, and mailbox archiving for all 2,000 users.
• They want discovery accelerator to monitor users in the UK and the US (not permitted in Germany [Deutschland]).
How many EV servers?
• Two Exchange Journal Mailbox servers, one in the US and one in Europe for UK data (US military data cannot be exported from the US, personal data cannot be collected in Germany, and data that can identify an individual cannot be exported from the UK to the US for processing)
• Two Enterprise Vault journal archiving servers (One for US data and one for UK data)
• Two discovery accelerator servers (US data must remain in US and German data is excluded)
• Two Enterprise Vault Mailbox Archiving servers (Detroit and Frankfurt data centers)
• Two SQL Servers (64-bit) (1 for each data center location)
• Four HP StoreAll Storage, two in Europe and two in the US (including redundancy)
Total = 10 servers
Why?
• Because of US laws on export of military data, separate Exchange Journal Mailbox servers are required in the US and Europe
• Likewise, two Enterprise Vault journal archiving servers are also required: one for US data that has to remain in the US, and one in the European data center for the UK data that cannot be exported to the US. As already mentioned, German data cannot be collected because of German privacy laws.
• For the same reasons as above, two Enterprise Vault Mailbox Archiving servers are also required. One in the US and one in Europe.
• Two 64-bit SQL Servers are required. In each location, we require a separate dedicated SQL Server.
In the above examples, there is no issue with capacity. A single Enterprise Vault server could easily handle the total expected load. However, care must be taken to understand the customer environment before giving estimations on server numbers. The Estimation tool for journaling section can be used to provide an estimation of the likely resources needed for journaling, but legal requirements between different countries are extremely complex and must always be taken into consideration. The customer must advise us on what legal requirements must be met.
Rules of thumb (based on Symantec experience)
A 16-core processor EV server could in theory support journal archiving for about 60,000 typical mailboxes. In practice, a more realistic rate of 45,000 (60,000 * 75 percent) should be used (based on Managed Messaging usage profile and message numbers). So, in most circumstances, if based on throughput alone, a single Enterprise Vault journal archiving server suffices.
• A single 64-bit SQL Server instance can support up to eight EV servers.
• A separate SQL Server should always be used for discovery accelerator.
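As a rough illustration, these rules of thumb can be combined into a first-pass estimate. The function and constant names are hypothetical, and the sketch considers throughput only; it ignores the legal and geographic constraints discussed in example scenario 4, which often dominate the final server count.

```python
import math

THEORETICAL_MAILBOXES_PER_SERVER = 60_000  # 16-core EV server, in theory
REALISTIC_FRACTION = 0.75                  # gives 45,000 mailboxes in practice
EV_SERVERS_PER_SQL_INSTANCE = 8


def rule_of_thumb_sizing(journaled_mailboxes, discovery_accelerator=False):
    """First-pass count of EV journal archiving servers and SQL Server instances."""
    realistic_capacity = int(THEORETICAL_MAILBOXES_PER_SERVER * REALISTIC_FRACTION)
    ev_servers = max(1, math.ceil(journaled_mailboxes / realistic_capacity))
    sql_instances = math.ceil(ev_servers / EV_SERVERS_PER_SQL_INSTANCE)
    if discovery_accelerator:
        sql_instances += 1  # discovery accelerator always gets its own SQL Server
    return {"ev_journal_servers": ev_servers, "sql_instances": sql_instances}


# Hypothetical environment: 100,000 journaled mailboxes with discovery accelerator.
print(rule_of_thumb_sizing(100_000, discovery_accelerator=True))
```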
Required metrics
The following are the required metrics that need to be considered when conducting a sizing exercise for Exchange Journaling and are discussed in detail below:
• The number of journaled users
• The average daily number of emails received per user
• Average size of email
The following information is required to design the journal archiving policies:
• How long will email be kept?
• What are the business goals of the retention policy?
• When will email be removed from Enterprise Vault?
• Is there a department-level (human resources, legal) retention policy?
• Will email be retained indefinitely?
• Are there country-specific data protection and privacy laws?
The following information is required to determine audit requirements: