When Content Reporter loads log data into its database, the data is stored with hourly resolution and on a per-URL basis. This means that all accesses by the same user to a given URL within one hour are aggregated into a single row in the database. That row contains, among other values, an ID number for the URL and a timestamp with a granularity of one hour. This is called the hourly representation.
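The hourly aggregation can be illustrated with a short Python sketch. It is not Content Reporter's implementation. The record layout (user, URL ID, access time) and the per-row access count are assumptions made for illustration, and the rows are held in an in-memory `Counter` instead of a database table.

```python
from collections import Counter
from datetime import datetime


def start_of_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to hourly granularity."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_representation(entries):
    """Aggregate raw accesses into one row per (user, url_id, hour).

    entries: iterable of (user, url_id, timestamp) tuples (hypothetical layout).
    Returns a Counter mapping (user, url_id, hour) -> number of accesses
    (the access count is an assumed column, used here for illustration).
    """
    rows = Counter()
    for user, url_id, ts in entries:
        rows[(user, url_id, start_of_hour(ts))] += 1
    return rows


log = [
    ("alice", 17, datetime(2024, 3, 5, 9, 12)),
    ("alice", 17, datetime(2024, 3, 5, 9, 48)),  # same user, URL and hour: same row
    ("alice", 17, datetime(2024, 3, 5, 10, 3)),  # next hour: a new row
]
rows = hourly_representation(log)
print(len(rows))  # 2
```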
For most organizations, it is not necessary to keep data in this format for an extended period of time. In addition, as more log data is loaded into the database, database growth has to be controlled. To do so, Content Reporter can transform data from the hourly representation into a monthly representation.
In the monthly representation, data is organized by site, with a granularity of one month. This means that all accesses by the same user to a given site within one month are aggregated into a single row in the database.
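Continuing the same illustrative model, the following sketch rolls hourly rows up to one row per user, site, and month. The `site_of` lookup from URL ID to site and the summed access count are hypothetical; the sketch only shows which detail is lost (the individual URL and the day and hour).

```python
from collections import Counter
from datetime import datetime


def monthly_representation(hourly_rows, site_of):
    """Roll hourly rows up to one row per (user, site, (year, month)).

    hourly_rows: mapping (user, url_id, hour) -> accesses
    site_of: mapping url_id -> site (hypothetical lookup table)
    The URL and the day/hour are dropped; only the site and month remain.
    """
    monthly = Counter()
    for (user, url_id, hour), hits in hourly_rows.items():
        monthly[(user, site_of[url_id], (hour.year, hour.month))] += hits
    return monthly


hourly = {
    ("alice", 17, datetime(2024, 3, 5, 9)): 2,
    ("alice", 18, datetime(2024, 3, 20, 14)): 1,  # other URL, same site and month
    ("alice", 17, datetime(2024, 4, 1, 8)): 4,    # next month: a separate row
}
site_of = {17: "www.example.com", 18: "www.example.com"}
monthly = monthly_representation(hourly, site_of)
print(monthly[("alice", "www.example.com", (2024, 3))])  # 3
```

Three hourly rows collapse into two monthly rows, and a report can no longer tell URL 17 from URL 18 or 5 March from 20 March.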
Reports executed on this representation have a maximum time resolution of one month and can show which sites a user has accessed. They cannot show individual URLs or the number of accesses per day.
The advantage of the monthly representation is a considerable reduction in the disk space needed to store the data. Typically, the data volume is reduced by 80 to 90%, so 1 GB of hourly data becomes approximately 100 to 200 MB of monthly data.
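The size estimate is simple arithmetic: the monthly copy keeps 10 to 20% of the hourly volume. The sketch below reproduces the 1 GB example. The decimal units (1 GB = 1000 MB) are an assumption chosen to match the figures above.

```python
GB = 1000 ** 3  # decimal units, assumed to match "1 GB ... 100 to 200 MB"
MB = 1000 ** 2


def monthly_size_range(hourly_bytes, min_reduction=0.80, max_reduction=0.90):
    """Return (smallest, largest) expected size of the monthly copy."""
    return hourly_bytes * (1 - max_reduction), hourly_bytes * (1 - min_reduction)


low, high = monthly_size_range(1 * GB)
print(f"{low / MB:.0f} MB to {high / MB:.0f} MB")  # 100 MB to 200 MB
```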
The database rollup event that can be configured in Content Reporter consists of two distinct but related operations:
1. Converting and copying data from an hourly to a monthly representation: The first step of the rollup transforms all data currently in the hourly representation into its monthly representation. This step leaves the hourly data in place and inserts a monthly copy of it into the database.
Note: If “Number of days of hourly data to keep” in the Rollup tab is set to a value larger than the number of days since log data collection started, the rollup will not reduce the data volume in the database, because it only copies additional data from the hourly to the monthly representation. The disk space used by the database will then actually grow by 10 to 20%. To avoid unwanted database growth, set the parameters described below to reasonable values, or start database rollups only once you want to clean out data from the hourly table (see Step 2 below).
2. Deleting data from the database: During the second step of the rollup process, and depending on your settings in the Database Rollups section of the Rollup tab, Content Reporter deletes data from a number of database tables. In that section, you can specify how many days of hourly, billing, streaming, exact access detail, SMTP, and firewall data, and how many months of monthly data, you want to keep in your database.
Deletion is performed in small time intervals rather than in one operation. This avoids having to delete a huge amount of data at once, which would happen if a lot of data had been collected and the deletion interval were too large.
Deletion in intervals of an hour or less is, of course, only possible for data that is collected hourly or at an even finer granularity.
It therefore applies to data configured for deletion with the following settings: “Delete hourly data that is older than ... days”, “Delete exact access detail data that is older than ... days”, “Delete SMTP data that is older than ... days”, and “Delete firewall data that is older than ... days”.
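The two rollup steps can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not the product's logic: the in-memory tables, the `rolled_up_until` watermark (used here so that repeated runs do not copy the same hours twice), and all function and parameter names are hypothetical. Step 1 copies complete hours into the monthly representation without touching the hourly rows; step 2 deletes expired hourly rows one hour at a time.

```python
from collections import Counter
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def start_of_hour(ts: datetime) -> datetime:
    return ts.replace(minute=0, second=0, microsecond=0)


def rollup(hourly, monthly, site_of, rolled_up_until, now, keep_hourly_days):
    """Run one rollup over in-memory tables (hypothetical model, not the real schema).

    hourly:  dict (user, url_id, hour) -> accesses
    monthly: Counter (user, site, (year, month)) -> accesses
    rolled_up_until: first hour not yet copied to monthly (None on the first run)
    Returns the watermark to pass to the next run.
    """
    # Step 1: copy every complete hour that has not been copied yet into the
    # monthly representation. The hourly rows themselves are left in place.
    current_hour = start_of_hour(now)
    for (user, url_id, hour), hits in hourly.items():
        if hour < current_hour and (rolled_up_until is None or hour >= rolled_up_until):
            monthly[(user, site_of[url_id], (hour.year, hour.month))] += hits

    # Step 2: delete hourly rows older than the retention window, one hour per
    # batch, so that no single operation removes a huge amount of data.
    delete_before = start_of_hour(now - timedelta(days=keep_hourly_days))
    if hourly:
        batch_start = min(hour for _, _, hour in hourly)
        while batch_start < delete_before:
            batch = [key for key in hourly if batch_start <= key[2] < batch_start + HOUR]
            for key in batch:
                del hourly[key]
            batch_start += HOUR
    return current_hour


site_of = {17: "www.example.com"}
hourly = {
    ("alice", 17, datetime(2024, 1, 10, 9)): 2,  # older than 30 days: copied, then deleted
    ("alice", 17, datetime(2024, 3, 1, 9)): 5,   # within 30 days: copied and kept
}
monthly = Counter()
mark = rollup(hourly, monthly, site_of, None, datetime(2024, 3, 5, 12, 30), keep_hourly_days=30)
```

If `keep_hourly_days` exceeds the age of the oldest hourly row, step 2 deletes nothing and the run only adds monthly rows, which is the growth case described in the note on step 1. A real database would delete each hour with an indexed range query rather than by scanning every key.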
Note: All data that is older than the specified number of days (hourly, billing, streaming, exact access detail, SMTP, and firewall data) or months (monthly data) is permanently deleted from the database and can only be restored from a database backup, if one is available.
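To make "older than the specified number of days or months" concrete, the sketch below computes both kinds of cutoff. The function names are hypothetical, and the month rule (the current month counts as month 0) is an assumption, not documented product behavior. Only the 210-day value corresponds to a documented default; the 12-month value is made up for illustration.

```python
from datetime import datetime, timedelta


def day_cutoff(now: datetime, keep_days: int) -> datetime:
    """Rows timestamped before this instant are older than keep_days days."""
    return now - timedelta(days=keep_days)


def month_cutoff(now: datetime, keep_months: int) -> tuple[int, int]:
    """Earliest (year, month) still kept; monthly rows before it are deleted.

    Assumes the current month counts as month 0, so keep_months=12 in
    March 2024 keeps March 2023 onwards (an assumption for illustration).
    """
    index = now.year * 12 + (now.month - 1) - keep_months
    return index // 12, index % 12 + 1


now = datetime(2024, 3, 5, 12, 0)
print(day_cutoff(now, 210))   # 2023-08-08 12:00:00
print(month_cutoff(now, 12))  # (2023, 3)
```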
The default setting of 210 days for hourly data in the Database Rollups section allows you to access at least 6 months' worth of hourly data at any given time. In effect, database rollups create a “sliding window” of data available for reports. In most production environments, hourly data is only needed for about 6 months.
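Why 210 days is enough for 6 months can be checked directly: no run of six consecutive calendar months is longer than 184 days, so a 210-day window always covers six months with 26 days to spare. The short check below uses the standard `calendar` module; the helper name and the range of years checked are arbitrary choices for the illustration.

```python
import calendar


def days_in_span(year: int, month: int, months: int) -> int:
    """Total number of days in `months` consecutive calendar months."""
    total = 0
    for _ in range(months):
        total += calendar.monthrange(year, month)[1]  # [1] = days in that month
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return total


longest_six_months = max(
    days_in_span(y, m, 6) for y in range(2000, 2030) for m in range(1, 13)
)
print(longest_six_months)  # 184 (for example, July to December)
```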
Using the database rollup settings, you can remove unwanted data on a regular basis and, at the same time, let the Content Reporter server store more information without your database continuing to grow in size. The database rollup is a scheduled database background task. Because cleanup of this data is managed through the Rollup feature, no manual execution of SQL statements is required.
Note: If you disable database rollups, your database will grow in size at approximately the rate of log file processing. Also, if you disable rollups, the Content Reporter server will never update the monthly database.