Data Distribution Models - Microsoft SQL Server Black Book pdf

Data replication is the result of companies needing to distribute data across a slow WAN link or online transaction processing, which requires that data on

http://www.itknowledge.com/reference/standard/1576101495/ch04/101-106.html (1 of 4) [1/27/2000 6:15:46 PM]

Go!

Keyword

---Go!

separate servers be synchronized at all times. Two models are used for this data distribution: the two-phase commit and replication.

Two-Phase Commit

A two-phase commit is possible with Microsoft SQL Server versions 6.0 and above, although in version 6.0, you must use the API functions and write some fairly advanced code. In version 6.5, the DTC (Distributed Transaction

Coordinator) makes the two-phase commit easy to implement.

Two-phased commits are considered to be “tight consistency.” In other words, there is no latency between data on the separate servers participating in the transactions. This is usually driven by the time-sensitivity of data and the requirement that it be synchronized 100 percent of the time. A common misconception among database administrators is that two-phased commits are the only way to keep multiple servers synchronized. This is simply not true.

Replication also can achieve this purpose and, in some cases, other network-related issues as well, without any user intervention.

Replication Consistency

Replication is considered a looser consistency than the two-phased commit.

This is, in fact, true, but it should not steer you away from using replication in your system design. The main difference between these two methods of moving data is that “real-enough” time data synchronization exists with

replication. This means that, in replication, there is latency between the source and copy databases. The nice thing about replication is the tunable nature of the latency, coupled with built-in fault tolerance.

Microsoft SQL Server replication is a popular method of maintaining report servers, decision-support servers, and distribution of read-only copies of price lists or similar information across many divisions of a company. By

distributing the load across many different servers, you are able to disseminate data across your enterprise quickly and reliably.

Microsoft SQL Server can effectively replicate data with as little as 32MB of RAM. Although Microsoft states that you can publish data with as little as 16MB of RAM, I do not recommend this minimum.

Replication is implemented through the transaction logs contained on your Microsoft SQL Server databases. The beauty of this concept is that only the changes to the database are replicated to the other servers. This reduces traffic on the network and provides a fault-tolerant method of distributing your data.

While this method requires both servers participating in replication to be synchronized before you begin the process, the end result is the same as a two-phased commit, with much less traffic.

Terminology

Before we get into too many of the details about replication, let’s cover the basic terminology involved in Microsoft SQL Server replication. If you are familiar with these terms, you might want to skim this section just to pick up

http://www.itknowledge.com/reference/standard/1576101495/ch04/101-106.html (2 of 4) [1/27/2000 6:15:46 PM]

the finer points. Some of the terms mean exactly what you might expect;

others might be a little confusing.

Publisher

The publisher is the server that contains the source data you wish to distribute to other servers. This is analogous to the publisher of a magazine, who

presides over one central location, putting all the copy, photographs, advertisements, and artwork together into one issue. The publisher then distributes the data to all shippers to put the magazine in the hands of the people who have subscribed to it.

Subscriber

The subscriber is the server that receives the data from the publishing server.

Multiple servers can subscribe to multiple publications in any combination your company requires. For a clear picture, compare this to a magazine subscriber, who can subscribe to many magazines—from one publisher or from many different publishers at the same time.

Distribution Server

The distribution server collects all the publications and disseminates them to any servers that have subscribed to any of the publications. This server can be the same server as the publication server, or it can reside on a totally separate machine that performs the distribution tasks without affecting the publication or production server’s performance. The distribution server can be related to a magazine distribution center. The publisher does not typically mail magazines directly to subscribers, but rather ships them in bulk off to a location that sends the magazines to retail stores and individual subscribers.

Transaction Log

Microsoft SQL Server uses transaction logs to perform many tasks, including replication. The transaction log stores only the changes made to data. Many people think transaction logs are some sort of text file that can be easily read and interpreted. This is not the case, however. The transaction log stores data in encrypted form in a table in the database.

The name of the table that stores the transactions in the database is Syslogs.

For an example of what the transaction log looks like, run the following query against any database on Microsoft SQL Server:

SELECT * FROM SYSLOGS

Two columns will be returned from the query: the transaction data and an operation code that specifies the type of activity that occurred on this record.

The resulting output should look something like the following:

xactid op

--http://www.itknowledge.com/reference/standard/1576101495/ch04/101-106.html (3 of 4) [1/27/2000 6:15:46 PM]

0xd40a07001400 20 0x000000000000 17 0xd40a07001400 39 0xd40a07001400 9 0xd40a07001400 30 0x000000000000 17 0xd30a07001b00 39 0xd30a07001b00 9 Synchronization

Synchronization is the act of ensuring that both the source and target tables that participate in replication are exactly the same. Once the tables are the same, the transactions can then be applied to each table independently, and the tables will still be identical.

Previous Table of Contents Next

http://www.itknowledge.com/reference/standard/1576101495/ch04/101-106.html (4 of 4) [1/27/2000 6:15:46 PM]

Brief Full Advanced Search

Search Tips

To access the contents, click the chapter and section titles.

Microsoft SQL Server Black Book

(Publisher: The Coriolis Group) Author(s): Patrick Dalton ISBN: 1576101495 Publication Date: 07/01/97

Search this book:

Previous Table of Contents Next

Horizontal Partitions

A horizontal partition of a table consists of the rows that you want to publish as articles to your subscribers. Horizontal partitioning is accomplished by using WHERE clauses in your SQL statements. By stating that you wish to publish only those rows that meet certain column values, you can increase replication speed. See Figure 4.1 for an example of a horizontally partitioned table.

Figure 4.1 Horizontal partitioning.

Vertical Partitions

Vertical partitioning of a table is done by specifying in the SELECT statement of your query which columns you want to publish. If you want to publish only two columns of a table, you list only those columns in the SELECT statement.

Because of the limited amount of data being passed between servers, vertical partitioning can greatly enhance replication performance. You can also use vertical partitioning to exclude certain types of columns—for instance, time stamp, nonuser, and system columns—from the replication process. See Figure 4.2 for an example of vertical partitioning.

http://www.itknowledge.com/reference/standard/1576101495/ch04/106-110.html (1 of 4) [1/27/2000 6:15:49 PM]

Go!

Keyword

---Go!

Figure 4.2 Vertical partitioning.

Articles

In SQL, articles are analogous to the conventional meaning of the word. An article is the smallest unit that a subscriber may receive from a publication. In replication, an article is a specific horizontal and/or vertical partition of data found in a table or view. You can make a whole table an article, or you can parse it down, through horizontal and vertical partitioning, to just what the intended audience needs to see.

Publications

A publication is basically a group of one or more articles that is to be

replicated between your servers. On a case-by-case basis, each subscriber can subscribe to the group of articles in a publication in whole or in part,

depending on the needs of the organization. As with magazines, where you need to buy the entire issue even if you only want to read a few of the articles, the base unit of a subscription is a publication, not an article.

Push

The term push refers to the process of establishing a subscription to published information. A push subscription is initiated by the source server and

controlled and created from that location. This subscription method is easy to manage because the server that has the data knows the structures of the articles and what the data should look like. Centralized management of your

subscriptions is also a push-type subscription.

Pull

The opposite of push, a pull subscription is requested from a publisher by the subscriber. With this method of subscription management, users must ask for the correct information from the publisher, as opposed to the publisher telling users what they will get. Pull subscriptions are only convenient for

organizations that want to publish information freely, without control over which publications are used by other organizations.

In document Microsoft SQL Server Black Book pdf (Page 93-98)