Efficient Interval Management in Microsoft SQL Server

(1)

Efficient Interval Management

in Microsoft SQL Server

Itzik Ben-Gan, SolidQ

Last modified: August 13, 2013

Agenda

• Inefficiencies of classic interval handling

• Solution: RI-tree by Kriegel, Pötke and Seidl of University of Munich

• Optimized solution: Static RI-tree by Laurent Martin

• Potential for integration in SQL Server

• References

(2)

Intervals

• Examples for temporal intervals: sessions, contracts, appointments, shifts

• Allen’s interval algebra: 13 base relations

•

http://www.ics.uci.edu/~alspaugh/cls/shr/allen.html

• Common relation: X intersects Y

• Example: return contracts that were active during an input period

• Classic interval representation:

• Table: Intervals( id, lower, upper )

• Indexes:

• idx_lower ON Intervals(lower) INCLUDE(upper)

• idx_upper ON Intervas(upper) INCLUDE(lower)

X

Y

Inefficiencies of Classic Interval Handling

• Return intervals [lower, upper] that intersect with [@l, @u]:

• Classic predicate: WHERE lower <= @u AND upper >= @l

• Problem: two range predicates

• An Index Seek can use only one range predicate as a Seek Predicate

• Other range predicate is evaluated as a [residual] Predicate

• Sensitive to parameter-sniffing

@l

@u

lower

upper





(3)

Relational Interval tree (RI-tree) model

• Paper [RI]: Managing Intervals Efficiently in Object-Relational Databases (Kriegel, Pötke and Seidl

of University of Munich)

http://www.dbs.ifi.lmu.de/Publikationen/Papers/VLDB2000.pdf

Static RI-tree and SQL Server implementation

• Article [SRI1]: A Static Relational Interval Tree (Laurent Martin)

http://www.solidq.com/sqj/Pages/2011-September-Issue/A-Static-Relational-Interval-Tree.aspx

• Article [SRI2]: Advanced interval queries with the Static Relational Interval Tree (Laurent Martin)

http://www.solidq.com/sqj/Pages/Relational/Advanced-interval-queries-with-the-Static-Relational-Interval-Tree.aspx

• Article [SRI3]: Using the Static Relational Interval Tree with time intervals (Laurent Martin)

•

http://www.solidq.com/sqj/Pages/Relational/Using-the-Static-Relational-Interval-Tree-with-time-intervals.aspx

• [RIBG]: Interval Queries in SQL Server (Itzik Ben-Gan)

•

http://sqlmag.com/t-sql/sql-server-interval-queries

* Above sources will be referred to as [RI], [SRI1], [SRI2], [SRI3] and [RIBG] respectively

Work used as Foundation for Solution

Relational Interval tree (RI-tree) [RI]

• Virtual backbone binary tree,

h

eight = num bits, range: 1 – 2

h

-1, root: 2

h-1

• Fork node

: first node within interval when descending the tree with bisection

• Table: Intervals( id,

node

, lower, upper )

• Indexes: idx_lower ON Intervals(node, lower), idx_upper ON Intervals(node, upper)

1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 11 13

(4)

• Cons: iterative T-SQL; slows down insertions

Fork Node

CREATE FUNCTION dbo.forkNode(@lower AS INT, @upper AS INT) RETURNS INT WITH SCHEMABINDING

AS BEGIN

DECLARE @node AS INT = 1073741824; -- height = 31 DECLARE @step AS INT = @node / 2;

WHILE @step >= 1 BEGIN

IF @upper < @node SET @node -= @step; ELSE IF @lower > @node

SET @node += @step; ELSE BREAK; SET @step /= 2; END; RETURN @node; END;

Optimized Fork Node [SRI1]

• Compute fork node from lower and upper: lowest common ancestor

• Observations:

• 1: For any value N, L = leading bits before trailing 0s

• 2: For any X starting with L, X’s left subtree nodes start with (L-1), right start with L

• 3: For nonleaf node X and X-1 ancestors are same, so X can be replaced with X-1

• 4: Leaf node Z and Z-1 differ only in last bit; 1 for Z and 0 for Z-1

• Hence...

1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 11 13

Example: N = 12(

011

00), L = 011

Example: X = 12 (

011

00)

Left node: 10 (

010

10)

Right node: 14 (

011

10)

Example: 10, 9 same ancestors

Example: 9 (0100

1

), 8 (0100

₀

)

(5)

Optimized Fork Node

• Fork node =

matching prefix of (lower - 1) and upper

||

1

||

0s

• Let A = (lower - 1) ^ upper -- mark different bits 01010 ^ 01101 = 00111

• Let B = POWER(2, FLOOR(LOG(A, 2))) -- first different bit set to 1 like in upper: 00100

• Let C = upper % B -- keep trailing bits from upper after set bit in B: 01

• Let D = upper - C – concat 1 and set trailing bits to 0s, voilà fork node:

01

1

00

• Formula:

upper - upper % POWER(2, CAST(LOG((lower - 1) ^ upper, 2) AS INT))

1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 11 13

Not clear?

Details in [RIBG]

Optimized Fork Node

Implemented as a computed column

• Pros:

• Much faster inserts

• Support seamless multi-row modifications

• Cons:

• Complex (even more so with date and time—requires mapping to integers)

CREATE TABLE dbo.IntervalsRIT (

id INT NOT NULL,

node AS upper - upper % POWER(2, FLOOR(LOG((lower - 1) ^ upper, 2))) PERSISTED NOT NULL, lower INT NOT NULL,

upper INT NOT NULL,

CONSTRAINT PK_IntervalsRIT PRIMARY KEY(id),

CONSTRAINT CHK_IntervalsRIT_upper_gteq_lower CHECK(upper >= lower) );

CREATE INDEX idx_lower ON dbo.IntervalsRIT(node, lower); CREATE INDEX idx_upper ON dbo.IntervalsRIT(node, upper);

* Above are basic index definitions to support intersection queries; additional columns may be needed (filters,

included columns)

(6)

Fork node for DATETIME2 data type [SRI3]

node AS DATEADD(ns,

((((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60

+ DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100

-(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60

+ DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100 + 1 ) % POWER(CAST(2 AS BIGINT),

FLOOR(LOG(

(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), lower) AS BIGINT) * 60 + DATEPART(mi, lower)) * 60

+ DATEPART(s, lower)) * 10000000 + DATEPART(ns, lower) / 100 ) ^

(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60

+ DATEPART(s, upper)) * 10000000

+ DATEPART(ns, upper) / 100 + 1)) / LOG(2)))) % 10000000) * 100, …

Fork node for DATETIME2 data type [SRI3]

…

DATEADD(s,

((((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60

+ DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100

-(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60

+ DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100 + 1 ) % POWER(CAST(2 AS BIGINT),

FLOOR(LOG(

(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), lower) AS BIGINT) * 60

+ DATEPART(mi, lower)) * 60 + DATEPART(s, lower)) * 10000000 + DATEPART(ns, lower) / 100 ) ^

(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60

+ DATEPART(mi, upper)) * 60 + DATEPART(s, upper)) * 10000000

+ DATEPART(ns, upper) / 100 + 1)) / LOG(2)))) / 10000000) % 86400, …

(7)

Fork node for DATETIME2 data type [SRI3]

…

DATEADD(d,

(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60

+ DATEPART(mi, upper)) * 60 + DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100

-(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60 + DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100 + 1 ) % POWER(CAST(2 AS BIGINT), FLOOR(LOG(

(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), lower) AS BIGINT) * 60 + DATEPART(mi, lower)) * 60 + DATEPART(s, lower)) * 10000000 + DATEPART(ns, lower) / 100 ) ^ …

Fork node for DATETIME2 data type [SRI3]

…

(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60

+ DATEPART(mi, upper)) * 60 + DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100 + 1)) / LOG(2))) ) / CAST(864000000000 AS BIGINT),

CONVERT(DATETIME2, '00010101', 112))))

• Could be much shorter if BIGDATEADD and BIGDATEDIFF were

implemented:

(8)

Querying [RI]

• Left nodes

• Middle nodes

• Right nodes

• All Together

• leftNodes and rightNodes functions

Querying, Left Nodes

• Left nodes: W =

{w on path leading to lower; and w < lower}

• In below example W = {8, 10}

1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 11 13

CREATE FUNCTION dbo.leftNodes (@lower AS INT, @upper AS INT) RETURNS @T TABLE

(

node INT NOT NULL PRIMARY KEY )

AS BEGIN

DECLARE @node AS INT = 1073741824; DECLARE @step AS INT = @node / 2; -- descend from root node to lower WHILE @step >= 1

BEGIN

-- right node IF @lower < @node

SET @node -= @step; -- left node

ELSE IF @lower > @node BEGIN

INSERT INTO @T(node) VALUES(@node); SET @node += @step;

END -- lower ELSE BREAK; SET @step /= 2; END; RETURN; END;

(9)

Querying, Left Nodes

• Intervals [lower, upper] that intersect with

input [@l, @u]:

• All intervals registered at w, where upper >= @l

1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 @l=11 @u = 13

SELECT I.id

FROM dbo.IntervalsRIT AS I

JOIN dbo.leftNodes(@l, @u) AS L

ON I.node = L.node

AND I.upper >= @l

w

lower = 6 upper = 12

w

Querying, Right Nodes

• Symmetric to Left Nodes (with rightNodes function)

• Intervals that intersect with input [@l, @u]:

• All intervals registered at w, where lower <= @u

1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 @l=11 @u = 13

SELECT I.id

JOIN dbo.rightNodes(@l, @u) AS R

ON I.node = R.node

AND I.lower <= @u

w

lower = 13 upper = 15 w

(10)

Querying, Middle Nodes

• Intervals that intersect with input [@l, @u]:

• All intervals registered at w,

where node between @l and @u

1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 @l=11 @u = 13

SELECT id

FROM dbo.IntervalsRIT

WHERE node BETWEEN @l AND @u

w

lower = 10 upper = 14

w w

Querying, All Together

SELECT I.id

JOIN dbo.leftNodes(@l, @u) AS L

ON I.node = L.node

AND I.upper >= @l

UNION ALL

SELECT I.id

JOIN dbo.rightNodes(@l, @u) AS R

ON I.node = R.node

AND I.lower <= @u

UNION ALL

SELECT id

WHERE node BETWEEN @l AND @u;

(11)

leftNodes and rightNodes Functions

• Cons:

• Use iterative T-SQL and table variable

• In a join, called per row

• Can be implemented more efficiently

Optimized Ancestors Function [RIBG]

• Ancestors [SRI1]: each step clears rightmost set bit and sets bit to the left to 1

• Left nodes: ancestors where node < @node

SELECT node FROM dbo.Ancestors(13) AS A

WHERE node < 13; -- 8, 12

• Right nodes: ancestors where node > @node

SELECT node FROM dbo.Ancestors(13) AS A

WHERE node > 13; -- 14, 16

BitMasks (n < num bits - 5)

n b1

B3

1 011110

000010

2 011100

000100

3 011000

001000

4 010000

010000

CREATE FUNCTION dbo.Ancestors(@node AS INT)

RETURNS TABLE

AS

RETURN

SELECT @node & b1 | b3 as node -- compute ancestor

FROM dbo.BitMasks

WHERE b3 > @node & -@node; -- b3 > rightmost set bit

Example: n = 13

001101 = 13

001110 = 14

001100 = 12

001000 = 8

010000 = 16

Identify rightmost set bit in 13 (

00110

1

):

• -n: -13 (

11001

1)

keep bits from right until first set bit (

1

)

reverse rest of bits (

00110

becomes

11001

)

• n & -n = n’s rightmost set bit (000001)

Not clear?

Details in [RIBG]

(12)

Optimized Ancestors Function

SELECT I.id

FROM dbo.IntervalsRIT AS I JOIN dbo.Ancestors(@l) AS L

ON L.node < @l AND I.node = L.node AND I.upper >= @l UNION ALL SELECT I.id FROM dbo.IntervalsRIT AS I JOIN dbo.Ancestors(@u) AS R ON R.node > @u AND I.node = R.node AND I.lower <= @u UNION ALL

SELECT id

WHERE node BETWEEN @l AND @u; logical reads: 81 (IntervalsRIT) + 2 (BitMasks), CPU time: 0 ms

Further Optimization [SRI2]

SELECT I.id

FROM dbo.IntervalsRIT AS I JOIN dbo.Ancestors(@l) AS L

ON L.node < @l

AND L.node >= (SELECT MIN(node) FROM dbo.IntervalsRIT) AND I.node = L.node

AND I.upper >= @l UNION ALL SELECT I.id FROM dbo.IntervalsRIT AS I JOIN dbo.Ancestors(@u) AS R ON R.node > @u

AND R.node <= (SELECT MAX(node) FROM dbo.IntervalsRIT) AND I.node = R.node

AND I.lower <= @u UNION ALL

SELECT id

FROM dbo.IntervalsRIT WHERE node BETWEEN @l AND @u;

logical reads: 66 (IntervalsRIT) + 2 (BitMasks), CPU time: 0 ms

• Filter out ancestors

outside of range covered

by table

• Eliminates unnecessary

index seeks

(13)

Optimized Ancestors Function

• Pros:

• More efficient than iterative T-SQL functions

• Doesn’t require a table variable

• Cons:

• A bit complex

Potential for Integration in SQL Server [RIBG]

• Model is complex; can add engine support for model and optimizations

• Indexing:

Internally compute fork node and create two B-tree indexes:

• key([fcol1, fcol2, …,] node, lower[, upper]) [include(icol1, icol2)]

• key([fcol1, fcol2, …,] node, upper[, lower]) [include(icol1, icol2)]

• Querying:

• Optimizer support for detecting interval queries with classic predicates

• Adding declarative SQL with RI-tree engine support

• More efficient native functions (for advanced users who wish to roll their own):

• forkNode, leftNodes, rightNodes, ancestors

• Support integer, as well as date and time types

CREATE INDEX myindex

ON dbo.Intervals[(fcol1, fcol2, ...)]

-- leading equality-based filters

INTERVAL(lower, upper)

-- interval columns

[INCLUDE(icol1, icol2, ...)]

-- included columns

(14)

References

• [RI]: Managing Intervals Efficiently in Object-Relational Databases (Kriegel, Pötke and

Seidl of University of Munich)

•

• [SRI1]: A Static Relational Interval Tree (Laurent Martin)

•

• [SRI2]: Advanced interval queries with the Static Relational Interval Tree (Laurent Martin)

•

https://www.solidq.com/sqj/Pages/Relational/Advanced-interval-queries-with-the-Static-Relational-Interval-Tree.aspx

• [SRI3]: Using the Static Relational Interval Tree with time intervals (Laurent Martin)

•

http://www.solidq.com/sqj/Pages/Relational/Using-the-Static-Relational-Interval-Tree-with-time-intervals.aspx

• [RIBG]: Interval Queries in SQL Server (Itzik Ben-Gan)

•

http://sqlmag.com/t-sql/sql-server-interval-queries

Feature Enhancement Request for Microsoft

• Connect item:

•http://www.ics.uci.edu/~alspaugh/cls/shr/allen.html

http://www.solidq.com/sqj/Pages/Relational/Advanced-interval-queries-with-the-Static-Relational-Interval-Tree.aspx

•

http://www.solidq.com/sqj/Pages/Relational/Using-the-Static-Relational-Interval-Tree-with-time-intervals.aspx

•http://sqlmag.com/t-sql/sql-server-interval-queries

https://connect.microsoft.com/SQLServer/feedback/details/783293

•

https://www.solidq.com/sqj/Pages/Relational/Advanced-interval-queries-with-the-Static-Relational-Interval-Tree.aspx

https://connect.microsoft.com/SQLServer/feedback/details/780746