Efficient Interval Management
in Microsoft SQL Server
Itzik Ben-Gan, SolidQ
Last modified: August 13, 2013
Agenda
• Inefficiencies of classic interval handling
• Solution: RI-tree by Kriegel, Pötke and Seidl of University of Munich
• Optimized solution: Static RI-tree by Laurent Martin
• Potential for integration in SQL Server
• References
Intervals
• Examples for temporal intervals: sessions, contracts, appointments, shifts
• Allen’s interval algebra: 13 base relations
•
http://www.ics.uci.edu/~alspaugh/cls/shr/allen.html
• Common relation: X intersects Y
• Example: return contracts that were active during an input period
• Classic interval representation:
• Table: Intervals( id, lower, upper )
• Indexes:
• idx_lower ON Intervals(lower) INCLUDE(upper)
• idx_upper ON Intervas(upper) INCLUDE(lower)
X
Y
Inefficiencies of Classic Interval Handling
• Return intervals [lower, upper] that intersect with [@l, @u]:
• Classic predicate: WHERE lower <= @u AND upper >= @l
• Problem: two range predicates
• An Index Seek can use only one range predicate as a Seek Predicate
• Other range predicate is evaluated as a [residual] Predicate
• Sensitive to parameter-sniffing
@l
@u
lower
upper
Relational Interval tree (RI-tree) model
• Paper [RI]: Managing Intervals Efficiently in Object-Relational Databases (Kriegel, Pötke and Seidl
of University of Munich)
http://www.dbs.ifi.lmu.de/Publikationen/Papers/VLDB2000.pdf
Static RI-tree and SQL Server implementation
• Article [SRI1]: A Static Relational Interval Tree (Laurent Martin)
http://www.solidq.com/sqj/Pages/2011-September-Issue/A-Static-Relational-Interval-Tree.aspx
• Article [SRI2]: Advanced interval queries with the Static Relational Interval Tree (Laurent Martin)
http://www.solidq.com/sqj/Pages/Relational/Advanced-interval-queries-with-the-Static-Relational-Interval-Tree.aspx
• Article [SRI3]: Using the Static Relational Interval Tree with time intervals (Laurent Martin)
•
http://www.solidq.com/sqj/Pages/Relational/Using-the-Static-Relational-Interval-Tree-with-time-intervals.aspx
• [RIBG]: Interval Queries in SQL Server (Itzik Ben-Gan)
•
http://sqlmag.com/t-sql/sql-server-interval-queries
* Above sources will be referred to as [RI], [SRI1], [SRI2], [SRI3] and [RIBG] respectively
Work used as Foundation for Solution
Relational Interval tree (RI-tree) [RI]
• Virtual backbone binary tree,
h
eight = num bits, range: 1 – 2
h
-1, root: 2
h-1
• Fork node
: first node within interval when descending the tree with bisection
• Table: Intervals( id,
node
, lower, upper )
• Indexes: idx_lower ON Intervals(node, lower), idx_upper ON Intervals(node, upper)
1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 11 13
• Cons: iterative T-SQL; slows down insertions
Fork Node
CREATE FUNCTION dbo.forkNode(@lower AS INT, @upper AS INT) RETURNS INT WITH SCHEMABINDING
AS BEGIN
DECLARE @node AS INT = 1073741824; -- height = 31 DECLARE @step AS INT = @node / 2;
WHILE @step >= 1 BEGIN
IF @upper < @node SET @node -= @step; ELSE IF @lower > @node
SET @node += @step; ELSE BREAK; SET @step /= 2; END; RETURN @node; END;
Optimized Fork Node [SRI1]
• Compute fork node from lower and upper: lowest common ancestor
• Observations:
• 1: For any value N, L = leading bits before trailing 0s
• 2: For any X starting with L, X’s left subtree nodes start with (L-1), right start with L
• 3: For nonleaf node X and X-1 ancestors are same, so X can be replaced with X-1
• 4: Leaf node Z and Z-1 differ only in last bit; 1 for Z and 0 for Z-1
• Hence...
1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 11 13Example: N = 12(
011
00), L = 011
Example: X = 12 (
011
00)
Left node: 10 (
010
10)
Right node: 14 (
011
10)
Example: 10, 9 same ancestors
Example: 9 (0100
1
), 8 (0100
0
)
Optimized Fork Node
• Fork node =
matching prefix of (lower - 1) and upper
||
1
||
0s
• Let A = (lower - 1) ^ upper -- mark different bits 01010 ^ 01101 = 00111
• Let B = POWER(2, FLOOR(LOG(A, 2))) -- first different bit set to 1 like in upper: 00100
• Let C = upper % B -- keep trailing bits from upper after set bit in B: 01
• Let D = upper - C – concat 1 and set trailing bits to 0s, voilà fork node:
01
1
00
• Formula:
upper - upper % POWER(2, CAST(LOG((lower - 1) ^ upper, 2) AS INT))
1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 11 13
Not clear?
Details in [RIBG]
Optimized Fork Node
Implemented as a computed column
• Pros:
• Much faster inserts
• Support seamless multi-row modifications
• Cons:
• Complex (even more so with date and time—requires mapping to integers)
CREATE TABLE dbo.IntervalsRIT (
id INT NOT NULL,
node AS upper - upper % POWER(2, FLOOR(LOG((lower - 1) ^ upper, 2))) PERSISTED NOT NULL, lower INT NOT NULL,
upper INT NOT NULL,
CONSTRAINT PK_IntervalsRIT PRIMARY KEY(id),
CONSTRAINT CHK_IntervalsRIT_upper_gteq_lower CHECK(upper >= lower) );
CREATE INDEX idx_lower ON dbo.IntervalsRIT(node, lower); CREATE INDEX idx_upper ON dbo.IntervalsRIT(node, upper);
* Above are basic index definitions to support intersection queries; additional columns may be needed (filters,
included columns)
Fork node for DATETIME2 data type [SRI3]
node AS DATEADD(ns,
((((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60
+ DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100
-(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60
+ DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100 + 1 ) % POWER(CAST(2 AS BIGINT),
FLOOR(LOG(
(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), lower) AS BIGINT) * 60 + DATEPART(mi, lower)) * 60
+ DATEPART(s, lower)) * 10000000 + DATEPART(ns, lower) / 100 ) ^
(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60
+ DATEPART(s, upper)) * 10000000
+ DATEPART(ns, upper) / 100 + 1)) / LOG(2)))) % 10000000) * 100, …
Fork node for DATETIME2 data type [SRI3]
…
DATEADD(s,
((((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60
+ DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100
-(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60
+ DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100 + 1 ) % POWER(CAST(2 AS BIGINT),
FLOOR(LOG(
(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), lower) AS BIGINT) * 60
+ DATEPART(mi, lower)) * 60 + DATEPART(s, lower)) * 10000000 + DATEPART(ns, lower) / 100 ) ^
(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60
+ DATEPART(mi, upper)) * 60 + DATEPART(s, upper)) * 10000000
+ DATEPART(ns, upper) / 100 + 1)) / LOG(2)))) / 10000000) % 86400, …
Fork node for DATETIME2 data type [SRI3]
…
DATEADD(d,
(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60
+ DATEPART(mi, upper)) * 60 + DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100
-(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60 + DATEPART(mi, upper)) * 60 + DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100 + 1 ) % POWER(CAST(2 AS BIGINT), FLOOR(LOG(
(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), lower) AS BIGINT) * 60 + DATEPART(mi, lower)) * 60 + DATEPART(s, lower)) * 10000000 + DATEPART(ns, lower) / 100 ) ^ …
Fork node for DATETIME2 data type [SRI3]
…
(((CAST(DATEDIFF(hh, CONVERT(DATETIME2, '00010101', 112), upper) AS BIGINT) * 60
+ DATEPART(mi, upper)) * 60 + DATEPART(s, upper)) * 10000000 + DATEPART(ns, upper) / 100 + 1)) / LOG(2))) ) / CAST(864000000000 AS BIGINT),
CONVERT(DATETIME2, '00010101', 112))))
• Could be much shorter if BIGDATEADD and BIGDATEDIFF were
implemented:
Querying [RI]
• Left nodes
• Middle nodes
• Right nodes
• All Together
• leftNodes and rightNodes functions
Querying, Left Nodes
• Left nodes: W =
{w on path leading to lower; and w < lower}
• In below example W = {8, 10}
1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 11 13CREATE FUNCTION dbo.leftNodes (@lower AS INT, @upper AS INT) RETURNS @T TABLE
(
node INT NOT NULL PRIMARY KEY )
AS BEGIN
DECLARE @node AS INT = 1073741824; DECLARE @step AS INT = @node / 2; -- descend from root node to lower WHILE @step >= 1
BEGIN
-- right node IF @lower < @node
SET @node -= @step; -- left node
ELSE IF @lower > @node BEGIN
INSERT INTO @T(node) VALUES(@node); SET @node += @step;
END -- lower ELSE BREAK; SET @step /= 2; END; RETURN; END;
Querying, Left Nodes
• Intervals [lower, upper] that intersect with
input [@l, @u]:
• All intervals registered at w, where upper >= @l
1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 @l=11 @u = 13
SELECT I.id
FROM dbo.IntervalsRIT AS I
JOIN dbo.leftNodes(@l, @u) AS L
ON I.node = L.node
AND I.upper >= @l
w
lower = 6 upper = 12
w
Querying, Right Nodes
• Symmetric to Left Nodes (with rightNodes function)
• Intervals that intersect with input [@l, @u]:
• All intervals registered at w, where lower <= @u
1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 @l=11 @u = 13
SELECT I.id
FROM dbo.IntervalsRIT AS I
JOIN dbo.rightNodes(@l, @u) AS R
ON I.node = R.node
AND I.lower <= @u
w
lower = 13 upper = 15 w
Querying, Middle Nodes
• Intervals that intersect with input [@l, @u]:
• All intervals registered at w,
where node between @l and @u
1 00001 2 00010 3 00011 4 00100 5 00101 6 00110 7 00111 8 01000 9 01001 10 01010 11 01011 12 01100 13 01101 14 01110 15 01111 16 10000 17 10001 18 10010 19 10011 20 10100 21 10101 22 10110 23 10111 24 11000 25 11001 26 11010 27 11011 28 11100 29 11101 30 11110 31 11111 @l=11 @u = 13
SELECT id
FROM dbo.IntervalsRIT
WHERE node BETWEEN @l AND @u
w
lower = 10 upper = 14
w w
Querying, All Together
SELECT I.id
FROM dbo.IntervalsRIT AS I
JOIN dbo.leftNodes(@l, @u) AS L
ON I.node = L.node
AND I.upper >= @l
UNION ALL
SELECT I.id
FROM dbo.IntervalsRIT AS I
JOIN dbo.rightNodes(@l, @u) AS R
ON I.node = R.node
AND I.lower <= @u
UNION ALL
SELECT id
FROM dbo.IntervalsRIT
WHERE node BETWEEN @l AND @u;
leftNodes and rightNodes Functions
• Cons:
• Use iterative T-SQL and table variable
• In a join, called per row
• Can be implemented more efficiently
Optimized Ancestors Function [RIBG]
• Ancestors [SRI1]: each step clears rightmost set bit and sets bit to the left to 1
• Left nodes: ancestors where node < @node
SELECT node FROM dbo.Ancestors(13) AS A
WHERE node < 13; -- 8, 12
• Right nodes: ancestors where node > @node
SELECT node FROM dbo.Ancestors(13) AS A
WHERE node > 13; -- 14, 16
BitMasks (n < num bits - 5)
n b1
B3
1 011110
000010
2 011100
000100
3 011000
001000
4 010000
010000
CREATE FUNCTION dbo.Ancestors(@node AS INT)
RETURNS TABLE
AS
RETURN
SELECT @node & b1 | b3 as node -- compute ancestor
FROM dbo.BitMasks
WHERE b3 > @node & -@node; -- b3 > rightmost set bit
Example: n = 13
001101 = 13
001110 = 14
001100 = 12
001000 = 8
010000 = 16
Identify rightmost set bit in 13 (
00110
1
):
• -n: -13 (
11001
1)
keep bits from right until first set bit (
1
)
reverse rest of bits (
00110
becomes
11001
)
• n & -n = n’s rightmost set bit (000001)
Not clear?
Details in [RIBG]
Optimized Ancestors Function
SELECT I.idFROM dbo.IntervalsRIT AS I JOIN dbo.Ancestors(@l) AS L
ON L.node < @l AND I.node = L.node AND I.upper >= @l UNION ALL SELECT I.id FROM dbo.IntervalsRIT AS I JOIN dbo.Ancestors(@u) AS R ON R.node > @u AND I.node = R.node AND I.lower <= @u UNION ALL
SELECT id
FROM dbo.IntervalsRIT
WHERE node BETWEEN @l AND @u; logical reads: 81 (IntervalsRIT) + 2 (BitMasks), CPU time: 0 ms
Further Optimization [SRI2]
SELECT I.idFROM dbo.IntervalsRIT AS I JOIN dbo.Ancestors(@l) AS L
ON L.node < @l
AND L.node >= (SELECT MIN(node) FROM dbo.IntervalsRIT) AND I.node = L.node
AND I.upper >= @l UNION ALL SELECT I.id FROM dbo.IntervalsRIT AS I JOIN dbo.Ancestors(@u) AS R ON R.node > @u
AND R.node <= (SELECT MAX(node) FROM dbo.IntervalsRIT) AND I.node = R.node
AND I.lower <= @u UNION ALL
SELECT id
FROM dbo.IntervalsRIT WHERE node BETWEEN @l AND @u;
logical reads: 66 (IntervalsRIT) + 2 (BitMasks), CPU time: 0 ms
• Filter out ancestors
outside of range covered
by table
• Eliminates unnecessary
index seeks
Optimized Ancestors Function
• Pros:
• More efficient than iterative T-SQL functions
• Doesn’t require a table variable
• Cons:
• A bit complex
Potential for Integration in SQL Server [RIBG]
• Model is complex; can add engine support for model and optimizations
• Indexing:
Internally compute fork node and create two B-tree indexes:
• key([fcol1, fcol2, …,] node, lower[, upper]) [include(icol1, icol2)]
• key([fcol1, fcol2, …,] node, upper[, lower]) [include(icol1, icol2)]
• Querying:
• Optimizer support for detecting interval queries with classic predicates
• Adding declarative SQL with RI-tree engine support
• More efficient native functions (for advanced users who wish to roll their own):
• forkNode, leftNodes, rightNodes, ancestors
• Support integer, as well as date and time types
CREATE INDEX myindex
ON dbo.Intervals[(fcol1, fcol2, ...)]
-- leading equality-based filters
INTERVAL(lower, upper)
-- interval columns
[INCLUDE(icol1, icol2, ...)]
-- included columns
References
• [RI]: Managing Intervals Efficiently in Object-Relational Databases (Kriegel, Pötke and
Seidl of University of Munich)
•
http://www.dbs.ifi.lmu.de/Publikationen/Papers/VLDB2000.pdf
• [SRI1]: A Static Relational Interval Tree (Laurent Martin)
•
http://www.solidq.com/sqj/Pages/2011-September-Issue/A-Static-Relational-Interval-Tree.aspx
• [SRI2]: Advanced interval queries with the Static Relational Interval Tree (Laurent Martin)
•
https://www.solidq.com/sqj/Pages/Relational/Advanced-interval-queries-with-the-Static-Relational-Interval-Tree.aspx
• [SRI3]: Using the Static Relational Interval Tree with time intervals (Laurent Martin)
•
http://www.solidq.com/sqj/Pages/Relational/Using-the-Static-Relational-Interval-Tree-with-time-intervals.aspx
• [RIBG]: Interval Queries in SQL Server (Itzik Ben-Gan)
•
http://sqlmag.com/t-sql/sql-server-interval-queries
Feature Enhancement Request for Microsoft
• Connect item: