Ny databaseteknologi til enterprise intelligence
SAS
Intelligence
Database
RDBMS
Information
•
Fra store datamængder
•
Meget hurtigt
•
Til mange samtidige brugere
Skalerbarhed gennem
Ny databaseteknologi til enterprise intelligence
SAS
Intelligence
Database
RDBMS
Eliminering af
transaktionsorienteret
overhead i RDBMS
Ny databaseteknologi:
•
Organiser data optimalt
Ny databaseteknologi til enterprise intelligence
SAS
Intelligence
Database
RDBMS
Mange samtidige
Lille opgave
Stor opgave
Udnyt alt hardware fuldt ud
Designet til opgaven!
Cykel
Ny databaseteknologi til enterprise intelligence
SAS
Intelligence
Database
RDBMS
Eliminering af transaktionsorienteret overhead
Optimal udnyttelse af hardwaren
9
Reduktion af diskforbrug
2-4 gange mindre
9
Hurtigere svartid
op til 40 gange hurtigere
SAS Intelligence Database
Flere data, queries, brugere
Svartid
10
8
6
2
0
RDBMS
Database kun til enterprise intelligence
Hurtig
Pålidelig og robust
•
24/7, dedikeret database
Skalerbar
•
Opnå konstant svartid
•
Med vækst i data, forespørgsler og
brugere
Lav “total cost of ownership”
•
Reduceret hardware krav
•
Udnytter hardware optimalt
SAS IDB
4 CPU
6 CPU
2 CPU
SAS Intelligence Database
Teknikken bag den nye database
4.
Connectivity – Data & klienter
5.
Metadata, administration og
SAS
®
Intelligence Platform
1.
Partitionering af data
2.
Parallel processing
3.
Index
CPU
CPU
CPU
CPU
Operating system
Application
Application
RAM
I/O
controller
I/O
controller
SAS Intelligence Database
Partitionering af data
Metadata
Data
Traditionel tabel
SAS IDB tabel
Descriptor
Data
Data
System
Table
SAS Intelligence Database
Partitionering af data
Metadata
Data
SAS IDB tabel
CPU
CPU
CPU
CPU
Operating system
Application
Application
Application
RAM
I/O
controller
I/O
controller
SAS Intelligence Database
Partitionering af data
Metadata
Data
SAS IDB tabel
CPU
CPU
CPU
CPU
Operating system
Application
Application
Application
RAM
I/O
controller
I/O
controller
SAS Intelligence Database
Partitionering af data – Cluster-database
SAS
IDB-tabeller
ETL-processer
parallelt
Data A
Load-proces
Fact A
Fact B
Udfør
cluster-program
Load-proces
Data B
Fact C
Load-proces
Data C
Fact D
Load-proces
Data D
SAS Intelligence Database
Partitionering af data – Cluster-database
Cluster
SAS
IDB-tabel
Data A
Load proces
Load proces
Load proces
Load proces
Fact A
Fact C
Fact B
Fact D
Data B
Data C
Data D
ETL-processer
parallelt
Udfører
cluster-program
proc spdo library = mylib;
cluster create
ThirdQuarter2003
mem = Oct2006
mem = Nov2006
mem = Dec2006;
quit;
SAS Intelligence Database
Partitionering af data – Cluster-database
Cluster
SAS
IDB-tabel
ETL-processer
parallelt
Data A
Load-proces
Load-proces
Load-proces
Load-proces
Fact A
Fact C
Fact B
Fact D
Data B
Data C
Data D
Udfører
un-cluster-program
SAS Intelligence Database
Partitionering af data – Cluster-database
SAS
IDB-tabeller
ETL-processer
parallelt
Data A
Load-proces
Load-proces
Load-proces
Load-proces
Fact A
Fact C
Fact B
Fact D
Udfører
un-cluster-program
proc spdo library = mylib;
Uncluster TirdQuarter2003;
quit;
Data B
Data C
SAS Intelligence Database
Partitionering af data – Cluster-database
Tidsbaseret cluster-database
SAS IDB-tabel
Data
Metadata
Cluster SAS IDB-tabel
Table2
Table3
Cluster-metadata
Table1
Table4
Table5
Table6
Table7
Table8
SAS Intelligence Database
Partitionering af data – Cluster-database
Lynhurtig opdatering og oprydning
Tidsbaseret cluster SAS IDB-database
Forbered nye SAS IDB-tabeller
Kør un-cluster-program
Kør cluster-program
•
Nye SAS IDB-tabeller
Påvirker ikke
den kørende
SAS
IDB-database
Der skal kun
dannes nye
cluster-metadata
SAS Intelligence Database
Partitionering af data – Dynamisk cluster-database
Forbered nye tabeller
Udfør cluster add
Cluster-metadata
Jan 2003
Feb 2003
Apr 2003
Jul 2003
Aug 2003
Mar 2003
May 2003
Jun 2003
Sep 2003
Oct 2003
Nov 2003
Dec 2003
Jan 2004
Feb 2004
Apr 2004
Jul 2004
Aug 2004
Mar 2004
May 2004
Jun 2004
Sep 2004
Oct 2004
Nov 2004
Dec 2004
Jan 2005
Feb 2005
Mar 2005
Apr 2005
Jan 2005
Feb 2005
Mar 2005
Apr 2005
May 2005
Jun 2005
Cluster Metadata
Jan 2003
Feb 2003
Apr 2003
Jul 2003
Aug 2003
Mar 2003
May 2003
Jun 2003
Sep 2003
Oct 2003
Nov 2003
Dec 2003
Jan 2004
Feb 2004
Apr 2004
Jul 2004
Aug 2004
Mar 2004
May 2004
Jun 2004
Sep 2004
Oct 2004
Nov 2004
Dec 2004
Cluster-metadata
Jan 2003
Feb 2003
Apr 2003
Jul 2003
Aug 2003
Mar 2003
May 2003
Jun 2003
Sep 2003
Oct 2003
Nov 2003
Dec 2003
Jan 2004
Feb 2004
Apr 2004
Jul 2004
Aug 2004
Mar 2004
May 2004
Jun 2004
Jan 2005
Feb 2005
Mar 2005
Apr 2005
May 2005
SAS Intelligence Database
Partitionering af data – Dynamisk cluster-database
SAS Intelligence Database
Teknikken bag den nye database
CPU
CPU
CPU
CPU
Operating system
Application
Application
RAM
I/O
controller
I/O
controller
1.
Partitionering af data
2.
Parallel processing
3.
Index
Cluster-database
4.
Connectivity – Data & klienter
5.
Metadata, administration og
SAS Intelligence Database
Teknikken bag den nye database
1.
Partitionering af data
2.
Parallel processing
3.
Index
4.
Connectivity – Data & klienter
5.
Metadata, administration og
SAS
®
Intelligence Platform
CPU
CPU
CPU
CPU
Operating system
Application
Application
RAM
I/O
controller
I/O
controller
SAS Intelligence Database
Parallel processing
Optimal hardwareudnyttelse
Indbygget parallel programlogik:
•
Where processing
•
Sorting
•
Group by
•
Table join
•
Multi-index builds & updates
SAS Intelligence Database
Parallel processing
Thread 1 Thread 2 Thread 3 Thread 4
1
6
2
7
3
5
4
8
Partielle datafiler
WHERE, KEEP, SORT,
SUMMARIZE, GROUP BY …
Partielle resultater
Samling af delresultater
SAS Intelligence Database
Parallel processing – Table join
A
A
A
A
B
B
B B
Threaded Sort on Table A
A1 A2 A3 A4
B1 B2 B3 B4
Threaded Sort on Table B
A1 B1 A2 B2 A3 B4 A4 B4
Thread 1
Thread 2
Thread 3
Thread 4
Parallel sortering
af partitionerede
datafiler
Tabel A og B sorteret
SAS Intelligence Database
Parallel processing – Index creation
Partitionerede datafiler
Parallel dannelse af index
Index-dannet
SAS Intelligence Database
Avanceret hybrid-index-teknologi
Metadata
Data
Traditionel tabel
SAS IDB-tabel
Descriptor
Data
Index
Metadata
Parallel
index-evaluering
Flere tråde evaluerer
WHERE samtidigt
Index-metadata
Index-segmenter
SAS Intelligence Database
Avanceret hybrid-index-teknologi
Index
column_a
Segment 2
Index-
meta-data
Parallel index-evaluering
Flere tråde evaluerer WHERE samtidigt
Index-metadata og segmenter
Segment 1
Segment 4
Segment 3
Segment 5
Segment 6
Segment 7
Segment 8
Segment 2
Segment 3
Segment 4
Segment 5
where column_a
in ('A','
B
') ;
Cached index-metadata
•
MIN, MAX, COUNT, COUNT DISTINCT
•
NMISS, RANGE
Undgå fuld table scan
•
Select count(*) from table;
“Index-statistiktabel”
SAS Intelligence Database
Avanceret hybrid-index-teknologi
Metadata
Data
SAS IDB-tabel
Metadata
Index
X
SAS Intelligence Database
Adgang til data fra enhver klient
SAS
®
-klient
Libname statement
•
Åbner adgang fra alle SAS-klienter
•
Transparent …… som enhver anden database
libname mylib sasspds "class"
server=localhost.5200
user='student' passwd=
'
Metadata0';
Datatype: SAS Intelligence Database
SAS Intelligence Database
Adgang til data fra enhver klient
SAS
®
-klient
SQL pass-through facility
•
SQL udføres på server og ikke klient
•
SQL udføres i SAS Intelligence Database
proc sql;
select *
from connection to sasspds
( . . . sasspds kode . . . );
SAS Intelligence Database
Adgang til data fra enhver klient
Øvrige klienter:
ODBC
Windows-klient
JDBC
Java-program med web-interface
htmlSQL
Webapplikation
Informationsbruger
Avanceret bruger / analytiker
Forretningsadministrator
Datawarehouse
Web server
Server tier
Eksterne
datakilder
Dataintegrations-ekspert
Teknisk
administrator
Metadata
SAS
®
Intelligence Platform – Rollebaserede applikationer
SAS Intelligence
Danner og
opdaterer SAS IDB
Brugerstyring,
logning, integration
SAS
®
Intelligence Platform
SAS
®
Management Console – Administration
Administration af alle
ressourcer
Servere
Databaser
User, group, rolle
Rapporter
SAS
®
Stored Processes
SAS
®
-licenser
Job schedulering
SAS Intelligence Database
Teknikken bag den nye database
4.
Connectivity – Data & klienter
5.
Metadata, administration og
SAS
®
Intelligence Platform
1.
Partitionering af data
2.
Parallel processing
3.
Index
CPU
CPU
CPU
CPU
Operating system
Application
Application
RAM
I/O
controller
I/O
controller