The Data Exchange Project
An update for SIF Conference
19th November, Sheffield
Iain Bradley – DfE Data Exchange Lead
The current landscape – a summary of problems
The Problem
• Bulk collections limit detail
• Significant front line ‘compliance’ cost
• Changes reliant on MIS systems responding to COLLECT specifications
• Schools have to send the same data to several places
• Data cleaning can happen outside school MIS
• Data stored in silos
• Data stored inconsistently (version control)
• Data processed with locally chosen software
• Data not stored at lowest level
• Responding to new policies or queries can be time consuming / inflexible
• Several places to go for the ‘same’ data. Multiple websites and passwords.
• Varying analytical and visualisation tools
• Parents, schools, DfE, inspectors, researchers... not all pointing to the same data for the same issue
• 3rd party access to data is a bespoke and labour intensive process
Taken together, accessing, combining and then using data is more difficult than it ought to be
The Solution
School Performance Data Programme
1. DATA EXCHANGE (process: Gather Data)
2. WAREHOUSE (process: Process and Store Data)
3. PORTAL (process: Make Data Available)
Context: The vision for the end state
For richer, more accurate data to be available quickly in accessible and usable forms, in order to enable others to drive up the quality of education and services received by children
Specifically within the context of data exchange:
• From bulk upload to regular movement with minimal manual intervention…a business process should trigger a movement
• Able to notify anyone plugged into the exchange ‘within minutes’, although ‘within hours’ will typically be sufficient.
• Data could be pushed on change, at defined times, or pulled.
• Schools don’t have to repackage data for different users – just be plugged into the exchange
• We anticipate every School and LA that uses an MIS being plugged into the exchange via that MIS
• Appropriate role based authorised access and security
• Data movements being controlled via a central hub as part of a ‘hub and spoke’ configuration (as opposed to hierarchical, distributed or centralised)
• SPDP’s data warehouse and portal will be key consumers, and as such the exchange architecture should be closely integrated to maximise performance
• Able to handle a variety of formats for moving data around the sector
• When data moves it may be in various formats but only ISB format will be accepted by the Department
Designing the Architecture – what do we know?
• 25,000 schools, 152 LAs. Nearly all have an MIS, but not all use it to the full degree. Data Exchange will be ‘one size fits most’, but we need a way to bring data into the data store for the tail of schools not using an MIS.
• Given the fact DfE is already buying a warehouse and portal under School Performance Data Programme, we should fully exploit the elements of those which can deliver part of the solution for Data Exchange
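To give a feel for why the number of end points matters to the ‘hub and spoke’ choice, the sketch below counts connections for 25,000 schools and 152 LAs. It assumes one end point per organisation, which is a simplification. The fully connected point-to-point arrangement is an illustrative contrast only and is not one of the alternatives named on the slides.

```python
# Sketch: connection counts for hub-and-spoke vs. full point-to-point.
# End-point counts come from the slide; one end point per school or LA
# is a simplifying assumption.

SCHOOLS = 25_000
LOCAL_AUTHORITIES = 152


def hub_and_spoke_links(end_points: int) -> int:
    """Each end point keeps a single connection to the central hub."""
    return end_points


def point_to_point_links(end_points: int) -> int:
    """Every pair of end points keeps its own direct connection."""
    return end_points * (end_points - 1) // 2


end_points = SCHOOLS + LOCAL_AUTHORITIES
print(f"End points:           {end_points:,}")
print(f"Hub-and-spoke links:  {hub_and_spoke_links(end_points):,}")
print(f"Point-to-point links: {point_to_point_links(end_points):,}")
```

Under these assumptions the hub needs one link per end point (25,152), whereas pairwise links would run to roughly 316 million.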
Data Exchange: What’s out of scope?
Data / Organisational scope could potentially be massive, but the risk of never getting off the ground would be substantial. Initial scope will focus on individualised data sitting in school and LA MIS as end points. But by building a scalable solution using open standards, we will avoid a cul-de-sac in future. Within that scope, a number of scenarios have been identified which may fall outside the scope of DTP, including:
• the transfer of information between systems within an organisation, for example to keep common data held in separate systems in a consistent state on a sub-second timescale. Whilst not our focus, we of course don’t want to unintentionally ‘orphan’ any existing local movements whilst implementing the exchange.
• the transfer of information between schools working collaboratively, for example to move in-lesson attainment data captured in an interactive learning environment from one school to another during the lesson
• alerting the Local Authority children’s services immediately when a learner who is being monitored does not arrive for school
Data exchange will support the sending of any package but the SPDP project does not cover the extraction or loading of any data which is not in ISB format and not required for SPDP. So sending data that is within an organisation can be done locally using the data exchange but is out of scope of SPDP
What we need…
To integrate an exchange within the warehouse and portals solutions in hand we need…
a) School / LA MIS to be able to communicate with Data Exchange Hub
b) A data exchange hub, with appropriate routing, control, audit and security
c) The hub to seamlessly integrate with the SPDP data warehouse to provide the storage area for all the data DfE receive.
We do not need – a data store, or way to present data – these are to be
Challenges / Risks
• Number of end points and variation in technical ability of schools
• Implementing the ISB Enterprise Data Architecture with MIS suppliers with whom we have no formal contractual relationship
• Ensuring integration with SPDP architecture, which itself has not been built yet
• Cultural shift for data providers, from annual data collections which are physically sent, to data flowing out of system automatically
• Greater transparency of information than ever before at a local and national level
• Data cleaning / validation. Ensuring we better support front-line data entry by developing easily accessible rules across the end-to-end solution, and not throwing out the baby with the bathwater in terms of
Summary / Next Steps / Timings
[Timeline chart covering 2013, 2014, 2015 and 2016+, with rows for SPDP and Data Exchange. Recoverable labels:]
• Requirements Gathering & Technical Options Work
• Vision, Blueprint & end to end design
• Procurement Strategy
• Standards implementation decisions
• Outline Business Case
• Procurement Activity Start: Target Spring 2014
• Start Build: Likely to be phased based on SPDP readiness and standards maturity
• Phase one of exchange complete?
• Phase two of exchange complete?
• Contract Signed
• Data Warehouse design, build and test
• Phased DWH go live
• Portal design, build and test
• Phased Portal go live
• DfE Ministerial approval of Full Business Case
• Cabinet Office approves Full Business Case
• Preferred supplier chosen – contract fine tuning
• Data Warehouse and Portal operational
Developing The Blueprint: Latest Thinking
End to end solution overview
[Architecture diagram. Recoverable labels, grouped by the component they appear to belong to:]
• Scope markers: Agnostic of Transfer Mechanism; Connected to Data Exchange; Not Connected to Data Exchange
• End Point – DE Connected: Data Store, Application Interface, Data Entry, End User
• Data Exchange Hub: Message Routing / Queue, End Point Security, Control Application, Audit Database, Authentication Service
• Schools Performance Data Warehouse: CIOG Analysis, DE API, ODS, Data Warehouse, Legacy Data Store, Data Extracts, Data Validation, Master Data Management, Non-DE Interface, Control Application (including ETL)
• End Point – Not DE Connected: Data Store, Data Entry, End User, Manual Data Entry, Data Validation, Data File
• Web Portal: Admin End User, RBAC, Data Analysis Tools, Analytical End User, Web-based End User
• End Point – DE Connected: Application Interface, Data Store, Data Reader, End User
• Connectors: DE Message (shown three times)
Key requirements for transfer mechanism
Automatically transfer information
– Guaranteed transfer of information between any two end points with no manual intervention (Not guaranteed order of delivery)
– Addition / removal of end points with minimum effort
– Prioritisation/precedence to meet SLAs for different message types
Configure dataflows (control capability)
– Enable authorised users to configure pre-defined dataflow services
• Trigger (on change, scheduled, on request, others TBD)
• Destination end points (individual or multiple)
Validate and cleanse data
– Data quality is important for end to end solution
– Transfer mechanism must validate against XSD
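As a rough illustration of the control capability and prioritisation requirements above, the sketch below models a pre-defined dataflow (trigger plus destination end points) and a queue that releases higher-priority message types first. All class names, flow names and the numeric priority convention are hypothetical; nothing here is taken from an actual Data Exchange design.

```python
import heapq
import itertools
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    """Trigger types listed on the slide (others TBD)."""
    ON_CHANGE = "on change"
    SCHEDULED = "scheduled"
    ON_REQUEST = "on request"


@dataclass(frozen=True)
class DataFlow:
    name: str
    trigger: Trigger
    destinations: tuple[str, ...]  # individual or multiple end points
    priority: int                  # assumed convention: lower = sent first


class OutboundQueue:
    """Releases pending messages in priority order."""

    def __init__(self) -> None:
        self._heap: list = []
        # Unique counter breaks ties, so equal-priority messages leave
        # in the order they arrived and flows are never compared.
        self._counter = itertools.count()

    def enqueue(self, flow: DataFlow, payload: str) -> None:
        heapq.heappush(self._heap, (flow.priority, next(self._counter), flow, payload))

    def dequeue(self) -> tuple[DataFlow, str]:
        _, _, flow, payload = heapq.heappop(self._heap)
        return flow, payload

    def __len__(self) -> int:
        return len(self._heap)


# Hypothetical flows: attendance is treated as more urgent than a census.
attendance = DataFlow("session attendance", Trigger.ON_CHANGE, ("DfE", "LA"), priority=0)
census = DataFlow("termly census", Trigger.SCHEDULED, ("DfE",), priority=5)

queue = OutboundQueue()
queue.enqueue(census, "<census/>")
queue.enqueue(attendance, "<attendance/>")

first_flow, first_payload = queue.dequeue()
print(first_flow.name, first_flow.destinations)
```

The attendance message leaves the queue first even though it was enqueued second, which is the behaviour a per-message-type SLA would need. Validation against an XSD is not shown because it would require a third-party XML schema library.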
Monitor and improve performance
– Performance logging and alerting
Maintain security
– Solution will need accreditation to ‘Official’ level
– Authenticate end points (and end points must authenticate hub)
– Protect information in transit
– Ensure only authorised users can configure data flows
– Ensure data flows reflect access privileges of end points
– Accounting / Audit capability
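Three of the security points above (authorised configuration, flows that respect end-point privileges, and audit) can be pictured together in a small sketch. The privilege table, user names, end-point identifiers and data categories are all invented for illustration; authentication and protection in transit are deliberately left out.

```python
# Hypothetical privilege table: data categories each end point may receive.
PRIVILEGES = {
    "DfE": {"attendance", "attainment"},
    "LA-001": {"attendance"},
}
AUTHORISED_USERS = {"data.manager"}  # hypothetical user allowed to configure flows

audit_log: list[str] = []  # stand-in for an audit database


def configure_flow(user: str, category: str, destinations: list[str]) -> dict:
    """Accept a data flow only if the user and every destination are entitled."""
    if user not in AUTHORISED_USERS:
        audit_log.append(f"REJECTED {user}: not authorised to configure data flows")
        raise PermissionError(f"{user} may not configure data flows")

    blocked = [d for d in destinations if category not in PRIVILEGES.get(d, set())]
    if blocked:
        audit_log.append(f"REJECTED {user}: {category} not permitted for {blocked}")
        raise PermissionError(f"{category} not permitted for {blocked}")

    audit_log.append(f"CONFIGURED {user}: {category} -> {destinations}")
    return {"category": category, "destinations": list(destinations)}


flow = configure_flow("data.manager", "attendance", ["DfE", "LA-001"])
print(flow)

try:
    # LA-001 has no privilege for attainment data, so this is refused.
    configure_flow("data.manager", "attainment", ["DfE", "LA-001"])
except PermissionError as error:
    print("refused:", error)

print(len(audit_log), "audit entries")
```

Every decision, accepted or refused, leaves an audit entry, so the log can answer both "who configured what" and "what was attempted".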
Key non-functional requirements
Support tens of thousands of end points
– 25k schools, could be several end points for some schools
– Scalable to support future growth
Message volumes still being developed
– 25k schools reporting on (non)attendance twice per day
• message per session/school or message per class / session TBD
– Other messages one or more orders of magnitude lower in frequency
Performance
– Session attendance available on portal within 1 hour of leaving school
– ‘Performance budget’ needs to be split between transfer mechanism and SPDP analysis / reporting / publication activities
Availability
– 24/7 with 99%, core working hours higher
– < 30 minutes interruption of service
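The volume and availability figures above can be turned into rough numbers. The sketch below uses the 25k schools, two sessions per day and 99% availability from the slide. The figure of 20 classes per school is purely hypothetical, since the per-class option is still TBD, and a 365-day year is assumed.

```python
SCHOOLS = 25_000
SESSIONS_PER_DAY = 2
AVAILABILITY = 0.99
HOURS_PER_YEAR = 24 * 365  # assumes a 365-day year of 24/7 service


def daily_attendance_messages(schools: int, sessions_per_day: int,
                              messages_per_session: int = 1) -> int:
    """Attendance messages per day across all schools."""
    return schools * sessions_per_day * messages_per_session


def allowed_downtime_hours(availability: float, period_hours: float) -> float:
    """Hours of downtime permitted in the period at the given availability."""
    return period_hours * (1 - availability)


# Option 1: one message per school per session.
per_school = daily_attendance_messages(SCHOOLS, SESSIONS_PER_DAY)

# Option 2: one message per class per session; 20 classes is a made-up figure.
per_class = daily_attendance_messages(SCHOOLS, SESSIONS_PER_DAY, messages_per_session=20)

downtime = allowed_downtime_hours(AVAILABILITY, HOURS_PER_YEAR)

print(f"Per school/session: {per_school:,} messages per day")
print(f"Per class/session:  {per_class:,} messages per day")
print(f"Downtime at 99%:    {downtime:.1f} hours per year")
```

One message per school per session gives 50,000 attendance messages a day. At 99% availability the service could be down for about 87.6 hours a year, which is why the slide asks for a higher figure during core working hours.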
Flexibility
– Keep data model and transfer mechanism separate
– Maximum flexibility in terms of modes from user perspective
Standards
– Use open standards in widespread use
– Identify output based requirements and assess solutions offered against VFM for the sector
Information flows and control
[Diagram: End point, Hub and End point / SPDP shown as layered stacks connected over a wide area network. Recoverable labels:]
End point
– End point core (may support ISB natively): proprietary data model
– ISB Data layer: pack / unpack ISB format business data payload (ISB BDA specific)
– Message layer: create / send, receive / extract (data model agnostic)
Hub
– Message queuing and routing
End point / SPDP
– Message layer: create / send, receive / extract (data model agnostic)
– ISB Data layer: pack / unpack ISB format business data payload (ISB BDA specific)
– SPDP / End point core (may support ISB natively)
Control application
– Configure end point data flows
Control message
Send {Existing flow X} When {trigger}
To A, B, C
Data flows IAW control messages: {Existing flow X} sent to hub labelled for A, B, C when trigger condition met
Hub routes message to A, B, C based on information in header
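The control-message pattern above can be sketched as follows: a control message tells an end point which flow to send, on which trigger, and to whom; when the trigger fires the end point labels the message for its destinations, and the hub routes on the header alone without opening the payload. Class and method names are hypothetical, and the in-memory inboxes stand in for real message queues.

```python
from dataclasses import dataclass


@dataclass
class ControlMessage:
    flow: str                      # e.g. "Existing flow X"
    trigger: str                   # condition that releases the flow
    destinations: tuple[str, ...]  # e.g. ("A", "B", "C")


@dataclass
class DataMessage:
    flow: str
    destinations: tuple[str, ...]  # header used by the hub for routing
    payload: bytes                 # opaque to the hub (data model agnostic)


class Hub:
    def __init__(self) -> None:
        self.inboxes: dict[str, list[DataMessage]] = {}

    def register(self, end_point: str) -> None:
        self.inboxes.setdefault(end_point, [])

    def route(self, message: DataMessage) -> None:
        """Deliver to every destination named in the header."""
        unknown = [d for d in message.destinations if d not in self.inboxes]
        if unknown:
            raise KeyError(f"unregistered destinations: {unknown}")
        for destination in message.destinations:
            self.inboxes[destination].append(message)


class SendingEndPoint:
    def __init__(self, hub: Hub) -> None:
        self.hub = hub
        self.flows: dict[str, ControlMessage] = {}

    def apply_control(self, control: ControlMessage) -> None:
        self.flows[control.flow] = control

    def on_trigger(self, trigger: str, flow: str, payload: bytes) -> bool:
        """Send the flow to the hub only if its configured trigger is met."""
        control = self.flows.get(flow)
        if control is None or control.trigger != trigger:
            return False
        self.hub.route(DataMessage(flow, control.destinations, payload))
        return True


hub = Hub()
for name in ("A", "B", "C", "D"):
    hub.register(name)

school = SendingEndPoint(hub)
school.apply_control(ControlMessage("Existing flow X", "on change", ("A", "B", "C")))
sent = school.on_trigger("on change", "Existing flow X", b"<payload/>")
print(sent, {name: len(inbox) for name, inbox in hub.inboxes.items()})
```

Keeping the destination list in the header is what lets the hub stay data model agnostic: it never needs to parse the ISB payload to decide where a message goes.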