Mining the Cloud…
Building and Supporting a Virtual Research Infrastructure
Gary Wroblewski, Application Coordinator/Mgr. Technical Services
The Herbert H. and Grace A. Dow College of Health Professions Central Michigan University
How to build a research infrastructure?
• Get a big grant.
• Build a high-speed core network
– Attach that high-speed network to buildings where your researchers are.
• Attach systems and storage
• Buy researchers high-powered desktops
Fundamental Questions
• Can I build on existing CMU activities like VDI?
• Standard HPC does not work for my faculty…
• Realization that we should bring the researchers to the data, not move the data to the researchers
– Better support for homogeneous systems
– We can leverage shared investments from different units
How to build an ICEBOX
• Buy dense, fast, capable servers
• Buy good, fast disk arrays
– …and fast interconnects/networking
• Virtualize servers and a gold image of the desktop
– Install software on the gold image
– Clone it
• Use ACLs to allow researchers to remotely use VMs to access secure high-speed systems.
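The ACL step can be pictured with a minimal sketch. This is not the access control used in the actual deployment; the class `AccessControlList`, the VM name `research-ws-1`, and the researcher IDs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControlList:
    """Hypothetical ACL: maps a VM name to the researcher IDs allowed to use it."""
    allowed: dict = field(default_factory=dict)

    def grant(self, vm: str, researcher: str) -> None:
        # Add the researcher to the VM's allow list, creating the list if needed.
        self.allowed.setdefault(vm, set()).add(researcher)

    def revoke(self, vm: str, researcher: str) -> None:
        # Remove the researcher; do nothing if they were never granted access.
        self.allowed.get(vm, set()).discard(researcher)

    def can_connect(self, vm: str, researcher: str) -> bool:
        # Default deny: anyone not explicitly listed is refused.
        return researcher in self.allowed.get(vm, set())


acl = AccessControlList()
acl.grant("research-ws-1", "alice")          # hypothetical VM and user
print(acl.can_connect("research-ws-1", "alice"))
print(acl.can_connect("research-ws-1", "bob"))
```

The sketch uses default deny: a researcher reaches a virtual workstation only after an explicit grant, and revoking that grant cuts off access without touching the data.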
Research ICE
What is Research ICE?
• Cyberinfrastructure designed to support health analysis activities.
Who is Research ICE for?
• Investigators at or collaborating with The Herbert H. and Grace A. Dow College of Health Professions
What will Research ICE do?
• Support interdisciplinary collaboration and research.
• Securely provide access to protected health information and proactively enforce data use agreements.
Shared Governance: Research ICE
Faculty Prioritization Committee
Charter:
This committee shall be responsible for prioritizing data staging and developing the appropriate policies and operational procedures to ensure that research data availability, access control, and the collaboration environment fulfill expectations as critical research infrastructure. Initially the committee will focus on specific targeted projects but will rapidly expand to support other research groups at CMED and CHP. The Health Technologies Group (HTG) shall be responsible for ongoing operations of these resources under the guidance of the faculty and the Dean’s Advisory Council.
Acquisition of Private Data
• Establish a repeatable process for timely acquisition and staging of third-party data for analysis.
• Standardized Data Use Agreements (DUA)
– Define expectations for privacy, permissible use or disclosure, and limitations on commingling with other data sets.
– Outline security mechanisms, access control methods employed, and specific regulatory obligations.
– Clarify the requirements upon termination of the relationship.
What is in the ICE “BOX”?
• An integrated infrastructure for conducting shared analysis.
• Commonly reusable data sources that are professionally maintained and integrated.
• Robust security, access control, and audit tracking mechanisms.
• Access to common analytical and data visualization tools.
• Modestly large storage capacity, 50-100 TB
• Adequate computational resources to readily manipulate and analyze 100 M record data sets.
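To get a feel for how a 100 M record data set relates to the 50-100 TB storage target, here is a rough sizing sketch. The bytes-per-record values are assumptions chosen for illustration; the slides do not state record sizes.

```python
def dataset_size_gb(records: int, bytes_per_record: int) -> float:
    """Approximate on-disk size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return records * bytes_per_record / 1e9


RECORDS = 100_000_000  # "100 M record data sets" from the slide

# Hypothetical record widths: narrow, medium, and wide rows.
for bytes_per_record in (200, 1_000, 5_000):
    size = dataset_size_gb(RECORDS, bytes_per_record)
    print(f"{bytes_per_record} bytes/record: about {size:.0f} GB")
```

Under these assumed widths a single data set runs from tens to hundreds of gigabytes, so the storage target leaves room for many data sets plus working copies.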
Analytics Workbench Tools
Quantitative Resources
• JMP
• SPSS
• R
• SAS Enterprise Guide
• Simulation software
• Data visualization tools
– Tableau
Dedicated Virtual Workstations
• Faculty & Principal Investigators
• Graduate students, undergraduates, and GSAs
• Ability to work both remotely and on campus
• Adequate computational performance
• Large data sets
Standardized Virtual Configurations
[Diagram. Labels recovered from the figure:]
• Research Group Virtual Analytics Environment
– Virtual Research Workstation 1
– Virtual Research Workstation 2
– Virtual Research Team Server
• Virtual Student Computing Lab/s
– Virtual Lab 1
– Virtual Lab 2
– Virtual Class Server
• Each virtual machine: virtual disk, virtual CPU & memory, applications, guest OS, VM-aware drivers
• Firewall (shown twice)
• Virtualization Server Fabric: native drivers, network, storage & disk
Secure Environment: A Logical View
[Diagram. Labels recovered from the figure: researcher laptop (shown twice), remote, access control list or firewall, DMZ, Virtual Desktop Broker, VMware Host #1, VMware Host #2, fiber switch, SAN storage.]
Networks
• Management network
• CMU network/Internet
• Virtual desktop network
• Isolated storage network
Processing Cluster
• Dell PowerEdge R715
• 2x AMD 8-core processors per server
• Total of 32 cores x 2.4 GHz
• 96 GB RAM per host (192 GB total)
• VMware ESXi 4.1
Storage Cluster
• Xiotech ISE Storage Blade
• 12,000 IOPS
• 19.2 TB raw capacity
• 4 Gb FC storage fabric
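The cluster totals quoted on this slide follow from the per-host figures, and the same arithmetic gives a rough ceiling on how many virtual workstations the two hosts could carry. This is a sketch only: the per-desktop sizes (4 vCPUs, 16 GB RAM) and the `max_desktops` helper are hypothetical, not figures from the deployment.

```python
# Figures taken from the slide.
HOSTS = 2                # VMware Host #1 and #2
SOCKETS_PER_HOST = 2     # "2x AMD 8 Core processors per server"
CORES_PER_SOCKET = 8
RAM_GB_PER_HOST = 96

total_cores = HOSTS * SOCKETS_PER_HOST * CORES_PER_SOCKET
total_ram_gb = HOSTS * RAM_GB_PER_HOST


def max_desktops(vcpus_per_vm: int, ram_gb_per_vm: int,
                 vcpu_overcommit: float = 1.0) -> int:
    """Upper bound on VM count, limited by whichever resource runs out first.

    vcpu_overcommit is an assumed ratio of virtual CPUs to physical cores;
    1.0 means no overcommit. RAM is not overcommitted in this sketch.
    """
    by_cpu = int(total_cores * vcpu_overcommit // vcpus_per_vm)
    by_ram = total_ram_gb // ram_gb_per_vm
    return min(by_cpu, by_ram)


print(total_cores, "cores,", total_ram_gb, "GB RAM")
# Hypothetical desktop size: 4 vCPUs and 16 GB RAM each.
print(max_desktops(vcpus_per_vm=4, ram_gb_per_vm=16), "desktops at most")
```

The bound ignores hypervisor overhead and failover headroom, so a real plan would sit below it.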
Performance Considerations
• Does virtualization limit our performance?
– ~10% overhead…
• How does this compare to a standard desktop that runs SAS?
• Data storage space?
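One way to reason about the ~10% figure is to translate it into job runtime. The sketch below assumes the overhead acts as a uniform loss of throughput, which is a simplification; real overhead varies with workload, and the function name is hypothetical.

```python
VIRTUALIZATION_OVERHEAD = 0.10  # the "~10%" quoted on the slide


def virtualized_runtime_hours(native_hours: float,
                              overhead: float = VIRTUALIZATION_OVERHEAD) -> float:
    """Estimate runtime on a VM for a job that takes native_hours on bare metal.

    Assumption: overhead is the fraction of throughput lost to virtualization,
    so the same work takes native_hours / (1 - overhead).
    """
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in the range [0, 1)")
    return native_hours / (1.0 - overhead)


# A hypothetical job that runs 9 hours on a standard desktop.
print(round(virtualized_runtime_hours(9.0), 2), "hours on the virtual workstation")
```

The comparison with a standard desktop also depends on the hardware behind the VM: a server-class host with fast storage may finish sooner than a desktop even after paying the overhead.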
Security
• Shared tenancy of data.
– Are ACLs enough security?
– NDAs for confidential data sources.
– Backups?
• Develop process for onboarding/disposal of GIDs.
• Probably much better control than having
Support
• Schedule maintenance time.
• Long-running jobs… days/weeks.
• Researcher schedule… nights and weekends when support is at home!
• Training requirements for researchers.
Support…
• Software still maturing
• Commitment from administration
• Develop oversight committee
– Faculty and staff involved in research
• Virtual infrastructure is easy to overtax!
What’s Next
• Continue to expand systems as more researchers get involved.
• Build more performance in…
• Extend reach to other campus units
• Foundation for grants.