A Cloud Computing Platform for FY-4 Based on Resource Scheduling Technology
Xiangang Zhao, Manyun Lin, Lan Wei, Lizi Xie, Zhanyun Zhang, Peng Guo
National Satellite Meteorological Center, CMA
5th Asia‐Oceania Meteorological Satellite Users Conference
Outline
1. IT scale of FY-4 ground segment
2. Major challenges and solutions
3. FY-4 IT architecture design
4. Summary and plan
1. IT scale of FY-4 ground segment
• Second generation geostationary satellite, 4 instruments, larger amounts of data, more products.
• Computing capability requirement: 340TFlops.
• Storage capacity requirement: 11PB.
Figure: Storage capacity requirements (TB) by subsystem: CNS 5220; DSS 4000; CVS 1225; ADS 450; MCS 280; NRS 200; SWS 50.
Figure: Computing capability requirements by subsystem: CNS 182917.6; DTS 42998; CVS 38214; MCS 19884.8; DSS 18169; NRS 15727.6; ADS 12557; PGS 5587; SWS 2822.2 (TFlops per the chart title; the values sum to about 338,877, consistent with the 340 TFlops total only if read as GFlops).
Major Challenges
• How to achieve high reliability and high performance?
• How to share resources and save costs?
• What about expansions and upgrades? How to break information islands and build a sustainable system for FY-4A, FY-4B, FY-3, and …
Solutions: Adopt the IT architecture of FY-2
• Choose Unix servers and high-end storage system
• Set up an exclusive system for each satellite
• Good reliability and performance
Solutions: Adopt a new kind of IT technology
• Cloud computing, a new kind of IT technology, is widely applied.
• It offers high scalability, rapid deployment, cost savings and so on.
Application of cloud computing
• Cloud computing is also applied in the ground segment of satellites, such as GPS, communication and meteorological satellites.
• Nebula is one of NASA's cloud computing platforms, used for data sharing and for supporting applications such as climate prediction.
• According to Gartner, Inc., nearly half of large enterprises will have cloud deployments by the end of 2017.
3. IT architecture design of FY-4
• Schedule system design
• Separating operation scheduling from resource scheduling brings rich flexibility.
• Operation scheduling need not care about the underlying platform architecture.
• Resource scheduling need not care about operation logic; it concentrates only on resource management and the scheduling of individual jobs.
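The separation described above can be illustrated with a minimal sketch. It assumes a hypothetical `ResourceScheduler` interface with a single `submit` call; the class names, node names and job names are invented for illustration and are not taken from the FY-4 design. The operation scheduler only knows the order of the processing steps, and the resource scheduler only knows how to place one job at a time.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Job:
    name: str
    cores: int


class ResourceScheduler(Protocol):
    """Hypothetical interface: place one job and return the chosen node name."""

    def submit(self, job: Job) -> str: ...


@dataclass
class SimplePoolScheduler:
    """Toy resource scheduler: tracks free cores per node, knows nothing about workflows."""

    free_cores: dict[str, int]

    def submit(self, job: Job) -> str:
        for node, free in self.free_cores.items():
            if free >= job.cores:
                self.free_cores[node] = free - job.cores
                return node
        raise RuntimeError(f"no node can host {job.name}")


@dataclass
class OperationScheduler:
    """Toy operation scheduler: knows the order of steps, not the platform behind them."""

    resources: ResourceScheduler
    placements: list[tuple[str, str]] = field(default_factory=list)

    def run_chain(self, steps: list[Job]) -> None:
        for job in steps:
            node = self.resources.submit(job)
            self.placements.append((job.name, node))


# Illustrative pool and processing chain (all names are assumptions).
pool = SimplePoolScheduler(free_cores={"blade-01": 8, "pc-01": 4})
ops = OperationScheduler(resources=pool)
ops.run_chain([Job("ingest", 4), Job("calibrate", 4), Job("product", 4)])
print(ops.placements)
```

Because `OperationScheduler` depends only on the `submit` call, the pool behind it could be swapped for a different platform without touching the operation logic.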
Figure: The cloud platform architecture of FY-4
Architecture description
The infrastructure layer organizes all the medium and low level heterogeneous physical resources, such as computing, networking and storage, to supply high performance computing power, high-…
The resource scheduling layer provides unified pool management of heterogeneous computing resources and designs fault-tolerant mechanisms that deal with resource and application exceptions to ensure high efficiency, flexibility and …
The job scheduling bus layer is designed to provide a standard interface for job submission from the application layer and is compatible with LSF, PBS, and other operation schedulers in the resource scheduling layer. Acting as a meta-scheduler, this layer can forward jobs to their appropriate schedulers, in which fault-tolerant strategies for fault …
The application layer is used to provide the user interface.
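The role of the job scheduling bus layer can be sketched as a small meta-scheduler. This is a sketch under stated assumptions, not the FY-4 implementation: `BackendScheduler` is a hypothetical in-memory stand-in for a real scheduler such as an LSF or PBS cluster, the routing table is an assumed policy, and no real LSF or PBS API is called.

```python
from dataclasses import dataclass, field


@dataclass
class BackendScheduler:
    """Hypothetical stand-in for one scheduler behind the bus."""

    name: str
    healthy: bool = True
    queue: list[str] = field(default_factory=list)

    def accept(self, job_id: str) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        self.queue.append(job_id)


class JobBus:
    """Single submission interface that forwards each job to a backend scheduler."""

    def __init__(self, routes: dict[str, list[BackendScheduler]]):
        # Assumed policy: each job type maps to backends in order of preference.
        self.routes = routes

    def submit(self, job_id: str, job_type: str) -> str:
        for backend in self.routes[job_type]:
            try:
                backend.accept(job_id)
                return backend.name
            except ConnectionError:
                continue  # this backend is down, try the next one
        raise RuntimeError(f"no scheduler available for {job_id}")


# Illustrative backends (names are assumptions).
lsf = BackendScheduler("lsf-main")
pbs = BackendScheduler("pbs-backup")
bus = JobBus({"product": [lsf, pbs]})

first = bus.submit("job-1", "product")   # goes to the preferred backend
lsf.healthy = False                      # simulate a scheduler outage
second = bus.submit("job-2", "product")  # forwarded to the next backend
print(first, second)
```

The application layer only ever calls `bus.submit`, so adding or replacing a backend scheduler changes the routing table, not the callers.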
Equipment selection and resource pooling design
Figure: Computing capability distribution: Blade Server 55%, PC Server 28%, Unix Server 17%.
Figure: Storage capacity distribution: high-end storage 38%, low-end storage the remainder.
Key algorithms (1/2)
1. Resource failure processing algorithm
• When a single computing node fails, the algorithm handles the failure and moves all the jobs on this node to other nodes.
2. Resource group failure processing algorithm
• When a resource group collapses, the algorithm tries to restore it. If the restoration fails, it moves all the jobs on this group, including related data, to other groups.
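The two failure-handling rules above can be sketched as follows. This is a minimal illustration, assuming jobs are identified by strings and that moved jobs go to the node with the fewest jobs; `Node`, `ResourceGroup`, and the `try_restore` callback are hypothetical names, and the data migration step is only marked by a comment.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    name: str
    alive: bool = True
    jobs: list[str] = field(default_factory=list)


@dataclass
class ResourceGroup:
    name: str
    nodes: list[Node]

    def healthy_nodes(self) -> list[Node]:
        return [n for n in self.nodes if n.alive]


def least_loaded(nodes: list[Node]) -> Node:
    # Assumed placement rule: the node currently holding the fewest jobs.
    return min(nodes, key=lambda n: len(n.jobs))


def handle_node_failure(group: ResourceGroup, failed: Node) -> None:
    """Move every job from the failed node to surviving nodes of the same group."""
    failed.alive = False
    targets = group.healthy_nodes()
    if not targets:
        raise RuntimeError(f"group {group.name} has no healthy node left")
    while failed.jobs:
        least_loaded(targets).jobs.append(failed.jobs.pop(0))


def handle_group_failure(
    failed: ResourceGroup,
    others: list[ResourceGroup],
    try_restore: Callable[[ResourceGroup], bool],
) -> bool:
    """Try to restore the group; if that fails, move its jobs to other groups.

    Returns True when the group was restored in place.
    """
    if try_restore(failed):
        return True
    targets = [n for g in others for n in g.healthy_nodes()]
    if not targets:
        raise RuntimeError("no healthy node in any other group")
    for node in failed.nodes:
        node.alive = False
        while node.jobs:
            # Related data would be migrated here; omitted in this sketch.
            least_loaded(targets).jobs.append(node.jobs.pop(0))
    return False


# Illustrative run (all names are assumptions).
a, b, c = Node("a", jobs=["j1", "j2"]), Node("b", jobs=["j3"]), Node("c")
g1 = ResourceGroup("g1", [a, b, c])
handle_node_failure(g1, a)

d = Node("d")
g2 = ResourceGroup("g2", [d])
restored = handle_group_failure(g1, [g2], try_restore=lambda g: False)
print(b.jobs, c.jobs, d.jobs, restored)
```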
Key algorithms (2/2)
3. Job failure processing algorithm
• When a job fails, the algorithm redoes it or moves it to another computing node.
4. Scheduler failure processing algorithm
• When a scheduler becomes invalid, the algorithm tries to recover it. If the recovery fails, it moves all the jobs on this scheduler, including related data, to other schedulers.
5. Load balance scheduling algorithm
• According to the scheduling strategy, the algorithm aims to optimize resource usage, maximize throughput, minimize response time, and avoid overload of any single resource.
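Job failure processing and load balancing can be combined in one small sketch. The strategy shown (pick the least-loaded node that has room, redo a failed job on the same node, then move it to another node) is one possible reading of the rules above, not the platform's confirmed algorithm; `Node`, `JobError`, `pick_node` and `run_with_failover` are hypothetical names.

```python
from dataclasses import dataclass


class JobError(Exception):
    """Raised by a job callable to signal that the job failed."""


@dataclass
class Node:
    name: str
    capacity: int  # assumed unit: CPU cores
    used: int = 0

    @property
    def load(self) -> float:
        return self.used / self.capacity


def pick_node(nodes, cores, exclude=()):
    """Least-loaded node that still has room: one possible load balance strategy."""
    candidates = [
        n for n in nodes
        if n.name not in exclude and n.capacity - n.used >= cores
    ]
    if not candidates:
        raise RuntimeError("no node has enough free capacity")
    return min(candidates, key=lambda n: n.load)


def run_with_failover(job, cores, nodes, max_retries=1):
    """Run job(node); redo it on the same node, then move it to another node."""
    tried = []
    while True:
        node = pick_node(nodes, cores, exclude=tried)  # raises when no node is left
        node.used += cores
        try:
            for _ in range(max_retries + 1):
                try:
                    return node.name, job(node)
                except JobError:
                    continue  # redo the job on the same node
        finally:
            node.used -= cores  # always release the reserved cores
        tried.append(node.name)  # every attempt failed here: move on


# Illustrative run: the job always fails on node "n2".
nodes = [Node("n1", 8, used=6), Node("n2", 8, used=2), Node("n3", 8, used=4)]
attempts = []


def flaky(node):
    attempts.append(node.name)
    if node.name == "n2":
        raise JobError("simulated failure")
    return "ok"


result = run_with_failover(flaky, 2, nodes)
print(attempts, result)
```

The job first lands on the least-loaded node, is redone there once, and is then moved to the next least-loaded node, which avoids piling work onto a node that is already busy.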
Summary
• Setting up resource pools from general devices, without virtualization technology, enhances expansibility and improves the performance-to-price ratio of the system. In theory it can save 60% of the cost of computing servers: 60% = 80% × (1 − 1/4).
• Resource scheduling, including load balancing scheduling and fault tolerance mechanisms, can ensure the reliability and efficiency of the system.
• The architecture is still in the design stage; more problems need to be solved during the implementation phase in the future.
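The 60% figure can be checked with a few lines of arithmetic. The slide gives only the formula, so the meaning of the two factors is an assumption here: 80% is read as the share of computing capacity that moves to general-purpose servers, and 1/4 as their price relative to the servers they replace. The function and parameter names are hypothetical.

```python
def theoretical_saving(share_on_general_servers: float, relative_price: float) -> float:
    """Fraction of the computing-server cost saved, under the assumed reading above."""
    return share_on_general_servers * (1 - relative_price)


saving = theoretical_saving(0.80, 1 / 4)
print(f"{saving:.0%}")
```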
Plan and advice
• Share a cloud for all FengYun satellites in the future.
• Make full use of social resources to gain standard computing and storage capacity.
• Design the interface between private and public cloud and provide data sharing conveniently for the public.
• Advice: Carry out more exchanges about IT architecture in the future.
FengYun Cloud