(1)

GRID computing at LHC

Science without Borders

Kajari Mazumdar

Department of High Energy Physics

Tata Institute of Fundamental Research, Mumbai.

Dr. Paul’s Eng. College, Velucherry September 12, 2011

Disclaimer:

I am a physicist whose research field induces & utilizes

(2)

Plan of talk

Requirements of today’s scientific community

Grid concept in simple terms

Evolution of Grid

LHC Computing Grid and CMS experiment

CMS Tier2 Grid Computing Centre at TIFR, Mumbai

Outlook

Basic idea (G. Gilder): when the network is as fast as the computer’s internal link, the machine disintegrates across the net into a set of special purpose appliances.
(3)

Computing requirements and challenges

Today’s science is based on computations, data analysis, data visualization, ..

1. Scientific and engineering problems are getting ever more complex.

2. Collaborations are becoming larger.

Computer simulation and modelling are more cost-effective than experimental methods in some cases (e.g., reactor safety, aircraft design).

Users need more accurate and precise solutions to their problems in the shortest possible time (e.g., weather forecasts).

Recent years have seen mammoth scientific projects whose data size is several petabytes per year (e.g., LHC experiments), to be used by several thousand people.

To work with a colleague, even across a campus, on petabyte-scale (10^15 bytes) data, we need an ultrafast network.

Even though CPU power, disc storage and communication speed continue to increase, computing resources are failing to satisfy users’ demands!

(4)

Current trend in scientific communications

1. Free, open-source software: the GNU/Linux-based OS has been developed consciously, with many applications.

Research/academic institutes use cheaper PC clusters to achieve high performance.

It is easy to develop loosely coupled distributed applications.

• Software has to catch up with users’ demands and expectations for high-end computing.

2. Parallel computing: multiple computers or processors working together on a common task

-- each processor works on its section of the problem

-- processors are allowed to exchange information among themselves

• Two big advantages of parallel computers: performance and memory.

3. Internet computing using idle PCs is becoming an important computing platform (LHC@home, SETI@home, Napster, ..)

The WWW is a promising candidate for the core component of a wide-area distributed computing environment.

Efficient client/server models/protocols

Transparent networking, navigation, GUI with multimedia access and dissemination
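The parallel-computing idea in item 2 above (each processor works on its own section of the problem, then the partial results are combined) can be sketched as follows. This is only an illustration: the function names and the sum-of-squares task are assumptions, and threads are used purely to show the decomposition, not to claim a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one worker on its own section of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Split data into sections, process each section, combine the results."""
    # Ceiling division so that every element lands in exactly one chunk.
    chunk_size = max(1, -(-len(data) // n_workers))
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # The "exchange of information" step: partial results are merged here.
    return sum(partials)


total = parallel_sum_of_squares(list(range(1000)))
print(total)
```

The same split, process, combine pattern applies whether the workers are threads on one machine or separate computers on a network.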

(5)

Grid computing in simple words

• The Grid is a utility or infrastructure for complex, huge computations, where remote resources are accessible through the web (internet) from a desktop, laptop or mobile phone.

It is similar to the electrical power grid, where the user does not have to worry about the source of the computing power.

Imagine millions of computers owned by individuals and institutes from various countries across the world, connected to form a single, huge super-computer!

This technology, developed only over the last decade, is being used by

--- high energy physicists to store, analyze data being produced by LHC experiments at CERN, Geneva, Switzerland.

--- Earth scientists to monitor ozone layer activity.

--- Biologists to monitor the behaviour of bees

--- ....

(6)

World Wide Web – Information Sharing

Invented at CERN by Tim Berners-Lee (in the 1990s)

For use in High Energy Physics experiments

Agreed protocols, like HTTP

Anyone can access information and post their own

Quickly crossed over into public use

Going back

GRID is changing the way science is being done.

(7)

From Web to Grid Computing

Use of the internet as infrastructure, and advanced web services for seamless integration.

1. Sharing more than just information: data, computing power, applications in dynamic, multi-institutional, virtual organizations. Tools: email, video conference, webcast, white board. Working together apart.

2. Efficient use of major and minor resources at many institutes.

People from many institutions working to solve a common problem.

Ensure data is accessible anywhere and anytime.

3. Interactions with the underlying layers need to be transparent and seamless to the user.

4. Harness the power of the internet to aggregate and share resources spread across the globe: both challenging and highly cost-effective; can give unlimited capability.

Grow rapidly, yet remain reliable for more than a decade.

(8)

Large Hadron Collider (LHC)

• 27 km circumference

• at 1.9 K

• at 10^-13 Torr

• at 50-175 m below surface

• more than 10,000 magnets

4 big experiments, with about 10,000 scientists, 3,000 students, engineers.

Operational since 2009, Q4

Excellent performance: fast reaping of science!

Largest ever scientific project: 20 years to plan and build

(9)

[Figure: timeline from the Big Bang to today. Labels: experiments in astrophysics & cosmology, COBE (1989) and WMAP (2001), ~300,000 years; LHC: ~10^-12 seconds (p-p), ~10^-6 seconds (Pb-Pb).]

(10)

The LHC collides 600-800 million protons on protons per second, for several years.

Only 1 in ~20 thousand collisions will have an important tale to tell, but we do not know which one!

So we have to search through all of them!

Huge task!

• 15 PBytes (1 PByte = 10^15 bytes) of data a year

• Analysis requires ~100,000 computers to get results in reasonable time.

GRID computing is essential
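A rough arithmetic sketch of what the figures on this slide imply. It takes the lower end of the quoted collision rate and, as a simplifying assumption not stated in the talk, averages the yearly data volume over a full calendar year (the accelerator does not actually run continuously).

```python
# Back-of-the-envelope arithmetic using the figures quoted on the slide.
SECONDS_PER_YEAR = 365 * 24 * 3600      # assumption: a full calendar year
PETABYTE = 10**15                       # bytes

collisions_per_second = 6e8             # lower end of "600-800 million"
interesting_fraction = 1 / 20_000       # "1 in ~20 thousand"
annual_data_bytes = 15 * PETABYTE       # "15 PBytes of data a year"

# Collisions per second that may have "an important tale to tell".
interesting_per_second = collisions_per_second * interesting_fraction

# Yearly volume expressed as a sustained average rate in megabytes per second.
average_rate_mb_per_s = annual_data_bytes / SECONDS_PER_YEAR / 1e6

print(f"interesting collisions per second: {interesting_per_second:.0f}")
print(f"average data rate: {average_rate_mb_per_s:.0f} MB/s")
```

Even as a yearly average this is a sustained stream of several hundred megabytes per second, which is why the storage and analysis load has to be spread over many sites.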

(11)

Complexity of LHC experiments

When 2 very high energy protons collide at LHC, it results in a very crowded situation.

• In a single experiment several million electrical signals are recorded within a tiny fraction of a second, repeatedly, for a long time. There are 4 big experiments.

• Using computers, a digital image is created for each such instance. Image size can vary from 1 to 80 MB depending on the impact.

But, unfortunately, most of these pictures are not interesting!

One in a few thousand billion collisions will be really useful to provide the clue about the early conditions in the universe!

Store data by colliding intense beams of energetic protons.

(12)

Data volume rates for a typical experiment

Presently event size ~ 1MB

(13)

Layered structure of CMS GRID connecting computers across the globe

Tier 0: experimental site; CERN computer centre, Geneva

Tier 1: national centres

Tier 2: regional groups in a continent/nation

Lower levels: different universities and institutes in a country; individual scientist’s PC, laptop, ..

Diagram labels: ASIA (Taiwan), India, China, Korea, Taiwan, Pakistan, France, Italy, Germany, USA; TIFR, BARC, Panjab Univ., Delhi Univ.; Indiacms T2_IN_TIFR

(14)

Overview of Grid Components

A huge manpower is invisibly at work

(15)

Grid middleware

The grid relies on advanced software which interfaces between resources and applications linked by the internet: middleware mediates everything.

1. Secure and effective uniform access to a wide range of resources

2. Optimal use

3. Authentication to the system by digital certificate, and then to groups and sites

4. Application level management: job execution and monitoring during progress

5. Problem recovery

6. Collection of results after execution and delivery to user

7. Address inter-domain issues of security, policy, etc.

Authorisation: rights to use the facility for the user’s purpose

Middleware components:

• User Interface

• Resource broker/Workload management system

• Information system, file and replica catalogues

• Logging and book-keeping

• Storage elements

• Compute elements

1. You submit a task to the grid.

2. The grid finds convenient places to execute the task, and decomposes it if necessary.
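The submission flow above (a task is submitted, and the grid picks a convenient place to run it) can be illustrated with a toy resource broker. Everything here is hypothetical: the `ComputeElement` fields, the site names and the selection rule (prefer sites that already hold the data, then the one with most free slots) are assumptions for illustration, not the behaviour of any real grid middleware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComputeElement:
    """Hypothetical description of one site as seen by the broker."""
    name: str
    free_slots: int
    hosts_dataset: bool


def choose_site(sites: List[ComputeElement]) -> Optional[ComputeElement]:
    """Pick a site for a job, or None if every site is full.

    Assumed rule: sites already holding the input data win; ties are
    broken by the number of free job slots.
    """
    candidates = [s for s in sites if s.free_slots > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda s: (s.hosts_dataset, s.free_slots))


# Hypothetical sites, for illustration only.
sites = [
    ComputeElement("SITE_A", free_slots=0, hosts_dataset=True),
    ComputeElement("SITE_B", free_slots=5, hosts_dataset=False),
    ComputeElement("SITE_C", free_slots=2, hosts_dataset=True),
]
chosen = choose_site(sites)
print(chosen.name if chosen else "no free site")
```

Sending the job to where the data already sits avoids moving large files over the network, which is one plausible reading of "convenient place".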

(16)

GRID portal / Gateway

Event level parallelism: process event-by-event.

Split a large job into N efficient processes, each dealing with M events. Large memory needed, though scalability is built-in.
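A minimal sketch of event-level parallelism as described above: since each event can be processed independently, a large job is cut into contiguous event ranges, one per sub-job. The helper names and the placeholder `process_event` are assumptions for illustration; the sub-jobs are run one after another here, whereas on a grid each range would go to a different worker node.

```python
def split_events(n_events, n_jobs):
    """Return (start, stop) index ranges, one per job, covering all events."""
    base, extra = divmod(n_events, n_jobs)
    ranges = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs take one additional event each.
        size = base + (1 if job < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def process_event(event_id):
    """Placeholder for the real per-event work (hypothetical)."""
    return event_id * event_id


def run_job(event_range):
    """Process every event in one sub-job's range."""
    start, stop = event_range
    return [process_event(i) for i in range(start, stop)]


ranges = split_events(10, 3)
results = [r for rng in ranges for r in run_job(rng)]
print(ranges)
```

Because no sub-job needs the output of another, adding more worker nodes simply means cutting the event list into more ranges.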

(17)

CMS in Total:

1 Tier-0 at CERN (GVA)

7 Tier-1s on 3 continents

50 Tier-2s on 4 continents

CMS T2 in India: one of the 5 in the Asia-Pacific region

Today: 6 collaborating institutes in CMS, ~50 scientists + students

2.1% of signing authors in publications

Contributing ~3% of the computing resources of CMS

(18)

CMS Tier2 site at TIFR: T2_IN_TIFR

About 60 users/scientists at present, still growing.

Grid facility has been functional at TIFR for last few years.

The CMS collaboration at LHC, CERN has been using the computer resources at Mumbai mainly to perform event simulation and to store physics data.

Indian contribution noted as collective service to the experiment.

Current resources:

Storage: 450 TB

400 worker nodes.

Internet bandwidth > 1 Gbps

Note, continuous monitoring essential.

(19)

Grid connectivity within India

Network connections (diagram labels): TIFR-INDIACMS T2; VECC-INDIAALICE-T2; 1 Gbps to CERN, peered to GEANT; 2.5 Gbps NKN + TEIN3; 100 Mbps to VECC, RRCAT, IPR.

(20)

Data Transfers from/to TIFR

• Total data volume at present ~ 250 TB

• Total transfers during last 6 months ~ 70 TB

• TIFR hosting i) centrally managed data (simulated, custodial) ii) collision data skims

• Current CMS total CPU pledge at T2s: 18k job slots

• Nominal analysis pledge: 50%

• Slot utilization during Summer/Fall 09 was reasonable, but need to go into sustained analysis mode

upload

(21)

August 15-18, 2011

Maximum: 1.5 Gbps Avg. : 1Gbps
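To put the quoted link speed next to the data volumes from the previous slide (~250 TB held, ~70 TB transferred in 6 months), here is a small arithmetic sketch. It assumes 1 TB = 10^12 bytes, a fully used link with no protocol overhead, and 6 months taken as 182.5 days; none of these assumptions come from the talk.

```python
TB = 10**12                      # bytes, decimal terabyte (assumption)
GBPS = 10**9                     # bits per second
SECONDS_PER_DAY = 24 * 3600

total_volume_bytes = 250 * TB    # "Total data volume at present ~ 250 TB"
avg_link_bps = 1 * GBPS          # "Avg.: 1 Gbps"

# Time to move the whole 250 TB at the average observed rate.
days_for_full_copy = total_volume_bytes * 8 / avg_link_bps / SECONDS_PER_DAY

# Sustained rate implied by ~70 TB moved in about 6 months.
six_months_s = 182.5 * SECONDS_PER_DAY
implied_mbps = 70 * TB * 8 / six_months_s / 1e6

print(f"250 TB at 1 Gbps: about {days_for_full_copy:.1f} days")
print(f"70 TB in 6 months: about {implied_mbps:.1f} Mbps sustained")
```

Under these assumptions a full copy of the site's data would occupy the link for about three weeks, while the six-month transfer total corresponds to only a small fraction of the link's capacity on average.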

(22)

Statistics and plots

Site summary table

Site history

(23)

Front-ranking science and engineering requires massive amounts of computing, including huge data collection, storage and access to data, databases etc.

The LHC grid is the largest grid service in the world, with 200 sites in 40 countries, equipped with tens of thousands of Linux servers and tens of petabytes of storage.

Seamless and transparent access is enabled by grid technology, without compromising on security and convenience.

Challenge for the younger generation

Conservation of network bandwidth, or its use on an on-demand basis, is a challenge.

The technology is still young and immature.

Good tools are required.

Portability and scalability are likely to be resolved by virtualization.

YOU ARE WELCOME TO GET STARTED WITH GRID ISSUES!
