
General Parallel File System

Concepts, Planning, and Installation Guide

Version 3 Release 2


Note: Before using this information and the product it supports, be sure to read the general information under “Notices” on page 99.

Second Edition (October 2007)

This edition applies to version 3 release 2 of IBM General Parallel File System Multiplatform (program number 5724-N94), IBM General Parallel File System for POWER (program number 5765-G66), and to all subsequent releases and modifications until otherwise indicated in new editions. Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the change.

IBM welcomes your comments. A form for your comments may be provided at the back of this publication, or you may address your comments to the following:

International Business Machines Corporation
Department H6MA, Mail Station P181
2455 South Road
Poughkeepsie, NY 12601-5400
United States of America

FAX (United States and Canada): 1+845+432-9405
FAX (Other Countries): Your International Access Code +1+845+432-9405

IBMLink (United States customers only): IBMUSM10(MHVRCFS)

Internet e-mail: [email protected]

If you would like a reply, be sure to include your name, address, telephone number, or FAX number.

Make sure to include the following in your comment or note:

v Title and order number of this book
v Page number or topic related to your comment

When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.


Contents

Figures . . . vii
Tables . . . ix

About this publication . . . xi
  Who should read this publication . . . xi
  Conventions used in this publication . . . xi
  Prerequisite and related information . . . xii
  ISO 9000 . . . xii
  Using LookAt to look up message explanations . . . xii
  How to send your comments . . . xiii

Summary of changes . . . xv

Chapter 1. Introducing General Parallel File System . . . 1
  The strengths of GPFS . . . 1
  Shared file system access among GPFS clusters . . . 1
  Improved system performance . . . 2
  File consistency . . . 3
  High recoverability and increased data availability . . . 3
  Enhanced system flexibility . . . 3
  Simplified storage management . . . 4
  Simplified administration . . . 4
  The basic GPFS structure . . . 5
  GPFS administration commands . . . 5
  The GPFS kernel extension . . . 5
  The GPFS daemon . . . 5
  The GPFS open source portability layer . . . 6
  GPFS cluster configurations . . . 6
  Interoperable cluster requirements . . . 10

Chapter 2. Planning for GPFS . . . 13
  Hardware requirements . . . 13
  Software requirements . . . 13
  Recoverability considerations . . . 14
  Node failure . . . 14
  Network Shared Disk server and disk failure . . . 17
  Reduced recovery time using Persistent Reserve . . . 20
  GPFS cluster creation considerations . . . 20
  GPFS node adapter interface names . . . 21
  Nodes in your GPFS cluster . . . 21
  GPFS cluster configuration servers . . . 22
  Remote shell command . . . 22
  Remote file copy command . . . 23
  Cluster name . . . 23
  User ID domain for the cluster . . . 23
  Starting GPFS automatically . . . 23
  Cluster configuration file . . . 23
  Managing distributed tokens . . . 24
  Disk considerations . . . 24
  NSD creation considerations . . . 25
  NSD server considerations . . . 28
  File system descriptor quorum . . . 29

  File system creation considerations . . . 30
  Device name of the file system . . . 32
  List of disk descriptors . . . 32
  NFS V4 ’deny-write open lock’ . . . 32
  Disks for your file system . . . 33
  Deciding how the file system is mounted . . . 33
  Block size . . . 33
  atime values . . . 34
  mtime values . . . 34
  Block allocation map . . . 34
  File system authorization . . . 35
  Strict replication . . . 35
  Internal log file . . . 35
  File system recoverability parameters . . . 36
  Number of nodes mounting the file system . . . 36
  Maximum number of files . . . 37
  Mount point directory . . . 37
  Assign mount command options . . . 37
  Automatic quota activation . . . 37
  Enable DMAPI . . . 38
  A sample file system creation . . . 38

Chapter 3. Steps to establishing and starting your GPFS cluster . . . 41

Chapter 4. Installing GPFS on Linux nodes . . . 43
  Files to ease the installation process . . . 43
  Verifying the level of prerequisite software . . . 43
  Installation procedures . . . 43
  Accepting the electronic license agreement . . . 44
  Creating the GPFS directory . . . 44
  Installing the GPFS man pages . . . 44
  Installing GPFS over a network . . . 45
  Verifying the GPFS installation . . . 45
  Building your GPFS portability layer . . . 45
  Using the automatic configuration tool to build GPFS portability layer . . . 45

Chapter 5. Installing GPFS on AIX 5L nodes . . . 47
  Files to ease the installation process . . . 47
  Verifying the level of prerequisite software . . . 47
  Installation procedures . . . 47
  Accepting the electronic license agreement . . . 48
  Creating the GPFS directory . . . 48
  Creating the GPFS installation table of contents file . . . 48
  Installing the GPFS man pages . . . 48
  Installing GPFS over a network . . . 48
  Reconciling existing GPFS files . . . 49
  Verifying the GPFS installation . . . 49

Chapter 6. Migration, coexistence and compatibility . . . 51
  Migrating to GPFS 3.2 from GPFS 3.1 . . . 51
  Migrating to GPFS 3.2 from GPFS 2.3 . . . 51
  Migrating to GPFS 3.2 from GPFS 2.2 or earlier releases of GPFS . . . 52
  Completing the migration to a new level of GPFS . . . 54
  Additional considerations when migrating GPFS 2.3 and earlier file systems . . . 55
  Reverting to the previous level of GPFS . . . 55
  Reverting to a previous level of GPFS when you have not issued mmchconfig release=LATEST . . . 56

  Reverting to a previous level of GPFS when you have issued mmchconfig release=LATEST . . . 56
  Coexistence . . . 57
  Compatibility . . . 57
  Applying maintenance to your GPFS system . . . 57

Chapter 7. Configuring and tuning your system for GPFS . . . 59
  General system configuration and tuning considerations . . . 59
  Clock synchronization . . . 59
  GPFS administration security . . . 59
  Cache usage . . . 60
  GPFS I/O . . . 61
  Access patterns . . . 61
  Aggregate network interfaces . . . 62
  Swap space . . . 62
  Linux configuration and tuning considerations . . . 62
  updatedb considerations . . . 63
  SUSE LINUX considerations . . . 63
  GPFS helper threads . . . 63
  Communications I/O . . . 63
  Disk I/O . . . 64
  AIX configuration and tuning considerations . . . 65
  Communications I/O . . . 65
  Disk I/O . . . 65
  Switch pool . . . 65
  eServer High Performance Switch . . . 66
  IBM Virtual Shared Disk . . . 66
  GPFS use with Oracle . . . 67

Chapter 8. Steps to permanently uninstall GPFS . . . 69

Chapter 9. GPFS architecture . . . 71
  Special management functions . . . 71
  The GPFS cluster manager . . . 71
  The file system manager . . . 72
  The metanode . . . 73
  Use of disk storage and file structure within a GPFS file system . . . 73
  Quota files . . . 75
  GPFS recovery logs . . . 75
  GPFS and memory . . . 76
  Pinned and non-pinned memory . . . 76
  GPFS and network communication . . . 77
  GPFS daemon communication . . . 77
  GPFS administration commands . . . 79
  Application and user interaction with GPFS . . . 79
  Operating system commands . . . 79
  Operating system calls . . . 80
  GPFS command processing . . . 83
  NSD disk discovery . . . 84
  Recovery . . . 85
  Cluster configuration data files . . . 85
  GPFS backup data . . . 86

Chapter 10. IBM Virtual Shared Disk considerations . . . 87
  Virtual shared disk server considerations . . . 87
  Disk distribution . . . 87
  Disk connectivity . . . 88

  Virtual shared disk creation considerations . . . 88
  Virtual shared disk server and disk failure . . . 91

Chapter 11. Considerations for GPFS applications . . . 95
  Exceptions to Open Group technical standards . . . 95
  Determining if a file system is controlled by GPFS . . . 95
  GPFS exceptions and limitations to NFS V4 ACLs . . . 96

Accessibility features for GPFS . . . 97
  Accessibility features . . . 97
  Keyboard navigation . . . 97
  IBM and accessibility . . . 97

Notices . . . 99
  Trademarks . . . 100

Glossary . . . 103

Index . . . 107

Figures

1. A Linux-only cluster with disks that are SAN-attached to all nodes . . . 7
2. A Linux-only cluster with an NSD server . . . 7
3. An AIX and Linux cluster with an NSD server . . . 8
4. An AIX and Linux cluster providing remote access to disks through the High Performance Switch (HPS) for the AIX nodes and a LAN connection for the Linux nodes . . . 8
5. An AIX and Linux cluster that provides remote access to disks through multiple NSD servers . . . 9
6. An AIX cluster with an NSD server . . . 9
7. GPFS clusters providing shared file system access . . . 10
8. GPFS configuration utilizing node quorum . . . 15
9. GPFS configuration utilizing node quorum with tiebreaker disks . . . 17
10. RAID/ESS Controller twin-tailed in a SAN configuration . . . 18
11. GPFS configuration specifying multiple NSD servers connected to a common disk controller utilizing RAID5 with four data disks and one parity disk . . . 18
12. GPFS utilizes failure groups to minimize the probability of a service disruption due to a single component failure . . . 19
13. GPFS files have a typical UNIX structure . . . 74
14. Basic failure groups with servers and disks . . . 89
15. Failure groups with twin-tailed disks . . . 90
16. Primary node serving RAID device . . . 91
17. Backup node serving RAID device . . . 92
18. RAID/ESS Controller multi-tailed to the primary and secondary virtual shared disk servers . . . 92
19. Concurrent node serving device . . . 93

Tables

1. Typographic conventions . . . xi
2. GPFS cluster creation options . . . 20
3. Disk descriptor usage for the GPFS disk commands . . . 27

About this publication

The General Parallel File System: Concepts, Planning, and Installation Guide describes:

v The IBM® General Parallel File System™ (GPFS™) Multiplatform licensed program, 5724-N94
v The IBM GPFS for POWER licensed program, 5765-G66

This publication includes information about these topics:

v Introducing GPFS
v Planning concepts for GPFS
v SNMP support
v Installing GPFS
v Migration, coexistence and compatibility
v Applying maintenance
v Configuration and tuning
v Steps to uninstall GPFS

Who should read this publication

This publication is intended for system administrators, analysts, installers, planners, and programmers of GPFS clusters.

It assumes that you are very experienced with and fully understand the operating systems on which your cluster is based.

Use this publication if you are:

v Planning for GPFS
v Installing GPFS on a supported cluster configuration, consisting of:
  – Linux® nodes
  – AIX 5L nodes
  – An interoperable cluster comprised of both operating systems

Conventions used in this publication

Table 1 describes the typographic conventions used in this publication.

Table 1. Typographic conventions

Typographic convention    Usage

Bold
    Bold words or characters represent system elements that you must use literally, such as commands, flags, path names, directories, file names, values, and selected menu options.

Bold Underlined
    Bold Underlined keywords are defaults. These take effect if you fail to specify a different keyword.

Italic
    v Italic words or characters represent variable values that you must supply.
    v Italics are also used for publication titles and for general emphasis in text.

Constant width
    All of the following are displayed in constant width typeface:
    v Displayed information
    v Message text
    v Example text
    v Specified text typed by the user
    v Field names as displayed on the screen
    v Prompts from the system
    v References to example text

[ ]
    Brackets enclose optional items in format and syntax descriptions.

{ }
    Braces enclose a list from which you must choose an item in format and syntax descriptions.

|
    A vertical bar separates items in a list of choices. (In other words, it means "or")

< >
    Angle brackets (less-than and greater-than) enclose the name of a key on the keyboard. For example, <Enter> refers to the key on your terminal or workstation that is labeled with the word Enter.

...
    An ellipsis indicates that you can repeat the preceding item one or more times.

<Ctrl-x>
    The notation <Ctrl-x> indicates a control character sequence. For example, <Ctrl-c> means that you hold down the control key while pressing <c>.

\
    The continuation character is used in programming examples in this publication for formatting purposes.

Prerequisite and related information

For updates to this publication, see publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfsbooks.html.

For the latest support information, see the GPFS Frequently Asked Questions at publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html.

ISO 9000

ISO 9000 registered quality systems were used in the development and manufacturing of this product.

Using LookAt to look up message explanations

LookAt is an online facility that lets you look up explanations for most of the IBM messages you encounter, as well as for some system abends and codes. You can use LookAt from the following locations to find IBM message explanations for Clusters for AIX® and Linux:

v The Internet. You can access IBM message explanations directly from the LookAt Web site: www.ibm.com/eserver/zseries/zos/bkserv/lookat
v Your wireless handheld device. You can use the LookAt Mobile Edition with a handheld device that has wireless access and an Internet browser (for example, Internet Explorer for Pocket PCs, Blazer, or Eudora for Palm OS, or Opera for Linux handheld devices). Link to the LookAt Mobile Edition from the LookAt Web site.

How to send your comments

Your feedback is important in helping us to produce accurate, high-quality information. If you have any comments about this publication or any other GPFS documentation:

v Send your comments by e-mail to: [email protected]
  Include the publication title and order number, and, if applicable, the specific location of the information you have comments on (for example, a page number or a table number).
v Fill out one of the forms at the back of this publication and return it by mail, by fax, or by giving it to an IBM representative.

To contact the IBM cluster development organization, send your comments by e-mail to: [email protected].

Summary of changes

The following sections summarize changes to the GPFS licensed program and the GPFS library for version 3 release 2. Within each book in the library, a vertical line to the left of text and illustrations indicates technical changes or additions made to the previous edition of the book.

Summary of changes for GPFS Version 3 Release 2 as updated, October 2007

Changes to GPFS and to the GPFS library for version 3 release 2 include:

v New information:
  – In the past, migrating to a new release of GPFS required shutting down GPFS and upgrading all nodes before GPFS could be restarted. GPFS V3.2 supports rolling upgrades and a limited form of backward compatibility:
    - Rolling upgrades enable you to install new GPFS code one node at a time without shutting down GPFS on other nodes. It is expected that all nodes will be upgraded within a short time. Some features become available on each node as soon as the node is upgraded, while other features become available as soon as all participating nodes are upgraded.
    - Backward compatibility allows running with a mixture of old and new nodes. Multi-cluster environments may be able to upgrade the local cluster while still allowing mounts from remote nodes in other clusters that have not yet been upgraded.

  – You can designate up to eight NSD servers to simultaneously service I/O requests from different clients. Each of these NSD servers must have physical access to the same logical unit number (LUN). Different servers can serve I/O to different non-intersecting sets of clients for a variety of reasons, such as load balancing on the server, network partitioning by balancing the load on different networks, or workload partitioning. Multiple NSD server functions require all (peer) NSD servers to be part of the same GPFS cluster. The existing subnet functions in GPFS determine which NSD server should serve a particular client. The assumption is that nodes within a subnet are connected via high-speed networks.

  – GPFS for Linux offers Clustered NFS (CNFS) to enable highly-available NFS exporting of file systems. Some or all nodes in an existing GPFS cluster also serve NFS and are part of the NFS cluster. All of the nodes in the NFS cluster export the same file systems to the NFS clients. This support includes the following:
    - Monitoring. Every node in the NFS cluster runs an NFS monitoring utility that monitors GPFS and the NFS and networking components on the node. After failure detection, and based on customer configuration, the monitoring utility may invoke a failover.
    - Failover. The automatic failover procedure transfers the NFS serving load from the failing node to another node in the NFS cluster. The failure is managed by the GPFS cluster, including lock and state recovery.
    - Load balancing. The IP address is the load unit that can be moved from one node to another because of failure or load balancing needs. This solution supports a failover of all the node’s load (all NFS IP addresses) as one unit to another node. However, if no locks are outstanding, individual IP addresses can be moved to other nodes for load balancing purposes.
  – The GPFS InfiniBand® Remote Direct Memory Access (RDMA) code uses RDMA for NSD client file I/O requests. RDMA transfers data directly between the NSD client memory and the NSD server memory instead of sending and receiving the data over the TCP socket. Using RDMA may improve bandwidth and decrease CPU utilization.

  – GPFS can use the SCSI-3 (Persistent Reserve) standard to provide fast failover with improved recovery times. To exploit this functionality the file system has to be created on SCSI-3 capable disks.
    To enable this function, you must set up PR when you create each NSD. The basic requirements for PR are that the NSD server must be an AIX node and that the disks must be regular AIX hdisks. To enable PR, use the new mmchconfig command option called usePersistentReserve.

  – Monitoring is enabled with the SNMP-based management application, Net-SNMP. SNMP requires a Linux node installed in order to collect the data. Monitoring involves gaining a view of the GPFS system. Monitored information can be roughly grouped into the categories of configuration, status, and performance:
    - Configuration denotes the initially customized aspects of the system’s current state.
    - Status information is dynamic information that expresses the current health of nodes, disks, and other hardware, including any reported error conditions, disk utilization, and fragmentation.
    - Performance information includes quantitative measurement of the workings of a system.

  – Performance enhancements include support for parallel defragmentation of a file system, larger pagepool support, and directory-locking:
    - Defragmentation of a file system can now be run in parallel across nodes in a cluster.
    - Maximum pagepool support is increased to 256 GB.
    - Directory-locking improvements for concurrent file creates and deletes from multiple nodes (this function will be available in APAR IZ01431).
  – When a socket connection breaks due to a network failure, GPFS now tries to re-establish the connection rather than immediately initiating node expulsion procedures.
  – Mount up to 256 file systems.

  – GPFS V3.2 extends Information Lifecycle Management (ILM) functionality to integrate with HSM products. A single set of policies can be used to move data across different storage pools of a file system and to move data from GPFS storage pools to near-line storage and from near-line storage to GPFS storage pools. Additional enhancements include the ability to use policies for backup and restore.
    - Subroutine gpfs_fputattrswithpathname for backup and restore functions has been added. This subroutine sets all the extended file attributes for a file and invokes the policy engine for restoring files.
    - Subroutines gpfs_fgetattrs and gpfs_fputattrs have been enhanced with new flags.
  – GPFS V3.2 enables the policy code to run in parallel across all nodes in the home cluster that have the file system mounted. The policy evaluation can then scale with the size of the cluster.
  – Administration enhancements, which include:

    New commands:
    - mmchnode, which changes node attributes
    - mmnsddiscover, which rediscovers paths to the specified network shared disks on the specified nodes
    - mmtracectl, which sets up and enables GPFS tracing

    Updated commands:
    - mmapplypolicy to add the -M, -N, and -S options
    - mmchconfig to add several new configuration parameters
    - mmchdisk to add the -F option
    - mmchfs to add full or compat to the -V option
    - mmchmgr to add the -c option
    - mmchnsd to state that you can specify up to 8 NSD servers in the disk descriptors
    - mmcrfs:
      v To remove the mandatory MountPoint positional parameter
      v To add the -L and -T options
      v To change the default for the -k option from posix to all
      v To add the --version Version option
      v To change the default values for the -R and -M options to 2
    - mmcrnsd to state that you can specify up to 8 NSD servers in the disk descriptors and to change PrimaryServer and BackupServer to serverList
    - mmdefragfs to add the -P and -N options and to remove the -v option
    - mmdf to remove the -q option
    - mmedquota to add Device:Fileset to the -j option
    - mmlscluster to add the --cnfs option which displays the cluster NFS
    - mmlsfs to add the all_local and all_remote parameters
    - mmlsmgr to add the -c and -C parameters
    - mmmount to add the all_local and all_remote parameters

  – Tracing is improved to increase the reliability of trace data gathering. It enhances the ability to acquire accurate and reliable problem determination information. With the new mmtracectl trace command, you can:
    - Turn trace on or off on the next session to automatically start trace when GPFS starts
    - Allow for predefined trace levels: io, all, def, and user-specified trace levels
    - Change the size of trace buffers
    - Allow a user-defined directory for keeping trace files
    - Control trace recycling during daemon termination: (off, local, global, or globalOnShutdown)

v Changed information:
  – The terminology "cluster configuration manager" or "configuration manager" has been changed to "cluster manager".
v Deleted information:
  – The mmsanrepairfs command has been removed.
  – All references to SANergy

Chapter 1. Introducing General Parallel File System

IBM’s General Parallel File System (GPFS) provides file system services to parallel and serial applications. GPFS allows parallel applications simultaneous access to the same files, or different files, from any node which has the GPFS file system mounted while managing a high level of control over all file system operations. GPFS is particularly appropriate in an environment where the aggregate peak need for data bandwidth exceeds the capability of a distributed file system server.

GPFS allows users shared file access within a single GPFS cluster and across multiple GPFS clusters. A GPFS cluster consists of:

v AIX 5L nodes, Linux nodes, or a combination thereof (see “GPFS cluster configurations” on page 6). A node may be:
  – An individual operating system image on a single computer within a cluster.
  – A system partition containing an operating system. Some IBM System p5 and IBM System p® machines allow multiple system partitions, each of which is considered to be a node within the GPFS cluster.
v Network shared disks (NSDs) created and maintained by the NSD component of GPFS
  – All disks utilized by GPFS must first be given a globally accessible NSD name.
  – The GPFS NSD component provides a method for cluster-wide disk naming and access.
  – On Linux machines running GPFS, you may give an NSD name to:
    - Physical disks
    - Logical partitions of a disk
    - Representations of physical disks (such as LUNs)
  – On AIX machines running GPFS, you may give an NSD name to:
    - Physical disks
    - Virtual shared disks
    - Representations of physical disks (such as LUNs)
v A shared network for GPFS communications allowing a single network view of the configuration. A single network, a LAN or a switch, is used for GPFS communication, including the NSD communication.

The strengths of GPFS

GPFS is a powerful file system offering:

v “Shared file system access among GPFS clusters”
v “Improved system performance” on page 2
v “File consistency” on page 3
v “High recoverability and increased data availability” on page 3
v “Enhanced system flexibility” on page 3
v “Simplified storage management” on page 4
v “Simplified administration” on page 4

Shared file system access among GPFS clusters

GPFS allows users shared access to files in either the cluster where the file system was created or other GPFS clusters. Each site in the network is managed as a separate cluster, while allowing shared file system access. When multiple clusters are configured to access the same GPFS file system, Open Secure Sockets Layer (OpenSSL) is used to authenticate and check authorization for all network connections.

Note: If you use a cipher, the data will be encrypted for transmissions. However, if you set the cipherlist keyword of the mmauth command to AUTHONLY, only authentication will be used for data transmissions and data will not be encrypted.

GPFS shared file system access provides for:

v The ability of the cluster granting access to specify multiple security levels, up to one for each authorized cluster.
v A highly available service as the local cluster may remain active prior to changing security keys. Periodic changing of keys is necessary for a variety of reasons, including:
  – In order to make connection rate performance acceptable in large clusters, the size of the security keys used for authentication cannot be very large. As a result it may be necessary to change security keys in order to prevent a given key from being compromised while it is still in use.
  – As a matter of policy, some institutions may require security keys are changed periodically.

Note: The pair of public and private security keys provided by GPFS is similar to the host-based authentication mechanism provided by OpenSSH. Each GPFS cluster has a pair of these keys that identify the cluster. In addition, each cluster also has an authorized_keys list. Each line in the authorized_keys list contains the public key of one remote cluster and a list of file systems that cluster is authorized to mount. For details on shared file system access, see the GPFS: Advanced Administration Guide.
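The authorization check that the authorized_keys list implies can be pictured with a small Python model. This is only a sketch of the idea, not the GPFS on-disk format or code: the class, the function, the cluster names, and the key strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorizedCluster:
    """One entry of the list: a remote cluster's key and what it may mount."""
    public_key: str                                  # placeholder text, not a real key format
    file_systems: set = field(default_factory=set)   # file systems granted to that cluster


def may_mount(authorized_keys, remote_cluster, file_system):
    """True only if the remote cluster is listed and the file system is granted to it."""
    entry = authorized_keys.get(remote_cluster)
    return entry is not None and file_system in entry.file_systems


# Hypothetical cluster and file system names, for illustration only.
authorized_keys = {
    "cluster2.example.com": AuthorizedCluster("KEY-OF-CLUSTER2", {"fs1", "fs2"}),
    "cluster3.example.com": AuthorizedCluster("KEY-OF-CLUSTER3", {"fs1"}),
}

print(may_mount(authorized_keys, "cluster2.example.com", "fs2"))  # True
print(may_mount(authorized_keys, "cluster3.example.com", "fs2"))  # False
```

A cluster that is not listed at all is refused in the same way as a listed cluster asking for a file system it was not granted.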

Improved system performance

Using GPFS to store and retrieve your files can improve system performance by:

v Allowing multiple processes or applications on all nodes in the cluster simultaneous access to the same file using standard file system calls.
v Increasing aggregate bandwidth of your file system by spreading reads and writes across multiple disks.
v Balancing the load evenly across all disks to maximize their combined throughput. One disk is no more active than another.
v Supporting very large file and file system sizes.
v Allowing concurrent reads and writes from multiple nodes. This is a key concept in parallel processing.
v Allowing for distributed token (lock) management. Distributing token management reduces system delays associated with a lockable object waiting to obtain a token. Refer to “Managing distributed tokens” on page 24 and “High recoverability and increased data availability” on page 3 for additional information on token management.
v Allowing for the specification of different networks for GPFS daemon communication and for GPFS administration command usage within your cluster.

Achieving high throughput to a single, large file requires striping data across multiple disks and multiple disk controllers. Rather than relying on striping in a separate volume manager layer, GPFS implements striping in the file system. Managing its own striping affords GPFS the control it needs to achieve fault tolerance and to balance load across adapters, storage controllers, and disks. Large files in GPFS are divided into equal sized blocks, and consecutive blocks are placed on different disks in a round-robin fashion¹.
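The round-robin placement described above can be illustrated with a few lines of Python. This is a simplified sketch, not the GPFS allocator: it assumes striping always starts at disk 0 and that each disk stores a file's blocks in consecutive slots, and the block size and disk count are arbitrary values chosen for the illustration.

```python
def stripe_location(offset, block_size, num_disks):
    """Map a byte offset in a file to (disk index, slot on that disk)."""
    block_index = offset // block_size        # which file block holds this byte
    disk = block_index % num_disks            # consecutive blocks go to consecutive disks
    slot = block_index // num_disks           # this file's earlier blocks on the same disk
    return disk, slot


BLOCK_SIZE = 256 * 1024   # assumed block size for the illustration
NUM_DISKS = 4             # assumed number of disks

for block in range(6):
    disk, slot = stripe_location(block * BLOCK_SIZE, BLOCK_SIZE, NUM_DISKS)
    print(f"file block {block} -> disk {disk}, slot {slot}")
```

Because neighboring blocks land on different disks, a large sequential read can keep every disk busy at once, which is the source of the aggregate bandwidth gain.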

To exploit disk parallelism when reading a large file from a single-threaded application, whenever it can recognize a pattern, GPFS prefetches data into its buffer pool, issuing I/O requests in parallel to as many disks as necessary to achieve the bandwidth of which the switching fabric is capable. GPFS recognizes sequential, reverse sequential, and various forms of strided access patterns¹.
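One way to picture pattern recognition is to look for a constant stride in the most recent block numbers an application has read. The Python sketch below is an assumption made for illustration; this guide does not describe how GPFS actually detects patterns, and the function names and the three-access threshold are hypothetical.

```python
def classify_access(block_numbers):
    """Classify recent block accesses by their constant stride, if there is one."""
    if len(block_numbers) < 3:
        return "unknown"                      # too few accesses to see a pattern
    strides = {b - a for a, b in zip(block_numbers, block_numbers[1:])}
    if len(strides) != 1:
        return "unknown"                      # the stride is not constant
    stride = strides.pop()
    if stride == 1:
        return "sequential"
    if stride == -1:
        return "reverse sequential"
    if stride == 0:
        return "unknown"                      # rereading one block is not a pattern
    return "strided"


def prefetch_candidates(block_numbers, depth):
    """Predict up to `depth` future blocks, assuming the recognized stride continues."""
    if classify_access(block_numbers) == "unknown":
        return []
    stride = block_numbers[-1] - block_numbers[-2]
    predicted = [block_numbers[-1] + stride * i for i in range(1, depth + 1)]
    return [b for b in predicted if b >= 0]   # never predict before the start of the file


print(classify_access([10, 11, 12]), prefetch_candidates([10, 11, 12], 3))
print(classify_access([0, 4, 8]), prefetch_candidates([0, 4, 8], 3))
```

The predicted blocks are the ones a prefetcher could request in parallel from their respective disks before the application asks for them.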

GPFS I/O performance may be monitored through the mmpmon command. See the GPFS: Advanced Administration Guide.

File consistency

GPFS uses a sophisticated token management system to provide data consistency while allowing multiple independent paths to the same file by the same name from anywhere in the cluster. See Chapter 9, “GPFS architecture,” on page 71.

High recoverability and increased data availability

GPFS failover support allows you to organize your hardware into failure groups. A failure group is a set of disks that share a common point of failure that could cause them all to become simultaneously unavailable. When used in conjunction with the replication feature of GPFS, the creation of multiple failure groups provides for increased file availability should a group of disks fail. GPFS maintains each instance of replicated data and metadata on disks in different failure groups. Should a set of disks become unavailable, GPFS fails over to the replicated copies in another failure group.
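The rule that each replica lives in a different failure group can be sketched in Python as follows. The disk names, group numbers, and the "first disk in each group" choice are assumptions made for the illustration; they do not describe how GPFS actually selects disks.

```python
def place_replicas(disks, copies):
    """Choose one disk from each of `copies` distinct failure groups.

    `disks` maps a disk name to its failure group number.
    """
    chosen = {}
    for disk, group in sorted(disks.items()):
        if group not in chosen:
            chosen[group] = disk              # first disk found in each failure group
        if len(chosen) == copies:
            return sorted(chosen.values())
    raise ValueError(f"need {copies} failure groups, found {len(chosen)}")


def surviving_replicas(replica_disks, disks, failed_group):
    """Replicas still readable after every disk in one failure group is lost."""
    return [d for d in replica_disks if disks[d] != failed_group]


# Hypothetical layout: two failure groups with two disks each.
disks = {"nsd1": 1, "nsd2": 1, "nsd3": 2, "nsd4": 2}
replicas = place_replicas(disks, copies=2)
print(replicas)                                   # ['nsd1', 'nsd3']
print(surviving_replicas(replicas, disks, 1))     # ['nsd3']
```

Losing all of failure group 1 still leaves one readable copy, which is the property the failover described above relies on.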

During configuration, you assign a replication factor to indicate the total number of copies of data and metadata you wish to store. Replication allows you to set different levels of protection for each file or one level for an entire file system. Since replication uses additional disk space and requires extra write time, you might want to consider replicating only file systems that are frequently read from but seldom written to. To reduce the overhead involved with the replication of data, you may also choose to replicate only metadata as a means of providing additional file system protection. For further information on GPFS replication, see “File system recoverability parameters” on page 36.
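The space cost of these choices is simple arithmetic, shown here as a short Python sketch. It assumes every copy is stored in full and uses made-up data and metadata sizes; the function name is hypothetical.

```python
def raw_space_needed(data_bytes, metadata_bytes, data_replicas=1, metadata_replicas=1):
    """Disk space consumed when every copy of data and metadata is stored in full."""
    return data_bytes * data_replicas + metadata_bytes * metadata_replicas


GIB = 1024 ** 3
data, metadata = 100 * GIB, 2 * GIB       # assumed sizes for the illustration

both = raw_space_needed(data, metadata, data_replicas=2, metadata_replicas=2)
metadata_only = raw_space_needed(data, metadata, data_replicas=1, metadata_replicas=2)
print(both // GIB, metadata_only // GIB)  # 204 104
```

With these assumed sizes, replicating only metadata costs 2 GiB of extra space, while replicating data as well costs 102 GiB, which is why metadata-only replication is the cheaper form of additional protection.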

GPFS is a logging file system that creates separate logs for each node. These logs record the allocation and modification of metadata aiding in fast recovery and the restoration of data consistency in the event of node failure. Even if you do not specify replication when creating a file system, GPFS automatically replicates recovery logs in separate failure groups, if multiple failure groups have been specified. This replication feature can be used in conjunction with other GPFS capabilities to maintain one replica in a geographically separate location which provides some capability for surviving disasters at the other location. For further information on failure groups, see “NSD creation considerations” on page 25. For further information on disaster recovery with GPFS see the GPFS: Advanced Administration Guide.

Once your file system is created, it can be configured to mount whenever the GPFS daemon is started. This feature assures that whenever the system and disks are up, the file system will be available. When utilizing shared file system access among GPFS clusters, to reduce overall GPFS control traffic you may indicate to mount the file system when it is first accessed. This is done through either the mmremotefs command or the mmchfs command using the -A automount option. GPFS mount traffic may be lessened by using automatic mounts instead of mounting at GPFS startup. Automatic mounts only produce additional control traffic at the point that the file system is first used by an application or user. Mounting at GPFS startup on the other hand produces additional control traffic at every GPFS startup. Thus startup of hundreds of nodes at once may be better served by using automatic mounts. However, when exporting the file system for Network File System (NFS) mounts, it might be useful to mount the file system when GPFS is started. For further information on shared file system access and the use of NFS with GPFS, see the GPFS: Administration and Programming Reference.

Enhanced system flexibility

With GPFS, your system resources are not frozen. You can add or delete disks while the file system is mounted. When the time is right and system demand is low, you can rebalance the file system across all currently configured disks. In addition, you can also add or delete nodes without having to stop and restart the GPFS daemon on all nodes.

Note: In the node quorum with tiebreaker disk configuration, GPFS has a limit of eight quorum nodes. If you add quorum nodes and exceed that limit, the GPFS daemon must be shut down. Before you restart the daemon, you must switch quorum semantics to node quorum. For additional information, refer to “Quorum” on page 14.
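As a planning aid, the rule in this note can be written as a small check in Python. The function and its return strings are hypothetical; only the limit of eight quorum nodes comes from the note.

```python
MAX_TIEBREAKER_QUORUM_NODES = 8   # limit stated in the note above


def action_after_adding(current_quorum_nodes, nodes_to_add, uses_tiebreaker_disks):
    """Say whether adding quorum nodes forces a change of quorum semantics."""
    total = current_quorum_nodes + nodes_to_add
    if uses_tiebreaker_disks and total > MAX_TIEBREAKER_QUORUM_NODES:
        return "shut down GPFS and switch to node quorum before restarting"
    return "no change of quorum semantics required"


print(action_after_adding(7, 1, uses_tiebreaker_disks=True))
print(action_after_adding(8, 1, uses_tiebreaker_disks=True))
```

Running the check before adding nodes shows in advance whether an outage of the GPFS daemon will be needed.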

In a SAN configuration where you have also defined NSD servers, if the physical connection to the disk is broken, GPFS dynamically switches disk access to the server nodes and continues to provide data through NSD server nodes. GPFS falls back to local disk access when it has discovered the path has been repaired.
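The switch between local access and NSD server access can be modeled as a simple preference order, sketched below in Python. The function, the server names, and the "first server in the list" choice are assumptions made for illustration and do not describe the GPFS implementation.

```python
def choose_access_path(local_path_ok, nsd_servers):
    """Prefer the local SAN path; fall back to an NSD server when it is broken."""
    if local_path_ok:
        return "local"
    if nsd_servers:
        return "nsd-server:" + nsd_servers[0]   # assumed: first server in the list
    return "unavailable"


servers = ["nsdserver1", "nsdserver2"]          # hypothetical server names

# The path works, then breaks, then is repaired.
for path_ok in (True, False, True):
    print(choose_access_path(path_ok, servers))
```

The three lines printed are local, nsd-server:nsdserver1, and local, mirroring the fail-over and fall-back sequence described above.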

After GPFS has been configured for your system, depending on your applications, hardware, and workload, you can re-configure GPFS to increase throughput. You can set up your GPFS environment for your current applications and users, secure in the knowledge that you can expand in the future without jeopardizing your data. GPFS capacity can grow as your hardware expands.

Simplified

storage

management

GPFSprovidesstoragemanagementbasedonthedefinitionanduseof: v Storage pools

v Policies

v Filesets

Storagepools

Astoragepoolisacollectionof disksorRAIDswithsimilarpropertiesthataremanaged together

asa group.Storage poolsprovidea methodtopartitionstorageonthefilesystem.While youplan howto configureyourstorage,considerfactorssuchas:

v Improvedprice-performance bymatchingthecost ofstoragetothevalueofthedata.

v Improvedperformanceby:

– Reducingthecontentionforpremium storage

– Reducingtheimpactofslower devices

v Improvedreliability byprovidingfor:

– Replicationbasedonneed

– Betterfailurecontainment

Policies

Files are assigned to a storage pool based on defined policies. Policies provide for:

v Placing files in a specific storage pool when the files are created

v Migrating files from one storage pool to another

v File deletion based on file characteristics

v Snapshot metadata scans and file list creation

Filesets

Filesets provide a method for partitioning a file system and allow administrative operations at a finer granularity than the entire file system. For example, filesets allow you to:

v Define data block and inode quotas at the fileset level

v Apply policy rules to specific filesets

For further information on storage pools, filesets, and policies, see the GPFS: Advanced Administration Guide.

Simplified administration

GPFS offers many of the standard UNIX® file system interfaces, allowing most applications to execute without modification or recompiling. UNIX file system utilities are also supported by GPFS. That is, users can continue to use the UNIX commands they have always used for ordinary file operations (see Chapter 11, “Considerations for GPFS applications,” on page 95). The only unique commands are those for administering the GPFS file system.

GPFS administration commands are similar in name and function to UNIX file system commands, with one important difference: the GPFS commands operate on multiple nodes. A single GPFS command performs a file system function across the entire cluster. See the individual commands as documented in the GPFS: Administration and Programming Reference.

GPFS commands save configuration and file system information in one or more files, collectively known as GPFS cluster configuration data files. The GPFS administration commands are designed to keep these files synchronized between each other and with the GPFS system files on each node in the cluster, thereby providing for accurate configuration data. See “Cluster configuration data files” on page 85.

The basic GPFS structure

Most GPFS administration tasks can be performed from any node running GPFS. See the individual commands as documented in the GPFS: Administration and Programming Reference.

GPFS is a clustered file system defined over a number of nodes. On each node in the cluster, GPFS consists of:

1. “GPFS administration commands”

2. “The GPFS kernel extension”

3. “The GPFS daemon”

4. For nodes in your cluster operating with the Linux operating system, “The GPFS open source portability layer” on page 6

For a detailed discussion of GPFS, see Chapter 9, “GPFS architecture,” on page 71.

GPFS administration commands

Most GPFS administration tasks can be performed from any node running GPFS. See the individual commands as documented in the GPFS: Administration and Programming Reference.

The GPFS kernel extension

The GPFS kernel extension provides the interfaces to the operating system vnode and virtual file system (VFS) interfaces for adding a file system. Structurally, applications make file system calls to the operating system, which presents them to the GPFS file system kernel extension. In this way, GPFS appears to applications as just another file system. The GPFS kernel extension will either satisfy these requests using resources which are already available in the system, or send a message to the GPFS daemon to complete the request.

The GPFS daemon

The GPFS daemon performs all I/O and buffer management for GPFS. This includes read-ahead for sequential reads and write-behind for all writes not specified as synchronous. All I/O is protected by GPFS token management, which honors atomicity, thereby providing for data consistency of a file system on multiple nodes.

The daemon is a multithreaded process with some threads dedicated to specific functions. Dedicated threads for services requiring priority attention are not used for or blocked by routine work. The daemon also communicates with instances of the daemon on other nodes to coordinate configuration changes, recovery and parallel updates of the same data structures. Specific functions that execute on the daemon include:

1. Allocation of disk space to new files and newly extended files. This is done in coordination with the file system manager.

2. Management of directories including creation of new directories, insertion and removal of entries into existing directories, and searching of directories that require I/O.

3. Allocation of appropriate locks to protect the integrity of data and metadata. Locks affecting data that may be accessed from multiple nodes require interaction with the token management function.

4. Disk I/O is initiated on threads of the daemon.

5. User security and quotas are also managed by the daemon in conjunction with the file system manager.

The GPFS Network Shared Disk (NSD) component provides a method for cluster-wide disk naming and high-speed access to data for applications running on nodes that do not have direct access to the disks. The NSDs in your cluster may be physically attached to all nodes or serve their data through an NSD server that provides a virtual connection. You are allowed to specify up to eight NSD servers for each NSD. If one server fails, the next server on the list takes control from the failed node.

For a given disk, each of its NSD servers must have physical access to the same LUN. However, different servers can serve I/O to different non-intersecting sets of clients. The existing subnet functions in GPFS determine which NSD server should serve a particular client.

Note: GPFS assumes that nodes within a subnet are connected using high-speed networks. For additional information on subnet configuration, refer to “Using public and private IP addresses for GPFS nodes” on page 78.

GPFS determines if a node has physical or virtual connectivity to an underlying NSD through a sequence of commands invoked from the GPFS daemon. This determination is called disk discovery and occurs both at initial GPFS startup and whenever a file system is mounted.

The default order of access used in disk discovery:

1. Local /dev block device interfaces for virtual shared disk, SAN, SCSI or IDE disks

2. NSD servers

This order can be changed with the useNSDserver mount option.

It is suggested that you always define NSD servers for the disks. In a SAN configuration where NSD servers have also been defined, if the physical connection is broken, GPFS dynamically switches to the server nodes and continues to provide data. GPFS falls back to local disk access when it has discovered that the path has been repaired. This is the default behavior, and it can be changed with the useNSDserver file system mount option.
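The default access order and the failover behavior described above can be sketched as a small model. This is an illustrative Python sketch only, not GPFS code: the function name `choose_access_path`, its arguments, and the returned strings are hypothetical, and it assumes the default behavior (local access is preferred, then the first reachable server in the order given on the NSD server list).

```python
from typing import Optional, Sequence, Set

MAX_NSD_SERVERS = 8  # the text allows up to eight NSD servers for each NSD


def choose_access_path(local_path_ok: bool,
                       nsd_servers: Sequence[str],
                       reachable: Set[str]) -> Optional[str]:
    """Return "local", the name of the NSD server to use, or None.

    Hypothetical model of the default order: the local block device
    first, then the NSD servers in the order they appear on the list.
    """
    if len(nsd_servers) > MAX_NSD_SERVERS:
        raise ValueError("at most eight NSD servers can be specified per NSD")
    if local_path_ok:
        # Covers both the normal case and the fallback after a path repair.
        return "local"
    for server in nsd_servers:
        if server in reachable:
            return server
    return None  # no local path and no reachable server: disk unavailable
```

For example, with the hypothetical server list `["nsd1", "nsd2"]`, a node whose local path is broken and that can reach only `nsd2` is served by `nsd2`; once the local path is reported healthy again, the same call returns `"local"`.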

For further information, see “Disk considerations” on page 24 and “NSD disk discovery” on page 84.

The GPFS open source portability layer

For Linux nodes running GPFS, you must build custom portability modules based on your particular hardware platform and Linux distribution to enable communication between the Linux kernel and the GPFS kernel modules. See “Building your GPFS portability layer” on page 45.

GPFS cluster configurations

GPFS can be configured in a variety of ways. A subset of these configurations includes:

1. Configurations where all disks are SAN-attached to all nodes in the cluster and the nodes in the cluster are either all Linux or all AIX (refer to Figure 1 on page 7). For the latest hardware that GPFS has

com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html

2. A Linux-only cluster consisting of xSeries®, IBM System p5, IBM System p, or eServer processors with an NSD server attached to the disk (refer to Figure 2). Nodes that are not directly attached to the disk have remote data access over the local area network (either Ethernet or Myrinet) to the NSD server. An NSD server with direct Fibre Channel access to the disks can also be defined. Any nodes directly attached to the disk will not access data through the NSD server. This is the default behavior, which can be changed with the useNSDserver file system mount option.

3. An AIX and Linux cluster with an NSD server (refer to Figure 3 on page 8). Nodes not directly attached to the disk have remote access over the local area network to the NSD server.

Figure 1. A Linux-only cluster with disks that are SAN-attached to all nodes

Figure 2. A Linux-only cluster with an NSD server

4. An AIX and Linux cluster with an NSD server (refer to Figure 4). AIX nodes with Recoverable Virtual Shared Disk (RVSD) and IBM High Performance Switch (HPS) connections to the disk server access data through that path. Linux nodes or other AIX nodes with no RVSD switch access have remote data access over the LAN to the NSD server. You can also define additional NSD servers with access to the disk through RVSD and the switch.

5. An AIX and Linux cluster that provides remote access to disks through multiple NSD servers (refer to Figure 5 on page 9). In this configuration:

v Internode communication uses the LAN

v All disk access passes through NSD servers

– NSD servers connect the disk to the Linux nodes

– NSD servers use RVSD to connect the disk to the AIX nodes

Figure 3. An AIX and Linux cluster with an NSD server

Figure 4. An AIX and Linux cluster providing remote access to disks through the High Performance Switch (HPS) for the AIX nodes and a LAN connection for the Linux nodes

6. An AIX cluster with an NSD server (refer to Figure 6). Nodes not directly attached to the disk have remote access over the HPS to the NSD server.

7. Shared file system access among multiple GPFS clusters (refer to Figure 7 on page 10). The GPFS clusters sharing file system access may be any supported configuration.

Figure 5. An AIX and Linux cluster that provides remote access to disks through multiple NSD servers

Figure 6. An AIX cluster with an NSD server

For the latest list of supported cluster configurations, please see the GPFS FAQ at publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html.

Interoperable cluster requirements

Consult the GPFS FAQ at publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html for any changes to requirements and currently tested:

1. Hardware configurations

2. Software configurations

3. Cluster configurations

Prior to GPFS 3.2, upgrading your system to a new version of GPFS required shutting down GPFS and upgrading all nodes before you could restart GPFS. However, if you are upgrading a GPFS 3.1 cluster to GPFS 3.2, you can perform a rolling upgrade with a limited form of backward compatibility. Rolling upgrades allow you to install new GPFS code one node at a time without shutting down GPFS on other nodes. However, you must upgrade all nodes within a short time. The time dependency exists because some GPFS 3.2 features become available on each node as soon as the node is upgraded, while other features will not become available until you upgrade all participating nodes. Once all nodes have been migrated to the new code, you must finalize the migration by running the commands mmchconfig release=LATEST and mmchfs -V all (or mmchfs compat). Once this is done, you can create new file systems and use such new features as Persistent Reserve (PR) and multiple NSD servers.

Limited backward compatibility allows you to temporarily operate your cluster with a mixture of old and new nodes. In addition, GPFS requires backward compatibility for multi-cluster environments. With backward compatibility, the administrator should be able to upgrade the local cluster while still allowing mounts from remote nodes in other clusters that have not been upgraded yet. For additional information, refer to Chapter 6, “Migration, coexistence and compatibility,” on page 51.

These configuration requirements apply to an interoperable GPFS cluster:

v All file systems defined on versions of GPFS prior to version 2.3 must be exported from their old cluster definition and re-imported into a newly created 3.2 cluster. Cluster configuration dependencies and setup changed significantly in GPFS version 2.3. See “Migrating to GPFS 3.2 from GPFS 2.2 or earlier releases of GPFS” on page 52.

v All nodes serving a set of NSDs must be on a homogeneous set of Linux or AIX nodes. An NSD cannot be split between operating system types. See Figure 5 on page 9.

Figure 7. GPFS clusters providing shared file system access

v For most disk subsystems, all nodes accessing a SAN-attached disk (LUN) must use the same operating system. Most disk subsystems do not allow you to have Linux nodes and AIX nodes attached to the same LUN. Refer to the information supplied with your specific disk subsystem for details about supported configurations.

v Your cluster can have a mix of nodes with GPFS 3.1 (RDMA not available), GPFS 3.2 with RDMA configured, and GPFS 3.2 without RDMA configured. Only the GPFS 3.2 nodes with RDMA configured will use RDMA for data transfer between the NSD client and server.


Chapter 2. Planning for GPFS

Planning for GPFS includes:

v “Hardware requirements”

v “Software requirements”

v “Recoverability considerations” on page 14

v “GPFS cluster creation considerations” on page 20

v “Disk considerations” on page 24

v “File system creation considerations” on page 30

Although you can modify your GPFS configuration after it has been set, a little consideration before installation and initial setup will reward you with a more efficient and immediately useful file system. During configuration, GPFS requires you to specify several operational parameters that reflect your hardware resources and operating environment. During file system creation, you have the opportunity to specify parameters based on the expected size of the files or allow the default values to take effect. These parameters define the disks for the file system and how data will be written to them.

Hardware requirements

1. Please consult the GPFS FAQ at publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html for the latest list of:

v Supported hardware

v Tested disk configurations

v Maximum cluster size

2. Enough disks to contain the file system. Disks can be:

v SAN-attached to each node in the cluster

v Attached to one or more NSD servers

v A mixture of directly-attached disks and disks that are attached to NSD servers

Refer to “NSD creation considerations” on page 25 for additional information.

3. Since GPFS passes a large amount of data between its daemons, it is suggested that you configure a dedicated high speed network supporting the IP protocol when you are using GPFS:

v With NSD disks configured with servers providing remote disk capability

v Multiple GPFS clusters providing remote mounting of and access to GPFS file systems

Refer to the GPFS: Advanced Administration Guide for additional information.

GPFS communications require invariant static IP addresses for each specific GPFS node. Any IP address takeover operations which transfer the address to another computer are not allowed for the GPFS network. Other IP addresses within the same computer which are not used by GPFS can participate in IP takeover. GPFS can use virtual IP addresses created by aggregating several network adapters using techniques such as EtherChannel or channel bonding.

Software requirements

Please consult the GPFS FAQ at publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html for the latest list of:

v Linux distributions

v Linux kernel versions

v AIX environments

v OpenSSL levels

Note: When multiple clusters are configured to access the same GPFS file system, OpenSSL is used to authenticate and check authorization for all network connections. In addition, if you use a cipher, data will be encrypted for transmissions. However, if you set the cipherlist keyword of the mmauth command to AUTHONLY, only authentication will be used for data transmissions and data will not be encrypted.

Recoverability considerations

Sound file system planning requires several decisions about recoverability. After you make these decisions, GPFS parameters enable you to create a highly available file system with fast recoverability from failures:

v At the file system level, consider replication through the metadata and data replication parameters. See “File system recoverability parameters” on page 36.

v At the disk level, consider preparing disks for use with your file system by specifying failure groups that are associated with each disk. With this configuration, information is not vulnerable to a single point of failure. See “NSD creation considerations” on page 25.

Additionally, GPFS provides several layers of protection against failures of various types:

1. “Node failure”

2. “Network Shared Disk server and disk failure” on page 17

Node failure

In the event of a node failure, GPFS:

v Prevents the continuation of I/O from the failing node

v Replays the file system metadata log for the failing node

GPFS prevents the continuation of I/O from a failing node through a GPFS-specific fencing mechanism called disk leasing. When a node has access to file systems, it obtains disk leases that allow it to submit I/O. However, when a node fails, that node cannot obtain or renew a disk lease. When GPFS selects another node to perform recovery for the failing node, it first waits until the disk lease for the failing node expires. This allows for the completion of previously submitted I/O and provides for a consistent file system metadata log. Waiting for the disk lease to expire also avoids data corruption in the subsequent recovery step. For further information on recovery from node failure, see the GPFS: Problem Determination Guide.

File system recovery from node failure should not be noticeable to applications running on other nodes. The only noticeable effect may be a delay in accessing objects being modified on the failing node. Recovery involves rebuilding metadata structures which may have been under modification at the time of the failure. If the failing node is the file system manager for the file system, the delay will be longer and proportional to the activity on the file system at the time of failure. However, administrative intervention will not be needed.

Note: If usePersistentReserve is enabled, GPFS prevents the continuation of I/O from a failing node by fencing the failed node using Persistent Reserve (SCSI-3 protocol). Persistent Reserve allows the failing node to recover faster. GPFS does not need to wait for the disk lease on the failing node to expire. For additional information, refer to “Reduced recovery time using Persistent Reserve” on page 20.

Quorum

During node failure situations, quorum needs to be maintained in order to recover the failing nodes. If quorum is not maintained due to node failure, GPFS unmounts local file systems on the remaining nodes and attempts to reestablish quorum, at which point file system recovery occurs. For this reason it is important that the set of quorum nodes be carefully considered (refer to “Selecting quorum nodes” on page 17 for additional information).

GPFS quorum must be maintained within the cluster for GPFS to remain active. If the quorum semantics are broken, GPFS performs recovery in an attempt to achieve quorum again. GPFS can use one of two methods for determining quorum:

v Node quorum

v Node quorum with tiebreaker disks

Node quorum: Node quorum is the default quorum algorithm for GPFS. With node quorum:

v Quorum is defined as one plus half of the explicitly defined quorum nodes in the GPFS cluster.

v There are no default quorum nodes; you must specify which nodes have this role.

v GPFS does not limit the number of quorum nodes.

For example, in Figure 8, there are six quorum nodes. In this configuration, GPFS remains active as long as there are four quorum nodes available.
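The node quorum rule above is simple arithmetic, and a short sketch makes the six-node example concrete. The function names are hypothetical, and the sketch assumes that "half" is rounded down when the number of quorum nodes is odd (integer division); the text itself only gives the even case.

```python
def node_quorum_threshold(quorum_nodes: int) -> int:
    """Minimum number of available quorum nodes under node quorum.

    Implements "one plus half of the explicitly defined quorum nodes".
    Assumption: half is rounded down for an odd number of quorum nodes.
    """
    if quorum_nodes < 1:
        raise ValueError("at least one quorum node must be defined")
    return quorum_nodes // 2 + 1


def has_node_quorum(quorum_nodes: int, available: int) -> bool:
    """True if enough quorum nodes are available for GPFS to stay active."""
    return available >= node_quorum_threshold(quorum_nodes)
```

With six defined quorum nodes the threshold is four, which matches the Figure 8 example: `has_node_quorum(6, 4)` is true and `has_node_quorum(6, 3)` is false.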

Node quorum with tiebreaker disks: Node quorum with tiebreaker disks allows you to run with as few as one quorum node available as long as you have access to a majority of the quorum disks (refer to Figure 9 on page 17). Switching to quorum with tiebreaker disks is accomplished by indicating a list of one to three disks to use on the tiebreakerDisks parameter on the mmchconfig command.

When utilizing node quorum with tiebreaker disks, there are specific rules for cluster nodes and for tiebreaker disks.

Cluster node rules:

1. There is a maximum of eight quorum nodes.

2. You should include the primary and secondary cluster configuration servers as quorum nodes.

Figure 8. GPFS configuration utilizing node quorum

Legend: p = primary cluster configuration server; s = secondary cluster configuration server; q = quorum node; nq = non-quorum node; NSD = NSD server

3. You may have an unlimited number of non-quorum nodes.

Changing quorum semantics:

1. If you exceed eight quorum nodes, you must disable node quorum with tiebreaker disks and restart the GPFS daemon using the default node quorum configuration. To disable node quorum with tiebreaker disks:

a. Shut down the GPFS daemon by issuing mmshutdown -a on all nodes.

b. Change quorum semantics by issuing mmchconfig tiebreakerdisks=no.

c. Add quorum nodes.

d. Restart the GPFS daemon by issuing mmstartup -a on all nodes.

2. If you remove quorum nodes and the new configuration has fewer than eight quorum nodes, you can change the configuration to node quorum with tiebreaker disks. To enable quorum with tiebreaker disks:

a. Shut down the GPFS daemon by issuing mmshutdown -a on all nodes.

b. Delete the quorum nodes.

c. Change quorum semantics by issuing the mmchconfig tiebreakerdisks="diskList" command.

v The diskList contains the names of the tiebreaker disks.

v The list contains the NSD names of the disks, preferably one or three disks, separated by a semicolon (;) and enclosed by quotes.

d. Restart the GPFS daemon by issuing mmstartup -a on all nodes.

Tiebreaker disk rules:

v You can have one, two, or three tiebreaker disks. However, you should use an odd number of tiebreaker disks.

v Tiebreaker disks must be defined through the mmcrnsd command.

v Tiebreaker disks must use one of the following attachments to the quorum nodes:

– fibre-channel SAN

– IP SAN

– virtual shared disks

In Figure 9 on page 17, GPFS remains active with the minimum of a single available quorum node and two available tiebreaker disks.
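The rules for the diskList value (one to three NSD names, separated by semicolons, enclosed in quotes, with at most eight quorum nodes) can be checked before the command is issued. The following Python sketch only builds the command string shown in step c above; it does not run anything. The helper names and the NSD names in the usage note are hypothetical, and the uniqueness check is an added assumption rather than a rule stated in the text.

```python
from typing import Sequence

MAX_QUORUM_NODES = 8      # cluster node rule 1 for tiebreaker configurations
MAX_TIEBREAKER_DISKS = 3  # one, two, or three tiebreaker disks are allowed


def tiebreaker_disk_list(nsd_names: Sequence[str], quorum_nodes: int) -> str:
    """Return the quoted, semicolon-separated diskList value."""
    if not 1 <= quorum_nodes <= MAX_QUORUM_NODES:
        raise ValueError("tiebreaker disks require one to eight quorum nodes")
    if not 1 <= len(nsd_names) <= MAX_TIEBREAKER_DISKS:
        raise ValueError("specify one, two, or three tiebreaker disks")
    if len(set(nsd_names)) != len(nsd_names):
        # Assumption for this sketch: listing a disk twice is a mistake.
        raise ValueError("tiebreaker disk names must be unique")
    return '"' + ";".join(nsd_names) + '"'


def mmchconfig_command(nsd_names: Sequence[str], quorum_nodes: int) -> str:
    """Build the command line text from step c; nothing is executed."""
    return "mmchconfig tiebreakerdisks=" + tiebreaker_disk_list(nsd_names, quorum_nodes)
```

For three hypothetical NSDs named `nsdA`, `nsdB`, and `nsdC` in a cluster with three quorum nodes, `mmchconfig_command(["nsdA", "nsdB", "nsdC"], 3)` yields `mmchconfig tiebreakerdisks="nsdA;nsdB;nsdC"`.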


Selecting quorum nodes

To configure a system with efficient quorum nodes, follow these rules:

v Select nodes that are likely to remain active

– If a node is likely to be rebooted or require maintenance, do not select that node as a quorum node.

v Select nodes that have different failure points such as:

– Nodes located in different racks

– Nodes connected to different power panels

v You should select nodes that GPFS administrative and serving functions rely on such as:

– Primary configuration servers

– Secondary configuration servers

– Network Shared Disk servers

v Select an odd number of nodes as quorum nodes

– The suggested maximum is seven quorum nodes.

v Having a large number of quorum nodes may increase the time required for startup and failure recovery.

– Having more than seven quorum nodes does not guarantee higher availability.
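Some of the selection rules above are mechanical enough to check in a planning script. The sketch below is a hypothetical helper, not part of GPFS: it takes a mapping from candidate quorum node name to rack name (both invented by the planner) and returns advisory warnings for an even node count, more than seven nodes, and nodes that share a rack. The other rules, such as expected uptime or power panels, are not modeled.

```python
from collections import Counter
from typing import Dict, List

SUGGESTED_MAX_QUORUM_NODES = 7  # suggested maximum from the rules above


def review_quorum_selection(rack_by_node: Dict[str, str]) -> List[str]:
    """Return advisory warnings for a proposed set of quorum nodes."""
    warnings: List[str] = []
    count = len(rack_by_node)
    if count % 2 == 0:
        warnings.append("select an odd number of quorum nodes")
    if count > SUGGESTED_MAX_QUORUM_NODES:
        warnings.append("more than seven quorum nodes is not suggested")
    # Nodes in the same rack share a failure point.
    for rack, nodes_in_rack in sorted(Counter(rack_by_node.values()).items()):
        if nodes_in_rack > 1:
            warnings.append(f"{nodes_in_rack} quorum nodes share rack {rack}")
    return warnings
```

An empty list means none of the modeled rules were violated; it does not mean the selection is otherwise sound.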

Network Shared Disk server and disk failure

The three most common reasons why data becomes unavailable are:

v Disk failure

v Disk server failure with no redundancy

v Failure of a path to the disk

Figure 9. GPFS configuration utilizing node quorum with tiebreaker disks

Legend: p = primary cluster configuration server; s = secondary cluster configuration server; q = quorum node; nq = non-quorum node; NSD = NSD server; t = tiebreaker disk

In the event of a disk failure in which GPFS can no longer read or write to the disk, GPFS will discontinue use of the disk until it returns to an available state. You can guard against loss of data availability from disk failure by:

v Utilizing hardware data replication as provided by a Redundant Array of Independent Disks (RAID) device

v Utilizing the GPFS data and metadata replication features (see “High recoverability and increased data availability” on page 3) along with the designation of failure groups (see “NSD creation considerations” on page 25)

Figure 10. RAID/ESS Controller twin-tailed in a SAN configuration

Figure 11. GPFS configuration specifying multiple NSD servers connected to a common disk controller utilizing RAID5 with four data disks and one parity disk

In general, it is suggested that you consider RAID as the first level of redundancy for your data and add GPFS replication if you desire additional protection.

In the event of a disk server failure in which GPFS can no longer contact the node that provides remote access to a disk, GPFS will again discontinue use of the disk. You can guard against loss of disk server availability by using common disk connectivity on multiple nodes and specifying multiple Network Shared Disk servers for the common disk.

In the event of failure of a path to the disk:

v If a virtual shared disk server goes down and GPFS reports a disk failure, follow the instructions in the RSCT for AIX 5L Managing Shared Disks manual for the level of your system to check the state of the virtual shared disk path to the disk.

v If a SAN failure removes the path to the disk and GPFS reports a disk failure, follow the directions supplied by your storage vendor to distinguish a SAN failure from a disk failure.

You can guard against loss of data availability from failure of a path to a disk by:

v Creating multiple NSD servers for all disks. As GPFS determines the available connections to disks in the file system, it is recommended that you always define NSD servers for the disks. GPFS allows you to define up to eight NSD servers for each NSD. In a SAN configuration where NSD servers have also been defined, if the physical connection is broken, GPFS dynamically switches to the next available NSD server (as defined on the server list) and continues to provide data. When GPFS discovers that the path has been repaired, it falls back to local disk access. This is the default behavior, which can be changed with the -o useNSDserver file system mount option on the mmchfs, mmmount, mmremotefs, and mount commands.

v Using the Multiple Path I/O (MPIO) feature of AIX to define alternate paths to a device for failover purposes. Failover is a path-management algorithm that improves the reliability and availability of a device because the system automatically detects when one I/O path fails and reroutes I/O through an alternate path. All Small Computer System Interface (SCSI) Self Configured SCSI Drive (SCSD) disk drives are automatically configured as MPIO devices. Other devices can be supported, providing the device driver is compatible with the MPIO implementation in AIX. For more information about MPIO, see the:

– AIX 5L Version 5.3 System Management Concepts: Operating System and Devices book and search on Multi-path I/O.

– AIX 5L Version 5.3 System Management Guide: Operating System and Devices book and search on Multi-path I/O.

Figure 12. GPFS utilizes failure groups to minimize the probability of a service disruption due to a single component failure

v Using Subsystem Device Driver (SDD) or Subsystem Device Driver Path Control Module (SDDPCM) to give the AIX host the ability to access multiple paths to a single LUN within an Enterprise Storage Server® (ESS). This ability to access a single logical unit number (LUN) on multiple paths allows for a higher degree of data availability in the event of a path failure. Data can continue to be accessed within the ESS as long as there is at least one available path. Without one of these installed, you will lose access to the LUN in the event of a path failure. For additional information about:

– SDD, refer to http://www.ibm.com/server/storage/support/software/sdd/

– SDDPCM, refer to http://www.ibm.com/support/docview.wss?uid=ssg1S4000201

Reduced recovery time using Persistent Reserve

Persistent Reserve (PR) provides a mechanism for reducing recovery times from node failures. To enable PR and to obtain recovery performance improvements, your cluster requires a specific environment:

v All nodes must be running AIX

v All disks must be PR-capable

v All disks must be hdisks

You must explicitly enable PR using the usePersistentReserve option of the mmchconfig command. If you set usePersistentReserve=yes, GPFS will attempt to set up PR on all the PR capable disks. All subsequent NSDs will be created with PR enabled if they are PR capable. However, PR will only be supported in the home cluster. Therefore, remote mounts must access PR disks through an NSD server that is in the home cluster.
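The three environment requirements listed above can be expressed as a pre-check that a planner might run against an inventory before setting usePersistentReserve=yes. This is a hypothetical sketch: the `Disk` record, the `"AIX"` operating system label, and the function name are invented for illustration, and the inventory itself would have to come from your own records, since no GPFS query command is assumed here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Disk:
    name: str          # hypothetical inventory label
    pr_capable: bool   # disk supports Persistent Reserve
    is_hdisk: bool     # disk is presented as an hdisk


def persistent_reserve_blockers(node_os: Iterable[str],
                                disks: Iterable[Disk]) -> List[str]:
    """Return the reasons the environment does not meet the PR requirements.

    An empty list means all three listed requirements are satisfied.
    """
    problems: List[str] = []
    if any(os_name != "AIX" for os_name in node_os):
        problems.append("all nodes must be running AIX")
    for disk in disks:
        if not disk.pr_capable:
            problems.append(f"disk {disk.name} is not PR-capable")
        if not disk.is_hdisk:
            problems.append(f"disk {disk.name} is not an hdisk")
    return problems
```

A mixed AIX and Linux cluster, for instance, is reported as not meeting the first requirement regardless of its disks.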

GPFS cluster creation considerations

You create GPFS clusters by issuing the mmcrcluster command. Table 2 details:

v The GPFS cluster creation options provided by the mmcrcluster command

v How to change the options

v The default value for each option

Note: Refer to the GPFS: Advanced Administration Guide for information on accessing GPFS file systems in remote clusters and large cluster administration.

Table 2. GPFS cluster creation options

Cluster option | Command to change the option | Default value

“Nodes in your GPFS cluster” on page 21 | Add nodes through the mmaddnode command or delete nodes through the mmdelnode command | None

Node designation: Manager or client, see “Nodes in your GPFS cluster” on page 21 | mmchnode | Client

Node designation: Quorum or non-quorum, see “Nodes in your GPFS cluster” on page 21 | mmchnode | Non-quorum

Primary cluster configuration server, see “GPFS cluster configuration servers” on page 22 | mmchcluster | None

Secondary cluster configuration server, see “GPFS cluster configuration servers” on page 22 | mmchcluster | None

“Remote shell command” on page 22 | mmchcluster | /usr/bin/rsh

“Remote file copy command” on page 23 | mmchcluster | /usr/bin/rcp

“Cluster name” on page 23 | mmchcluster | The node name of the primary GPFS cluster configuration server

GPFS administration adapter port name, see “GPFS node adapter interface names” | mmchnode | Same as the GPFS communications adapter port name

GPFS communications adapter port name, see “GPFS node adapter interface names” | mmchnode | None

“User ID domain for the cluster” on page 23 | mmchconfig | The name of the GPFS cluster

“Starting GPFS automatically” on page 23 | mmchconfig | No

“Cluster configuration file” on page 23 | mmchconfig | None

“Managing distributed tokens” on page 24 | mmchconfig | Yes

GPFS node adapter interface names

An adapter interface name refers to the host name or IP address that GPFS uses to communicate with a node. Specifically, the host name or IP address identifies the communications adapter over which the GPFS daemons or GPFS administration commands communicate. GPFS permits the administrator to specify two node adapter interface names for each node in the cluster:

GPFS node name

Specifies the name of the node adapter interface to be used by the GPFS daemons for internode communication.

GPFS admin node name

Specifies the name of the node adapter interface to be used by GPFS administration commands when communicating between nodes. If not specified, the GPFS administration commands use the same node adapter interface used by the GPFS daemons.

These names can be specified by means of the node descriptors passed to the mmaddnode, mmchnode, or mmcrcluster command.

Nodes in your GPFS cluster

When you create your GPFS cluster you must provide a file containing a list of node descriptors, one per line, for each node to be included in the cluster. GPFS stores this information on the “GPFS cluster configuration servers” on page 22. Each descriptor must be specified in the form:

NodeName:NodeDesignations:AdminNodeName
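As an illustration of the descriptor layout, the following Python sketch splits one descriptor line into its three colon-separated fields. It is a hypothetical helper for checking a descriptor file before use, not part of GPFS. It assumes that trailing fields may be left empty or omitted (the text notes that the admin node name is optional) and treats the designation field as an opaque string, because its permitted values are not listed in this section.

```python
from typing import NamedTuple


class NodeDescriptor(NamedTuple):
    node_name: str
    designations: str      # kept as an opaque string in this sketch
    admin_node_name: str   # empty when not specified


def parse_node_descriptor(line: str) -> NodeDescriptor:
    """Split a NodeName:NodeDesignations:AdminNodeName descriptor line."""
    parts = line.strip().split(":")
    if len(parts) > 3:
        raise ValueError("a descriptor has at most three colon-separated fields")
    if not parts[0]:
        raise ValueError("NodeName is required")
    # Assumption: omitted trailing fields are treated as empty.
    parts += [""] * (3 - len(parts))
    return NodeDescriptor(*parts)
```

For the short host name used as an example in this guide, `parse_node_descriptor("h135n01")` returns a descriptor whose designation and admin node name fields are both empty.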

NodeName

The host name or IP address of the node for GPFS daemon-to-daemon communication.

The host name or IP address that is used for a node must refer to the communication adapter over which the GPFS daemons communicate. Alias names are not allowed. You can specify an IP address at NSD creation, but it will be converted to a host name that must match the GPFS node name. You can specify a node using any of these forms:

v Short host name (for example, h135n01)
