General Parallel File System
Concepts, Planning, and Installation Guide
Version 3 Release 2
Note: Before using this information and the product it supports, be sure to read the general information under “Notices” on page 99.
Second Edition (October 2007)
This edition applies to version 3 release 2 of IBM General Parallel File System Multiplatform (program number 5724-N94), IBM General Parallel File System for POWER (program number 5765-G66), and to all subsequent releases and modifications until otherwise indicated in new editions. Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the change.
IBM welcomes your comments. A form for your comments may be provided at the back of this publication, or you may address your comments to the following:
International Business Machines Corporation
Department H6MA, Mail Station P181
2455 South Road
Poughkeepsie, NY 12601-5400
United States of America
FAX (United States and Canada): 1+845+432-9405
FAX (Other Countries): Your International Access Code +1+845+432-9405
IBMLink™ (United States customers only): IBMUSM10(MHVRCFS)
Internet e-mail: [email protected]
If you would like a reply, be sure to include your name, address, telephone number, or FAX number. Make sure to include the following in your comment or note:
• Title and order number of this book
• Page number or topic related to your comment
When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.
Contents
Figures
Tables
About this publication
Who should read this publication
Conventions used in this publication
Prerequisite and related information
ISO 9000
Using LookAt to look up message explanations
How to send your comments
Summary of changes
Chapter 1. Introducing General Parallel File System
The strengths of GPFS
Shared file system access among GPFS clusters
Improved system performance
File consistency
High recoverability and increased data availability
Enhanced system flexibility
Simplified storage management
Simplified administration
The basic GPFS structure
GPFS administration commands
The GPFS kernel extension
The GPFS daemon
The GPFS open source portability layer
GPFS cluster configurations
Interoperable cluster requirements
Chapter 2. Planning for GPFS
Hardware requirements
Software requirements
Recoverability considerations
Node failure
Network Shared Disk server and disk failure
Reduced recovery time using Persistent Reserve
GPFS cluster creation considerations
GPFS node adapter interface names
Nodes in your GPFS cluster
GPFS cluster configuration servers
Remote shell command
Remote file copy command
Cluster name
User ID domain for the cluster
Starting GPFS automatically
Cluster configuration file
Managing distributed tokens
Disk considerations
NSD creation considerations
NSD server considerations
File system descriptor quorum
File system creation considerations
Device name of the file system
List of disk descriptors
NFS V4 ‘deny-write open lock’
Disks for your file system
Deciding how the file system is mounted
Block size
atime values
mtime values
Block allocation map
File system authorization
Strict replication
Internal log file
File system recoverability parameters
Number of nodes mounting the file system
Maximum number of files
Mount point directory
Assign mount command options
Automatic quota activation
Enable DMAPI
A sample file system creation
Chapter 3. Steps to establishing and starting your GPFS cluster
Chapter 4. Installing GPFS on Linux nodes
Files to ease the installation process
Verifying the level of prerequisite software
Installation procedures
Accepting the electronic license agreement
Creating the GPFS directory
Installing the GPFS man pages
Installing GPFS over a network
Verifying the GPFS installation
Building your GPFS portability layer
Using the automatic configuration tool to build GPFS portability layer
Chapter 5. Installing GPFS on AIX 5L nodes
Files to ease the installation process
Verifying the level of prerequisite software
Installation procedures
Accepting the electronic license agreement
Creating the GPFS directory
Creating the GPFS installation table of contents file
Installing the GPFS man pages
Installing GPFS over a network
Reconciling existing GPFS files
Verifying the GPFS installation
Chapter 6. Migration, coexistence and compatibility
Migrating to GPFS 3.2 from GPFS 3.1
Migrating to GPFS 3.2 from GPFS 2.3
Migrating to GPFS 3.2 from GPFS 2.2 or earlier releases of GPFS
Completing the migration to a new level of GPFS
Additional considerations when migrating GPFS 2.3 and earlier file systems
Reverting to the previous level of GPFS
Reverting to a previous level of GPFS when you have not issued mmchconfig release=LATEST
Reverting to a previous level of GPFS when you have issued mmchconfig release=LATEST
Coexistence
Compatibility
Applying maintenance to your GPFS system
Chapter 7. Configuring and tuning your system for GPFS
General system configuration and tuning considerations
Clock synchronization
GPFS administration security
Cache usage
GPFS I/O
Access patterns
Aggregate network interfaces
Swap space
Linux configuration and tuning considerations
updatedb considerations
SUSE LINUX considerations
GPFS helper threads
Communications I/O
Disk I/O
AIX configuration and tuning considerations
Communications I/O
Disk I/O
Switch pool
eServer High Performance Switch
IBM Virtual Shared Disk
GPFS use with Oracle
Chapter 8. Steps to permanently uninstall GPFS
Chapter 9. GPFS architecture
Special management functions
The GPFS cluster manager
The file system manager
The metanode
Use of disk storage and file structure within a GPFS file system
Quota files
GPFS recovery logs
GPFS and memory
Pinned and non-pinned memory
GPFS and network communication
GPFS daemon communication
GPFS administration commands
Application and user interaction with GPFS
Operating system commands
Operating system calls
GPFS command processing
NSD disk discovery
Recovery
Cluster configuration data files
GPFS backup data
Chapter 10. IBM Virtual Shared Disk considerations
Virtual shared disk server considerations
Disk distribution
Disk connectivity
Virtual shared disk creation considerations
Virtual shared disk server and disk failure
Chapter 11. Considerations for GPFS applications
Exceptions to Open Group technical standards
Determining if a file system is controlled by GPFS
GPFS exceptions and limitations to NFS V4 ACLs
Accessibility features for GPFS
Accessibility features
Keyboard navigation
IBM and accessibility
Notices
Trademarks
Glossary
Index
Figures
1. A Linux-only cluster with disks that are SAN-attached to all nodes
2. A Linux-only cluster with an NSD server
3. An AIX and Linux cluster with an NSD server
4. An AIX and Linux cluster providing remote access to disks through the High Performance Switch (HPS) for the AIX nodes and a LAN connection for the Linux nodes
5. An AIX and Linux cluster that provides remote access to disks through multiple NSD servers
6. An AIX cluster with an NSD server
7. GPFS clusters providing shared file system access
8. GPFS configuration utilizing node quorum
9. GPFS configuration utilizing node quorum with tiebreaker disks
10. RAID/ESS Controller twin-tailed in a SAN configuration
11. GPFS configuration specifying multiple NSD servers connected to a common disk controller utilizing RAID5 with four data disks and one parity disk
12. GPFS utilizes failure groups to minimize the probability of a service disruption due to a single component failure
13. GPFS files have a typical UNIX structure
14. Basic failure groups with servers and disks
15. Failure groups with twin-tailed disks
16. Primary node serving RAID device
17. Backup node serving RAID device
18. RAID/ESS Controller multi-tailed to the primary and secondary virtual shared disk servers
19. Concurrent node serving device
Tables
1. Typographic conventions
2. GPFS cluster creation options
3. Disk descriptor usage for the GPFS disk commands
About this publication
The General Parallel File System: Concepts, Planning, and Installation Guide describes:
• The IBM® General Parallel File System™ (GPFS™) Multiplatform licensed program, 5724-N94
• The IBM GPFS for POWER™ licensed program, 5765-G66
This publication includes information about these topics:
• Introducing GPFS
• Planning concepts for GPFS
• SNMP support
• Installing GPFS
• Migration, coexistence and compatibility
• Applying maintenance
• Configuration and tuning
• Steps to uninstall GPFS
Who should read this publication
This publication is intended for system administrators, analysts, installers, planners, and programmers of GPFS clusters. It assumes that you are very experienced with and fully understand the operating systems on which your cluster is based.
Use this publication if you are:
• Planning for GPFS
• Installing GPFS on a supported cluster configuration, consisting of:
  – Linux® nodes
  – AIX 5L™ nodes
  – An interoperable cluster composed of both operating systems
Conventions used in this publication
Table 1 describes the typographic conventions used in this publication.
Table 1. Typographic conventions

Typographic convention    Usage

Bold
    Bold words or characters represent system elements that you must use literally, such as commands, flags, path names, directories, file names, values, and selected menu options.
Bold Underlined
    Bold Underlined keywords are defaults. These take effect if you fail to specify a different keyword.
Italic
    • Italic words or characters represent variable values that you must supply.
    • Italics are also used for publication titles and for general emphasis in text.
Constant width
    All of the following are displayed in constant width typeface:
    • Displayed information
    • Message text
    • Example text
    • Specified text typed by the user
    • Field names as displayed on the screen
    • Prompts from the system
    • References to example text
[ ]
    Brackets enclose optional items in format and syntax descriptions.
{ }
    Braces enclose a list from which you must choose an item in format and syntax descriptions.
|
    A vertical bar separates items in a list of choices. (In other words, it means “or”.)
< >
    Angle brackets (less-than and greater-than) enclose the name of a key on the keyboard. For example, <Enter> refers to the key on your terminal or workstation that is labeled with the word Enter.
...
    An ellipsis indicates that you can repeat the preceding item one or more times.
<Ctrl-x>
    The notation <Ctrl-x> indicates a control character sequence. For example, <Ctrl-c> means that you hold down the control key while pressing <c>.
\
    The continuation character is used in programming examples in this publication for formatting purposes.
Prerequisite and related information
For updates to this publication, see publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfsbooks.html.
For the latest support information, see the GPFS Frequently Asked Questions at publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html.
ISO 9000
ISO 9000 registered quality systems were used in the development and manufacturing of this product.
Using LookAt to look up message explanations
LookAt is an online facility that lets you look up explanations for most of the IBM messages you encounter, as well as for some system abends and codes. You can use LookAt from the following locations to find IBM message explanations for Clusters for AIX® and Linux:
• The Internet. You can access IBM message explanations directly from the LookAt Web site:
  www.ibm.com/eserver/zseries/zos/bkserv/lookat
• Your wireless handheld device. You can use the LookAt Mobile Edition with a handheld device that has wireless access and an Internet browser (for example, Internet Explorer for Pocket PCs, Blazer, or Eudora for Palm OS, or Opera for Linux handheld devices). Link to the LookAt Mobile Edition from the LookAt Web site.
How to send your comments
Your feedback is important in helping us to produce accurate, high-quality information. If you have any comments about this publication or any other GPFS documentation:
• Send your comments by e-mail to: [email protected]
  Include the publication title and order number, and, if applicable, the specific location of the information you have comments on (for example, a page number or a table number).
• Fill out one of the forms at the back of this publication and return it by mail, by fax, or by giving it to an IBM representative.
To contact the IBM cluster development organization, send your comments by e-mail to: [email protected].
Summary of changes
The following sections summarize changes to the GPFS licensed program and the GPFS library for version 3 release 2. Within each book in the library, a vertical line to the left of text and illustrations indicates technical changes or additions made to the previous edition of the book.
Summary of changes for GPFS Version 3 Release 2 as updated, October 2007
Changes to GPFS and to the GPFS library for version 3 release 2 include:
• New information:
  – In the past, migrating to a new release of GPFS required shutting down GPFS and upgrading all nodes before GPFS could be restarted. GPFS V3.2 supports rolling upgrades and a limited form of backward compatibility:
    - Rolling upgrades enable you to install new GPFS code one node at a time without shutting down GPFS on other nodes. It is expected that all nodes will be upgraded within a short time. Some features become available on each node as soon as the node is upgraded, while other features become available as soon as all participating nodes are upgraded.
    - Backward compatibility allows running with a mixture of old and new nodes. Multi-cluster environments may be able to upgrade the local cluster while still allowing mounts from remote nodes in other clusters that have not yet been upgraded.
  – You can designate up to eight NSD servers to simultaneously service I/O requests from different clients. Each of these NSD servers must have physical access to the same logical unit number (LUN). Different servers can serve I/O to different non-intersecting sets of clients for a variety of reasons, such as load balancing on the server, network partitioning by balancing the load on different networks, or workload partitioning. Multiple NSD server functions require all (peer) NSD servers to be part of the same GPFS cluster. The existing subnet functions in GPFS determine which NSD server should serve a particular client. The assumption is that nodes within a subnet are connected via high-speed networks.
  – GPFS for Linux offers Clustered NFS (CNFS) to enable highly available NFS exporting of file systems. Some or all nodes in an existing GPFS cluster also serve NFS and are part of the NFS cluster. All of the nodes in the NFS cluster export the same file systems to the NFS clients. This support includes the following:
    - Monitoring. Every node in the NFS cluster runs an NFS monitoring utility that monitors GPFS and the NFS and networking components on the node. After failure detection, and based on customer configuration, the monitoring utility may invoke a failover.
    - Failover. The automatic failover procedure transfers the NFS serving load from the failing node to another node in the NFS cluster. The failure is managed by the GPFS cluster, including lock and state recovery.
    - Load balancing. The IP address is the load unit that can be moved from one node to another because of failure or load balancing needs. This solution supports a failover of all the node’s load (all NFS IP addresses) as one unit to another node. However, if no locks are outstanding, individual IP addresses can be moved to other nodes for load balancing purposes.
  – The GPFS InfiniBand® Remote Direct Memory Access (RDMA) code uses RDMA for NSD client file I/O requests. RDMA transfers data directly between the NSD client memory and the NSD server memory instead of sending and receiving the data over the TCP socket. Using RDMA may improve bandwidth and decrease CPU utilization.
  – GPFS can use the SCSI-3 (Persistent Reserve) standard to provide fast failover with improved recovery times. To exploit this functionality, the file system has to be created on SCSI-3 capable disks. To enable this function, you must set up PR when you create each NSD. The basic requirements for PR are that the NSD server must be an AIX node and that the disks must be regular AIX hdisks. To enable PR, use the new mmchconfig command option called usePersistentReserve.
  – Monitoring is enabled with the SNMP-based management application, Net-SNMP. SNMP requires a Linux node to be installed in order to collect the data. Monitoring involves gaining a view of the GPFS system. Monitored information can be roughly grouped into the categories of configuration, status, and performance:
    - Configuration denotes the initially customized aspects of the system’s current state.
    - Status information is dynamic information that expresses the current health of nodes, disks, and other hardware, including any reported error conditions, disk utilization, and fragmentation.
    - Performance information includes quantitative measurement of the workings of a system.
  – Performance enhancements include support for parallel defragmentation of a file system, larger page pool support, and directory locking:
    - Defragmentation of a file system can now be run in parallel across nodes in a cluster.
    - Maximum page pool support is increased to 256 GB.
    - Directory-locking improvements for concurrent file creates and deletes from multiple nodes (this function will be available in APAR IZ01431).
  – When a socket connection breaks due to a network failure, GPFS now tries to re-establish the connection rather than immediately initiating node expulsion procedures.
  – Mount up to 256 file systems.
  – GPFS V3.2 extends Information Lifecycle Management (ILM) functionality to integrate with HSM products. A single set of policies can be used to move data across different storage pools of a file system and to move data from GPFS storage pools to near-line storage and from near-line storage to GPFS storage pools. Additional enhancements include the ability to use policies for backup and restore.
    - Subroutine gpfs_fputattrswithpathname for backup and restore functions has been added. This subroutine sets all the extended file attributes for a file and invokes the policy engine for restoring files.
    - Subroutines gpfs_fgetattrs and gpfs_fputattrs have been enhanced with new flags.
  – GPFS V3.2 enables the policy code to run in parallel across all nodes in the home cluster that have the file system mounted. The policy evaluation can then scale with the size of the cluster.
  – Administration enhancements, which include:
    New commands:
    - mmchnode, which changes node attributes
    - mmnsddiscover, which rediscovers paths to the specified network shared disks on the specified nodes
    - mmtracectl, which sets up and enables GPFS tracing
    Updated commands:
    - mmapplypolicy to add the -M, -N, and -S options
    - mmchconfig to add several new configuration parameters
    - mmchdisk to add the -F option
    - mmchfs to add full or compat to the -V option
    - mmchmgr to add the -c option
    - mmchnsd to state that you can specify up to 8 NSD servers in the disk descriptors
    - mmcrfs:
      • To remove the mandatory MountPoint positional parameter
      • To add the -L and -T options
      • To change the default for the -k option from posix to all
      • To add the --version Version option
      • To change the default values for the -R and -M options to 2
    - mmcrnsd to state that you can specify up to 8 NSD servers in the disk descriptors and to change PrimaryServer and BackupServer to serverList
    - mmdefragfs to add the -P and -N options and to remove the -v option
    - mmdf to remove the -q option
    - mmedquota to add Device:Fileset to the -j option
    - mmlscluster to add the --cnfs option, which displays the cluster NFS
    - mmlsfs to add the all_local and all_remote parameters
    - mmlsmgr to add the -c and -C parameters
    - mmmount to add the all_local and all_remote parameters
  – Tracing is improved to increase the reliability of trace data gathering. It enhances the ability to acquire accurate and reliable problem determination information. With the new mmtracectl trace command, you can:
    - Turn trace on or off on the next session to automatically start trace when GPFS starts
    - Allow for predefined trace levels: io, all, def, and user-specified trace levels
    - Change the size of trace buffers
    - Allow a user-defined directory for keeping trace files
    - Control trace recycling during daemon termination (off, local, global, or globalOnShutdown)
• Changed information:
  – The terminology "cluster configuration manager" or "configuration manager" has been changed to "cluster manager".
• Deleted information:
  – The mmsanrepairfs command has been removed.
  – All references to SANergy
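The multiple NSD server item above says that the existing subnet functions in GPFS determine which NSD server serves a particular client, on the assumption that nodes within a subnet are connected by high-speed networks. The Python sketch below illustrates that idea under stated assumptions: a client prefers the first listed server that shares one of the configured subnets with it, and otherwise falls back to the first server in the list. The function name, the CIDR-based subnet test, and the fallback order are hypothetical and are not taken from GPFS.

```python
import ipaddress

MAX_NSD_SERVERS = 8  # "up to eight NSD servers", per the summary above


def pick_nsd_server(client_ip, server_list, subnets):
    """Pick the NSD server a client should use (hypothetical sketch).

    client_ip   -- client address, for example "10.2.0.9"
    server_list -- ordered (name, ip) pairs; every server must see the same LUN
    subnets     -- CIDR strings assumed to be high-speed networks
    """
    if not 1 <= len(server_list) <= MAX_NSD_SERVERS:
        raise ValueError("specify between one and eight NSD servers")
    client = ipaddress.ip_address(client_ip)
    for cidr in subnets:
        net = ipaddress.ip_network(cidr)
        if client not in net:
            continue
        # Prefer a server on the same (assumed high-speed) subnet as the client.
        for name, ip in server_list:
            if ipaddress.ip_address(ip) in net:
                return name
    # No server shares a subnet with the client: fall back to list order.
    return server_list[0][0]


servers = [("nsd1", "10.1.0.1"), ("nsd2", "10.2.0.1")]
subnets = ["10.1.0.0/16", "10.2.0.0/16"]
print(pick_nsd_server("10.2.0.9", servers, subnets))  # nsd2
```

Because clients on different subnets resolve to different servers, disjoint groups of clients end up served by different NSD servers, which is the kind of load and network partitioning the summary describes.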
Chapter 1. Introducing General Parallel File System
IBM’s General Parallel File System (GPFS) provides file system services to parallel and serial applications. GPFS allows parallel applications simultaneous access to the same files, or different files, from any node that has the GPFS file system mounted, while maintaining a high level of control over all file system operations. GPFS is particularly appropriate in an environment where the aggregate peak need for data bandwidth exceeds the capability of a distributed file system server.
GPFS allows users shared file access within a single GPFS cluster and across multiple GPFS clusters. A GPFS cluster consists of:
• AIX 5L nodes, Linux nodes, or a combination thereof (see “GPFS cluster configurations” on page 6). A node may be:
  – An individual operating system image on a single computer within a cluster.
  – A system partition containing an operating system. Some IBM System p5™ and IBM System p® machines allow multiple system partitions, each of which is considered to be a node within the GPFS cluster.
• Network shared disks (NSDs) created and maintained by the NSD component of GPFS
  – All disks utilized by GPFS must first be given a globally accessible NSD name.
  – The GPFS NSD component provides a method for cluster-wide disk naming and access.
  – On Linux machines running GPFS, you may give an NSD name to:
    - Physical disks
    - Logical partitions of a disk
    - Representations of physical disks (such as LUNs)
  – On AIX machines running GPFS, you may give an NSD name to:
    - Physical disks
    - Virtual shared disks
    - Representations of physical disks (such as LUNs)
• A shared network for GPFS communications allowing a single network view of the configuration. A single network, a LAN or a switch, is used for GPFS communication, including the NSD communication.
The strengths of GPFS
GPFS is a powerful file system offering:
• “Shared file system access among GPFS clusters”
• “Improved system performance” on page 2
• “File consistency” on page 3
• “High recoverability and increased data availability” on page 3
• “Enhanced system flexibility” on page 3
• “Simplified storage management” on page 4
• “Simplified administration” on page 4
Shared file system access among GPFS clusters
GPFS allows users shared access to files in either the cluster where the file system was created or other GPFS clusters. Each site in the network is managed as a separate cluster, while allowing shared file system access. When multiple clusters are configured to access the same GPFS file system, Open Secure Sockets Layer (OpenSSL) is used to authenticate and check authorization for all network connections.
Note: If you use a cipher, the data will be encrypted for transmissions. However, if you set the cipherlist keyword of the mmauth command to AUTHONLY, only authentication will be used for data transmissions and data will not be encrypted.
GPFS shared file system access provides for:
• The ability of the cluster granting access to specify multiple security levels, up to one for each authorized cluster.
• A highly available service as the local cluster may remain active prior to changing security keys. Periodic changing of keys is necessary for a variety of reasons, including:
  – In order to make connection rate performance acceptable in large clusters, the size of the security keys used for authentication cannot be very large. As a result, it may be necessary to change security keys in order to prevent a given key from being compromised while it is still in use.
  – As a matter of policy, some institutions may require that security keys be changed periodically.
Note: The pair of public and private security keys provided by GPFS is similar to the host-based authentication mechanism provided by OpenSSH. Each GPFS cluster has a pair of these keys that identify the cluster. In addition, each cluster also has an authorized_keys list. Each line in the authorized_keys list contains the public key of one remote cluster and a list of file systems that cluster is authorized to mount. For details on shared file system access, see the GPFS: Advanced Administration Guide.
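The note above describes an authorized_keys list in which each line pairs one remote cluster’s public key with the file systems that cluster may mount. A minimal Python sketch of that lookup follows. The in-memory representation, the cluster name, and the key strings are hypothetical; the real file format and the OpenSSL exchange that proves a cluster holds its key are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteCluster:
    """One hypothetical authorized_keys entry: a public key plus mountable file systems."""
    public_key: str
    file_systems: set = field(default_factory=set)


def may_mount(authorized_keys, cluster, presented_key, file_system):
    """Return True if `cluster`, identified by `presented_key`, may mount `file_system`.

    authorized_keys maps a remote cluster name to its RemoteCluster entry.
    Key verification is reduced to string equality for illustration only.
    """
    entry = authorized_keys.get(cluster)
    if entry is None or entry.public_key != presented_key:
        return False  # unknown cluster, or a key that is not on the list
    return file_system in entry.file_systems


authorized = {"remote.cluster": RemoteCluster("PUBKEY-A", {"gpfs1"})}
print(may_mount(authorized, "remote.cluster", "PUBKEY-A", "gpfs1"))  # True
print(may_mount(authorized, "remote.cluster", "PUBKEY-A", "gpfs2"))  # False
```

In this model, replacing `public_key` in an entry corresponds to a key change: connections that still present the old key are refused.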
Improved system performance
Using GPFS to store and retrieve your files can improve system performance by:
• Allowing multiple processes or applications on all nodes in the cluster simultaneous access to the same file using standard file system calls.
• Increasing aggregate bandwidth of your file system by spreading reads and writes across multiple disks.
• Balancing the load evenly across all disks to maximize their combined throughput. One disk is no more active than another.
• Supporting very large file and file system sizes.
• Allowing concurrent reads and writes from multiple nodes. This is a key concept in parallel processing.
• Allowing for distributed token (lock) management. Distributing token management reduces system delays associated with a lockable object waiting to obtain a token. Refer to “Managing distributed tokens” on page 24 and “High recoverability and increased data availability” on page 3 for additional information on token management.
• Allowing for the specification of different networks for GPFS daemon communication and for GPFS administration command usage within your cluster.
Achieving high throughput to a single, large file requires striping data across multiple disks and multiple disk controllers. Rather than relying on striping in a separate volume manager layer, GPFS implements striping in the file system. Managing its own striping affords GPFS the control it needs to achieve fault tolerance and to balance load across adapters, storage controllers, and disks. Large files in GPFS are divided into equal-sized blocks, and consecutive blocks are placed on different disks in a round-robin fashion.¹
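The round-robin placement described above can be made concrete with a small mapping from a file offset to a block and a disk. The Python sketch below assumes a fixed block size, an ordered list of disks, and block 0 starting on a chosen disk; the function and disk names are hypothetical, and the real GPFS allocator, which also takes free space, failure groups, and storage pools into account, is not modeled.

```python
def locate(offset, block_size, disks, first_disk=0):
    """Map a byte offset in a file to (block number, disk, offset within block).

    Assumes equal-sized blocks placed round-robin across `disks`,
    with block 0 on disks[first_disk].
    """
    block = offset // block_size
    disk = disks[(first_disk + block) % len(disks)]  # consecutive blocks rotate
    return block, disk, offset % block_size


disks = ["nsd1", "nsd2", "nsd3", "nsd4"]
block_size = 256 * 1024  # example value only
for off in (0, block_size, 2 * block_size + 10, 5 * block_size):
    print(off, locate(off, block_size, disks))
```

A large sequential read therefore cycles through every disk in turn, which is what lets a single file draw on the combined bandwidth of all the disks.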
To exploit disk parallelism when reading a large file from a single-threaded application, whenever it can recognize a pattern, GPFS prefetches data into its buffer pool, issuing I/O requests in parallel to as many disks as necessary to achieve the bandwidth of which the switching fabric is capable. GPFS recognizes sequential, reverse sequential, and various forms of strided access patterns.¹
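The paragraph above states that GPFS recognizes sequential, reverse sequential, and strided patterns before it prefetches. One simple way to detect such patterns is to compare the gaps between successive block requests, as in the Python sketch below. The heuristic (all recent gaps equal), the function names, and the fixed prefetch depth are assumptions for illustration, not GPFS’s detection logic; a real implementation would also size the prefetch from the buffer pool and the number of disks.

```python
def classify(blocks):
    """Classify recent block requests (hypothetical heuristic: all gaps must be equal)."""
    if len(blocks) < 3:
        return "unknown"  # not enough history yet
    gaps = {b - a for a, b in zip(blocks, blocks[1:])}
    if len(gaps) != 1:
        return "random"
    gap = gaps.pop()
    if gap == 1:
        return "sequential"
    if gap == -1:
        return "reverse sequential"
    if gap == 0:
        return "repeated"
    return "strided"


def prefetch_plan(blocks, depth):
    """Return the next `depth` block numbers to read ahead, or [] if no pattern."""
    if classify(blocks) not in ("sequential", "reverse sequential", "strided"):
        return []
    gap = blocks[-1] - blocks[-2]
    plan = [blocks[-1] + gap * i for i in range(1, depth + 1)]
    return [b for b in plan if b >= 0]  # never prefetch before the start of the file


print(classify([0, 4, 8, 12]), prefetch_plan([0, 4, 8, 12], 3))
```

The predicted blocks would then be issued as parallel I/O requests, one per disk holding each block.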
GPFS I/O performance may be monitored through the mmpmon command. See the GPFS: Advanced Administration Guide.
File consistency
GPFS uses a sophisticated token management system to provide data consistency while allowing multiple independent paths to the same file by the same name from anywhere in the cluster. See Chapter 9, “GPFS architecture,” on page 71.
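As a simplified illustration of how tokens keep many nodes consistent, the Python sketch below tracks read (shared) and write (exclusive) tokens on an object and revokes conflicting holders before granting a new token. The class name, the two-mode scheme, and the immediate revocation are assumptions made for the example; GPFS’s actual token modes and revocation protocol are not reproduced here.

```python
class TokenManager:
    """Toy shared/exclusive token table (hypothetical; not GPFS's token protocol)."""

    def __init__(self):
        self.holders = {}  # object name -> {node: "ro" or "rw"}

    def conflicts(self, obj, node, mode):
        """Nodes whose tokens on `obj` conflict with `node` acquiring `mode`."""
        held = self.holders.get(obj, {})
        return sorted(n for n, m in held.items()
                      if n != node and (mode == "rw" or m == "rw"))

    def acquire(self, obj, node, mode):
        """Revoke conflicting tokens, grant `mode` to `node`, return revoked nodes."""
        revoked = self.conflicts(obj, node, mode)
        held = self.holders.setdefault(obj, {})
        for n in revoked:
            # A real token server would ask the holder to flush dirty data and
            # release its token before the new grant; here it is simply dropped.
            del held[n]
        held[node] = mode
        return revoked


tm = TokenManager()
tm.acquire("fileA", "node1", "ro")
tm.acquire("fileA", "node2", "ro")         # readers share
print(tm.acquire("fileA", "node3", "rw"))  # ['node1', 'node2']
```

Readers coexist, while a writer forces every other holder to give up its token first, so no node can act on stale cached data.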
High recoverability and increased data availability
GPFS failover support allows you to organize your hardware into failure groups. A failure group is a set of disks that share a common point of failure that could cause them all to become simultaneously unavailable. When used in conjunction with the replication feature of GPFS, the creation of multiple failure groups provides for increased file availability should a group of disks fail. GPFS maintains each instance of replicated data and metadata on disks in different failure groups. Should a set of disks become unavailable, GPFS fails over to the replicated copies in another failure group.
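The rule in the paragraph above, that each replica is kept in a different failure group and reads fail over to a surviving group, can be sketched as follows. The disk-to-failure-group mapping, the least-used-disk balancing choice, and the function names are assumptions for illustration; this is not the GPFS allocation algorithm.

```python
def place_replicas(disks, replicas, used):
    """Choose `replicas` disks, each in a different failure group.

    disks -- dict of disk name -> failure group number
    used  -- dict of disk name -> blocks already allocated (used for balancing)
    """
    groups = {}
    for disk, fg in sorted(disks.items()):
        groups.setdefault(fg, []).append(disk)
    if replicas > len(groups):
        raise ValueError("need at least as many failure groups as replicas")
    # Take the least-used disk from each failure group, then the least-used groups.
    candidates = [min(members, key=lambda d: used.get(d, 0))
                  for members in groups.values()]
    candidates.sort(key=lambda d: (used.get(d, 0), d))
    return candidates[:replicas]


def read_from(replica_disks, disks, failed_groups):
    """Return the first replica whose failure group is still available, else None."""
    for disk in replica_disks:
        if disks[disk] not in failed_groups:
            return disk
    return None


disks = {"d1": 1, "d2": 1, "d3": 2, "d4": 3}
used = {"d1": 10, "d2": 3, "d3": 5, "d4": 7}
copies = place_replicas(disks, 2, used)
print(copies)                                       # ['d2', 'd3']
print(read_from(copies, disks, failed_groups={1}))  # d3
```

Because the two copies never share a failure group, losing every disk in group 1 still leaves a readable replica.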
During configuration, you assign a replication factor to indicate the total number of copies of data and metadata you wish to store. Replication allows you to set different levels of protection for each file or one level for an entire file system. Since replication uses additional disk space and requires extra write time, you might want to consider replicating only file systems that are frequently read from but seldom written to. To reduce the overhead involved with the replication of data, you may also choose to replicate only metadata as a means of providing additional file system protection. For further information on GPFS replication, see “File system recoverability parameters” on page 36.
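A short worked example of the space trade-off described above: with a replication factor of r, each byte is stored r times, so raw capacity and the amount of data written both grow by that factor. The Python sketch compares no replication, metadata-only replication, and replication of both data and metadata. The 100 TiB data size and 5 TiB metadata size are assumed figures for illustration, not GPFS measurements.

```python
TiB = 1024 ** 4


def raw_usage(data, metadata, data_replicas, metadata_replicas):
    """Raw disk space consumed when data and metadata each have their own replica count."""
    return data * data_replicas + metadata * metadata_replicas


data = 100 * TiB
metadata = 5 * TiB  # assumed metadata size, for illustration only
print(raw_usage(data, metadata, 1, 1) / TiB)  # 105.0  no replication
print(raw_usage(data, metadata, 1, 2) / TiB)  # 110.0  metadata-only replication
print(raw_usage(data, metadata, 2, 2) / TiB)  # 210.0  data and metadata replicated
```

Under these assumptions, replicating only metadata costs about 5% extra raw space, while full replication roughly doubles it.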
GPFS is a logging file system that creates separate logs for each node. These logs record the allocation and modification of metadata, aiding in fast recovery and the restoration of data consistency in the event of node failure. Even if you do not specify replication when creating a file system, GPFS automatically replicates recovery logs in separate failure groups, if multiple failure groups have been specified. This replication feature can be used in conjunction with other GPFS capabilities to maintain one replica in a geographically separate location, which provides some capability for surviving disasters at the other location. For further information on failure groups, see “NSD creation considerations” on page 25. For further information on disaster recovery with GPFS, see the GPFS: Advanced Administration Guide.

Once your file system is created, it can be configured to mount whenever the GPFS daemon is started. This feature assures that whenever the system and disks are up, the file system will be available. When utilizing shared file system access among GPFS clusters, you may reduce overall GPFS control traffic by specifying that the file system be mounted when it is first accessed. This is done through either the mmremotefs command or the mmchfs command using the -A automount option. GPFS mount traffic may be lessened by using automatic mounts instead of mounting at GPFS startup. Automatic mounts only produce additional control traffic at the point that the file system is first used by an application or user. Mounting at GPFS startup, on the other hand, produces additional control traffic at every GPFS startup. Thus startup of hundreds of nodes at once may be better served by using automatic mounts. However, when exporting the file system for Network File System (NFS) mounts, it might be useful to mount the file system when GPFS is started. For further information on shared file system access and the use of NFS with GPFS, see the GPFS: Administration and Programming Reference.
Enhanced system flexibility
With GPFS, your system resources are not frozen. You can add or delete disks while the file system is mounted. When the time is right and system demand is low, you can rebalance the file system across all currently configured disks. In addition, you can also add or delete nodes without having to stop and restart the GPFS daemon on all nodes.
Note: In the node quorum with tiebreaker disk configuration, GPFS has a limit of eight quorum nodes. If you add quorum nodes and exceed that limit, the GPFS daemon must be shut down. Before you restart the daemon, you must switch quorum semantics to node quorum. For additional information, refer to “Quorum” on page 14.
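The note above implies a configuration check: with tiebreaker disks, no more than eight quorum nodes may be defined. The Python sketch below encodes that limit alongside a simplified quorum test. The strict-majority rule for node quorum and the tiebreaker rule used here (at least one quorum node up plus a majority of tiebreaker disks reachable) are assumptions for illustration; see “Quorum” on page 14 for the authoritative rules. All function names are hypothetical.

```python
MAX_TIEBREAKER_QUORUM_NODES = 8  # limit stated in the note above


def check_config(quorum_nodes, tiebreaker_disks):
    """Reject tiebreaker semantics when more than eight quorum nodes are defined."""
    if tiebreaker_disks and len(quorum_nodes) > MAX_TIEBREAKER_QUORUM_NODES:
        raise ValueError("too many quorum nodes for tiebreaker semantics; "
                         "shut down GPFS and switch to node quorum")


def has_quorum(quorum_nodes, up_nodes, tiebreaker_disks=(), reachable_disks=()):
    """Simplified quorum test; both rules below are assumptions for illustration."""
    up = len(set(quorum_nodes) & set(up_nodes))
    if not tiebreaker_disks:
        # Node quorum: a strict majority of the quorum nodes must be active.
        return up > len(quorum_nodes) // 2
    # Assumed tiebreaker rule: any active quorum node plus access to a
    # majority of the tiebreaker disks.
    reachable = len(set(tiebreaker_disks) & set(reachable_disks))
    return up >= 1 and reachable > len(tiebreaker_disks) // 2


quorum = ["q1", "q2", "q3"]
print(has_quorum(quorum, ["q1"]))                                   # False
print(has_quorum(quorum, ["q1"], ["t1", "t2", "t3"], ["t1", "t2"]))  # True
```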
In a SAN configuration where you have also defined NSD servers, if the physical connection to the disk is broken, GPFS dynamically switches disk access to the NSD server nodes and continues to provide data through them. GPFS falls back to local disk access when it has discovered that the path has been repaired.
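The dynamic switch between the direct SAN path and the NSD servers described above can be modeled as a per-disk routing decision. In the Python sketch below, which uses hypothetical class, method, and node names, I/O goes over the local path while it works, moves to the first available NSD server when the path breaks, and returns to the local path once it is repaired. How GPFS detects the failure and the repair is not modeled.

```python
class DiskPath:
    """How one node reaches one NSD (hypothetical sketch, not a GPFS interface)."""

    def __init__(self, nsd, servers):
        self.nsd = nsd
        self.servers = list(servers)  # NSD server nodes for this disk, in preference order
        self.local_ok = True          # state of the direct SAN connection

    def route(self, servers_up):
        """Return "local" or the NSD server node that should carry the I/O."""
        if self.local_ok:
            return "local"
        for server in self.servers:
            if server in servers_up:
                return server
        raise OSError(f"no available path to {self.nsd}")

    def local_failed(self):
        self.local_ok = False  # SAN connection broken: switch to NSD servers

    def local_repaired(self):
        self.local_ok = True   # path repaired: fall back to local disk access


path = DiskPath("nsd7", ["srv1", "srv2"])
path.local_failed()
print(path.route({"srv2"}))  # srv2
```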
After GPFS has been configured for your system, depending on your applications, hardware, and workload, you can reconfigure GPFS to increase throughput. You can set up your GPFS environment for your current applications and users, secure in the knowledge that you can expand in the future without jeopardizing your data. GPFS capacity can grow as your hardware expands.
Simplified storage management
GPFS provides storage management based on the definition and use of:
• Storage pools
• Policies
• Filesets
Storage pools
A storage pool is a collection of disks or RAIDs with similar properties that are managed together as a group. Storage pools provide a method to partition storage within the file system. While you plan how to configure your storage, consider factors such as:
• Improved price-performance by matching the cost of storage to the value of the data.
• Improved performance by:
  – Reducing the contention for premium storage
  – Reducing the impact of slower devices
• Improved reliability by providing for:
  – Replication based on need
  – Better failure containment
Policies
Files are assigned to a storage pool based on defined policies. Policies provide for:
• Placing files in a specific storage pool when the files are created
• Migrating files from one storage pool to another
• File deletion based on file characteristics
• Snapshot metadata scans and file list creation
Filesets
Filesets provide a method for partitioning a file system and allow administrative operations at a finer granularity than the entire file system. For example, filesets allow you to:
• Define data block and inode quotas at the fileset level
• Apply policy rules to specific filesets
For further information on storage pools, filesets, and policies, see the GPFS: Advanced Administration Guide.
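To illustrate how placement policies might assign newly created files to storage pools, the Python sketch below evaluates an ordered list of rules and uses the first one that matches, optionally restricted to a fileset. The rule fields, the first-match semantics, and the pool and fileset names are hypothetical. GPFS policies are written in GPFS’s own rule language, described in the GPFS: Advanced Administration Guide, and are not reproduced here.

```python
import fnmatch
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlacementRule:
    """Hypothetical placement rule: new files matching `pattern` go to `pool`."""
    name: str
    pool: str
    pattern: str = "*"             # shell-style file name pattern
    fileset: Optional[str] = None  # if set, the rule applies only to this fileset


def choose_pool(rules, file_name, fileset):
    """Return the pool named by the first rule that matches a newly created file."""
    for rule in rules:
        if rule.fileset is not None and rule.fileset != fileset:
            continue
        if fnmatch.fnmatchcase(file_name, rule.pattern):
            return rule.pool
    raise LookupError(f"no placement rule matches {file_name!r}")


rules = [
    PlacementRule("logs", pool="nearline", pattern="*.log"),
    PlacementRule("projects", pool="fast", fileset="projects"),
    PlacementRule("default", pool="system"),
]
print(choose_pool(rules, "run.log", "projects"))  # nearline
print(choose_pool(rules, "a.dat", "projects"))    # fast
print(choose_pool(rules, "a.dat", "home"))        # system
```

Ordering matters in this model: the log rule is listed first, so log files go to the near-line pool even inside the projects fileset.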
Simplified administration
GPFS offers many of the standard UNIX® file system interfaces, allowing most applications to execute without modification or recompiling. UNIX file system utilities are also supported by GPFS. That is, users can continue to use the UNIX commands they have always used for ordinary file operations (see Chapter 11, “Considerations for GPFS applications,” on page 95). The only unique commands are those for administering the GPFS file system.
GPFS administration commands are similar in name and function to UNIX file system commands, with one important difference: the GPFS commands operate on multiple nodes. A single GPFS command performs a file system function across the entire cluster. See the individual commands as documented in the GPFS: Administration and Programming Reference.
GPFS commands save configuration and file system information in one or more files, collectively known as GPFS cluster configuration data files. The GPFS administration commands are designed to keep these files synchronized with each other and with the GPFS system files on each node in the cluster, thereby providing for accurate configuration data. See “Cluster configuration data files” on page 85.
The basic GPFS structure
GPFS is a clustered file system defined over a number of nodes. On each node in the cluster, GPFS consists of:
1. “GPFS administration commands”
2. “The GPFS kernel extension”
3. “The GPFS daemon”
4. For nodes in your cluster operating with the Linux operating system, “The GPFS open source portability layer” on page 6
For a detailed discussion of GPFS, see Chapter 9, “GPFS architecture,” on page 71.
GPFS administration commands
Most GPFS administration tasks can be performed from any node running GPFS. See the individual commands as documented in the GPFS: Administration and Programming Reference.
The GPFS kernel extension
The GPFS kernel extension provides the interfaces to the operating system vnode and virtual file system (VFS) interfaces for adding a file system. Structurally, applications make file system calls to the operating system, which presents them to the GPFS file system kernel extension. In this way, GPFS appears to applications as just another file system. The GPFS kernel extension either satisfies these requests using resources that are already available in the system, or sends a message to the GPFS daemon to complete the request.
The GPFS daemon
The GPFS daemon performs all I/O and buffer management for GPFS. This includes read-ahead for sequential reads and write-behind for all writes not specified as synchronous. All I/O is protected by GPFS token management, which honors atomicity, thereby providing for data consistency of a file system on multiple nodes.
The daemon is a multithreaded process with some threads dedicated to specific functions. Dedicated threads for services requiring priority attention are not used for or blocked by routine work. The daemon also communicates with instances of the daemon on other nodes to coordinate configuration changes, recovery, and parallel updates of the same data structures. Specific functions that execute on the daemon include:
1. Allocation of disk space to new files and newly extended files. This is done in coordination with the file system manager.
2. Management of directories, including creation of new directories, insertion and removal of entries in existing directories, and searching of directories that require I/O.
3. Allocation of appropriate locks to protect the integrity of data and metadata. Locks affecting data that may be accessed from multiple nodes require interaction with the token management function.
4. Disk I/O is initiated on threads of the daemon.
5. User security and quotas are also managed by the daemon in conjunction with the file system manager.
The GPFS Network Shared Disk (NSD) component provides a method for cluster-wide disk naming and high-speed access to data for applications running on nodes that do not have direct access to the disks. The NSDs in your cluster may be physically attached to all nodes or serve their data through an NSD server that provides a virtual connection. You can specify up to eight NSD servers for each NSD. If one server fails, the next server on the list takes control from the failed node.
For a given disk, each of its NSD servers must have physical access to the same LUN. However, different servers can serve I/O to different non-intersecting sets of clients. The existing subnet functions in GPFS determine which NSD server should serve a particular client.
Note: GPFS assumes that nodes within a subnet are connected using high-speed networks. For additional information on subnet configuration, refer to “Using public and private IP addresses for GPFS nodes” on page 78.
GPFS determines whether a node has physical or virtual connectivity to an underlying NSD through a sequence of commands invoked from the GPFS daemon. This determination is called disk discovery and occurs both at initial GPFS startup and whenever a file system is mounted.
The default order of access used in disk discovery:
1. Local /dev block device interfaces for virtual shared disk, SAN, SCSI, or IDE disks
2. NSD servers
This order can be changed with the useNSDserver mount option.
It is suggested that you always define NSD servers for the disks. In a SAN configuration where NSD servers have also been defined, if the physical connection is broken, GPFS dynamically switches to the server nodes and continues to provide data. GPFS falls back to local disk access when it has discovered the path has been repaired. This is the default behavior, and it can be changed with the useNSDserver file system mount option.
For further information, see “Disk considerations” on page 24 and “NSD disk discovery” on page 84.
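The Python model below illustrates the default discovery order and the failover to NSD servers. It is a simplified sketch under stated assumptions:
• It models only the default behavior, not the useNSDserver mount option.
• The server names are hypothetical.
• "reachable" stands in for whatever test GPFS actually uses to decide that a server is available.

```python
from typing import Iterable, Optional, Sequence

MAX_NSD_SERVERS = 8  # the text allows up to eight NSD servers per NSD


def choose_access_path(local_path_ok: bool,
                       servers: Sequence[str],
                       reachable: Iterable[str]) -> Optional[str]:
    """Model the default discovery order: the local block device first,
    then the NSD servers in the order they appear on the server list."""
    if len(servers) > MAX_NSD_SERVERS:
        raise ValueError("at most eight NSD servers may be defined for one NSD")
    if local_path_ok:
        # Local access wins whenever the physical path works, including
        # after a broken path has been repaired (the fall-back case).
        return "local"
    up = set(reachable)
    for server in servers:
        if server in up:
            return server  # first reachable server on the list takes over
    return None  # no path to the disk: GPFS would stop using it


if __name__ == "__main__":
    servers = ["nsdsrv1", "nsdsrv2"]  # hypothetical NSD server names
    print(choose_access_path(True, servers, servers))       # local
    print(choose_access_path(False, servers, servers))      # nsdsrv1
    print(choose_access_path(False, servers, ["nsdsrv2"]))  # nsdsrv2
    print(choose_access_path(False, servers, []))           # None
```

In this model, the fall-back to local access after a repair happens simply because the function is called again and local access is checked first.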
The GPFS open source portability layer
For Linux nodes running GPFS, you must build custom portability modules based on your particular hardware platform and Linux distribution to enable communication between the Linux kernel and the GPFS kernel modules. See “Building your GPFS portability layer” on page 45.
GPFS cluster configurations
GPFS can be configured in a variety of ways. A subset of these configurations includes:
1. Configurations where all disks are SAN-attached to all nodes in the cluster and the nodes in the cluster are either all Linux or all AIX (refer to Figure 1 on page 7). For the latest hardware that GPFS has … com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html
2. A Linux-only cluster consisting of xSeries®, IBM System p5, IBM System p, or eServer processors with an NSD server attached to the disk (refer to Figure 2). Nodes that are not directly attached to the disk have remote data access over the local area network (either Ethernet or Myrinet) to the NSD server. An NSD server with direct Fibre Channel access to the disks can also be defined. Any nodes directly attached to the disk will not access data through the NSD server. This is the default behavior, which can be changed with the useNSDserver file system mount option.
3. An AIX and Linux cluster with an NSD server (refer to Figure 3 on page 8). Nodes not directly attached to the disk have remote access over the local area network to the NSD server.
Figure 1. A Linux-only cluster with disks that are SAN-attached to all nodes
Figure 2. A Linux-only cluster with an NSD server
4. An AIX and Linux cluster with an NSD server (refer to Figure 4). AIX nodes with Recoverable Virtual Shared Disk (RVSD) and IBM High Performance Switch (HPS) connections to the disk server access data through that path. Linux nodes or other AIX nodes with no RVSD switch access have remote data access over the LAN to the NSD server. You can also define additional NSD servers with access to the disk through RVSD and the switch.
5. An AIX and Linux cluster that provides remote access to disks through multiple NSD servers (refer to Figure 5 on page 9). In this configuration:
• Internode communication uses the LAN
• All disk access passes through NSD servers
– NSD servers connect the disk to the Linux nodes
– NSD servers use RVSD to connect the disk to the AIX nodes
Figure 3. An AIX and Linux cluster with an NSD server
Figure 4. An AIX and Linux cluster providing remote access to disks through the High Performance Switch (HPS) for the AIX nodes and a LAN connection for the Linux nodes
6. An AIX cluster with an NSD server (refer to Figure 6). Nodes not directly attached to the disk have remote access over the HPS to the NSD server.
7. Shared file system access among multiple GPFS clusters (refer to Figure 7 on page 10). The GPFS clusters sharing file system access may be any supported configuration.
Figure 5. An AIX and Linux cluster that provides remote access to disks through multiple NSD servers
Figure 6. An AIX cluster with an NSD server
For the latest list of supported cluster configurations, see the GPFS FAQ at publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html.
Interoperable cluster requirements
Consult the GPFS FAQ at publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html for any changes to requirements and for the currently tested:
1. Hardware configurations
2. Software configurations
3. Cluster configurations
Prior to GPFS 3.2, upgrading your system to a new version of GPFS required shutting down GPFS and upgrading all nodes before you could restart GPFS. However, if you are upgrading a GPFS 3.1 cluster to GPFS 3.2, you can perform a rolling upgrade with a limited form of backward compatibility. Rolling upgrades allow you to install new GPFS code one node at a time without shutting down GPFS on other nodes. However, you must upgrade all nodes within a short time. The time dependency exists because some GPFS 3.2 features become available on each node as soon as the node is upgraded, while other features will not become available until you upgrade all participating nodes. Once all nodes have been migrated to the new code, you must finalize the migration by running the commands mmchconfig release=LATEST and mmchfs -V all (or mmchfs compat). Once this is done, you can create new file systems and use such new features as Persistent Reserve (PR) and multiple NSD servers.
Limited backward compatibility allows you to temporarily operate your cluster with a mixture of old and new nodes. In addition, GPFS requires backward compatibility for multi-cluster environments. With backward compatibility, the administrator should be able to upgrade the local cluster while still allowing mounts from remote nodes in other clusters that have not yet been upgraded. For additional information, refer to Chapter 6, “Migration, coexistence and compatibility,” on page 51.
These configuration requirements apply to an interoperable GPFS cluster:
• All file systems defined on versions of GPFS prior to version 2.3 must be exported from their old cluster definition and re-imported into a newly created 3.2 cluster. Cluster configuration dependencies and setup changed significantly in GPFS version 2.3. See “Migrating to GPFS 3.2 from GPFS 2.2 or earlier releases of GPFS” on page 52.
• All nodes serving a set of NSDs must be on a homogeneous set of Linux or AIX nodes. An NSD cannot be split between operating system types. See Figure 5 on page 9.
Figure 7. GPFS clusters providing shared file system access
• For most disk subsystems, all nodes accessing a SAN-attached disk (LUN) must use the same operating system. Most disk subsystems do not allow you to have Linux nodes and AIX nodes attached to the same LUN. Refer to the information supplied with your specific disk subsystem for details about supported configurations.
• Your cluster can have a mix of nodes with GPFS 3.1 (RDMA not available), GPFS 3.2 with RDMA configured, and GPFS 3.2 without RDMA configured. Only the GPFS 3.2 nodes with RDMA configured will use RDMA for data transfer between the NSD client and server.
Chapter 2. Planning for GPFS
Planning for GPFS includes:
• “Hardware requirements”
• “Software requirements”
• “Recoverability considerations” on page 14
• “GPFS cluster creation considerations” on page 20
• “Disk considerations” on page 24
• “File system creation considerations” on page 30
Although you can modify your GPFS configuration after it has been set, a little consideration before installation and initial setup will reward you with a more efficient and immediately useful file system. During configuration, GPFS requires you to specify several operational parameters that reflect your hardware resources and operating environment. During file system creation, you have the opportunity to specify parameters based on the expected size of the files or to allow the default values to take effect. These parameters define the disks for the file system and how data will be written to them.
Hardware requirements
1. Consult the GPFS FAQ at publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html for the latest list of:
• Supported hardware
• Tested disk configurations
• Maximum cluster size
2. Enough disks to contain the file system. Disks can be:
• SAN-attached to each node in the cluster
• Attached to one or more NSD servers
• A mixture of directly attached disks and disks that are attached to NSD servers
Refer to “NSD creation considerations” on page 25 for additional information.
3. Because GPFS passes a large amount of data between its daemons, it is suggested that you configure a dedicated high-speed network supporting the IP protocol when you are using GPFS:
• With NSD disks configured with servers providing remote disk capability
• With multiple GPFS clusters providing remote mounting of and access to GPFS file systems
Refer to the GPFS: Advanced Administration Guide for additional information.
GPFS communications require invariant static IP addresses for each specific GPFS node. IP address takeover operations that transfer the address to another computer are not allowed for the GPFS network. Other IP addresses within the same computer that are not used by GPFS can participate in IP takeover. GPFS can use virtual IP addresses created by aggregating several network adapters using techniques such as EtherChannel or channel bonding.
Software requirements
Consult the GPFS FAQ at publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html for the latest list of:
• Linux distributions
• Linux kernel versions
• AIX environments
• OpenSSL levels
Note: When multiple clusters are configured to access the same GPFS file system, OpenSSL is used to authenticate and check authorization for all network connections. In addition, if you use a cipher, data will be encrypted for transmissions. However, if you set the cipherList keyword of the mmauth command to AUTHONLY, only authentication will be used for data transmissions and data will not be encrypted.
Recoverability considerations
Sound file system planning requires several decisions about recoverability. After you make these decisions, GPFS parameters enable you to create a highly available file system with fast recoverability from failures:
• At the file system level, consider replication through the metadata and data replication parameters. See “File system recoverability parameters” on page 36.
• At the disk level, consider preparing disks for use with your file system by specifying failure groups that are associated with each disk. With this configuration, information is not vulnerable to a single point of failure. See “NSD creation considerations” on page 25.
Additionally, GPFS provides several layers of protection against failures of various types:
1. “Node failure”
2. “Network Shared Disk server and disk failure” on page 17
Node failure
In the event of a node failure, GPFS:
• Prevents the continuation of I/O from the failing node
• Replays the file system metadata log for the failing node
GPFS prevents the continuation of I/O from a failing node through a GPFS-specific fencing mechanism called disk leasing. When a node has access to file systems, it obtains disk leases that allow it to submit I/O. However, when a node fails, that node cannot obtain or renew a disk lease. When GPFS selects another node to perform recovery for the failing node, it first waits until the disk lease for the failing node expires. This allows for the completion of previously submitted I/O and provides for a consistent file system metadata log. Waiting for the disk lease to expire also avoids data corruption in the subsequent recovery step. For further information on recovery from node failure, see the GPFS: Problem Determination Guide.
File system recovery from node failure should not be noticeable to applications running on other nodes. The only noticeable effect may be a delay in accessing objects being modified on the failing node. Recovery involves rebuilding metadata structures that may have been under modification at the time of the failure. If the failing node is the file system manager for the file system, the delay will be longer and proportional to the activity on the file system at the time of failure. However, administrative intervention will not be needed.
Note: If usePersistentReserve is enabled, GPFS prevents the continuation of I/O from a failing node by fencing the failed node using Persistent Reserve (SCSI-3 protocol). Persistent Reserve allows the failing node to recover faster. GPFS does not need to wait for the disk lease on the failing node to expire. For additional information, refer to “Reduced recovery time using Persistent Reserve” on page 20.
Quorum
During node failure situations, quorum needs to be maintained in order to recover the failing nodes. If quorum is not maintained due to node failure, GPFS unmounts local file systems on the remaining nodes and attempts to reestablish quorum, at which point file system recovery occurs. For this reason, it is important that the set of quorum nodes be carefully considered (refer to “Selecting quorum nodes” on page 17 for additional information).
GPFS quorum must be maintained within the cluster for GPFS to remain active. If the quorum semantics are broken, GPFS performs recovery in an attempt to achieve quorum again. GPFS can use one of two methods for determining quorum:
• Node quorum
• Node quorum with tiebreaker disks
Node quorum: Node quorum is the default quorum algorithm for GPFS. With node quorum:
• Quorum is defined as one plus half of the explicitly defined quorum nodes in the GPFS cluster.
• There are no default quorum nodes; you must specify which nodes have this role.
• GPFS does not limit the number of quorum nodes.
For example, in Figure 8, there are six quorum nodes. In this configuration, GPFS remains active as long as there are four quorum nodes available.
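The node quorum arithmetic can be checked with a few lines of Python. This sketch assumes that "half" is rounded down when the number of quorum nodes is odd, which yields a strict majority. For the six-node example, it gives 6 // 2 + 1 = 4.

```python
def node_quorum(quorum_nodes: int) -> int:
    """Quorum under node quorum: one plus half of the defined quorum nodes.
    Half is rounded down here (an assumption), which gives a strict majority."""
    if quorum_nodes < 1:
        raise ValueError("at least one quorum node must be defined")
    return quorum_nodes // 2 + 1


def cluster_stays_active(quorum_nodes: int, quorum_nodes_up: int) -> bool:
    """GPFS stays active while the number of available quorum nodes meets quorum."""
    return quorum_nodes_up >= node_quorum(quorum_nodes)


if __name__ == "__main__":
    # Figure 8 example: six quorum nodes.
    print(node_quorum(6))              # 4
    print(cluster_stays_active(6, 4))  # True
    print(cluster_stays_active(6, 3))  # False
```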
Node quorum with tiebreaker disks: Node quorum with tiebreaker disks allows you to run with as little as one quorum node available as long as you have access to a majority of the tiebreaker disks (refer to Figure 9 on page 17). Switching to quorum with tiebreaker disks is accomplished by specifying a list of one to three disks on the tiebreakerDisks parameter of the mmchconfig command.
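A simplified Python check for node quorum with tiebreaker disks is shown below. It assumes that the cluster stays up when at least one quorum node is available and a majority of the tiebreaker disks is reachable. It deliberately ignores which node can reach which disk. The usage lines assume the three tiebreaker disks drawn in Figure 9.

```python
def tiebreaker_quorum_holds(quorum_nodes_up: int,
                            tiebreaker_disks: int,
                            tiebreaker_disks_reachable: int) -> bool:
    """Simplified check for node quorum with tiebreaker disks: at least one
    quorum node is up and a majority of the tiebreaker disks is reachable.
    It does not model which node reaches which disk."""
    if not 1 <= tiebreaker_disks <= 3:
        raise ValueError("one to three tiebreaker disks may be configured")
    majority = tiebreaker_disks // 2 + 1
    return quorum_nodes_up >= 1 and tiebreaker_disks_reachable >= majority


if __name__ == "__main__":
    # Figure 9 scenario: one quorum node left, two of three tiebreaker disks reachable.
    print(tiebreaker_quorum_holds(1, 3, 2))  # True
    print(tiebreaker_quorum_holds(1, 3, 1))  # False
    # With two disks configured, a majority means both must be reachable.
    print(tiebreaker_quorum_holds(1, 2, 1))  # False
```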
When using node quorum with tiebreaker disks, there are specific rules for cluster nodes and for tiebreaker disks.
Cluster node rules:
1. There is a maximum of eight quorum nodes.
2. You should include the primary and secondary cluster configuration servers as quorum nodes.
Figure 8. GPFS configuration utilizing node quorum
3. You may have an unlimited number of non-quorum nodes.
Changing quorum semantics:
1. If you exceed eight quorum nodes, you must disable node quorum with tiebreaker disks and restart the GPFS daemon using the default node quorum configuration. To disable node quorum with tiebreaker disks:
a. Shut down the GPFS daemon on all nodes by issuing mmshutdown -a.
b. Change quorum semantics by issuing mmchconfig tiebreakerDisks=no.
c. Add quorum nodes.
d. Restart the GPFS daemon on all nodes by issuing mmstartup -a.
2. If you remove quorum nodes and the new configuration has eight or fewer quorum nodes, you can change the configuration to node quorum with tiebreaker disks. To enable quorum with tiebreaker disks:
a. Shut down the GPFS daemon on all nodes by issuing mmshutdown -a.
b. Delete the quorum nodes.
c. Change quorum semantics by issuing the mmchconfig tiebreakerDisks="diskList" command.
• The diskList contains the names of the tiebreaker disks.
• The list contains the NSD names of the disks, preferably one or three disks, separated by a semicolon (;) and enclosed in quotes.
d. Restart the GPFS daemon on all nodes by issuing mmstartup -a.
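The disk list must be semicolon-separated and quoted, so it can help to build the command text programmatically. The Python helper below only produces the strings used in the procedures above and does not run anything. The NSD names are hypothetical.

```python
from typing import Sequence


def tiebreaker_command(nsd_names: Sequence[str]) -> str:
    """Build the mmchconfig command text for the procedures above.

    An empty list produces the disable form (tiebreakerDisks=no); otherwise
    the NSD names are joined with semicolons and enclosed in double quotes,
    which keeps a shell from treating the semicolons as command separators.
    """
    if not nsd_names:
        return "mmchconfig tiebreakerDisks=no"
    if len(nsd_names) > 3:
        raise ValueError("at most three tiebreaker disks may be listed")
    for name in nsd_names:
        if not name or ";" in name or '"' in name:
            raise ValueError(f"unusable NSD name: {name!r}")
    return 'mmchconfig tiebreakerDisks="' + ";".join(nsd_names) + '"'


if __name__ == "__main__":
    print(tiebreaker_command(["nsd1", "nsd2", "nsd3"]))  # hypothetical NSD names
    print(tiebreaker_command([]))
```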
Tiebreaker disk rules:
• You can have one, two, or three tiebreaker disks. However, you should use an odd number of tiebreaker disks.
• Tiebreaker disks must be defined through the mmcrnsd command.
• Tiebreaker disks must use one of the following attachments to the quorum nodes:
– fibre-channel SAN
– IP SAN
– virtual shared disks
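The cluster node and tiebreaker disk rules can be collected into one validation pass. This Python sketch uses hypothetical inventory records and attachment labels. Treat it as a planning aid under those assumptions, not as a GPFS tool.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for the three permitted attachment types.
ALLOWED_ATTACHMENTS = {"fibre-channel SAN", "IP SAN", "virtual shared disk"}
MAX_QUORUM_NODES = 8  # cluster node rule when tiebreaker disks are used


@dataclass
class TiebreakerDisk:
    nsd_name: str
    defined_with_mmcrnsd: bool
    attachment: str


def check_tiebreaker_config(quorum_nodes: int,
                            disks: List[TiebreakerDisk]) -> List[str]:
    """Return rule violations; entries starting with 'warning:' are advice."""
    problems: List[str] = []
    if quorum_nodes > MAX_QUORUM_NODES:
        problems.append("more than eight quorum nodes: use node quorum instead")
    if not 1 <= len(disks) <= 3:
        problems.append("configure one, two, or three tiebreaker disks")
    elif len(disks) % 2 == 0:
        problems.append("warning: an odd number of tiebreaker disks is recommended")
    for d in disks:
        if not d.defined_with_mmcrnsd:
            problems.append(f"{d.nsd_name}: must be defined through mmcrnsd")
        if d.attachment not in ALLOWED_ATTACHMENTS:
            problems.append(f"{d.nsd_name}: unsupported attachment {d.attachment!r}")
    return problems


if __name__ == "__main__":
    disks = [TiebreakerDisk(f"nsd{i}", True, "fibre-channel SAN") for i in (1, 2, 3)]
    print(check_tiebreaker_config(5, disks))  # []
    print(check_tiebreaker_config(9, disks))
```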
In Figure 9 on page 17, GPFS remains active with the minimum of a single available quorum node and two available tiebreaker disks.
Selecting quorum nodes
To configure a system with efficient quorum nodes, follow these rules:
• Select nodes that are likely to remain active
– If a node is likely to be rebooted or require maintenance, do not select that node as a quorum node.
• Select nodes that have different failure points, such as:
– Nodes located in different racks
– Nodes connected to different power panels
• You should select nodes that GPFS administrative and serving functions rely on, such as:
– Primary configuration servers
– Secondary configuration servers
– Network Shared Disk servers
• Select an odd number of nodes as quorum nodes
– The suggested maximum is seven quorum nodes.
• Having a large number of quorum nodes may increase the time required for startup and failure recovery.
– Having more than seven quorum nodes does not guarantee higher availability.
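One way to apply these rules to a node inventory is the greedy heuristic sketched below in Python. The node records, their fields, and the ordering policy are assumptions for illustration. GPFS does not choose quorum nodes this way on your behalf; you still designate them explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    rack: str
    power_panel: str
    needs_maintenance: bool = False  # likely to be rebooted or serviced
    key_role: bool = False           # configuration server or NSD server


def pick_quorum_nodes(candidates: List[Candidate], limit: int = 7) -> List[str]:
    """Heuristic selection following the rules above (not a GPFS tool)."""
    # Rule: skip nodes that are likely to be rebooted or need maintenance.
    stable = [c for c in candidates if not c.needs_maintenance]
    # Rule: prefer nodes that administrative and serving functions rely on.
    # sorted() is stable, so input order breaks ties.
    ordered = sorted(stable, key=lambda c: not c.key_role)

    chosen: List[Candidate] = []
    racks, panels = set(), set()
    # First pass: take nodes that add a new rack or a new power panel.
    for c in ordered:
        if len(chosen) < limit and (c.rack not in racks or c.power_panel not in panels):
            chosen.append(c)
            racks.add(c.rack)
            panels.add(c.power_panel)
    # Second pass: fill any remaining slots up to the suggested maximum.
    for c in ordered:
        if len(chosen) < limit and c not in chosen:
            chosen.append(c)
    # Rule: keep an odd number of quorum nodes.
    if chosen and len(chosen) % 2 == 0:
        chosen.pop()
    return [c.name for c in chosen]


if __name__ == "__main__":
    nodes = [
        Candidate("n1", "rackA", "pp1", key_role=True),
        Candidate("n2", "rackA", "pp1"),
        Candidate("n3", "rackB", "pp2"),
        Candidate("n4", "rackC", "pp1", needs_maintenance=True),
        Candidate("n5", "rackC", "pp3"),
    ]
    print(pick_quorum_nodes(nodes))  # ['n1', 'n3', 'n5']
```

The greedy order is one design choice among many. It spreads quorum nodes across racks and power panels first and only then fills in, so an odd-count trim removes a node that added no new failure domain.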
Network Shared Disk server and disk failure
The three most common reasons why data becomes unavailable are:
• Disk failure
• Disk server failure with no redundancy
• Failure of a path to the disk
Figure 9. GPFS configuration utilizing node quorum with tiebreaker disks
In the event of a disk failure in which GPFS can no longer read or write to the disk, GPFS will discontinue use of the disk until it returns to an available state. You can guard against loss of data availability from disk failure by:
• Utilizing hardware data replication as provided by a Redundant Array of Independent Disks (RAID) device
• Utilizing the GPFS data and metadata replication features (see “High recoverability and increased data availability” on page 3) along with the designation of failure groups (see “NSD creation considerations” on page 25)
Figure 10. RAID/ESS Controller twin-tailed in a SAN configuration
Figure 11. GPFS configuration specifying multiple NSD servers connected to a common disk controller utilizing RAID5 with four data disks and one parity disk
In general, it is suggested that you consider RAID as the first level of redundancy for your data and add GPFS replication if you desire additional protection.
In the event of a disk server failure in which GPFS can no longer contact the node that provides remote access to a disk, GPFS will again discontinue use of the disk. You can guard against loss of disk server availability by using common disk connectivity on multiple nodes and specifying multiple Network Shared Disk servers for the common disk.
In the event of failure of a path to the disk:
• If a virtual shared disk server goes down and GPFS reports a disk failure, follow the instructions in the RSCT for AIX 5L Managing Shared Disks manual for the level of your system to check the state of the virtual shared disk path to the disk.
• If a SAN failure removes the path to the disk and GPFS reports a disk failure, follow the directions supplied by your storage vendor to distinguish a SAN failure from a disk failure.
You can guard against loss of data availability from failure of a path to a disk by:
• Creating multiple NSD servers for all disks. Because GPFS determines the available connections to disks in the file system, it is recommended that you always define NSD servers for the disks. GPFS allows you to define up to eight NSD servers for each NSD. In a SAN configuration where NSD servers have also been defined, if the physical connection is broken, GPFS dynamically switches to the next available NSD server (as defined on the server list) and continues to provide data. When GPFS discovers that the path has been repaired, it falls back to local disk access. This is the default behavior, which can be changed with the -o useNSDserver file system mount option on the mmchfs, mmmount, mmremotefs, and mount commands.
• Using the Multiple Path I/O (MPIO) feature of AIX to define alternate paths to a device for failover purposes. Failover is a path-management algorithm that improves the reliability and availability of a device because the system automatically detects when one I/O path fails and reroutes I/O through an alternate path. All Small Computer System Interface (SCSI) Self Configured SCSI Drive (SCSD) disk drives are automatically configured as MPIO devices. Other devices can be supported, provided that the device driver is compatible with the MPIO implementation in AIX. For more information about MPIO, see the:
– AIX 5L Version 5.3 System Management Concepts: Operating System and Devices book and search on Multi-path I/O.
– AIX 5L Version 5.3 System Management Guide: Operating System and Devices book and search on Multi-path I/O.
Figure 12. GPFS utilizes failure groups to minimize the probability of a service disruption due to a single component failure
• Using Subsystem Device Driver (SDD) or Subsystem Device Driver Path Control Module (SDDPCM) to give the AIX host the ability to access multiple paths to a single LUN within an Enterprise Storage Server® (ESS). This ability to access a single logical unit number (LUN) on multiple paths allows for a higher degree of data availability in the event of a path failure. Data can continue to be accessed within the ESS as long as there is at least one available path. Without one of these installed, you will lose access to the LUN in the event of a path failure. For additional information about:
– SDD, refer to http://www.ibm.com/server/storage/support/software/sdd/
– SDDPCM, refer to http://www.ibm.com/support/docview.wss?uid=ssg1S4000201
Reduced recovery time using Persistent Reserve
Persistent Reserve (PR) provides a mechanism for reducing recovery times from node failures. To enable PR and to obtain recovery performance improvements, your cluster requires a specific environment:
• All nodes must be running AIX
• All disks must be PR-capable
• All disks must be hdisks
You must explicitly enable PR using the usePersistentReserve option of the mmchconfig command. If you set usePersistentReserve=yes, GPFS will attempt to set up PR on all the PR-capable disks. All subsequent NSDs will be created with PR enabled if they are PR-capable. However, PR will only be supported in the home cluster. Therefore, remote mounts must access PR disks through an NSD server that is in the home cluster.
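Before enabling PR, you can check an inventory against the three prerequisites listed above. The Python sketch below assumes you already have node and disk records; how you gather them is not shown. The record fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    os: str  # for example "AIX" or "Linux"


@dataclass
class Disk:
    name: str
    pr_capable: bool
    is_hdisk: bool


def pr_environment_problems(nodes: List[Node], disks: List[Disk]) -> List[str]:
    """List the ways an inventory misses the PR prerequisites in the text."""
    problems = [f"node {n.name} is not running AIX" for n in nodes if n.os != "AIX"]
    problems += [f"disk {d.name} is not PR-capable" for d in disks if not d.pr_capable]
    problems += [f"disk {d.name} is not an hdisk" for d in disks if not d.is_hdisk]
    return problems


if __name__ == "__main__":
    nodes = [Node("aix01", "AIX"), Node("lnx01", "Linux")]
    disks = [Disk("hdisk2", True, True)]
    print(pr_environment_problems(nodes, disks))  # ['node lnx01 is not running AIX']
```

An empty result only means the inventory matches the listed prerequisites. PR is still enabled explicitly with usePersistentReserve=yes on mmchconfig.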
GPFS cluster creation considerations
You create GPFS clusters by issuing the mmcrcluster command. Table 2 details:
• The GPFS cluster creation options provided by the mmcrcluster command
• How to change the options
• The default value for each option
Note: Refer to the GPFS: Advanced Administration Guide for information on accessing GPFS file systems in remote clusters and large cluster administration.
Table 2. GPFS cluster creation options
Cluster option | Command to change the option | Default value
“Nodes in your GPFS cluster” on page 21 | Add nodes through the mmaddnode command or delete nodes through the mmdelnode command | None
Node designation: Manager or client, see “Nodes in your GPFS cluster” on page 21 | mmchnode | Client
Node designation: Quorum or non-quorum, see “Nodes in your GPFS cluster” on page 21 | mmchnode | Non-quorum
Primary cluster configuration server, see “GPFS cluster configuration servers” on page 22 | mmchcluster | None
Secondary cluster configuration server, see “GPFS cluster configuration servers” on page 22 | mmchcluster | None
“Remote shell command” on page 22 | mmchcluster | /usr/bin/rsh
“Remote file copy command” on page 23 | mmchcluster | /usr/bin/rcp
“Cluster name” on page 23 | mmchcluster | The node name of the primary GPFS cluster configuration server
GPFS administration adapter port name, see “GPFS node adapter interface names” | mmchnode | Same as the GPFS communications adapter port name
GPFS communications adapter port name, see “GPFS node adapter interface names” | mmchnode | None
“User ID domain for the cluster” on page 23 | mmchconfig | The name of the GPFS cluster
“Starting GPFS automatically” on page 23 | mmchconfig | No
“Cluster configuration file” on page 23 | mmchconfig | None
“Managing distributed tokens” on page 24 | mmchconfig | Yes
GPFS node adapter interface names
An adapter interface name refers to the host name or IP address that GPFS uses to communicate with a node. Specifically, the host name or IP address identifies the communications adapter over which the GPFS daemons or GPFS administration commands communicate. GPFS permits the administrator to specify two node adapter interface names for each node in the cluster:
GPFS node name
Specifies the name of the node adapter interface to be used by the GPFS daemons for internode communication.
GPFS admin node name
Specifies the name of the node adapter interface to be used by GPFS administration commands when communicating between nodes. If not specified, the GPFS administration commands use the same node adapter interface used by the GPFS daemons.
These names can be specified by means of the node descriptors passed to the mmaddnode, mmchnode, or mmcrcluster command.
Nodes in your GPFS cluster
When you create your GPFS cluster, you must provide a file containing a list of node descriptors, one per line, for each node to be included in the cluster. GPFS stores this information on the “GPFS cluster configuration servers” on page 22. Each descriptor must be specified in the form:
NodeName:NodeDesignations:AdminNodeName
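The descriptor format can be split into its three fields, as in this Python sketch. The sketch makes these assumptions:
• NodeDesignations is treated as an opaque string.
• Trailing fields may be left empty or omitted.
• An empty AdminNodeName falls back to NodeName, mirroring the rule that administration commands otherwise use the daemon's adapter interface.
• Only host names and IPv4 addresses are handled.

```python
from typing import NamedTuple


class NodeDescriptor(NamedTuple):
    node_name: str
    designations: str  # kept as an opaque string in this sketch
    admin_node_name: str


def parse_node_descriptor(line: str) -> NodeDescriptor:
    """Split NodeName:NodeDesignations:AdminNodeName.

    Assumptions: trailing fields may be empty or omitted, and an empty
    AdminNodeName falls back to NodeName. Host names or IPv4 addresses
    only; an IPv6 address would break the colon split.
    """
    fields = line.strip().split(":")
    if not fields[0] or len(fields) > 3:
        raise ValueError(f"malformed node descriptor: {line!r}")
    fields += [""] * (3 - len(fields))
    node_name, designations, admin = fields
    return NodeDescriptor(node_name, designations, admin or node_name)


if __name__ == "__main__":
    print(parse_node_descriptor("h135n01"))
```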
NodeName
The host name or IP address of the node for GPFS daemon-to-daemon communication. The host name or IP address that is used for a node must refer to the communication adapter over which the GPFS daemons communicate. Alias names are not allowed. You can specify an IP address at NSD creation, but it will be converted to a host name that must match the GPFS node name. You can specify a node using any of these forms:
• Short host name (for example, h135n01)