SimNow™: Fast Platform Simulation Purely In Software
Robert Bedichek
Introduction
• Simulators come in many flavors.
• Some simulators are microarchitectural simulators, and some are instruction-level simulators.
• Some simulators emulate only the CPU model, and others emulate the whole PC platform.
• Instruction-level simulators are usually written in one of three different ways:
– Conventional emulation
– Threaded-code simulation
– Dynamic translation
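To make the difference between the first two approaches concrete, here is a minimal Python sketch over a hypothetical two-opcode toy ISA (not x86). The conventional emulator re-decodes every instruction each time it executes it; the threaded-code version decodes each instruction once into a handler function and afterwards only calls handlers. The names `run_conventional`, `predecode`, and `run_threaded` are illustrative and not taken from any real simulator.

```python
# Hypothetical toy ISA (not x86): each instruction is (opcode, dst, src).
MASK32 = 0xFFFFFFFF


def run_conventional(program, regs, steps):
    """Conventional emulation: fetch, decode, and dispatch on every step."""
    pc = 0
    for _ in range(steps):
        op, dst, src = program[pc]          # fetch
        if op == "add":                     # decode + dispatch, repeated every time
            regs[dst] = (regs[dst] + regs[src]) & MASK32
        elif op == "mov":
            regs[dst] = regs[src]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc = (pc + 1) % len(program)        # toy control flow: loop over the program
    return regs


def predecode(program):
    """Threaded code: decode each instruction once into a handler function."""
    handlers = []
    for op, dst, src in program:
        if op == "add":
            def handler(regs, d=dst, s=src):
                regs[d] = (regs[d] + regs[s]) & MASK32
        elif op == "mov":
            def handler(regs, d=dst, s=src):
                regs[d] = regs[s]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        handlers.append(handler)
    return handlers


def run_threaded(handlers, regs, steps):
    """Execute pre-decoded handlers; no per-step decoding."""
    pc = 0
    for _ in range(steps):
        handlers[pc](regs)
        pc = (pc + 1) % len(handlers)
    return regs
```

Both versions compute the same result; the threaded version simply moves decoding out of the hot loop. Dynamic translation goes one step further and generates host code for whole blocks.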
9/4/2004
What Is the SimNow™ Simulator?
• SimNow™ is a fast and configurable x86 and AMD64 dynamically translating instruction-level platform simulator. With SimNow™, users connect complex software models to form a full PC platform emulation environment.
• We use SimNow™ to emulate AMD Athlon™ 64 and AMD Opteron™ multiprocessor systems that run commercial operating systems and applications. Specifically, AMD and its partners use SimNow™ for:
– BIOS and device driver development
– Generating state snapshots and event traces as input to detailed timing simulators, to support processor architecture development
– Prototyping software-visible architecture changes
– Nonintrusive and deterministic measurement and testing of software at the instruction-execution level
– Modeling future platform tradeoffs for correctness and performance analysis
Presentation Map
• Overview
• Comparison with Other Simulators
• How It Works
• Demonstration
• Requirements/Goals/Uses
• Status
SimNow™ vs. Earlier Simulators
• SimNow™ is much faster than any other x86 simulator of which we are aware
– Its speed comes from using dynamic translation and from not attempting to model fine timing detail
• SimNow™ models the entire PC platform.
– SimNow™ models specific chipsets and the functionality of those chipsets, enough to allow unmodified, commercially available BIOSes and OSes to boot and run correctly.
• SimNow™ is configurable and can emulate about a dozen different AMD Athlon™ 64 and AMD Opteron™ platforms.
• SimNow™ is entirely a user application
– SimNow™ does not require any special drivers to be installed on the host machine
Simulator Performance
• Approximate slowdowns:
– Rev 1.x (circa 2002) of SimNow™: 1000:1
– Typical threaded-code simulator: 100:1
– Rev 2.1 (August 2003) of SimNow™: 50:1
– Current SimNow™: 10:1
– Typical virtual machine: 1.5:1
• Simulator performance is critical: it determines how many architectural tradeoffs we can evaluate in a fixed number of months. A fast simulator lets the user do tasks in simulation that were infeasible with a slow simulator.
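As a rough illustration of why slowdown translates directly into research throughput, the sketch below assumes a workload that takes 10 minutes (600 s) natively and a one-week compute budget (both numbers are hypothetical) and counts how many complete simulated runs fit at each slowdown listed above.

```python
# Approximate slowdowns from the slide (simulated time : native time).
SLOWDOWNS = {
    "SimNow Rev 1.x (circa 2002)": 1000,
    "Typical threaded-code simulator": 100,
    "SimNow Rev 2.1 (August 2003)": 50,
    "Current SimNow": 10,
    "Typical virtual machine": 1.5,
}


def simulated_seconds(native_seconds, slowdown):
    """Wall-clock time to run a workload under a simulator with the given slowdown."""
    return native_seconds * slowdown


def runs_per_budget(native_seconds, slowdown, budget_seconds):
    """Number of complete simulated runs that fit in a fixed time budget."""
    return int(budget_seconds // simulated_seconds(native_seconds, slowdown))


if __name__ == "__main__":
    NATIVE = 600            # hypothetical: a 10-minute native run
    WEEK = 7 * 24 * 3600    # hypothetical: a one-week budget
    for name, slowdown in SLOWDOWNS.items():
        print(f"{name:32s} {runs_per_budget(NATIVE, slowdown, WEEK):5d} runs/week")
```

Under these assumptions, the 1000:1 model allows a single run per week, while the 10:1 model allows 100.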
Why Is the New SimNow™ So Much Faster Than Earlier Versions?
• Previous versions of SimNow™ CPU models emulated every instruction in C++ code.
• New versions of SimNow™ CPU models translate simulated x86 instructions into sequences of host x86 instructions
– These are called "translations" and are cached and executed
– The decoding phase is like that of conventional simulators
– The code generation phase is new
• Performance is higher because translations are executed many times, so decode and code-generation time is amortized
• Many features can be added without slowing down the simulator when these features are off
– Conventional simulators slow down even when features are off, because of all the feature tests that wind up in the main loop
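The following Python sketch shows the caching idea under simplifying assumptions: guest blocks in a hypothetical toy ISA are translated into Python functions (standing in for host x86 code) with `exec`, cached by guest PC, and reused. A tracing feature, if enabled, is compiled into the translation at translation time, so when it is off the generated code contains no per-instruction feature tests. `TranslationCache` and its fields are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


class TranslationCache:
    """Translate guest blocks once, cache them by guest PC, and reuse them."""

    def __init__(self, guest_blocks, tracing=False):
        self.guest_blocks = guest_blocks  # guest PC -> list of (op, dst, src), hypothetical toy ISA
        self.tracing = tracing            # feature fixed at translation time
        self.cache = {}                   # guest PC -> generated "host" function
        self.translations_made = 0
        self.trace = []

    def _translate(self, pc):
        """Decode + code generation: emit Python source for the block and compile it."""
        self.translations_made += 1
        lines = ["def block(regs):"]
        for op, dst, src in self.guest_blocks[pc]:
            if self.tracing:  # instrumentation is emitted only when the feature is on
                lines.append(f"    trace.append(({pc}, {op!r}))")
            if op == "add":
                lines.append(f"    regs[{dst!r}] = (regs[{dst!r}] + regs[{src!r}]) & {MASK32}")
            elif op == "mov":
                lines.append(f"    regs[{dst!r}] = regs[{src!r}]")
            else:
                raise ValueError(f"unknown opcode {op!r}")
        lines.append("    return None")
        namespace = {"trace": self.trace}
        exec("\n".join(lines), namespace)
        return namespace["block"]

    def execute(self, pc, regs):
        """Run the translation for pc, translating it first on a cache miss."""
        block = self.cache.get(pc)
        if block is None:
            block = self._translate(pc)
            self.cache[pc] = block
        block(regs)
```

Executing the same block 1000 times pays the translation cost once, which is the amortization the slide describes.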
Translation Caching Example
Guest x86 code:
…
Add eax, ebx
Mov (eax), ecx
…
Cached translation in host memory:
Load host eax from memory home for guest eax value
Load host ebx from memory home for guest ebx value
Add eax, ebx
Load host ecx from memory home for guest ecx value
Compute host pointer from eax (i.e., simulate TLB)
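A minimal Python sketch of what the cached translation above does. It assumes that `Mov (eax), ecx` stores ecx to the guest address in eax (as the translation's steps suggest), 4 KiB pages, 32-bit accesses that do not cross a page boundary, and no flag updates. Guest registers live in a dict standing in for their "memory homes"; `SoftTLB` is a hypothetical software TLB that caches guest virtual page to guest physical page mappings and returns an offset into a host `bytearray`.

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB guest pages
PAGE_SIZE = 1 << PAGE_SHIFT
MASK32 = 0xFFFFFFFF


class SoftTLB:
    """Software TLB: caches guest virtual page -> guest physical page lookups."""

    def __init__(self, page_table, host_memory):
        self.page_table = page_table    # toy page table: virtual page number -> physical page number
        self.host_memory = host_memory  # bytearray standing in for guest physical memory
        self.entries = {}
        self.misses = 0

    def host_offset(self, vaddr):
        """Translate a guest virtual address to an offset into host_memory."""
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)
        ppn = self.entries.get(vpn)
        if ppn is None:                 # TLB miss: walk the toy page table and refill
            self.misses += 1
            ppn = self.page_table[vpn]  # a KeyError here stands in for a guest page fault
            self.entries[vpn] = ppn
        return (ppn << PAGE_SHIFT) | offset


def translated_block(guest_regs, tlb):
    """Host-side steps for 'add eax, ebx; mov (eax), ecx' (flags not modeled)."""
    eax = guest_regs["eax"]             # load host eax from memory home for guest eax
    ebx = guest_regs["ebx"]             # load host ebx from memory home for guest ebx
    eax = (eax + ebx) & MASK32          # add eax, ebx
    guest_regs["eax"] = eax             # write the result back to its memory home
    ecx = guest_regs["ecx"]             # load host ecx from memory home for guest ecx
    off = tlb.host_offset(eax)          # compute host pointer from eax (simulate TLB)
    tlb.host_memory[off:off + 4] = ecx.to_bytes(4, "little")  # x86 is little-endian
```

After the first access to a page, later accesses to the same page hit in the software TLB and skip the page-table walk.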
SimNow™ Demo!
• We will boot an unmodified, commercially available BIOS and operating system
SimNow™ Internals: Analyzers
• Key feature: user-written analyzers
– Small sequences of stylized C code compiled to an x86 byte stream
– Dynamically linked to the simulator to gather statistics, generate exceptions, and perhaps even modify instruction semantics
– All tracing and breakpoints can be written by the user via analyzers
– Analyzers can be added at various points in the simulator:
  • At instruction decode time
  • At exception processing time
  • At instruction execution time
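The analyzer hook points can be pictured as a callback registry. The sketch below is a Python analogy only: SimNow™ analyzers are stylized C compiled to an x86 byte stream and linked into the simulator, whereas here `AnalyzerHost`, `opcode_histogram`, and `breakpoint_at` are hypothetical names for plain Python callbacks fired at decode, exception, and execution time.

```python
from collections import defaultdict

HOOK_POINTS = ("decode", "exception", "execute")


class AnalyzerHost:
    """Hypothetical registry that calls user analyzers at fixed simulator points."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def attach(self, point, analyzer):
        if point not in HOOK_POINTS:
            raise ValueError(f"unknown hook point {point!r}")
        self._hooks[point].append(analyzer)

    def fire(self, point, event):
        """Call every analyzer attached to this point, in attachment order."""
        for analyzer in self._hooks[point]:
            analyzer(event)


class BreakpointHit(Exception):
    """Raised by an analyzer to stop the simulation at a guest PC."""


def opcode_histogram(counter):
    """Statistics analyzer: count executed instructions by opcode."""
    def analyzer(event):
        counter[event["opcode"]] += 1
    return analyzer


def breakpoint_at(pc):
    """Breakpoint analyzer: raise when the given guest PC executes."""
    def analyzer(event):
        if event["pc"] == pc:
            raise BreakpointHit(hex(pc))
    return analyzer
```

Breakpoints and tracing then become ordinary analyzers rather than special cases built into the simulator core.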
Device Models
• While the performance of a full-system simulator is driven almost entirely by simulated processor performance, a full-system simulator requires complex device models in order to function.
• We spend more effort on the device models than on the high-performance processor model.
• SimNow™ models numerous devices, including a NIC, a RAID controller, and various southbridges and northbridges
– but this is a tiny fraction of the hardware devices on the market.
• SimNow™ also models the machine check architecture and power states of AMD processors.
General Simulator Requirements
• As high AMD64 performance as is practical
• Allow users to add I/O models and fragments of analysis code
• Deterministic execution
– This is critical to its usefulness as a software measurement tool
• Rich scripting and debugger interfaces
• Ability to model multiprocessors (MPs) and complex I/O bridges
• Testability; in particular, we should be able to compare execution signatures with other simulators
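One plausible way to compare execution signatures between two simulators, assuming each can emit a per-instruction trace of (PC, register values) records, is to hash the traces and, on a mismatch, locate the first divergent record. This only works because execution is deterministic. The functions below are a hypothetical sketch, not SimNow™'s actual signature format.

```python
import hashlib
import struct


def execution_signature(trace):
    """Hash a trace of (pc, regs) records; regs is a tuple of 32-bit register values."""
    digest = hashlib.sha256()
    for pc, regs in trace:
        digest.update(struct.pack("<Q", pc))
        digest.update(struct.pack(f"<{len(regs)}I", *regs))
    return digest.hexdigest()


def first_divergence(trace_a, trace_b):
    """Index of the first differing record, or None if the traces are identical."""
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return i
    if len(trace_a) != len(trace_b):
        return min(len(trace_a), len(trace_b))  # one trace is a prefix of the other
    return None
```

Comparing short signatures first is cheap; the record-by-record search is only needed once signatures disagree.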
Non-goals
• Portability to platforms other than those based on AMD64 processors
– Long mode simulation requires an AMD Opteron™ or Athlon™ 64 host running a 64-bit OS
• Portability to OSes other than Linux-64 and Windows-64
– And we won't support all versions of Linux and Windows
• Timing accuracy (we use a microsimulator for that)
• Complete I/O models (they take too long to build and are not necessary)
– Some I/O models model only 10-20% of the whole I/O chip
– We model what BIOSes, OSes, and applications need, not everything in the specifications/RTL
Microsimulation On The Cheap
• We do microsampling, a combination of SimNow™ and a timing-accurate processor model (simx). Here's how it works:
– SimNow™ runs the whole workload, i.e., the OS + benchmark
– Simx runs for, say, 1M instructions every billion instructions of workload, so it executes a small fraction of the total stream.
– Research results (see the SMARTS paper in ISCA 2003) show that with this technique one can get performance results that are within 1.5 percent of the result obtained when one runs the
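The sampling schedule described above can be sketched as follows, using the slide's example numbers (a 1M-instruction detailed window every billion instructions). The function names are hypothetical; in a real setup the fast simulator would also hand a state snapshot to the timing model at the start of each window.

```python
def sample_windows(total_insns, period=1_000_000_000, window=1_000_000):
    """Yield [start, end) instruction ranges that the detailed timing model simulates."""
    if not 0 < window <= period:
        raise ValueError("window must be positive and no longer than the period")
    for start in range(0, total_insns, period):
        yield start, min(start + window, total_insns)


def detailed_fraction(total_insns, period=1_000_000_000, window=1_000_000):
    """Fraction of the instruction stream executed by the detailed model."""
    covered = sum(end - start for start, end in sample_windows(total_insns, period, window))
    return covered / total_insns
```

With these parameters, a 10-billion-instruction run yields 10 windows, and the detailed model executes only 0.1% of the stream.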
Current Status
• SimNow™ as of August 2004:
– Boots and runs Linux-64, Linux-32, Windows® 2000, Windows® XP, Windows64, Solaris (32-bit version), and DOS
– Runs unmodified Phoenix and Award BIOSes for AMD Athlon™ 64 and AMD Opteron™ platforms
– Runs SPECjbb with simulated 1P, 2P, 4P, and 8P configurations
– Runs 64-bit Linux builds of SPECint2000 and SPECfp2000
– Runs SYSmark® and Winstone® benchmarks
– Can generate trace files in several formats
– Provides an interface to several commercial debuggers
Conclusion
• We have built a new kind of CPU model (new, that is, to the x86 space)
– It is about 100 times faster than the model it replaced
– Total system performance improvement varies considerably with workload
• Our goal is to produce the fastest, most flexible, most reliable, and most useful x86 simulator in the industry. We think we've achieved this.
• SimNow™ is available to AMD partners.
© 2004 Advanced Micro Devices, Inc.
AMD, the AMD Arrow logo, AMD Athlon, AMD Opteron, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Windows is a registered trademark of
Backup
Slowdown: Key Simulator Metric
• B(X) means B runs on X
• Time(Y) means the time to do Y
• Say we run a benchmark B on simulator S, which is running on host H, i.e., B(S(H))
• T_simulator = Time(B(S(H)))
• T_host = Time(B(H))
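The metric these definitions lead to is, presumably, the ratio of the two times, which is how the N:1 slowdown figures quoted earlier read. Written out with the symbols from the bullets above:

```latex
\text{Slowdown} = \frac{T_{\text{simulator}}}{T_{\text{host}}}
               = \frac{\mathrm{Time}\big(B(S(H))\big)}{\mathrm{Time}\big(B(H)\big)}
```

For example, at a 10:1 slowdown a benchmark that takes 60 s on the host takes roughly 600 s under simulation, and the ratio between the 1000:1 and 10:1 slowdowns is 100, consistent with the roughly 100x figure in the conclusion.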
SimNow™ Internals: Translations
• Register-register operations are just copied, 1:1
• Memory operations call into a 12- to 14-instruction software TLB lookup sequence that translates the simulated virtual memory address to a pointer on the host machine
• Complex instructions translate to a call sequence; the heavy lifting is done in C++ code, just as in a conventional simulator
• The unit of translation is the basic block, or shorter
• Most translations are chained to their successor translations, reducing the dispatch overhead in most cases
• Hand-coded assembly handles the other cases, e.g., finding the successor of indirect jumps and returns
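Block chaining can be illustrated with a small Python sketch. Each hypothetical `Block` returns the next guest PC; when a block's successor is statically known (fall-through or direct branch), the dispatcher patches a direct link the first time the successor is resolved, so later transitions skip the lookup. Blocks ending in indirect jumps or returns (here, `static_next=None`) go through the lookup every time, which corresponds to the cases the slide says SimNow™ handles with hand-coded assembly.

```python
class Block:
    """A translated basic block (hypothetical): body(regs) returns the next guest PC."""

    def __init__(self, pc, body, static_next=None):
        self.pc = pc
        self.body = body
        self.static_next = static_next  # known successor PC; None for indirect jumps/returns
        self.chained = None             # direct link to the successor, patched on first use


class Dispatcher:
    def __init__(self, blocks):
        self.cache = {b.pc: b for b in blocks}  # stands in for the translation cache
        self.lookups = 0

    def _lookup(self, pc):
        self.lookups += 1
        return self.cache[pc]

    def run(self, pc, regs, max_blocks):
        block = self._lookup(pc)
        for _ in range(max_blocks):
            next_pc = block.body(regs)
            if block.chained is not None and block.chained.pc == next_pc:
                block = block.chained           # chained: no dispatcher lookup
                continue
            successor = self._lookup(next_pc)   # slow path: look up the successor
            if block.static_next == next_pc:
                block.chained = successor       # patch the chain so later passes skip this
            block = successor
        return regs
```

In a two-block loop, only the first pass through each edge costs a lookup; an indirect edge costs one on every pass.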
Developing Device Models
• SimNow™ device models are all written to a common interface, and compile to DLLs on Windows or .so files on Linux
• SimNow™ discovers these DLLs or .so files at runtime, and has no predetermined knowledge about device models
• Therefore, device models can be written by a user without knowledge of SimNow™ internals, as long as the common interface is documented and understood
• We have created a SimNow™ device model SDK. With this SDK, a user can create a SimNow™ device model without needing the source code for the rest of SimNow™
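The plug-in arrangement can be sketched in Python, with Python modules standing in for DLLs/.so files. Everything here is hypothetical: the `DeviceModel` interface, the `create_device()` factory convention, and `IOBus` are illustrative names, not the SimNow™ SDK. The loader knows nothing about specific devices; it relies only on the common interface.

```python
import importlib.util
import pathlib
from abc import ABC, abstractmethod


class DeviceModel(ABC):
    """Hypothetical common interface that every device plug-in implements."""

    @abstractmethod
    def port_range(self):
        """Return the range of I/O ports this device claims."""

    @abstractmethod
    def io_read(self, port, size):
        """Return the value read from a claimed port."""

    @abstractmethod
    def io_write(self, port, size, value):
        """Handle a write to a claimed port."""


def discover_devices(plugin_dir):
    """Load every *.py plug-in in plugin_dir and call its create_device() factory."""
    devices = []
    for path in sorted(pathlib.Path(plugin_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        module.DeviceModel = DeviceModel  # expose the interface to the plug-in, SDK-style
        spec.loader.exec_module(module)
        device = module.create_device()
        if not isinstance(device, DeviceModel):
            raise TypeError(f"{path.name} does not implement DeviceModel")
        devices.append(device)
    return devices


class IOBus:
    """Routes port I/O to whichever discovered device claims the port."""

    def __init__(self, devices):
        self.by_port = {}
        for device in devices:
            for port in device.port_range():
                self.by_port[port] = device

    def read(self, port, size=1):
        device = self.by_port.get(port)
        if device is None:
            return (1 << (8 * size)) - 1  # assumption: unclaimed ports read as all ones
        return device.io_read(port, size)

    def write(self, port, value, size=1):
        device = self.by_port.get(port)
        if device is not None:
            device.io_write(port, size, value)
```

A new device is added by dropping a file into the plug-in directory; neither the loader nor the bus has to change.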