下载

1下载券

加入VIP
  • 专属下载特权
  • 现金文档折扣购买
  • VIP免费专区
  • 千万文档免费下载

上传资料

关闭

关闭

关闭

封号提示

内容

首页 InvSqrt

InvSqrt.pdf

InvSqrt

haha
2010-11-06 0人阅读 举报 0 0 0 暂无简介

简介:本文档为《InvSqrtpdf》,可适用于IT/计算机领域

FASTINVERSESQUAREROOTCHRISLOMONTAbstractComputingreciprocalsquarerootsisnecessaryinmanyapplications,suchasvectornormalizationinvideogamesOften,somelossofprecisionisacceptableforalargeincreaseinspeedThisnoteexaminesandimprovesafastmethodfoundinsourcecodeforseveralonlinelibraries,andprovidestheideastoderivesimilarmethodsforotherfunctionsIntroductionReadingthemathprogrammingforumonwwwgamedevnet,IranacrossaninterestingmethodtocomputeaninversesquarerootTheCcodewasessentially(mycomments):floatInvSqrt(floatx){floatxhalf=f*xinti=*(int*)xgetbitsforfloatingvaluei=xfdf(i>>)givesinitialguessyx=*(float*)iconvertbitsbacktofloatx=x*(fxhalf*x*x)Newtonstep,repeatingincreasesaccuracyreturnx}Theinterestingpartwastheconstantxfdf:wherediditcomefromandwhydoesthecodeworkSomequicktestinginVisualCNETshowedthecodeabovetoberoughlytimesfasterthanthenaive(float)(sqrt(x)),andthemaximumrelativeerroroverallfloatingpointnumberswas,sothemethodseemsveryusefulThreeimmediatequestionswere:)howdoesitwork,)canitbeimproved,and)whatbitmasterdesignedsuchanincrediblehackAquicksearchonGoogleforxfdfreturnedseveralhits,themostrelevantbeingareferencetoathreadaboutthiscodeoncompgraphicsalgorithmsfromJan,,andan(incorrect,MathematicsSubjectClassificationPrimaryG,YSecondaryWKeywordsandphrasesInversesquareroot,speedtradeoffs,IEEECHRISLOMONTbutclose)explanationbyDEberlyHoweverhisexplanationisilluminatingFurtherdiggingfoundnocorrectexplanationofthiscodeItappearsinthesourcecodeforQuake,writtenbylegendarygameprogrammerJohnCarmack,butsomeoneonthenet(Icannotnowfindthereference)creditedittoGaryTarolliwhowasatNvidiaCananyoneconfirmauthorshipItalsoappearsintheCrystalSpacesourcecode,theTitanEngine,andtheFastCodeLibrary,althougheachseemstoderivefromQuakeThemotivationtotrysuchanalgorithmismoreclearlyexplainedinEberly,whereheassumestheshiftcreatesalinearinterpolationtotheinversesquarerootNotethereareseveralwaystospeedupthiscode,butthisnotewillnotgointofurtheroptimizationsTherearealsofastermethods,perhapstablelookuptricks,butmostofthemhavelessaccuracythanthismethodThisnoteassumesPCarchitecture(bitfloatsandints)orsimilarInparticularthefloatingpointrepresentationisIEEEThisCcodehasbeenreportedtobeendianneutral(supposedlytesteditonaMacintosh)IhavenotverifiedthisSincethemethodworksonbitnumbersitseems(surprisingly)endianneutralItiseasytoextendthecodetoothersituationsandbitsizes(suchastypedouble)usingtheideasinthisnoteAnyway,ontotheproblems),),and)BackgroundFloatingpointnumbersarestoredinthePCasbitnumberssEMbit←bits→←bits→wheresisabitsign(denotesnegative),Eisanbitexponent,andMisabitmantissaTheexponentisbiasedbytoaccommodatepositiveandnegativeexponents,andthemantissadoesnotstoretheleading,sothinkofMasabinarynumberwiththedecimalpointtotheleft,thusMisavalueinI=,)Therepresentedvalueisx=(−)s(M)E−Thesebitscanbeviewedasthefloatingpointrepresentationofarealnumber,orthinkingonlyofbits,asanintegerThusMwillbeconsideredarealnumberinIorasaninteger,dependingoncontextMasarealnumberisMasanintegerdividedbyFASTINVERSESQUAREROOTTheAlgorithmThemainideaisNewtonapproximation,andthemagicconstantisusedtocomputeagoodinitialguessGivenafloatingpointvaluex>,wewanttocompute√xDefinef(y)=y−xThenthevalueweseekisthepositiverootoff(x)Newton’srootfindingmethod,givenasuitableapproximationyntotheroot,givesabetteroneynusingyn=yn−f(yn)f′(yn),n≥Forthef(y)given,thissimplifiestoyn=yn(−xyn)whichcorrespondstothelineofcodex=x*(fxhalf*x*x),wherexistheinitialguess,whichhereafterwillbecalledyThelineofcodei=xfdf(i>>)computesthisinitialguessy,roughlybymultiplyingtheexponentforxby−,andthenpickingbitstominimizeerrori>>istheshiftrightoperatorfromC,whichshiftsthebitsofanintegerrightoneplace,droppingtheleastsignificantbit,effectivelydividingbyEberly’sexplanationwasthatthisproducedlinearapproximation,butisincorrectwe’llseetheguessispiecewiselinear,andthefunctionbeingapproximatedisnotthesameinallcasesHoweverIwouldguesshisexplanationwastheinspirationforcreatingthisalgorithmTheMagicNumber(s)ThusweareleftwithfindinganinitialguessSupposewearegivenafloatingpointvaluex>,correspondingtothebitsEMasaboveLettheexponente=E−Thenx=(M)e,andthedesiredvaluey=√x=√M−e,treatingeandMasrealnumbers,NOTasintegersForthegeneralcasewetakeamagicconstantR,andanalyzey=R−(i>>),wherethesubtractionisasintegers,iisthebitrepresentationofx,butweviewyasarealnumberRistheintegerrepresentingafloatingpointvaluewithbitsRRie,RintheexponentfieldandRinthemantissafieldWhenweshiftirightonebit,wemayshiftanexponentbitintothemantissafield,sowewillanalyzetwocasesFortherestofthisnote,forarealnumberαletbαcdenotetheintegerlessthanorequaltoαCHRISLOMONTExponentEEvenInthiscase,whenweusei>>,nobitsfromtheexponentEareshiftedintothemantissaM,andbEc=EThetrueexponente=E−isodd,saye=d,danintegerThebitsrepresentingtheinitialguessgivethenewexponentR−bEc=R−E=R−e=R−d=R−−dWerequirethistobepositiveIfitwerenegativetheresultingsignbitwouldbe,andthemethodfailstoreturnapositivenumberIfthisresultis,thenthemantissapartcouldnotborrowfromtheexponent,whichwewillseebelowisnecessarySincethismustbetrueforanyevenE∈,werequireR≥SincetheexponentEiseven,nobitsfromitshiftintothemantissaunderi>>,sothenewmantissaisR−bMc(asintegers),assumingR≥bMcTheinitialguessistheny=(R−M)(R−−d)−=(R−M)R−−d=(R−M)R−−dwhereMreplacedbMc,sincetheresultingerrorisatmost−inthemantissa(asarealnumber),whichisinsignificantcomparedtoothererrorsinthemethodIfR<M,thenuponsubtractingR−(i>>)asintegers,thebitsinthemantissafieldwillhavetoborrowonefromtheexponentfield(thisiswhyweneededthenewexponentpositiveabove),thusdividingybytwoThebitsinthemantissafieldwillthenbe(R)−bMc,whicharestillinIThusifR<My=((R−M))(R−−d)−−=(R−M)R−−dwherewewritetheexponentthesameasinthenoncarryingcaseIfwedefineβRM={R−M:R≥MR−M:R<MFASTINVERSESQUAREROOTthenwecancombinethesetwoyequationsintoone:y=(βRM)R−−dNotethatβRMiscontinuous,anddifferentiableexceptalongR=MSubstitutinge=d,thevaluey=√xwearetryingtoapproximateis√M−e=√M−d−=√√M−dTherelativeerror∣∣∣y−yy∣∣∣,whichishowwewillmeasurethequalityofthemethod,is∣∣∣∣∣√√M−d−(βRM)R−−d√√M−d∣∣∣∣∣whichsimplifiesto|ε(M,R)|,ε(M,R)=−√√M(βRM)R−Notethatthisnolongerdependsond,andthatMcanbeanyvalueinI=,)ThisformulaisonlyvalidifEiseven,thushasahiddendependenceonESupposewewantRsothattherelativeerrormaxM∈I|ε|<Then>maxM∈I|ε|≥|ε(,R)|=∣∣∣−√(R)R−∣∣∣=∣∣−(R)R−∣∣Expanding,−<−(R)R−<≥(R)>R−>(R)>wherethelaststepusedthefactthatR∈I⇒(R)∈,)Takinglogandaddinggives>R>,soR==xbeistheuniqueintegerthathasanychanceofkeepingtherelativeerrorbelowforevenEInbitpositionsthrough,thisgivestheleading(xbe>>)=xfpartofthemagicconstantR(aswellasrequiringbittobe)CHRISLOMONTMyRMerrorRFigureFigureToillustrate,Figureshowsyasasolidlineandthefunction(M)−−dneedingapproximatedasadashedline,forR=,d=Itisclearthatyisanonlinearapproximationhowever,bybeingnonlinear,itactuallyfitsbetter!Figureshowstherelativeerror|ε(M,R)|forR∈IandM∈IItisclearthattheconstantRcrosssectionwiththesmallestmaximalerrorisapproximatelyR=ExponentEOddWiththepreviouscaseunderourbelt,weanalyzethehardercaseThedifferenceisthattheoddbitfromtheexponentEshiftsintothehighbitofthemantissafromthecodei>>Thisaddstotherealvalueoftheshiftedmantissa,whichbecomesbMc,wherethetruncationisasintegersandtheadditionisasrealnumberse=E−=disevenSimilartoabovethenewexponentisR−bEc=R−⌊e⌋=R−⌊d⌋=R−−dThismustbepositiveasinthepreviouscaseAgainwriteMinsteadofbMcforthereasonsaboveThenewmantissaisR−(M)asrealnumberswhenR≥M,requiringnoborrowfromtheexponentTheny=(R−(M))(R−−d)−=(R−M)R−−d=(R−M)R−−dFASTINVERSESQUAREROOTAgainwechoosethesameexponentforyasinthecasewhenEisevenIfR<M,thesubtractionR−(i<<)needstoborrowonefromthe(positive)exponent,whichdividesyby,andthebitsinthemantissafieldarethenR−(M)asrealnumbers,whichisstillinISoifR<Mwegetfortheinitialguessy=((R−(M)))(R−−d)−−=(R−M)R−−d=(R−M)R−−dDefining(similarlytoβRM)γRM={R−M:R≥MR−M:R<Mwecanwritebothoftheseyatonceasy=(γRM)R−−dNotethatγRMiscontinuous,anddifferentiableexceptalongR=M,soyisalsoUsinge=d,wewantytoapproximatey=√x=√M−e=√M−dNotethisisnotthesamefunctionwewereapproximatinginthecaseEevenitisoffbyafactorof√Therelativeerror(whichwewanttominimize)simplifiesasaboveto|ε(M,R)|,whereε(M,R)=−√M(γRM)R−againindependentofd(butwiththesamehiddendependenceonEasabove)Alsoasbefore,ifwewanttherelativeerrormaxM∈I|ε|<forEodd,wecantakeM=(orM=)andanalyze,butsincewewantthesameconstantRtoworkforEevenandforEodd,wetakeR=fromaboveforbothcasesNotethatthissatisfiestheearlierrequirementthatR≥SofortherestofthisnoteassumeR==xbeandredefinethetwoerrorfunctionsasε(M,R)=−√√M(βRM)ε(M,R)=−√M(γRM)CHRISLOMONTdependingonlyonMandR,eachtakingvaluesinIAtthispointwethoroughlyunderstandwhatishappening,andcomputertestsconfirmtheaccuracyofthemodelfortheapproximationstep,sowehaveachievedgoal)AnExampleWhenx=,E==,M=,andabitfromtheexponentshiftsintothemantissaSoafterthelinei=xfdf(i>>),theinitialguessyisxEDFThenewexponentis−bc==xd,whichcorrespondstoe=−=−Thenewmantissaneedstoborrowabitfromtheexponent,leavinge=−,andisxbdfx=xDF,correspondingtoM=Thusy=(M)−=,whichisfairlycloseto√x=OptimizingtheMantissaWewantthevalueofR∈IthatminimizesmaxM∈I{|ε(M,R)|,|ε(M,R)|}FixavalueofR,andwewillfindthevalue(s)ofMgivingthemaximalerrorSinceεandεarecontinuousandpiecewisedifferentiable,thisMwilloccuratanendpointofapieceoratcriticalpointonapieceMaximizing|ε|Startwiththeendpointsforε:M=,M=,andR=M(whereβRMisnotdifferentiable)WhenM=,βRM=RLetf(R)=ε(,R)=−√√(R)=−R√Similarly,whenR=M,βRM=f(R)=ε(R,R)=−√√R()=−√RWhenM=weneedtoconsiderthetwocasesforβRM,givingf(R)=ε(,R)=(−R):R≥(−R):R<CheckingthecriticalpointstakestwocasesAssumingR≥M,solving∂ε∂M=givesthecriticalpointM=R,givingf(R)=ε(R,R)=−(R)√FASTINVERSESQUAREROOTWhenR<M,thecriticalpointisM=(R)ThiscanonlyhappenifR∈,),sodefinef(R)=:R≥ε((R),R)=−(R)√:R<Maximizing|ε|Similaranalysisonεyieldsf(R)=ε(,R)={−R:R≥(−R):R<f(R)=ε(,R)={−√R:R≥−(R)√:R<AtthepointR=M,γRM=,butthiscanonlyoccurwhenR≥,sowegetf(R)={ε(R−,R)=−√R:R≥:R<ThecriticalpointsagainneedtwocasesIfR≥MthecriticalpointisM=R−ThisonlyoccursifR≥Similarly,forR<MweobtainthecriticalpointM=R,whichrequiresR<Sowedefinebothatoncewithf(R)={ε(R−,R)=−((R)):R≥ε(R,R)=−√(R):R<NoteallfiarecontinuousexceptfCombiningErrorsSo,foragivenR,theabovefunctionsgivevalues,andthemaxoftheabsolutevaluesisthemaxerrorforthisRSowenowvaryR∈ItofindthebestoneRRFigureFigureCHRISLOMONTFigureisaMathematicaplotoftheerrorfunctions|fi(R)|ThemaxoftheseislowestnearR=FigureisacloserlookThenumericalsolverFindMinimuminMathematicafindsthecommonvalueofRthatminimizesmaxi|fi|,returningr=wherethismanydigitsareretainedsincethisconstantcanbeappliedtoanybitlengthoffloatingpointnumber:bitdoubles,bitfutureformats,etcSeetheconclusionsectionNoterisveryclosetotheoriginalvalueinthecodewhichisapproximatelyrcorrespondstotheintegerR=brc=xfAttachingR=xbe,shiftedtoxf,theoptimalconstantbecomesR=xffTestingWetesttheanalysisusingthefollowingfunctionthatonlycomputesthelinearapproximationstep(theNewtonstepisremoved!)floatInvSqrtLinear(floatx,intmagic){floatxhalf=f*xinti=*(int*)xgetbitsforfloatingvaluei=magic(i>>)givesinitialguessyx=*(float*)iconvertbitsbacktofloatreturnx}Weapplythefunctiontotheinitialconstant,theconstantderivedabove,andtheconstantxfa(thereasonwillbeexplainedbelow)Also,wetesteachconstantwiththeNewtonstepperformedanditerations(theaddedtimetorunthesecondNewtonstepwasverysmall,yettheaccuracywasgreatlyincreased)EachtestisoverallfloatingpointvaluesThemaximalrelativeerrorpercentagesareValuePredictedTestedIterationIterationsxfdfexffexfaeSotheanalysiswascorrectinpredictingthatthenewconstantwouldapproximatebetterinpracticeYetsurprisingly,afteroneNewtoniteration,ithasahighermaximalrelativeerrorWhichagainraisesthequestion:howwastheoriginalcodeconstantderivedThereasonthebetterapproximationturnedoutworsemustbeintheNewtoniterationThenewconstantisclearlybetterintheoryandFASTINVERSESQUAREROOTpracticethantheoriginalforinitialapproximation,butafterorNewtonstepstheoriginalconstantperformedbetterOutofcuriosity,IsearchednumericallyforabetterconstantTheresultsofthetheoryimpliedanygoodapproximationsshouldbenearthosefewabove,thuslimitingtherangeofthesearchStartingattheinitialconstant,andtestingallconstantsaboveandbelowuntilthemaximalrelativeerrorexceedsgivesthethirdconstantxfaasthebestoneeachwastestedoverallfloatingpointvaluesThetableshowsithasasmallermaximalrelativeerrorthantheoriginaloneSothefinalversionIproposeisfloatInvSqrt(floatx){floatxhalf=f*xinti=*(int*)xgetbitsforfloatingvaluei=xfa(i>>)givesinitialguessyx=*(float*)iconvertbitsbacktofloatx=x*(fxhalf*x*x)Newtonstep,repeatingincreasesaccuracyreturnx}Thustheinitialgoal)isreached,althoughnotasdirectlyasplannedGoal)remainsopen:)TheCcodeandMathematicacodeareavailableonlineConclusionandOpenQuestionsThisnoteexplainsthe“magic”behindtheconstantinthecode,andattemptstounderstanditinordertoimproveitifpossibleThenewconstantxfaappearstoperformslightlybetterthantheoriginaloneSincebothareapproximations,eitherworkswellinpracticeIwouldliketofindtheoriginalauthorifpossible,andseeifthemethodwasderivedorjustguessedandtestedTheutilityofthisnoteisexplaininghowtogoaboutderivingsuchmethodsforotherfunctionsandplatformsWiththereasoningmethodsabove,almostanysuchmethodcanbeanalyzedHowevertheanalysisshouldbethoroughlytestedonarealmachine,sincehardwareoftendoesnotplaywellwiththeoryTheabovederivationsarealmostsizeneutral,socanbeappliedtobitdouble,orevenlongerpackedtypestomimicSIMDinstructionswithouthardwaresupportTheanalysisallowsthismethodtobeextendedtomanyplatformsandfunctionsForexample,thedoubletypehasanbitexponent,biasedby,andabitmantissaAfterderivingywiththesameformCHRISLOMONTforthemantissaasabove,onlytheboundonRneedsreworked,andtherestoftheanalysiswillstillholdThesameconstantrcanusedforanybitsizeinthismannerHereR=,andthebestinitialapproximationisR=Rbr∗c=xfeecededaAquicktestshowstherelativeerrortobearoundfortheinitialapproximation,andafterNewtoniterationAfinalquestionistoanalyzewhythebestinitialapproximationtotheanswerisnotthebestafteroneNewtoniterationHomework:Afewproblemstoworkon:Deriveasimilarmethodforsqrt(x)Whichisbetter(fastermoreaccurate):aversionthatworksfordoubleorNew

用户评价(0)

关闭

新课改视野下建构高中语文教学实验成果报告(32KB)

抱歉,积分不足下载失败,请稍后再试!

提示

试读已结束,如需要继续阅读或者下载,敬请购买!

评分:

/12

VIP

在线
客服

免费
邮箱

爱问共享资料服务号

扫描关注领取更多福利