mdcc.pdf


Uploaded by: hushu999, 2012-04-02


MDCC: Multi-Data Center Consistency

Tim Kraska, Gene Pang, Michael J. Franklin, Samuel Madden
UC Berkeley, MIT
{kraska, gpang, franklin}@cs.berkeley.edu, madden@csail.mit.edu

ABSTRACT
Replicating data across multiple data centers has two benefits. It moves the data closer to the user, which reduces latency for applications. It also increases availability in the event of a data center failure. It is therefore not surprising that companies like Google, Yahoo, and Netflix already replicate user data across geographically different regions.

However, replication across data centers is expensive. Inter-data-center network delays are in the hundreds of milliseconds and vary significantly. Synchronous wide-area replication with strong consistency is therefore considered infeasible. Current solutions take one of three routes: they settle for asynchronous replication, which implies the risk of losing data in the event of failures; they restrict consistency to small partitions; or they give up consistency entirely.

With MDCC (Multi-Data Center Consistency), we describe the first optimistic commit protocol that requires neither a master nor partitioning and is strongly consistent at a cost similar to eventually consistent protocols. MDCC can commit transactions in a single round trip across data centers in the normal operational case. We further propose a new programming model that empowers the application developer to handle the longer and unpredictable latencies caused by inter-data-center communication.

We evaluated MDCC using the TPC-W benchmark, with MDCC deployed across geographically diverse data centers. MDCC achieves throughput and latency similar to eventually consistent quorum protocols. It can also sustain a data center outage without a significant impact on response times, while still guaranteeing strong consistency.

INTRODUCTION
Tolerance to the outage of a single data center is now considered essential for many online services. Achieving this in a database system requires replicating data across multiple data centers and keeping those replicas synchronized and consistent. For example, Google's email service Gmail synchronously replicates across five data centers to sustain two data center outages: one planned and one unplanned.

Replication across data centers, however, is expensive. Inter-data-center network delays are
in the hundreds of milliseconds and vary significantly, as shown in Figure 1 for message delays between different Amazon regions. Traditional commit protocols for ensuring transactional consistency in distributed databases, such as two-phase commit (2PC), were not designed for such long and highly variable latencies as occur in wide-area networks.

First, existing protocols are pessimistic. They must prepare every resource involved in a transaction, essentially by acquiring a lock on it. This causes an additional message delay and blocks the resource for other concurrent transactions. These locks are typically held for at least two network round-trip times. As Figure 1 shows, this can often add several hundred milliseconds of latency. Such additional latency can significantly impact the usability of websites. For example, […] shows that an additional … milliseconds of latency can result in a significant drop in user satisfaction and in "abandonment" of websites.

[Figure 1: Round-trip response times between various regions on Amazon's EC2 cluster (round trip in ms over dates in June; series West-EU, East-EU, West-Tokyo, East-Tokyo).]

Second, these protocols (2PC in particular) rely on a single coordinator to determine the outcome of a transaction, which makes them non-resilient to coordinator failure. In particular, they require locks to be held until the coordinator recovers from a failure. This renders such protocols almost unusable in the wide area: failures will sometimes occur, and it is not practical to stall transactions until recovery completes. Databases have therefore traditionally taken one of two routes. The first is asynchronous replication from a master to one or more replicas (e.g., log shipping). The second is to forfeit consistency entirely and use an eventually consistent protocol. In the event of a failure, both approaches may lose committed transactions, become unavailable, or violate consistency.

[arXiv … [cs.DB] … Mar …]

In this paper, we describe MDCC (short for "Multi-Data Center Consistency"). It is the first optimistic commit protocol that avoids reliance on a central coordinator and provides strong consistency at a cost similar to eventually consistent protocols. Specifically, MDCC requires only a single wide-area message round trip to commit a transaction in the common case. It is also "master-less", meaning it can apply reads or updates from any node in any data center. As with 2PC, the MDCC commit protocol can be combined with different read guarantees. In its default configuration, it guarantees read-committed consistency without lost updates by detecting all write-write conflicts. On the TPC-W benchmark deployed across five Amazon data centers, MDCC reduces per-transaction latencies by … (to … ms) compared to 2PC, with transaction throughputs that are twice as high.

In addition to the optimistic commit protocol, MDCC provides a novel programming model. It is designed to handle the high variance in round-trip latencies in the wide area, which can leave a small fraction of messages highly delayed (see Figure 1). The programming model is service-level-objective (SLO) aware. It lets programmers give users different responses depending on whether an operation (e.g., a "purchase" in a web store) has committed or is still pending after some time (i.e., the SLO). Specifically, we expose the different stages of a transaction through a callback mechanism for the application developer. This allows developers to write user-facing applications without sacrificing the user experience when messages are highly delayed.

MDCC is not the only system that performs wide-area replication. It is, however, the only one that combines low latency (through single-message commits) with strong consistency, without requiring a master or a significant database redesign (e.g., the use of static partitions, as in Megastore). It is the first protocol to use Generalized Paxos as a commit protocol, combining it with techniques from the database community (escrow transactions and demarcation). At a high level, the protocol achieves single-message commits by (1) ensuring that every commit has been received by a quorum of replicas, and (2) piggybacking notification of commit state on subsequent transactions. Creating a "master-less" approach required addressing a number of subtleties, including support for commutative updates with value constraints and the handling of conflicts between concurrent transactions.

In summary, the key contributions of MDCC are:
• Our new optimistic commit protocol, which achieves wide-area transactional consistency while requiring only one network round trip in the common case.
• A novel programming model that allows programmers to give users feedback about the state of their transactions when there are network- or conflict-related commit delays.
• Performance results based on the TPC-W benchmark showing that MDCC provides strong consistency at costs similar to eventually consistent protocols. We further show the effects of MDCC's optimizations for the normal operational case, and the performance impact during a simulated data center failure.

The remainder of this paper is organized as follows. In Section 2 we show the overall architecture of MDCC. Section 3 presents the new programming model, which helps developers handle longer and unpredictable network delays between data centers. In Section 4 we describe MDCC's new optimistic commit protocol for the wide-area network. Section 5 discusses MDCC's read consistency guarantees. In Section 6 we present our experiments using MDCC across data centers. Finally, we describe related work in Section 7 and conclude in Section 8.

ARCHITECTURE OVERVIEW
MDCC uses a library-centric approach similar to the architectures of DBS or Megastore (as shown in Figure 2). This architecture separates out the stateful component of a database system as a distributed record manager. All higher-level functionality (such as query processing and/or transaction management) is provided through a stateless DB library, which can be deployed at the application server. As a result, the storage node, the only stateful component of the architecture, is significantly simplified and scales through standard techniques such as hash partitioning. All higher layers of the database can be replicated freely with the application tier because they are stateless.

Every storage node is responsible for one or more horizontal partitions of the data, and partitions are completely transparent to the user. MDCC places storage nodes in different data centers, which are usually geographically distributed. Although it is not required, we assume for the remainder of the paper that every data center contains a full replica of the data, while the data inside a single data center is partitioned across machines.

The DB library provides MDCC's programming model for transactions. It is mainly responsible for coordinating the replication and consistency of the data throughout the system, which it does by implementing MDCC's commit protocol. The DB library can either talk directly to the storage servers (red dotted arrows in Figure 2) or choose a storage node to act on its behalf and coordinate the transaction (black arrows in Figure 2). The result is a very flexible architecture in which storage nodes or application servers can act as the master for a record, depending on the situation (see Section 4).

[Figure 2: MDCC architecture (labels: Data Center I-IV, Application Servers, Storage Servers, Master Servers).]

In the remainder of this paper, we concentrate on the transaction programming model and the MDCC commit protocol. Other parts of the system, such as load balancing or the storage node design, are beyond the scope of this paper; the interested reader is referred to […].

THE MDCC PROGRAMMING MODEL
Because MDCC storage nodes are in different data centers, transactions will need to access data from multiple data centers. Long and highly variable round-trip latencies between data centers can result in significant transaction execution latencies. To help developers cope with these long latencies, MDCC provides a new programming model. It lets developers specify callbacks that are executed depending on the phase a transaction has reached.

State of the Art
Current transaction programming models, such as JDBC or Hibernate, provide little or no support for meeting response-time goals. Existing programming models implement a "fire and hope" paradigm: once the transaction is executed, the user can only hope that it will finish within the desired time frame. If the transaction does not return before the application's response-time limit, its outcome is entirely unknown. In such cases, most applications display a vague message about a server timeout.

Listing 1 shows a "fire and hope" transaction with Hibernate, with the timeout set to … milliseconds. Within … ms the transaction either returns with the outcome stored in the variable success, or an exception is thrown. If an exception is thrown, the outcome of the transaction is completely unknown.

Listing 1: Hibernate Transaction
```java
Session sess = factory.openSession();
Transaction tx = null;
boolean success = false;
try {
    tx = sess.getTransaction();
    tx.setTimeout(…);  // must be set before begin() to apply to this transaction
    tx.begin();
    // The transaction operations
    tx.commit();       // commit() returns void; reaching the next line means it succeeded
    success = true;
} catch (RuntimeException e) {
    // Timeout or failure: the transaction's outcome is unknown here
} finally {
    sess.close();
}
```

Essentially, developers have two options for recovering from this unknown state. They can periodically poll the database to check whether the updates were written, or they can "hack" the database to get access to the persistent log. The first option is almost impossible to implement, because it is often infeasible to distinguish an application's own changes from those of other concurrent transactions. The second option requires detailed knowledge of the database system's internals and is especially hard in a distributed database system with no centralized log.

Language
MDCC addresses these shortcomings in two ways. It makes service-level objectives (SLOs) an explicit part of the programming model. It also lets applications register callbacks, which are executed depending on the outcome of a transaction. Furthermore, because SLOs are explicit, the system can take advantage of this information and optimize transaction execution accordingly. Listing 2 shows a full example of an MDCC transaction in the Scala programming language. The details of this listing are explained in the remainder of this section.

Listing 2: Full example in Scala
```scala
val t = new Tx(/* SLO timeout in ms */) ({
  // Transaction operations, with get() and put() requests, or PIQL queries
}) onFailure {
  // Error handling code
} onAccept {
  // Show pending status page
} onCommit (success => {
  if (success) { /* Show success page */ } else { /* Show failure page */ }
}) finally ((success, timeout) => {
  // Callback: update status via AJAX
}) finallyRemote ((success, timeout) => {
  // Callback: update status via email
})
val status = t.Execute()
```

Data Model, Query Language & Consistency
In general, MDCC's programming model can be used with different data models, query languages, and consistency guarantees, just as JDBC can be used with SQL or XQuery if the database supports it. Our current implementation provides a simple object-relational data model:
• Objects (i.e., tuples) are organized into tables, with the class definition as the schema.
• Every class may have primitive as well as complex types as attributes.
• One or more attributes must be declared as the primary key of the class.
• The developer may annotate attributes with domain constraints, e.g., the stock attribute must not fall below ….
• Attributes can be declared to support commutative updates. Such attributes can then be modified with increment and decrement methods, in addition to the usual getter and setter methods.

MDCC provides a key-value API to retrieve and store objects inside a table, such as get(key), put(object), and getRange(startKey, endKey). Secondary indexes can be created and probed through a similar interface (i.e., indexName.get(secondaryKey)). MDCC also supports a higher-level, declarative, SQL-like query language called PIQL. PIQL queries compile down to the put, get, and getRange operations mentioned above and can be statically analyzed for their SLO compliance (see […]). For more details on the data and query support, the interested reader is referred to […].

Our implementation allows queries and updates to be interleaved with the application logic. However, durability and transactional guarantees are provided only for the table operations (i.e., put and get). As in most database programs, changes to application variables that are not stored in the database have three properties: they are immediately visible to concurrent transactions, they are not persistent, and they are not undone if the transaction fails. As the transaction block executes, it builds up a read set and a write set for the transaction. All writes are postponed until the end of the transaction.

MDCC's default consistency guarantee is read committed without the lost-update problem. That is, MDCC guarantees that only committed values are read and prevents all write-write conflicts. Although writes are postponed until the end of the transaction, a transaction can still read its own writes through a record cache. Note that the default consistency level in most commercial database systems is still the weaker form of read committed, which can lose updates (i.e., overwrite a value written by another transaction). Section 5 discusses the other consistency guarantees in more detail, and how to achieve higher levels of consistency.

SLO and Stages
In the MDCC programming model, developers must explicitly specify a response-time SLO for every transaction. The SLO is a timeout parameter for the transaction: the MDCC system guarantees that execution will return to the application within the specified SLO time. In the example in Listing 2, the timeout value is … milliseconds. The programming model thus forces the developer to explicitly consider which response times are acceptable for the end user.

The developer can specify blocks of code for three stages of the transaction: onFailure, onAccept, and onCommit. The transaction runs and attempts to finish as many stages as possible within the specified timeout. When the timeout expires, the code for the latest stage reached is run:
• If nothing is known about the transaction at the timeout, the onFailure code is executed.
• If the transaction has been accepted and is still being executed by the database system, but its commit success is unknown, the onAccept code is executed.
• If the transaction finishes before the timeout, the onCommit code is executed.
In Listing 3, the transaction defines all three stages (onFailure, onAccept, and onCommit) to handle every possible state of the transaction when the timeout occurs.

Listing 3: Web shop example
```scala
// quantity: hypothetical name for the ordered amount
val t = new Tx(/* SLO timeout in ms */) ({
  var order = new Order(cust.key, date)
  orders.put(order)
  var product = products.get("Product")
  var orderline = new OrderLine(product.id, quantity)
  orderlines.put(orderline)
  product.stock -= quantity
  products.put(product)
}) onFailure {
  // Show error message
} onAccept {
  // Show page: "Thanks for your order!"
} onCommit (success => {
  if (success) { /* Show success page */ } else { /* Show "order not successful" page */ }
}) finally ((success, timeout) => {
  if (!timeout) { /* Update via AJAX */ }
}) finallyRemote ((success, timeout) => {
  // Email user the status
})
```

Both onAccept and onCommit are optional. Some applications only need to know that the transaction was accepted, even if it has not yet committed. In that case, the developer only supplies onAccept. The transaction will never be lost and will eventually execute, but the application continues without waiting for the commit/abort notification. Other applications only need to know that the transaction completed. In that case, the developer only supplies onCommit, and the transaction waits for the commit before returning to the application. If the transaction does not commit before the timeout, the onFailure code is executed instead.

In addition, there are two special callback stages, finally and finallyRemote. They are executed after the transaction has completely executed, which may be after the developer-specified timeout. Their purpose is to inform the application of the transaction's final outcome. We describe these clauses next.

Finally and FinallyRemote
Both finally and finallyRemote are special callback stages used to notify the application of the transaction's actual commit decision. They differ from the other stages in that they are not restricted to the specified timeout. They always run after the transaction commits, which may be after the timeout. In contrast to the finally code …
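The introduction argues that 2PC holds locks for at least two round trips, while MDCC commits in a single wide-area round trip in the common case. A rough latency model makes the gap concrete. The sketch assumes that each 2PC round waits for its slowest participant, and that a one-round quorum commit is decided once the q fastest replicas answer. It ignores processing time, lock contention, and failures. `CommitLatencySketch` and its methods are hypothetical names, not part of MDCC.

```scala
object CommitLatencySketch {
  // Rough 2PC model (assumption): a prepare round plus a commit round,
  // each of which waits for the slowest participant to answer.
  def twoPhaseCommitMs(rttsMs: Seq[Int]): Int = {
    require(rttsMs.nonEmpty, "need at least one participant")
    2 * rttsMs.max
  }

  // Rough single-round quorum model (assumption): the outcome is decided as soon as
  // the q fastest replicas have answered, so slow outliers beyond the quorum are hidden.
  def quorumCommitMs(rttsMs: Seq[Int], q: Int): Int = {
    require(q >= 1 && q <= rttsMs.size, "quorum size must fit the replica count")
    rttsMs.sorted.apply(q - 1)
  }
}
```

Take made-up round-trip times of 1, 80, 90, 160, and 250 ms from one data center to five replicas. The 2PC model gives 500 ms, while a majority quorum of 3 gives 90 ms. The quorum figure also shows why highly delayed individual messages (the tail visible in Figure 1) matter less when only a quorum must answer.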
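The text says MDCC detects all write-write conflicts and achieves single-message commits by ensuring that every commit is received by a quorum of replicas. The deliberately simplified sketch below illustrates only that idea; it is not MDCC's Generalized Paxos protocol. There are no ballots, no fast/classic quorum distinction, and no failure handling. The outcome notification is applied directly rather than piggybacked on later transactions. A replica accepts an update only if the record version the writer read is still current and no other update is pending. A second writer that read the same version is therefore rejected. `Replica`, `Update`, `accept`, `learn`, and `commit` are hypothetical names.

```scala
object QuorumCommitSketch {
  // A write to one record, tagged with the record version the transaction read.
  final case class Update(key: String, readVersion: Int, newValue: String)

  // Hypothetical replica, one per data center.
  final class Replica {
    private var committed = Map.empty[String, (Int, String)] // key -> (version, value)
    private var pending = Map.empty[String, Update]          // at most one open update per key

    def version(key: String): Int = committed.get(key).map(_._1).getOrElse(0)
    def value(key: String): Option[String] = committed.get(key).map(_._2)

    // Accept only if the read version is still current and nothing else is pending;
    // otherwise this is a write-write conflict and the update is rejected.
    def accept(u: Update): Boolean =
      if (version(u.key) == u.readVersion && !pending.contains(u.key)) {
        pending = pending.updated(u.key, u)
        true
      } else false

    // Outcome notification: apply the pending update if it committed, drop it otherwise.
    def learn(u: Update, committedOutcome: Boolean): Unit =
      if (pending.get(u.key).contains(u)) {
        pending = pending - u.key
        if (committedOutcome) committed = committed.updated(u.key, (u.readVersion + 1, u.newValue))
      }
  }

  def majority(n: Int): Int = n / 2 + 1

  // One round of accept messages decides the outcome; a majority must accept.
  def commit(replicas: Seq[Replica], u: Update): Boolean = {
    val accepted = replicas.filter(_.accept(u))
    val ok = accepted.size >= majority(replicas.size)
    accepted.foreach(_.learn(u, ok))
    ok
  }
}
```

With five hypothetical data centers, a first writer that read version 0 commits everywhere. A second writer that also read version 0 is rejected by every replica, so the first value survives and no update is lost.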
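The paper combines Generalized Paxos with escrow and demarcation so that commutative updates, such as decrementing a stock attribute, can proceed without violating a domain constraint like a lower bound on stock. The sketch below shows the classic demarcation idea in its simplest form; MDCC's own quorum-based rule is not reproduced here. The slack above the lower bound is split across replicas, and each replica accepts a decrement only while its local share covers it. Because the shares sum to the slack, the global value can never drop below the bound. The even split and the names `EscrowReplica` and `split` are assumptions for illustration.

```scala
object DemarcationSketch {
  // Hypothetical per-replica escrow for one commutative attribute (e.g. stock).
  final class EscrowReplica(private var share: Long) {
    // Accept a decrement only if this replica's local share of the slack covers it.
    def tryDecrement(amount: Long): Boolean =
      if (amount >= 0 && amount <= share) { share -= amount; true } else false

    // Increments only add slack, so they can always be accepted locally.
    def increment(amount: Long): Unit = { require(amount >= 0); share += amount }

    def remaining: Long = share
  }

  // Split the slack (value - lowerBound) across n replicas; the first replicas get the remainder.
  def split(value: Long, lowerBound: Long, n: Int): Seq[EscrowReplica] = {
    require(n > 0, "need at least one replica")
    val slack = math.max(0L, value - lowerBound)
    (0 until n).map { i =>
      val extra = if (i < slack % n) 1L else 0L
      new EscrowReplica(slack / n + extra)
    }
  }
}
```

For example, a stock of 12 with lower bound 0 over five replicas yields shares 3, 3, 2, 2, 2. Once one replica has used its share, it must reject further decrements until shares are redistributed. That redistribution step is the part a real protocol has to coordinate and this sketch omits.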
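The "fire and hope" problem described for the Hibernate listing can be reproduced with nothing more than Scala's standard futures. A bounded wait that times out tells the caller nothing about whether the underlying work eventually succeeds. In the sketch, the commit is simulated by a `Promise`, not a database call, and `commitWithTimeout` is a hypothetical helper.

```scala
import scala.concurrent.{Await, Future, Promise, TimeoutException}
import scala.concurrent.duration._

object FireAndHopeSketch {
  // Waits at most `timeout` for a commit result. Some(outcome) if it arrived in time;
  // None means "outcome unknown": the commit may still succeed or fail later.
  def commitWithTimeout(commit: Future[Boolean], timeout: FiniteDuration): Option[Boolean] =
    try Some(Await.result(commit, timeout))
    catch { case _: TimeoutException => None }
}
```

If the simulated commit completes only after the wait has expired, the caller has already reported "unknown" even though the transaction succeeded. MDCC's onAccept, onCommit, and finally callbacks are meant to close exactly this gap.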
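The data model section describes a key-value API (get(key), put(object), getRange(startKey, endKey)) and a transaction that collects a read set, postpones all writes to the end, and serves reads of its own writes from a record cache. The in-memory sketch below mirrors that behavior. `Table` and `TxBuffer` are hypothetical stand-ins that only borrow the method names from the text; this is not the MDCC library. The end-exclusive range and the plain local `commit()` (instead of the wide-area protocol) are assumptions.

```scala
import scala.collection.immutable.TreeMap

object TxBufferSketch {
  // Hypothetical in-memory table keyed by primary key, kept sorted so range scans work.
  final class Table[V] {
    private var rows = TreeMap.empty[String, V]
    def get(key: String): Option[V] = rows.get(key)
    def put(key: String, value: V): Unit = { rows = rows.updated(key, value) }
    // Start-inclusive, end-exclusive range scan (an assumption about getRange semantics).
    def getRange(startKey: String, endKey: String): Seq[(String, V)] =
      rows.range(startKey, endKey).toSeq
  }

  // Tracks the read set and buffers writes until the end of the transaction.
  final class TxBuffer[V](table: Table[V]) {
    private var writeSet = Map.empty[String, V]
    private var readSet = Set.empty[String]

    // Read-your-writes via the buffered write set (the "record cache");
    // otherwise read from the table and record the key in the read set.
    def get(key: String): Option[V] =
      writeSet.get(key).orElse { readSet += key; table.get(key) }

    // Writes are postponed: nothing reaches the table before commit().
    def put(key: String, value: V): Unit = { writeSet = writeSet.updated(key, value) }

    def reads: Set[String] = readSet
    def pendingWrites: Map[String, V] = writeSet

    // Stand-in for the commit protocol: apply all buffered writes at once.
    def commit(): Unit = writeSet.foreach { case (k, v) => table.put(k, v) }
  }
}
```

Inside the transaction, a put is visible to the transaction's own get but not to the table until `commit()` runs. This matches the paper's point that transactional guarantees cover only table operations, not ordinary application variables.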
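The rules in "SLO and Stages" for which block runs when the SLO timeout fires can be written as a small pure function. The stage names follow the text (onFailure, onAccept, onCommit). The `Progress` and `Stage` types and `stageAtTimeout` are hypothetical. The sketch covers only the timeout decision, not the later finally/finallyRemote callbacks. The text does not state what happens when the transaction has committed but only onAccept was supplied. The sketch assumes it falls back to onAccept.

```scala
object StageSketch {
  // What the system knows about the transaction when the SLO timeout fires.
  sealed trait Progress
  case object Unknown extends Progress
  case object Accepted extends Progress
  final case class Committed(success: Boolean) extends Progress

  // Which developer-supplied block runs at the timeout.
  sealed trait Stage
  case object OnFailure extends Stage
  case object OnAccept extends Stage
  final case class OnCommit(success: Boolean) extends Stage

  // Run the block for the latest stage reached that the developer actually supplied.
  def stageAtTimeout(p: Progress, hasOnAccept: Boolean, hasOnCommit: Boolean): Stage = p match {
    case Committed(s) if hasOnCommit => OnCommit(s)
    // Assumption: a committed transaction also counts as accepted when only onAccept exists.
    case Committed(_) | Accepted if hasOnAccept => OnAccept
    case _ => OnFailure
  }
}
```

For example, a transaction that has only been accepted when the timeout fires runs onFailure if the developer supplied only onCommit. This matches the text's rule that such a transaction waits for the commit and falls back to onFailure otherwise.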
