robustregression.pdf

Uploaded by 笨博士86, 2013-04-30

Robust Regression Modeling with STATA: lecture notes

Robert A. Yaffee, Ph.D.
Statistics, Social Science, and Mapping Group
Academic Computing Services
Office: Third Avenue, Level C
Email: yaffee@nyu.edu

What does Robust mean?

• Definitions differ in scope and content. In the most general construction, robust models are stable and reliable models.
• Strictly speaking: threats to stability and reliability include influential outliers. Influential outliers have played havoc with statistical estimation. Since then, many robust estimation techniques have been developed that resist the effects of such outliers.
  • SAS PROC ROBUSTREG deals with these
  • S-Plus robust library
  • In Stata: rreg, prais, and arima models
• Broadly speaking:
  • Heteroskedasticity: heteroskedasticity-consistent variance estimators. Stata: regress y x1 x2, robust
  • Nonnormal residuals:
    • Nonparametric regression models. Stata: qreg, rreg
    • Bootstrapped regression: bstrap, bsqreg

Outline

• Regression modeling preliminaries
  • Tests for misspecification
  • Outlier influence
  • Testing for normality
  • Testing for heteroskedasticity
  • Autocorrelation of residuals
• Robust techniques
  • Robust regression
  • Median or quantile regression
  • Regression with robust standard errors
  • Robust autoregression models
• Validation and cross-validation
  • Resampling
  • Sample splitting
• Comparison of STATA with S-PLUS and SAS

Preliminary Testing

Prior to linear regression modeling, use a matrix graph to confirm linearity of relationships:

    graph y x1 x2, matrix

(Figure: scatterplot matrix of y, x1, and x2.)

The independent variables appear to be linearly related with y. We try to keep the models simple. If the relationships are linear, we model them with linear models. If the relationships are nonlinear, we model them with nonlinear or nonparametric models.

Theory of Regression Analysis

What is linear regression analysis?

• Finding the relationship between a dependent and an independent variable.
• Graphically, this can be done with a simple Cartesian graph.

\[ Y = a + bx + e \]

The Multiple Regression Formula

\[ Y = a + bx + e \]

• Y is the dependent variable
• a is the intercept
• b is the regression coefficient
• x is the predictor variable

Graphical Decomposition of Effects

(Figure: scatterplot of Y against X with the fitted line $\hat{y} = a + bx$, showing for one observation $y_i$ the three components below.)

Decomposition of Effects

\[ y_i - \hat{y}_i = \text{error}, \qquad \hat{y}_i - \bar{y} = \text{regression effect}, \qquad y_i - \bar{y} = \text{total effect} \]

Derivation of the Intercept

\[ y_i = a + b x_i + e_i \]
\[ e_i = y_i - a - b x_i \]
\[ \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a - b \sum_{i=1}^{n} x_i \]

Because by definition $\sum_{i=1}^{n} e_i = 0$:

\[ 0 = \sum_{i=1}^{n} y_i - na - b \sum_{i=1}^{n} x_i \]
\[ na = \sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i \]
\[ a = \bar{y} - b\bar{x} \]

Derivation of the Regression Coefficient

Given $y_i = a + b x_i + e_i$, with x and y expressed as deviations from their means (so that the intercept drops out):

\[ e_i = y_i - b x_i \]
\[ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b x_i)^2 \]
\[ \frac{\partial \sum e_i^2}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - b x_i) = 0 \]
\[ \sum_{i=1}^{n} x_i y_i = b \sum_{i=1}^{n} x_i^2 \]
\[ b = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \]

• Recall that the formula for the correlation coefficient can be expressed as follows:

\[ r = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2 \, \sum_{i=1}^{n} y_i^2}}, \qquad \text{where } x_i \text{ denotes } x_i - \bar{x} \text{ and } y_i \text{ denotes } y_i - \bar{y} \]

from which it can be seen that the regression coefficient b is a function of r:

\[ b_j = r \cdot \frac{sd_y}{sd_x} \]

Extending the bivariate to the multivariate case

\[ \beta_{yx_1 \cdot x_2} = \frac{r_{yx_1} - r_{yx_2}\, r_{x_1 x_2}}{1 - r_{x_1 x_2}^2} \cdot \frac{sd_y}{sd_{x_1}} \]
\[ \beta_{yx_2 \cdot x_1} = \frac{r_{yx_2} - r_{yx_1}\, r_{x_1 x_2}}{1 - r_{x_1 x_2}^2} \cdot \frac{sd_y}{sd_{x_2}} \]

It is also easy to extend the bivariate intercept to the multivariate case as follows:

\[ a = \bar{Y} - b_1 \bar{x}_1 - b_2 \bar{x}_2 \]

Linear Multiple Regression

• Suppose that we have the following data set. (Data listing not preserved.)

Stata OLS regression model syntax

(Stata regression output not preserved.)

We now see that the significance levels reveal that x1 and x2 are both statistically significant. The R² and adjusted R² have not been significantly reduced, indicating that this model still fits well. Therefore, we leave the interaction term pruned from the model.

What are the assumptions of multiple linear regression analysis?

Regression modeling and the assumptions

• Linearity
• Homoskedasticity (no heteroskedasticity)
• No influential outliers in small samples
• No multicollinearity
• No autocorrelation of residuals
• Fixed independent variables (no measurement error)
• Normality of residuals

Testing the model for misspecification and robustness

• Linearity: matrix graphs, shown above
• Multicollinearity: vif
• Misspecification tests
  • heteroskedasticity tests: rvfplot, hettest
  • residual autocorrelation tests: corrgram
  • outlier detection: tabulation of standardized residuals
  • influence assessment
  • residual normality tests: sktest
• Specification tests (not covered in this lecture)

Misspecification tests

• We need to test the residuals for normality.
• We can save the residuals in STATA by issuing a command that creates them after we have run the regression command.
• The command to generate the residuals is:

    predict resid, residuals

Generation of the regression residuals

(Stata output not preserved.)

Generation of standardized residuals

    predict rstd, rstandard

Generation of studentized residuals

    predict rstud, rstudent

Testing the Residuals for Normality

• We use the skewness and kurtosis test for normality. (Stata's sktest is this test, not the Kolmogorov-Smirnov test, which is a separate command, ksmirnov.) The command for the test is:

    sktest resid

  This compares the skewness and kurtosis of the residuals with those of the theoretical normal distribution and combines the two comparisons in a chi-square test to determine whether there is a statistically significant difference. The null hypothesis is that there is no difference. When the probability is less than the chosen significance level, we must reject the null hypothesis and infer that the residuals are non-normally distributed.

Testing the Residuals for heteroskedasticity

• We may graph the standardized or studentized residuals against the predicted scores to obtain a graphical indication of heteroskedasticity.
• The Cook-Weisberg test is used to test the residuals for heteroskedasticity.

A Graphical test of heteroskedasticity

    rvfplot, border yline(0)

This displays any problematic patterns that might suggest heteroskedasticity, but it does not tell us which residuals are outliers.

Cook-Weisberg Test

\[ \operatorname{Var}(e_i) = \sigma^2 \exp(z_i t) \]

where $e_i$ = error in the regression model and $z_i = x_i\hat{\beta}$ or a variable list supplied by the user.

The test is whether $t = 0$. hettest estimates the model

\[ e_i^2 = \alpha + z_i t + \nu_i \]

and forms a score test from the model sum of squares:

\[ S = \frac{SS_{\text{model}}}{2}, \qquad H_0: S \sim \chi^2_{df = p} \]

where p = number of parameters.

Cook-Weisberg test syntax

• The command for this test is:

    hettest resid

An insignificant result indicates a lack of heteroskedasticity. That is, such a result indicates equal variance of the residuals along the predicted line. This condition is otherwise known as homoskedasticity.

Testing the residuals for Autocorrelation

• One can use the command dwstat after the regression to obtain the Durbin-Watson d statistic to test for first-order autocorrelation.
• There is a better way. Generate a casenum variable:

    gen casenum = _n

Create a time-dependent series

(Stata command and output not preserved.)

Run the Ljung-Box Q statistic, which tests previous lags for autocorrelation and partial autocorrelation

The STATA command is:

    corrgram resid

The significance of the AC (autocorrelation) and PAC (partial autocorrelation) is shown in the Prob column. None of these residuals has any significant autocorrelation.

One can run autoregression in the event of autocorrelation

This can be done with:

    newey y x1 x2 x3, lag(#) t(time)
    prais y x1 x2 x3

Outlier detection

• Outlier detection involves determining whether the residual (error = actual − predicted) is an extreme negative or positive value.
• We may plot the residuals against the fitted values, after running the regression, to determine which errors are large.
• The command syntax was already demonstrated with the graph above:

    rvfplot, border yline(0)

Create Standardized Residuals

• A standardized residual is a residual divided by its standard deviation:

\[ \text{resid}_{\text{standardized}} = \frac{y_i - \hat{y}_i}{s}, \qquad \text{where } s = \text{std dev of residuals} \]

Standardized residuals

    predict residstd, rstandard
    list residstd
    tabulate residstd

Limits of Standardized Residuals

If the standardized residuals exceed the chosen cutoff in absolute value, they are outliers. If the absolute values are all below the cutoff, as these are, then there are no outliers.

While outliers by themselves only distort mean prediction when the sample size is small enough, it is important to gauge the influence of outliers.

Outlier Influence

• Suppose we had a different data set with two outliers.
• We tabulate the standardized residuals and obtain the following output. (Output not preserved.)

(Figure: regression line Y = a + bx with two outlying points labeled a and b.)

Outlier a does not distort the regression line, but outlier b does. Outlier b has bad leverage and outlier a does not.

In this data set, we have two outliers. One is negative and the other is positive.

Studentized Residuals

• Alternatively, we could form studentized residuals. These are distributed as a t distribution with df = n − p − 1, though they are not quite independent. Therefore, we can approximately determine whether they are statistically significant.
• Belsley et al. recommended the use of studentized residuals.

Studentized Residual

\[ e_i^{s} = \frac{e_i}{\sqrt{s_{(i)}^2 \,(1 - h_i)}} \]

where
  $e_i^{s}$ = studentized residual
  $s_{(i)}$ = standard deviation with the i-th observation deleted
  $h_i$ = leverage statistic

These are useful in estimating the statistical significance of a particular observation, for which a dummy variable indicator is formed. The t value of the studentized residual indicates whether or not that observation is a significant outlier.

The command to generate studentized residuals, called rstudt, is:

    predict rstudt, rstudent

Influence of Outliers

• Leverage is measured by the diagonal components of the hat matrix.
• The hat matrix comes from the formula for the regression of Y:

\[ \hat{Y} = X\hat{\beta} = X(X'X)^{-1}X'Y, \qquad \text{where } X(X'X)^{-1}X' = H, \text{ the hat matrix} \]
\[ \text{Therefore } \hat{Y} = HY \]

Leverage and the Hat matrix

• The hat matrix transforms Y into the predicted scores.
• The diagonals of the hat matrix indicate which values will be outliers or not.
• The diagonals are therefore measures of leverage.
• Leverage is bounded by two limits: 1/n and 1. The closer the leverage is to unity, the more leverage the value has.
• The trace of the hat matrix = the number of variables in the model.
• When the leverage > 2p/n, there is high leverage according to Belsley et al., cited in Long, J. F., Modern Methods of Data Analysis. For smaller samples, Velleman and Welsch suggested 3p/n as the criterion.

Cook's D

• Another measure of influence.
• This is a popular one. The formula for it is:

\[ \text{Cook's } D_i = \frac{1}{p} \cdot \frac{h_i}{1 - h_i} \cdot \frac{e_i^2}{s^2 (1 - h_i)} \]

Cook and Weisberg suggested that values of D that exceed the 50th percentile of the F distribution (df = p, n − p) are large.

Using Cook's D in STATA

• Generate the statistic:

    predict cook, cooksd

• Find the influential outliers (n is the sample size):

    list cook if cook > 4/n

• Belsley suggests 4/(n − k − 1) as a cutoff.

Graphical Exploration of Outlier Influence

    graph cook residstd, xlab ylab

(Figure: Cook's D plotted against the standardized residuals.)

The two influential outliers can be found easily here in the upper right.

DFbeta

• One can use the DFbetas to ascertain the magnitude of influence that an observation has on a particular parameter estimate if that observation is deleted. DFbeta for coefficient j is the scaled difference $b_j - b_{(i)j}$ between the estimate with and without observation i; the scaling involves the leverage h and $u_j$, where $u_j$ = residuals of the regression of $x_j$ on the remaining x's.

Obtaining DFbetas in STATA

(Stata command and output not preserved.)

Robust statistical options when assumptions are violated

• Nonlinearity
  • Transformation to linearity
  • Nonlinear regression
• Influential outliers
  • Robust regression with robust weight functions: rreg y x1 x2
• Heteroskedasticity of residuals
  • Regression with Huber/White/sandwich variance-covariance estimators: regress y x1 x2, robust
• Residual autocorrelation correction
  • Autoregression with prais y x1 x2, robust
  • Newey-West regression
• Nonnormality of residuals
  • Quantile regression: qreg y x1 x2
  • Bootstrapping the regression coefficients

Nonlinearity: Transformations to linearity

• When the equation is not intrinsically nonlinear, the dependent variable or independent variable may be transformed to linearize the relationship.
• Semi-log, translog, Box-Cox, or power transformations may be used for these purposes.
• Box-Cox regression determines the optimal parameters for many of these transformations.

Fix for Nonlinear functional form: Nonlinear Regression Analysis

    nl exp2 y x    estimates  $Y = b_1 b_2^{x}$
    nl exp3 y x    estimates  $y = b_0 + b_1 b_2^{x}$

These are examples of two exponential growth curve models, the first of which we estimate with our data.

Nonlinear Regression in Stata

    nl exp2 y x

(Stata output: iteration log of residual SS, ANOVA table with Model, Residual, and Total rows, R-squared, adjusted R-squared, Root MSE, residual deviance, and the coefficient table for b1 and b2 of the 2-parameter exponential growth curve y = b1*b2^x. Numeric values not preserved.)

(SE's, P values, CI's, and correlations are asymptotic approximations)

Heteroskedasticity correction

• Prof. Halbert White showed that heteroskedasticity could be handled in a regression with a heteroskedasticity-consistent covariance matrix estimator (Davidson and MacKinnon, Estimation and Inference in Econometrics, Oxford University Press).
• This variance-covariance matrix under ordinary least squares is shown below.

OLS Covariance Matrix Estimator

\[ (X'X)^{-1}\,(X'\Sigma X)\,(X'X)^{-1} \]

where, under homoskedasticity, $\Sigma = s^2 I$ and the expression reduces to $s^2 (X'X)^{-1}$.

White's HC estimator

• White's estimator is for large samples.
• White's heteroskedasticity-corrected variances and standard errors can be larger or smaller than the OLS variances and standard errors.

Heteroskedasticity-consistent covariance matrix "Sandwich" estimator (H. White)

\[ \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1} \left(\frac{X'\hat{\Omega}X}{n}\right) \left(\frac{X'X}{n}\right)^{-1} = \underbrace{(X'X)^{-1}}_{\text{Bread}} \; \underbrace{X'\hat{\Omega}X}_{\text{Meat (tofu)}} \; \underbrace{(X'X)^{-1}}_{\text{Bread}} \]

where $\hat{\Omega}$ is diagonal with elements $\hat{\omega}_t$. However, there are different versions:

\[ HC_0: \hat{\omega}_t = e_t^2 \qquad HC_1: \hat{\omega}_t = \frac{n}{n-k}\, e_t^2 \qquad HC_2: \hat{\omega}_t = \frac{e_t^2}{1 - h_t} \qquad HC_3: \hat{\omega}_t = \frac{e_t^2}{(1 - h_t)^2} \]

Regression with robust standard errors for heteroskedasticity

    regress y x1 x2, robust

Options other than robust are hc2 and hc3, referring to the versions mentioned by Davidson and MacKinnon above.

Robust options for the VCV matrix in Stata

    regress y x1 x2, hc2
    regress y x1 x2, hc3

• These correspond to Davidson and MacKinnon's versions 2 and 3 of the heteroskedasticity-consistent VCV options.

Problems with Autoregressive Errors

• Problems in estimation with OLS arise when there is first-order autocorrelation of the residuals:

\[ e_t = \rho e_{t-1} + v_t \]

Effect on the Variance

\[ e_t = \rho e_{t-1} + v_t \]

Sources of Autocorrelation

• Lagged endogenous variables
• Misspecification of the model
• Simultaneity, feedback, or reciprocal relationships
• Seasonality or trend in the model

Prais-Winsten Transformation

With L the lag operator:

\[ e_t = \rho e_{t-1} + v_t \;\Rightarrow\; (1 - \rho L)\, e_t = v_t, \quad \text{therefore } e_t = \frac{v_t}{1 - \rho L} \]
\[ Y_t = a + b x_t + \frac{v_t}{1 - \rho L} \]

It follows that

\[ (1 - \rho L)\, Y_t = (1 - \rho L)\, a + (1 - \rho L)\, b x_t + v_t \]
\[ Y_t^{*} = a^{*} + b x_t^{*} + v_t \]

Autocorrelation of the residuals: prais and newey regression

To test whether the variable is autocorrelated:

    tsset time
    corrgram y
    prais y x1 x2, robust
    newey y x1 x2, lag(#) t(time)

Testing for autocorrelation of residuals

    regress mnalsumprc
    predict resid, residual
    corrgram resid

Prais-Winsten Regression for AR(1) errors

Using the robust option here guarantees that the White heteroskedasticity-consistent sandwich variance-covariance estimator will be used in the autoregression procedure.

Newey-West Robust Standard errors

• Newey and West add an autocorrelation correction to the meat (or tofu) of the White sandwich estimator given above.

Central Part of Newey-West Sandwich estimator

\[ X'\hat{\Omega}X_{\text{Newey-West}} = X'\hat{\Omega}X_{\text{White}} + \frac{n}{n-k} \sum_{l=1}^{m} \left(1 - \frac{l}{m+1}\right) \sum_{i=l+1}^{n} e_i e_{i-l} \left( x_i' x_{i-l} + x_{i-l}' x_i \right) \]

where k = number of predictors, l = time lag, and m = maximum time lag.

Newey-West standard errors are robust to autocorrelation and heteroskedasticity in time series regression models.

Assume OLS regression

• We regress y on x1 x2 x3.
• We obtain the following output. (Output not preserved.)

Next we examine the residuals.

Residual Assessment

(Output not preserved.)

The data set is too small to drop the outlying case, so I use robust regression.

Robust regression algorithm: rreg

• A regression is performed and absolute residuals are computed. These residuals are computed and scaled:

\[ r_i = |y_i - x_i b|, \qquad u_i = \frac{r_i}{s} = \frac{y_i - x_i b}{s} \]

Scaling the residuals

\[ s = \frac{M}{0.6745}, \qquad \text{where } M = \operatorname{med}\left(|r_i - \operatorname{med}(r_i)|\right) \]

The residuals are scaled by the median absolute deviation of the residuals from the median residual.

Essential Algorithm

• The estimator of the parameter b minimizes the sum of a less rapidly increasing function of the residuals (SAS Institute, The Robustreg Procedure, draft copy, forthcoming):

\[ Q(b) = \sum_{i=1}^{n} \rho\!\left(\frac{r_i}{\sigma}\right), \qquad \text{where } r_i = y_i - x_i b \text{ and } \sigma \text{ is estimated by } s \]

Essential algorithm cont'd

• If this were OLS, ρ would be a quadratic function.
• If we can ascertain s, we can, by taking the derivatives with respect to b, find a first-order solution to

\[ \sum_{i=1}^{n} \psi\!\left(\frac{r_i}{s}\right) x_{ij} = 0, \quad j = 1, \ldots, p, \qquad \text{where } \psi = \rho' \]

Case weights are developed from weight functions

• Case weights are formed based on those residuals.
• Weight functions for those case weights are first the Huber weights and then the Tukey bisquare weights.
• A weighted regression is rerun with the case weights.

Iteratively reweighted least squares

• The case weight w(x) is defined as:

\[ w(x) = \frac{\psi(x)}{x} \]

• It is updated at each iteration until it converges on a value and the change from iteration to iteration declines below a criterion.

Weight functions for reducing outlier influence

c is the tuning constant used in determining the case weights. For the Huber weights, c = 1.345 by default.

Weight Functions

• Tukey biweight (bisquare)

C is also the biweight tuning constant. C is set at 4.685 for the biweight.
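The reweighting loop described in these notes can be sketched outside Stata. The following is a minimal illustration in plain Python, not a reproduction of rreg: it handles a single predictor, starts from OLS, runs Huber iterations, and then runs biweight iterations from the Huber fit. The function names, the toy data, the convergence tolerance, and the tuning constants 1.345 and 4.685 (the conventional values) are assumptions made for this example. The scale is the median absolute deviation divided by 0.6745, and any further steps that rreg may apply are omitted.

```python
from statistics import median

HUBER_C = 1.345      # assumed: conventional Huber tuning constant
BIWEIGHT_C = 4.685   # assumed: conventional biweight tuning constant


def huber_weight(u, c=HUBER_C):
    """Huber case weight: 1 for |u| <= c, otherwise c/|u|."""
    au = abs(u)
    return 1.0 if au <= c else c / au


def biweight(u, c=BIWEIGHT_C):
    """Tukey biweight case weight: (1 - (u/c)^2)^2 for |u| < c, otherwise 0."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2


def mad_scale(resid):
    """Scale s = M / 0.6745, with M the median absolute deviation from the median."""
    med = median(resid)
    return median([abs(r - med) for r in resid]) / 0.6745


def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


def irls(x, y, weight_fn, start, tol=1e-8, max_iter=200):
    """Iteratively reweighted least squares from the starting fit (a, b)."""
    a, b = start
    w = [1.0] * len(x)
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        s = mad_scale(resid)
        if s == 0.0:                      # degenerate scale: stop reweighting
            break
        w = [weight_fn(r / s) for r in resid]
        a_new, b_new = wls_line(x, y, w)
        change = abs(a_new - a) + abs(b_new - b)
        a, b = a_new, b_new
        if change < tol:                  # estimates have stopped moving
            break
    return a, b, w


# Toy data (hypothetical): y = 2 + 3x plus small errors, last point an outlier.
x = [float(i) for i in range(1, 11)]
errors = [0.5, -0.5, 0.3, -0.3, 0.4, -0.4, 0.2, -0.2, 0.1, 30.0]
y = [2.0 + 3.0 * xi + ei for xi, ei in zip(x, errors)]

a_ols, b_ols = wls_line(x, y, [1.0] * len(x))             # unweighted OLS
a_hub, b_hub, w_hub = irls(x, y, huber_weight, (a_ols, b_ols))
a_bi, b_bi, w_bi = irls(x, y, biweight, (a_hub, b_hub))

print("OLS slope:     ", round(b_ols, 3))
print("Huber slope:   ", round(b_hub, 3), " outlier weight:", round(w_hub[-1], 3))
print("Biweight slope:", round(b_bi, 3), " outlier weight:", round(w_bi[-1], 3))
```

In this sketch the Huber stage only shrinks the outlier's weight, while the biweight stage can set it to zero, which is one reason for running the Huber weights first and the biweight second: the biweight needs a reasonable starting fit.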
