Gautam Kunapuli

A Bilevel Optimization Approach to Machine Learning

A Thesis Submitted to the Graduate Faculty of Rensselaer Polytechnic Institute in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Mathematics.

Approved by the Examining Committee:
Kristin P. Bennett, Thesis Adviser
Joseph G. Ecker, Member
John E. Mitchell, Member
Jong-Shi Pang, Member
Bulent Yener, Member

Rensselaer Polytechnic Institute, Troy, New York. February.

Acknowledgements

This work would not have been possible without the inestimable guidance of Kristin Bennett, my mentor and advisor. I would like to thank her, first and foremost, for her constant support and encouragement. I would like to thank Jong-Shi Pang, with whom I collaborated extensively on these projects, for his guidance and support. I would also like to thank John Mitchell, Joseph Ecker and Bulent Yener, the other members of my committee, for their contributions.

I would also like to thank my friends for everything; they were a great source of support during the good and not-so-good times, and were always there to the very end. Needless to say, they were also a source of the occasional, albeit warranted, distraction. In particular, I would like to acknowledge Guilherme de Oliveira, Ayala Cnaan, Juan Latorre, Maral Erol, Selma Sabanovic, Matt Francisco, Lee DeVille, Zack Galbreath, Tara Gallaway, Laurel Reilly-Raska, Jon Collis, Elizabeth Kusel, Charles Beckman, Daryn Ramsden, Nadia Sidarous, Josh Attenberg, John Morris, Marijana and Boris Pokorni, Rachel and Gary Dale, Siddharth Devarajan and Jairam Kalyanasundharam. To all my friends I say: thank you for all the good times.

It is an open secret among graduate students that the only part of a dissertation that is ever of interest is the acknowledgements. So, if you are reading this and realize that I have not adequately thanked you, you should probably call me up to remonstrate with me. If nothing else, it will give us a chance to catch up.

I would also like to acknowledge the encouragement of several members of my family, here and back home. I would especially like to thank my parents; their love and support made it all worthwhile. And finally, I dedicate this thesis to my brother, who has always believed in me and has been my greatest well-wisher.

Gautam Kunapuli
February

Abstract

A key step in many statistical learning methods used in machine learning involves solving a convex optimization problem containing one or more hyperparameters that must be selected by the user. While cross validation is a commonly employed and widely accepted method for selecting these parameters, its implementation by a grid search procedure in the parameter space effectively limits the desirable number of hyperparameters in a model, due to the combinatorial explosion of grid points in high dimensions. A novel paradigm based on bilevel optimization is proposed and gives rise to a unifying framework within which issues such as model selection can be addressed.

The machine learning problem is formulated as a bilevel program: a mathematical program that has constraints which are functions of the optimal solutions of another mathematical program, called the inner-level program. The bilevel program is transformed to an equivalent mathematical program with equilibrium constraints (MPEC). Two alternative bilevel optimization algorithms are developed to optimize the MPEC and provide a systematic search of the hyperparameters. In the first approach, the equilibrium constraints of the MPEC are relaxed to form a nonlinear program with a linear objective and nonconvex quadratic inequality constraints, which is then solved using a general-purpose nonlinear programming solver. In the second approach, the equilibrium constraints are treated as penalty terms in the objective, and the resulting nonconvex quadratic program with linear constraints is solved using a successive linearization algorithm.

The flexibility of the bilevel approach in dealing with multiple hyperparameters makes it a powerful approach to problems such as parameter and feature selection (model selection). In this thesis, three problems are studied: model selection for support vector (SV) classification, model selection for SV regression, and missing-value imputation for SV regression. Extensive computational results establish that both algorithmic approaches find solutions that generalize as well as or better than conventional approaches and are much more computationally efficient.
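The transformation described in the abstract can be sketched generically. The following is a minimal schematic under standard assumptions (a convex, suitably regular inner-level problem), with illustrative symbols $\Theta$ (outer objective), $\theta$ (inner objective), $g$ (inner constraints) and $\lambda$ (hyperparameters); it is not the thesis's exact formulation:
$$
\begin{aligned}
\min_{\lambda \in \Lambda,\; w} \quad & \Theta(\lambda, w) \\
\text{s.t.} \quad & w \in \operatorname*{arg\,min}_{w'} \;\{\, \theta(w'; \lambda) \;:\; g(w'; \lambda) \le 0 \,\}.
\end{aligned}
$$
Replacing the inner-level problem by its Karush-Kuhn-Tucker conditions yields the equilibrium-constrained form:
$$
\begin{aligned}
\min_{\lambda \in \Lambda,\; w,\; \mu} \quad & \Theta(\lambda, w) \\
\text{s.t.} \quad & \nabla_{w}\,\theta(w; \lambda) + \nabla_{w}\, g(w; \lambda)^{\top} \mu = 0, \\
& 0 \le \mu \;\perp\; -g(w; \lambda) \ge 0,
\end{aligned}
$$
where the complementarity condition is the source of the nonconvexity that the two algorithms (relaxation and penalization) are designed to handle.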
Contents

1. Introduction
   Background: Support Vector Machines
     The Statistical Nature of Learning
     Empirical Risk Minimization
     Structural Risk Minimization
     SVMs for Linear Classification
     Linear Regression
     Kernel Methods
   Background: Bilevel Optimization
     Stackelberg Games and Bilevel Programming
     Bilevel Programs and MPECs
     Stationarity Concepts for MPECs
     LPECs
     Optimization Methods for Bilevel Models
   The Need for New Methodologies
   A New Paradigm
   Organization of the Thesis
2. SV Classification and Inexact Cross Validation
   Introduction
   Model Formulation
     The inner-level problems
     The outer-level optimization
   Bilevel Classification Variations
     Outer-level objective
     Enhanced feature selection
   Inexact and Discretized Cross Validation
     Inexact cross validation
     Grid search
   Numerical Tests
     Experimental design
     Discussion
   Chapter Conclusions
3. SV Regression and Successive Linearization
   Introduction
   A Bilevel Support Vector Regression Model
   Bilevel Problems as MPECs
   Bilevel Optimization Methods
     A Relaxed NLP Reformulation
     Penalty Reformulation
   Successive Linearization Algorithm for Model Selection
     Early Stopping
   Grid Search
   Experimental Design
     Synthetic Data
     Real-world QSAR Data
     Postprocessing
   Computational Results: Synthetic Data
     Scalability Analysis
     Folds Analysis
   Computational Results: QSAR Data
   Chapter Conclusions
4. Missing Value Imputation via Bilevel Cross Validation
   Introduction
   Notation and model
   Bilevel Optimization Methods
     Inexact Imputation
     Penalty Formulation
     Bounding the Feasible Region
     Successive Linearization
   Other Imputation Approaches
   Numerical Experiments
     Experimental Design
     Computational Results and Discussion
   Chapter Conclusions
5. Conclusions and Further Applications
   Kernel Bilevel Cross Validation
     Applying the Kernel Trick
     Designing Unknown Kernels
     Solving the Kernel MPEC
   Semi-supervised Learning
     Semi-supervised Regression
     Semi-supervised Classification
   Incorporating Multitask Learning
References
Appendices
A. Code Fragments of AMPL Models
   A.1 Model Selection for SV Classification
   A.2 Model Selection for SV Regression
   A.3 Missing-value Imputation for SV Regression

List of Figures

  The effect of complexity on generalization
  Different errors in a machine learning task
  Three points in a plane shattered by a linear function
  Structural Risk Minimization
  Canonical Hyperplanes and Margins
  Effect of margin on complexity
  Transformation from 2-D to 3-D
  Overfitting by a Gaussian kernel
  The Stackelberg game and Bilevel Programs
  T-fold cross validation schematic
  Computational Results for Bilevel Classification: Folds Analysis
  Computational Results for Bilevel Regression: Scalability
  Computational Results for Bilevel Regression: Folds Analysis
  Convergence of SLA for bilevel imputation
  Computational Results for Bilevel Imputation: Synthetic Data
  Computational Results for Bilevel Imputation: AutoMPG
  Computational Results for Bilevel Imputation: Iris Data

List of Tables

  The Classification Data Sets
  Computational Results for Bilevel Classification
  The Chemoinformatics (QSAR) data sets
  Computational Results for Bilevel Regression: Synthetic data
  Computational Results for Bilevel Regression: Synthetic data
  Computational Results for Bilevel Regression: Synthetic data
  Computational Results for Bilevel Regression: QSAR data

1. Introduction

Machine learning is a truly multidisciplinary field that draws concepts from such diverse disciplines as artificial intelligence, computational complexity, control theory, cognitive science, philosophy, neuroscience and statistics. Machine learning deals with automatically building a model of a process or system through the analysis of existing patterns or data, such that the new model is able to predict the responses of the system on patterns that have yet to be observed. This ability of a learner to correctly predict outputs for unseen inputs is called generalization, and all machine learning methods seek to improve this performance.
Machine learning problems are known to be ill-posed in general because data set sizes are finite and sometimes small, non-uniformly sampled and high dimensional. In addition, there may be problems such as measurement errors, noise and missing values in the data. This situation is further exacerbated by limited parameter selection techniques and inaccurate estimates of generalization, two model selection strategies that most learning algorithms inevitably entail.

The late 1980s and early 1990s saw the emergence of an elegant and powerful machine learning paradigm, one that addressed many of the issues above, called Structural Risk Minimization (SRM). The principles of SRM give rise to powerful learners called Support Vector Machines (SVMs). SVMs are universal feedforward networks or, more specifically, linear learning machines that are able to perform different machine learning tasks. The first SVMs were introduced for binary pattern classification. Subsequent increased attention from the machine learning community led to several developments: the capability to handle nonlinear data sets via the "kernel trick", other machine learning tasks such as regression, clustering, novelty detection and semi-supervised learning, and SVM variants such as 1-norm and linear SVMs, and nu-SVMs. This progress was accompanied by the development of fast algorithms such as sequential minimal optimization (SMO) in the late 1990s or, more recently, interior-point methods for massive data sets and, for linear SVMs, cutting-plane algorithms and decomposition-based approaches. These advances mean that current state-of-the-art SVMs are capable of efficiently handling data sets of several thousands to millions of data points.

SVMs have been successfully applied to a wide variety of applications such as medicine, computational biology, finance, robotics, computer vision, image, object and handwriting recognition, and text processing, to name just a few, and have emerged as one of the preeminent machine learning techniques of our time. However, despite this success and popularity, several open questions, such as optimal model selection, still remain. The need for new methodologies to address these issues forms the primary motivation for this thesis.

Section 1.1 provides a brief background on the principles of structural risk minimization, SVMs for classification and regression, and their extension into nonlinear domains via kernels. Section 1.2 introduces the concepts of bilevel optimization and some common approaches to solving such problems. A reader familiar with these concepts could skip the discussion in Sections 1.1 and 1.2. Section 1.3 discusses several open questions and issues that motivate this thesis. Section 1.4 introduces a novel paradigm, bilevel programming for machine learning, that forms the framework for the various models discussed in the thesis, the layout of which is detailed in Section 1.5.

1.1 Background: Support Vector Machines

Consider that we are interested in modeling a regressive system of the form $y = f(x) + \varepsilon$, where the data point, $x$, and its label, $y$, are drawn from some unknown distribution $P(x, y)$. In general, the hypothesis function, $f(x)$, may depend on the labels $y$ and may be parameterized by $\alpha \in \Lambda$, and is variously written as $f(x, y)$, $f(x; \alpha)$ or $f(x, y; \alpha)$ as appropriate.

1.1.1 The Statistical Nature of Learning

The target function, $f$, which belongs to some target space, is deterministic, and the intrinsic system error, $\varepsilon$, is a random expectational error that represents our ignorance of the dependence of $y$ on $x$. Thus, $E[\varepsilon] = 0$ and $E[y] = E[f(x) + \varepsilon] = f(x)$. The goal is to choose a function, $\hat f$, in a hypothesis space of candidate functions that is as close to $f$ as possible with respect to some error measure.

Suppose that we have $\ell$ realizations of labelled data, $D = \{(x_i, y_i)\}_{i=1}^{\ell}$, that constitute the training sample. Using $D$, we train a function, $\hat y = \hat f(x)$, that minimizes the squared error. Thus, the expected predicted error, or risk, is
$$
R[\hat f] = E_D\big[(y - \hat y)^2\big] = E_D\big[(y - \hat f(x))^2\big].
$$
It is easy to show that
$$
R[\hat f] = \sigma^2 + \big(f(x) - E_D[\hat f(x)]\big)^2 + E_D\big[(\hat y - E_D[\hat y])^2\big],
$$
where the expected predicted error decomposes into three terms (see the figures below):

• the noise variance, $\sigma^2$, which is the noise that is inherent in the system. This term cannot be minimized by the learner as it is independent of the learning process. It is also referred to as the intrinsic error;

• the squared bias, $\big(f(x) - E_D[\hat f(x)]\big)^2$, which measures the difference between the true function and the average of the measured values at $x$, and represents the inability of the learner to accurately approximate the true function. This indicates that the choice of model was poor because the hypothesis space is too small to contain the true function. The bias is also called the approximation error;

• the estimation variance, $E_D\big[(\hat y - E_D[\hat y])^2\big]$, which measures the error in estimating the true labels and how sensitive the model is to random fluctuations in the data set. This is also called the estimation error.
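For completeness, the standard identity behind this three-way split can be written out with the definitions above (a textbook derivation, stated here rather than reproduced verbatim from the thesis):
$$
\begin{aligned}
E_D\big[(y - \hat f(x))^2\big]
&= E_D\big[(f(x) + \varepsilon - \hat f(x))^2\big] \\
&= \sigma^2 + \big(f(x) - E_D[\hat f(x)]\big)^2 + E_D\big[(\hat f(x) - E_D[\hat f(x)])^2\big],
\end{aligned}
$$
where the cross terms vanish because $\varepsilon$ has zero mean and is independent of the training sample $D$.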
To reduce the training error on the given data, the learner must minimize the estimation variance over the training set. To reduce the generalization error on future, unseen data, the learner must minimize the bias over the training set. Unfortunately, it is not possible to decrease one without increasing the other, as there is a natural trade-off between the bias and the variance. This leads to what is commonly referred to as the bias-variance dilemma, where the learner must minimize some combination of variance and bias.

To see this, consider the figure below, where a data set is fit by a linear function and by some high-degree polynomial function. If we restricted the choice of target functions to just linear functions, we would obtain linear fits for every data set and would have, consequently, introduced a bias into the learning process. However, if we expanded the set of target functions to include higher-degree polynomials, we would obtain highly nonlinear fits that are susceptible to small fluctuations in the data and, consequently, have high variance. Related to this notion of complexity is the capacity of the class of functions that constitute the hypothesis space. A hypothesis space with higher capacity yields overly complex models, which overfit the data, whereas smaller capacity gives overly simple models, which underfit the data.

[Figure: The effect of complexity on generalization.]
[Figure: Different errors that arise during the modeling of a typical machine learning task.]

1.1.2 Empirical Risk Minimization

This section and the next present a brief introduction to empirical and structural risk minimization, since most of the machine learning methods considered hereafter are based on these principles. A more detailed discussion of this theory may be found in the references.

We assume that the training data, $x \in \mathbb{R}^n$, are drawn identically and independently distributed (i.i.d.) from a distribution $P(x)$. The corresponding labels, $y$, are assumed to be drawn from a conditional distribution, $P(y \,|\, x)$. The machine learning aim is to find a function, $f(x, \alpha)$, parameterized by $\alpha \in \Lambda$, that best approximates (predicts) $y$. This is achieved by minimizing the loss between the given and predicted labels, i.e., minimizing the expected risk functional
$$
R(\alpha) = \int L\big(y, f(x, \alpha)\big) \, dP(x, y),
$$
with $P(x, y) = P(x)\,P(y \,|\, x)$. The loss function, $L(y, f(x, \alpha))$, is an error measure between the expected and predicted labels (typically the number of misclassifications for classification tasks, and mean absolute deviation or mean squared error for regression tasks; there are many others).

However, computing the expected risk is not easy since the distribution $P(x, y)$ is typically unknown. Instead, using inductive principles, we define the empirical risk, which is based only on the finite data set that we have for training the learner,
$$
R_{\mathrm{emp}}(\alpha) = \frac{1}{\ell} \sum_{i=1}^{\ell} L\big(y_i, f(x_i, \alpha)\big),
$$
where $\ell$ is the number of data points. On the surface, ERM minimizes the empirical risk, using it as an estimate of the risk. The law of large numbers, which forms the theoretical basis for the application of several estimation approaches, ensures that the empirical risk, $R_{\mathrm{emp}}$, converges to the expected risk, $R$, as the number of data points approaches infinity:
$$
\lim_{\ell \to \infty} R_{\mathrm{emp}}(\alpha) = R(\alpha).
$$
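This convergence is easy to check numerically. The short Python sketch below is illustrative only (the target function, noise level and sample sizes are arbitrary choices, not taken from the thesis): it computes the empirical risk of a fixed predictor under squared loss for growing $\ell$ and compares it with a large-sample Monte Carlo estimate of the expected risk.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # True deterministic target function (arbitrary illustrative choice).
    return np.sin(2.0 * x)

def f_hat(x):
    # A fixed candidate predictor f(x, alpha); here a crude linear one.
    return 1.5 * x

def empirical_risk(ell):
    # R_emp(alpha) = (1/ell) * sum_i L(y_i, f(x_i, alpha)), squared loss.
    x = rng.uniform(-1.0, 1.0, size=ell)
    y = f(x) + rng.normal(0.0, 0.1, size=ell)  # y = f(x) + epsilon
    return float(np.mean((y - f_hat(x)) ** 2))

# Large-sample Monte Carlo proxy for the expected risk R(alpha).
R = empirical_risk(1_000_000)

for ell in (10, 100, 1_000, 10_000, 100_000):
    print(f"ell = {ell:>6d}: R_emp = {empirical_risk(ell):.4f} (R ~ {R:.4f})")
```

As $\ell$ grows, $R_{\mathrm{emp}}$ stabilizes around the Monte Carlo estimate of $R$, in line with the law of large numbers.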
