
Deep Learning Tutorial
李宏毅 Hung-yi Lee

Deep learning attracts lots of attention.
• I believe you have seen lots of exciting results before. This talk focuses on the basic techniques.
• Deep learning trends at Google (source: SIGMOD / Jeff Dean).

Outline
• Lecture I: Introduction of Deep Learning
• Lecture II: Tips for Training Deep Neural Network
• Lecture III: Variants of Neural Network
• Lecture IV: Next Wave

Lecture I: Introduction of Deep Learning

Outline of Lecture I
• Introduction of Deep Learning
• Why Deep?
• "Hello World" for Deep Learning

Let's start with general machine learning.

Machine Learning ≈ Looking for a Function
• Speech Recognition: f(audio) = "How are you"
• Image Recognition: f(image) = "Cat"
• Playing Go: f(board position) = next move
• Dialogue System: f(what the user said) = system response, e.g. f("Hello") = "Hi"

Framework
• A set of functions (the Model): f1, f2, … For image recognition, one candidate function may map a cat image to "cat" and a dog image to "dog", while another maps them to "money" and "snake" — the set contains good and bad candidates.
• Goodness of a function f: measured on training data. In supervised learning the training data are pairs of function input and function output, e.g. images labelled "monkey", "cat", "dog"; a function that matches the labels is better.
• Pick the "best" function f*. Training uses the training data to find f*; testing then uses f*, e.g. f*(new image) = "cat".

Three Steps for Deep Learning
• Step 1: define a set of functions
• Step 2: goodness of function
• Step 3: pick the best function
Deep Learning is so simple……

Neural Network (Step 1: define a set of functions)
• The function set is a neural network, loosely inspired by human brains.
• A neuron is a simple function: z = a1 w1 + … + ak wk + … + aK wK + b, and the output is a = σ(z). The w's are weights, b is the bias, and σ(·) is the activation function.
• Sigmoid function: σ(z) = 1 / (1 + e^(−z)).
• Different connections lead to different network structures. The weights and biases are the network parameters θ; each neuron can have different values of weights and biases.

Fully Connected Feedforward Network
• Given parameters θ, the network defines a function: input vector → output vector.
• Given only the network structure, it defines a function set.
• Layers: Input Layer → Hidden Layer 1 … Hidden Layer L → Output Layer, with neurons fully connected between consecutive layers. "Deep" means many hidden layers.

Output Layer (option)
• Ordinary output layer: y_i = σ(z_i). In general the output of the network can be any value, which may not be easy to interpret.
• Softmax layer as the output layer: y_i = e^(z_i) / Σ_j e^(z_j), so the outputs behave like probabilities: 0 < y_i < 1 and Σ_i y_i = 1.

Example Application: Handwriting Digit Recognition
• Input: a 16 × 16 image flattened into a 256-dim vector x (ink → 1, no ink → 0).
• Output: a 10-dim vector y; each dimension is the confidence that the digit is "1", "2", …, "0". If y2 has the largest value, the image is recognized as "2".
• What is needed is a function with a 256-dim input vector and a 10-dim output vector. A neural network (input layer, hidden layers, output layer) gives a function set containing the candidates for handwriting digit recognition.
• You need to decide the network structure so that a good function is in your function set.

FAQ
• Q: How many layers? How many neurons for each layer? A: Trial and error, plus intuition.
• Q: Can the structure be automatically determined?
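To make Step 1 concrete, here is a minimal NumPy sketch of a fully connected feedforward network with sigmoid hidden layers and a softmax output; the layer sizes and random parameters are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))            # sigma(z) = 1 / (1 + e^-z)

def softmax(z):
    e = np.exp(z - z.max())                    # subtract max for numerical stability
    return e / e.sum()                         # outputs are positive and sum to 1

def forward(x, params):
    """Fully connected feedforward pass: each layer computes z = W a + b, a = sigma(z)."""
    a = x
    for W, b in params[:-1]:                   # hidden layers use sigmoid
        a = sigmoid(W @ a + b)
    W, b = params[-1]                          # output layer uses softmax
    return softmax(W @ a + b)

# Hypothetical structure: 256-dim input, two hidden layers of 500 neurons, 10 outputs.
rng = np.random.default_rng(0)
sizes = [256, 500, 500, 10]
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.random(256)                            # a fake 16x16 "image" flattened to 256 dims
y = forward(x, params)
print(y.shape, y.sum())                        # (10,) and ~1.0: confidences over the ten digits
```

Fixing `sizes` fixes the structure (the function set); each choice of `params` picks one function from that set.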
Step 2: Goodness of Function

Training Data
• Preparing training data: images and their labels ("5", "0", "4", "1", …). The learning target is defined on the training data (supervised learning).

Learning Target
• For an image of "1", the target is that y1 has the maximum value among the softmax outputs; for an image of "2", the target is that y2 has the maximum value, and so on.

Loss
• The loss 𝑙 can be the distance between the network output and the target: the output should be as close to the target as possible.
• A good function should make the loss on all examples as small as possible, for a given set of parameters.

Total Loss
• For all R training examples, the total loss is L = Σ_{r=1}^{R} 𝑙^r, and it should be as small as possible.
• Find the network parameters θ* that minimize the total loss L — that is, find the function in the function set that minimizes L.

Step 3: Pick the Best Function

How to pick the best function?
• Find the network parameters θ* = {w1, w2, w3, …, b1, b2, b3, …} that minimize the total loss L.
• Enumerate all possible values? Hopeless. E.g. for speech recognition a network may have 8 layers of 1000 neurons each; 1000 × 1000 weights between two layers already means millions of parameters.

Gradient Descent (one parameter w)
• Pick an initial value for w — random, or RBM pre-training; random is usually good enough.
• Compute ∂L/∂w: if it is negative, increase w; if it is positive, decrease w.
• Update w ← w − η ∂L/∂w, where η is called the "learning rate".
• Repeat until ∂L/∂w is approximately zero (i.e. the update becomes very small).

Gradient Descent (all parameters)
• Do the same for every parameter in θ = {w1, w2, …, b1, b2, …}; the vector of all the partial derivatives, ∇L = (∂L/∂w1, ∂L/∂w2, …, ∂L/∂b1, …), is the gradient.
• Pictured on a two-parameter plane where color is the value of the total loss L: randomly pick a starting point, compute (∂L/∂w1, ∂L/∂w2), move by (−η ∂L/∂w1, −η ∂L/∂w2), and repeat. Hopefully we reach a minimum…

Difficulty
• Gradient descent never guarantees the global minimum: different initial points reach different minima and hence different results. There are some tips to help you avoid local minima, but no guarantee.
• Analogy: you are playing Age of Empires — you cannot see the whole map, only the slope where you currently stand.
• This is the "learning" of machines in deep learning… Even AlphaGo uses this approach. I hope you are not too disappointed :p

Backpropagation
• Backpropagation is an efficient way to compute ∂L/∂w.
• Ref: the DNN backprop lecture at http://speech.ee.ntu.edu.tw/~tlkagk/ (MLDS course page).
• Don't worry about ∂L/∂w — the toolkits will handle it. (One such toolkit was developed by NTU student 周伯威.)

Concluding Remarks
• Step 1: define a set of functions; Step 2: goodness of function; Step 3: pick the best function. Deep Learning is so simple……

Outline of Lecture I
• Introduction of Deep Learning
• Why Deep?
• "Hello World" for Deep Learning
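As a sketch of the update rule w ← w − η ∂L/∂w, the following toy example (a made-up one-parameter loss, not from the slides) runs plain gradient descent until the gradient is tiny.

```python
import numpy as np

def loss(w):
    return (w - 3.0) ** 2          # toy total loss L(w) with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)         # dL/dw, computed analytically here (backprop does this for networks)

eta = 0.1                          # learning rate
w = np.random.default_rng(0).standard_normal()   # pick an initial value for w at random

for step in range(100):
    g = grad(w)
    if abs(g) < 1e-6:              # stop when the gradient (and hence the update) is tiny
        break
    w = w - eta * g                # w <- w - eta * dL/dw

print(step, w, loss(w))            # w ends up near 3; in general only a local minimum is guaranteed
```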
Why Deep? Deeper is Better
• Seide, Frank, Gang Li, and Dong Yu, "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks", Interspeech 2011: comparing networks of different depth and width (layers × size) by word error rate (%), the error keeps dropping as more hidden layers are added.
• Not surprising at first glance: more parameters, better performance.

Universality Theorem
• Any continuous function f: R^N → R^M can be realized by a network with one hidden layer, given enough hidden neurons. (Reference for the reason: http://neuralnetworksanddeeplearning.com/chap4.html)
• So why a "deep" neural network rather than a "fat" one?

Fat + Short vs. Thin + Tall
• Compare a shallow, wide network with a deep, narrow one that has the same number of parameters — which one is better?
• In the same Interspeech 2011 results, the deep network achieves a lower word error rate than a single-hidden-layer network with a comparable number of parameters, even a very wide one.

Why? Analogy
• Logic circuits consist of gates. Two layers of logic gates can represent any Boolean function, but building some functions with multiple layers of gates is much simpler — fewer gates are needed.
• A neural network consists of neurons. A network with one hidden layer can represent any continuous function, but representing some functions with multiple layers of neurons is much simpler — fewer parameters, and therefore less data. (This page is for those with an EE background.)

Modularization
• Deep → Modularization.
• Example: classify images into girls with long hair (長髮女), boys with long hair (長髮男), girls with short hair (短髮女), and boys with short hair (短髮男). Training one classifier per class directly, the "boys with long hair" class has only a few examples, so that classifier is weak.
• Instead, first train classifiers for the attributes: long or short hair? boy or girl? Each basic classifier can have sufficient training examples.
• The basic classifiers are then shared as modules by the four final classifiers, so each final classifier can be trained with little data and still do fine.
• A deep network does the same thing layer by layer: the first layer learns the most basic classifiers, the second layer uses the first layer as modules to build classifiers, and so on. The modularization is automatically learned from data → less training data is needed.
• Reference: Zeiler, M. D., Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision – ECCV 2014.

"Hello World" for Deep Learning: Keras
• Keras is an interface to TensorFlow or Theano.
  - TensorFlow / Theano: very flexible, but need some effort to learn.
  - Keras: easy to learn and use, and still has some flexibility; you can modify it if you can write TensorFlow or Theano.
• If you want to learn Theano: see the Theano DNN and RNN training lectures at http://speech.ee.ntu.edu.tw/~tlkagk/ (MLDS course pages).
• François Chollet is the author of Keras. He currently works for Google as a deep learning engineer and researcher. "Keras" means horn in Greek.
• Documentation: http://keras.io   Examples: https://github.com/fchollet/keras/tree/master/examples
• Notes on using Keras (使用 Keras 心得) — figures courtesy of 沈昇勳.

Example Application: Handwriting Digit Recognition
• The "Hello world" of deep learning: MNIST. Data: http://yannlecun.com/exdb/mnist/  Keras provides a data-set loading function: http://keras.io/datasets/
• First, define the network for MNIST in Keras: input x (the flattened 28 × 28 image), fully connected hidden layers, and a softmax output layer producing y1 … y10.
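A minimal Keras sketch of this "define the network" step, assuming the MNIST setup (784-dim inputs, two 500-neuron sigmoid hidden layers, ten softmax outputs); these sizes and the exact API calls are illustrative rather than quoted from the slides.

```python
# Keras "Step 1": define a set of functions by fixing the network structure.
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(500, input_dim=28 * 28))   # 1st hidden layer: 784 -> 500
model.add(Activation('sigmoid'))
model.add(Dense(500))                      # 2nd hidden layer: 500 -> 500
model.add(Activation('sigmoid'))
model.add(Dense(10))                       # output layer: 500 -> 10
model.add(Activation('softmax'))           # softmax so the outputs act like probabilities
```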
• Then configure the training and find the optimal network parameters by gradient descent, w ← w − η ∂L/∂w, from the training data (images) and their labels (digits); how this optimization works is the topic of the next lecture.
• The training data are numpy arrays: x holds one flattened image per training example, and the labels hold one target per training example (see https://www.tensorflow.org/versions/…/tutorials/mnist/beginners/index.html for the MNIST format).
• How to use the trained neural network (testing): either evaluate it on labeled test data, or feed it new inputs and read off the outputs.
• Save and load models: http://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
• Using a GPU to speed up training:
  - Way 1: THEANO_FLAGS=device=gpu0 python YourCode.py
  - Way 2 (in your code): import os; os.environ["THEANO_FLAGS"] = "device=gpu0"
• Live Demo.

Lecture II: Tips for Training DNN

Recipe of Deep Learning
• After the three steps (define a set of functions, goodness of function, pick the best function), ask: do you get good results on the training data? If NO, go back and improve the training. If YES, do you get good results on the testing data? If NO, that is overfitting; if YES, you are done.
• Do not always blame overfitting: poor testing results count as overfitting only when the training results are good. Otherwise the network is simply not well trained.
• Different approaches address different problems — e.g. dropout is for good results on the testing data.
• For good results on the training data: choosing proper loss, mini-batch, new activation function, adaptive learning rate, momentum.

Choosing Proper Loss
• With a softmax output y and target ŷ, two common losses are the square error Σ_i (y_i − ŷ_i)² and the cross entropy −Σ_i ŷ_i ln y_i. Which one is better?
• Let's try it on MNIST: training with cross entropy reaches far higher accuracy than training with square error. Plotting the total loss against two weights shows why: the square-error surface is flat far from the minimum, while cross entropy keeps a slope to follow.
• When using a softmax output layer, choose cross entropy. (See http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf)

Mini-batch
• Randomly initialize the network parameters. Pick the 1st mini-batch, compute its loss L′ (the sum of the losses of the examples in that batch), and update the parameters once. Pick the 2nd mini-batch, compute L″, update once, and so on. When every mini-batch has been picked once, that is one epoch; repeat for many epochs.
• We do not really minimize the total loss! L is different each time we update the parameters, so the trajectory on the loss surface is noisier ("unstable") than with original gradient descent — but it still works.
• Shuffle the training examples for each epoch — don't worry, this is the default in Keras.
• Mini-batch is faster: original gradient descent updates once after seeing all examples; with mini-batches, splitting the data into B batches gives B updates per epoch. With parallel computing the two can take about the same time per epoch when the data set is not super large, so this is not always literally true — but mini-batch also has better performance.
• Mini-batch is better: in the MNIST experiment, training with mini-batches reaches much higher accuracy, on both training and testing data, than updating on the full batch.
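Continuing the `model` defined in the Keras sketch above, the following code loads MNIST, configures the loss, and trains with mini-batches; the loss, optimizer, batch size, and epoch count are illustrative assumptions, not the slides' exact settings.

```python
# Keras "Step 2 + 3": define the loss, then let gradient descent find the parameters.
# 'model' is the Sequential network defined in the earlier sketch.
from keras.datasets import mnist
from keras.utils import np_utils

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28 * 28).astype('float32') / 255   # flatten and scale the pixels
x_test = x_test.reshape(-1, 28 * 28).astype('float32') / 255
y_train = np_utils.to_categorical(y_train, 10)                    # one-hot targets for cross entropy
y_test = np_utils.to_categorical(y_test, 10)

model.compile(loss='categorical_crossentropy',   # proper loss for a softmax output
              optimizer='adam',                  # an adaptive-learning-rate method
              metrics=['accuracy'])

# Mini-batch training: 100 examples per update, 20 passes over the data, shuffled each epoch.
model.fit(x_train, y_train, batch_size=100, epochs=20)

score = model.evaluate(x_test, y_test)           # testing case 1: loss and accuracy on labeled data
pred = model.predict(x_test)                     # testing case 2: outputs for new inputs
model.save('mnist_dnn.h5')                       # save the trained model for later use
```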
Hard to Get the Power of Deep
• Deeper usually does not imply better results, even on the training data: trying MNIST with a much deeper sigmoid network, training accuracy actually drops compared with a shallower one.

Vanishing Gradient Problem
• In a deep sigmoid network, the layers near the input have smaller gradients and learn very slowly, so they are still almost random while the layers near the output, which have larger gradients, learn very fast and have already converged — based on essentially random lower-layer features.
• Intuitive way to compute the derivatives: ∂𝑙/∂w ≈ Δ𝑙/Δw. A change Δw near the input produces a large change at a sigmoid's input but only a small change at its output; squashed through many sigmoid layers, its effect on the loss 𝑙 — and hence the gradient — becomes small.
• Hard to get the power of deep… In 2006, people used RBM pre-training; in 2015, people use ReLU.

ReLU
• Rectified Linear Unit (ReLU): a = z if z > 0, and a = 0 if z ≤ 0.
• Reasons: fast to compute; a biological reason; it behaves like an infinite number of sigmoids with different biases; and it addresses the vanishing gradient problem. [Xavier Glorot, AISTATS'11] [Andrew L. Maas, ICML'13] [Kaiming He, arXiv'15]
• Neurons that output 0 drop out of the computation, leaving a thinner linear network, which does not have the smaller-gradient problem.
• Let's try it on the deep MNIST network: with ReLU the deep network trains to high accuracy where the sigmoid version was stuck.

ReLU Variants
• Leaky ReLU: a = z if z > 0, a = 0.01 z otherwise.
• Parametric ReLU: a = z if z > 0, a = α z otherwise, where α is also learned by gradient descent.

Maxout [Ian J. Goodfellow, ICML'13]
• A learnable activation function: group the pre-activation values and output the max of each group. ReLU is a special case of Maxout, and you can have more than two elements in a group.
• The activation function in a maxout network can be any piecewise-linear convex function; how many pieces it has depends on how many elements are in a group.

Learning Rates
• If the learning rate is too large, the total loss may not decrease after each update; if it is too small, training is too slow. Set the learning rate η carefully.
• Popular and simple idea: reduce the learning rate by some factor every few epochs. At the beginning we are far from the destination, so we use a larger learning rate; after several epochs we are close to the destination, so we reduce it. E.g. 1/t decay: η^t = η / √(t + 1).
• The learning rate cannot be one-size-fits-all: give different parameters different learning rates.

Adagrad
• Parameter-dependent learning rate. Original: w ← w − η ∂L/∂w. Adagrad: w ← w − η_w ∂L/∂w, where η_w = η / √(Σ_{i=0}^{t} (g^i)²), η is a constant, and g^i is the ∂L/∂w obtained at the i-th update — the denominator is the summation of the squares of the previous derivatives.
• Observation: the learning rate becomes smaller and smaller for every parameter, and parameters with smaller derivatives get larger learning rates, and vice versa.
• Why? A parameter with consistently small derivatives sits on a gentle slope and needs a larger learning rate to move a useful distance, while one with large derivatives sits on a steep slope and needs a smaller one. Not the whole story…
• Beyond Adagrad [John Duchi, JMLR'11]: RMSprop (https://www.youtube.com/watch?v=…), Adadelta [Matthew D. Zeiler, arXiv'12], "No more pesky learning rates" [Tom Schaul, arXiv'12], AdaSecant [Caglar Gulcehre, …]
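A minimal sketch of the Adagrad update η_w = η / √(Σ_i (g^i)²) for a single parameter; the toy loss and the base learning rate are made up for demonstration.

```python
import numpy as np

def adagrad_step(w, g, squared_grad_sum, eta=1.0, eps=1e-8):
    """One Adagrad update: w <- w - eta / sqrt(sum of squared past gradients) * g."""
    squared_grad_sum += g ** 2                          # accumulate (g^i)^2 over all updates so far
    w -= eta / (np.sqrt(squared_grad_sum) + eps) * g    # parameter-dependent learning rate eta_w
    return w, squared_grad_sum

# Toy example: minimize L(w) = (w - 3)^2, so dL/dw = 2 (w - 3).
w, acc = 0.0, 0.0
for t in range(1000):
    g = 2.0 * (w - 3.0)
    w, acc = adagrad_step(w, g, acc)
print(w)   # approaches 3; the effective learning rate shrinks as squared gradients accumulate
```

Because the accumulated sum only grows, every parameter's learning rate keeps shrinking, which is exactly the behaviour the slides point out before moving on to RMSprop-style alternatives.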
