Hung-yi Lee — Understand Deep Learning in One Day (李宏毅—1天搞懂深度学习)


Deep Learning Tutorial
李宏毅 Hung-yi Lee

Deep learning attracts lots of attention.
• I believe you have seen lots of exciting results before. This talk focuses on the basic techniques.
[Figure: deep learning trends at Google. Source: SIGMOD, Jeff Dean]

Outline
• Lecture I: Introduction of Deep Learning
• Lecture II: Tips for Training Deep Neural Network
• Lecture III: Variants of Neural Network
• Lecture IV: Next Wave

Lecture I: Introduction of Deep Learning

Outline of Lecture I
• Introduction of Deep Learning
• Why Deep?
• "Hello World" for Deep Learning
Let's start with general machine learning.

Machine Learning ≈ Looking for a Function
• Speech recognition: f(audio) = "How are you"
• Image recognition: f(image) = "Cat"
• Playing Go: f(board position) = next move
• Dialogue system: f("Hello" — what the user said) = "Hi" — the system response

Framework
• A set of functions f1, f2, … is a model. In image recognition, one candidate function maps a cat image to "cat", while another maps the same image to "dog" or "snake" — the candidates differ in quality.
• The goodness of a function f is measured on training data (supervised learning): the function input is an image, the function output is a label such as "monkey", "cat", or "dog"; a better function agrees with the labels.
• Training: Step 1 define a set of functions, Step 2 measure the goodness of functions, Step 3 pick the "best" function f*.
• Testing: use f* on new inputs, e.g. f*(image) = "cat".

Three Steps for Deep Learning
• Step 1: define a set of functions
• Step 2: goodness of function
• Step 3: pick the best function
Deep learning is so simple ……

Step 1: define a set of functions — the set is a Neural Network (loosely inspired by human brains)

Neuron — a simple function
• z = a1·w1 + … + ak·wk + … + aK·wK + b, where the a's are the inputs, the w's are the weights, and b is the bias.
• The output is a = σ(z), where σ is the activation function.
• A common activation function is the sigmoid: σ(z) = 1 / (1 + e^(−z)).

Neural Network
• Different connections lead to different network structures.
• Weights and biases are the network parameters; each neuron can have different values of weights and biases.

Fully Connected Feedforward Network
• Given the parameters, the network defines a function: input vector → output vector.
[Worked example on the slide: a tiny two-layer network maps a 2-dim input vector to a 2-dim output vector for the given weights and biases.]
• Given only the network structure, we have defined a function set (one function per setting of the parameters).
• Structure: input layer (x1, x2, …, xN) → hidden layer 1 → hidden layer 2 → … → hidden layer L → output layer (y1, y2, …, yM).
• "Deep" means many hidden layers.

Output Layer (option)
• Ordinary layer: yi = σ(zi). In general the network output can be any value, which may not be easy to interpret.
• Softmax layer as the output layer: yi = e^(zi) / Σj e^(zj). The outputs lie between 0 and 1 and sum to 1, so yi can be read as a probability.

Example Application: Handwriting Digit Recognition
• Input x: an image flattened into a vector of pixel values (ink → 1, no ink → 0).
• Output y: a 10-dim vector; each dimension represents the confidence that the image is a particular digit, and the dimension with the maximum value gives the recognized digit.
• What is needed is a function from the pixel vector to the 10-dim confidence vector — a neural network (input layer, hidden layers, output layer).
• The network structure defines a function set containing the candidates for handwriting digit recognition; you need to decide the network structure so that a good function is in your function set.

FAQ
• Q: How many layers? How many neurons for each layer? A: trial and error plus intuition.
• Q: Can the structure be automatically determined?

Three Steps for Deep Learning (recap): Step 1 — the set of functions is a neural network. Next, Step 2: goodness of function.
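Before moving on to Step 2, here is a minimal NumPy sketch (not from the slides) of the function such a network computes: sigmoid hidden layers followed by a softmax output. The layer sizes and the random parameters are illustrative assumptions only.

    import numpy as np

    def sigmoid(z):
        # sigma(z) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        # Subtract the max for numerical stability; the outputs sum to 1.
        e = np.exp(z - np.max(z))
        return e / e.sum()

    def forward(x, weights, biases):
        # One forward pass: z = W a + b and a = sigma(z) for each hidden layer,
        # then a softmax over the last layer's z.
        a = x
        for W, b in zip(weights[:-1], biases[:-1]):
            a = sigmoid(W @ a + b)
        return softmax(weights[-1] @ a + biases[-1])

    # Illustrative structure: 256 inputs, two hidden layers, 10 output classes.
    rng = np.random.default_rng(0)
    sizes = [256, 500, 500, 10]
    weights = [rng.normal(0.0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(m) for m in sizes[1:]]

    y = forward(rng.random(256), weights, biases)
    print(y.shape, y.sum())   # (10,) and approximately 1.0: a distribution over digits

Choosing different weights and biases selects a different member of the function set; training (Steps 2 and 3) is the search for the best member.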
Step 2: goodness of function

Training Data
• Preparing training data: images and their labels.
• The learning target is defined on the training data: for an image of a given digit, the corresponding output dimension should have the maximum value (softmax output).

Loss
• The loss l of an example can be the distance between the network output y and the target ŷ (the 1-of-N vector of the correct digit); the closer, the better.
• A good function should make the loss on all examples as small as possible.

Total Loss
• For all R training examples, the total loss is L = l1 + l2 + … + lR.
• Find the network parameters θ* that minimize the total loss L, i.e. find the function in the function set that minimizes the total loss.

Step 3: pick the best function
• How to pick the best function: find the network parameters θ* = {w1, w2, …, b1, b2, …} that minimize the total loss L.
• Enumerate all possible values? Impossible: a speech-recognition network with several layers of a thousand neurons each already has on the order of a thousand × a thousand weights between adjacent layers — millions of parameters in total.

Gradient Descent (one parameter w)
• Pick an initial value for w (random initialization; RBM pre-training has also been used, but random is usually good enough).
• Compute ∂L/∂w: if it is negative, increase w; if it is positive, decrease w.
• Update: w ← w − η·∂L/∂w, where η is called the "learning rate".
• Repeat until ∂L/∂w is approximately zero, i.e. the update is tiny.

Gradient Descent (all parameters)
• Compute ∂L/∂w1, ∂L/∂w2, …, ∂L/∂b1, …; this vector of partial derivatives is the gradient, and every parameter is updated in the same way at each step.
[Figure: gradient descent over two parameters; color = value of the total loss L. Randomly pick a starting point and repeatedly step in the direction (−η·∂L/∂w1, −η·∂L/∂w2); hopefully we reach a minimum.]

Gradient Descent — Difficulty
• Gradient descent never guarantees global minima: different initial points reach different minima, so different results.
• There are some tips to help you avoid local minima, but no guarantee.

Gradient Descent
• You are playing Age of Empires: you cannot see the whole map, only the slope where you stand.
• This is the "learning" of machines in deep learning — even AlphaGo uses this approach. I hope you are not too disappointed. :p

Backpropagation
• Backpropagation is an efficient way to compute ∂L/∂w.
• Ref: the "DNN backprop" lecture at http://speech.ee.ntu.edu.tw/~tlkagk/ (MLDS course page).
• Don't worry about it — the toolkits will handle it. (The toolkit shown on the slide was developed by NTU student 周伯威.)

Concluding Remarks
• Step 1: define a set of functions; Step 2: goodness of function; Step 3: pick the best function.
• Deep learning is so simple ……
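As a concrete illustration of the update rule w ← w − η·∂L/∂w, here is a minimal sketch of my own (not code from the slides): plain gradient descent fitting a single linear neuron y = w·x + b under a squared-error loss. The toy data and the learning rate are made-up values.

    import numpy as np

    # Toy data (illustrative): y is roughly 3x + 1 plus a little noise.
    rng = np.random.default_rng(1)
    x = rng.random(100)
    y = 3.0 * x + 1.0 + 0.05 * rng.normal(size=100)

    w, b = 0.0, 0.0   # pick initial values for the parameters
    eta = 0.1         # the learning rate

    for step in range(2000):
        y_hat = w * x + b
        # Partial derivatives of the mean squared error with respect to w and b.
        dL_dw = (2.0 * (y_hat - y) * x).mean()
        dL_db = (2.0 * (y_hat - y)).mean()
        # Gradient descent: move each parameter against its derivative.
        w -= eta * dL_dw
        b -= eta * dL_db

    print(w, b)   # approaches 3 and 1

A real network has millions of such parameters, but each one is updated by exactly this rule; backpropagation is only the bookkeeping that computes all the derivatives efficiently.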
Outline of Lecture I — Why Deep?

Deeper is Better?
[Table: "Layer × Size" vs. word error rate (%), from Seide, Frank, Gang Li, and Dong Yu, "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks", Interspeech 2011: the error rate falls as more layers are added.]
• Not surprising by itself — more layers also means more parameters, and more parameters give better performance.

Universality Theorem
• Any continuous function f: R^N → R^M can be realized by a network with one hidden layer (given enough hidden neurons).
• Reference for the reason: http://neuralnetworksanddeeplearning.com/chap4.html
• So why a "deep" neural network rather than a "fat" neural network?

Fat + Short vs. Thin + Tall
• Compare a shallow, wide network and a deep, narrow network with the same number of parameters: which one is better?
[Table: the same Seide et al. (Interspeech 2011) results show that, for a comparable number of parameters, the deep network achieves the lower word error rate.]

Why? An analogy (this page is for readers with an EE background)
• Logic circuits consist of gates: two layers of logic gates can represent any Boolean function, but building some functions with multiple layers of gates is much simpler — fewer gates are needed.
• A neural network consists of neurons: a one-hidden-layer network can represent any continuous function, but representing some functions with multiple layers of neurons is much simpler — fewer parameters, and therefore less data.

Deep → Modularization
• Example task: classify images into girls with long hair, boys with long hair, girls with short hair, and boys with short hair.
• Training one classifier per class directly is weak for rare classes: there are few examples of boys with long hair.
• Instead, first train classifiers for the attributes — long or short hair? boy or girl? — each of which can have sufficient training examples.
• These basic classifiers are then shared as modules by the four final classifiers, so each final classifier can be trained with little data and still work fine.
• In a deep network, the first layer learns the most basic classifiers, the second layer uses the first layer as modules to build more complex classifiers, the third layer uses the second, and so on.
• The modularization is automatically learned from data → less training data is needed.
• Reference: Zeiler, M. D., and Fergus, R. (2014). Visualizing and understanding convolutional networks. Computer Vision – ECCV 2014.

Outline of Lecture I — "Hello World" for Deep Learning

Keras
• Keras is an interface to TensorFlow or Theano.
• TensorFlow and Theano are very flexible but need some effort to learn; Keras is easy to learn and use (and still has some flexibility), and you can modify it if you can write TensorFlow or Theano.
• If you want to learn Theano, see the "Theano DNN" and "RNN training" lectures at http://speech.ee.ntu.edu.tw/~tlkagk/ (MLDS course page).
• François Chollet is the author of Keras; he currently works for Google as a deep learning engineer and researcher. "Keras" means horn in Greek.
• Documentation: http://keras.io  Examples: https://github.com/fchollet/keras/tree/master/examples
• Notes on using Keras (figures courtesy of 沈昇勳).

Example Application: Handwriting Digit Recognition — the "Hello world" of deep learning
• MNIST data: http://yann.lecun.com/exdb/mnist/ (28×28 images of handwritten digits).
• Keras provides a data-set loading function: http://keras.io/datasets/

Keras — the three steps
• Step 1: define a set of functions — declare the network structure, e.g. a stack of fully connected layers ending in a softmax over the 10 digits.
• Step 2: goodness of function — configuration: choose the loss and the optimizer.
• Step 3.1: pick the best function — find the optimal network parameters by fitting the model to the training data (images) and labels (digits); the batch size and number of epochs are discussed in the next lecture.
• Training-data format: a numpy array x of shape (number of training examples, 784) holding the flattened images, and a numpy array y of shape (number of training examples, 10) holding the 1-of-10 labels. See https://www.tensorflow.org/versions/r…/tutorials/mnist/beginners/index.html.
• Using the trained network (testing): case 1 — score it on labelled test data; case 2 — predict outputs for new, unlabelled inputs.
• Save and load models: http://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
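The Keras code on the slides appears only as screenshots that did not survive this text extraction, so the following is a minimal sketch of the same three steps written against today's tf.keras API (an assumption on my part — the original slides used an earlier Keras). The layer sizes, the batch size of 100, and the 20 epochs are illustrative choices, not necessarily the slides' exact values.

    import numpy as np
    from tensorflow import keras

    # Load MNIST (Keras provides a data-set loading function).
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255   # (examples, 784)
    x_test = x_test.reshape(-1, 784).astype("float32") / 255
    y_train = keras.utils.to_categorical(y_train, 10)             # (examples, 10)
    y_test = keras.utils.to_categorical(y_test, 10)

    # Step 1: define a set of functions (the network structure).
    model = keras.Sequential([
        keras.layers.Dense(500, activation="sigmoid", input_shape=(784,)),
        keras.layers.Dense(500, activation="sigmoid"),
        keras.layers.Dense(10, activation="softmax"),
    ])

    # Step 2: goodness of function (configuration: loss and optimizer).
    model.compile(loss="categorical_crossentropy", optimizer="sgd",
                  metrics=["accuracy"])

    # Step 3: pick the best function (find the optimal parameters).
    model.fit(x_train, y_train, batch_size=100, epochs=20)

    # Testing, case 1: labelled data -> evaluate.
    print(model.evaluate(x_test, y_test))

    # Testing, case 2: unlabelled data -> predict.
    predictions = model.predict(x_test[:5])   # probabilities over the 10 digits
    print(predictions.argmax(axis=1))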
Keras — using a GPU to speed up training
• Way 1: run your script as THEANO_FLAGS=device=gpu python YourCode.py
• Way 2 (in your code): import os and set os.environ["THEANO_FLAGS"] = "device=gpu"

Live demo.

Lecture II: Tips for Training DNN

Recipe of Deep Learning
• After the three steps, first ask: good results on the training data? If NO, the network is not well trained — go back and improve training. If YES, ask: good results on the testing data? If NO, that is overfitting; if YES, you are done.
• Do not always blame overfitting: when a deeper network is worse than a shallower one on the testing data and also on the training data, the problem is poor training, not overfitting.
• Different approaches address different problems; e.g. dropout is for getting good results on the testing data.

For good results on training data:
• Choosing a proper loss
• Mini-batch
• New activation functions
• Adaptive learning rate
• Momentum

Choosing a Proper Loss
• With a softmax output y and a 1-of-N target ŷ, two common losses are the square error Σi (yi − ŷi)² and the cross entropy −Σi ŷi·ln(yi). Which one is better? Let's try it.
[Figures: training and testing accuracy, and the total-loss surface over two weights — with square error the surface is flat far from the minimum, while cross entropy provides a gradient to follow; cross entropy trains far more successfully here.]
• When using a softmax output layer, choose cross entropy. See Glorot and Bengio, http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf.

Mini-batch
• Randomly initialize the network parameters.
• Pick the 1st mini-batch, compute its loss L′ (the sum of the losses of the examples in that batch), and update the parameters once; pick the 2nd mini-batch, compute L″, update once; and so on.
• When all mini-batches have been picked, one epoch is finished; repeat the process for many epochs.
• Note that we do not really minimize the total loss L: the quantity being minimized is different each time the parameters are updated.
[Figure: with ordinary gradient descent the trajectory descends smoothly; with mini-batches it is noisier ("unstable"); colors represent the total loss.]

Mini-batch is Faster
• Original gradient descent: update after seeing all examples — one update per epoch.
• With mini-batches: if there are B batches, there are B updates in one epoch, each after seeing only one batch.
• This is not always true with parallel computing: for data sets that are not super large, a full-batch update can take about the same time as a mini-batch update.
• Even so, mini-batch training has better performance!
[Figure: training and testing accuracy per epoch — mini-batch training reaches much higher accuracy than full-batch ("no batch") training.]
• Shuffle the training examples for each epoch, so the mini-batches differ from epoch to epoch. Don't worry — this is the default in Keras.
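A minimal sketch (mine, not slide code) of the mini-batch procedure just described: shuffle the examples each epoch, split them into batches, and update the parameters once per batch. Here `params` and `compute_gradient` are hypothetical placeholders for whatever model and gradient computation are being used.

    import numpy as np

    def minibatch_sgd(params, x, y, compute_gradient,
                      eta=0.1, batch_size=100, epochs=20, seed=0):
        # compute_gradient(params, x_batch, y_batch) is assumed to return one
        # gradient array per parameter array, with matching shapes.
        rng = np.random.default_rng(seed)
        n = len(x)
        for epoch in range(epochs):
            order = rng.permutation(n)          # shuffle the examples each epoch
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                grads = compute_gradient(params, x[idx], y[idx])
                # One update per mini-batch: the loss being reduced (L', L'', ...)
                # changes from batch to batch.
                params = [w - eta * g for w, g in zip(params, grads)]
        return params

In Keras, this whole loop is essentially what model.fit performs when you pass batch_size and epochs, with per-epoch shuffling enabled by default.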
Hard to get the power of Deep …
• Deeper usually does not imply better — even on the training data.
[Figure: training and testing accuracy for a shallower versus a deeper sigmoid network; the deeper one does worse on both.]

Vanishing Gradient Problem
• In a deep sigmoid network the gradients are smaller at the layers near the input and larger near the output: the early layers learn very slowly and remain almost random, while the later layers learn fast and converge — on top of essentially random early features!
• Intuitive way to compute the derivatives: a large change of a weight near the input causes only a small change of the output, because each sigmoid squashes a large input change into a small output change; the attenuation compounds layer by layer, so the derivatives for early-layer weights are small.
• The older remedy was RBM pre-training; the current practice is to use ReLU.

ReLU
• Rectified Linear Unit (ReLU): a = z when z > 0 and a = 0 when z ≤ 0.
• Reasons: it is fast to compute; there is a biological motivation; it behaves like an infinite number of sigmoids with different biases; and it addresses the vanishing gradient problem. [Xavier Glorot, AISTATS'11] [Andrew L. Maas, ICML'13] [Kaiming He, arXiv'15]
• Neurons with zero output can be removed from the computation, leaving a thinner, linear network along the active paths, which does not have smaller gradients in the early layers.
• Let's try it: with a deep network, ReLU trains successfully where the sigmoid version fails.
[Figure: training and testing accuracy of a deep network with sigmoid vs. ReLU activations.]

ReLU — variants
• Leaky ReLU: a = z for z > 0, otherwise a small fixed slope, a = 0.01·z.
• Parametric ReLU: a = z for z > 0, otherwise a = α·z, where α is also learned by gradient descent.

Maxout — a learnable activation function [Ian J. Goodfellow, ICML'13]
• Group several linear units z and take the maximum of each group as the activation.
• ReLU is a special case of Maxout, and a group can contain more than two elements.
• The activation function of a maxout network can be any piecewise linear convex function; the number of pieces depends on how many elements are in a group.

Learning Rates
• If the learning rate is too large, the total loss may not decrease after each update; if it is too small, training is too slow. Set the learning rate η carefully.
• Popular and simple idea: reduce the learning rate by some factor every few epochs. At the beginning we are far from the destination, so we use a larger learning rate; after several epochs we are close, so we reduce it (e.g. 1/t decay, where the rate shrinks with the number of updates t).
• The learning rate cannot be one-size-fits-all: give different parameters different learning rates.

Adagrad — a parameter-dependent learning rate
• Original update: w ← w − η·∂L/∂w. Adagrad: w ← w − ηw·∂L/∂w, where ηw = η / √( Σ_{i=0..t} (g^i)² ) and g^i is the derivative obtained at the i-th update; the denominator is the summation of the squares of the previous derivatives.
• Observations: the learning rate becomes smaller and smaller for every parameter, and parameters with smaller derivatives get larger learning rates, and vice versa.
• Why smaller learning rates for larger derivatives? [Figure: a loss surface that is much steeper along one parameter than the other; dividing by the accumulated gradient balances the step sizes so both parameters make comparable progress.]
• Adagrad is not the whole story:
  • Adagrad [John Duchi, JMLR'11]
  • RMSprop (https://www.youtube.com/watch?v=OsxAchxZU)
  • Adadelta [Matthew D. Zeiler, arXiv'12]
  • "No more pesky learning rates" [Tom Schaul, arXiv'12]
  • AdaSecant [Caglar Gulcehre,
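To make the Adagrad update above concrete, here is a small NumPy sketch of my own (not from the slides). The running sum of squared past derivatives divides the global learning rate per parameter; the small epsilon is my addition to avoid division by zero, and the quadratic toy loss is purely illustrative.

    import numpy as np

    def adagrad_step(w, g, accum, eta=0.5, eps=1e-8):
        # accum holds the sum of the squares of all previous derivatives of w.
        accum += g ** 2
        # Per-parameter learning rate: eta / sqrt(sum of squared past derivatives).
        w -= eta / (np.sqrt(accum) + eps) * g
        return w, accum

    # Illustrative loss L(w) = w1^2 + w2^2, whose gradient is 2w.
    w = np.array([5.0, -3.0])
    accum = np.zeros_like(w)
    for _ in range(200):
        g = 2.0 * w
        w, accum = adagrad_step(w, g, accum)
    print(w)   # both components head toward 0, at step sizes balanced by Adagrad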
