speech+and+language+processing.pdf

Uploaded by 52nlpcn on 2011-12-27.

DRAFT

Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition
Daniel Jurafsky and James H. Martin
Copyright ©. All rights reserved. Draft of June. Do not cite without permission.

INTRODUCTION

Dave Bowman: Open the pod bay doors, HAL.
HAL: I’m sorry Dave, I’m afraid I can’t do that.
Stanley Kubrick and Arthur C. Clarke, screenplay of 2001: A Space Odyssey

This book is about a new interdisciplinary field variously called computer speech and language processing or human language technology or natural language processing or computational linguistics. The goal of this new field is to get computers to perform useful tasks involving human language, tasks like enabling human-machine communication, improving human-human communication, or simply doing useful processing of text or speech.

One example of a useful such task is a conversational agent. The HAL 9000 computer in Stanley Kubrick’s film 2001: A Space Odyssey is one of the most recognizable characters in twentieth-century cinema. HAL is an artificial agent capable of such advanced language-processing behavior as speaking and understanding English, and at a crucial moment in the plot, even reading lips. It is now clear that HAL’s creator Arthur C. Clarke was a little optimistic in predicting when an artificial agent such as HAL would be available. But just how far off was he? What would it take to create at least the language-related parts of HAL? We call programs like HAL that converse with humans via natural language conversational agents or dialogue systems. In this text we study the various components that make up modern conversational agents, including language input (automatic speech recognition and natural language understanding) and language output (natural language generation and speech synthesis).

Let’s turn to another useful language-related task, that of making available to non-English-speaking readers the vast amount of scientific information on the Web in English, or translating for English speakers the hundreds of millions of Web pages written in other languages like Chinese. The goal of machine translation is to automatically translate a document from one language to another. Machine translation is far from a solved problem; we will cover the algorithms currently used in the field, as well as important component tasks.

Many other language processing tasks are also related to the Web. Another such task is Web-based question answering. This is a generalization of simple web search, where instead of just typing keywords a user might ask complete questions, ranging from easy to hard, like the following:

• What does “divergent” mean?
• What year was Abraham Lincoln born?
• How many states were in the United States that year?
• How much Chinese silk was exported to England by the end of the 18th century?
• What do scientists think about the ethics of human cloning?

Some of these, such as definition questions, or simple factoid questions like dates and locations, can already be answered by search engines. But answering more complicated questions might require extracting information that is embedded in other text on a Web page, or doing inference (drawing conclusions based on known facts), or synthesizing and summarizing information from multiple sources or web pages. In this text we study the various components that make up modern understanding systems of this kind, including information extraction, word sense disambiguation, and so on.

Although the subfields and problems we’ve described above are all very far from completely solved, these are all very active research areas and many technologies are already available commercially. In the rest of this chapter we briefly summarize the kinds of knowledge that are necessary for these tasks (and others like spell correction, grammar checking, and so on), as well as the mathematical models that will be introduced throughout the book.

KNOWLEDGE IN SPEECH AND LANGUAGE PROCESSING

What distinguishes language processing applications from other data processing systems is their use of knowledge of language. Consider the Unix wc program, which is used to count the total number of bytes, words, and lines in a text file. When used to count bytes and lines, wc is an ordinary data processing application. However, when it is used to count the words in a file it requires knowledge about what it means to be a word, and thus becomes a language processing system.

Of course, wc is an extremely simple system with an extremely limited and impoverished knowledge of language. Sophisticated conversational agents like HAL, or machine translation systems, or robust question-answering systems, require much broader and deeper knowledge of language. To get a feeling for the scope and kind of required knowledge, consider some of what HAL would need to know to engage in the dialogue that begins this chapter, or for a question answering system to answer one of the questions above.

HAL must be able to recognize words from an audio signal and to generate an audio signal from a sequence of words. These tasks of speech recognition and speech synthesis require knowledge about phonetics and phonology: how words are pronounced in terms of sequences of sounds, and how each of these sounds is realized acoustically.

Note also that unlike Star Trek’s Commander Data, HAL is capable of producing contractions like I’m and can’t. Producing and recognizing these and other variations of individual words (e.g., recognizing that doors is plural) requires knowledge about morphology, the way words break down into component parts that carry meanings like singular versus plural.

Moving beyond individual words, HAL must use structural knowledge to properly string together the words that constitute its response. For example, HAL must know that the following sequence of words will not make sense to Dave, despite the fact that it contains precisely the same set of words as the original:

I’m I do, sorry that afraid Dave I’m can’t.

The knowledge needed to order and group words together comes under the heading of syntax.

Now consider a question answering system dealing with the following question:

• How much Chinese silk was exported to Western Europe by the end of the 18th century?

In order to answer this question we need to know something about lexical semantics, the meaning of all the words (export, or silk), as well as compositional semantics (what exactly constitutes Western Europe as opposed to Eastern or Southern Europe, and what does end mean when combined with the 18th century). We also need to know something about the relationship of the words to the syntactic structure. For example, we need to know that by the end of the 18th century is a temporal end-point, and not a description of the agent, as the by-phrase is in the following sentence:

• How much Chinese silk was exported to Western Europe by southern merchants?

We also need the kind of knowledge that lets HAL determine that Dave’s utterance is a request for action, as opposed to a simple statement about the world or a question about the door, as in the following variations of his original statement:

REQUEST: HAL, open the pod bay door.
STATEMENT: HAL, the pod bay door is open.
INFORMATION QUESTION: HAL, is the pod bay door open?

Next, despite its bad behavior, HAL knows enough to be polite to Dave. It could, for example, have simply replied No or No, I won’t open the door. Instead, it first embellishes its response with the phrases I’m sorry and I’m afraid, and then only indirectly signals its refusal by saying I can’t, rather than the more direct (and truthful) I won’t. (For those unfamiliar with HAL, it is neither sorry nor afraid, nor is it incapable of opening the door. It has simply decided in a fit of paranoia to kill its crew.) This knowledge about the kind of actions that speakers intend by their use of sentences is pragmatic or dialogue knowledge.

Another kind of pragmatic or discourse knowledge is required to answer the question

• How many states were in the United States that year?

What year is that year? In order to interpret words like that year, a question answering system needs to examine the earlier questions that were asked; in this case the previous question talked about the year that Lincoln was born. Thus this task of coreference resolution makes use of knowledge about how words like that or pronouns like it or she refer to previous parts of the discourse.

To summarize, engaging in complex language behavior requires various kinds of knowledge of language:

• Phonetics and Phonology: knowledge about linguistic sounds
• Morphology: knowledge of the meaningful components of words
• Syntax: knowledge of the structural relationships between words
• Semantics: knowledge of meaning
• Pragmatics: knowledge of the relationship of meaning to the goals and intentions of the speaker
• Discourse: knowledge about linguistic units larger than a single utterance

AMBIGUITY

A perhaps surprising fact about these categories of linguistic knowledge is that most tasks in speech and language processing can be viewed as resolving ambiguity at one of these levels. We say some input is ambiguous if there are multiple alternative linguistic structures that can be built for it. Consider the spoken sentence I made her duck. Here are five different meanings this sentence could have (see if you can think of some more), each of which exemplifies an ambiguity at some level:

(1.1) I cooked waterfowl for her.
(1.2) I cooked waterfowl belonging to her.
(1.3) I created the (plaster?) duck she owns.
(1.4) I caused her to quickly lower her head or body.
(1.5) I waved my magic wand and turned her into undifferentiated waterfowl.

These different meanings are caused by a number of ambiguities. First, the words duck and her are morphologically or syntactically ambiguous in their part-of-speech. Duck can be a verb or a noun, while her can be a dative pronoun or a possessive pronoun. Second, the word make is semantically ambiguous; it can mean create or cook. Finally, the verb make is syntactically ambiguous in a different way. Make can be transitive, that is, taking a single direct object (1.2), or it can be ditransitive, that is, taking two objects (1.5), meaning that the first object (her) got made into the second object (duck). Finally, make can take a direct object and a verb (1.4), meaning that the object (her) got caused to perform the verbal action (duck). Furthermore, in a spoken sentence, there is an even deeper kind of ambiguity: the first word could have been eye or the second word maid.

We will often introduce the models and algorithms we present throughout the book as ways to resolve or disambiguate these ambiguities. For example, deciding whether duck is a verb or a noun can be solved by part-of-speech tagging. Deciding whether make means “create” or “cook” can be solved by word sense disambiguation. Resolution of part-of-speech and word sense ambiguities are two important kinds of lexical disambiguation. A wide variety of tasks can be framed as lexical disambiguation problems. For example, a text-to-speech synthesis system reading the word lead needs to decide whether it should be pronounced as in lead pipe or as in lead me on. By contrast, deciding whether her and duck are part of the same entity or are different entities is an example of syntactic disambiguation and can be addressed by probabilistic parsing. Ambiguities that don’t arise in this particular example (like whether a given sentence is a statement or a question) will also be resolved, for example by speech act interpretation.

MODELS AND ALGORITHMS

One of the key insights of the last 50 years of research in language processing is that the various kinds of knowledge described in the last sections can be captured through the use of a small number of formal models, or theories. Fortunately, these models and theories are all drawn from the standard toolkits of computer science, mathematics, and linguistics and should be generally familiar to those trained in those fields. Among the most important models are state machines, rule systems, logic, probabilistic models, and vector-space models. These models, in turn, lend themselves to a small number of algorithms, among the most important of which are state space search algorithms such as dynamic programming, and machine learning algorithms such as classifiers, Expectation-Maximization (EM), and other learning algorithms.

In their simplest formulation, state machines are formal models that consist of states, transitions among states, and an input representation. Some of the variations of this basic model that we will consider are deterministic and non-deterministic finite-state automata and finite-state transducers.

Closely related to these models are their declarative counterparts: formal rule systems. Among the more important ones we will consider are regular grammars and regular relations, context-free grammars, and feature-augmented grammars, as well as probabilistic variants of them all. State machines and formal rule systems are the main tools used when dealing with knowledge of phonology, morphology, and syntax.

The third model that plays a critical role in capturing knowledge of language is logic. We will discuss first order logic, also known as the predicate calculus, as well as such related formalisms as lambda-calculus, feature-structures, and semantic primitives. These logical representations have traditionally been used for modeling semantics and pragmatics, although more recent work has focused on more robust techniques drawn from non-logical lexical semantics.

Probabilistic models are crucial for capturing every kind of linguistic knowledge. Each of the other models (state machines, formal rule systems, and logic) can be augmented with probabilities. For example, the state machine can be augmented with probabilities to become the weighted automaton or Markov model. We will spend a significant amount of time on hidden Markov models or HMMs, which are used everywhere in the field, in part-of-speech tagging, speech recognition, dialogue understanding, text-to-speech, and machine translation. The key advantage of probabilistic models is their ability to solve the many kinds of ambiguity problems that we discussed earlier; almost any speech and language processing problem can be recast as: “given N choices for some ambiguous input, choose the most probable one”.

Finally, vector-space models, based on linear algebra, underlie information retrieval and many treatments of word meanings.

Processing language using any of these models typically involves a search through a space of states representing hypotheses about an input. In speech recognition, we search through a space of phone sequences for the correct word. In parsing, we search through a space of trees for the syntactic parse of an input sentence. In machine translation, we search through a space of translation hypotheses for the correct translation of a sentence into another language. For non-probabilistic tasks, such as those involving state machines, we use well-known graph algorithms such as depth-first search. For probabilistic tasks, we use heuristic variants such as best-first and A* search, and rely on dynamic programming algorithms for computational tractability.

For many language tasks, we rely on machine learning tools like classifiers and sequence models. Classifiers like decision trees, support vector machines, Gaussian Mixture Models, and logistic regression are very commonly used. A hidden Markov model is one kind of sequence model; others are Maximum Entropy Markov Models and Conditional Random Fields.

Another tool that is related to machine learning is methodological: the use of distinct training and test sets, statistical techniques like cross-validation, and careful evaluation of our trained systems.

LANGUAGE, THOUGHT, AND UNDERSTANDING

To many, the ability of computers to process language as skillfully as we humans do will signal the arrival of truly intelligent machines. The basis of this belief is the fact that the effective use of language is intertwined with our general cognitive abilities. Among the first to consider the computational implications of this intimate connection was Alan Turing (1950). In this famous paper, Turing introduced what has come to be known as the Turing Test. Turing began with the thesis that the question of what it would mean for a machine to think was essentially unanswerable due to the inherent imprecision in the terms machine and think. Instead, he suggested an empirical test, a game, in which a computer’s use of language would form the basis for determining if it could think. If the machine could win the game it would be judged intelligent.

In Turing’s game, there are three participants: two people and a computer. One of the people is a contestant and plays the role of an interrogator. To win, the interrogator must determine which of the other two participants is the machine by asking a series of questions via a teletype. The task of the machine is to fool the interrogator into believing it is a person by responding as a person would to the interrogator’s questions. The task of the second human participant is to convince the interrogator that the other participant is the machine, and that they are human.

The following interaction from Turing’s paper illustrates the kind of interactions he had in mind. Clearly a convincing simulation of humanity doesn’t require being an expert in all domains:

Q: Please write me a sonnet on the topic of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give answer as) 105621.

Turing predicted that roughly by the end of the twentieth century a machine with 10 gigabytes of memory would have around a 30% chance of fooling a human interrogator after 5 minutes of questioning. Given the fact that you can fool some of the people all the time, it is not clear how rigorous this particular standard is. Nevertheless, the critical issue for Turing was that using language as humans do is sufficient, by itself, as an operational test for intelligence.

Surprisingly, many of the issues raised by Turing became relevant as early as 1966 in the context of the ELIZA program (Weizenbaum, 1966). ELIZA was an early natural language processing system capable of carrying on a limited form of conversation with a user. Consider the following session with a version of ELIZA that imitated the responses of a Rogerian psychotherapist:

User: You are like my father in some ways.
ELIZA: WHAT RESEMBLANCE DO YOU SEE
User: You are not very aggressive but I think you don’t want me to notice that.
ELIZA: WHAT MAKES YOU THINK I AM NOT AGGRESSIVE
User: You don’t argue with me.
ELIZA: WHY DO YOU THINK I DON’T ARGUE WITH YOU
User: You are afraid of me.
ELIZA: DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU

As we will discuss in more detail in Ch. 2, ELIZA is a remarkably simple program that makes use of pattern-matching to process the input and translate it into suitable outputs. The success of this simple technique in this domain is due to the fact that ELIZA doesn’t actually need to know anything to mimic a Rogerian psychotherapist. As Weizenbaum notes, this is one of the few dialo…
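
The pattern-matching idea attributed to ELIZA above can be illustrated with a minimal Python sketch. This is not Weizenbaum's actual script: the rule list `RULES`, the pronoun table `SWAPS`, the `FALLBACK` reply, and the function names are all hypothetical, chosen only so that the four user turns in the session above produce the replies shown there. Each rule pairs a regular expression with a response template, and the first matching rule wins.

```python
import re

# Hypothetical first/second-person swaps applied to captured text.
SWAPS = {"me": "you", "my": "your", "i": "you", "you": "me", "your": "my"}

# Ordered (pattern, response template) pairs; the first match wins.
# These rules are illustrative assumptions, not ELIZA's original rule set.
RULES = [
    (re.compile(r"\byou are like\b", re.I),
     "WHAT RESEMBLANCE DO YOU SEE"),
    (re.compile(r"\byou are not (?:very )?(\w+)", re.I),
     "WHAT MAKES YOU THINK I AM NOT {0}"),
    (re.compile(r"\byou are (afraid of [^.,!?]*)", re.I),
     "DOES IT PLEASE YOU TO BELIEVE I AM {0}"),
    (re.compile(r"\byou don't ([^.,!?]*)", re.I),
     "WHY DO YOU THINK I DON'T {0}"),
]
FALLBACK = "PLEASE GO ON"


def swap_pronouns(fragment):
    """Flip first- and second-person words in a captured fragment."""
    words = fragment.lower().split()
    return " ".join(SWAPS.get(word, word) for word in words)


def respond(utterance):
    """Return the reply of the first rule whose pattern matches."""
    # Treat typographic apostrophes like plain ones before matching.
    utterance = utterance.replace("\u2019", "'")
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            captured = [swap_pronouns(group) for group in match.groups()]
            return template.format(*captured).upper()
    return FALLBACK


if __name__ == "__main__":
    for line in ["You are like my father in some ways.",
                 "You don't argue with me.",
                 "You are afraid of me."]:
        print("User: ", line)
        print("ELIZA:", respond(line))
```

The sketch has no representation of meaning at all: a reply is produced purely by matching surface patterns and echoing captured words back, which is the point the passage makes about why such a simple technique can appear to work in this domain.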
