

DRAFT
Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition
Daniel Jurafsky and James H. Martin. Copyright ©. All rights reserved. Draft of June. Do not cite without permission.

INTRODUCTION

Dave Bowman: Open the pod bay doors, HAL.
HAL: I’m sorry Dave, I’m afraid I can’t do that.
Stanley Kubrick and Arthur C. Clarke, screenplay of 2001: A Space Odyssey

This book is about a new interdisciplinary field variously called computer speech and language processing or human language technology or natural language processing or computational linguistics. The goal of this new field is to get computers to perform useful tasks involving human language, tasks like enabling human-machine communication, improving human-human communication, or simply doing useful processing of text or speech.

One example of a useful such task is a conversational agent. The HAL 9000 computer in Stanley Kubrick’s film 2001: A Space Odyssey is one of the most recognizable characters in twentieth-century cinema. HAL is an artificial agent capable of such advanced language processing behavior as speaking and understanding English, and at a crucial moment in the plot, even reading lips. It is now clear that HAL’s creator Arthur C. Clarke was a little optimistic in predicting when an artificial agent such as HAL would be available. But just how far off was he? What would it take to create at least the language-related parts of HAL? We call programs like HAL that converse with humans via natural language conversational agents or dialogue systems. In this text we study the various components that make up modern conversational agents, including language input (automatic speech recognition and natural language understanding) and language output (natural language generation and speech synthesis).

Let’s turn to another useful language-related task, that of making available to non-English-speaking readers the vast amount of scientific information on the Web in English, or translating for English speakers the hundreds of millions of Web pages written in other languages like Chinese. The goal of machine translation is to automatically translate a document from one language to another. Machine translation is far from a solved problem; we will cover the algorithms currently used in the field, as well as important component tasks.

Many other language processing tasks are also related to the Web. Another such task is Web-based question answering. This is a generalization of simple web search, where instead of just typing keywords a user might ask complete questions, ranging from easy to hard, like the following:

• What does “divergent” mean?
• What year was Abraham Lincoln born?
• How many states were in the United States that year?
• How much Chinese silk was exported to England by the end of the 18th century?
• What do scientists think about the ethics of human cloning?

Some of these, such as definition questions, or simple factoid questions like dates and locations, can already be answered by search engines. But answering more complicated questions might require extracting information that is embedded in other text on a Web page, or doing inference (drawing conclusions based on known facts), or synthesizing and summarizing information from multiple sources or web pages. In this text we study the various components that make up modern understanding systems of this kind, including information extraction, word sense disambiguation, and so on.

Although the subfields and problems we’ve described above are all very far from completely solved, these are all very active research areas and many technologies are already available commercially. In the rest of this chapter we briefly summarize the kinds of knowledge that are necessary for these tasks (and others like spell correction, grammar checking, and so on), as well as the mathematical models that will be introduced throughout the book.

KNOWLEDGE IN SPEECH AND LANGUAGE PROCESSING

What distinguishes language processing applications from other data processing systems is their use of knowledge of language. Consider the Unix wc program, which is used to count the total number of bytes, words, and lines in a text file. When used to count bytes and lines, wc is an ordinary data processing application. However, when it is used to count the words in a file it requires knowledge about what it means to be a word, and thus becomes a language processing system.

Of course, wc is an extremely simple system with an extremely limited and impoverished knowledge of language. Sophisticated conversational agents like HAL, or machine translation systems, or robust question answering systems, require much broader and deeper knowledge of language. To get a feeling for the scope and kind of required knowledge, consider some of what HAL would need to know to engage in the dialogue that begins this chapter, or what a question answering system would need to know to answer one of the questions above.

HAL must be able to recognize words from an audio signal and to generate an audio signal from a sequence of words. These tasks of speech recognition and speech synthesis require knowledge about phonetics and phonology: how words are pronounced in terms of sequences of sounds, and how each of these sounds is realized acoustically.

Note also that unlike Star Trek’s Commander Data, HAL is capable of producing contractions like I’m and can’t. Producing and recognizing these and other variations of individual words (e.g., recognizing that doors is plural) requires knowledge about morphology, the way words break down into component parts that carry meanings like singular versus plural.

Moving beyond individual words, HAL must use structural knowledge to properly string together the words that constitute its response. For example, HAL must know that the following sequence of words will not make sense to Dave, despite the fact that it contains precisely the same set of words as the original:

I’m I do, sorry that afraid Dave I’m can’t.

The knowledge needed to order and group words together comes under the heading of syntax.

Now consider a question answering system dealing with the following question:

• How much Chinese silk was exported to Western Europe by the end of the 18th century?

In order to answer this question we need to know something about lexical semantics, the meaning of all the words (export, or silk), as well as compositional semantics (what exactly constitutes Western Europe as opposed to Eastern or Southern Europe, and what does end mean when combined with the 18th century). We also need to know something about the relationship of the words to the syntactic structure. For example, we need to know that by the end of the 18th century is a temporal end-point, and not a description of the agent, as the by-phrase is in the following sentence:

• How much Chinese silk was exported to Western Europe by southern merchants?

We also need the kind of knowledge that lets HAL determine that Dave’s utterance is a request for action, as opposed to a simple statement about the world or a question about the door, as in the following variations of his original statement:

REQUEST: HAL, open the pod bay door.
STATEMENT: HAL, the pod bay door is open.
INFORMATION QUESTION: HAL, is the pod bay door open?

Next, despite its bad behavior, HAL knows enough to be polite to Dave. It could, for example, have simply replied No or No, I won’t open the door. Instead, it first embellishes its response with the phrases I’m sorry and I’m afraid, and then only indirectly signals its refusal by saying I can’t, rather than the more direct (and truthful) I won’t.¹ This knowledge about the kind of actions that speakers intend by their use of sentences is pragmatic or dialogue knowledge.

¹ For those unfamiliar with HAL, it is neither sorry nor afraid, nor is it incapable of opening the door. It has simply decided in a fit of paranoia to kill its crew.

Another kind of pragmatic or discourse knowledge is required to answer the question:

• How many states were in the United States that year?

What year is that year? In order to interpret words like that year, a question answering system needs to examine the earlier questions that were asked; in this case the previous question talked about the year that Lincoln was born. Thus this task of coreference resolution makes use of knowledge about how words like that or pronouns like it or she refer to previous parts of the discourse.

To summarize, engaging in complex language behavior requires various kinds of knowledge of language:

• Phonetics and Phonology: knowledge about linguistic sounds
• Morphology: knowledge of the meaningful components of words
• Syntax: knowledge of the structural relationships between words
• Semantics: knowledge of meaning
• Pragmatics: knowledge of the relationship of meaning to the goals and intentions of the speaker
• Discourse: knowledge about linguistic units larger than a single utterance

AMBIGUITY

A perhaps surprising fact about these categories of linguistic knowledge is that most tasks in speech and language processing can be viewed as resolving ambiguity at one of these levels. We say some input is ambiguous if there are multiple alternative linguistic structures that can be built for it. Consider the spoken sentence I made her duck. Here are five different meanings this sentence could have (see if you can think of some more), each of which exemplifies an ambiguity at some level:

(1) I cooked waterfowl for her.
(2) I cooked waterfowl belonging to her.
(3) I created the (plaster) duck she owns.
(4) I caused her to quickly lower her head or body.
(5) I waved my magic wand and turned her into undifferentiated waterfowl.

These different meanings are caused by a number of ambiguities. First, the words duck and her are morphologically or syntactically ambiguous in their part-of-speech. Duck can be a verb or a noun, while her can be a dative pronoun or a possessive pronoun. Second, the word make is semantically ambiguous; it can mean create or cook. Third, the verb make is syntactically ambiguous in a different way. Make can be transitive, that is, taking a single direct object (2), or it can be ditransitive, that is, taking two objects (5), meaning that the first object (her) got made into the second object (duck). Finally, make can take a direct object and a verb (4), meaning that the object (her) got caused to perform the verbal action (duck). Furthermore, in a spoken sentence, there is an even deeper kind of ambiguity: the first word could have been eye or the second word maid.

We will often introduce the models and algorithms we present throughout the book as ways to resolve or disambiguate these ambiguities. For example, deciding whether duck is a verb or a noun can be solved by part-of-speech tagging. Deciding whether make means “create” or “cook” can be solved by word sense disambiguation. Resolution of part-of-speech and word sense ambiguities are two important kinds of lexical disambiguation. A wide variety of tasks can be framed as lexical disambiguation problems. For example, a text-to-speech synthesis system reading the word lead needs to decide whether it should be pronounced as in lead pipe or as in lead me on. By contrast, deciding whether her and duck are part of the same entity (as in (2) or (3)) or are different entities (as in (1)) is an example of syntactic disambiguation and can be addressed by probabilistic parsing. Ambiguities that don’t arise in this particular example (like whether a given sentence is a statement or a question) will also be resolved, for example by speech act interpretation.

MODELS AND ALGORITHMS

One of the key insights of the last several decades of research in language processing is that the various kinds of knowledge described in the last sections can be captured through the use of a small number of formal models, or theories. Fortunately, these models and theories are all drawn from the standard toolkits of computer science, mathematics, and linguistics and should be generally familiar to those trained in those fields. Among the most important models are state machines, rule systems, logic, probabilistic models, and vector-space models. These models, in turn, lend themselves to a small number of algorithms, among the most important of which are state space search algorithms such as dynamic programming, and machine learning algorithms such as classifiers, EM, and other learning algorithms.

In their simplest formulation, state machines are formal models that consist of states, transitions among states, and an input representation. Some of the variations of this basic model that we will consider are deterministic and non-deterministic finite-state automata and finite-state transducers.

Closely related to these models are their declarative counterparts: formal rule systems. Among the more important ones we will consider are regular grammars and regular relations, context-free grammars, and feature-augmented grammars, as well as probabilistic variants of them all. State machines and formal rule systems are the main tools used when dealing with knowledge of phonology, morphology, and syntax.

The third model that plays a critical role in capturing knowledge of language is logic. We will discuss first-order logic, also known as the predicate calculus, as well as such related formalisms as lambda-calculus, feature structures, and semantic primitives. These logical representations have traditionally been used for modeling semantics and pragmatics, although more recent work has focused on more robust techniques drawn from non-logical lexical semantics.

Probabilistic models are crucial for capturing every kind of linguistic knowledge. Each of the other models (state machines, formal rule systems, and logic) can be augmented with probabilities. For example, the state machine can be augmented with probabilities to become the weighted automaton or Markov model. We will spend a significant amount of time on hidden Markov models or HMMs, which are used everywhere in the field, in part-of-speech tagging, speech recognition, dialogue understanding, text-to-speech, and machine translation. The key advantage of probabilistic models is their ability to solve the many kinds of ambiguity problems that we discussed earlier; almost any speech and language processing problem can be recast as: “given N choices for some ambiguous input, choose the most probable one”.

Finally, vector-space models, based on linear algebra, underlie information retrieval and many treatments of word meanings.

Processing language using any of these models typically involves a search through a space of states representing hypotheses about an input. In speech recognition, we search through a space of phone sequences for the correct word. In parsing, we search through a space of trees for the syntactic parse of an input sentence. In machine translation, we search through a space of translation hypotheses for the correct translation of a sentence into another language. For non-probabilistic tasks, such as tasks involving state machines, we use well-known graph algorithms such as depth-first search. For probabilistic tasks, we use heuristic variants such as best-first and A* search, and rely on dynamic programming algorithms for computational tractability.

For many language tasks, we rely on machine learning tools like classifiers and sequence models. Classifiers like decision trees, support vector machines, Gaussian Mixture Models, and logistic regression are very commonly used. A hidden Markov model is one kind of sequence model; others are Maximum Entropy Markov Models and Conditional Random Fields.

Another tool that is related to machine learning is methodological: the use of distinct training and test sets, statistical techniques like cross-validation, and careful evaluation of our trained systems.

LANGUAGE, THOUGHT, AND UNDERSTANDING

To many, the ability of computers to process language as skillfully as we humans do will signal the arrival of truly intelligent machines. The basis of this belief is the fact that the effective use of language is intertwined with our general cognitive abilities. Among the first to consider the computational implications of this intimate connection was Alan Turing (1950). In this famous paper, Turing introduced what has come to be known as the Turing Test. Turing began with the thesis that the question of what it would mean for a machine to think was essentially unanswerable due to the inherent imprecision in the terms machine
and think. Instead, he suggested an empirical test, a game, in which a computer’s use of language would form the basis for determining if it could think. If the machine could win the game it would be judged intelligent.

In Turing’s game, there are three participants: two people and a computer. One of the people is a contestant and plays the role of an interrogator. To win, the interrogator must determine which of the other two participants is the machine by asking a series of questions via a teletype. The task of the machine is to fool the interrogator into believing it is a person by responding as a person would to the interrogator’s questions. The task of the second human participant is to convince the interrogator that the other participant is the machine, and that they are human.

The following interaction from Turing’s paper illustrates the kind of interactions he had in mind. Clearly a convincing simulation of humanity doesn’t require being an expert in all domains:

Q: Please write me a sonnet on the topic of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give answer as) 105621.

Turing predicted that roughly by the end of the twentieth century a machine with gigabytes of memory would have around a 30% chance of fooling a human interrogator after 5 minutes of questioning. Given the fact that you can fool some of the people all the time, it is not clear how rigorous this particular standard is. Nevertheless, the critical issue for Turing was that using language as humans do is sufficient, by itself, as an operational test for intelligence.

Surprisingly, many of the issues raised by Turing became relevant as early as 1966 in the context of the ELIZA program (Weizenbaum, 1966). ELIZA was an early natural language processing system capable of carrying on a limited form of conversation with a user. Consider the following session with a version of ELIZA that imitated the responses of a Rogerian psychotherapist:

User: You are like my father in some ways.
ELIZA: WHAT RESEMBLANCE DO YOU SEE
User: You are not very aggressive but I think you don’t want me to notice that.
ELIZA: WHAT MAKES YOU THINK I AM NOT AGGRESSIVE
User: You don’t argue with me.
ELIZA: WHY DO YOU THINK I DON’T ARGUE WITH YOU
User: You are afraid of me.
ELIZA: DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU

As we will discuss in more detail in a later chapter, ELIZA is a remarkably simple program that makes use of pattern-matching to process the input and translate it into suitable outputs. The success of this simple technique in this domain is due to the fact that ELIZA doesn’t actually need to know anything to mimic a Rogerian psychotherapist. As Weizenbaum notes, this is one of the few dialo
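ELIZA’s pattern-matching idea can be illustrated with a small sketch. Everything in it is an assumption for illustration: the names `REFLECTIONS`, `RULES`, `reflect`, and `respond`, the rule table, and the pronoun map are hypothetical, chosen so that the four user turns of the ELIZA session above produce the printed replies. This is not Weizenbaum’s original DOCTOR script. Each rule pairs a regular expression with a response template; the first matching rule wins, and any captured fragment has its first- and second-person words swapped before being echoed back.

```python
import re

# Hypothetical pronoun-reflection table (not ELIZA's original script):
# swaps first- and second-person words in text echoed back to the user.
REFLECTIONS = {
    "i": "you",
    "me": "you",
    "my": "your",
    "am": "are",
    "you": "me",
    "your": "my",
    "myself": "yourself",
}

# Hypothetical ordered rules: (compiled pattern, response template).
# The first pattern that matches wins, so more specific rules come first.
RULES = [
    (re.compile(r"\blike\b", re.IGNORECASE),
     "WHAT RESEMBLANCE DO YOU SEE"),
    (re.compile(r"\byou are not (?:very )?(\w+)", re.IGNORECASE),
     "WHAT MAKES YOU THINK I AM NOT {0}"),
    (re.compile(r"\byou don't (.*)", re.IGNORECASE),
     "WHY DO YOU THINK I DON'T {0}"),
    (re.compile(r"\byou are (.*)", re.IGNORECASE),
     "DOES IT PLEASE YOU TO BELIEVE I AM {0}"),
]

# Reply used when no rule matches.
DEFAULT_REPLY = "PLEASE GO ON"


def reflect(fragment: str) -> str:
    """Lowercase a captured fragment and swap first/second-person words."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(utterance: str) -> str:
    """Return the upper-cased reply built from the first matching rule."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            fragments = [reflect(group) for group in match.groups()]
            return template.format(*fragments).upper()
    return DEFAULT_REPLY


if __name__ == "__main__":
    for line in [
        "You are like my father in some ways.",
        "You are not very aggressive but I think you don't want me to notice that.",
        "You don't argue with me.",
        "You are afraid of me.",
    ]:
        print(f"User:  {line}")
        print(f"ELIZA: {respond(line)}")
```

Rule order carries much of the behavior here: if the general `you are (.*)` rule came first, it would capture “You are like my father” before the `like` rule, and “You are not very aggressive” before the more specific negation rule.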
