SAS_-_Getting.Started.With.SAS.9.1.Text.Miner.pdf

王永宏
2013-04-28

Summary: This document is "SAS_-_Getting.Started.With.SAS.9.1.Text.Minerpdf" and is suitable for the higher-education field.

Getting Started with SAS® Text Miner

The correct bibliographic citation for this manual is as follows: SAS Institute Inc. Getting Started with SAS® Text Miner. Cary, NC: SAS Institute Inc.

Getting Started with SAS® Text Miner
Copyright © […], SAS Institute Inc., Cary, NC, USA
ISBN […]
All rights reserved. Produced in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

U.S. Government Restricted Rights Notice. Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR […] Commercial Computer Software-Restricted Rights (June […]).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina
[…]st printing, January […]

SAS Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site at support.sas.com/pubs or call […].

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Contents

Chapter: Introduction to Text Mining and SAS Text Miner
  What Is Text Mining?
  What Is SAS Text Miner?
  The Text Mining Process
  Tips for Text Mining

Chapter: Walking through SAS Text Miner with an Example
  Example Overview
  Creating a New Project
  Creating the Process Flow Diagram
  Viewing the Results

Chapter: Preparing to Analyze Documents: File Preprocessing
  What Text Sources Can SAS Text Miner Use?
  Using the tmfilter Macro to Convert Text Files to Data Sets

Chapter: Beginning the Analysis: Text Parsing
  What Is Text Parsing?
  Examples of Parsed Terms
  Using Part-of-Speech and Entity Categories
  Stop and Start Lists
  Handling Equivalent Terms: Stems and Synonyms
  Customizing Equivalent Terms

Chapter: Producing a Quantitative Representation of the Document Collection
  From Text to Numbers
  Weighting Functions
  Transformation (Dimension Reduction)

Chapter: Exploration and Prediction
  Overview
  Hierarchical Clustering
  Expectation-Maximization Clustering
  Classification and Prediction

Chapter: Examples
  Example: Classifying News Articles
  Example: Scoring New Documents

Appendix: References
  References

Appendix: Recommended Reading
  Recommended Reading

Index

CHAPTER: Introduction to Text Mining and SAS Text Miner

What Is Text Mining?

The purpose of text mining is to help you understand what the text tells you without having to read every word. Text mining applications fall into two areas: exploring the textual data for its content, and then using the information to improve the existing processes. Both are important, and can be referred to as descriptive mining and predictive mining.

Descriptive mining involves discovering the themes and concepts that exist in a textual collection. For example, many companies collect customers' comments from sources that include the Web, e-mail, and a call center. Mining the textual comments includes providing detailed information about the terms, phrases, and other entities in the textual collection, clustering the documents into meaningful groups, and reporting the concepts that are discovered in the clusters. The result enables you to better understand the collection.

Predictive mining involves classifying the documents into categories and using the information that is implicit in the text for decision making. You might want to identify the customers who ask standard questions so that they receive an automated answer. Or you might want to predict whether a customer is likely to buy again, or even if you should spend more effort in keeping him or her as a customer. In data mining terminology, this is known as predictive modeling.

Predictive modeling involves examining past data to predict future results. You might have a data set that contains information about past buying behaviors, along with comments that the customers made. You can then build a predictive model that can be used to score new customers: that is, in the past these customers did this, so if new customers have similar comments, they are likely to do the same thing. For example, if you are a researcher for a pharmaceutical company, you know that hand-coding adverse reactions from doctors' reports in a clinical study is a laborious, error-prone job. Instead, you could train a model by using all your historical textual data, noting which doctors' reports correspond to which adverse reactions. When the model is constructed, processing the textual data can be done automatically by scoring new records that come in. You would just have to examine the "hard-to-classify" examples, and let the computer handle all the rest.

Both of the above aspects of text mining share some of the same requirements. Namely, text documents that human beings can easily understand must first be represented in a form that can be mined. The raw documents need processing before the patterns and relationships that they contain can be discovered. Although the human mind comprehends chapters, paragraphs, and sentences, computers require structured (quantitative or qualitative) data. As a result, an unstructured document must be converted to a structured form before it can be mined.

What Is SAS Text Miner?

SAS Text Miner contains a sophisticated Text Miner node that can be embedded into a SAS Enterprise Miner process flow diagram. The node analyzes text that exists in a SAS data set, that is in an external database through SAS/ACCESS, or as files in a file system. The Text Miner node encompasses the parsing and exploration aspect of text mining, and sets up the data for predictive mining and further exploration using the rest of the Enterprise Miner nodes. This enables you to analyze the new structured information that you have acquired from the text however you want, combining it with other structured data as desired.

The node is highly customizable and allows a variety of parsing options. It is possible to parse documents for detailed information about the terms, phrases, and other entities in the collection. You can also cluster the documents into meaningful groups and report the concepts that you discover in the clusters. All of this is done in an environment that enables you to interact with the collection. Sorting, searching, filtering (subsetting), and finding similar terms or documents all enhance the exploration process.

The Text Miner node's extensive parsing capabilities include
  • stemming
  • automatic recognition of multiple-word terms
  • normalization of various entities such as dates, currency, percent, and year
  • part-of-speech tagging
  • extraction of entities such as organizations, products, social security numbers, time, titles, and more
  • support for synonyms

A secondary tool that Text Miner uses is a SAS macro that is called tmfilter. This macro accomplishes a text preprocessing step and allows SAS data sets to be created from documents that reside in your file system or on the Web pages. These documents can exist in a number of proprietary formats.

SAS Text Miner is part of Enterprise Miner. Enterprise Miner provides a rich set of data mining tools that facilitate the prediction aspect of text mining. The integration of Text Miner within Enterprise Miner enables the combining of textual data with traditional data-mining variables.

With all of this functionality, SAS Text Miner becomes a very flexible tool that can be used to solve a variety of problems. Below are some examples of tasks that can be accomplished:
  • filtering e-mail
  • grouping documents by topic into predefined categories
  • routing news items
  • clustering analysis of research papers in a database
  • clustering analysis of survey data
  • clustering analysis of customer complaints and comments
  • predicting stock market prices from business news announcements
  • predicting customer satisfaction from customer comments
  • predicting cost, based on call center logs

The Text Mining Process

Whether you intend to use textual data for descriptive purposes, predictive purposes, or both, the same processing steps take place, as shown in the following table.

Table: The General Order for Text Mining

Action | Result
File preprocessing | Creates a single SAS data set from your document collection. The SAS data set will be used as input for the Text Mining node. (This is an optional step. Do this if the text is not already in a SAS data set or external database.)
Text parsing | Decomposes textual data and generates a quantitative representation suitable for data mining purposes.
Transformation (dimension reduction) | Transforms the quantitative representation into a compact and informative format.
Document analysis | Performs clustering or classification of the document collection.

Finally, the rules for clustering or predictions can be used to score a new collection of documents at any time.

You might or might not include all of these steps in your analysis, and it might be necessary to try a different combination of text parsing options before you are satisfied with the results.

Tips for Text Mining

Using the Text Miner node to process a very large collection of documents can require a lot of computing time and resources. If you have limited resources, it might be necessary to take one or more of the following actions:
  • use a sample of the document collection
  • deselecting some options in the Text Miner Settings window, such as stemming and entity extraction, and the search for words that occur in a single document
  • reduce the number of SVD dimensions or roll-up terms. If you have memory problems when you use the SVD approach, you can roll up a certain number of terms, and drop the remaining terms. If you do that and perform SVD at the same time, only the rolled-up terms are used in the calculation of SVD. This way you can reduce the size of the problem.
  • limit parsing to high-information words by deselecting the parts of speech other than nouns, proper nouns, noun groups, and verbs

CHAPTER: Walking through SAS Text Miner with an Example

Example Overview

The following example is designed to help you start building a process flow diagram. Several key features are demonstrated:
  • specifying the input data set
  • configuring the Text Miner node settings for text parsing and for dimension reduction of the term-by-document frequency matrix
  • clustering the documents

Suppose that you work with a large collection of SUGI papers and the goals are to understand what these papers are about and to identify the papers that are related to data warehousing issues. A possible approach is to parse the documents into terms, and then group the documents based on the parsed term through a clustering analysis.

The SAMPSIO.ABSTRACT data set contains information about […] papers that were prepared for meetings of the SAS Users Group International from […] through […] (SUGI […] through […]). The following display shows a partial profile of the data set:

The data set contains two variables:
  • TITLE is the title of the SUGI paper
  • TEXT contains the abstract of the SUGI paper

Creating a New Project

To start Enterprise Miner, you must first have a session of SAS running. You open Enterprise Miner by typing miner in the command window in the upper left corner of an open SAS session.

1. Select New Project from the Enterprise Miner main menu. The Create New Project window opens.
2. Type the project name in the Name entry field.
3. A project location will be suggested. You can type a different location for storing the project in the Location entry field, or click Browse to search for a location by using a GUI interface.
4. After setting your project location, click Create. Enterprise Miner creates the project that contains a default diagram labeled "Untitled".
5. To rename the diagram, right-click the diagram icon or label and select Rename. Type a new diagram name.

Creating the Process Flow Diagram

Follow these steps to create the Process Flow Diagram:

Input Data Source Node

1. Add an Input Data Source node to the diagram workspace.
2. Open the node and set SAMPSIO.ABSTRACT as the source data set.
3. Select the Variables tab and assign a role of input to the variable TEXT.

Text Miner Node

1. Add a Text Miner node to the diagram workspace and connect it to the Input Data Source node.
2. Double-click the Text Miner node to open the Text Miner Settings window.
3. Select the Parse tab of the Text Miner Settings window.
  • Ensure that Variable to be parsed and Language are set to TEXT and English, respectively.
  • Use the Identify as Terms area to specify the items that are considered as terms in the analysis. In this example, words that occur in a single document, punctuation marks, and numbers are excluded. Ensure that only the Same word as different part of speech, Stemmed words as root form, Noun groups, and Entities: Names, Address, etc. check boxes are selected.
  • Use the Initial word lists area to specify the data sets for a stop or start list, and a synonym list. In this example, use SAMPSIO.SUGISTOP as the stop list. All the words that are in the stop list data set are excluded from the analysis.
4. Select the Transform tab of the Text Miner Settings window.
  • Select the Generate SVD dimensions when running node check box and ensure that the value of Maximum Dimensions is set to […].
  • Use the Weight area to specify the weighting methods. In this example, use the default settings.
5. Click OK to save the changes.
6. Click the Run tool icon to run the Text Miner node.

Viewing the Results

After the node is run successfully, open the Text Miner Results Browser. The Text Miner Results Browser displays two sections:
  • Documents table
  • Terms table

The Documents table displays information about the document collection. In this example, the Documents table displays the abstract and the title of SUGI papers. The Terms table displays information about all the terms that are found in the document collection. Use the Display dropped terms and Display kept terms check boxes to specify the type of terms that are displayed in the Terms table. You also can sort the Terms table by clicking the column heading. In this example, clear the selection of Display dropped terms and sort the table by the Term column. Following is an example display of the Text Miner Results Browser.

In the Terms table, a term that has a plus sign has a group of equivalent terms that you can view. Select the term ability, and from the main menu select Edit > View and Edit Equivalent Terms. A window opens to display a list of terms that are equivalent to ability. If you want to drop any of the equivalent terms, select the term and click OK. In this example, click Cancel to return to the Text Miner Results Browser.

Examining the parsed terms enables you to find those terms that should be treated the same. In this example, scroll down the Terms table, and you find that the terms calculate and compute have similar meaning and should be treated equivalently. The frequencies of these terms in the document collection are […] and […], respectively. You can press CTRL and then click to highlight these terms and select from the main menu Edit > Treat as Equivalent Terms. The Create Equivalent Terms window prompts you to select the representative term. Select the term compute and click OK.

As a result, the term compute and its row might be highlighted and displayed at the top of the Terms table. The frequency of compute is […], which is the sum of […] and […]. Note that the weight of the term compute is not really zero. The weight is updated when you request an action that depends on it, such as performing a cluster analysis.

If the term compute is not displayed at the top of the table, right-click in the Terms table and select Find. Type compute in the Find TERM containing entry field and click OK.

The terms that have a keep status of N are not used in the calculation of SVD dimensions. After examining the Terms table, you might want to eliminate some terms from the analysis such as "sas institute" and "sas institute inc." because all the SUGI papers are SAS-related. Select these terms in the Terms table and select from the main menu Edit > Toggle Keep Status to change the keep status of these terms from Y to N.

Now, you have finished making changes to the terms. The next step is to group the documents by applying a clustering analysis. Select from the main menu Tools > Cluster to open the Cluster Settings window. In the Cluster Settings window, you specify the clustering method to use, either hierarchical clustering or expectation-maximization clustering, and the inputs for the clustering analysis, either SVD dimensions or roll-up terms.

In this example, use the SVD dimensions as inputs for the expectation-maximization clustering analysis, select Exact rather than Maximum, and change the number of clusters to […]. Also, change the number of terms to describe clusters to […]. Click OK to generate the clusters.

The Clusters table displays the descriptive terms for each of the eight clusters. Also, the Documents table displays another variable, Cluster ID, which represents the cluster that a document is grouped into.

In this example, all the documents are grouped into ten clusters. By examining the descriptive terms in each cluster, you see the following clusters in the collection of SUGI abstracts.

Table: Clusters Extracted from the Abstracts

Descriptive Terms | Cluster
statement, macro, format, option, macro, program, code, set, table, step, variable, number, programer, report, write | programming language
sas/af, scl, gui, screen, entry, frame, control, developer, object, run, build, interface, development, environment | SAS/AF issues
warehouse, administrator, sas/warehouse, warehouse, olap, warehousing, warehouse, data warehouse, enterprise, support, build, support, management, product, business | data warehousing issues
delivery, technology, organization, year, business, management, development, solution, 's, problem, develop, good, process, provide, report | spurious cluster
sas/intrnet, output, html, web, output, graph, output, delivery, page, create, sas/graph, web, browser, ods, version | output issues
business, customer, decision, relationship, process, problem, solution, between, management, approach, such as, set, used to, technique, through | CRM issues
windows, server, feature, nt, client, version, java, server, performance, platform, window
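
The parsing ideas used in the walkthrough (a stop list, equivalent terms that share one representative term, and a term-by-document frequency matrix) can be illustrated outside of SAS. The following is a minimal Python sketch under stated assumptions, not the Text Miner node's implementation: the function names `parse` and `term_by_document`, the two sample sentences, and the small stop list are hypothetical, and the tokenizer is a plain whitespace split with no stemming, part-of-speech tagging, or entity extraction.

```python
from collections import Counter


def parse(text, stop_list, equivalents):
    """Split text into lowercase terms, drop stop-list words, and map
    each remaining term to its representative (equivalent) term."""
    terms = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?()\"'")  # trim surrounding punctuation
        if not word or word in stop_list:
            continue
        terms.append(equivalents.get(word, word))
    return terms


def term_by_document(docs, stop_list=frozenset(), equivalents=None):
    """Return {term: [frequency in doc 0, frequency in doc 1, ...]}."""
    equivalents = equivalents or {}
    counts = [Counter(parse(d, stop_list, equivalents)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    return {term: [c[term] for c in counts] for term in vocabulary}


# Hypothetical two-document collection (not taken from SAMPSIO.ABSTRACT).
docs = [
    "We calculate the mean and compute the variance.",
    "The macro can compute totals for the warehouse report.",
]

matrix = term_by_document(
    docs,
    stop_list={"the", "and", "we", "for", "can"},
    equivalents={"calculate": "compute"},  # compute is the representative term
)

for term, row in matrix.items():
    print(f"{term:10s} {row}")
```

Because calculate is mapped to compute before counting, the row for compute holds the combined frequency of both terms, which mirrors the behavior described for Treat as Equivalent Terms. Each row of this matrix is the kind of raw input that weighting and SVD dimension reduction would then operate on.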
