

FULLPAPERInternationalJournalofRecentTrendsinEngineering,Vol,No,NovemberEVISTA–InteractiveVisualClusteringSystemKThangavel,PAlagambigaiDepartmentofComputerScience,PeriyarUniversity,Salem,Tamilnadu,IndiaEmail:drktveluyahoocomDepartmentofComputerApplications,EaswariEngineeringCollege,Chennai,Tamilnadu,IndiaEmail:alagambigaiyahoocoinAbstractDuetotheenormousincreaseinthedata,exploringandanalyzingthemisincreasinglyimportantbutdifficulttoachieveInformationvisualizationandvisualdataminingcanhelptodealwiththisVisualdataexplorationhasahighpotentialandmanyapplicationssuchasfrauddetectionanddataminingwilluseinformationvisualizationtechnologyforanimproveddataanalysisTheadvantageofvisualdataexplorationisthattheuserisdirectlyinvolvedinthedataminingprocessTherearealargenumberofinformationvisualizationtechniqueswhichhavebeendevelopedoverthelastdecadetosupporttheexplorationoflargedatasetsVISTAisaninteractivevisualclusterrenderingsystemwhichinviteshumanintotheclusteringprocess,buttherearesomelimitationsinidentifyingtheclusterdistributionandhumancomputerinteractionInthispaper,weproposeanEnhancedVISTA(EVISTA)whichaddressesthesedrawbacksEVISTAimprovesthevisualizationintwoways:firstitusestheweightedvectornormalizationinsteadofmaxminnormalization,whichimprovesthedatavisualizationsuchthattheusercanunderstandtheunderlyingpatternwithouthumaninterventionSecondlyitcompletelyeliminatestheuseofαtuning,whichreducesthecomplexityinvisualdistancecomputationandeasesthehumancomputerinteractioninabetterwayTheexperimentresultsshowthatEVISTAexploretheunderlyingpatternofthedataseteffectivelyandreducestheuseroperationburdengreatlyIndexTermsClustering,EVISTA,Humancomputerinteraction,Informationvisualization,VisualdataminingIINTRODUCTIONDatavisualizationisessentialforunderstandingtheconceptofmultidimensionalspacesItallowstheusertoexplorethedataindifferentwaysatdifferentlevelsofabstractiontofindtherightlevelsofdetailsThereforetechniquesaremostusefuliftheyarehighlyinteractive,permitdirectmanipulationandinclude
arapidresponsetimeVisualizationisdefinedbywareas“agraphicalrepresentationofdataorconcepts”whichiseitheran“internalconstructofthemind”oran“externalartifactsupportingdecisionmaking”VisualizationprovidesvaluableassistancetothehumanbyrepresentinginformationvisuallyThisassistancemaybecalledcognitivesupportVisualizationcanprovidecognitivesupportthroughanumberofmechanismssuchasgroupingrelatedinformationforeasysearchandaccess,representinglargevolumesofdatainasmallspaceandimposingstructureondataandtaskscanreducetimecomplexity,allowinginteractiveexplorationthroughmanipulationofparametervaluesVisualizationtechniquescouldenhancethecurrentknowledgeanddatadiscoverymethodsbyincreasingtheuserinvolvementintheinteractiveprocessMorerecentlytherearealotofdiscussionsonvisualizationfordataminingVisualdataminingcanbeviewedasanintegrationofdatavisualizationanddatamining,Consideringvisualizationasasupportingtechnologyindatamining,fourpossibleapproachesarestatedinThefirstapproachistheusageofvisualizationtechniquetopresenttheresultsthatareobtainedfromminingthedatainthedatabaseSecondapproachisapplyingthedataminingtechniquetovisualizationbycapturingessentialsemanticsvisuallyThethirdapproachistousevisualizationtechniquestocomplementthedataminingtechniquesThefourthapproachusesvisualizationtechniquetosteerminingprocessIngeneral,visualizationcanbeusedtoexploredatatoconfirmahypothesisortomanipulateaviewExploratoryvisualizationcreatesadynamicscenarioinwhichinteractioniscriticalTheusernotnecessarilyknowthatwhathesheislookingfor,cansearchforstructuresortrendsandisattemptingtoarriveatsomehypothesisTheconfirmatoryvisualization,inwhichthesystemparametersareoftenpredeterminedandthevisualizationtoolsareusedtoconfirmorrefutethehypothesisThemanipulativevisualizationfocusesonrefiningthevisualizationtooptimizethepresentationVisualizationhasbeencategorizedintotwomajorareas:i)scientificvisualization–whichfocusesprimarilyonphysicaldatasuchashumanbody,etcii)Informationvisualization–whichfocusesonabstractnonphysical
data such as text, hierarchies, and statistical data. Data mining techniques are primarily oriented toward information visualization. Both scientific visualization and information visualization create graphical models and visual representations from data. These representations support direct user interaction for exploring the underlying data and gaining insight into the useful information embedded in it.

Visualization techniques have advantages over automatic methods, but they also raise some specific problems:
• limited visibility;
• visual bias caused by mapping the data set to a lower-dimensional visual representation;
• the need for easy-to-use visual interface operations;
• the need for reliable human-computer interaction.
In most visualization methods, human-computer interaction costs more than automated processing. In general, visual data mining differs from scientific visualization and has the following characteristics:
• a wide range of users;
• a wide choice of visualization techniques;
• an important dialog function.
The users of scientific visualization are scientists and engineers, who can tolerate some difficulty in using the system. A visual data mining system, in contrast, must be usable widely and easily by the general public.

ACADEMY PUBLISHER

With this issue in mind, this paper proposes a novel information visualization technique called the enhanced visual clustering system (EVISTA), an extension of VISTA. VISTA is a dynamic data visualization model that invites the human into the clustering process. VISTA has proved to be an efficient interactive visual cluster rendering system, but it requires complete user interaction throughout the clustering process. When the number of dimensions increases, the human-computer interaction becomes tedious. EVISTA is designed to provide efficient data visualization so that the user can understand the underlying pattern of the given data set without human intervention.

The rest of the paper is organized as follows. Section II reviews related work in the domain of information visualization. Section III describes EVISTA. Section IV discusses the experimental analysis. Section V concludes the paper.

II. RELATED WORKS

Various efforts have been made to visualize multidimensional data sets. Early research on general plot-based data visualization includes the Grand Tour and Projection Pursuit, whose purpose is to guide the user toward interesting projections. L. Yang uses the Grand Tour technique to show projections of data sets in an animation, projecting the dimensions to coordinates in a 3D space. However, when the 3D space is shown on a 2D screen, some axes may be overlapped by other axes, which makes it hard to perform direct interactions on dimensions. Star coordinates is an interactive visualization model that treats dimensions uniformly. Data are represented coarsely by simple, more space-efficient points, which results in less cluttered visualizations of large data sets. Interactive visual clustering (IVC) combines spring-embedded graph layout techniques with user interaction and constrained clustering. VISTA is a recent visualization model that uses the star coordinate system and provides a mapping function similar to that of star coordinates. The VISTA model offers two types of cluster rendering: unguided rendering and guided rendering.

III. ENHANCED VISUAL CLUSTERING SYSTEM

Enhanced VISTA (EVISTA) is an information visualization framework that employs improved data visualization to reveal hidden patterns in complex high-dimensional data sets without human intervention. The EVISTA model is based on star coordinates. The star coordinate system is a traditional multivariate data visualization technique. In it, the k axes are defined by an origin $\vec{O}(x_0, y_0)$ and k coordinates $\vec{S}_1, \vec{S}_2, \dots, \vec{S}_k$, which represent the k dimensions in a 2D space. The k coordinates are equidistantly distributed on the circumference of the circle C, and the unit vectors are obtained by

$$\vec{S}_j = \left(\cos\frac{2\pi j}{k},\ \sin\frac{2\pi j}{k}\right), \quad j = 1, 2, \dots, k \tag{1}$$

and the 2D point $Q(x, y)$ for data object $x_i$ is obtained by

$$Q_x = x_0 + c\sum_{j=1}^{k} x'_{ij}\cos\frac{2\pi j}{k}, \qquad Q_y = y_0 + c\sum_{j=1}^{k} x'_{ij}\sin\frac{2\pi j}{k} \tag{2}$$

where c is a scaling factor and

$$x'_{ij} = \frac{x_{ij}}{wt_i} \tag{3}$$

Here $x_i = (x_{i1}, \dots, x_{ik})$ is the given data object, $x'_{ij}$ is the value normalized by the weight $wt_i$ of that object, and

$$wt_i = \sqrt{\sum_{j=1}^{k} x_{ij}^{2}} \tag{4}$$

EVISTA adopts the VISTA visual cluster rendering design proposed by Keke Chen and L. Liu. This design provides an intuitive way to visualize clusters, with interactive feedback that encourages domain experts to participate in cluster revision and cluster validation. It allows the user to interactively observe potential clusters in a series of continuously changing visualizations through α-mapping. More importantly, it can incorporate algorithmic clustering results and serve as an effective validation and refinement tool for irregularly shaped clusters.

The VISTA system has two unique features. First, it implements a linear and reliable visualization model to interactively visualize multidimensional data sets in a 2D star coordinate space. Second, it provides a rich set of user-friendly interactive rendering operations. These operations let users validate and refine the cluster structure based on their visual experience and their domain knowledge. The VISTA visualization model consists of two linear mappings: max-min normalization followed by α-mapping. Equation (5) is the max-min normalization, which normalizes the columns of the data set to eliminate the dominating effect of large-valued columns:

$$v' = \frac{v - v_{\min}}{v_{\max} - v_{\min}} \tag{5}$$

where v is the original value, v′ is the normalized value, and $v_{\min}$ and $v_{\max}$ are the minimum and maximum of the column. The α-mapping maps k-dimensional points onto a two-dimensional visual space and allows the visual parameters to be tuned.

The proposed visualization model, EVISTA, uses weighted vector normalization performed on rows instead of columns, so that the visualization model defines a reliable position for $Q(x, y)$. EVISTA completely eliminates α tuning for two reasons. First, α-mapping is tedious when the number of dimensions is high. Second, each change in the α values requires a fresh visual distance computation. As the number of dimensions increases, this visual distance computation becomes costly, and the same happens when the number of data objects increases. Both effects make human-computer interaction ineffective and limit the applicability of VISTA.

IV. EXPERIMENTAL ANALYSIS

To illustrate the efficiency of the proposed visualization, empirical analyses are conducted on a number of benchmark data sets from the UCI machine learning repository. The performance of EVISTA is compared against the VISTA system and the automatic clustering algorithm K-Means. The experiments in VISTA are conducted by setting the α value to …. Table I gives the details of the data sets.

TABLE I. DETAILS OF DATA SETS

S.No  Data Set       No. of Attributes  No. of Classes  No. of Instances
1     Iris           …                  …               …
2     Breast Cancer  …                  …               …
3     Hepatitis      …                  …               …
4     Bupa           …                  …               …
5     Pima           …                  …               …
6     Australian     …                  …               …

A. Cluster Validation

Cluster validation is very important in cluster analysis, because clustering methods tend to produce clusters even for fairly homogeneous data sets. The quality of the clusters obtained through visual clustering is measured in terms of three classical methods proposed in […].
• The Rand index and the Jaccard coefficient are based on the agreement between the clustering results and the “ground truth”.
The classical validity measures depend heavily on the geometry or density of the clusters, and they do not work well for arbitrarily shaped clusters. In such cases, visual perception plays an important role in deciding on the right clusters.

B. Results and Discussion

Iris Data: The Iris data set is a benchmark data set widely used in pattern recognition and clustering. It consists of four-dimensional instances of three classes of plants, described by sepal length and width and petal length and width. The Iris data set contains three equally sized clusters. One cluster is linearly separable from the other two, and the latter two are not exactly linearly separable from each other.

Figure … shows the initial visualization of the Iris data set in the VISTA model, which suggests three clusters. The figure shows that one cluster is completely separated from the other two, while the remaining two overlap. After interactive visual clustering with suitable α tuning, the visual boundaries between the clusters become clearer. Figure … shows the visualization of the Iris data set after α tuning. As the literature on the Iris data set states, two of the clusters are not linearly separable. In VISTA this can be observed after fine tuning of α, together with a small region of overlapping data points. More importantly, users found it difficult to separate these two clusters.

Figure …. Visualization of Iris Data Set using VISTA system
Figure …. Visualization of Iris Data Set after α tuning using VISTA system
Figure …. Visualization of Iris Data Set using EVISTA system

In VISTA, domain knowledge plays a vital role in finding the optimum number of clusters. In general, domain knowledge in the form of labeled items obtained from traditional automatic clustering algorithms such as K-Means can be incorporated into the visual clustering process. A user without domain knowledge may fail to find the optimum clusters, because α tuning changes the distribution of the data points. Most automated clustering algorithms also require the number of clusters to be specified in advance, and this number may not match the real cluster distribution of the data set. This increases the complexity of the clustering process. EVISTA reduces this complexity by eliminating α.

Figure … shows the Iris data set visualized with the EVISTA model. One cluster is completely separated from the others, and the visual boundaries between the other two clusters are clearly identified. Only two data points overlap. Because EVISTA has no α tuning, repeated visual distance computation is eliminated, which reduces the time complexity. EVISTA does not require domain knowledge in any form, which eases human-computer interaction, and it visualizes the exact pattern of the given data set without human intervention.

Australian Data: The Australian data set concerns credit card applications. It is interesting because it has a good mix of attributes: continuous attributes, nominal attributes with a small number of values, and nominal attributes with a larger number of values. The data set also has missing values, which are filled in with a suitable statistics-based computation. It has two classes, with … instances in class A and … instances in class B.

Figure … shows the visualization of the Australian data set in VISTA, where possibly a single cluster is observed. During α tuning, the user can identify the two clusters. If α tuning is not performed carefully, however, the user may obtain a different pattern, which can cause confusion. Figure … shows the α tuning process, in which a four-cluster distribution appears. This leads to poor cluster quality, and in such a case domain knowledge is the only aid for identifying the optimum number of clusters. Figure … shows the cluster distribution using EVISTA, where two potential clusters are observed. Because α tuning is not part of the EVISTA model, the cluster distribution can be visualized clearly. EVISTA identifies the optimum number of clusters even when the user lacks domain knowledge such as the number of clusters, the cluster distribution, or the visualization model.

Pima Data: The Pima data set is the Pima Indians Diabetes Database, with … data objects. It has two classes, with class distribution … and …. Its attributes include the number of times pregnant, plasma glucose concentration, diastolic blood pressure (mm Hg), triceps skin fold thickness (mm), and the diabetes pedigree function. Figure … shows the VISTA visualization of the Pima Indian data set. When the Pima data set is visualized with VISTA, a single possible cluster is observed. Even suitable α tuning does not distinguish the clusters, and the boundary regions of the two clusters cannot be reliably identified. In contrast, the EVISTA visualization of the Pima data set clearly shows two potential clusters. Figure … shows that the Pima data set contains two potential clusters, with a few data objects scattered around the potential area. Because EVISTA does not require α tuning, the user can find the underlying pattern of the data set flexibly, without human intervention. With suitable geometric transformations such as scaling and rotation, the user can also examine the cluster distribution according to his or her visual perception.

C. Comparative Analysis

This part of the section compares the results of EVISTA with those of VISTA and of the centroid-based automatic clustering algorithm K-Means. In EVISTA, cluster labeling is performed by free-hand drawing. The area containing the potential data points is covered by a convex hull, and the data points inside the convex hull are labeled as a single cluster. The clustering results are evaluated with the Rand index and the Jaccard coefficient, as shown in Table II and Table III. The VISTA results were obtained by running the experiments several times and averaging the runs.

V. CONCLUSION
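The EVISTA projection of Section III applies row-wise weighted vector normalization and then the star-coordinate mapping. The Python sketch below shows that pipeline. It is a minimal illustration under stated assumptions, not the authors' implementation. It takes the weight wt_i to be the Euclidean norm of the object's attribute vector (an assumption about the exact form of the weight), places the origin at (0, 0), and uses a scaling factor c = 1. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def weighted_vector_normalize(row: Sequence[float]) -> List[float]:
    """Divide each attribute x_ij by the object's weight wt_i.

    Assumption: wt_i is the Euclidean norm sqrt(sum_j x_ij**2) of the row.
    """
    wt = math.sqrt(sum(v * v for v in row))
    if wt == 0.0:
        # An all-zero object has no direction; keep it at the origin.
        return [0.0] * len(row)
    return [v / wt for v in row]


def star_coordinates(row: Sequence[float], c: float = 1.0,
                     origin: Tuple[float, float] = (0.0, 0.0)) -> Tuple[float, float]:
    """Sum the k unit axis vectors S_j, each scaled by c * x'_ij."""
    k = len(row)
    qx, qy = origin
    for j, value in enumerate(row, start=1):
        theta = 2.0 * math.pi * j / k  # S_j = (cos theta, sin theta)
        qx += c * value * math.cos(theta)
        qy += c * value * math.sin(theta)
    return qx, qy


def evista_layout(data: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Normalize every object row-wise, then project it; no alpha parameters."""
    return [star_coordinates(weighted_vector_normalize(row)) for row in data]


if __name__ == "__main__":
    # Two 4-dimensional objects (illustrative values only).
    for q in evista_layout([[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]):
        print(f"({q[0]:.3f}, {q[1]:.3f})")
```

Each object is scaled by its own weight rather than by column ranges, so no per-dimension α parameter appears anywhere in the pipeline. This matches the paper's claim that EVISTA needs no α tuning.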

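For contrast, the VISTA pipeline in Section III first normalizes columns with the max-min rule and then applies α-mapping. The paper does not give the α-mapping formula. The sketch below assumes the formulation in which each star-coordinate term is multiplied by a per-dimension weight α_j, and `alpha_mapping` and `vista_layout` are hypothetical names. What it illustrates is the paper's cost argument: changing any α_j forces every projected point to be recomputed.

```python
import math
from typing import List, Sequence, Tuple


def max_min_normalize(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per-column max-min normalization: v' = (v - v_min) / (v_max - v_min)."""
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column has no spread; mapping it to 0.0 is our convention.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in data
    ]


def alpha_mapping(row: Sequence[float], alphas: Sequence[float],
                  c: float = 1.0) -> Tuple[float, float]:
    """Assumed alpha-mapping: star coordinates with each term scaled by alpha_j."""
    k = len(row)
    qx = qy = 0.0
    for j, (value, alpha) in enumerate(zip(row, alphas), start=1):
        theta = 2.0 * math.pi * j / k
        qx += c * alpha * value * math.cos(theta)
        qy += c * alpha * value * math.sin(theta)
    return qx, qy


def vista_layout(data: Sequence[Sequence[float]],
                 alphas: Sequence[float]) -> List[Tuple[float, float]]:
    """Any change to any alpha_j requires rerunning this full pass."""
    normalized = max_min_normalize(data)
    return [alpha_mapping(row, alphas) for row in normalized]


if __name__ == "__main__":
    data = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3], [5.8, 2.7, 4.1, 1.0]]
    print(vista_layout(data, [1.0, 1.0, 1.0, 1.0]))
    # "Tuning" a single alpha: every point has to be recomputed.
    print(vista_layout(data, [1.0, 0.5, 1.0, 1.0]))
```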

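Section IV uses the Rand index and the Jaccard coefficient to compare a clustering with the ground-truth labels, and both measures count pairs of objects. Let a be the number of pairs grouped together in both labelings, b the pairs together only in the clustering, c the pairs together only in the ground truth, and d the pairs separated in both. Then Rand = (a + d) / (a + b + c + d) and Jaccard = a / (a + b + c). The sketch below is a minimal pair-counting version that runs in O(n²) time in the number of objects. The values returned for degenerate inputs are conventions chosen here, not taken from the paper.

```python
from itertools import combinations
from typing import Hashable, Sequence, Tuple


def pair_counts(truth: Sequence[Hashable],
                pred: Sequence[Hashable]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) over all unordered pairs of objects.

    a: same cluster in both; b: same only in pred;
    c: same only in truth;   d: different in both.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must label the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = pred[i] == pred[j]
        same_truth = truth[i] == truth[j]
        if same_pred and same_truth:
            a += 1
        elif same_pred:
            b += 1
        elif same_truth:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    a, b, c, d = pair_counts(truth, pred)
    total = a + b + c + d
    return (a + d) / total if total else 1.0  # convention for < 2 objects


def jaccard_coefficient(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    a, b, c, _ = pair_counts(truth, pred)
    return a / (a + b + c) if (a + b + c) else 1.0  # no pair grouped anywhere


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 2]
    pred = [1, 1, 0, 0, 0, 2]
    print(rand_index(truth, pred), jaccard_coefficient(truth, pred))
```

Both measures ignore the actual label values and compare only which pairs are grouped together. A visual clustering whose cluster numbers differ from the class names is therefore not penalized.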

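Section IV-C describes how EVISTA labels clusters by free-hand drawing. The region containing the potential data points is covered by a convex hull, and every point inside the hull receives one cluster label. The sketch below makes several assumptions. The free-hand stroke arrives as a list of 2D vertices in the same coordinates as the projected points. The hull is built with Andrew's monotone chain, which is our choice because the paper does not name a hull algorithm. Points on the hull boundary count as inside. `label_region` and the other names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); > 0 means a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(stroke: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(stroke))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point: it is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def inside_hull(hull: Sequence[Point], p: Point) -> bool:
    """True if p is inside or on a CCW convex polygon with at least 3 vertices."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def label_region(points: Sequence[Point], stroke: Sequence[Point],
                 label: int, labels: Sequence[int]) -> List[int]:
    """Give `label` to every projected point covered by the stroke's hull."""
    hull = convex_hull(stroke)
    return [label if inside_hull(hull, p) else old
            for p, old in zip(points, labels)]


if __name__ == "__main__":
    projected = [(0.5, 0.5), (1.5, 1.2), (3.0, 3.0)]
    stroke = [(0.0, 0.0), (2.0, 0.1), (2.1, 2.0), (0.1, 1.9)]
    print(label_region(projected, stroke, label=1, labels=[0, 0, 0]))
```

Covering the stroke with its convex hull makes the inside test simple and robust to a wobbly hand-drawn outline. The cost is that a concave cluster boundary gets rounded out, so points in a concavity may be absorbed into the cluster.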