interactive visual clustering

reginald_zhong
2012-07-26

Summary: This document is "interactive visual clustering.pdf" and is suited to the IT/computing field.

FULL PAPER
International Journal of Recent Trends in Engineering, Vol. [...], No. [...], November [...]

EVISTA – Interactive Visual Clustering System

K. Thangavel, P. Alagambigai
Department of Computer Science, Periyar University, Salem, Tamilnadu, India
Email: drktvelu@yahoo.com
Department of Computer Applications, Easwari Engineering College, Chennai, Tamilnadu, India
Email: alagambigai@yahoo.co.in

Abstract: Due to the enormous increase in data, exploring and analyzing it is increasingly important but difficult to achieve. Information visualization and visual data mining can help to deal with this. Visual data exploration has high potential, and many applications such as fraud detection and data mining will use information visualization technology for improved data analysis. The advantage of visual data exploration is that the user is directly involved in the data mining process. A large number of information visualization techniques have been developed over the last decade to support the exploration of large data sets. VISTA is an interactive visual cluster rendering system which invites the human into the clustering process, but it has some limitations in identifying the cluster distribution and in human-computer interaction. In this paper, we propose an Enhanced VISTA (EVISTA) which addresses these drawbacks. EVISTA improves the visualization in two ways. First, it uses weighted vector normalization instead of max-min normalization, which improves the data visualization such that the user can understand the underlying pattern without human intervention. Second, it completely eliminates α tuning, which reduces the complexity of visual distance computation and eases human-computer interaction. The experimental results show that EVISTA explores the underlying pattern of the data set effectively and greatly reduces the user's operation burden.

Index Terms: Clustering, EVISTA, Human-computer interaction, Information visualization, Visual data mining

I. INTRODUCTION

Data visualization is essential for understanding the concept of multidimensional spaces. It allows the user to explore the data in different ways at different levels of abstraction to find the right level of detail. Therefore techniques are most useful if they are highly interactive, permit direct manipulation and include
a rapid response time. Visualization is defined by Ware as "a graphical representation of data or concepts" which is either an "internal construct of the mind" or an "external artifact supporting decision making". Visualization provides valuable assistance to the human by representing information visually. This assistance may be called cognitive support. Visualization can provide cognitive support through a number of mechanisms, such as grouping related information for easy search and access, representing large volumes of data in a small space, imposing structure on data and tasks to reduce time complexity, and allowing interactive exploration through manipulation of parameter values.

Visualization techniques could enhance current knowledge and data discovery methods by increasing user involvement in the interactive process. More recently there has been a lot of discussion on visualization for data mining. Visual data mining can be viewed as an integration of data visualization and data mining. Considering visualization as a supporting technology in data mining, four possible approaches are stated in [...]. The first approach is the use of visualization techniques to present the results obtained from mining the data in the database. The second approach applies data mining techniques to visualization by capturing essential semantics visually. The third approach uses visualization techniques to complement data mining techniques. The fourth approach uses visualization techniques to steer the mining process.

In general, visualization can be used to explore data, to confirm a hypothesis, or to manipulate a view. Exploratory visualization creates a dynamic scenario in which interaction is critical. The user does not necessarily know what he or she is looking for, can search for structures or trends, and is attempting to arrive at some hypothesis. In confirmatory visualization, the system parameters are often predetermined and the visualization tools are used to confirm or refute the hypothesis. Manipulative visualization focuses on refining the visualization to optimize the presentation.

Visualization has been categorized into two major areas: i) scientific visualization, which focuses primarily on physical data such as the human body, etc.; ii) information visualization, which focuses on abstract nonphysical
data such as text, hierarchies and statistical data. Data mining techniques are primarily oriented toward information visualization. Both scientific visualization and information visualization create graphical models and visual representations from data that support direct user interaction for exploring and acquiring insight into useful information embedded in the underlying data. Even though visualization techniques have advantages over automatic methods, they bring up some specific problems such as limited visibility, visual bias due to the mapping of the dataset to a 2D/3D representation, the need for easy-to-use visual interface operations, and reliable human-computer interaction. In most visualization methods, human-computer interaction costs more than automated processing.

In general, visual data mining is different from scientific visualization and has the following characteristics:
• Wide range of users
• Wide choice of visualization techniques
• Important dialog function

The users of scientific visualization are scientists and engineers who can tolerate some difficulty in using the system, whereas a visual data mining system must be easy enough for the general public to use widely. Considering this issue, this paper proposes a novel information visualization technique called the enhanced visual clustering system (EVISTA), an extension of VISTA. VISTA is a dynamic data visualization model which invites the human into the clustering process. Even though VISTA has proved to be an efficient interactive visual cluster rendering system, it requires complete user interaction throughout the clustering process. When the number of dimensions increases, the human-computer interaction becomes tedious. EVISTA is designed to provide an efficient data visualization such that the user can understand the underlying pattern of the given dataset without human intervention.

The rest of the paper is organized as follows: Section II reviews related work in the domain of information visualization. Section III deals with EVISTA. Section IV discusses the experimental analysis. Section V concludes the paper.

© ACADEMY PUBLISHER

II. RELATED WORKS

Various efforts are made to visualize multidimensional
datasets. The early research on general plot-based data visualization is Grand Tour and Projection Pursuit. The purpose of the Grand Tour and Projection Pursuit is to guide the user to find interesting projections. L. Yang utilizes the Grand Tour technique to show projections of datasets in an animation. They project the dimensions to coordinates in a 3D space. However, when the 3D space is shown on a 2D screen, some axes may be overlapped by other axes, which makes it hard to perform direct interactions on dimensions. Star coordinates is an interactive visualization model which treats dimensions uniformly, in which data are represented coarsely by simple and more space-efficient points, which results in less cluttered visualization for large datasets. Interactive visual clustering (IVC) combines spring-embedded graph layout techniques with user interaction and constrained clustering. VISTA is a recent visualization model that utilizes the star coordinate system and provides a mapping function similar to that of star coordinate systems. There are two types of cluster rendering in the VISTA model: the former is unguided rendering and the latter is guided rendering.

III. ENHANCED VISUAL CLUSTERING SYSTEM

Enhanced VISTA (EVISTA) is an information visualization framework that employs improved data visualization and reveals the hidden patterns in complex high-dimensional datasets without human intervention. The EVISTA model is designed based on star coordinates. The star coordinate system is a traditional multivariate data visualization technique in which the k axes are defined by an origin $\vec{O} = (x_0, y_0)$ and k coordinates $S_1, S_2, \ldots, S_k$ that represent the k dimensions in 2D space. The k coordinates are equidistantly distributed on the circumference of the circle C, where the unit vectors are obtained by

$$\vec{S}_i = \left(\cos\frac{2\pi i}{k},\ \sin\frac{2\pi i}{k}\right), \quad i = 1, 2, \ldots, k \tag{1}$$

and the 2D point $Q(x, y)$ is obtained by

$$Q(x, y) = \left\{ \frac{c}{k}\sum_{i=1}^{k} x'_i \cos\frac{2\pi i}{k} - x_0,\ \ \frac{c}{k}\sum_{i=1}^{k} x'_i \sin\frac{2\pi i}{k} - y_0 \right\} \tag{2}$$

$$x'_i = \frac{x_i - wt_i}{wt_i} \tag{3}$$

where $x_i$ represents the given data object, $x'_i$ represents the normalized data value based on the weighted vector $wt_i$, and

$$wt_i = \sum_{j=1}^{n} x_{ij} \tag{4}$$

EVISTA employs the design of the VISTA visual cluster rendering proposed by Keke Chen and L. Liu, which provides an intuitive way to visualize clusters with interactive feedback to encourage domain experts to participate in the cluster revision and cluster validation process. It allows the user to interactively observe potential clusters in a series of continuously changing visualizations through α. More importantly, it can include algorithmic clustering results and serve as an effective validation and refinement tool for irregularly shaped clusters. The VISTA system has two unique features. First, it implements a linear and reliable visualization model to interactively visualize multidimensional datasets in a 2D star coordinate space. Second, it provides a rich set of user-friendly interactive rendering operations, allowing users to validate and refine the cluster structure based on their visual experience as well as their domain knowledge.

The VISTA visualization model consists of two linear mappings: max-min normalization followed by α mapping. Equation (5) represents the max-min normalization, which is used to normalize the columns in the datasets so as to eliminate the dominating effect of large-valued columns:

$$v_i = 2\left[\frac{v - \min}{\max - \min}\right] - 1 \tag{5}$$

where $v$ is the original value and $v_i$ is the normalized value. The α mapping maps k-dimensional points onto the two-dimensional visual space with the convenience of visual parameter tuning.

The proposed visualization model EVISTA utilizes weighted vector normalization, which is performed on rows instead of columns, such that the visualization model defines a reliable position for $Q(x, y)$. EVISTA completely eliminates α tuning, since α mapping is tedious when the number of dimensions is high and each change in the α values requires a fresh visual distance computation. As the number of dimensions increases, the visual distance computation becomes more time-consuming. Similar effects may occur when the number of data objects increases. This makes the human-computer interaction ineffective and affects the applicability of VISTA.

IV. EXPERIMENTAL ANALYSIS

To illustrate the efficiency of our proposed visualization, empirical analyses are conducted on a number of benchmark datasets available in the UCI machine learning data repository. The performance of EVISTA is compared against the VISTA system and the automatic clustering algorithm K-Means. The experiments in VISTA are conducted by setting the α value as [...]. The detailed information of the datasets is shown in Table I.

TABLE I. DETAILS OF DATASETS
Columns: S. No., Data Set, No. of Attributes, No. of Classes, No. of Instances
Data sets: Iris, Breast Cancer, Hepatitis, Bupa, Pima, Australian

A. Cluster validation

Validation of clusters is very important in cluster analysis, because clustering methods tend to generate clusterings even for fairly homogeneous datasets. The quality of clusters obtained through visual clustering is measured in terms of three classical methods proposed in [...].
• The Rand index and Jaccard coefficient validations are based on the agreement between clustering results and the "ground truth".

The classical validity measures are heavily related to the geometry or density nature of clusters, and they do not work well for arbitrarily shaped clusters. In such cases, visual perception plays an important role in deciding the right clusters.

B. Results and Discussion

Iris Data: The Iris dataset is a benchmark dataset widely used in pattern recognition and clustering. It is formed by four-dimensional instances of the three classes of plants, classified according to the sepal length and width and the petal length and width. The iris dataset consists of three clusters with equal distribution. One cluster is linearly separable from the other two; the latter two are not exactly linearly separable from each other. Fig. 1 shows the initial visualization of the iris dataset in the VISTA model, where we observe the possibility of three clusters. It is observed from the figure that one cluster is completely separated from the other two, while the remaining two are found to overlap. After performing interactive visual clustering with suitable α tuning, the visual boundaries between the clusters become clearer. Fig. 2 shows the visualization of the iris dataset after α tuning. As the literature on the iris dataset specifies, the two clusters are not linearly separable. In VISTA this could be observed only after fine tuning of α. The small region consisting of the overlapping data points is also observed. More importantly, the separation of the two clusters is found to be difficult for users.

Figure 1. Visualization of Iris Dataset using VISTA system
Figure 2. Visualization of Iris Dataset after α tuning using VISTA system
Figure 3. Visualization of Iris Dataset using EVISTA system

In VISTA, domain knowledge plays a vital role in finding the optimum number of clusters. In general, domain knowledge in the form of labeled items obtained by traditional automatic clustering algorithms such as K-Means can be incorporated into the visual clustering process. A user without domain knowledge may fail to find the optimum clusters, since α tuning changes the data point distribution. Most automated clustering algorithms require the number of clusters to be specified in advance, which may not coincide with the real cluster distribution of the dataset. This increases the complexity of the clustering process. EVISTA reduces the complexity of clustering by eliminating the use of α. Fig. 3 shows the iris dataset visualization based on the EVISTA model. From the results, it is observed that one cluster is completely separated from the others and the visual boundaries between the other two clusters are clearly identified. It is also noticed that only two data points overlap. Since EVISTA does not have α tuning, the visual distance computation is completely eliminated, which reduces the time complexity. EVISTA does not require domain knowledge in any form, which eases human-computer interaction, and it visualizes the exact pattern of the given dataset without human intervention.

Australian Data: The Australian dataset concerns credit card applications. This dataset is interesting because there is a good mix of attributes: continuous, nominal with small numbers of values, and nominal with larger numbers of values. This dataset also has missing values. A suitable statistics-based computation is applied to fill in the missing values. It has two classes. The class distribution is [...] for class A and [...] for class B. Fig. [...] shows the visualization of the Australian dataset in VISTA, where possibly one single cluster is observed. During α tuning, the user is able to identify the two clusters. If the α tuning is not performed carefully, the user may get a different pattern, which may lead to confusion. Fig. [...] shows the process of α tuning, where a four-cluster distribution is observed. This leads to poor cluster quality. In such a case, domain knowledge is the only aid to identify the optimum number of clusters. Fig. [...] shows the cluster distribution using EVISTA, where two potential clusters are observed. Since α tuning is not included in the EVISTA model, the cluster distribution can be clearly visualized. Even though the user does not have domain knowledge in any form, such as the number of clusters, the cluster distribution, or the visualization model, EVISTA suitably identifies the optimum number of clusters.

Pima Data: The Pima dataset is an Indian diabetes database with [...] data objects. It has two classes with class distribution [...] and [...]. It consists of attributes such as number of times pregnant, plasma glucose concentration, diastolic blood pressure (mm Hg), triceps skin fold thickness (mm), diabetes pedigree function, etc. Fig. [...] shows the VISTA visualization of the Pima Indian dataset. When the Pima dataset is visualized using VISTA, one possible cluster is observed. Even suitable α tuning does not distinguish the clusters. The boundary regions of the two clusters are possibly not identified. In contrast, the EVISTA visualization of the Pima dataset clearly shows two potential clusters. From Fig. [...] it is observed that the Pima dataset contains two potential clusters, and a few data objects are scattered around the potential area. Since EVISTA does not require α tuning, the user may find it very flexible for finding the underlying pattern of the dataset without human intervention. With suitable geometric transformations such as scaling and rotation, the user may be able to observe the cluster distribution according to their visual perception.

C. Comparative Analysis

This part of the section compares the results of EVISTA with VISTA and the centroid-based automatic clustering algorithm K-Means. In EVISTA the cluster labeling is performed using freehand drawing. The area with potential data points is covered by a convex hull, and the data points in the convex hull are labeled as one single cluster. The cluster results, evaluated based on the Rand index and Jaccard coefficient, are shown in Table II and Table III. The results of VISTA are obtained by conducting the experiments over several runs, and the average is taken for experimental analysis.

V. CONCLUSI
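
As an illustration of the star coordinate mapping described in Section III, the following Python sketch projects a row-normalized data object onto the 2D plane. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the scaling constant c = 1 and the origin (0, 0) are arbitrary defaults, and the weighted vector normalization is read here as x' = (x - wt) / wt with wt taken as the sum of the row's attribute values, which is only one possible reading of the paper's equations.

```python
import math


def row_weight_normalize(row):
    """Normalize one data object by the sum of its own attribute values.

    Assumption: wt is the row sum and x' = (x - wt) / wt. This is an
    illustrative reading of the weighted vector normalization.
    """
    wt = sum(row)
    if wt == 0:
        raise ValueError("row sum is zero; normalization is undefined")
    return [(x - wt) / wt for x in row]


def star_coordinates(values, c=1.0, origin=(0.0, 0.0)):
    """Project one k-dimensional point onto the 2D star coordinate plane.

    Axis i (1-based) points in the direction (cos(2*pi*i/k), sin(2*pi*i/k)).
    """
    k = len(values)
    if k == 0:
        raise ValueError("at least one dimension is required")
    sx = sum(v * math.cos(2 * math.pi * i / k)
             for i, v in enumerate(values, start=1))
    sy = sum(v * math.sin(2 * math.pi * i / k)
             for i, v in enumerate(values, start=1))
    return (c / k * sx - origin[0], c / k * sy - origin[1])


def project_dataset(rows, c=1.0):
    """Normalize every row, then map it to a 2D point (no alpha tuning)."""
    return [star_coordinates(row_weight_normalize(r), c=c) for r in rows]


if __name__ == "__main__":
    sample = [[5.1, 3.5, 1.4, 0.2], [6.3, 3.3, 6.0, 2.5]]
    for point in project_dataset(sample):
        print(point)
```

Because the k axis directions sum to the zero vector for k ≥ 2, adding the same constant to every attribute of a row does not move the projected point. Under the assumed normalization, only the ratios x_i / wt influence where a data object lands.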

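The Rand index and Jaccard coefficient used in the cluster validation of Section IV can be sketched with the usual pair-counting definitions. This is a hedged illustration, not code from the paper: the function names are hypothetical, the labels in the example are made up, and returning 1.0 for the Jaccard coefficient when no pair is co-clustered in either labeling is a convention chosen here.

```python
from itertools import combinations


def pair_counts(truth, predicted):
    """Count object pairs by agreement between two labelings.

    Returns (ss, sd, ds, dd):
      ss: same class in truth, same cluster in predicted
      sd: same class in truth, different clusters in predicted
      ds: different classes in truth, same cluster in predicted
      dd: different in both
    """
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = predicted[i] == predicted[j]
        if same_truth and same_pred:
            ss += 1
        elif same_truth:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_index(truth, predicted):
    """Fraction of pairs on which the two labelings agree."""
    ss, sd, ds, dd = pair_counts(truth, predicted)
    total = ss + sd + ds + dd
    if total == 0:
        raise ValueError("at least two objects are required")
    return (ss + dd) / total


def jaccard_coefficient(truth, predicted):
    """Agreement on co-clustered pairs, ignoring pairs separated in both."""
    ss, sd, ds, _ = pair_counts(truth, predicted)
    denom = ss + sd + ds
    # Convention assumed here: no co-clustered pair anywhere counts as agreement.
    return 1.0 if denom == 0 else ss / denom


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    predicted = [0, 0, 1, 2]
    print(rand_index(truth, predicted), jaccard_coefficient(truth, predicted))
```

Both measures compare a clustering against ground-truth class labels only through pairs of objects, so they are insensitive to how the cluster labels themselves are named.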