1a. Linear vs Logistic.pdf


APPLIED AND ENVIRONMENTAL MICROBIOLOGY, May […], p. […], Vol. […], No. […]. DOI: […]
Copyright […], American Society for Microbiology. All Rights Reserved.

Comparison of Logistic Regression and Linear Regression in Modeling Percentage Data

LIHUI ZHAO, YUHUAN CHEN,† AND DONALD W. SCHAFFNER*

Department of Food Science, Cook College, the New Jersey Agricultural Experiment Station, Rutgers, The State University of New Jersey, New Brunswick, New Jersey

Received November […]/Accepted February […]

* Corresponding author. Mailing address: Department of Food Science, Cook College, the New Jersey Agricultural Experiment Station, Rutgers, The State University of New Jersey, […] Dudley Rd., New Brunswick, NJ […]. Phone: […]. Fax: […]. E-mail: schaffner@aesop.rutgers.edu.
† Present address: National Food Processors Association, Washington, DC […].

Percentage is widely used to describe different results in food microbiology, e.g., probability of microbial growth, percent inactivated, and percent of positive samples. Four sets of percentage data, percent growth positive, germination extent, probability for one cell to grow, and maximum fraction of positive tubes, were obtained from our own experiments and the literature. These data were modeled using linear and logistic regression. Five methods were used to compare the goodness of fit of the two models: percentage of predictions closer to observations, range of the differences (predicted value minus observed value), deviation of the model, linear regression between the observed and predicted values, and bias and accuracy factors. Logistic regression was a better predictor of at least […] of the observations in all four data sets. In all cases, the deviation of logistic models was much smaller. The linear correlation between observations and logistic predictions was always stronger. Validation (accomplished using part of one data set) also demonstrated that the logistic model was more accurate in predicting new data points. Bias and accuracy factors were found to be less informative when evaluating models developed for percentage data, since neither of these indices can compare predictions at zero. Model simplification for the logistic model was demonstrated with one data set. The simplified model was as powerful in making predictions as the full linear model, and it also gave clearer insight in determining the key experimental factors.

Microbial data expressed as percentages have been modeled for many years. Percentage data may have very different biological meanings and expressions. In […], Genigeorgis et al. initiated the concept of probability for one cell to grow and produce toxin, presented as the ratio of RG over RI, where RG is the number of cells initiating growth, and RI is the number of cells in the inoculum. In a time-to-turbidity model, Whiting and Oriente described the maximum probability of growth with the parameter Pmax, this value being obtained from fitting the growth curve with the logistic equation. Chea et al. modeled the extent of spore germination using the plateau value of the germination curve. The percent growth positive parameter describes the maximum proportion of wells that exhibited growth under various environmental conditions in a study using microplates inoculated with Clostridium botulinum spores.

A conventional approach applied to modeling percentage data is to use linear regression with polynomial terms. This method usually results in moderate (R […]) to poor (R […]) goodness of fit. Generally, the accuracy of linear models for modeling bounded variables (e.g., percentage data) is not as good as for other unbounded variables obtained in the same experiment, and the resulting linear model also predicts poorly at values close to […] and […]. An insurmountable limitation of the linear approach is that the model can predict percentages outside the probability range, i.e., values of […] or […]. Generally, all predicted negative values are forced to […], and those […] are forced to […]. Even without this modification, it is not meaningful to compare these conditions. For example, […] cannot be interpreted as a higher percent germination than […].

Logistic regression has been widely used in medical research. In the field of predictive food microbiology, logistic models have been developed to describe the bacterial growth/no growth interface. In these models, the data were presented in the […] format, as in a typical binomial data set. Genigeorgis et al. first presented the concept of the probability that one cell could grow in a specific environment. Later, this probability was modeled in various systems using logistic regression combined with a linear regression of the lag period. Roberts et al. used a similar concept and the regression approach to model toxin production by C. botulinum in pasteurized pork slurry. Cole et al. modeled the probability of growth of spoilage yeast in a model fruit drink by directly relating the logit of probability with the environmental factors. In these studies, probability (a continuous number between 0 and 1) instead of a dichotomous variable (i.e., 0, 1) was modeled. As pointed out by Ratkowsky and Ross, the response modeled by logistic regression at a given combination of limiting factors can either have a value of 0 or 1 or be a probability. Probability, generally expressed by dividing the number of successes by the total number of trials, is simply a summarization of binomial data and thus can be approximated by a logistic general linear model.

In this study, we compared the goodness of fit of linear regression to logistic regression for modeling percentages. We modeled data from our own research and from the literature (including publications from our group) and developed models using both the logistic and linear approaches in exactly the same manner. Five different approaches were used to compare the goodness of fit of the two models. In almost all cases, the logistic models displayed greater accuracy and resulted in less biased predictions.

MATERIALS AND METHODS

Data collection. Four different sets of percentage data were collected from previous experiments. Each set had its own unique biological meaning and was collected with a different method. Weight is the degree of emphasis a model puts on an observation. The weight for a percentage datum point is the total number of observations associated with this percentage. For example, when […] of […] tubes turn turbid, the percentage is […] and the weight for this percentage is […]. The assignment of weights was determined differently for each data set, as described below.

Data set I: data for percent growth positive were collected by Zhao et al. This data set contained the exact numbers of wells that showed growth and no growth. The total number of wells in each condition is the same, so the weight assigned for each condition is the same. Environmental factors studied were pH, sodium chloride concentration, and inoculum size in a complete […] by […] factorial design with a total of […] different conditions.

Data set II: extent of germination data were collected by Chea et al. The total number of spores studied for each condition was between […] and […]. The small difference in the total number in each condition is negligible, and equal weight for all the data points was assumed in logistic regression. Environmental factors studied were pH, sodium chloride concentration, and temperature in a complete […] by […] factorial design with a total of […] different conditions.

Data set III: Razavilar and Genigeorgis studied the probability of one cell of Listeria monocytogenes to grow, as affected by sodium chloride concentration, time, and temperature. Weights were not obtainable, so this parameter was assumed to be the same in each case.

Data set IV: Pmax was the parameter used to indicate the maximum fraction of positive tubes inoculated with C. botulinum. It was obtained by fitting the experimental data with a logistic equation. The total number of tubes varied by condition and was used as the weight in logistic regression. Four environmental factors, pH, sodium chloride concentration, temperature, and inoculum size, were studied in a total of […] different conditions. A subset, containing […] data points at […]°C, was not used to develop models; instead, these data points were used later to validate the models developed from the remaining […] points.

Modeling with linear and logistic regression. Both linear and logistic models were developed in S-plus (MathSoft, Inc., Seattle, Wash.) for an objective comparison. The generalized linear modeling ("glm") function was used for both methods. The link function for logistic regression is "binomial" and for linear regression is "gaussian". The full models generated by each approach, with the same number of terms in the same format, were used to ensure the validity of the comparison. The linear model with three predictor variables has the following general format:

$$\mathrm{Percentage} = b + C_1X + C_2Y + C_3Z + C_4X^2 + C_5Y^2 + C_6Z^2 + C_7XY + C_8XZ + C_9YZ \qquad (1)$$

where Percentage is the observed percentage, b is the intercept, X, Y, and Z are the predictor variables, and the C_i are the coefficients. The logistic model with three predictor variables has the following general format:

$$\mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = b + C_1X + C_2Y + C_3Z + C_4X^2 + C_5Y^2 + C_6Z^2 + C_7XY + C_8XZ + C_9YZ \qquad (2)$$

where P is the probability that the event would occur according to the model and the remaining symbols have the same meaning as in equation 1.

Model comparison. (i) Adjustment with predictions from linear models. Predictions from linear models can be greater than […] or less than […]. In practice, these predictions are generally forced to be […] and […], respectively. To make the comparison of the two models fairer, predictions from linear models were forced into the range of […] to […] in this manner. For all comparisons, the modified predictions from linear models were used, except as noted below.

(ii) Methods to compare model predictions. Out-of-range predictions from linear models were counted in Method 1. The number of predictions from logistic regression that were closer to the observed values was also calculated. For this calculation, the absolute value of the difference (predicted minus observed) was used. We excluded some observations whose linear regression predictions were out of range in the calculation of the percentage of closer predictions. This is required because logistic regression predicts strictly between […] and […]. By forcing out-of-range linear predictions to be […] or […], we may inappropriately make some linear predictions seem better. For example, if the observation is […], the logistic prediction is […], and the linear prediction is […], if we force the linear prediction to be […], it will falsely be judged better.

In Method 2, we compared the ranges of the differences between the predicted and the observed values. Point summaries of the differences (predicted minus observed), i.e., minimum, first quarter, median, mean, third quarter, and maximum, were obtained, and the range and interquarter range (IQR) were calculated.

$$\mathrm{Range} = \mathrm{maximum} - \mathrm{minimum} \qquad (3)$$

$$\mathrm{IQR} = \mathrm{third\ quarter} - \mathrm{first\ quarter} \qquad (4)$$

The smaller the values of the range and IQR, the closer the predictions are to the observations. The range is sensitive to outlying points whose predicted and observed values are very different, while the IQR is not affected as much.

For Method 3, the deviation of the model from observations was calculated as follows:

$$\mathrm{Deviation} = \sum (\mathrm{predicted} - \mathrm{observed})^2 \qquad (5)$$

The smaller the deviation, the closer the model predictions were to the observations. Method 1 cannot detect predictions that are far from the observations. Method 2 allows for detection of these wide deviations by measuring the range of the differences between the predicted and observed values, but it is unable to indicate which model results in a greater number of predictions closer to the observed. Method 3 takes both considerations into account.

Method 4 used graphs of the observed values (x axis) versus predicted values (y axis) from both models. A simple linear regression was fitted to the points, and the intercept, the slope, and R were obtained. If the predictions are in perfect agreement with the observed values, the intercept should be 0, the slope should be 1, and R should be 1. The closer the intercept is to 0, the slope is to 1, and R is to 1, the better is the general predictive power of the model. A slope of less than 1 indicates that the model underpredicts the observation.

Method 5 used bias and accuracy values as a quantitative way to measure the goodness of fit of the models. The bias factor indicates by how much, on average, a model overpredicts (bias factor greater than 1) or underpredicts (bias factor less than 1) the observed data.

$$\mathrm{Bias\ factor} = 10^{\sum \log(\mathrm{predicted\ value}/\mathrm{observed\ value})/n} \qquad (6)$$

The accuracy factor indicates by how much the predictions differ from the observed data.

$$\mathrm{Accuracy\ factor} = 10^{\sum \left|\log(\mathrm{predicted\ value}/\mathrm{observed\ value})\right|/n} \qquad (7)$$

In both equations, n is the number of observations used in the calculation. In a perfect model, both the bias and accuracy factors are equal to 1.

Simplification of the logistic model. Data from Chea et al. were used to demonstrate the procedure for reducing the number of parameters in the logistic model and to show how better physiological insight into the experiment might be derived from the reduced model.

RESULTS

Data set I: percent growth positive. Thirty percent of the predictions from linear regression are out of the […] to […] range (Table). Fifteen predictions from the logistic model are closer to the observed. Seven linear predictions were inaccurately made better by forcing predictions over […] to […], and one condition was made falsely better by forcing the prediction lower than […] to […]. The percentage better predicted by logistic regression is calculated by excluding these data points:

Percentage better by logistic = […]

The range of the differences (predicted minus observed) from logistic regression is more than […] times smaller than that from linear regression. The IQR from logistic regression is about one-third of that from linear regression. The deviation value of the logistic model is more than […] times smaller. Predictions from logistic regression are much better than those from linear regression over the entire range and especially at points closer to […] and […] (Fig.). Three predictions by the linear model, each with an observation of […], are […], […], and […], while the logistic predictions are much better: […], […], and […]. The two […] observations were predicted by the linear model to be […] and […], while the logistic predictions were […] and […]. Another observation at the lower range, […], was predicted to be […] and […] by linear regression and logistic regression, respectively.

The fitted line for the predicted values from logistic regression was very close to a perfect fit (Table). The fitted line for the linear model predictions versus the observations was considerably worse, with a slope of about […], suggesting systematic underprediction. The bias and accuracy factors for logistic regression are slightly closer to 1 than those for linear regression (Table).

TABLE. Comparison of results for linear and logistic regressions with five different methods in four different data sets. Columns: Data set I, Data set II, Data set III, Data set IV (model), and Data set IV (validation), each with linear and logistic results. Rows, grouped by method number: numbers of out-of-range predictions, and number and percentage of predictions of the logistic model closer to observed; minimum, first quarter, median, mean, third quarter, maximum, range, and IQR of the differences (predicted minus observed); deviation, calculated as the sum of (predicted minus observed) squared; intercept, slope, and R of the linear regression between predicted and observed values (a perfect model would have an intercept of 0, a slope of 1, and an R value of 1); bias and accuracy. NA, not applicable.

FIG. Goodness of fit of linear regression and logistic regression for C. botulinum percent growth positive (Data set I) from Zhao et al.

Data set II: germination extent. Approximately […] of the linear predictions are out of range (Table). Approximately […] of the predictions from logistic regression are closer to the observed values. The range of the differences (predicted minus observed) from the logistic model is less than one-third that from the linear model, and the IQR is almost one-sixth that from the linear model. The deviation value of the logistic model is […] times smaller. The line fitted to the predicted values from the logistic model compared to observed values is very close to the perfect fitting line (Fig., Table). The fitted line for predictions from the linear model had a slope of only […], suggesting underprediction (Table). Three of seven observed values of […] had higher linear predictions, at […], […], and […], while the remaining four predictions were negative. All seven logistic predictions are very close to […], with the largest being […]. Three higher observations, […], […], and […], were predicted to be […], […], and […] by the linear model, while logistic regression produced much more accurate predictions at […], […], and […], respectively. The bias factors for the two models are almost the same, and the logistic model is slightly more accurate as judged by the accuracy factor (Table).

Data set III: probability of one cell of Listeria monocytogenes to grow. This is a very special data set since […] of […] data points are either […] or […]. Results demonstrated that logistic regression is a much more powerful tool when modeling this type of data set. Approximately […] of the linear predicted values are out of range. All observations are predicted better by logistic regression. The range of the differences (predicted minus observed) from the logistic model is more than […]-fold smaller. The IQR and deviation value from the logist
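
The comparison measures described in the methods (range and IQR of the differences, deviation, and bias and accuracy factors) can be sketched in Python as below. This is only an illustrative sketch, not the authors' S-plus code. The function names, the 0 to 100 clipping bounds, the quartile convention, the reading of the deviation as a sum of squared differences, and the choice to skip pairs containing zero in the bias and accuracy factors are all assumptions, and the numbers in the usage example are made up.

```python
import math
import statistics


def clip_to_range(pred, low=0.0, high=100.0):
    """Force an out-of-range linear prediction into [low, high].

    The 0-100 percent scale is an assumption made for this sketch.
    """
    return min(max(pred, low), high)


def difference_summary(predicted, observed):
    """Range and IQR of the differences (predicted minus observed)."""
    diffs = [p - o for p, o in zip(predicted, observed)]
    # Quartile convention is an assumption; other conventions give
    # slightly different IQR values on small samples.
    q1, _median, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    return {"range": max(diffs) - min(diffs), "iqr": q3 - q1}


def deviation(predicted, observed):
    """Deviation taken here as the sum of squared differences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def bias_accuracy(predicted, observed):
    """Bias and accuracy factors as powers of 10 of mean log ratios.

    Pairs with a zero (or negative) value are skipped because the log
    ratio is undefined there; this skipping rule is an assumption.
    """
    logs = [
        math.log10(p / o)
        for p, o in zip(predicted, observed)
        if p > 0 and o > 0
    ]
    n = len(logs)
    bias = 10 ** (sum(logs) / n)
    accuracy = 10 ** (sum(abs(v) for v in logs) / n)
    return bias, accuracy


# Made-up percentages, for illustration only.
observed = [10.0, 50.0, 100.0, 20.0]
raw_predictions = [20.0, 50.0, 104.0, 10.0]
predicted = [clip_to_range(p) for p in raw_predictions]

summary = difference_summary(predicted, observed)
dev = deviation(predicted, observed)
bias, accuracy = bias_accuracy(predicted, observed)
```

In this made-up example one prediction is twice the observed value and another is half of it, so the log ratios cancel in the bias factor while the accuracy factor stays above 1. This shows why the two indices are read together.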
