1a. Linear vs Logistic.pdf

APPLIED AND ENVIRONMENTAL MICROBIOLOGY, May […], p. […], Vol. […], No. […]
Copyright […], American Society for Microbiology. All Rights Reserved.

Comparison of Logistic Regression and Linear Regression in Modeling Percentage Data

LIHUI ZHAO, YUHUAN CHEN,† AND DONALD W. SCHAFFNER*

Department of Food Science, Cook College, the New Jersey Agricultural Experiment Station, Rutgers, The State University of New Jersey, New Brunswick, New Jersey […]

Received […] November […]/Accepted […] February […]

* Corresponding author. Mailing address: Department of Food Science, Cook College, the New Jersey Agricultural Experiment Station, Rutgers, The State University of New Jersey, […] Dudley Rd., New Brunswick, NJ […]. E-mail: schaffner@aesop.rutgers.edu.
† Present address: National Food Processors Association, Washington, DC […].

Percentage is widely used to describe different results in food microbiology, e.g., probability of microbial growth, percent inactivated, and percent of positive samples. Four sets of percentage data, percent growth positive, germination extent, probability for one cell to grow, and maximum fraction of positive tubes, were obtained from our own experiments and the literature. These data were modeled using linear and logistic regression. Five methods were used to compare the goodness of fit of the two models: percentage of predictions closer to observations, range of the differences (predicted value minus observed value), deviation of the model, linear regression between the observed and predicted values, and bias and accuracy factors. Logistic regression was a better predictor of at least […] of the observations in all four data sets. In all cases, the deviation of logistic models was much smaller. The linear correlation between observations and logistic predictions was always stronger. Validation (accomplished using part of one data set) also demonstrated that the logistic model was more accurate in predicting new data points. Bias and accuracy factors were found to be less informative when evaluating models developed for percentage data, since neither of these indices can compare predictions at zero. Model simplification for the logistic model was demonstrated with one data set. The simplified model was as powerful in making predictions as the full linear model, and it also gave clearer insight in determining the key experimental factors.

Microbial data expressed as percentages have been modeled for many years. Percentage data may have very different biological meanings and expressions. In […], Genigeorgis et al. initiated the concept of probability for one cell to grow and produce toxin, presented as the ratio of RG over RI, where RG is the number of cells initiating growth and RI is the number of cells in the inoculum. In a time-to-turbidity model, Whiting and Oriente described the maximum probability of growth with the parameter Pmax, this value being obtained from fitting the growth curve with the logistic equation. Chea et al. modeled the extent of spore germination using the plateau value of the germination curve. The percent growth positive parameter describes the maximum proportion of wells that exhibited growth under various environmental conditions in a study using microplates inoculated with Clostridium botulinum spores.

A conventional approach applied to modeling percentage data is to use linear regression with polynomial terms. This method usually results in moderate (R² […]) to poor (R² […]) goodness of fit. Generally, the accuracy of linear models for modeling bounded variables (e.g., percentage data) is not as good as for other unbounded variables obtained in the same experiment, and the resulting linear model also predicts poorly at values close to […] and […]. An insurmountable limitation of the linear approach is that the model can predict percentages outside the probability range, i.e., values of […] or […]. Generally, all predicted negative values are forced to 0, and those […] are forced to […]. Even without this modification, it is not meaningful to compare these conditions. For example, […] cannot be interpreted as a higher percent germination than […].

Logistic regression has been widely used in medical research. In the field of predictive food microbiology, logistic models have been developed to describe the bacterial growth/no-growth interface. In these models, the data were presented in the […] format, as in a typical binomial data set. Genigeorgis et al. first presented the concept of the probability that one cell could grow in a specific environment. Later, this probability was modeled in various systems using logistic regression combined with a linear regression of the lag period. Roberts et al. used a similar concept and the regression approach to model toxin production by C. botulinum in pasteurized pork slurry. Cole et al. modeled the probability of growth of spoilage yeast in a model fruit drink by directly relating the logit of probability with the environmental factors. In these studies, probability (a continuous number between 0 and 1) instead of a dichotomous variable (i.e., 0, 1) was modeled. As pointed out by Ratkowsky and Ross, the response modeled by logistic regression at a given combination of limiting factors can either have a value of 0 or 1 or be a probability. Probability, generally expressed by dividing the number of successes by the total number of trials, is simply a summarization of binomial data and thus can be approximated by a logistic general linear model.

In this study, we compared the goodness of fit of linear regression to logistic regression for modeling percentages. We modeled data from our own research and from the literature (including publications from our group) and developed models using both the logistic and linear approaches in exactly the same manner. Five different approaches were used to compare the goodness of fit of the two models. In almost all cases, the logistic models displayed greater accuracy and resulted in less biased predictions.

MATERIALS AND METHODS

Data collection. Four different sets of percentage data were collected from previous experiments. Each set had its own unique biological meaning and was collected with a different method. Weight is the degree of emphasis a model puts on an observation. The weight for a percentage datum point is the total number of observations associated with this percentage. For example, when […] of […] tubes turn turbid, the percentage is […] and the weight for this percentage is […]. The assignment of weights was determined differently for each data set, as described below.

Data set I: data for percent growth positive were collected by Zhao et al. This data set contained the exact numbers of wells that showed growth and no growth. The total number of wells in each condition is the same, so the weight assigned for each condition is the same. Environmental factors studied were pH, sodium chloride concentration, and inoculum size in a complete […] by […] factorial design with a total of […] different conditions.

Data set II: extent of germination data were collected by Chea et al. The total number of spores studied for each condition was between […] and […]. The small difference in the total number in each condition is negligible, and equal weight for all the data points was assumed in logistic regression. Environmental factors studied were pH, sodium chloride concentration, and temperature in a complete […] by […] factorial design with a total of […] different conditions.

Data set III: Razavilar and Genigeorgis studied the probability of one cell of Listeria monocytogenes to grow, as affected by sodium chloride concentration, time, and temperature. Weights were not obtainable, so this parameter was assumed to be the same in each case.

Data set IV: Pmax was the parameter used to indicate the maximum fraction of positive tubes inoculated with C. botulinum. It was obtained by fitting the experimental data with a logistic equation. The total number of tubes varied by condition and was used as the weight in logistic regression. Four environmental factors, pH, sodium chloride concentration, temperature, and inoculum size, were studied in a total of […] different conditions. A subset, containing […] data points at […]°C, was not used to develop models; instead, these data points were used later to validate the models developed from the remaining […] points.

Modeling with linear and logistic regression. Both linear and logistic models were developed in S-Plus (MathSoft, Inc., Seattle, Wash.) for an objective comparison. The generalized linear modeling ("glm") function was used for both methods. The family was "binomial" (logit link) for logistic regression and "gaussian" (identity link) for linear regression. The full models generated by each approach, with the same number of terms in the same format, were used to ensure the validity of the comparison. The linear model with three predictor variables has the following general format:

$$\text{Percentage} = b + C_1X + C_2Y + C_3Z + C_4X^2 + C_5Y^2 + C_6Z^2 + C_7XY + C_8XZ + C_9YZ \qquad (1)$$

where Percentage is the observed percentage, b is the intercept, X, Y, and Z are the predictor variables, and the C_i are the coefficients. The logistic model with three predictor variables has the following general format:

$$\operatorname{logit}(P) = \ln\left(\frac{P}{1-P}\right) = b + C_1X + C_2Y + C_3Z + C_4X^2 + C_5Y^2 + C_6Z^2 + C_7XY + C_8XZ + C_9YZ \qquad (2)$$

where P is the probability that the event would occur according to the model and the remaining symbols have the same meaning as in equation 1.

Model comparison. (i) Adjustment with predictions from linear models. Predictions from linear models can be greater than […] or less than […]. In practice, these predictions are generally forced to be […] and […], respectively. To make the comparison of the two models fairer, predictions from linear models were forced into the range of […] to […] in this manner. For all comparisons, the modified predictions from linear models were used, except as noted below.

(ii) Methods to compare model predictions. Out-of-range predictions from linear models were counted in Method 1. The number of predictions from logistic regression that were closer to the observed values was also calculated. For this calculation, the absolute value of the difference (predicted minus observed) was used. We excluded some observations whose linear regression predictions were out of range in the calculation of the percentage of closer predictions. This is required because logistic regression predicts strictly between […] and […]. By forcing out-of-range linear predictions to be […] or […], we may inappropriately make some linear predictions seem better. For example, if the observation is […], the logistic prediction is […], and the linear prediction is […], then forcing the linear prediction to be […] means it will falsely be judged better.

In Method 2, we compared the ranges of the differences between the predicted and the observed values. Point summaries of the differences (predicted minus observed), i.e., minimum, first quartile, median, mean, third quartile, and maximum, were obtained, and the range and interquartile range (IQR) were calculated.

$$\text{Range} = \text{maximum} - \text{minimum} \qquad (3)$$

$$\text{IQR} = \text{third quartile} - \text{first quartile} \qquad (4)$$

The smaller the values of the range and IQR, the closer the predictions are to the observations. The range is sensitive to outlying points whose predicted and observed values are very different, while the IQR is not affected as much.

For Method 3, the deviation of the model from observations was calculated as follows:

$$\text{Deviation} = \sum (\text{predicted} - \text{observed})^2 \qquad (5)$$

The smaller the deviation, the closer the model predictions were to the observations. Method 1 cannot detect predictions that are far from the observations. Method 2 allows for detection of these wide deviations by measuring the range of the differences between the predicted and observed values, but it is unable to indicate which model results in a greater number of predictions closer to the observed. Method 3 takes both considerations into account.

Method 4 used graphs of the observed values (x axis) versus predicted values (y axis) from both models. A simple linear regression was fitted to the points, and the intercept, the slope, and R² were obtained. If the predictions are in perfect agreement with the observed values, the intercept should be 0, the slope should be 1, and R² should be 1. The closer the intercept is to 0, the slope is to 1, and R² is to 1, the better is the general predictive power of the model. A slope of less than 1 indicates that the model underpredicts the observation.

Method 5 used bias and accuracy values as a quantitative way to measure the goodness of fit of the models. The bias factor indicates by how much, on average, a model overpredicts (bias factor > 1) or underpredicts (bias factor < 1) the observed data.

$$\text{Bias factor} = 10^{\left(\sum \log(\text{predicted value}/\text{observed value})\right)/n} \qquad (6)$$

The accuracy factor indicates by how much the predictions differ from the observed data.

$$\text{Accuracy factor} = 10^{\left(\sum \left|\log(\text{predicted value}/\text{observed value})\right|\right)/n} \qquad (7)$$

In both equations, n is the number of observations used in the calculation. In a perfect model, both the bias and accuracy factors are equal to 1.

Simplification of the logistic model. Data from Chea et al. were used to demonstrate the procedure for reducing the number of parameters in the logistic model and to show how better physiological insight into the experiment might be derived from the reduced model.

RESULTS

Data set I: percent growth positive. Thirty percent of the predictions from linear regression are out of the […] to […] range (Table 1). Fifteen predictions from the logistic model are closer to the observed. Seven linear predictions were inaccurately made better by forcing predictions over […] to […], and one condition was made falsely better by forcing the prediction lower than […] to […]. The percentage better predicted by logistic regression is calculated by excluding these data points:

Percentage better by logistic = […]

The range of the differences (predicted minus observed) from logistic regression is more than […] times smaller than that from linear regression. The IQR from logistic regression is about one-third of that from linear regression. The deviation value of the logistic model is more than […] times smaller. Predictions from logistic regression are much better than those from linear regression over the entire range and especially at points closer to […] and […] (Fig. 1). Three predictions by the linear model, each with an observation of […], are […], […], and […], while the logistic predictions are much better: […], […], and […]. The two […] observations were predicted by the linear model to be […] and […], while the logistic predictions were […] and […]. Another observation at the lower range, […], was predicted to be […] and […] by linear regression and logistic regression, respectively. The fitted line for the predicted values from logistic regression was very close to a perfect fit (Table 1). The fitted line for the linear model predictions versus the observations was considerably worse, with a slope of about […], suggesting systematic underprediction. The bias and accuracy factors for logistic regression are slightly closer to 1 than those for linear regression (Table 1).

TABLE 1. Comparison of results for linear and logistic regressions with five different methods in four different data sets
Columns: data set I (n = […]), data set II (n = […]), data set III (n = […]), data set IV model (n = […]), and data set IV validation (n = […]), each with a linear and a logistic value.
Rows, by method no.:
  1: no. of predictions […]; no. of predictions […]; no. of predictions of logistic model closer to observed (NA(a) for the linear model); […] of predictions of logistic model closer to observed (NA for the linear model)
  2(b): minimum, first quartile, median, mean, third quartile, maximum, range, IQR
  3(c): deviation
  4(d): intercept, slope, R²
  5: bias, accuracy, […]s in observation
(a) NA, not applicable.
(b) Differences of values (predicted − observed) were summarized. Range = maximum − minimum; IQR = third quartile − first quartile.
(c) Deviation of the model from observation was calculated as Σ(predicted − observed)².
(d) Linear regression was done between predicted and observed values. A perfect model would have an intercept value of 0, a slope of 1, and an R² value of 1.

FIG. 1. Goodness of fit of linear regression and logistic regression for C. botulinum percent growth positive (data set I) from Zhao et al.

Data set II: germination extent. Approximately […] of the linear predictions are out of range (Table 1). Approximately […] of the predictions from logistic regression are closer to the observed values. The range of the differences (predicted minus observed) from the logistic model is less than one-third that from the linear model, and the IQR is almost one-sixth that from the linear model. The deviation value of the logistic model is […] times smaller. The line fitted to the predicted values from the logistic model compared to observed values is very close to the perfect fitting line (Fig. […], Table 1). The fitted line for predictions from the linear model had a slope of only […], suggesting underprediction (Table 1). Three of seven observed values of […] had higher linear predictions, at […], […], and […], while the remaining four predictions were negative. All seven logistic predictions are very close to […], with the largest being […]. Three higher observations, […], […], and […], were predicted to be […], […], and […] by the linear model, while logistic regression produced much more accurate predictions at […], […], and […], respectively. The bias factors for the two models are almost the same, and the logistic model is slightly more accurate as judged by the accuracy factor (Table 1).

Data set III: probability of one cell of Listeria monocytogenes to grow. This is a very special data set since […] of […] data points are either […] or […]. Results demonstrated that logistic regression is a much more powerful tool when modeling this type of data set. Approximately […] of the linear predicted values are out of range. All observations are predicted better by logistic regression. The range of the differences (predicted minus observed) from the logistic model is more than […]-fold smaller. The IQR and deviation value from the logistic […]
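The two model formats described under "Modeling with linear and logistic regression" share one second-order polynomial in X, Y, and Z; they differ only in whether that polynomial is read as the percentage itself or as the logit of a probability. The sketch below illustrates this in Python. It is not the authors' S-Plus code, the coefficient values are made up for illustration, and the function names (`polynomial_predictor`, `logit`, `inv_logit`) are hypothetical.

```python
import math


def polynomial_predictor(x, y, z, b, c):
    """Second-order polynomial in three predictors.

    c holds nine coefficients, ordered as X, Y, Z, X^2, Y^2, Z^2, XY, XZ, YZ.
    """
    terms = [x, y, z, x * x, y * y, z * z, x * y, x * z, y * z]
    return b + sum(ci * t for ci, t in zip(c, terms))


def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Map a linear predictor back to a probability (numerically stable form)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Illustrative (made-up) intercept and coefficients, not fitted values.
b = 0.2
c = [0.5, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
eta = polynomial_predictor(2.0, 0.0, 0.0, b, c)

as_linear_prediction = eta               # read as a proportion: above 1
as_logistic_prediction = inv_logit(eta)  # read as a logit: inside (0, 1)
```

With these made-up coefficients the same polynomial value is out of range as a direct proportion but maps to a valid probability through the inverse logit, which is the bounded-response property the text relies on.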
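Methods 1 to 3 of the model comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: proportions are assumed to lie on a 0 to 1 scale, the exclusion rule is one reading of the text (a point is dropped when an out-of-range linear prediction beats the logistic one only after being forced into range), the quartile convention (`method="inclusive"`) is an arbitrary choice, and all data values are invented.

```python
from statistics import quantiles


def clamp(p, lo=0.0, hi=1.0):
    """Force a prediction into the closed range [lo, hi]."""
    return max(lo, min(hi, p))


def method1_counts(observed, linear_raw, logistic, lo=0.0, hi=1.0):
    """Count out-of-range linear predictions and closer logistic predictions."""
    out_of_range = excluded = compared = logistic_closer = 0
    for obs, lin, logi in zip(observed, linear_raw, logistic):
        is_out = lin < lo or lin > hi
        out_of_range += int(is_out)
        raw_err = abs(lin - obs)
        clamped_err = abs(clamp(lin, lo, hi) - obs)
        logi_err = abs(logi - obs)
        # Exclude points where clamping alone makes the linear model look better.
        if is_out and clamped_err < logi_err <= raw_err:
            excluded += 1
            continue
        compared += 1
        logistic_closer += int(logi_err < clamped_err)
    return {"out_of_range": out_of_range, "excluded": excluded,
            "compared": compared, "logistic_closer": logistic_closer}


def difference_summary(observed, predicted):
    """Methods 2 and 3: range, IQR, and sum of squared differences."""
    diffs = [p - o for o, p in zip(observed, predicted)]
    q1, _, q3 = quantiles(diffs, n=4, method="inclusive")
    return {"range": max(diffs) - min(diffs),
            "iqr": q3 - q1,
            "deviation": sum(d * d for d in diffs)}


# Invented example data on a 0-1 scale.
observed = [0.0, 0.1, 0.5, 0.9, 1.0]
linear_raw = [-0.2, 0.2, 0.5, 0.8, 1.1]
logistic = [0.02, 0.12, 0.45, 0.88, 0.97]

counts = method1_counts(observed, linear_raw, logistic)
linear_stats = difference_summary(observed, [clamp(p) for p in linear_raw])
logistic_stats = difference_summary(observed, logistic)
```

In this invented example the two out-of-range linear predictions are excluded from the "closer" count, and the logistic model has the smaller range and deviation.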
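Method 4 regresses predicted values on observed values and reads off the intercept, slope, and R². A minimal least-squares sketch in plain Python follows; the function name `fit_line` and the data are illustrative assumptions, not taken from the paper.

```python
def fit_line(observed, predicted):
    """Least-squares line predicted = intercept + slope * observed, plus R^2."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


# Invented data: predictions compressed toward the middle of the range.
observed = [0.0, 0.5, 1.0]
predicted = [0.1, 0.5, 0.9]
intercept, slope, r_squared = fit_line(observed, predicted)
```

Here the points fall exactly on a line, so R² is 1, yet the slope is below 1 and the intercept is above 0. That is why the text asks for all three quantities together rather than R² alone.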
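The bias and accuracy factors of Method 5 are built on the log of the ratio predicted/observed, which is why the abstract notes that neither index can compare predictions at zero. The sketch below assumes base-10 logarithms and uses invented values; `usable_pairs` is a hypothetical helper showing one way such points might be set aside, not a step described in the text.

```python
import math


def bias_factor(observed, predicted):
    """10 raised to the mean of log10(predicted / observed)."""
    logs = [math.log10(p / o) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))


def accuracy_factor(observed, predicted):
    """10 raised to the mean of |log10(predicted / observed)|."""
    logs = [abs(math.log10(p / o)) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))


def usable_pairs(observed, predicted):
    """Keep only pairs where both values are positive, so the log ratio exists."""
    return [(o, p) for o, p in zip(observed, predicted) if o > 0 and p > 0]


# Invented data: one twofold overprediction and one twofold underprediction.
observed = [0.5, 0.2]
predicted = [1.0, 0.1]
bias = bias_factor(observed, predicted)          # errors cancel on average
accuracy = accuracy_factor(observed, predicted)  # absolute errors do not cancel

# An observation of zero cannot enter either index.
kept = usable_pairs([0.0, 0.5], [0.01, 0.5])
```

In this example the over- and underprediction cancel in the bias factor while the accuracy factor still reports a twofold average error, and the pair with an observed value of zero has to be dropped before either index can be computed.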
