Bayesian network modeling: a case study of an epidemiologic system analysis of cardiovascular risk


journal homepage: www.intl.elsevierhealth.com/journals/cmpb


P. Fuster-Parra a,b,∗, P. Tauler b, M. Bennasar-Veny b, A. Ligęza c, A.A. López-González d, A. Aguiló b

a Department of Mathematics and Computer Science, Universitat Illes Balears, Palma de Mallorca, Baleares E-07122, Spain

b Research Group on Evidence, Lifestyles & Health, Research Institute on Health Sciences (IUNICS), Universitat Illes Balears, Palma de Mallorca, Baleares E-07122, Spain

c Department of Applied Computer Science, AGH University of Science and Technology, Kraków PL-30-059, Poland

d Prevention of Occupational Risks in Health Services, GESMA, Balearic Islands Health Service, Hospital de Manacor, Manacor, Baleares E-07500, Spain

Article info

Article history:
Received 18 August 2015
Received in revised form 28 November 2015
Accepted 11 December 2015

Keywords:
Bayesian networks; Model averaging; Cardiovascular lost years; Cardiovascular risk score; Metabolic syndrome; Causal dependency discovery

Abstract

An extensive, in-depth study of cardiovascular risk factors (CVRF) seems to be of crucial importance in the research of cardiovascular disease (CVD) in order to prevent (or reduce) the chance of developing or dying from CVD. The main focus of data analysis is on the use of models able to discover and understand the relationships between different CVRF.

This paper reports on applying Bayesian network (BN) modeling to discover the relationships among thirteen relevant epidemiological features of the heart age domain in order to analyze cardiovascular lost years (CVLY), cardiovascular risk score (CVRS), and metabolic syndrome (MetS). Furthermore, the induced BN was used to make inference taking into account three reasoning patterns: causal reasoning, evidential reasoning, and intercausal reasoning. Application of BN tools has led to the discovery of several direct and indirect relationships between different CVRF. The BN analysis showed several interesting results, among them: CVLY was highly influenced by smoking, with men being the group at highest risk in CVLY; MetS was highly influenced by physical activity (PA), with men again being the group at highest risk in MetS, and smoking did not show any influence. BNs produce an intuitive, transparent, graphical representation of the relationships between different CVRF. The ability of BNs to predict new scenarios when hypothetical information is introduced makes BN modeling an Artificial Intelligence (AI) tool of special interest in epidemiological studies. As CVD is multifactorial, the use of BNs seems to be an adequate modeling tool.

© 2015 Elsevier Ireland Ltd. All rights reserved.

∗ Corresponding author at: Department of Mathematics and Computer Science, Universitat Illes Balears, Palma de Mallorca, Baleares E-07122, Spain. Tel.: +34 971171386.

E-mail address: [email protected] (P. Fuster-Parra).

http://dx.doi.org/10.1016/j.cmpb.2015.12.010

0169-2607/© 2015 Elsevier Ireland Ltd. All rights reserved.


1. Introduction

Bayesian Networks (BNs) [1,2], also referred to as Belief Networks or probabilistic causal networks, are an established framework for uncertainty management in Artificial Intelligence (AI). They constitute a tool which combines graph theory and probability theory to represent relationships between variables (nodes in the graph) [3]. In contrast to a deterministic understanding of causality [4], BN modeling has its origins in data mining and machine learning research [5,6] and captures probabilistic influences induced from large datasets. BNs constitute a powerful knowledge representation and an efficient reasoning tool under conditions of uncertainty [7]. The network structure is a directed acyclic graph (DAG) where each node represents a random variable [8,9] and the arcs are suitable for representing causality [10].

BNs have proven to be a strong tool for discovering the relationships between variables, one that attempts to separate direct from indirect dependencies [11,12] and can capture the way an expert understands the relationships among all the features [13]. BN modeling is widely used in fields such as clinical decision support [14], systems biology [15,16], human immunodeficiency virus (HIV) and influenza research [17,18], analyses of complex disease systems [19–21], interactions between multiple diseases [22], and the diagnosis of diseases [23–27].

The metabolic syndrome is a set of risk factors that include abdominal obesity, insulin resistance, dyslipidemia and hypertension, leading to an increased risk of developing cardiovascular diseases and type 2 diabetes [28–31]. Cardiovascular disease (CVD) is a worldwide public health problem [32]. The economic burden of CVD is already affecting the economies of the world's wealthiest countries. However, in the next decades developing countries will be more affected due to the large increase in CVD prevalence expected in these countries [33]. It is estimated that in 2015 more than 20 million people may die worldwide because of CVD. This number is expected to increase in the upcoming decades, to the point that a myocardial infarction would occur somewhere in the world every 5 s [34,35].

CVDs are closely related to the well-known cardiovascular risk factors (CVRF). The concept of CVRF appeared in 1961, when the group of Kannel defined CVRF as biological traits or behaviors that increase the chance of developing or dying from CVD [36,37]. The high prevalence of certain risk factors to which we are exposed is the cause of the current situation, in which the prevalence of CVD increases every year. It is necessary to control the factors that influence the development of CVD, such as smoking, hyperlipidemia, hypertension, diabetes, obesity, a diet high in saturated fats, alcohol abuse, a sedentary lifestyle, and stress [38]. In fact, the WHO (World Health Organization) estimates that 80% of premature deaths from cardiovascular disease and diabetes could be prevented by efficiently controlling these risk factors [39].

There are some scores that numerically quantify cardiovascular risk (CVR). One of the most widely used is the Framingham score, with its calibrated form for the Spanish population, the Framingham-REGICOR [35]. This scale estimates the global CVR at 10 years, expressed as a percentage. Recently, a new score has been proposed, the so-called Heart Age tool (HA), which is based on the Framingham score and offers a simple and graphic way to communicate the CVR because it expresses the CVR as an age. If the HA value is higher than the chronological age, the term “lost years”, defined as the HA minus the chronological age, can be used. The HA is a novel concept designed specifically to help people understand their own cardiovascular disease risk and implement changes in their lifestyles to prevent the incidence of CVD [40].

The development and analysis of models to examine the relationships between different CVRF is not only of theoretical interest; it can also serve as a generic tool for application-oriented activities: explanation, prediction, monitoring and prevention. Such modeling enables theoretical analysis of the relationships between numerous variables and, bearing in mind the probabilistic nature of the causal dependencies, BNs seem to be an adequate tool. Moreover, BN models are capable of creating different scenarios based on hypothetical cases when new observations are instantiated.

The paper is organized as follows. Section 2 introduces BNs and some basic concepts for inference flow. Section 3 presents the materials and methods for the epidemiologic study and the process of inducing a BN from a dataset. Section 4 shows different reasoning patterns to analyze the BN. Section 5 presents a discussion. Finally, Section 6 concludes the paper.

2. Bayesian networks

A BN consists of [41]: (i) a set of variables and a set of directed edges between these variables, where (ii) each variable has a finite set of mutually exclusive states, and (iii) the variables together with the directed edges form a DAG. BN models estimate the joint probability distribution P over a vector of random variables X = (X1, ..., Xn). The joint probability distribution, factorized as a product of several conditional distributions, reflects the dependency/independency structure given by a DAG:

$$P(X_1,\ldots,X_n)=\prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i^{G})\bigr) \tag{1}$$

Eq. (1) (where $\mathrm{Pa}(X_i^{G})$ denotes the parent nodes of $X_i$) is the main reason for the formulation of a multivariate distribution by BNs; this equation is also called the chain rule for Bayesian networks.
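As an illustration of the chain rule in Eq. (1), the following minimal Python sketch multiplies one conditional probability per node to obtain a joint probability. The three-node graph, the state names and all probability values are hypothetical toy numbers chosen for the example; they are not the network or the parameters learned in this study.

```python
from itertools import product

# Hypothetical toy DAG: Gender -> Smoking <- Age (not the learned network).
parents = {
    "Gender": (),
    "Age": (),
    "Smoking": ("Gender", "Age"),
}
states = {
    "Gender": ("Men", "Women"),
    "Age": ("35-44", "45-54"),
    "Smoking": ("Current", "NotCurrent"),
}
# Illustrative conditional probability tables: cpt[node][parent values][state].
cpt = {
    "Gender": {(): {"Men": 0.44, "Women": 0.56}},
    "Age": {(): {"35-44": 0.6, "45-54": 0.4}},
    "Smoking": {
        ("Men", "35-44"): {"Current": 0.36, "NotCurrent": 0.64},
        ("Men", "45-54"): {"Current": 0.38, "NotCurrent": 0.62},
        ("Women", "35-44"): {"Current": 0.32, "NotCurrent": 0.68},
        ("Women", "45-54"): {"Current": 0.34, "NotCurrent": 0.66},
    },
}


def joint_probability(assignment):
    """Chain rule: product over nodes of P(node | its parents)."""
    p = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[q] for q in node_parents)
        p *= cpt[node][parent_values][assignment[node]]
    return p


def total_mass():
    """Sum the joint over every full assignment; should equal 1."""
    names = list(states)
    return sum(
        joint_probability(dict(zip(names, values)))
        for values in product(*(states[n] for n in names))
    )
```

Calling `joint_probability({"Gender": "Men", "Age": "35-44", "Smoking": "Current"})` multiplies the three factors 0.44, 0.6 and 0.36, and `total_mass()` checks that the factorization defines a proper distribution.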

As BNs are used to make inference [8], it is necessary to understand the flow of influence when new information is introduced in a BN. Below we introduce some basic concepts.

Two variables X and Y in a BN are d-separated if, for every possible path between X and Y, there is an intermediate variable Z such that either: (i) the connection is serial (X → Z → Y or X ← Z ← Y) or diverging (X ← Z → Y) and Z is instantiated, or (ii) the connection is converging (X → Z ← Y) and neither Z nor any of Z's descendants have received evidence. When influence flows from a node X to another node Y via a node Z, it is said that the trail X – Z – Y is active. A causal trail X → Z → Y (serial connection), an evidential trail X ← Z ← Y (serial connection), or a common cause trail X ← Z → Y (diverging connection) is active if and only if Z is not observed. A common effect trail X → Z ← Y (converging connection) is active if and only if either Z or one of Z's descendants is observed.

Let P be a joint probability distribution of the random variables in some set of features F, let A denote the set of arcs, and let G = (F, A) be a DAG; then (G, P) satisfies the local Markov condition if each variable (feature) X ∈ F is conditionally independent of the set of all its non-descendants given the set of all its parents. The global Markov property states that any node X is conditionally independent of any other node given its Markov blanket, i.e., I(X, non-Markov blanket(X) | Markov blanket(X)); the Markov blanket of a node includes its parents, its children, and the children's other parents (spouses). Any node in the BN is d-separated from the nodes belonging to the non-Markov blanket given its Markov blanket.
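The definition of the Markov blanket (parents, children, and the children's other parents) can be sketched in a few lines of Python. The arc list below is a hypothetical toy DAG used only to exercise the function; it does not reproduce the structure learned in this paper.

```python
def markov_blanket(arcs, node):
    """Return the Markov blanket of `node` in a DAG given as (parent, child) arcs."""
    parents = {u for u, v in arcs if v == node}
    children = {v for u, v in arcs if u == node}
    # Spouses: other parents of the node's children.
    spouses = {u for u, v in arcs if v in children and u != node}
    return parents | children | spouses


# Hypothetical toy DAG: A -> X -> C <- B, C -> D
toy_arcs = [("A", "X"), ("X", "C"), ("B", "C"), ("C", "D")]
blanket_of_x = markov_blanket(toy_arcs, "X")
```

Here `blanket_of_x` contains A (parent), C (child) and B (spouse), while D, a descendant outside the blanket, is excluded.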

3. Data and methodological issues

This section presents some methodological issues concerning data acquisition. Reliability of the data was assured by following standard medical procedures. A brief description follows.

3.1. Participants

All participants were workers from the public sector of the Balearic Islands (Spain). Subjects in the study were invited to participate during their annual work health assessment. Any worker attending the work health assessment could be included in the study. In total, 4300 workers were invited to participate. Among them, 3993 subjects (Men = 1758, Women = 2235) agreed to participate. Participants signed informed consent prior to enrollment. After acceptance, a complete family and personal medical history was recorded. The study protocol was in accordance with the Declaration of Helsinki and received approval from the Balearic Islands Clinical Research Ethical Committee.

3.2. Instruments

3.2.1. Determining variables

All anthropometric measurements were made in the morning after an overnight fast, according to the recommendations of the International Standards for Anthropometric Assessment [42]. Body weight (electronic scale Seca 700; Seca, Hamburg, Germany), height (stadiometer Seca 220 cm), and abdominal waist circumference (Lufkin Executive Thinline W606, precision 1 mm) were determined according to the recommended techniques mentioned above. Body mass index (BMI) was calculated as weight (kg) divided by height (m) squared. BMI values were categorized following the WHO criteria [39].

Blood samples were collected during the same session and in the same place after an overnight fast of 12 h. Serum was obtained, and total cholesterol, HDL cholesterol, glucose, and triglycerides were measured using an automated analyzer (Technicon DAX system). Blood pressure was measured with a calibrated automatic sphygmomanometer (Omron M3). Measurements were repeated three times with a pause of 1 min between measurements, and the average value was recorded. To quantify physical activity practice, the self-reported number of sessions of physical activity per week was obtained.

3.2.2. Determining cardiovascular risk variables

The presence of metabolic syndrome (MetS) was ascertained using the criterion suggested by the National Cholesterol Educational Program Adult Treatment Panel III (NCEP ATP III). The Framingham equation calibrated for the Spanish population (Framingham-REGICOR) was used to determine the cardiovascular risk at 10 years (software tool calcumed plus, available at http://www.fisterra.com). Participants in the study were classified according to cardiovascular disease (CVD) risk following the Framingham-REGICOR guidelines: >10% High risk CVD, 5–9.9% Moderate risk CVD, <5% Low risk CVD [43].

The heart age was calculated using the Heart Age Calculator, available at http://www.heartage.me. Cardiovascular lost years (CVLY) is defined as the difference between the heart age and the chronological age [44]. CVLY takes the values: First Quartile [−20, −4], Second Quartile [−3, 3], Third Quartile [4, 12], and Fourth Quartile [13, 20].
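The two discretizations described above can be sketched as follows. This is a minimal illustration, assuming that lost years are whole numbers (the quartile intervals are integer ranges) and that a risk of exactly 10% falls in the High category, which the stated thresholds leave unspecified; the function names are hypothetical.

```python
def cardiovascular_lost_years(heart_age, chronological_age):
    """CVLY: heart age minus chronological age (both in whole years)."""
    return heart_age - chronological_age


def cvly_quartile(lost_years):
    """Map integer lost years to the quartile intervals given in the text."""
    bins = [
        (-20, -4, "First quartile"),
        (-3, 3, "Second quartile"),
        (4, 12, "Third quartile"),
        (13, 20, "Fourth quartile"),
    ]
    for low, high, label in bins:
        if low <= lost_years <= high:
            return label
    raise ValueError("lost years outside the reported range [-20, 20]")


def cvrs_category(risk_percent):
    """Framingham-REGICOR 10-year risk class (assumption: exactly 10% -> High)."""
    if risk_percent < 5:
        return "Low"
    if risk_percent < 10:
        return "Moderate"
    return "High"
```

For example, a 45-year-old with a heart age of 50 has 5 lost years, which falls in the third quartile.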

With slight differences between them, the parameters required for calculating the Framingham-REGICOR score and the heart age are: age, sex, height (in centimeters), weight (in kilograms), waist circumference (in centimeters), family history of cardiovascular diseases, the presence or absence of diabetes, smoking habit, total cholesterol and HDL-cholesterol levels, and systolic pressure or antihypertensive treatment [45].

3.3. Learning Bayesian networks

To obtain a BN, it is necessary to determine a structure (defined by a DAG) and the conditional probabilities assigned to each node of the DAG. Therefore, learning a BN implies two tasks: (i) structural learning, that is, the identification of the topology of the BN, and (ii) parametric learning, that is, the estimation of numerical parameters (conditional probabilities) given a network topology.

3.3.1. Structural learning

The problem of discovering the causal structure becomes harder as the number of variables increases [46–48]. Table 1 shows a description of the variables considered.

We are interested in obtaining a DAG, so only three possible connections are considered. The number of different structures, f(n), grows more than exponentially in the number of nodes; in [49] the following efficiently computable recursive function is given, Eq. (2), with f(0) = 1:

$$f(n)=\sum_{i=1}^{n}(-1)^{i+1}\,\frac{n!}{(n-i)!\,i!}\,2^{i(n-i)}\,f(n-i) \tag{2}$$
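To give a feeling for how quickly the search space grows, the recursion of Eq. (2) can be evaluated directly. The sketch below uses the standard form of the recursion, with the binomial coefficient n!/((n−i)! i!), the exponent i(n−i), the recursive term f(n−i), and the base case f(0) = 1; the function name is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes via the recursion of Eq. (2)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * count_dags(n - i)
        for i in range(1, n + 1)
    )


# Number of candidate structures for the 13 features considered here.
dags_for_13_features = count_dags(13)
```

Already for five nodes there are 29281 possible DAGs, so exhaustive enumeration over 13 features is out of the question and a heuristic search is needed.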

There are basically two approaches to structure learning [50]: (i) search-and-score structure learning, and (ii) constraint-based structure learning; a combination of both gives a hybrid learning framework. Search-and-score algorithms assign a number (score) to each BN structure, and then the structure with the highest score is chosen. Constraint-based algorithms perform a set of conditional independence tests on the data [51]. Using this analysis an undirected graph can be generated. Taking into account additional independence tests, the network is transformed into a BN. Hybrid algorithms combine aspects of both constraint-based and score-based algorithms: they use conditional independence tests to reduce the search space and network scores to find the optimal network in the reduced space.

Table 1 – Description of the 13 dataset features used to learn the structure.

Variable name   Description                                               Values
Gender          Male and Female                                           Men, Women
Age             Age in years                                              35–44, 45–54, 55–64
Smoking         Never smoker, Former smoker and Current smoker            Never smoker, Former smoker, Current smoker
PA              Physical activity (three or more times/week during 1 h)   No practice, Practice
BMI             Body mass index (kg/m2)                                   Underweight, Normal weight, Overweight GI, Overweight GII, Obesity TI, Obesity TII, Obesity TIII
WC              Waist circumference (cm)                                  High, Normal, Very high
BP              Blood pressure (mmHg)                                     Normal, Optimal, Normal high, Mild, Moderate, Serious
HDL             HDL-cholesterol (mg/dl)                                   Normal, Low, High
CVLY            Cardiovascular lost years                                 First quartile, Second quartile, Third quartile, Fourth quartile
Glucose         Fasting blood glucose (mg/dl)                             High, Normal
TG              Triglycerides (mg/dl)                                     Normal, Limit, Hyper
CVRS            Framingham-REGICOR score                                  Low, Moderate, High
MetS            Metabolic syndrome                                        Yes, No

In order to obtain the DAG, we used the bnlearn package [52,53] of the R language [54]. As there are many structures that are consistent with the same set of independencies, prior knowledge of the system under study was taken into account in the model selection process. We aimed to choose a structure that reflects the causal order and dependencies, that is, one in which causes are parents of their effects; such structures tend to work well [1], and causal graphs tend to be sparser. Causality resides in the world, not in the inference process.

Fig. 1 – Structure obtained by model averaging over 500 networks. It was built with the hill-climbing learning algorithm hc from the bnlearn package in the R language using a threshold = 0.85. In the model selection process we included prior knowledge; thus variables were divided into four blocks: (1) background variables = {Gender, Age}, (2) conditional variables = {Smoking, PA}, (3) intermediate variables = {BMI, TG, WC, HDL, BP, Glucose}, and (4) diagnostic variables = {CVLY, CVRS, MetS}.

Table 2 – Expected values of probabilities for the Smoking feature conditional on combinations of its parent values, in this case conditional on the Gender and Age features.

Gender   Age     Smoking=Former   Smoking=Current   Smoking=Never
Men      35–44   0.0668           0.3636            0.5695
Men      45–54   0.0845           0.3825            0.5329
Men      55–64   0.1122           0.2852            0.6026
Women    35–44   0.1139           0.3231            0.5630
Women    45–54   0.1415           0.3371            0.5206
Women    55–64   0.1348           0.1311            0.7341

We included our prior knowledge of the system under study in the model selection process; thus variables were divided into four blocks: (1) background variables = {Gender, Age}, (2) conditional variables = {Smoking, PA}, (3) intermediate variables = {BMI, TG, WC, HDL, BP, Glucose}, and (4) diagnostic variables = {CVLY, CVRS, MetS}. We restricted the model selection process by blacklisting arrows that point from a later to an earlier block [55]. To obtain the structure there are two options: either select a single best model or obtain an average model, which is known as model averaging [56]. Our model was learnt with the hill-climbing (hc) algorithm. The final model was obtained by repeating structure learning several times: a large number of network structures were explored (500 BNs) to reduce the impact of locally optimal (but globally suboptimal) networks on learning. The networks learned were averaged to obtain a more robust model. The averaged network structure was obtained using the arcs present in at least 85% of the networks, which gives a measure of the strength of each arc and establishes its significance given a threshold (85%) (see Fig. 1).
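The two ideas in this paragraph, the block-based blacklist and the arc-frequency threshold, can be sketched in plain Python. This is a simplified stand-in written for illustration, not the bnlearn implementation used in the study: each learned network is assumed to be given simply as a collection of directed (from, to) arcs, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# The four blocks of variables, ordered from earliest to latest.
blocks = [
    ["Gender", "Age"],
    ["Smoking", "PA"],
    ["BMI", "TG", "WC", "HDL", "BP", "Glucose"],
    ["CVLY", "CVRS", "MetS"],
]


def block_blacklist(ordered_blocks):
    """Forbid every arc that points from a later block to an earlier block."""
    forbidden = set()
    for later_index, later_block in enumerate(ordered_blocks):
        for earlier_block in ordered_blocks[:later_index]:
            forbidden.update(product(later_block, earlier_block))
    return forbidden


def average_arcs(networks, threshold=0.85):
    """Keep the arcs that appear in at least `threshold` of the networks."""
    counts = Counter(arc for network in networks for arc in set(network))
    total = len(networks)
    return {arc for arc, count in counts.items() if count / total >= threshold}


blacklist = block_blacklist(blocks)
```

With these blocks the blacklist contains 58 forbidden arcs (4 + 24 + 30), for instance Smoking → Gender, while arcs within a block or from an earlier to a later block remain allowed.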

3.3.2. Parametric learning

Parameters were again obtained with the bnlearn package in the R language by performing a Bayesian parameter estimation using the Dirichlet distribution [57].
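The idea behind Bayesian parameter estimation with a Dirichlet prior can be illustrated by the posterior mean of a single multinomial distribution. The sketch below assumes a symmetric Dirichlet prior with one pseudo-count per state, which is an illustrative choice and not necessarily the prior configuration used by bnlearn in this study; the counts are made up.

```python
def dirichlet_posterior_mean(counts, pseudocount=1.0):
    """Posterior mean of state probabilities under a symmetric Dirichlet prior.

    counts: {state: observed count} for one configuration of the parents.
    """
    total = sum(counts.values()) + pseudocount * len(counts)
    return {state: (n + pseudocount) / total for state, n in counts.items()}


# Hypothetical counts for one parent configuration (illustrative only).
example_counts = {"Former": 1, "Current": 3, "Never": 6}
example_estimate = dirichlet_posterior_mean(example_counts)
```

With these counts the estimates are 2/13, 4/13 and 7/13; unlike raw relative frequencies, no state receives probability zero even if it was never observed for a given parent configuration.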

A conditional probability distribution is obtained for each node. Table 2 shows an example of a conditional probability distribution.

3.4. Cardiovascular risk model

Although the bnlearn package in R allows us to make inference, the BN was implemented in Netica [58], using the structure and parameters obtained with bnlearn, in order to have a clear graphical representation. The compiled network is represented in Fig. 2. The joint probability distribution of the BN in Fig. 2 requires the specification of 13 conditional probability tables, one for each variable conditioned on its parent set.

As we can observe in Fig. 2, the CVLY and CVRS variables have a direct connection, and both are connected to the MetS variable through different trails. For example, the MetS variable is connected to the CVLY variable through the BP variable (BP is a common cause; once BP is instantiated the connection via this trail is broken), and the MetS variable is also connected to the CVRS variable through the TG variable (it is also a common cause; once the TG variable is instantiated the connection via this trail is broken). However, there are other possible trails, such as MetS – HDL – Gender – CVLY, MetS – WC – Gender – CVLY – CVRS, etc.

Fig. 2 – BN for the study of feature relationships to evaluate the CVLY, CVRS and MetS features. The BN shows optimal (46.8%) blood pressure (BP), normal (82.7%) triglycerides (TG), normal (87.2%) Glucose, normal weight (43.2%) (BMI), practice of physical activity (PA) (47.7%) and no practice of physical activity (PA) (52.3%). It also shows low levels of the Framingham-REGICOR score (CVRS) (91.8%), no metabolic syndrome (MetS) (88.3%) and similar cardiovascular lost years (CVLY) in the four quartiles.

The final BN obtained from the dataset shows a high likelihood for the Low value of the CVRS variable, the No value of the MetS variable, the Normal value of the Glucose variable, the Normal value of the TG variable, the Normal value of the BP variable, the Normal value of the HDL variable, the Normal weight value of the BMI variable, the Normal value of the WC variable, and the Never value of the Smoking variable, and similar likelihoods across the different labels of the CVLY and PA variables.

3.5. Validation of the BN

The BN was validated using 10-fold cross-validation with a log-likelihood loss function, obtaining an expected loss of 9.3895. Table 3 shows the area under the ROC curve (AUC) and the percentage correctly classified for the different features.

3.6. Performance comparison

In order to provide reference benchmarks for how our BN classifies, we also report the classification performance (see Table 4) obtained by the widely used Naïve Bayes (NB), Tree Augmented Naïve Bayes (TAN), Multilayer Perceptron (MLP), and C4.5 decision tree algorithms integrated in WEKA [59]. Only the diagnostic features (CVLY, CVRS, MetS) were considered as a comparative example. The performance of each classification model is evaluated using three statistical measures: accuracy, sensitivity and specificity.
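For reference, the three measures can be computed from the cells of a binary confusion matrix as in the sketch below. The counts in the example are hypothetical, and for multi-state features such as CVLY some per-class averaging would be needed, which is not shown here.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts, for illustration only.
example_metrics = binary_metrics(tp=90, fn=10, fp=20, tn=80)
```

A classifier can combine high accuracy and sensitivity with low specificity when one class dominates the data, which is why the three measures are reported side by side.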

Learning a BN from data is a form of unsupervised learning, in the sense that the learner does not distinguish the class variable from the attribute variables in the data [60]. We compare our BN with several supervised learning algorithms: NB, TAN, MLP, and the C4.5 decision tree.

NB and TAN classifiers are special types of BN in which supervised learning is performed. NB is a probabilistic graphical classifier based on Bayes' theorem which makes very strong assumptions about the independence between the predictor variables. The NB model assumes that instances fall into one of a number of mutually exclusive classes, and it is the simplest BN classifier, where the predictive variables are assumed to be conditionally independent given the class. The performance of NB is surprising, since this assumption is unrealistic. The TAN classifier [60] extends the NB model with a tree-shaped graph across the predictor variables. The TAN model is similar to NB except that each predictor variable is allowed to depend on one other predictor variable in addition to the class. This model provides more information than the NB model, as it includes information about the relationships among the predictor variables. MLP is a feedforward artificial neural network model which consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. The C4.5 algorithm is a decision tree induction method developed by Quinlan [61].

Table 3 – AUCs and percentage correctly classified for the different features.

Variable name   State             AUC      Accuracy (%)
Gender          Men               0.9048   82.1938
Gender          Women             0.9047   82.1938
Age             35–44             0.6756   53.4435
Age             45–54             0.6088   53.4435
Age             55–64             0.7273   53.4435
Smoking         Former smoker     0.6864   73.3534
Smoking         Current smoker    0.8772   73.3534
Smoking         Never smoker      0.8117   73.3534
PA              No practice       0.8763   78.6126
PA              Practice          0.8773   78.6126
BMI             Underweight       0.8242   55.1966
BMI             Normal weight     0.8460   55.1966
BMI             Overweight GI     0.7110   55.1966
BMI             Overweight GII    0.7338   55.1966
BMI             Obesity TI        0.8654   55.1966
BMI             Obesity TII       0.8905   55.1966
BMI             Obesity TIII      0.8638   55.1966
WC              High              0.7487   73.0278
WC              Normal            0.8677   73.0278
WC              Very high         0.9150   73.0278
BP              Normal            0.7384   59.2787
BP              Optimal           0.8902   59.2787
BP              Normal high       0.7505   59.2787
BP              Mild              0.8453   59.2787
BP              Moderate          0.8805   59.2787
BP              Serious           0.9408   59.2787
HDL             Normal            0.7639   69.4465
HDL             Low               0.8762   69.4465
HDL             High              0.8806   69.4465
CVLY            First quartile    0.9188   63.9600
CVLY            Second quartile   0.7926   63.9600
CVLY            Third quartile    0.8238   63.9600
CVLY            Fourth quartile   0.9335   63.9600
Glucose         High              0.7274   87.1525
Glucose         Normal            0.7277   87.1525
TG              Normal            0.8523   84.5980
TG              Limit             0.7953   84.5980
TG              Hyper             0.8636   84.5980
CVRS            Low               0.8095   91.2597
CVRS            Moderate          0.8201   91.2597
CVRS            High              0.8067   91.2597
MetS            Yes               0.9836   96.4438
MetS            No                0.9835   96.4438

The major advantage of a BN is its ability to represent, and hence help understand, knowledge. Our BN model is best or tied for best on every measure reported in Table 4. Furthermore, its graphical representation is very informative.

4. Reasoning patterns

BNs are used to calculate new probabilities when new information is obtained [8]. Given the evidence E = e, our goal is to find the most likely assignment to the variables in U, the complement of E; see Eq. (3):

$$\mathrm{MAP}(U \mid e)=\arg\max_{u} P(u,e) \tag{3}$$
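For a very small model, the MAP query of Eq. (3) can be answered by brute force over an explicit joint table. The sketch below uses three hypothetical binary variables A, B and E with made-up probabilities; real BN software exploits the network structure instead of enumerating the full joint.

```python
def map_query(joint, variables, evidence):
    """Return the most likely assignment of the unobserved variables and P(u, e).

    joint: {tuple of values ordered as `variables`: probability}
    evidence: {variable: observed value}
    """
    best_assignment, best_probability = None, -1.0
    for values, probability in joint.items():
        full = dict(zip(variables, values))
        consistent = all(full[name] == value for name, value in evidence.items())
        if consistent and probability > best_probability:
            best_probability = probability
            best_assignment = {k: v for k, v in full.items() if k not in evidence}
    return best_assignment, best_probability


# Hypothetical joint distribution over (A, B, E); values sum to 1.
toy_variables = ("A", "B", "E")
toy_joint = {
    (0, 0, 0): 0.10, (0, 1, 0): 0.05, (1, 0, 0): 0.20, (1, 1, 0): 0.05,
    (0, 0, 1): 0.05, (0, 1, 1): 0.25, (1, 0, 1): 0.10, (1, 1, 1): 0.20,
}
toy_map, toy_map_probability = map_query(toy_joint, toy_variables, {"E": 1})
```

Given E = 1, the joint assignment A = 0, B = 1 has the largest value of P(u, e), namely 0.25, so it is the MAP assignment for this toy table.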

There are two main types of queries: (1) in a probability query, we try to find the most likely assignment to a single variable, i.e., we compute P(X | e); (2) in a MAP query, we find the most likely joint assignment to the variables in U. In order to introduce evidence in the network we have selected three reasoning patterns: causal reasoning, evidential reasoning, and intercausal reasoning.

Table 4 – Performance for the CVLY, CVRS, and MetS features, comparing our BN with the corresponding algorithms in 10-fold cross-validation experiments.

Algorithms                    CVLY      CVRS      MetS

Accuracy
Bayesian network              63.9600   91.2597   96.4438
Naïve Bayes                   59.0033   90.4833   95.4921
Tree Augmented Naïve Bayes    63.8900   91.2580   96.0690
Multilayer perceptron         61.9835   91.2596   96.2434
Trees C4.5                    62.1337   91.2597   95.4420

Sensitivity
Bayesian network              0.6392    0.9131    0.9901
Naïve Bayes                   0.5901    0.9050    0.9550
Tree Augmented Naïve Bayes    0.6389    0.9126    0.9900
Multilayer perceptron         0.6200    0.9130    0.9620
Trees C4.5                    0.6210    0.9130    0.9544

Specificity
Bayesian network              0.8785    0.2874    0.7967
Naïve Bayes                   0.8610    0.2790    0.7920
Tree Augmented Naïve Bayes    0.8784    0.0874    0.7565
Multilayer perceptron         0.8720    0.0870    0.7670
Trees C4.5                    0.8740    0.0870    0.7460

4.1. Causal reasoning

Causal reasoning takes place when we predict effects from causes (and so we proceed from top to bottom). We instantiate one variable at each step. In step 1 the Gender variable is instantiated to either Men or Women, in step 2 the Smoking variable is instantiated to Current Smoker or Never Smoker, in step 3 the physical activity (PA) variable is instantiated to Practice or No Practice, in step 4 the Age variable is instantiated to 35–44, in step 5 the Age variable is instantiated to 45–54, and in step 6 the Age variable is instantiated to 55–64.

4.1.1. Analysis of the cardiovascular lost years (CVLY) variable

In Fig. 3, a summary of how the different quartiles of the CVLY variable change at each step is shown. Taking into account the conditional variables (Smoking and physical activity), the one with the greatest influence on cardiovascular lost years (CVLY) is the smoking habit, yielding two clear patterns: (1) When Smoking is in the Never state, Fig. 3 shows that the highest probability is achieved for the first quartile in Women, followed by the second and third quartiles in Men. Adding the physical activity (PA) variable in the Practice state shows a decrease in the probability for the fourth quartile in Men and Women, with lower values in Women, and also an increase in the probability for the first quartile in Women and for the second quartile in Men; and (2) When Smoking is in the Current state, Fig. 3 shows that the highest probability is achieved for the fourth quartile in Men, followed by the third quartile in Women. Fig. 3 also shows an improvement of the situation when physical activity is instantiated to Practice and when the youngest group of the population is considered, the group of Men being the one with the highest risk.

Fig. 3 – Step-by-step instantiations. The different steps to evaluate CVLY: step 1 = Gender, step 2 = Smoking, step 3 = PA, step 4 = (Age = 35–44), step 5 = (Age = 45–54), and step 6 = (Age = 55–64), where S = Smoking and PA = Physical Activity. The different steps are represented on the horizontal axis. The estimated probability for the CVLY variable, expressed as a percentage at the different quartiles, is shown on the vertical axis. M: Men, W: Women.

Fig. 4 – Step-by-step instantiations. The different steps to evaluate the MetS feature: step 1 = Gender, step 2 = Smoking, step 3 = PA, step 4 = (Age = 35–44), step 5 = (Age = 45–54), and step 6 = (Age = 55–64), where S = Smoking and PA = Physical Activity. The different steps are represented on the horizontal axis. The estimated probability for the MetS variable, expressed as a percentage at the different values (yes, no), is shown on the vertical axis. M: Men, W: Women.

4.1.2. Studying metabolic syndrome (MetS)

From Fig. 4, we can differentiate two patterns depending on whether the subjects practice physical activity or not (PA variable). When physical activity (PA variable) is instantiated to Practice we obtain the highest probability in the No state for Metabolic Syndrome (MetS variable), showing that the Smoking variable does not have any influence and that the group of Women is the most privileged (with the highest probability for the MetS variable in the No state). However, when Physical Activity (PA variable) is instantiated to No Practice we observe that the probability of Metabolic Syndrome (MetS variable) in the Yes state increases, showing again that the Smoking variable does not have any influence. The group with the highest risk of MetS = Yes is the group of Men.

4.1.3. Studying cardiovascular risk score (CVRS)

From Fig. 5, when physical activity is instantiated to Practice we obtain similar probabilities for the CVRS variable independently of whether the subject smokes or not. The same holds when physical activity is instantiated to No Practice.

4.2. Evidential reasoning

Queries in which we reason from effects to causes (from bottom to top) are instances of evidential reasoning or explanation.

The MetS and CVRS variables are instantiated to the values Yes and High, respectively. We observe how the probabilities of the different variables change. The CVLY variable increases its Fourth quartile value from 22.4% to 90%. The BP variable increases its Mild value from 16.1% to 44.5%, and decreases its Optimal value from 46.8% to 3.28%. The TG variable achieves similar likelihoods for all its values: Normal, Limit and Hyper. The HDL variable increases its High value from 16.8% to 69.1%. The WC variable increases its Very High value from 26.6% to 68.8%. The BMI variable decreases its Normal Weight value from 43.2% to 11.1%, and increases the probability of Overweight GII from 19.9% to 28.4% and the probability of Obesity TI from 12.6% to 33.7%. The PA variable increases the probability of the No Practice value from 52.3% to 89.7%. The Glucose variable increases its High value from 12.8% to 31.5%. The Gender variable increases its Men value from 44.0% to 78.5%. The Age variable increases its 55–64 value from 14.6% to 50.7%. Fig. 6 shows the probability variations.

4.3. Conditional entropy

In Shannon's theory [62], the entropy of X is the lower bound on the average number of bits needed to encode values of X. Another way of viewing the entropy is as a measure of our uncertainty about the value of X, i.e., little uncertainty about X will produce a low entropy value.

Fig. 5 – Step-by-step instantiations to evaluate the CVRS feature. Step 1 = Gender, step 2 = Smoking, step 3 = Physical Activity, step 4 = (Age = 35–44), step 5 = (Age = 45–54), and step 6 = (Age = 55–64).

Fig. 6 – Evidential reasoning. The metabolic syndrome (MetS) variable is instantiated to the No value and the CVRS variable is instantiated to the High value.

A natural question is what the cost of encoding X is if we are already encoding Y. The conditional entropy of X given Y is

$$H_P(X \mid Y)=E_P\!\left[\log\frac{1}{P(X \mid Y)}\right]=\sum_{x,y}P(x,y)\,\log\frac{1}{P(x \mid y)} \tag{4}$$

which captures the additional cost (in terms of bits) of encoding X when we are already encoding Y. Note that the maximum value of probability in P(X|Y) implies the lowest entropy value.
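The conditional entropy can be computed from a joint table by weighting each term log 1/P(x|y) with the joint probability P(x, y), as the expectation E_P requires. The sketch below is a minimal illustration using base-2 logarithms (bits); the two example tables are hypothetical.

```python
from collections import defaultdict
from math import log2


def conditional_entropy(joint):
    """H(X|Y) in bits from a joint table {(x, y): P(x, y)}."""
    marginal_y = defaultdict(float)
    for (_, y), p in joint.items():
        marginal_y[y] += p
    entropy = 0.0
    for (_, y), p in joint.items():
        if p > 0:
            # P(x|y) = P(x, y) / P(y), so log 1/P(x|y) = log(P(y) / P(x, y)).
            entropy += p * log2(marginal_y[y] / p)
    return entropy


# X fully determined by Y: no additional bits are needed.
deterministic = {(0, 0): 0.5, (1, 1): 0.5}
# X independent of Y and uniform: one full bit is still needed.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```

The first table gives a conditional entropy of 0 bits and the second gives 1 bit, matching the observation that conditional probabilities close to 1 lead to low conditional entropy.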

For the MetS, CVLY, and CVRS features we are interested in determining and ordering the state values of the conditioning features such that we obtain the maximum probability value in some states, which leads to the minimum conditional entropy.

4.4. Intercausal reasoning

When different causes of the same effect can interact we speak of intercausal reasoning, which constitutes a very common pattern in human reasoning.

Furthermore, BNs are able to produce probability estimates; in this sense we are interested in knowing the features with the highest influence in maximizing the probability of MetS, CVLY, and CVRS in some of their states.

4.4.1. Minimizing conditioned entropy for MetS

We maximize the MetS feature probability in the Yes state. To achieve this, we consider the Markov blanket of the MetS variable, which is composed of the following four variables: WC, HDL, BP and TG. At each step we choose the variable and the state that induce the greatest increase in the conditional probability of the MetS variable in the Yes state. A summary is shown in Table 5 and Fig. 7. Given the Markov blanket of the MetS variable, the global Markov property states that the MetS variable is conditionally independent of any other variable.

Table 5 – Step-by-step instantiations leading to maximization of the probability of the MetS variable; in the initial BN without evidence, MetS = Yes had a probability of 11.7%. The different values Serious, Moderate, Mild and Normal H for the BP variable gave the same probability value for the MetS variable.

Step   Instantiated variable   Value       MetS=Yes
1      TG                      Hyper       48.7%
2      WC                      Very High   85.4%
3      HDL                     Low         100%
4      BP                      Normal H    100%
4      BP                      Mild        100%
4      BP                      Moderate    100%
4      BP                      Serious     100%

Again, we maximize the MetS variable probability, now in the No state. At each step we choose the variable and the state that most increase the probability of the MetS variable in the No state. A summary is shown in Table 6 and Fig. 8.

Fig. 7 – Intercausal reasoning: maximizing the MetS feature in the Yes state. We try to obtain the highest probability for MetS = Yes after introducing the following evidence: TG = Hyper, WC = Very High, HDL = Low.

Table 6 – Step-by-step instantiations leading to maximization of the probability of the MetS variable; in the initial BN without evidence, MetS = No had a probability of 88.3%. The maximum probability for the MetS feature in the No state is achieved when BP = Normal, WC = Normal and TG = Normal.

Step   Instantiated variable   Value    MetS=No
1      BP                      Normal   95.4%
2      WC                      Normal   99.1%
3      TG                      Normal   100%
4      HDL                     Normal   100%
4      HDL                     Low      100%
4      HDL                     High     100%

4.4.2. Minimizing conditioned entropy for CVLY

We maximize the CVLY feature probability in the First Quartile state. To achieve this, we consider the Markov blanket of the CVLY variable, which is composed of the following six variables: CVRS, Gender, Age, Smoking, BP, and HDL. At each step we choose the variable and the state that induce the greatest increase in the conditional probability of the CVLY variable in the First Quartile state. The Age feature has not been included, because it does not increase the probability of CVLY in the First Quartile once the BP, HDL, Smoking, Gender and CVRS features are instantiated. A summary is shown in Table 7. Given the Markov blanket of the CVLY variable, the global Markov property states that the CVLY variable is conditionally independent of any other variable.

Table 7 – Step-by-step instantiations leading to maximization of the probability of the CVLY variable, where in the initial BN without evidence CVLY = First Quartile reached a probability of 28.2%.

| Step | Instantiated variable | Value | CVLY = First Quartile |
|------|-----------------------|-------|-----------------------|
| 1 | BP | Optimal | 51.2% |
| 2 | HDL | Low | 70.7% |
| 3 | Smoking | NeverSmoker | 90.8% |
| 4 | Gender | Women | 91.6% |
| 5 | CVRS | Low | 91.7% |

Next, we maximize the probability of the CVLY feature in the Second Quartile state. The order of features is: Smoking, BP, HDL, Gender, CVRS, and Age, achieving a maximum probability value of 68% for the CVLY feature in the Second Quartile state. A summary is shown in Table 8.

Table 8 – Step-by-step instantiations leading to maximization of the probability of the CVLY variable, where in the initial BN without evidence CVLY = Second Quartile reached a probability of 24.3%.

| Step | Instantiated variable | Value | CVLY = Second Quartile |
|------|-----------------------|-------|------------------------|
| 1 | Smoking | NeverSmoker | 29.2% |
| 2 | BP | Normal | 43.3% |
| 3 | HDL | Normal | 58.3% |
| 4 | Gender | Men | 64.2% |
| 5 | CVRS | Low | 65.1% |
| 6 | Age | 55–64 | 68.0% |

Again, we maximize the probability of the CVLY feature in the Third Quartile state. The order of features is: BP, HDL, Smoking, Gender, CVRS, and Age, achieving a maximum probability value of 79% for the CVLY feature in the Third Quartile state. A summary is shown in Table 9.

Table 9 – Step-by-step instantiations leading to maximization of the probability of the CVLY variable, where in the initial BN without evidence CVLY = Third Quartile reached a probability of 25.1%.

| Step | Instantiated variable | Value | CVLY = Third Quartile |
|------|-----------------------|-------|-----------------------|
| 1 | BP | Normal | 34.1% |
| 2 | HDL | High | 49.2% |
| 3 | Smoking | NeverSmoker | 70.1% |
| 4 | Gender | Men | 75.5% |
| 5 | CVRS | Low | 77.6% |
| 6 | Age | 45–54 | 79.0% |

Finally, we maximize the probability of the CVLY feature in the Fourth Quartile state. The order of features is: BP and Smoking, achieving a maximum probability value of 100% for the CVLY feature in the Fourth Quartile state. A summary is shown in Table 10.

Table 10 – Step-by-step instantiations leading to maximization of the probability of the CVLY variable, where in the initial BN without evidence CVLY = Fourth Quartile reached a probability of 22.4%.

| Step | Instantiated variable | Value | CVLY = Fourth Quartile |
|------|-----------------------|-------|------------------------|
| 1 | BP | Serious | 79.6% |
| 2 | Smoking | CurrentSmoker | 100% |

Fig. 8 – Intercausal reasoning: maximizing the MetS feature in the No state. We try to obtain the highest probability for MetS = No after introducing the following evidence: BP = Normal, TG = Normal, WC = Normal.

4.4.3. Minimizing conditioned entropy for CVRS

We maximize the probability of the CVRS feature in the Low state. To achieve this, we consider the Markov blanket of the CVRS variable, which is composed of the following three variables: CVLY, Age, and Gender. At each step we choose the variable and the state that induce the greatest increase in the conditional probability of the CVRS variable in the Low state. The order of features is: CVLY, Age, and Gender, achieving a maximum probability value of 100% for the CVRS feature in the Low state. A summary is shown in Table 11. Given the Markov blanket of the CVRS variable, the global Markov property states that the CVRS variable is conditionally independent of any other variable.

Table 11 – Step-by-step instantiations leading to maximization of the probability of the CVRS variable in the Low state, where in the initial BN without evidence CVRS = Low reached a probability of 91.8%. The maximum conditioned probability for the CVRS feature in state Low is achieved when CVLY = First Quartile, Age = 55–64 and Gender = Men.

| Step | Instantiated variable | Value | CVRS = Low |
|------|-----------------------|-------|------------|
| 1 | CVLY | FirstQuartile | 97.5% |
| 2 | Age | 55–64 | 99.0% |
| 3 | Gender | Men | 100% |

Again, we maximize the probability of the CVRS feature, now in the Moderate state. The order of features is the same as in the previous case: CVLY, Age, and Gender, achieving a maximum probability value of 34.4% for the CVRS feature in the Moderate state. A summary is shown in Table 12.

Table 12 – Step-by-step instantiations leading to maximization of the probability of the CVRS variable in the Moderate state, where in the initial BN without evidence CVRS = Moderate reached a probability of 6.83%. The maximum conditioned probability for the CVRS feature in state Moderate is achieved when CVLY = Fourth Quartile, Age = 55–64 and Gender = Men.

| Step | Instantiated variable | Value | CVRS = Moderate |
|------|-----------------------|-------|-----------------|
| 1 | CVLY | FourthQuartile | 19.4% |
| 2 | Age | 55–64 | 30.5% |
| 3 | Gender | Men | 34.4% |

Finally, we maximize the probability of the CVRS feature in the High state. The order of features is the same as in the previous cases: CVLY, Age, and Gender, achieving a maximum probability value of 11.9% for the CVRS feature in the High state. A summary is shown in Table 13.

5. Discussion

This study demonstrates the feasibility of BNs in epidemiological studies, particularly when data on cardiovascular risk factors are considered. BNs can be used for answering clinical questions based on unobserved evidence, since the probability distributions can be automatically updated, in an appealing way, when new patient information is added.
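The updating of a probability when new information arrives can be illustrated, in its simplest form, with Bayes' rule for a single binary finding. The sketch below is a hedged illustration only: the prior is the 11.7% probability of MetS = Yes reported for the BN without evidence, whereas the two likelihood values and the function name `bayes_update` are hypothetical and do not come from the study.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | e) from the prior P(H) and the likelihoods P(e | H), P(e | not H)."""
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence

# Prior P(MetS = Yes) in the BN without evidence (reported in the paper).
PRIOR_METS_YES = 0.117

# Hypothetical likelihoods of observing some finding e with and without MetS.
posterior = bayes_update(PRIOR_METS_YES, likelihood_if_true=0.60, likelihood_if_false=0.10)
print(f"P(MetS = Yes | e) = {posterior:.3f}")
```

A BN tool performs the same kind of update jointly over all features, propagating the evidence through the conditional probability tables of the network.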

BNs allow us to establish the relationships between features through the relationships of dependence and conditional independence. Given Gender, Age, BP and HDL, the CVLY and CVRS features are d-separated from the MetS feature: no active trail connecting them was found. Moreover, by the local Markov property, a node is independent of its non-descendants given its parents; e.g., given the parents of the CVLY feature, namely Gender, Smoking, BP (blood pressure) and HDL (cholesterol), the CVLY feature is independent of all other variables except the CVRS feature, and in particular it is independent of the MetS feature. Similarly, given the BP, WC, HDL and TG features, the MetS feature is independent of the remaining features; in this case, as the MetS feature does not have any descendants, the BP, WC, HDL and TG features constitute its Markov blanket, and the global Markov property states that the MetS feature is conditionally independent of any other feature given its Markov blanket.
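The Markov blanket of a node (its parents, its children, and the other parents of its children) can be read off a DAG mechanically. The following sketch uses a fragment of the network that is inferred from the parents and Markov blankets stated in the text; it is an assumption, not the full learned BN, and in particular the edge Gender → CVRS is assumed so that Gender belongs to the blanket of CVRS. The function name `markov_blanket` is hypothetical.

```python
def markov_blanket(edges, node):
    """Markov blanket of `node` in a DAG given as (parent, child) pairs."""
    parents = {p for p, c in edges if c == node}
    children = {c for p, c in edges if p == node}
    # Co-parents: other parents of the children of `node`.
    co_parents = {p for p, c in edges if c in children and p != node}
    return parents | children | co_parents

# Assumed fragment of the DAG, reconstructed from the blankets stated in the text.
EDGES = [
    ("Gender", "CVLY"), ("Smoking", "CVLY"), ("BP", "CVLY"), ("HDL", "CVLY"),
    ("CVLY", "CVRS"), ("Age", "CVRS"), ("Gender", "CVRS"),  # Gender -> CVRS is an assumption
    ("WC", "MetS"), ("HDL", "MetS"), ("BP", "MetS"), ("TG", "MetS"),
]

for feature in ("CVLY", "CVRS", "MetS"):
    print(feature, sorted(markov_blanket(EDGES, feature)))
```

With this fragment, Age enters the blanket of CVLY only as a co-parent of CVRS, which is consistent with Age not being listed among the parents of CVLY.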

Given the structure of a BN, applying the global Markov property to each feature allows us to establish the set of features (namely, the Markov blanket of that feature) with the strongest influence on it; furthermore, the Markov blanket of a particular feature (a node in the DAG) can be used to find the combination of states that maximizes or minimizes a particular state of that feature. In this study we focus mainly on the CVLY, CVRS, and MetS features. However, using the BN, a characterization of the whole set of variables could be given; e.g., the Markov blanket of the BMI feature consists of physical activity (PA), gender (Gender), and waist circumference (WC), so that, given these three features, the BMI feature is independent of the remaining ones. Furthermore, the blanket could be used to find the combination of states that maximizes or minimizes a specific state of the BMI feature.

Table 13 – Step-by-step instantiations leading to maximization of the probability of the CVRS variable in the High state, where in the initial BN without evidence CVRS = High reached a probability of 1.36%. The maximum conditioned probability for the CVRS feature in state High is achieved when CVLY = Fourth Quartile, Age = 55–64 and Gender = Men.

| Step | Instantiated variable | Value | CVRS = High |
|------|-----------------------|-------|-------------|
| 1 | CVLY | FourthQuartile | 3.77% |
| 2 | Age | 55–64 | 9.41% |
| 3 | Gender | Men | 11.9% |

In our BN model, Gender and BMI are connected. An association, or link, between gender and BMI has been widely shown in the literature. However, the reasons for the gender difference in BMI are unclear. Differences in anatomic, physiologic, metabolic and sex-hormonal status between genders could contribute to these differences. In [63] and [64], for a Swedish and a Canadian population respectively, Gender and BMI appear related. In [65], where the authors build a BN for predicting metabolic syndrome from a dataset for epidemiological research on the Korean population, Gender appears completely isolated: it is related neither to BMI nor to any other variable (WC, Age, HDL, Cholesterol, etc.).

The BN model included the BMI and WC features. The most commonly used method for classifying an individual as overweight or obese is the body mass index (BMI). The BMI is defined as the body mass divided by the square of the body height, and is universally expressed in units of kg/m², resulting from mass in kilograms and height in metres. However, the BMI has limitations and can lead to the misclassification of certain individuals, such as those with increased muscle mass or the elderly. Waist circumference (WC) may be a better indicator of health risk than BMI alone, especially when used in combination with BMI. WC is particularly useful for individuals with a BMI of 25–34. For individuals with a BMI of 35 or more, WC adds little predictive power to the disease risk classification of BMI. Recent studies reported that WC, waist-to-hip ratio (WHR) and waist-to-height ratio (WHtR) correlate better with cardiovascular risk factors than BMI does (see for instance [66,67]).

Reasons for the sex difference in CVRS are not fully understood. Differences in major cardiovascular risk factors, particularly in HDL cholesterol level, obesity and smoking rate, explained a substantial part of the sex difference in cardiovascular risk [68,69].

The main difference with respect to other cardiovascular risk studies in the literature [65,70–74] is that we include three diagnostic features: CVLY, CVRS, and MetS. This helps to determine the features with the greatest influence on each of the diagnostic features.

In summary, BNs are a graph-based representation of a joint multivariate probability distribution which captures the way an expert establishes the relationships between variables. Furthermore, BNs are a powerful tool for modeling the decision-making process under uncertainty, combining a qualitative and a quantitative representation at the same time. Owing to their similar knowledge pattern, a BN (a modeling tool) can serve as an informal basis for the development of a Decision Support System (DSS) framework in the form of a tabular rule-based system [75] for medical recommendations (a DSS tool).

6. Conclusions

BNs were chosen in order to produce an intuitive, transparent, graphical representation of the investigated interdependencies. The obtained model helps us to easily identify the relationships of probabilistic causal dependence and conditional independence between features. As a result, we can visualize the relationships between 13 features in the domain of cardiovascular risk. In this case, because CVD is multifactorial, the application of this kind of network is of special interest from both a theoretical and a practical point of view.

Furthermore, the implemented BN was used to make inferences, i.e., to predict new scenarios when hypothetical information was introduced. Adding evidence, such as different CVRF values, to the implemented BN may be of great interest in epidemiological studies. For the BN analysis, three reasoning patterns were considered: causal, evidential and intercausal reasoning. By combining these reasoning patterns with the local and global Markov properties and the concept of the Markov blanket, the probabilities of selected feature states were maximized.

Acknowledgement

This research was funded by the Spanish Ministry of Science and Innovation (PI13/01477).

References

[1] D. Koller, N. Friedman, Probabilistic Graphical Models: Principles and Techniques, The MIT Press, Cambridge, MA/London, England, 2010.

[2] J. Pearl, Causality: Models, Reasoning and Inference, Cambridge University Press, Cambridge, 2000.

[3] P. Larranaga, S. Moral, Probabilistic graphical models in artificial intelligence, Appl. Soft Comput. 11 (2011) 1511–1528.

[4] A. Ligęza, P. Fuster-Parra, AND/OR/NOT causal graphs – a model for diagnostic reasoning, Int. J. Appl. Math. Comput. Sci. 7 (1997) 185–203.

[5] G.F. Cooper, E. Herskovits, A Bayesian method for the induction of probabilistic networks from data, Mach. Learn. 9 (1992) 309–347.

[6] D. Heckerman, D. Geiger, D.M. Chickering, Learning Bayesian networks: the combination of knowledge and statistical data, Mach. Learn. 20 (1995) 197–243.

[7] F. Liang, J. Zhang, Learning Bayesian networks for discrete data, Comput. Stat. Data Anal. 53 (2009) 865–876.

[8] C.J. Butz, S. Hua, J. Chen, H. Yao, A simple graphical approach for understanding probabilistic inference in Bayesian networks, Inf. Sci. 179 (2009) 699–716.

[9] C. Glymour, R. Scheines, P. Spirtes, K. Kelly, Discovering causal structure, Technical report CMU-PHIL-1, 1986.

[10] P. Spirtes, C. Glymour, R. Scheines, Causation, Prediction and Search, Adaptive Computation and Machine Learning, 2nd ed., The MIT Press, 2001.

[11] P. Fuster-Parra, A. García-Mas, F.J. Ponseti, P. Palou, J. Cruz, A Bayesian network to discover relationships between negative features in sport: a case study of teen players, Qual. Quant. 48 (2014) 1473–1491, http://dx.doi.org/10.1007/s11135-013-9848-y.

[12] P. Fuster-Parra, A. García-Mas, F.J. Ponseti, F.M. Leo, Team performance and collective efficacy in the dynamic psychology of competitive team: a Bayesian network analysis, Hum. Mov. Sci. 40 (2015) 98–118, http://dx.doi.org/10.1016/j.humov.2014.12.005.

[13] J. DeFelipe, P.L. López-Cruz, R. Benavides-Piccione, C. Bielza, P. Larranaga, et al., New insights into the classification and nomenclature of cortical GABAergic interneurons, Nat. Rev. Neurosci. 14 (2013) 202–216.

[14] M.B. Sesen, A.E. Nicholson, R. Banares-Alcantara, T. Kadir, M. Brady, Bayesian networks for clinical decision support in Lung Cancer Care, PLOS ONE 8 (2013) e82349, http://dx.doi.org/10.1371/journal.pone.0082349.

[15] A. Djebbari, J. Quackenbush, Seeded Bayesian networks: constructing genetic networks from microarray data, BMC Syst. Biol. 2 (2008) 57, http://dx.doi.org/10.1186/1752-0509-2-57.

[16] C.J. Needham, J.R. Bradford, A.J. Bulpitt, et al., A primer on learning in Bayesian networks for computational biology, PLoS Comput. Biol. 3 (2007), http://dx.doi.org/10.1371/journal.pcbi.0030129.

[17] S.J. Lycett, M.J. Ward, F.I. Lewis, et al., Detection of mammalian virulence determinants in highly pathogenic avian influenza H5N1 viruses: multivariate analysis of published data, J. Virol. 83 (19) (2009) 9901–9910.

[18] A.F. Poon, F.I. Lewis, S.L. Pond, et al., Evolutionary interactions between N-linked glycosylation sites in the HIV-1 envelope, PLoS Comput. Biol. 3 (1) (2007), http://dx.doi.org/10.1371/journal.pcbi.0030011.

[19] R. Jansen, H. Yu, D. Greenbaum, et al., A Bayesian networks approach for predicting protein-protein interactions from genomic data, Science 302 (5644) (2003) 449–453.

[20] F.I. Lewis, F. Brülisauer, G.J. Gunn, Structure discovery in Bayesian networks: an analytical tool for analysing complex animal health data, Prev. Vet. Med. 100 (2) (2011) 109–115.

[21] F.I. Lewis, B.J. McCormick, Revealing the complexity of health determinants in resource-poor settings, Am. J. Epidemiol. 176 (11) (2012) 1051–1059.

[22] M. Lappenschaar, A. Hommersom, P.J.F. Lucas, J. Lagro, S. Visscher, Multilevel Bayesian networks for the analysis of hierarchical health care data, Artif. Intell. Med. 57 (2013) 171–183.

[23] P. Antal, G. Fannes, D. Timmerman, Y. Moreau, B.D. Moor, Bayesian applications of belief networks and multilayer perceptrons for ovarian tumor classification with rejection, Artif. Intell. Med. 29 (2003) 29–60.

[24] P. Antal, G. Fannes, D. Timmerman, Y. Moreau, B.D. Moor, Using literature and data to learn Bayesian networks as clinical models of ovarian tumors, Artif. Intell. Med. 30 (2004) 257–281.

[25] T. Charitos, L.C. Gaag, S. Visscher, K.A.M. Schurink, P.J.F. Lucas, A dynamic Bayesian network for diagnosing ventilator-associated pneumonia in ICU patients, Expert Syst. Appl. 36 (2009) 1249–1258.

[26] S.M. Maskery, H. Hu, J. Hooke, C.D. Shriver, M.N. Liebman, A Bayesian derived network of breast pathology co-occurrence, J. Biomed. Inform. 41 (2008) 242–250.

[27] X.H. Wang, B. Zheng, W.F. Good, J.L. King, Y.H. Chang, Computer assisted diagnosis of breast cancer using a data-driven Bayesian belief network, Int. J. Med. Inform. 54 (1999) 115–126.

[28] J.J. Cabre, F. Martin, B. Costa, J.L. Pinol, J.L. Llor, Y. Ortega, et al., Metabolic syndrome as a cardiovascular disease risk factor: patients evaluated in primary care, BMC Public Health 8 (2008) 251, http://dx.doi.org/10.1186/1471-2458-8-251.

[29] S.M. Grundy, J.I. Cleeman, S.R. Daniels, K.A. Donato, R.H. Eckel, B.A. Franklin, D.J. Gordon, R.M. Krauss, P.J. Savage, S.C. Smith Jr., J.A. Spertus, F. Costa, Diagnosis and management of the metabolic syndrome: an American Heart Association/National Heart, Lung, and Blood Institute Scientific Statement, Circulation 112 (2005) 2735–2752.

[30] J.G. Lee, S. Lee, Y.J. Kim, H.K. Jin, B.M. Cho, Y.J. Kim, et al., Multiple biomarkers and their relative contributions to identifying metabolic syndrome, Clin. Chim. Acta 408 (2009) 50–55.

[31] P. Tauler, M. Bennasar-Veny, J.M. Morales-Asencio, A.A. Lopez-Gonzalez, T. Vicente-Herrero, J. De Pedro-Gomez, V. Royo, J. Pericas-Beltran, A. Aguilo, Prevalence of premorbid metabolic syndrome in Spanish adult workers using IDF and ATPIII diagnostic criteria: relationships with cardiovascular risk factors, PLOS ONE 9 (2) (2014), http://dx.doi.org/10.1371/journal.pone.0089281.

[32] B. Van Steenkiste, T. Van der Weijden, H.E. Stoffers, A.D. Kester, D.R. Timmermans, R. Grol, Improving cardiovascular risk management: a randomized, controlled trial on the effect of a decision support tool for patients and physicians, Eur. J. Cardiovasc. Prev. Rehabil. 14 (1) (2007) 44–50.

[33] P.D. Sorlie, D.E. Bild, M.S. Lauer, Cardiovascular epidemiology in a changing world - challenges to investigators and the National Heart, Lung, and Blood Institute, Am. J. Epidemiol. 175 (7) (2012) 597–601.

[34] M. Franco, U. Bilal, E. Guallar, G. Sanz, A.F. Gómez, V. Fuster, R. Cooper, Systematic review of three decades of Spanish cardiovascular epidemiology: improving translation for a future of prevention, Eur. J. Prev. Cardiol. (2012), http://dx.doi.org/10.1177/2047487312455314.

[35] J. Marrugat, R. Elosua, H. Marti, Epidemiology of ischaemic heart disease in Spain: estimation of the number of cases and trends from 1997 to 2005, Rev. Esp. Cardiol. 55 (4) (2002) 337–346.

[36] A. Willis, M. Davies, T. Yates, K. Khunti, Primary prevention of cardiovascular disease using validated risk scores: a systematic review, J. R. Soc. Med. 105 (8) (2012) 348–356.

[37] F.H. Zimmerman, Cardiovascular disease and risk factors in law enforcement personnel: a comprehensive review, Cardiol. Rev. 20 (4) (2012) 159–166.

[38] R.B. D’Agostino, R.S. Vasan, M.J. Pencina, P.A. Wolf, M. Cobain, J.M. Massaro, W.B. Kannel, General cardiovascular risk profile for use in primary care: the Framingham heart study, Circulation 117 (6) (2008) 743–753.

[39] World Health Organization, Obesity: Preventing and Managing the Global Epidemic, WHO, Geneva, 1998.

[40] A.A. Lopez-Gonzalez, A. Aguilo, M. Frontera, M. Bennasar-Veny, I. Campos, T. Vicente-Herrero, M. Tomas-Salva, J. De Pedro-Gomez, P. Tauler, Effectiveness of the Heart Age tool for improving modifiable cardiovascular risk factors in a Southern European population: a randomized trial, Eur. J. Prev. Cardiol. 22 (3) (2015) 389–396, http://dx.doi.org/10.1177/2047487313518479.

[41] F.V. Jensen, T.D. Nielsen, Bayesian Networks and Decision Graphs, Information Science & Statistics, Springer, 2007.

[42] M. Marfell-Jones, T. Olds, A. Stewart, L. Carter, International Standards for Anthropometric Assessment, International Society for the Advancement of Kinanthropometry, Potchefstroom, South Africa, 2006.

[43] F. Buitrago, L. Canon-Barroso, N. Diaz-Herrera, E. Cruces-Muro, M. Escobar-Fernandez, J.M. Serrano-Arias, Comparison of the REGICOR and SCORE function charts for classifying cardiovascular risk and for selecting patients for hypolipidemic or antihypertensive treatment, Rev. Esp. Cardiol. 60 (2007) 139–147.

[44] M.R. Cobain, Assessment Heart Age, 2011, http://www.heartagecalculator.com.

[45] A. Soureti, R. Hurling, P. Murray, W. van Mechelen, M. Cobain, Evaluation of a cardiovascular disease risk assessment tool for the promotion of healthier lifestyles, Eur. J. Cardiovasc. Prev. Rehabil. 17 (2010) 519–523.

[46] W. Buntine, A guide to the literature on learning probabilistic networks from data, IEEE Trans. Knowl. Data Eng. 8 (2) (1996) 195–210, http://dx.doi.org/10.1109/69.494161.

[47] J. Cheng, R. Greiner, J. Kelly, D. Bell, W. Liu, Learning Bayesian networks from data: an information-theory based approach, Artif. Intell. 137 (2002) 43–90.

[48] L.E. Sucar, M. Martínez-Arroyo, Interactive structural learning of Bayesian networks, Expert Syst. Appl. 15 (1998) 325–332.

[49] R.W. Robinson, Counting unlabeled acyclic digraphs, in: C.H.C. Little (Ed.), Combinatorial Mathematics V, Lecture Notes in Mathematics, vol. 622, Springer-Verlag, New York, 1977, pp. 28–43.

[50] R. Daly, Q. Shen, S. Aitken, Learning Bayesian networks: approaches and issues, Knowl. Eng. Rev. 26 (2) (2011) 99–157.

[51] D. Margaritis, Learning Bayesian network model structure from data, 2003 (PhD Thesis, CMU-CS-03-153).

[52] R. Nagarajan, M. Scutari, S. Lèbre, Bayesian Networks in R: With Applications in Systems Biology, Springer, 2013.

[53] M. Scutari, Learning Bayesian networks with the bnlearn R package, J. Stat. Softw. 35 (3) (2010) 1–22.

[54] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2012, ISBN: 3-900051-07-0, http://www.R-project.org/.

[55] S. Hojsgaard, D. Edwards, S. Lauritzen, Graphical Models with R, Springer, New York, 2012.

[56] G. Claeskens, N.L. Hjort, Model Selection and Model Averaging, Cambridge University Press, Cambridge, 2008.

[57] R.E. Neapolitan, Learning Bayesian Networks, Prentice Hall, Inc., Upper Saddle River, NJ, USA, 2003.

[58] Norsys Software Corporation, Netica is a trademark of Norsys Software Corporation, 2012, Retrieved from: http://www.norsys.com, Copyright 1995–2012.

[59] Weka 3.6.9: Waikato Environment for Knowledge Analysis, The University of Waikato, Hamilton, New Zealand, 2013.

[60] N. Friedman, D. Geiger, M. Goldszmidt, Bayesian network classifiers, Mach. Learn. 29 (1997) 131–163.

[61] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Francisco, CA, 1993.

[62] C.E. Shannon, A mathematical theory of communication, Bell Labs. Tech. J. 27 (1948) 379–423, http://dx.doi.org/10.1002/j.1538-7305.1948.tb01338.x.

[63] C. Li, G. Engström, B. Hedblad, S. Calling, G. Berglund, L. Janzon, Sex differences in the relationships between BMI, WHR and incidence of cardiovascular disease: a population-based cohort study, Int. J. Obes. 30 (2006) 1775–1781, http://dx.doi.org/10.1038/sj.ijo.0803339.

[64] D.R. McCreary, Gender and age differences in the relationships between body mass index and perceived weight: exploring the paradox, Int. J. Men’s Health 1 (1) (2002) 31–42.

[65] H.S. Park, S.B. Cho, Evolutionary attribute ordering in Bayesian networks for predicting the metabolic syndrome, Expert Syst. Appl. 39 (2012) 4240–4249.

[66] M. Bennasar-Veny, A.A. Lopez-Gonzalez, P. Tauler, M.L. Cespedes, T. Vicente-Herrero, et al., Body adiposity index and cardiovascular health risk factors in caucasians: a comparison with the body mass index and others, PLoS ONE 8 (5) (2013) e63999.

[67] M.B. Snijder, M. Nicolaou, I.G. van Valkengoed, L.M. Brewster, K. Stronks, Newly proposed body adiposity index (BAI) by Bergman et al. is not strongly related to cardiovascular health risk, Obesity (Silver Spring) 20 (2012) 1138–1139.

[68] R.G. Baeza, V. Neira, C. Neira, M. Acevedo, Gender differences in cardiovascular risk by two different scores: a five years follow up analysis of a 1500-patient database, J. Am. Coll. Cardiol. 65 (10) (2015) A1502.

[69] A. Lopez-Gonzalez, et al., Desigualdades socioeconómicas y diferencias según sexo y edad en los factores de riesgo cardiovascular, Gaceta Sanitaria 29 (2015) 27–36.

[70] J. Vila-Francés, J. Sanchís, E. Soria-Olivas, A.J. Serrano, Expert system for predicting unstable angina based on Bayesian networks, Expert Syst. Appl. 40 (2013) 5004–5010.

[71] V.G. Almeida, J. Borba, H.C. Pereira, T. Pereira, C. Correia, M. Pêgo, J. Cardoso, Cardiovascular risk analysis by means of pulse morphology and clustering methodologies, Comput. Methods Prog. Biomed. 117 (2014) 257–266.

[72] Ch.R. Twardy, A.E. Nicholson, K.B. Korb, J. Mcneil, Epidemiological data mining cardiovascular Bayesian networks, e-J. Health Inform. 1 (1) (2006).

[73] S. Paredes, T. Rocha, P. de Carvalho, J. Henriques, M. Harris, J. Morais, Long term cardiovascular risk models’ combination. A new approach, Comput. Methods Prog. Biomed. 101 (3) (2009) 231–242.

[74] A. Elsayad, M. Fakr, Diagnosis of cardiovascular diseases with Bayesian classifiers, J. Comput. Sci. 11 (2) (2015) 274–282, http://dx.doi.org/10.3844/jcssp.2015.274.282.

[75] A. Ligęza, G.J. Nalepa, A study of methodological issues in design and development of rule-based systems: proposal of a new approach, Wires Data Min. Knowl. 1 (2) (2011) 117–137.
