• No results found

Massively parallel implicit equal-weights particle filter for ocean drift trajectory forecasting

N/A
N/A
Protected

Academic year: 2022

Share "Massively parallel implicit equal-weights particle filter for ocean drift trajectory forecasting"

Copied!
35
0
0

Laster.... (Se fulltekst nå)

Fulltekst

(1)

Contents lists available atScienceDirect

Journal of Computational Physics: X

www.elsevier.com/locate/jcpx

Massively parallel implicit equal-weights particle filter for ocean drift trajectory forecasting

Håvard Heitlo Holm

a,b,∗

, Martin Lilleeng Sætra

c,d

, Peter Jan van Leeuwen

e,f

aMathematicsandCybernetics,SINTEFDigital,P.O.Box124Blindern,NO-0314Oslo,Norway

bDepartmentofMathematicalSciences,NorwegianUniversityofScienceandTechnology,NO-7491Trondheim,Norway cInformationTechnologyDepartment,NorwegianMeteorologicalInstitute,P.O.Box43Blindern,NO-0313Oslo,Norway dDepartmentofComputerScience,OsloMetropolitanUniversity,P.O.Box4St.Olavsplass,NO-0130Oslo,Norway eDepartmentofAtmosphericScience,ColoradoStateUniversity,3915W.LaporteAve.,FortCollins,CO80521,USA fDepartmentofMeteorology,UniversityofReading,EarleyGate,ReadingRB66BB,UK

a r t i c l e i n f o a b s t r a c t

Articlehistory:

Received1October2019

Receivedinrevisedform14February2020 Accepted26February2020

Availableonline4March2020

Keywords:

Dataassimilation Particlefilters GPUcomputing Shallow-watersimulation Finite-volumemethod Drifttrajectoryforecasting

Forecastingofoceandrifttrajectoriesareimportantformanyapplications,includingsearch and rescue operations, oil spill cleanup and iceberg risk mitigation. In an operational setting,forecastsofdrifttrajectoriesare producedbasedoncomputationallydemanding forecasts of three-dimensional ocean currents. Herein, weinvestigate a complementary approachforshortertimescalesbyusingtherecentlyproposedtwo-stageimplicitequal- weights particlefilter applied toasimplified ocean model.To achieve this, wepresent a new algorithmic design for a data-assimilation system in which all components – includingthemodel,modelerrors,andparticlefilter–takeadvantageofmassivelyparallel computearchitectures,suchasgraphicalprocessingunits.Fastercomputationscanenable in-situ and ad-hoc model runs for emergency management, and larger ensembles for betteruncertaintyquantification.Usingachallengingtestcasewithnear-realisticchaotic instabilities, werun data-assimilationexperiments basedonsyntheticobservations from driftingandmooredbuoys,andanalyze thetrajectoryforecastsforthedrifters.Ourresults show that even sparsedrifterobservations are sufficient tosignificantlyimprove short- term driftforecastsup totwelve hours.With equidistantmooredbuoys observingonly 0.1%ofthestatespace,theensemblegivesanaccuratedescriptionofthetruestateafter dataassimilationfollowedbyahigh-qualityprobabilisticforecast.

©2020TheAuthors.PublishedbyElsevierInc.ThisisanopenaccessarticleundertheCC BYlicense(http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Prediction ofdrift trajectoriesinthe oceanhas manyapplicationsthat are importanttosociety andtheenvironment.

Examples includesearch andrescue operations, recovering objectslost atsea, planningof boomplacements foroil spill cleanup,andpreventingcollisionsbetweenicebergsandoffshoreinstallations.Toproducehigh-qualitydrifttrajectoryfore- casts,itisimportanttohaveagoodrepresentationofoceancurrents.Thisisnotaneasytask,asoceancurrentshavelarge naturalvariabilityandtherearetypicallyfewavailableobservations.Furthermore,thesizeofoceanlow- andhigh-pressure systems,so-callededdies,ismuchsmallerthantheiratmosphericcounterparts,anditischallengingtoplacethemcorrectly intypicalgridresolutionsusedbyoperationaloceanmodelstoday.

*

Correspondingauthorat:MathematicsandCybernetics,SINTEFDigital,P.O.Box124Blindern,NO-0314Oslo,Norway.

E-mailaddresses:Havard.Heitlo.Holm@sintef.no(H.H. Holm),m.l.saetra@met.no(M.L. Sætra),peter.vanleeuwen@colostate.edu(P.J. van Leeuwen).

https://doi.org/10.1016/j.jcpx.2020.100053

2590-0552/©2020TheAuthors.PublishedbyElsevierInc.ThisisanopenaccessarticleundertheCCBYlicense (http://creativecommons.org/licenses/by/4.0/).

(2)

The operational approach for drift trajectory prediction is to use the currents fromthe mostrecent ocean forecasts directly [1].Theseareimported fromcomputationallyexpensiveoceancirculationmodels,such asROMS [2], whichsolve the dynamicstate oftheocean inthreedimensions. Typically,alarge portionofthesimulationrun-timeisspent onthe data assimilation, which uses available real-world observations to correct the modeled ocean statesthat serve asinitial conditions forthenext forecast.Common forecastranges foroceancirculation modelsare threeto fivedays.Operational drift trajectory forecastsat the Norwegian Meteorological Institute (MET Norway) are produced by OpenDrift [1], which is an offline trajectory model that reads the ocean current forecasts to predict drift trajectories. Although OpenDrift is computationallyefficient,theoceancirculationmodelsstillrequireaccesstosupercomputers.

This paper explores the option of using a recently proposed filter method applied to a simplified ocean model for efficientdrift trajectoryforecasting.The aimisto buildadata-assimilationsystemthatcanrun efficientlyoncommodity- level desktopcomputers, andalsobe extendable tosupercomputers. We achieve this by usinga simplified ocean model and a data-assimilationmethodthat both are able to take advantage ofmassively parallelaccelerator hardware, such as the graphical processingunit (GPU). This work is not intended asa substitute ofcurrent operational systems, but asa complementary approach, in which the predicted currents may even be updated with in-situ observations, e.g., during ongoing search andrescueoperations. Furthermore,byenabling researchmodels torun onindividualdesktopandlaptop computers,researchersareabletodomorerapidprototyping.Atthesametime,thisworkwillcontributetomoreefficient simulationsalsoonsupercomputers,sinceallalgorithmsmaybeextendedtorunonmultipleGPUsandcomputenodes.

The paperisorganized asfollows:We startby highlightingour contributionsandreviewingrelatedwork relevantfor Lagrangian dataassimilationwithacceleratedparticlefilters.InSection 2,we describethedata-assimilationproblemand summarizethekey conceptsofso-calledproposal-distributionparticlefilters.We presentthesimplified oceanmodeland modelerrors inSection 3,whereasSection 4offersadetaileddescriptionofanalgorithm forrunningthechosen particle filteronthismodel.ThelattertwosectionsalsodiscusshowtheGPUisusedforefficientimplementationofthecomputa- tionallyintensivecomponents.InSection5,wepresentnumericalexperimentsofdrifttrajectoryensembleforecasts,using an identical-twinexperimentsetupdesignedtoresemblereal-worldoceancurrentsandwithconfigurationinspiredbyop- erationalsystems.Furthermore,weshowanddiscussthestatisticalvalidityoftheforecastsandexaminethecomputational performanceofthesimulations.Finally,Section6containsasummaryandconcludingremarks.

Papercontribution. Wepresentan efficientGPU-accelerateddata-assimilationsystembasedontherecentlyproposed im- plicit equal-weights particle filter (IEWPF) applied to simplified ocean models described by the rotating shallow-water equations. The data-assimilation algorithm, the numerical scheme for evolving the ocean model, and the algorithm for samplinglocallycorrelated,well-balancedrandommodelerrorsarealldesignedtotakeadvantageofmassivelyparallelar- chitectures.Thedata-assimilationsystemistailoredforobservationsoftheoceancurrentobtainedfromeitherfree-drifting buoysormooredbuoys.Weshownumericalexperimentsforassimilatingachallengingtestcasewithnear-realisticchaotic behavior, alongwithdrift trajectoryforecasts.Theresultsshowthat byassimilatingapproximately0.1%ofthestatespace, the posteriorensemble meanstronglyresemblesthe truestate intheentiredomain, thusenabling an accurate drifttra- jectory forecasts. This isalsothe first timethat the IEWPFmethodis applied tohigh-dimensional geophysical problems.

Furthermore,weshowthat theparticlefilterhaswell-behavedstatisticalproperties,andthatthecomputationalefficiency ofthedataassimilationiswell-balancedwithrespecttothemodel.Tothebestofourknowledge,thereexistsnoprevious massivelyparallelimplementationsofastate-of-the-artparticlefilterappliedtoahigh-dimensionalgeophysicalsystem.

Relatedwork. Particle filters, andmore generallySequential MonteCarlo (SMC)methods, constitute a large class ofnu- merical methods for statistical inference. It is well-known that the standard particle filter is prone to degeneracy in high-dimensional systems [3–5], and there have been several attempts at designing particle filters without this limita- tion. A few such particlefilters havebeen used onhigh-dimensional, near-realistic applications inthe geosciences.Ades and vanLeeuwen [6] use the equivalent-weightsparticle filteron a high-dimensional, simplified,ocean model basedon thebarotropicequations,showingthatitispossibletoavoidthedegeneracyprobleminhigh-dimensionalsystems,atthe cost ofa biasedestimate. Althoughthe schemeperformedwell,the biasgrowswithensemble size.Poterjoy,Sobashand Anderson [7] usealocalparticlefilterontheweatherresearchandforecastingmodel,inwhichthebootstrapparticlefilter isapplied locallytoobservations,andparticlestatesaremergedinstate spacebetweenthelocationsoftheobservations.

However, it remains problematic toglueparticles fromtheselocalupdatestogether to fullparticles that spanthe whole modeldomain.Theneededsmoothingcaneasilydestroydelicatebalances intheflow.Furthermore,theminimumsize of the localareasissetbyphysicallength-scale constraints,typically meaningthattoomanyobservationsarewithina local domaintoavoiddegeneracy.Inpractice,aminimumweightvalueisset,meaningthatnotallinformationisextractedfrom theobservations.Hence,localization isnotsolvingtheproblem.Arecentreviewby vanLeeuwenetal. [8] discussesmost recentdevelopmentsonparticlefiltersforhigh-dimensionalgeophysicalsystems.

SeveralimplementationsofstandardparticlefiltersforparallelarchitecturessuchasGPUsexist,butmainlywithinother scientific disciplines than geosciences.Lopez etal. [9] present GPU-implementations of a particle filter (with sequential importanceresampling)andauxiliaryparticlefiltertodetectanomaliesinmanufacturingprocesses,andshowsufficientper- formanceforreal-timeapplication.Gelencsér-Horváthetal. [10] introduceamodifiedcellularparticlefilterwithMetropolis resamplingon theGPUforreal-timeapplications.LibBi [11] isasoftware packageforstate-spacemodeling and Bayesian inferencecapableofutilizingGPUs.SeveralparticlefiltersareimplementedinLibBi,e.g.,particleMarkovChainMonteCarlo

(3)

(pMCMC)andSMC2.OthermethodsinLibBiincludetheExtendedKalmanFilter(EKF)andparameteroptimization routines.

BaiandHu [12] demonstrateparticlefilter-baseddataassimilationforsimulationofwildfirespread,withparallelsampling andweightcomputationbasedontheMapReduceprogrammingmodel.Inamorerecentwork,Baietal. [13] describemore efficientroutingofparticlesbetweenprocessingunitsintheresamplingstepofadistributedparticlefilter.

Otherdata-assimilationmethodshavealsobeensubjecttoGPU-accelerations.BlattnerandYang [14] giveaperformance study ofa GPU-implementation ofthelocal ensembletransformKalman filter,WeiandHuang [15] explorea GPU-based implementationoftheEKF,andQuinnandAbarbanel [16] presentageneralpathintegralMonteCarloapproachappliedto aneuronmodel.Theyallreportmassivespeed-upsontheorder100-1000overCPUimplementations.Theoreticalspeed-up basedonhardwarespecificationsforFLOPSandmemorybandwidthisontheorder10 [17].

AssimilationofLagrangiandataischallengingduetothepotentialcomplexityofthetrajectoriesandtheneedfortrans- formingthedatainto Eularianvelocity data(forfixed-gridorspectral numericalmodels).Apte, JonesandStuart [18] use particlesmoothingforassimilatingLagrangian datafromdriftersandpresentthreemethods forsamplingfromtheexact posteriorprobabilitydensityfunctionbasedontheLangevinequation andtheMetropolis-Hastings algorithm.Theirmeth- ods are showntoproduce better resultsthan theensemble Kalmanfilterusingperturbed observations. Spiller, Apte and Jones [19] usebothparticlefilteringandsmoothing(exactposteriorsampling)forassimilatingLagrangiandatafromgliders anddrifters. Theyproposea newobservationoperatorto dealwiththe highuncertainty inthelocationsofthe observa- tions.Spilleretal. [20] investigatethedivergenceofaparticlefilterforthepoint-vortexmodel.Theyintroducebacktracking particlefiltersandshowthatthefiltersoutperformEKFforthetwo-pointvortexsystem.Othermethodsthanparticlefilters andsmoothershavealsobeensuccessfullyimplemented [21–25].

2. Thedata-assimilationproblem

Therearemanypotentialsourcesforerrorsinthesimulationofatmosphericandoceanographicprocesses.Theseerrors mayarise fromphysicalprocessesmissinginthemathematicalmodel,discretizationerrorsinthenumericalmethod,sub- grid effects that can not be resolved in thediscretized model, anduncertainties in modelparameters, initial conditions, forcing andboundary conditions. Hence, we do not only wish to simulate the behavior of the unknown physical state, denoted by ψ, butrather its probability density function (pdf), p(ψ). As geophysical applications tend to be very high- dimensionalanddrivenbynonlinearprocesses,ananalyticdescriptionofp(ψ)isgenerallyunobtainable,andanensemble- based Monte-Carlo simulation is one way to measure the uncertainties in the system. In its simplest form, ensemble- basedstatisticalsimulation consistsofaset of Ne independentstate vectors{ψi}i=1,...,Ne,which areinitialized according to uncertainties in the model parameters andinitial conditions.The state of each ensemble member is then simulated independentlyaccordingtothemodelequation,

ψ

ni

=

M

ψ

ni1

+ β

ni1

,

forn

=

1

,

2

, ...,

(1)

inwhichthemodelMevolvesthesolutiondeterministicallyfromtimetn1totn,andβni1isanoptionalstochasticvariable thatrepresentsrealizationsoftheerrorsinthemodel.Thepdfofthesystemcanthenberepresentedthroughthestatistical propertiesoftheresultingensemble,e.g.,as

p

n

) =

1 Ne

Ne

i=1

δ

ψ

n

ψ

ni

,

(2)

inwhichδistheDiracdeltafunction.

Ifan observation yn ofthesystemisavailable attimetn,thisinformationcanbeusedtoimprovetheobtainedproba- bilitydensity.Typically,theobservationisalsoinfluencedbyuncertainty,as

yn

=

H

ψ

ntrue

+

n

,

(3)

inwhichHistheobservationoperatorthatmapsthetruestateψntruetoobservationspaceand n isastochasticobservation error.Theobservationstypicallyonlycoverpartsofthesystem, sothatthesizeoftheobservationvector(denoted Ny)is smallerthanthesizeofstatevector(denoted Nψ).Thisisparticularlytrueforgeophysicalsystems,forwhichitisnormal that NyNψ (e.g., y can bethe value anddirectionof theoceancurrent atasingle point inspaceandtime).Because ofthis, we cannot simplyreplace theobserved partsofψn withthe valuesin yn directly, andwehave toconsiderthe conditional pdf p

ψn|yn

.Thedata-assimilationproblemconsistsoffinding thisconditional density,andits fundamental buildingblockisBayestheorem:

p

n

|

yn

) =

p

(

yn

n

)

p

n

)

p

(

yn

) .

(4)

The originalpdf p(ψn)is heretermedthe priorprobability,asit representsourunderstanding ofthesystem priorto as- similatingtheinformationintheobservation.Thelikelihood p(yn|ψn)expressestheprobability ofobserving yn underthe

(4)

assumptionthatψn isthetruestateofthesystem.Themarginalprobability p(yn),i.e.,theprobabilityofobserving yn,acts mainlyasanormalizationconstantandensuresthattheresultingposteriorprobabilitydensityisapdf.

2.1. Standardparticlefilter

Thestandardparticlefilterisanensemble-baseddata-assimilationtechniquethatusesadirectevaluationofBayestheo- rem.Eachparticle(equivalenttoan ensemblemember),ψi,isassignedaweight wi that givestherelative importanceof that particleintheensemble.Typically,all Ne particlesare initializedwithweight w0i =1/Ne,astheyare sampledinde- pendentlyfromthepdfoftheinitialconditions,p(ψ0).Eachparticleisthensimulatedindependentlyaccordingto(1) until observationtimetn.Byapplying(4) directlywith(2) asthepriordensity,andbyconsideringthemarginalprobabilityasa normalizationconstant,theposteriordistributionisexpressedas

p

n

|

yn

)

Ne

i=1

p

(

yn

| ψ

ni

)

Ne

j=1p

(

yn

| ψ

nj

) δ(ψ

n

ψ

ni

)

=

Ne

i=1

wni

δ(ψ

n

ψ

ni

).

(5)

Here,thelikelihoodisusedtoupdatetheweights wni foreachparticle,sothattheposteriorisrepresentedbyaweighted discretedistribution.Wecanevaluatethelikelihoodifweknowthepdffortheobservation.Forinstance,iftheobservation errorisGaussian, nN(0,R),theweightforparticleψibecomes

wi

exp

1 2

yn

H

ni

)

T

R1

yn

H

ni

) .

(6)

Assomeparticlesinevitablyendupwithverylowweights,theynolongercarrysignificantstatisticalvalue.Toimprove the statisticalcoverage inthe high-probabilityregions, theensemble isresampled accordingtothe weightdistribution in (6), so that {ψni}i=1,...,Nepn|yn). All weights for the resampled particles are then reset to 1/Ne. This is known as sequentialimportance resampling.Severalschemescanbe usedforthisresampling [4],andinthisworkwe considerthe residualresamplingscheme [26].Notethatifthemodel(1) hasβ=0,itisimportantthatduplicatedparticlesaregivena perturbationtoavoidensemblecollapseandcompletelyoverlappingparticletrajectories.Withastochasticmodel,however, exactduplicationswillevolvedifferentlythroughindependentrealizationsofβi.

One ofthe main advantages of the standard particle filteris that it preservesall physicalproperties throughout the simulation,asthefinalparticles aregeneratedfromsuccessfulsimulationrunsandnot throughmanipulationofthestate vectors. A drawback, however, is that the ensemble is prone to collapse when the dimension of the observation space increases [3–5].Inhigh-dimensional systems,all particlesendup inthetailofthe likelihood,withtheconsequencethat onlyveryfew particles(perhapsevenjustone)gainamuchhigherweightthanallothers.Thedistributionthencollapses asall Ne particlesareresampledfromfew(orasingle)particlesthathavenon-zeroweights.Thisproblemisoftenreferred toasthecurseofdimensionality.

2.2. Theimplicitequal-weightsparticlefilter

One technique used forovercomingthe curse ofdimensionalityisto sample the statesψni froma proposaldensity,q, withanappropriate compensationintheweights.First,(1) showsthat thepdf ofthestate attimetn isrelatedtothatof theprevioustimebytheMarkovianproperty

p

n

) =

p

n

n1

)

p

n1

)

d

ψ

n1

1 Ne

Ne

i=1

p

n

ni1

),

(7)

whereweassumedthatallparticleshavethesameweightattimetn1.Inthestandardparticlefilter,wedrawtheevolution of theparticlefrom p(ψn|ψni1), whichisequivalent to solving themodelequation forone time step.We canchoose it differently,byfirstmultiplyinganddividingtheargumentoftheintegralbyaproposaldensityq andthendrawtheparticle evolutionfromthatdensity,

p

n

) =

1 Ne

Ne

i=1

p

n

| ψ

ni1

)

qi

n

| ψ

n1:N1

e

,

yn

)

qi

n

| ψ

n1:N1

e

,

yn

).

(8)

We have large freedom in how to choose q, butthe support of q is required to be equal toor larger than the support of p(ψn|ψni1),and itshould preferably be easy to sample from. Here, theproposal ischosen tobe conditioned on the

(5)

observation ynandallparticlestatesattheprevioustimestep,ψn1:N1e,anditdependsontheparentstateψni1viaindexi. UsingtheproposaldensityinBayestheorem(4) givesus

p

n

|

yn

) =

1 Ne

Ne

i=1

p

(

yn

| ψ

n

)

p

n

| ψ

ni1

)

p

(

yn

)

qi

n

| ψ

n1:N1

e

,

yn

)

qi

n

n1:N1e

,

yn

).

(9)

Bynowsamplingψniqinn1:N1e,yn),theposteriorbecomes

p

n

|

yn

) =

Ne

i=1

wni

δ(ψ

n

ψ

ni

),

with wni

=

p

(

yn

| ψ

ni

)

p

ni

| ψ

ni1

)

Nep

(

yn

)

qi

ni

| ψ

n1:N1

e

,

yn

) .

(10)

Onechoiceofqistheoptimalproposaldensity[27],inwhichqin|ψn1:N1

e,yn)=pni|ψni1,yn).Byconsideringalinear observation operator H andGaussian modeland observationerrors, βN(0,Q) and

N(0,R),the optimal proposal densityisequivalentto N(ψni,a,P),with

ψ

ni,a

=

M

ni1

) +

Q HT

H Q HT

+

R

1

dni (11)

and P

=

Q1

+

HTR1H

1

,

(12)

inwhich

dni

:=

yn

H M

ni1

)

(13)

iscalledtheinnovationforparticlei.Theproposalisoptimalinthesensethatitgivesoptimalvarianceintheweightsfor proposalsoftheformq(ψn|ψni1,yn),butasitturnsout,itisnotsufficienttoavoidensembledegeneracy [3,5,28].

Themainparticlefilterwewilluseinthisworkisan extensionoftheimplicitequal-weightsparticlefilter(IEWPF).In theIEWPF [29],q ischosensimilarbutnotidenticaltotheimplicitparticlefilter [30] bychoosingthenewparticlesas

ψ

ni

= ψ

ni,a

+ α

i1/2P1/2

ξ

i

,

(14) inwhichξi isa drawfromthestandardmultivariateGaussian distributionξiN(0,I)and

α

i isafunctionofboth ξ and ψin1.Furthermore,we choose

α

i suchthat the weights ofall particles becomeequal toa target weight,which isequal to the lowest optimalproposalweight of all the particles.This choice is neededto ensure that we keep all particles in theensemble, butcomeswithtwo drawbacks. Firstly,when thenumberofparticles increases, theworst particlewill be located furtherandfurther away fromthe observations,so thescheme enforces all particles tomove further away from theobservations.Secondly,numericalexperimentsshow thatthespreadoftheparticles becomesunderestimated inlow- dimensionalsystems(itsbehavior inhigh-dimensionalsystemsishardertoassessaswedonotknowthetrueanswer).Not withstandingthesenegatives,theIEWPFisthefirstparticlefilterthathasuniformweightsinhigh-dimensionalsystems.

Toalleviatethesetwoissues,Skauvoldetal. [31] extendedtheschemebyproposinganupdateequationforeachparticle oftheform:

ψ

ni

= ψ

ni,a

+ α

i1/2P1/2

ξ

i

+ β

1/2P1/2

ν

i

,

(15) inwhich

ν

i isasecond randomvector

ν

iN(0,I)andβ isacovariancescaling parametercommontoall particles.The introductionofthenewtermenablesustoremove theunderestimation oftheparticlespreadbytuning β.Furthermore, wecanchoose

α

iandβ suchthatthetargetweightisequaltothemeanoftheoptimalproposalweights.Theconsequence ofthischoiceisthattheparticlesarenotforcedawayfromtheobservationswhentheensemblesizeincreases.Withthis, bothproblemsaresolved,andthisnewschemeisthebasisforournumericalexperiments.Detailsoftheschemearegiven inAppendixA.

3. Simplifiedoceanmodelformassivelyparallelarchitectures

Traditional ocean circulation models [2,32] aregenerally written to resolve asmany of the physicalprocesses in the ocean aspossible,andtypicallyconsider conservationofmass,momentum,energy, andtracers(salt andtemperature)in threedimensions.Thismakesthemverycomputationallydemandingandlimitsthefeasiblenumberofensemblemembers.

Thenumberofmembersinanoperationalensemblepredictionsystemtodayisusuallybetween10and100.Insteadofafull three-dimensionaloceancirculationmodel,weassumethattheverticalvelocitiesarenegligiblecomparedtothehorizontal movement, andlet thenonlinear shallow-water equations in a rotational domainserve as a simplified model.Thus, we vastlyreducethestatespaceoftheproblem. Inoperationalsettings,thesimplifiedmodelmaybeinitialized basedonthe

(6)

mostrecentoceanstatefromatraditionaloceancirculationmodel,andbeusedfortoforecast short-termoceancurrents.

Furthermore,drift ofLagrangian objectsintheocean aretypically drivenby theocean currents,wind,andwave-induced forces(Stokesdrift) [33],whereasinthisworkweonlyconsiderthecontributionfromtheoceancurrents.

Theshallow-waterequationsareintheclassofhyperbolicconservationlaws,whichareoftensolvedusingexplicitfinite- volumemethods [34].Thisclassofproblemsiswell-suitedforefficientimplementationonmassivelyparallelhardware,such asGPUs [35–37].Byalsocarefullytailoringthedata-assimilationalgorithmstouselocaloperations,weareabletorunthe mostcomputationally demandingpartsofthe codeontheGPU. Controlflowandintrinsicserialoperations,however,are still carriedout on the CPU.This way,we useeach processor type forthe taskwhich it isbest suitedfor. Through this approach,wecanefficientlyrunanensembleofasimplifiedoceanmodeloncommodity-leveldesktopcomputers,reducing therequirementsforaccesstosupercomputers.

TheGPUisanextremecaseofamany-coreprocessor,withhundredsorthousandsofsimplecores.Measuredinfloating- point operationsper second(FLOPS), astandarddesktopGPUsurpassestheperformanceofthetop supercomputerinthe world ten years ago [38], and is today roughly ten times asfast as the CPU. GPUs were initially designed for efficient graphics operations, buthave becomeincreasingly popular forgeneral-purpose computingover thelast 15 years.Due to theirdesignforoptimizedthroughputofdata-paralleloperationsandlowpricesdrivenbythegamingmarket,theybecame attractiveacceleratorswhenthesteadilyincreasingCPUclockfrequencycametoanend [39].Programminglanguagessuch as CUDA and OpenCL, and easy access to highly specialized third-party libraries,1 debuggers and profilers, have further contributedtomakethemaccessibleforawiderangeofcomputationalproblems.

TheprogrammingmodeloftheGPUisaccessedthroughkernels,whichareprogramswritteninspecializedlanguagesfor runningontheGPUinaSIMD/SIMT(SingleInstruction,MultipleData/Threads)fashion.Thethreadsareorganizedinblocks, whichagainareorganizedinagrid.Thegrid(andblocks)canbe one-,two- orthree-dimensional,andtheidealchoiceof block-sizeconfiguration,denotedby(bx,by),willvaryfordifferentkernelsandfordifferentGPUs.Eachthreadcancommu- nicate withotherthreadsinthesameblockthroughthesharedmemory,whichcanbedescribedasaprogrammablecache orscratchpadmemory.Communicationbetweenthreadsindifferentblocks,however,requirescostlyglobalsynchronization.

The GPUdoesnot sharethemainCPUmemory,andallrequireddatathereforeneedstobeexplicitlytransferredbetween the GPU andCPU. This operation is relatively expensiveand should be minimized foroptimal performance. For a more thoroughintroductiontoGPUcomputing;see,e.g.,SandersandKandrot [40].

Toachievebothcomputationalperformanceandcodedevelopmentefficiency,wetreatthecomputationalintensivepart of the codeandthe program flow indifferent ways. PyCUDA [41] isa Python package that exposesthecomplete CUDA run-timeAPI andallows ustocallnativeGPUkernelswritteninCUDAdirectlyfromPython.Thisway,one canwritethe program flow,aswellaspre- andpost-processingofthespecific applications,inhigh-level Python,andatthesametime ensure thatthecomputationallyexpensivesimulationlooprunsasefficientaspossiblethroughlow-levelCUDAC/C++.By takingadvantageofwidelyavailable andpopularpackages–includingNumPy [42] andmatplotlib [43],andenvironments suchastheJupyterNotebook [44] –thecodeandexperimentscanbedevelopedefficientlythroughrapidprototyping.

Intheremainderofthissectionwegive anoverviewofthemodelandthemodelerrors,andshowhowweutilizethe GPUtoincreasecomputationalefficiency.

3.1. Thesimplifiedoceanmodel

Theshallow-waterequationsconsiderthreeconservedvariables;theelevation

η

ofthefreeoceansurfacerelativetoits equilibriumlevel,andthevolumetransporthuandhv alongtheabscissaandordinate,respectively.Theequilibriumdepth isgivenbyHeq andishereassumedtobeconstant,sothatthefullheightofthewatercolumnbecomesh=Heq+

η

.With gravitationalaccelerationgandCoriolisparameter f,theshallow-waterequationscanbewritten

( η )

t

+ (

hu

)

x

+ (

hv

)

y

=

0

, (

hu

)

t

+

hu2

+

1

2gh2

x

+ (

huv

)

y

=

f hv

, (

hv

)

t

+ (

huv

)

x

+

hv2

+

1

2gh2

y

= −

f hu

.

(16)

Theequationsrepresentahyperbolicconservationlaw,andcanbewritteninvectorformas

ψ

t

+

F

(ψ)

x

+

G

(ψ)

y

=

Sf

),

(17)

forastatevectorψ= [

η

,hu,hv]T.Here,FandGarefluxtermsalongtheabsiccaandordinate,respectively,andSf consists ofthesourcetermsduetotheCoriolisforces.

The modeloperator M(ψ) will be the numericalscheme that solves (16) and evolvesthe state forward in time. We usethehigh-resolution central-upwindschemeproposedby Chertocketal. [45],butwithareformulationthat avoidsthe

1 BLAS,RNG,FFT,imageandsignalprocessing,collectivecommunicationprimitives,graphanalytics,etc.

(7)

expensiverecursiveformulationofCoriolispotential terms [46].The schemeisdesignedtobe well-balancedwithrespect tothegeostrophicbalance,

hu

= −

g Heq f

η

y and hv

=

g Heq f

η

x

,

(18)

which permitsrotatingsteady-state solutionsby balancing thegravitationalandCoriolis forces.The numericalschemeis solvedonaCartesiangridM consistingofNM=nx×nycells.Thesizeofeachcellisx×y,sothatthecellwithindex (j,k),containingthevalueψj,k,isthecellcenteredat

(

xj

,

yk

) =

j

+

12

x

,

k

+

12

y

.

(19)

The total size ofthe state vector ψ then becomes Nψ =3NM. The time integrationis solved by a second-order strong- stability-preserving Runge-Kuttamethod, andthe storagerequirement forthe schemeis therefore 2Nψ, asthe full state mustbestoredfortwoconsecutivetimesteps.

ThestepsizeofthenumericalschemeislimitedbytheCFLcondition,

tscheme

1 4min

x maxM

u

±

g

(

Heq

+ η ) ,

y maxM

v

±

g

(

Heq

+ η )

,

(20)

in which the dominating term is the speed of gravitational waves,

g(Heq+

η

). Even though such waves occur in the ocean,perhapsmostnotablethroughtides,theircontributiontodriftermotionislimited.Eddiesandotherrotation-driven dynamicsaremuchmoreimportant,buttheyoperateonlongertimescales.Nevertheless,theCFL-conditionin(20) mustbe satisfiedtoensurenumericalstability.Torunthedata-assimilationmodelonarelevanttimescale,wedecouplethemodel operatorM fromthetimestepofthenumericalscheme,andletthefixedmodeltimestept consistofasmanytscheme stepsasnecessary.Weevaluatetheconditionin(20) continuouslytoadapttscheme tothemostrecentmodelstate,using aCourantnumberof0.8.

3.2. Smallscalemodelerrors

Toaccountforerrorsinourmodel(e.g.,missingphysics),weintroducesmall-scaleperturbationsthroughthestochastic variable,β= [δ

η

,δhu,δhv]T,sothatβ isapproximatelydrawnfromN(0,Q).Thismodelerrorisgeneratedbysamplinga randomvectorξN(0,I)andapplyingacovarianceoperator,

β =

Q1/2

ξ.

(21)

Thiserrorisaddedtothemodelstateaftereachmodeltimestept.Wedesignthecovarianceoperatorbasedontwore- quirements.First,sinceweaimtoimplementallcomponentsinthedata-assimilationsystemtorunefficientlyonmassively parallelarchitectures,wedesignthecovarianceoperator Q1/2 intermsoflocaloperations.Second,itisimportantthatthe stochasticmodelerrordoesnotintroducediscontinuitiesornon-physicalmodelstatestothesolution.

Tomaketheperturbationoftheoceansurfaceδ

η

sufficientlysmooth,itisgeneratedaccordingtoasecond-orderauto- regressive(SOAR)functiongivenby

δ η

j,k

=

nx

a=1 ny

b=1 QSOAR1/2

j,k

,

a,b

ξ

a,b

,

(22)

inwhich

QSOAR1/2

(

j,k

,

a,b

) =

q0

1

+

dist

(

j,k

,

a,b

)

L0

exp

dist

(

j,k

,

a,b

)

L0

.

(23)

Here,q0 isascalingparameterfortheamplitudeofδ

η

,L0isameasureofthecorrelationlengthscale,anddist(j,k,a,b) istheEuclideandistancebetweenthecenterofthecellswithindices(j,k)and(a,b).Sincethecovariancebetweenpoints thatare farfromeachother relativeto L0 becomeszero,thecomputationalwork canbelimitedtooperateonlocaldata pointsonly,andthissatisfiesthefirstdesignrequirement.Equation(22) canthenbewrittenas

δ η

j,k

=

j+

cSOAR a=jcSOAR

k+

cSOAR b=kcSOAR

QSOAR1/2

j,k

,

a,b

ξ

a,b

,

(24)

in which cSOAR is our cut-off value, tuned so that there are no contribution to δ

η

j,k from a distance larger than cSOARmin(x,y)fromcellj,k.Operationssuchas(24) areverywellsuitedforimplementationontheGPU.

A drawback to the expression in (24) is that the computational work and data dependency of the stencil is tightly connectedtotheratiobetweenL0 andthecellsize.Tohavebettercontrolofthisworkload,weintroduceacoarserandom

(8)

Fig. 1.Alignmentofnestedgridswithc=3.ThegridM containscellsandisusedforevolvingthenumericalmodel,whereasthegridR contains pointvaluesandisusedforapplyingtheSOARfunctiononsampledrandomnumbersfromN(0,I).Forbestpossibleassimilationofobservations,anoffset canbeappliedtoR sothatoneofitsgridpointsisco-locatedwiththecellinMinwhichtheobservationwasmade.

numbergridR,onwhichthestandard normaldistributedrandom numbersξ are sampled,andapply theSOARfunction here. We choose the discretization of R so that we obtain a good trade-off betweencomputational efficiencyof (24), while maintaining a good spreadof informationwithin the correlated areas.The coarse grid will have grid cells ofsize (˜x,˜y)=c(x,y),wherec isan oddnumberrepresentingthecoarseness ofR.Valueson R areinterpretedas pointvalues,andwedenotethenumberofgridpointsinR byNR.Byrequiringthatcisodd,weensurethatthepoint values definedon R areco-located withcellcenters ofM,asshownin Fig.1.Furthermore,we choosethecoarsening factorcsothatthecut-offfactorin(24) canbechosenascSOAR=2.Afterhavingobtainedδ

η

onR through(24),weuse bicubicinterpolation,denotedbytheoperatorI,toobtaincell-averagedvaluesonM.

Toavoidthat theperturbation β produces non-physicalmodelstates(the seconddesign requirement),weuse(18) to ensurethatβisingeostrophicbalance.Bydiscretizing(18) withcentraldifferencesontheM grid,δhuandδhvarefound fromδ

η

by

δ

huj,k

= −

g Heq f

δ η

j,k+1

δ η

j,k1

2

y and

δ

hvj,k

=

g Heq f

δ η

j+1,k

δ η

j1,k

2

x

.

(25)

Thisoperationisdenotedby QG B1/2.Itshouldbenotedthatthederivativesofδ

η

areapproximatedby(25),eventhoughthey are analyticallyavailabledirectlyfromthebicubicinterpolation.Thereasonisthatgeostrophic balanceisonlymaintained by the numericalscheme withrespect to the grid resolution. The bicubic surface, however,is continuously definedand will typically containoscillations on sub-grid scale,meaning that thederivatives ofthebicubic surface oftenwillnot be represented bythediscrete valuesonthegrid.The centraldifferencesin(25) arethereforebetter suitedforgeneratinga modelstatethatisinbalanceunderthenumericalscheme.

Evaluatingthecompletemodelerrornowconsistsoffouroperations,

β =

Q1/2

ξ =

QG B1/2IQSOAR1/2

ξ,

(26) inwhichthefirststepistosampleξN(0,I).Notethat QG B1/2 andQSOAR1/2 are linearoperators,whereas I isanonlinear stencil.Theinputandoutputforeachoftheoperationsare

QSOAR1/2

:

R

R

,

I

:

R

M

,

QG B1/2

:

M

3

×

M

,

(27)

makingthecovarianceoperatoractas

Q1/2

:

R

3

×

M

.

(28)

TheseoperationsareillustratedinFig.2.First,therandomfieldξ issampledonthecoarsegridR,andtheSOARoperator QSOAR1/2 isappliedtogenerateacoarsecorrelatedfield.Then,thecorrelatedfieldisinterpolatedontothecomputationalgrid M using I,andδhuandδhvarecomputedtobeingeostrophicbalancewithrespecttoδ

η

.

It should be noted that our choice of the model error leads to a non-symmetric square root Q1/2, and that this implementation-oriented definitionof Q1/2 makes useofsignificantly fewer randomnumbersthan variablesin thestate vector. Using c=3,illustrated inFig.1,asan example,wesample one randomnumberforevery nine

η

variables,and none forhu andhv, since δhu andδhv are computedfrom(25). This resultsin one randomnumber forevery 27state variables.Itshouldfinallybe notedthatbecause Q1/2 isanonlinearoperatorduetothebicubicinterpolation,theβ’sare notstrictlyGaussiandistributed.This,however,isnotaproblemasthecovarianceoftheβ’sisstillsymmetricpositivesemi definite.

(9)

Fig. 2.Thesmallscalemodelperturbationβ= [δη,δhuhv]T isgeneratedbyfirstsamplingrandomnumbersfromastandardnormaldistributionξN(0,I)onthecoarsegridR.WethengivetherandomfieldacovariancestructureaccordingtotheSOARfunctionQ1SOAR/2,beforeinterpolatingthecoarse randomfieldontothefinemodelgridM throughItogetδη.Finally,wecalculatethecorrespondingmomentumδhuandδhvtoimposegeostrophic balance.

3.3. Efficientimplementationofmodelerrors

The SOAR function, bicubic interpolation, and geostrophic balance are all local stencil operations that are simple to parallelize, as each element of their output can be found independently from all other output elements. Generation of randomnumbersξ canfurtherbedonethroughthecuRANDlibraryavailablethroughtheCUDAtoolkit.Thesamplingofβ isthereforewell-suitedforimplementationontheGPU.

TheSOAR functionin(24) with cSOAR=2 consistsofastencil operationdepending on5×5 inputvaluescenteredon thetargetcell.WeuseoneGPUthreadperoutputelement.Tominimizetheamountofdatareadfromglobalmemory,all threadswithinthesameblockcooperatetoreadthecollectivelyrequiredinputdataintosharedmemory.

Inthebicubicinterpolation I,eachvalueinthefinegridM dependsonthe4×4 points inthecoarsegridR that surroundsitsposition.Thismeansthatthec×coutputvaluesthatarelocatedbetweenthesamefourcoarsegridpoints haveoverlappingdatadependencies.WestillapplyoneGPUthreadperoutputelement,andobtaingeostrophicallybalanced δhuandδhvwithinthesamekernel.Eachblockcomputes(bx+2)×(by+2)valuesofδ

η

andstoresthemtemporarilyin sharedmemory,sothatbx×byvaluesofδhuandδhv efficientlycanbecomputedusing(25).

Thememoryfootprintofobtainingβ istwobuffersofsize NR,holdingξ andtheresultfrom QSOAR1/2 ξ,respectively.The memoryfootprintoftherandomnumbergeneratorcomesinadditiontothis.Notethatweneverstoreβitself,butaddit directlyintothestatevectorψ.

3.4. Synthetictruthandobservations

The experiments in thispaper are so-calledidentical twinexperiments, meaning that the same modelequations are usedtogeneratethesynthetictruestateandtoevolvetheensemble.The truestate ψtrue isgeneratedfromaknownset ofinitial conditionsby runningthe numericalschemewithstochastic modelerrors asdescribed above.Furthermore, ND Lagrangian drifters(driftingbuoys)are simulatedtobe advected passivelyalong theoceancurrentaccordingto asimple forwardEulerintegrationscheme.

Referanser

RELATERTE DOKUMENTER

Moreover, this paper extends the results in (Myhre and Egeland, 2017) and (Myhre, 2019), where a recursive parameter estimation method based on particle filter- ing was used to

This paper presents a new method for simulating particles for computer graphics and video games, based on an improved Jacobi solver and a hybrid approach between velocity time

In Chapter 2, we give a presentation of the field of linear programming, and we describe the standard and revised simplex methods and a parallel revised simplex method called

In Chapter 2, we give a presentation of the field of linear programming, and we describe the standard and revised simplex methods and a parallel revised simplex method called

In Chapter 2, we give a presentation of the field of linear programming, and we describe the standard and revised simplex methods and a parallel revised simplex method called

In Chapter 2, we give a presentation of the field of linear programming, and we describe the standard and revised simplex methods and a parallel revised simplex method called

In Chapter 2, we give a presentation of the field of linear programming, and we describe the standard and revised simplex methods and a parallel revised simplex method called

In Chapter 2, we give a presentation of the field of linear programming, and we describe the standard and revised simplex methods and a parallel revised simplex method called