Massively parallel implicit equal-weights particle filter for ocean drift trajectory forecasting


Contents lists available at ScienceDirect

Journal of Computational Physics: X

www.elsevier.com/locate/jcpx


Håvard Heitlo Holm ^a,b,∗^, Martin Lilleeng Sætra ^c,d^, Peter Jan van Leeuwen ^e,f^

^a^ Mathematics and Cybernetics, SINTEF Digital, P.O. Box 124 Blindern, NO-0314 Oslo, Norway

^b^ Department of Mathematical Sciences, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway

^c^ Information Technology Department, Norwegian Meteorological Institute, P.O. Box 43 Blindern, NO-0313 Oslo, Norway

^d^ Department of Computer Science, Oslo Metropolitan University, P.O. Box 4 St. Olavs plass, NO-0130 Oslo, Norway

^e^ Department of Atmospheric Science, Colorado State University, 3915 W. Laporte Ave., Fort Collins, CO 80521, USA

^f^ Department of Meteorology, University of Reading, Earley Gate, Reading RB6 6BB, UK

Article info

Article history:

Received 1 October 2019

Received in revised form 14 February 2020

Accepted 26 February 2020

Available online 4 March 2020

Keywords:

Data assimilation, Particle filters, GPU computing, Shallow-water simulation, Finite-volume method, Drift trajectory forecasting

Abstract

Forecasting of ocean drift trajectories is important for many applications, including search and rescue operations, oil spill cleanup and iceberg risk mitigation. In an operational setting, forecasts of drift trajectories are produced based on computationally demanding forecasts of three-dimensional ocean currents. Herein, we investigate a complementary approach for shorter time scales by using the recently proposed two-stage implicit equal-weights particle filter applied to a simplified ocean model. To achieve this, we present a new algorithmic design for a data-assimilation system in which all components – including the model, model errors, and particle filter – take advantage of massively parallel compute architectures, such as graphical processing units. Faster computations can enable in-situ and ad-hoc model runs for emergency management, and larger ensembles for better uncertainty quantification. Using a challenging test case with near-realistic chaotic instabilities, we run data-assimilation experiments based on synthetic observations from drifting and moored buoys, and analyze the trajectory forecasts for the drifters. Our results show that even sparse drifter observations are sufficient to significantly improve short-term drift forecasts up to twelve hours. With equidistant moored buoys observing only 0.1% of the state space, the ensemble gives an accurate description of the true state after data assimilation followed by a high-quality probabilistic forecast.

© 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Prediction of drift trajectories in the ocean has many applications that are important to society and the environment. Examples include search and rescue operations, recovering objects lost at sea, planning of boom placements for oil spill cleanup, and preventing collisions between icebergs and offshore installations. To produce high-quality drift trajectory forecasts, it is important to have a good representation of ocean currents. This is not an easy task, as ocean currents have large natural variability and there are typically few available observations. Furthermore, the size of ocean low- and high-pressure systems, so-called eddies, is much smaller than their atmospheric counterparts, and it is challenging to place them correctly in typical grid resolutions used by operational ocean models today.

∗ Corresponding author at: Mathematics and Cybernetics, SINTEF Digital, P.O. Box 124 Blindern, NO-0314 Oslo, Norway.

E-mail addresses: Havard.Heitlo.Holm@sintef.no (H.H. Holm), m.l.saetra@met.no (M.L. Sætra), peter.vanleeuwen@colostate.edu (P.J. van Leeuwen).

https://doi.org/10.1016/j.jcpx.2020.100053


The operational approach for drift trajectory prediction is to use the currents from the most recent ocean forecasts directly [1]. These are imported from computationally expensive ocean circulation models, such as ROMS [2], which solve the dynamic state of the ocean in three dimensions. Typically, a large portion of the simulation run-time is spent on the data assimilation, which uses available real-world observations to correct the modeled ocean states that serve as initial conditions for the next forecast. Common forecast ranges for ocean circulation models are three to five days. Operational drift trajectory forecasts at the Norwegian Meteorological Institute (MET Norway) are produced by OpenDrift [1], which is an offline trajectory model that reads the ocean current forecasts to predict drift trajectories. Although OpenDrift is computationally efficient, the ocean circulation models still require access to supercomputers.

This paper explores the option of using a recently proposed filter method applied to a simplified ocean model for efficient drift trajectory forecasting. The aim is to build a data-assimilation system that can run efficiently on commodity-level desktop computers, and also be extendable to supercomputers. We achieve this by using a simplified ocean model and a data-assimilation method that are both able to take advantage of massively parallel accelerator hardware, such as the graphical processing unit (GPU). This work is not intended as a substitute for current operational systems, but as a complementary approach, in which the predicted currents may even be updated with in-situ observations, e.g., during ongoing search and rescue operations. Furthermore, by enabling research models to run on individual desktop and laptop computers, researchers are able to do more rapid prototyping. At the same time, this work will contribute to more efficient simulations also on supercomputers, since all algorithms may be extended to run on multiple GPUs and compute nodes.

The paper is organized as follows: We start by highlighting our contributions and reviewing related work relevant for Lagrangian data assimilation with accelerated particle filters. In Section 2, we describe the data-assimilation problem and summarize the key concepts of so-called proposal-distribution particle filters. We present the simplified ocean model and model errors in Section 3, whereas Section 4 offers a detailed description of an algorithm for running the chosen particle filter on this model. The latter two sections also discuss how the GPU is used for efficient implementation of the computationally intensive components. In Section 5, we present numerical experiments of drift trajectory ensemble forecasts, using an identical-twin experiment setup designed to resemble real-world ocean currents and with configuration inspired by operational systems. Furthermore, we show and discuss the statistical validity of the forecasts and examine the computational performance of the simulations. Finally, Section 6 contains a summary and concluding remarks.

Paper contribution. We present an efficient GPU-accelerated data-assimilation system based on the recently proposed implicit equal-weights particle filter (IEWPF) applied to simplified ocean models described by the rotating shallow-water equations. The data-assimilation algorithm, the numerical scheme for evolving the ocean model, and the algorithm for sampling locally correlated, well-balanced random model errors are all designed to take advantage of massively parallel architectures. The data-assimilation system is tailored for observations of the ocean current obtained from either free-drifting buoys or moored buoys. We show numerical experiments for assimilating a challenging test case with near-realistic chaotic behavior, along with drift trajectory forecasts. The results show that by assimilating approximately 0.1% of the state space, the posterior ensemble mean strongly resembles the true state in the entire domain, thus enabling accurate drift trajectory forecasts. This is also the first time that the IEWPF method is applied to high-dimensional geophysical problems. Furthermore, we show that the particle filter has well-behaved statistical properties, and that the computational efficiency of the data assimilation is well-balanced with respect to the model. To the best of our knowledge, there exist no previous massively parallel implementations of a state-of-the-art particle filter applied to a high-dimensional geophysical system.

Related work. Particle filters, and more generally Sequential Monte Carlo (SMC) methods, constitute a large class of numerical methods for statistical inference. It is well-known that the standard particle filter is prone to degeneracy in high-dimensional systems [3–5], and there have been several attempts at designing particle filters without this limitation. A few such particle filters have been used on high-dimensional, near-realistic applications in the geosciences. Ades and van Leeuwen [6] use the equivalent-weights particle filter on a high-dimensional, simplified ocean model based on the barotropic equations, showing that it is possible to avoid the degeneracy problem in high-dimensional systems, at the cost of a biased estimate. Although the scheme performed well, the bias grows with ensemble size. Poterjoy, Sobash and Anderson [7] use a local particle filter on the Weather Research and Forecasting model, in which the bootstrap particle filter is applied locally to observations, and particle states are merged in state space between the locations of the observations. However, it remains problematic to glue particles from these local updates together to full particles that span the whole model domain. The needed smoothing can easily destroy delicate balances in the flow. Furthermore, the minimum size of the local areas is set by physical length-scale constraints, typically meaning that too many observations are within a local domain to avoid degeneracy. In practice, a minimum weight value is set, meaning that not all information is extracted from the observations. Hence, localization does not solve the problem. A recent review by van Leeuwen et al. [8] discusses the most recent developments on particle filters for high-dimensional geophysical systems.

Several implementations of standard particle filters for parallel architectures such as GPUs exist, but mainly within other scientific disciplines than geosciences. Lopez et al. [9] present GPU-implementations of a particle filter (with sequential importance resampling) and auxiliary particle filter to detect anomalies in manufacturing processes, and show sufficient performance for real-time application. Gelencsér-Horváth et al. [10] introduce a modified cellular particle filter with Metropolis resampling on the GPU for real-time applications. LibBi [11] is a software package for state-space modeling and Bayesian inference capable of utilizing GPUs. Several particle filters are implemented in LibBi, e.g., particle Markov Chain Monte Carlo (pMCMC) and SMC². Other methods in LibBi include the Extended Kalman Filter (EKF) and parameter optimization routines. Bai and Hu [12] demonstrate particle filter-based data assimilation for simulation of wildfire spread, with parallel sampling and weight computation based on the MapReduce programming model. In a more recent work, Bai et al. [13] describe more efficient routing of particles between processing units in the resampling step of a distributed particle filter.

Other data-assimilation methods have also been subject to GPU acceleration. Blattner and Yang [14] give a performance study of a GPU-implementation of the local ensemble transform Kalman filter, Wei and Huang [15] explore a GPU-based implementation of the EKF, and Quinn and Abarbanel [16] present a general path integral Monte Carlo approach applied to a neuron model. They all report massive speed-ups on the order of 100–1000 over CPU implementations. The theoretical speed-up based on hardware specifications for FLOPS and memory bandwidth is on the order of 10 [17].

Assimilation of Lagrangian data is challenging due to the potential complexity of the trajectories and the need for transforming the data into Eulerian velocity data (for fixed-grid or spectral numerical models). Apte, Jones and Stuart [18] use particle smoothing for assimilating Lagrangian data from drifters and present three methods for sampling from the exact posterior probability density function based on the Langevin equation and the Metropolis-Hastings algorithm. Their methods are shown to produce better results than the ensemble Kalman filter using perturbed observations. Spiller, Apte and Jones [19] use both particle filtering and smoothing (exact posterior sampling) for assimilating Lagrangian data from gliders and drifters. They propose a new observation operator to deal with the high uncertainty in the locations of the observations. Spiller et al. [20] investigate the divergence of a particle filter for the point-vortex model. They introduce backtracking particle filters and show that the filters outperform EKF for the two-point vortex system. Other methods than particle filters and smoothers have also been successfully implemented [21–25].

2. The data-assimilation problem

There are many potential sources for errors in the simulation of atmospheric and oceanographic processes. These errors may arise from physical processes missing in the mathematical model, discretization errors in the numerical method, sub-grid effects that cannot be resolved in the discretized model, and uncertainties in model parameters, initial conditions, forcing and boundary conditions. Hence, we do not only wish to simulate the behavior of the unknown physical state, denoted by $\psi$, but rather its probability density function (pdf), $p(\psi)$. As geophysical applications tend to be very high-dimensional and driven by nonlinear processes, an analytic description of $p(\psi)$ is generally unobtainable, and an ensemble-based Monte Carlo simulation is one way to measure the uncertainties in the system. In its simplest form, ensemble-based statistical simulation consists of a set of $N_e$ independent state vectors $\{\psi_i\}_{i=1,\ldots,N_e}$, which are initialized according to uncertainties in the model parameters and initial conditions. The state of each ensemble member is then simulated independently according to the model equation,

$$
\psi_i^n = \mathcal{M}\left(\psi_i^{n-1}\right) + \beta_i^{n-1}, \qquad \text{for } n = 1, 2, \ldots, \tag{1}
$$

in which the model $\mathcal{M}$ evolves the solution deterministically from time $t^{n-1}$ to $t^n$, and $\beta_i^{n-1}$ is an optional stochastic variable that represents realizations of the errors in the model. The pdf of the system can then be represented through the statistical properties of the resulting ensemble, e.g., as

$$
p(\psi^n) = \frac{1}{N_e} \sum_{i=1}^{N_e} \delta\left(\psi^n - \psi_i^n\right), \tag{2}
$$

in which $\delta$ is the Dirac delta function.
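As a minimal sketch of the ensemble simulation in (1) and the sample statistics implied by (2), the snippet below evolves a small ensemble with a toy model. The damped linear map `toy_model`, the white-noise `toy_model_error`, and all parameter values are assumptions chosen only for illustration; they are not the ocean model or the model error described later in the paper.

```python
import numpy as np

def forecast_ensemble(ensemble, model, sample_model_error, n_steps, rng):
    """Evolve every member with psi^n = M(psi^{n-1}) + beta^{n-1}, as in (1)."""
    ensemble = np.array(ensemble, dtype=float)  # work on a copy
    for _ in range(n_steps):
        for i in range(ensemble.shape[0]):
            ensemble[i] = model(ensemble[i]) + sample_model_error(rng)
    return ensemble

# Hypothetical stand-ins for the model and the model error.
def toy_model(psi):
    return 0.9 * psi

def toy_model_error(rng, size=3, std=0.1):
    return std * rng.standard_normal(size)

rng = np.random.default_rng(seed=0)
initial = rng.standard_normal((50, 3))   # N_e = 50 members, N_psi = 3
final = forecast_ensemble(initial, toy_model, toy_model_error, n_steps=10, rng=rng)

# Moments of the discrete pdf in (2) are plain sample statistics.
ensemble_mean = final.mean(axis=0)
ensemble_std = final.std(axis=0, ddof=1)
```

Because the members are independent, the inner loop over `i` is the part that maps naturally onto parallel hardware.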

If an observation $y^n$ of the system is available at time $t^n$, this information can be used to improve the obtained probability density. Typically, the observation is also influenced by uncertainty, as

$$
y^n = H\left(\psi^n_{\mathrm{true}}\right) + \epsilon^n, \tag{3}
$$

in which $H$ is the observation operator that maps the true state $\psi^n_{\mathrm{true}}$ to observation space and $\epsilon^n$ is a stochastic observation error. The observations typically only cover parts of the system, so that the size of the observation vector (denoted $N_y$) is smaller than the size of the state vector (denoted $N_\psi$). This is particularly true for geophysical systems, for which it is normal that $N_y \ll N_\psi$ (e.g., $y$ can be the value and direction of the ocean current at a single point in space and time). Because of this, we cannot simply replace the observed parts of $\psi^n$ with the values in $y^n$ directly, and we have to consider the conditional pdf $p(\psi^n \mid y^n)$. The data-assimilation problem consists of finding this conditional density, and its fundamental building block is Bayes' theorem:

$$
p(\psi^n \mid y^n) = \frac{p(y^n \mid \psi^n)\, p(\psi^n)}{p(y^n)}. \tag{4}
$$

The original pdf $p(\psi^n)$ is here termed the prior probability, as it represents our understanding of the system prior to assimilating the information in the observation. The likelihood $p(y^n \mid \psi^n)$ expresses the probability of observing $y^n$ under the assumption that $\psi^n$ is the true state of the system. The marginal probability $p(y^n)$, i.e., the probability of observing $y^n$, acts mainly as a normalization constant and ensures that the resulting posterior probability density is a pdf.
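To make the roles of the prior, likelihood and marginal in (4) concrete, the following sketch evaluates Bayes' theorem on a grid for a scalar state. The standard normal prior, the observation value and the error variance are assumptions picked so that the exact posterior mean, $y/(1+r)$, is known.

```python
import numpy as np

# Grid over a scalar state psi (illustrative discretization).
psi = np.linspace(-6.0, 6.0, 2401)
dpsi = psi[1] - psi[0]

# Prior p(psi) = N(0, 1).
prior = np.exp(-0.5 * psi**2) / np.sqrt(2.0 * np.pi)

# Observation y = psi + eps with eps ~ N(0, r); the values are assumptions.
y, r = 1.5, 0.5
likelihood = np.exp(-0.5 * (y - psi) ** 2 / r) / np.sqrt(2.0 * np.pi * r)

# The marginal p(y) is just the constant that normalizes the product.
evidence = np.sum(likelihood * prior) * dpsi
posterior = likelihood * prior / evidence

posterior_mean = np.sum(psi * posterior) * dpsi
posterior_var = np.sum((psi - posterior_mean) ** 2 * posterior) * dpsi
```

For this conjugate Gaussian case the posterior is $\mathcal{N}(y/(1+r),\, r/(1+r))$, which the grid evaluation reproduces. In high dimensions such a grid is infeasible, which is why ensemble representations are used instead.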

2.1. Standard particle filter

The standard particle filter is an ensemble-based data-assimilation technique that uses a direct evaluation of Bayes' theorem. Each particle (equivalent to an ensemble member), $\psi_i$, is assigned a weight $w_i$ that gives the relative importance of that particle in the ensemble. Typically, all $N_e$ particles are initialized with weight $w_i^0 = 1/N_e$, as they are sampled independently from the pdf of the initial conditions, $p(\psi^0)$. Each particle is then simulated independently according to (1) until observation time $t^n$. By applying (4) directly with (2) as the prior density, and by considering the marginal probability as a normalization constant, the posterior distribution is expressed as

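$$
p(\psi^n \mid y^n) \propto \sum_{i=1}^{N_e} \frac{p(y^n \mid \psi_i^n)}{\sum_{j=1}^{N_e} p(y^n \mid \psi_j^n)}\, \delta\left(\psi^n - \psi_i^n\right) = \sum_{i=1}^{N_e} w_i^n\, \delta\left(\psi^n - \psi_i^n\right). \tag{5}
$$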

Here, the likelihood is used to update the weights $w_i^n$ for each particle, so that the posterior is represented by a weighted discrete distribution. We can evaluate the likelihood if we know the pdf for the observation. For instance, if the observation error is Gaussian, $\epsilon^n \sim \mathcal{N}(0, R)$, the weight for particle $\psi_i$ becomes

$$
w_i \propto \exp\left( -\frac{1}{2} \left( y^n - H(\psi_i^n) \right)^T R^{-1} \left( y^n - H(\psi_i^n) \right) \right). \tag{6}
$$

As some particles inevitably end up with very low weights, they no longer carry significant statistical value. To improve the statistical coverage in the high-probability regions, the ensemble is resampled according to the weight distribution in (6), so that $\{\psi_i^n\}_{i=1,\ldots,N_e} \sim p(\psi^n \mid y^n)$. All weights for the resampled particles are then reset to $1/N_e$. This is known as sequential importance resampling. Several schemes can be used for this resampling [4], and in this work we consider the residual resampling scheme [26]. Note that if the model (1) has $\beta = 0$, it is important that duplicated particles are given a perturbation to avoid ensemble collapse and completely overlapping particle trajectories. With a stochastic model, however, exact duplications will evolve differently through independent realizations of $\beta_i$.
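The weight update in (6) followed by residual resampling can be sketched as below. This is an illustrative NumPy version under the assumption that the innovations $y^n - H(\psi_i^n)$ have already been computed for every particle; the function names are hypothetical and this is not the GPU implementation discussed in the paper.

```python
import numpy as np

def gaussian_weights(innovations, R_inv):
    """Normalized weights from (6); innovations[i] = y - H(psi_i)."""
    # Quadratic form d_i^T R^{-1} d_i for every particle at once.
    exponents = -0.5 * np.einsum("ij,jk,ik->i", innovations, R_inv, innovations)
    exponents -= exponents.max()          # avoid underflow before exponentiating
    weights = np.exp(exponents)
    return weights / weights.sum()

def residual_resampling(weights, rng):
    """Indices of the particles kept by residual resampling."""
    n = len(weights)
    copies = np.floor(n * weights).astype(int)   # deterministic copies
    n_remaining = n - copies.sum()
    if n_remaining > 0:
        residual = n * weights - copies          # leftover fractional mass
        residual /= residual.sum()
        extra = rng.choice(n, size=n_remaining, p=residual)
        copies += np.bincount(extra, minlength=n)
    return np.repeat(np.arange(n), copies)

rng = np.random.default_rng(seed=1)
innovations = rng.standard_normal((100, 2))      # synthetic innovations, N_e = 100
weights = gaussian_weights(innovations, np.linalg.inv(0.5 * np.eye(2)))
indices = residual_resampling(weights, rng)      # new ensemble is particles[indices]
```

A particle with weight $w_i$ is guaranteed $\lfloor N_e w_i \rfloor$ copies, and only the remaining slots are filled at random, which reduces the sampling noise compared to drawing all $N_e$ indices independently.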

One of the main advantages of the standard particle filter is that it preserves all physical properties throughout the simulation, as the final particles are generated from successful simulation runs and not through manipulation of the state vectors. A drawback, however, is that the ensemble is prone to collapse when the dimension of the observation space increases [3–5]. In high-dimensional systems, all particles end up in the tail of the likelihood, with the consequence that only very few particles (perhaps even just one) gain a much higher weight than all others. The distribution then collapses as all $N_e$ particles are resampled from few (or a single) particles that have non-zero weights. This problem is often referred to as the curse of dimensionality.
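The collapse can be illustrated with a small synthetic experiment. In the sketch below, particles are drawn from a standard normal prior, every state component is observed with unit error variance, and the effective sample size $1/\sum_i w_i^2$ is used as a degeneracy measure. The setup and the choice of diagnostic are assumptions made for illustration only.

```python
import numpy as np

def effective_sample_size(weights):
    """1 / sum(w_i^2): equals N_e for uniform weights and 1 for full collapse."""
    return 1.0 / np.sum(weights ** 2)

def normalized_weights(n_particles, n_obs, rng):
    """Weights from (6) with H = I and R = I for particles drawn from N(0, I)."""
    truth = rng.standard_normal(n_obs)
    y = truth + rng.standard_normal(n_obs)
    particles = rng.standard_normal((n_particles, n_obs))
    log_w = -0.5 * np.sum((y - particles) ** 2, axis=1)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

rng = np.random.default_rng(seed=3)
ess = {}
for n_obs in (1, 10, 100, 1000):
    ess[n_obs] = effective_sample_size(normalized_weights(100, n_obs, rng))
    print(f"N_y = {n_obs:4d}: effective sample size = {ess[n_obs]:.1f}")
```

With the ensemble size fixed at 100, the effective sample size drops towards one as the number of independent observations grows, which is the degeneracy described above.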

2.2. The implicit equal-weights particle filter

One technique used for overcoming the curse of dimensionality is to sample the states $\psi_i^n$ from a proposal density, $q$, with an appropriate compensation in the weights. First, (1) shows that the pdf of the state at time $t^n$ is related to that of the previous time by the Markovian property

$$
p(\psi^n) = \int p(\psi^n \mid \psi^{n-1})\, p(\psi^{n-1})\, d\psi^{n-1} \approx \frac{1}{N_e} \sum_{i=1}^{N_e} p(\psi^n \mid \psi_i^{n-1}), \tag{7}
$$

where we assumed that all particles have the same weight at time $t^{n-1}$. In the standard particle filter, we draw the evolution of the particle from $p(\psi^n \mid \psi_i^{n-1})$, which is equivalent to solving the model equation for one time step. We can choose it differently, by first multiplying and dividing the argument of the integral by a proposal density $q$ and then drawing the particle evolution from that density,

$$
p(\psi^n) = \frac{1}{N_e} \sum_{i=1}^{N_e} \frac{p(\psi^n \mid \psi_i^{n-1})}{q_i(\psi^n \mid \psi_{1:N_e}^{n-1}, y^n)}\, q_i(\psi^n \mid \psi_{1:N_e}^{n-1}, y^n). \tag{8}
$$

We have large freedom in how to choose $q$, but the support of $q$ is required to be equal to or larger than the support of $p(\psi^n \mid \psi_i^{n-1})$, and it should preferably be easy to sample from. Here, the proposal is chosen to be conditioned on the observation $y^n$ and all particle states at the previous time step, $\psi_{1:N_e}^{n-1}$, and it depends on the parent state $\psi_i^{n-1}$ via index $i$. Using the proposal density in Bayes' theorem (4) gives us

$$
p(\psi^n \mid y^n) = \frac{1}{N_e} \sum_{i=1}^{N_e} \frac{p(y^n \mid \psi^n)\, p(\psi^n \mid \psi_i^{n-1})}{p(y^n)\, q_i(\psi^n \mid \psi_{1:N_e}^{n-1}, y^n)}\, q_i(\psi^n \mid \psi_{1:N_e}^{n-1}, y^n). \tag{9}
$$

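By now sampling $\psi_i^n \sim q_i(\psi^n \mid \psi_{1:N_e}^{n-1}, y^n)$, the posterior becomes

$$
p(\psi^n \mid y^n) = \sum_{i=1}^{N_e} w_i^n\, \delta\left(\psi^n - \psi_i^n\right), \qquad \text{with } w_i^n = \frac{p(y^n \mid \psi_i^n)\, p(\psi_i^n \mid \psi_i^{n-1})}{N_e\, p(y^n)\, q_i(\psi_i^n \mid \psi_{1:N_e}^{n-1}, y^n)}. \tag{10}
$$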
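The weights in (10) can be checked numerically in a scalar toy problem. The sketch below assumes a single parent particle shared by all samples, Gaussian model and observation errors, and a Gaussian proposal centred on the observation; all numbers are illustrative assumptions. The factor $N_e\, p(y^n)$ is handled by normalizing the weights to sum to one.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log-density of a scalar Gaussian N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

rng = np.random.default_rng(seed=42)

# Toy scalar setup (assumed values).
forecast = 1.0         # M(psi^{n-1}) for the single parent particle
q_var = 1.0            # model error variance Q
y, r_var = 2.0, 0.5    # observation and observation error variance R

# Proposal q: a Gaussian centred on the observation, wider than the posterior.
prop_mean, prop_var = y, 1.0
samples = prop_mean + np.sqrt(prop_var) * rng.standard_normal(20000)

# Unnormalized log-weights from (10): likelihood * transition / proposal.
log_w = (log_normal_pdf(y, samples, r_var)
         + log_normal_pdf(samples, forecast, q_var)
         - log_normal_pdf(samples, prop_mean, prop_var))
weights = np.exp(log_w - log_w.max())
weights /= weights.sum()

weighted_mean = np.sum(weights * samples)
exact_mean = forecast + q_var / (q_var + r_var) * (y - forecast)
```

The weighted sample mean agrees with the exact posterior mean of this linear Gaussian problem up to Monte Carlo error, even though no sample was drawn from the model transition density itself.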

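One choice of $q$ is the optimal proposal density [27], in which $q_i(\psi^n \mid \psi_{1:N_e}^{n-1}, y^n) = p(\psi_i^n \mid \psi_i^{n-1}, y^n)$. By considering a linear observation operator $H$ and Gaussian model and observation errors, $\beta \sim \mathcal{N}(0, Q)$ and $\epsilon \sim \mathcal{N}(0, R)$, the optimal proposal density is equivalent to $\mathcal{N}(\psi_i^{n,a}, P)$, with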

$$
\psi_i^{n,a} = \mathcal{M}(\psi_i^{n-1}) + Q H^T \left( H Q H^T + R \right)^{-1} d_i^n \tag{11}
$$

and

$$
P = \left( Q^{-1} + H^T R^{-1} H \right)^{-1}, \tag{12}
$$

in which

$$
d_i^n := y^n - H \mathcal{M}(\psi_i^{n-1}) \tag{13}
$$

is called the innovation for particle $i$. The proposal is optimal in the sense that it gives optimal variance in the weights for proposals of the form $q(\psi^n \mid \psi_i^{n-1}, y^n)$, but as it turns out, it is not sufficient to avoid ensemble degeneracy [3,5,28].
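For small dense matrices, (11)–(13) can be evaluated directly. The sketch below does so and cross-checks (12) against the equivalent Woodbury form $P = Q - QH^T(HQH^T + R)^{-1}HQ$. The four-variable state, the exponential covariance $Q$ and the point-observation $H$ are assumptions for illustration; a high-dimensional system would never form these matrices explicitly.

```python
import numpy as np

def optimal_proposal(forecast, y, H, Q, R):
    """Mean and covariance of the optimal proposal, (11)-(13).

    forecast is the already evolved state M(psi_i^{n-1}).
    """
    innovation = y - H @ forecast                                   # (13)
    S = H @ Q @ H.T + R
    mean = forecast + Q @ H.T @ np.linalg.solve(S, innovation)      # (11)
    P = np.linalg.inv(np.linalg.inv(Q) + H.T @ np.linalg.inv(R) @ H)  # (12)
    return mean, P

# Assumed toy setup: 4 state variables, components 0 and 2 are observed.
n_state = 4
idx = np.arange(n_state)
Q = 0.5 * np.exp(-np.abs(idx[:, None] - idx[None, :]))   # correlated model error
H = np.zeros((2, n_state))
H[0, 0] = 1.0
H[1, 2] = 1.0
R = 0.1 * np.eye(2)

forecast = np.array([1.0, 0.5, -0.5, 0.0])
y = np.array([1.4, -0.9])
mean, P = optimal_proposal(forecast, y, H, Q, R)

# Equivalent form of (12) that only inverts an N_y x N_y matrix.
P_woodbury = Q - Q @ H.T @ np.linalg.solve(H @ Q @ H.T + R, H @ Q)
```

The Woodbury form is often preferable when $N_y \ll N_\psi$, since it avoids inverting $Q$.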

The main particle filter we will use in this work is an extension of the implicit equal-weights particle filter (IEWPF). In the IEWPF [29], $q$ is chosen similar but not identical to the implicit particle filter [30] by choosing the new particles as

$$
\psi_i^n = \psi_i^{n,a} + \alpha_i^{1/2} P^{1/2} \xi_i, \tag{14}
$$

in which $\xi_i$ is a draw from the standard multivariate Gaussian distribution, $\xi_i \sim \mathcal{N}(0, I)$, and $\alpha_i$ is a function of both $\xi$ and $\psi_i^{n-1}$. Furthermore, we choose $\alpha_i$ such that the weights of all particles become equal to a target weight, which is equal to the lowest optimal proposal weight of all the particles. This choice is needed to ensure that we keep all particles in the ensemble, but comes with two drawbacks. Firstly, when the number of particles increases, the worst particle will be located further and further away from the observations, so the scheme forces all particles to move further away from the observations. Secondly, numerical experiments show that the spread of the particles becomes underestimated in low-dimensional systems (its behavior in high-dimensional systems is harder to assess as we do not know the true answer). Notwithstanding these negatives, the IEWPF is the first particle filter that has uniform weights in high-dimensional systems.

To alleviate these two issues, Skauvold et al. [31] extended the scheme by proposing an update equation for each particle of the form:

$$
\psi_i^n = \psi_i^{n,a} + \alpha_i^{1/2} P^{1/2} \xi_i + \beta^{1/2} P^{1/2} \nu_i, \tag{15}
$$

in which $\nu_i$ is a second random vector, $\nu_i \sim \mathcal{N}(0, I)$, and $\beta$ is a covariance scaling parameter common to all particles (a scalar, not to be confused with the model error $\beta_i^{n-1}$ in (1)). The introduction of the new term enables us to remove the underestimation of the particle spread by tuning $\beta$. Furthermore, we can choose $\alpha_i$ and $\beta$ such that the target weight is equal to the mean of the optimal proposal weights. The consequence of this choice is that the particles are not forced away from the observations when the ensemble size increases. With this, both problems are solved, and this new scheme is the basis for our numerical experiments. Details of the scheme are given in Appendix A.

3. Simplified ocean model for massively parallel architectures

Traditional ocean circulation models [2,32] are generally written to resolve as many of the physical processes in the ocean as possible, and typically consider conservation of mass, momentum, energy, and tracers (salt and temperature) in three dimensions. This makes them very computationally demanding and limits the feasible number of ensemble members. The number of members in an operational ensemble prediction system today is usually between 10 and 100. Instead of a full three-dimensional ocean circulation model, we assume that the vertical velocities are negligible compared to the horizontal movement, and let the nonlinear shallow-water equations in a rotational domain serve as a simplified model. Thus, we vastly reduce the state space of the problem. In operational settings, the simplified model may be initialized based on the most recent ocean state from a traditional ocean circulation model, and be used to forecast short-term ocean currents. Furthermore, drift of Lagrangian objects in the ocean is typically driven by the ocean currents, wind, and wave-induced forces (Stokes drift) [33], whereas in this work we only consider the contribution from the ocean currents.

The shallow-water equations are in the class of hyperbolic conservation laws, which are often solved using explicit finite-volume methods [34]. This class of problems is well-suited for efficient implementation on massively parallel hardware, such as GPUs [35–37]. By also carefully tailoring the data-assimilation algorithms to use local operations, we are able to run the most computationally demanding parts of the code on the GPU. Control flow and intrinsically serial operations, however, are still carried out on the CPU. This way, we use each processor type for the task for which it is best suited. Through this approach, we can efficiently run an ensemble of a simplified ocean model on commodity-level desktop computers, reducing the requirements for access to supercomputers.

The GPU is an extreme case of a many-core processor, with hundreds or thousands of simple cores. Measured in floating-point operations per second (FLOPS), a standard desktop GPU surpasses the performance of the top supercomputer in the world ten years ago [38], and is today roughly ten times as fast as the CPU. GPUs were initially designed for efficient graphics operations, but have become increasingly popular for general-purpose computing over the last 15 years. Due to their design for optimized throughput of data-parallel operations and low prices driven by the gaming market, they became attractive accelerators when the steadily increasing CPU clock frequency came to an end [39]. Programming languages such as CUDA and OpenCL, and easy access to highly specialized third-party libraries,¹ debuggers and profilers, have further contributed to make them accessible for a wide range of computational problems.

The programming model of the GPU is accessed through kernels, which are programs written in specialized languages for running on the GPU in a SIMD/SIMT (Single Instruction, Multiple Data/Threads) fashion. The threads are organized in blocks, which again are organized in a grid. The grid (and blocks) can be one-, two- or three-dimensional, and the ideal choice of block-size configuration, denoted by $(b_x, b_y)$, will vary for different kernels and for different GPUs. Each thread can communicate with other threads in the same block through the shared memory, which can be described as a programmable cache or scratchpad memory. Communication between threads in different blocks, however, requires costly global synchronization. The GPU does not share the main CPU memory, and all required data therefore needs to be explicitly transferred between the GPU and CPU. This operation is relatively expensive and should be minimized for optimal performance. For a more thorough introduction to GPU computing, see, e.g., Sanders and Kandrot [40].

To achieve both computational performance and code development efficiency, we treat the computationally intensive part of the code and the program flow in different ways. PyCUDA [41] is a Python package that exposes the complete CUDA run-time API and allows us to call native GPU kernels written in CUDA directly from Python. This way, one can write the program flow, as well as pre- and post-processing of the specific applications, in high-level Python, and at the same time ensure that the computationally expensive simulation loop runs as efficiently as possible through low-level CUDA C/C++. By taking advantage of widely available and popular packages – including NumPy [42] and matplotlib [43], and environments such as the Jupyter Notebook [44] – the code and experiments can be developed efficiently through rapid prototyping.

In the remainder of this section we give an overview of the model and the model errors, and show how we utilize the GPU to increase computational efficiency.

3.1. The simplified ocean model

The shallow-water equations consider three conserved variables: the elevation $\eta$ of the free ocean surface relative to its equilibrium level, and the volume transport $hu$ and $hv$ along the abscissa and ordinate, respectively. The equilibrium depth is given by $H_{eq}$ and is here assumed to be constant, so that the full height of the water column becomes $h = H_{eq} + \eta$. With gravitational acceleration $g$ and Coriolis parameter $f$, the shallow-water equations can be written

$$
\begin{aligned}
(\eta)_t + (hu)_x + (hv)_y &= 0, \\
(hu)_t + \left( hu^2 + \tfrac{1}{2} g h^2 \right)_x + (huv)_y &= f\,hv, \\
(hv)_t + (huv)_x + \left( hv^2 + \tfrac{1}{2} g h^2 \right)_y &= -f\,hu.
\end{aligned}
\tag{16}
$$

The equations represent a hyperbolic conservation law, and can be written in vector form as

$$
\psi_t + F(\psi)_x + G(\psi)_y = S_f(\psi), \tag{17}
$$

for a state vector $\psi = [\eta, hu, hv]^T$. Here, $F$ and $G$ are flux terms along the abscissa and ordinate, respectively, and $S_f$ consists of the source terms due to the Coriolis forces.

¹ BLAS, RNG, FFT, image and signal processing, collective communication primitives, graph analytics, etc.

The model operator $\mathcal{M}(\psi)$ will be the numerical scheme that solves (16) and evolves the state forward in time. We use the high-resolution central-upwind scheme proposed by Chertock et al. [45], but with a reformulation that avoids the expensive recursive formulation of Coriolis potential terms [46]. The scheme is designed to be well-balanced with respect to the geostrophic balance,

$$
hu = -\frac{g H_{eq}}{f} \frac{\partial \eta}{\partial y} \qquad \text{and} \qquad hv = \frac{g H_{eq}}{f} \frac{\partial \eta}{\partial x}, \tag{18}
$$

which permits rotating steady-state solutions by balancing the gravitational and Coriolis forces. The numerical scheme is solved on a Cartesian grid $\Omega_M$ consisting of $N_M = n_x \times n_y$ cells. The size of each cell is $\Delta x \times \Delta y$, so that the cell with index $(j, k)$, containing the value $\psi_{j,k}$, is the cell centered at

$$
(x_j, y_k) = \left( \left( j + \tfrac{1}{2} \right) \Delta x,\ \left( k + \tfrac{1}{2} \right) \Delta y \right). \tag{19}
$$

The total size of the state vector $\psi$ then becomes $N_\psi = 3 N_M$. The time integration is solved by a second-order strong-stability-preserving Runge-Kutta method, and the storage requirement for the scheme is therefore $2 N_\psi$, as the full state must be stored for two consecutive time steps.

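The step size of the numerical scheme is limited by the CFL condition,

$$
\Delta t_{scheme} \le \frac{1}{4} \min \left\{ \frac{\Delta x}{\max_{\Omega_M} \left| u \pm \sqrt{g (H_{eq} + \eta)} \right|},\ \frac{\Delta y}{\max_{\Omega_M} \left| v \pm \sqrt{g (H_{eq} + \eta)} \right|} \right\}, \tag{20}
$$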

in which the dominating term is the speed of gravitational waves, $\sqrt{g (H_{eq} + \eta)}$. Even though such waves occur in the ocean, perhaps most notably through tides, their contribution to drifter motion is limited. Eddies and other rotation-driven dynamics are much more important, but they operate on longer time scales. Nevertheless, the CFL condition in (20) must be satisfied to ensure numerical stability. To run the data-assimilation model on a relevant time scale, we decouple the model operator $\mathcal{M}$ from the time step of the numerical scheme, and let the fixed model time step $\Delta t$ consist of as many $\Delta t_{scheme}$ steps as necessary. We evaluate the condition in (20) continuously to adapt $\Delta t_{scheme}$ to the most recent model state, using a Courant number of 0.8.
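A rough sketch of how (20) can be evaluated and how one model step can be split into scheme steps is given below. It rests on two assumptions that the text does not spell out: the Courant number 0.8 is applied as a safety factor multiplying the bound in (20), and the model step is divided into equally sized sub-steps. The function names and the numerical values are hypothetical.

```python
import numpy as np

def cfl_time_step(eta, hu, hv, H_eq, dx, dy, g=9.81, courant=0.8):
    """Largest allowed scheme step from (20), scaled by the Courant number."""
    h = H_eq + eta
    u, v = hu / h, hv / h
    c = np.sqrt(g * h)                    # gravity wave speed
    max_x = np.max(np.abs(u) + c)         # max over the grid of |u +/- c|
    max_y = np.max(np.abs(v) + c)
    return courant * 0.25 * min(dx / max_x, dy / max_y)

def split_model_step(dt_model, dt_scheme):
    """Number and size of equal sub-steps that fill one model time step."""
    n_substeps = int(np.ceil(dt_model / dt_scheme))
    return n_substeps, dt_model / n_substeps

# Ocean at rest, 100 m deep, on a 1 km grid (assumed values).
eta = np.zeros((50, 50))
hu = np.zeros_like(eta)
hv = np.zeros_like(eta)
dt_scheme = cfl_time_step(eta, hu, hv, H_eq=100.0, dx=1000.0, dy=1000.0)
n_substeps, dt_sub = split_model_step(60.0, dt_scheme)
```

For this state at rest the gravity wave speed is about 31 m/s, so a 60 s model step is covered by ten scheme steps of 6 s each.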

3.2. Small-scale model errors

To account for errors in our model (e.g., missing physics), we introduce small-scale perturbations through the stochastic variable $\beta = [\delta\eta, \delta hu, \delta hv]^T$, so that $\beta$ is approximately drawn from $\mathcal{N}(0, Q)$. This model error is generated by sampling a random vector $\xi \sim \mathcal{N}(0, I)$ and applying a covariance operator,

$$
\beta = Q^{1/2} \xi. \tag{21}
$$

This error is added to the model state after each model time step $\Delta t$. We design the covariance operator based on two requirements. First, since we aim to implement all components in the data-assimilation system to run efficiently on massively parallel architectures, we design the covariance operator $Q^{1/2}$ in terms of local operations. Second, it is important that the stochastic model error does not introduce discontinuities or non-physical model states to the solution.

To make the perturbation of the ocean surface $\delta\eta$ sufficiently smooth, it is generated according to a second-order auto-regressive (SOAR) function given by

$$
\delta\eta_{j,k} = \sum_{a=1}^{n_x} \sum_{b=1}^{n_y} Q_{SOAR}^{1/2}(j, k, a, b)\, \xi_{a,b}, \tag{22}
$$

in which

$$
Q_{SOAR}^{1/2}(j, k, a, b) = q_0 \left[ 1 + \frac{\mathrm{dist}(j, k, a, b)}{L_0} \right] \exp\left( -\frac{\mathrm{dist}(j, k, a, b)}{L_0} \right). \tag{23}
$$

Here, $q_0$ is a scaling parameter for the amplitude of $\delta\eta$, $L_0$ is a measure of the correlation length scale, and $\mathrm{dist}(j, k, a, b)$ is the Euclidean distance between the centers of the cells with indices $(j, k)$ and $(a, b)$. Since the covariance between points that are far from each other relative to $L_0$ becomes negligible, the computational work can be limited to operate on local data points only, and this satisfies the first design requirement. Equation (22) can then be written as

$$
\delta\eta_{j,k} = \sum_{a=j-c_{SOAR}}^{j+c_{SOAR}}\ \sum_{b=k-c_{SOAR}}^{k+c_{SOAR}} Q_{SOAR}^{1/2}(j, k, a, b)\, \xi_{a,b}, \tag{24}
$$

in which $c_{SOAR}$ is our cut-off value, tuned so that there is no contribution to $\delta\eta_{j,k}$ from a distance larger than $c_{SOAR} \min(\Delta x, \Delta y)$ from cell $(j, k)$. Operations such as (24) are very well suited for implementation on the GPU.
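The local stencil in (23)–(24) can be sketched in NumPy as follows. The periodic boundary treatment via `np.pad(..., mode="wrap")`, the index convention (first array axis along $x$), and the parameter values are assumptions made for this illustration; a GPU kernel would instead assign one thread to each output cell.

```python
import numpy as np

def soar_weight(distance, q0, L0):
    """SOAR function (23) evaluated for a given Euclidean distance."""
    return q0 * (1.0 + distance / L0) * np.exp(-distance / L0)

def apply_soar(xi, q0, L0, dx, dy, cutoff):
    """Local sum (24) over a (2*cutoff + 1)^2 stencil around every cell."""
    nx, ny = xi.shape
    offsets = np.arange(-cutoff, cutoff + 1)
    # On a uniform grid the weights depend only on the index offsets.
    dist = np.sqrt((offsets[:, None] * dx) ** 2 + (offsets[None, :] * dy) ** 2)
    stencil = soar_weight(dist, q0, L0)
    padded = np.pad(xi, cutoff, mode="wrap")     # periodic boundaries (assumption)
    delta_eta = np.zeros_like(xi, dtype=float)
    for ia in range(len(offsets)):
        for ib in range(len(offsets)):
            # Shifted view holds xi[j + offsets[ia], k + offsets[ib]] at [j, k].
            delta_eta += stencil[ia, ib] * padded[ia:ia + nx, ib:ib + ny]
    return delta_eta

# Response to a single unit random number: the stencil itself.
xi = np.zeros((12, 12))
xi[5, 5] = 1.0
delta_eta = apply_soar(xi, q0=0.01, L0=2000.0, dx=1000.0, dy=1000.0, cutoff=2)
```

Every output value depends only on a fixed, small neighbourhood of the input, so the work per cell is constant and independent of the domain size.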

Fig. 1. Alignment of nested grids with $c = 3$. The grid $\Omega^{M}$ contains cells and is used for evolving the numerical model, whereas the grid $\Omega^{R}$ contains point values and is used for applying the SOAR function on random numbers sampled from $\mathcal{N}(0, I)$. For best possible assimilation of observations, an offset can be applied to $\Omega^{R}$ so that one of its grid points is co-located with the cell in $\Omega^{M}$ in which the observation was made.

A drawback to the expression in (24) is that the computational work and data dependency of the stencil are tightly connected to the ratio between $L_0$ and the cell size. To have better control of this workload, we introduce a coarse random number grid $\Omega^{R}$, on which the standard normal distributed random numbers $\xi$ are sampled, and apply the SOAR function there. We choose the discretization of $\Omega^{R}$ to obtain a good trade-off between the computational efficiency of (24) and a good spread of information within the correlated areas. The coarse grid has grid cells of size $(\Delta\tilde{x}, \Delta\tilde{y}) = c\,(\Delta x, \Delta y)$, where $c$ is an odd number representing the coarseness of $\Omega^{R}$. Values on $\Omega^{R}$ are interpreted as point values, and we denote the number of grid points in $\Omega^{R}$ by $N_R$. By requiring that $c$ is odd, we ensure that the point values defined on $\Omega^{R}$ are co-located with cell centers of $\Omega^{M}$, as shown in Fig. 1. Furthermore, we choose the coarsening factor $c$ so that the cut-off factor in (24) can be chosen as $c_{\mathrm{SOAR}} = 2$. After having obtained $\delta\eta$ on $\Omega^{R}$ through (24), we use bicubic interpolation, denoted by the operator $I$, to obtain cell-averaged values on $\Omega^{M}$.
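The role of the odd coarsening factor can be checked with a short index calculation. The sketch below assumes that the coarse point values sit at the centers of the coarse cells and that the two grids share the same origin with no offset; under these assumptions, coarse point `i` coincides with the center of fine cell `c*i + (c-1)/2`, which is an integer only when `c` is odd. The function names are hypothetical.

```python
import math

def colocated_fine_index(i, c):
    """Fine-grid cell index whose center coincides with coarse point i.

    Assumes coarse points at coarse cell centers and a shared grid origin.
    """
    if c % 2 == 0:
        raise ValueError("c must be odd for coarse points to hit fine cell centers")
    return c * i + (c - 1) // 2

def centers_match(i, c, dx=1.0):
    """Compare the physical coordinates of the two centers along one axis."""
    coarse_center = (i + 0.5) * c * dx
    fine_center = (colocated_fine_index(i, c) + 0.5) * dx
    return math.isclose(coarse_center, fine_center)

# With c = 3, the first coarse points align with fine cells 1, 4 and 7.
fine_indices = [colocated_fine_index(i, 3) for i in range(3)]
```

For an even `c`, the coarse center would instead fall on an interface between two fine cells, which is why the sketch rejects that case.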

To prevent the perturbation $\beta$ from producing non-physical model states (the second design requirement), we use (18) to ensure that $\beta$ is in geostrophic balance. By discretizing (18) with central differences on the $\Omega^{M}$ grid, $\delta hu$ and $\delta hv$ are found from $\delta\eta$ by

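\[
\delta hu_{j,k} = -\frac{g H_{eq}}{f} \, \frac{\delta\eta_{j,k+1} - \delta\eta_{j,k-1}}{2 \Delta y}
\quad \text{and} \quad
\delta hv_{j,k} = \frac{g H_{eq}}{f} \, \frac{\delta\eta_{j+1,k} - \delta\eta_{j-1,k}}{2 \Delta x}.
\tag{25}
\]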

This operation is denoted by $Q_{GB}^{1/2}$. It should be noted that the derivatives of $\delta\eta$ are approximated by (25), even though they are analytically available directly from the bicubic interpolation. The reason is that geostrophic balance is only maintained by the numerical scheme with respect to the grid resolution. The bicubic surface, however, is continuously defined and will typically contain oscillations on sub-grid scale, meaning that the derivatives of the bicubic surface often will not be represented by the discrete values on the grid. The central differences in (25) are therefore better suited for generating a model state that is in balance under the numerical scheme.
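The central differences in (25) can be written out as a short serial sketch. It assumes a constant equilibrium depth `H_eq`, a constant Coriolis parameter `f`, and periodic boundaries; none of these are prescribed by the text, and the function name and parameter values are hypothetical.

```python
import math

def geostrophic_momentum(delta_eta, dx, dy, g, H_eq, f):
    """Compute (delta_hu, delta_hv) from delta_eta with the central differences of (25).

    Fields are indexed as field[k][j], with j along x and k along y.
    Constant H_eq and f, and periodic boundaries, are assumptions of this sketch.
    """
    ny, nx = len(delta_eta), len(delta_eta[0])
    coeff = g * H_eq / f
    d_hu = [[0.0] * nx for _ in range(ny)]
    d_hv = [[0.0] * nx for _ in range(ny)]
    for k in range(ny):
        for j in range(nx):
            north = delta_eta[(k + 1) % ny][j]
            south = delta_eta[(k - 1) % ny][j]
            east = delta_eta[k][(j + 1) % nx]
            west = delta_eta[k][(j - 1) % nx]
            d_hu[k][j] = -coeff * (north - south) / (2.0 * dy)
            d_hv[k][j] = coeff * (east - west) / (2.0 * dx)
    return d_hu, d_hv

# Example: a surface perturbation that varies only in x gives momentum only in y.
nx, ny = 8, 4
delta_eta = [[0.1 * math.sin(2.0 * math.pi * j / nx) for j in range(nx)]
             for _ in range(ny)]
d_hu, d_hv = geostrophic_momentum(delta_eta, dx=1000.0, dy=1000.0,
                                  g=9.81, H_eq=100.0, f=1.0e-4)
```

Because the differences are taken on the discrete cell values, the resulting momentum is consistent with what the numerical scheme sees on the grid, in line with the reasoning above.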

Evaluating the complete model error now consists of four operations,

\[
\beta = Q^{1/2} \xi = Q_{GB}^{1/2} \, I \, Q_{\mathrm{SOAR}}^{1/2} \, \xi,
\tag{26}
\]

in which the first step is to sample $\xi \sim \mathcal{N}(0, I)$. Note that $Q_{GB}^{1/2}$ and $Q_{\mathrm{SOAR}}^{1/2}$ are linear operators, whereas $I$ is a nonlinear stencil. The input and output for each of the operations are

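\[
Q_{\mathrm{SOAR}}^{1/2} : \Omega^{R} \rightarrow \Omega^{R},
\qquad
I : \Omega^{R} \rightarrow \Omega^{M},
\qquad
Q_{GB}^{1/2} : \Omega^{M} \rightarrow \Omega^{3 \times M},
\tag{27}
\]

making the covariance operator act as

\[
Q^{1/2} : \Omega^{R} \rightarrow \Omega^{3 \times M}.
\tag{28}
\]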

These operations are illustrated in Fig. 2. First, the random field $\xi$ is sampled on the coarse grid $\Omega^{R}$, and the SOAR operator $Q_{\mathrm{SOAR}}^{1/2}$ is applied to generate a coarse correlated field. Then, the correlated field is interpolated onto the computational grid $\Omega^{M}$ using $I$, and $\delta hu$ and $\delta hv$ are computed to be in geostrophic balance with respect to $\delta\eta$.

It should be noted that our choice of the model error leads to a non-symmetric square root $Q^{1/2}$, and that this implementation-oriented definition of $Q^{1/2}$ makes use of significantly fewer random numbers than variables in the state vector. Using $c = 3$, illustrated in Fig. 1, as an example, we sample one random number for every nine $\eta$ variables, and none for $hu$ and $hv$, since $\delta hu$ and $\delta hv$ are computed from (25). This results in one random number for every 27 state variables. It should finally be noted that because $Q^{1/2}$ is a nonlinear operator due to the bicubic interpolation, the $\beta$'s are not strictly Gaussian distributed. This, however, is not a problem as the covariance of the $\beta$'s is still symmetric positive semi-definite.
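The count of one random number per 27 state variables follows from a short calculation. Here $N_M$ is a hypothetical symbol for the number of cells on the model grid, introduced only for this illustration, and boundary effects are ignored so that $N_R \approx N_M / c^2$:

```latex
\[
\frac{\text{random numbers}}{\text{state variables}}
= \frac{N_R}{3 N_M}
\approx \frac{N_M / c^{2}}{3 N_M}
= \frac{1}{3 c^{2}}
= \frac{1}{27}
\quad \text{for } c = 3 .
\]
```

The factor 3 in the denominator accounts for the three state variables $\eta$, $hu$ and $hv$ stored per cell.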


Fig. 2. The small scale model perturbation $\beta = [\delta\eta, \delta hu, \delta hv]^T$ is generated by first sampling random numbers from a standard normal distribution $\xi \sim \mathcal{N}(0, I)$ on the coarse grid $\Omega^{R}$. We then give the random field a covariance structure according to the SOAR function $Q_{\mathrm{SOAR}}^{1/2}$, before interpolating the coarse random field onto the fine model grid $\Omega^{M}$ through $I$ to get $\delta\eta$. Finally, we calculate the corresponding momentum $\delta hu$ and $\delta hv$ to impose geostrophic balance.

3.3. Efficient implementation of model errors

The SOAR function, bicubic interpolation, and geostrophic balance are all local stencil operations that are simple to parallelize, as each element of their output can be found independently from all other output elements. Generation of random numbers $\xi$ can further be done through the cuRAND library available through the CUDA toolkit. The sampling of $\beta$ is therefore well-suited for implementation on the GPU.

The SOAR function in (24) with $c_{\mathrm{SOAR}} = 2$ consists of a stencil operation depending on $5 \times 5$ input values centered on the target cell. We use one GPU thread per output element. To minimize the amount of data read from global memory, all threads within the same block cooperate to read the collectively required input data into shared memory.
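The benefit of the cooperative read can be estimated with simple counting. The sketch below compares the number of global memory reads per thread block when every thread fetches its own $5 \times 5$ neighbourhood with the number needed when the block loads its tile plus halo once into shared memory. The block size of $16 \times 16$ threads is a hypothetical example, and hardware caching is ignored.

```python
def global_reads_per_block(bx, by, c_soar=2):
    """Return (naive, tiled) global read counts for one bx-by-by thread block.

    naive: every thread reads its full (2*c_soar + 1)^2 stencil from global memory.
    tiled: the block reads its tile plus a halo of width c_soar once.
    """
    stencil_width = 2 * c_soar + 1
    naive = bx * by * stencil_width ** 2
    tiled = (bx + 2 * c_soar) * (by + 2 * c_soar)
    return naive, tiled

naive, tiled = global_reads_per_block(16, 16)
print(naive, tiled)  # prints "6400 400"
```

For this block size, the tiled approach reads each required input value once instead of up to 25 times, a reduction by a factor of 16 in this idealized count.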

In the bicubic interpolation $I$, each value in the fine grid $\Omega^{M}$ depends on the $4 \times 4$ points in the coarse grid $\Omega^{R}$ that surround its position. This means that the $c \times c$ output values that are located between the same four coarse grid points have overlapping data dependencies. We still apply one GPU thread per output element, and obtain geostrophically balanced $\delta hu$ and $\delta hv$ within the same kernel. Each block computes $(b_x + 2) \times (b_y + 2)$ values of $\delta\eta$ and stores them temporarily in shared memory, so that $b_x \times b_y$ values of $\delta hu$ and $\delta hv$ can be computed efficiently using (25).

The memory footprint of obtaining $\beta$ is two buffers of size $N_R$, holding $\xi$ and the result from $Q_{\mathrm{SOAR}}^{1/2} \xi$, respectively. The memory footprint of the random number generator comes in addition to this. Note that we never store $\beta$ itself, but add it directly into the state vector $\psi$.

3.4. Synthetic truth and observations

The experiments in this paper are so-called identical twin experiments, meaning that the same model equations are used to generate the synthetic true state and to evolve the ensemble. The true state $\psi_{\mathrm{true}}$ is generated from a known set of initial conditions by running the numerical scheme with stochastic model errors as described above. Furthermore, $N_D$ Lagrangian drifters (drifting buoys) are simulated to be advected passively along the ocean current according to a simple forward Euler integration scheme.
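A forward Euler step for passive drifters can be sketched as follows. The sketch assumes that the current velocity at a drifter position is taken as $(hu/h, hv/h)$ from the cell that contains the drifter, and that the domain is periodic; both are assumptions made for illustration, and the function names and values are hypothetical.

```python
def make_velocity_lookup(h, hu, hv, dx, dy):
    """Return a function giving (u, v) at a position from cell values.

    Assumes piecewise-constant velocity per cell and periodic indexing.
    Fields are indexed as field[k][j], with j along x and k along y.
    """
    ny, nx = len(h), len(h[0])

    def velocity_at(x, y):
        j = int(x // dx) % nx
        k = int(y // dy) % ny
        return hu[k][j] / h[k][j], hv[k][j] / h[k][j]

    return velocity_at

def euler_step(drifters, velocity_at, dt, lx, ly):
    """Advance all drifters one step: x_new = x + dt * u(x), wrapped periodically."""
    moved = []
    for x, y in drifters:
        u, v = velocity_at(x, y)
        moved.append(((x + dt * u) % lx, (y + dt * v) % ly))
    return moved

# Example: a uniform current of (0.5, -0.25) m/s on a 4 x 4 grid of 250 m cells.
nx, ny, dx, dy = 4, 4, 250.0, 250.0
h = [[10.0] * nx for _ in range(ny)]
hu = [[5.0] * nx for _ in range(ny)]
hv = [[-2.5] * nx for _ in range(ny)]
velocity_at = make_velocity_lookup(h, hu, hv, dx, dy)

drifters = [(10.0, 20.0)]
step1 = euler_step(drifters, velocity_at, dt=60.0, lx=nx * dx, ly=ny * dy)
step2 = euler_step(step1, velocity_at, dt=60.0, lx=nx * dx, ly=ny * dy)
```

Each drifter is updated independently of the others, so the step parallelizes trivially over the $N_D$ drifters.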