UNIVERSITYOFOSLODepartmentofInformatics VerticallyintegratedmodelsofCO migration:GPUacceleratedsimulations MartinErtsås May6,2011

(1)

UNIVERSITY OF OSLO Department of Informatics

Vertically

integrated models of CO ₂ migration:

GPU accelerated simulations

Martin Ertsås

May 6, 2011

(2)

StorageofCO

2

ⁱⁿ^geologialformations,suhasoilandgasreservoirs,isonsid- eredanimportmeanstoredueemissionsofCO

2

^into^theatmosphere. Aurate modelingoftheCO

2

^migrationîsânîmportant^tool^toânalyse^the^riskôf^leakage

inpotentialinjetionsites. Toaountforunertaintiesinthegeologialmodel

weneedto runthesimulationsseveral times, withhanges in theparameters,

for therisk analysis. Alongwith several simulationswealso needlonger time

salesthanonenormallyhaswhensimulationowinporousmedium,sinethe

CO

2

^should ^stay underground a lot longer than the time it takes to extrat fossilfuels. Beauseofthis,wedonotonlyrequireanauratemodel,butalso

afast model forsimulation. Usingthe vertial equilibrium assumption, anda

vertiallyintegratedmodel, hasbeenshown togivegood performanebenets

withrespettothefull3Dmodelsusedinthepetroleumindustrytoday,aswell

asbeinganauratemodel.

InthisthesiswewillinvestigatehowtheGPUanbeusedasanaeleratorunit

forsuhvertiallyintegratedmodels. Wewill omparetheperformanegained

from using aGPU aeleratedsolverand amulti oresolver,with respetto

the performane from a serial solver. Fromthis wewill demonstrate that the

GPUisagoodaeleratorunitforthese model.

The solverswill be demonstrated to sale well both on simple grids with no

faults, as well as on real world data with faults and geologialtraps for the

CO

2

^. ^Lastly ^we ^will ômpare ^the êrrorôbtained ôn^the ^GPU ^by ûsing ^single

preision oating point numbers instead of the double preision used on the

CPU,and showthat thiserrorisnegligible.

(3)

This master thesis has been written at SINTEF ICT as part of the MAT-

LABReservoirSimulationToolboxprojet[38℄.Itorrespondstoapproximately

twelvemonthsworkoveraperiodofoneandahalfyear,withinreasingfous

onthisthesiseveryhalf year. Theworkpresentedhereisarriedoutindividu-

ally,andbuildsupontheworkdonebyLigaardenandNilsen[34℄ onvertially

integratedmodelsforCO

2

^migration.

Toworkwithsuhaprojetoveralongertimesalehasbeenagreatexperiene,

andhasbroughtmemanylessons,bothpersonallyandprofessionally,that will

help me further in life. The thesis has resultedin long days and nights with

littlesleep,butIfeelproud oftheresultpresentedintheend.

The omputerodesdevelopedhavebeentested torun on threedierentma-

hines running dierent GNU/Linux distros (CentOS 5.5, Ubuntu 10.04 and

ArhLinux)using g,aswellasonmahinesrunning32and 64bitWindows

7ompiledwithVisual Studio2005. Everytesthavebeendoneusing aMAT-

LABinterfae,alled fromMATLABR2010a. Asaonsequenetheomputer

odesare onlyveried to work usingMATLAB R2010a,but should alsowork

onnewerMATLABreleases,andpossiblysomeolder.

Tothe reader: Thisthesishasbeensplit intoseveralhapters,whihshould

be possible to read by themselves. However it is reommended to read the

thesis sequentially, to get agrasp of theonept most eiently. I havetried

to giveathoroughexplanationasto whyweuseavertiallyintegratedmodel,

but reommend readersto alsoread Ligaardenand Nilsen [34℄ to getabetter

understanding,asthisthesisbuildsuponthat paper.

Aknowledgements: First of all, I wish to thank my supervisors at SIN-

TEFICT,Knut-AndreasLieandJosteinR.Natvig,whohasprovidedmewith

guidane and advie whenever I needed it, I also appreiate all the red pens

Knut-Andreashas saried to make mythesis what it is today. Myinternal

supervisor, MartinReimers, alsodeservesto bethanked foralwaystaking the

time to sit down with me when needed. For providing me with a well writ-

ten bakground paper, aswell as help spei to CO

2

^migration,^I ^also ^have

to thankHalvorMøllNilsenand Ingeborg Ligaarden. This thesiswould never

have been possible in its urrentstate without your help. A big thanks also

goestoSINTEFICTforprovidingmewithagreatworkingenvironment,both

withregardstohardwareandpeople, andtheUniversityofOsloforholdingfar

toomanyinterestingoursestotinto veyears,aswellasprovidingmewith

(4)

I wish to thank my friends for keeping me motivated, as well as helping me

during mywhole studiesat UiO.I willespeiallythankmybelovedMarenfor

keepingme saneduring mymoststressfulperiods, andmakingsureI alsohad

alifeonthesideofmymasterthesis.

Lastly,Iwishtothankmyfamilyforallthelove,supportandinterestyouhave

shown. A speialthanksgoestomymomanddad whoalwayshavesupported

meinmydeisionsandhelpedmeoutwhenneeded. Iwouldnothavebeenthe

personIamtodayhaditnotbeenforyou.

(5)

Prefae 1

1 Introdution 5

1.1 Introdution. . . 5

1.2 Preisionandauray . . . 7

1.2.1 Unitofleastpreision . . . 7

1.3 Researhquestions . . . 8

1.4 Notation. . . 9

2 Bakground 10 2.1 CO

2

^Capture^and^Storage ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ^. ¹⁰

2.1.1 Trappingmehanisms . . . 12

2.2 ReservoirBasis . . . 12

2.2.1 Parameters . . . 13

2.2.2 Massonservation . . . 13

2.3 TheVertialEquilibriumassumption . . . 15

2.4 Parallelomputation . . . 16

2.5 General-PurposeomputationonGraphisProessingUnits . . . 18

2.6 CUDA . . . 20

2.6.1 Programmingmodel . . . 20

2.6.2 Threadhierarhy . . . 21

2.6.3 Performaneoptimization . . . 22

3 Mathematialmodelsand numerialsimulations 25 3.1 TheVertialEquilibriummodel . . . 25

(6)

3.2 TheFiniteVolumeMethod . . . 28

3.2.1 TheUpstreamMobilityWeightingSheme . . . 30

3.3 TheCFLondition . . . 30

3.4 Implementation . . . 31

3.4.1 Algorithm . . . 31

3.4.2 GPUoptimizations. . . 34

4 Resultsand Disussion 36 4.1 Hardwareandtesting . . . 36

4.1.1 Test setup. . . 36

4.1.2 Note . . . 37

4.2 Idealizedgrids . . . 39

4.2.1 Horizontalresolution . . . 39

4.3 Vertialresolution . . . 40

4.4 Synthetitestsuite. . . 40

4.5 Auray . . . 43

4.6 Migrationpressure . . . 45

4.6.1 Auray . . . 45

4.6.2 Performane . . . 49

4.7 Hardwareutilization . . . 49

5 Conlusion 51 5.1 SummaryofResults . . . 51

5.1.1 Performanebenets . . . 52

5.1.2 Auray . . . 52

5.2 Furtherwork . . . 52

5.2.1 MultipleGPUs . . . 53

5.2.2 Higher-ordersolvers . . . 53

5.2.3 Adaptivity . . . 53

(7)

Introdution

In lateryears, publiawarenessonlimate hanges hasinreased, and several

tehnologieshavebeenproposed to help stopthe,possiblyhumanintrodued,

limatehanges. OneofthesetehnologiesisCO

2

^Capture^and^Storage,^whih

should help redue CO

2

^emissions ^to ^the atmosphere. For seure storage of CO

2

ît îsîmportant ^with^thorough ^riskânalysis, ^to ^minimize^the^risk ôf ^leak-

agefrom thestoragesite, whih meansthat simulationsare important forthe

implementationofCO

2

^storage.

Inthisthesis, westudytheuseofaGPUasan aeleratorunit forsimulating

CO

2

^migrationⁱⁿâ^porous^medium. ^We^willômpare^GPUâelerated^simula-

tionsofavertiallyintegratedmodeltosingle-andmulti-oreCPUsimulations.

1.1 Introdution

Today, weare releasing inreasing amounts of CO

2

^into ^the atmosphere, and while the CO

2

onentration has risen signiantly sine the Industrial Rev- olution, the temperature on earth has also risen slightly. There is reason to

believethatthesetwophenomenaarelinked,andin1992theinternationalon-

ern about limate hange led to the United Nations Framework Convention

onClimateChange. Theultimateobjetiveofthisonventionisthestabiliza-

tionofgreenhousegasonentrationintheatmosphereatalevelthatprevents

dangerousanthropogeniinterferenewiththelimate system [47℄.

One of the tehniquesonsidered to help redue emissions to the atmosphere

is CO

2

^Capture ând^Storage ^(CCS).^This^tehniqueônsistsôf âpturing ^CO

2

duringtheombustionandextrationproesses,transportingtheapturedCO

2

toaninjetionsite,andinjetingitintothesubsurfae. TheUNhaveonluded

that CCS is an important step to redue CO

2

êmissions, âs^well âs^being ân

eonomiallyfeasiblesolution[47℄.Theyalso pointoutthat there areseurity

risks onehave to take into onsideration, likeleakagesfrom the injetion site

and raks made from the inrease in pressure. This alls for thorough risk

analysis,whereomputersimulationsplayahugerole. Thefousofthisthesis

(8)

will beon CO

2

^storage, ^and ^more^speially ^simulating ^the^migration ^of ^the

injetedCO

2

^.

CO

2

^storage^means, ^for^our^problem, ^injeting^CO

2

^into ^a^geologial^rok^for-

mation,likeoilandgasreservoirsorsalineaquifers. Thetehnologyrequiredis

thesameasusedwheninjetingwaterorgasforenhanedoilandgasreovery,

whih means that the problem is alreadywellunderstood and the tehnology

present[4℄. CO

2

^storage^has âlso ^been ^deployed ⁱⁿ ^large-saleômmerial âp-

pliations,aswellassmallerresearh projets. Todate, thereexists fourlarge

failities for CO

2

^storage: Weyburn-Midale, Sleipner, Snøhvit, and In Salah,

whih ombinedinjet

∼

^6.4 ^million ^tonnes ^of ^CO

2

^every ^year ^[8, ^13, ^46, ^49℄.

Anotherformation onsideredfor CO

2

^storage ^is^the^Johansen ^formation^out-

side of Norway, whih an sustain a yearly injetion of

∼

³ ^million ^tonnes ^of

CO

2

^[2℄.^Clearly,^the^amount^of^CO

2

^vented^into^the^atmosphere^will^be^greatly

reduedbyimplementingCO

2

^storageⁱⁿ ^large^sales. Ôneêxampleôf^theîm-

patobtainedbyCCSisanonshoreeldoutsideofLongyearbyenonSvalbard.

ThiseldisakeypartinthegoaltomakeLongyearbyentheworld'srstCO

2

neutral ity by 2025, where CO

2

^will ^be ^aptured ^from ^the ^oal-fueled ^power

plantandinjetedintothisformation[48,50℄.

AlternativestoinjetingCO

2

înto^rok^formationsâs^desribed,îs^toînjet^CO

2

intooalseamsandintotheoeanatdepthsgreaterthan1,000m. Bothofthese

tehniquesarestillinaresearhphase,andwillnotbedisussedfurtherinthis

thesis[4,9℄.

Reservoir simulations have been apart of oil-reservoirmanagement for more

than fty years[1℄, andthese kind ofsimulationsanbe usedto simulate mi-

gration of CO

2

ⁱⁿ ^oil ^and ^gas reservoirs. The biggest dierene between the reservoirsimulationsdonein oil andgas reovery, andsimulating CO

2

^migra-

tion,isthedierenttimesales. WhensimulatingCO

2

^migrationîtîsîmportant

tomakesurethereisnoleakages,andonehavetosimulateuntilalmostallthe

CO

2

^is ^trapped, ^whih ^might ^take ^several ^thousand ^years. ^When ^simulating

foroilandgasreoveryontheotherhand,oneseldomsimulatesformorethan

one entury, This means we need both fast and reliable simulations for CO

2

migrationproblems.

Modelsbasedonthevertialequilibrium(VE)assumptionhavelongtraditions

fordesribingowin porousmedia, andwasearlyonextended tohandletwo-

phaseandthree-phaseow[16,36,37℄.Asomputationalpowerhasinreased,

modelsbasedontheVE assumptionhavebeenlessused, andonehaveinstead

used the full 3D model. Reently, there have been a renewed interest in VE

based models for fast simulation of CO

2

^migration, ^when ân âssumption ôf â

sharpinterfaebetweenCO

2

^and^brine ^may^be^reasonable^[14, ^22,^23,^34,^35℄.

The usageof graphisproessing units (GPUs) asaeleratorunits for hyper-

bolipartialdierentialequations,forwhihexpliitshemesarefeasible,have

provento givegood performane benetsoveraCPU onlyimplementation[6,

10,12℄.TherehasalsobeendonesomeworkonGPUaeleratingimpliittrans-

port,whihhaveproventogivegoodperformanebenetsbothwithrespetto

omputingthepreonditionerandsolvingthearisinglinearsystem[5,31℄,but

not asgood as for the expliitase. It is reason to believe that the equation

that arisesin ourproblem, willbewell suitedforexpliittransport, aswellas

(9)

Furtherinthisthesis,wewillusethevertialequilibriumassumptiontomakea

fastsimulationmodel. Thearisingmodelwillbedisretizedandimplementedon

theCPUforbothserialandparallelexeution,togetherwithaGPUaelerated

implementation. The main goalis to see howsuited this problem is for GPU

aeleration,andhowtheusageofsingle-preisionoating-pointnumbersaet

theresults. ThethesisisbasedonapaperbyLigaardenandNilsen[34℄,where

theyexploredthenumerialaspetsofusingthevertialequilibriumassumption

for simulatingCO

2

sequestration. Theybelievethat makingGPU aelerated solversforthisproblemwillgivethepossibilityoflarge-saledesktopsimulations

ofVE modelsonoarsegrids.

1.2 Preision and auray

Thewordspreision and auray willbekeywordsinparts ofthis thesis,we

willthereforestatetheirdenitionhere. Inthedomainofomputing,Websters

ditionarydenes:

Auray: Howlose tothe real value ameasurement is.

Preision: Thenumberof deimalplaes towhih anumberisomputed.

Inthis thesiswewill taitlyassumethat thedoublepreisionvalueisthe real

answer,andusethewordauraytodesribehowlosetothedoublepreision

answeroursinglepreisionsolutionis. Thexedpreisiononaomputermeans

that we only have a nite set of numbers to represent the innite set of real

numbers. Further thismeansthat the lowerthepreisionis, themorelimited

the nite set is, and so the set made up of the singlepreision oating point

numbersisalot smallerthanthe setmade upofthedouble preisionoating

pointnumbers. Clearlytheaurayishighlydependentonthepreision,and

we do not expet our single preision answer to be the same as the double

preisionanswer.

1.2.1 Unit of least preision

Unitofleastpreision(ULP),isameasureofpreisioninnumerialalulations.

Itsdenition wasoriginallydened as:

ULP [29℄:ULP(x)isthegapbetween thetwooating-pointnumbersnearestx,

even ifxisone ofthem.

Bytheintrodution oftheIEEE 754oating-pointstandard,the denition of

ULPhadtohange tohandleNaNandInf. Thenewdenition was:

ULP [29℄: ULP(x) is the gap between the two nite oating-point numbers

nearestx,evenif xisone ofthem. (ButULP(NaN)isNaN.)

TheIEEE754oating-pointstandardrequiresthattheresultsofanelementary

arithmetioperation(addition,subtration,multipliation,division,andsquare

roots)shouldbewithing0.5ULPofthemathematiallyexatanswer,meaning

(10)

Eahtimewedoasinglepreisionarithmetioperation,wewillgainaround-o

error with respet to the double preision,due to the ULP. The errorwill be

small for eah operation, but as the number of operations inrease this error

will be aumulated, and thereby grow. This means we mightend upwith a

largeerror,eventhoughtheerrorintroduedforeahoperationissmall.

1.3 Researh questions

Thegoalofthisthesisis togiveompleteandpreise answersto thefollowing

questions. Thequestionswillbesummedupin Chapter5.

i. Howgoodanasimulationusingthevertial equilibriumassumptionutilize

the GPUhardware?

ii. Howdoesthe performane sale onthe GPU withrespettothe horizontal

andvertialresolution?

iii. How does the performane sale on the GPU with respet to the inlusion

of faultsaswell asvaryinggeometry inthe reservoir?

iv. Aretheresultsobtainedusingsinglepreision ontheGPUaurateenough

for pratial purposes?

The rst question will be answeredby looking at the oupany of the GPU

during exeution. Theseond andthird questionwill beansweredbyrunning

severaltest-asesusingbothidealizedgridsandgridswithrealistifeatures,and

ompared to botha serial and amulti-ore CPU solution. The last question

will beansweredbyomparing thesingle-preisionGPU solutionto adouble-

preisionmulti-oresolution,usingagridoftheJohansenformationoutsideof

Norway.

Thethesisispartitionedintovehapters:

Chapter 1: Ashort introdutiontothethesis.

Chapter 2: Gives the bakground needed for the work done in this thesis.

Thisinludes moreonCCS,reservoirsimulation,thevertialequilibrium

assumption,andGPUprogramming.

Chapter 3: Showshowthestandardtransportandpressureequationsanbe

vertiallyintegrated,andusesthenitevolumemethodtogetanexpliit

sheme. Also disusses alternative algorithms, as well as GPU spei

problems.

Chapter 4: ReportsanddIsussesthetestresultsfromournumerialtests.

Chapter 5: Conludesthendingsfromourtests,anddisussesfurtherwork

onthissubjet.

(11)

Thenotationused throughoutthethesis, willmostlybestandardnotationfor

PDEs. Topreventonfusionhowever,wewillstillestablishitformally.

Our domain will be denoted

Ω

^, ^and ^the ^size ^of ^the ^domain ^will ^be ^denoted

|Ω|

^. ^Fûrther,â^disreteêllⁱⁿ â^2D^domain ^will^be^denoted

C i,j

^,^where

C i,j = [x i−1/2 , x i+1/2 ]×[y j−1/2 , y j+1/2 ]

^will^be^the

i

^th^ellⁱⁿ^the

x

^-diretion^and^the

j

^th

ell in the

y

^-diretion. ^We ^also^have ^the^disrete ^sizes

∆x i = x i+1/2 − x i−1/2

and

∆y j = y j+1/2 − y j−1/2

^, ^whih îs ^the ^size ôf â êllⁱⁿ

x

^and

y

^diretions,

respetively, as well as the size of a ell

|C i,j | = ∆x i ∆y j

^. ^In ^our ^disrete

problem,weareonlyinterestedintwodimensions,hereisthereforenoneedto

establishanequivalentnotationfora3Dspae.

Along with the spae domain, the time domain is also disretized. We let

t n = P n−1

l=0 ∆t l

^,^where

∆t l = t l+1 − t l

^.

Inthisthesis,

x

^and

y

^will^mean^the^normal

x

^and

y

^diretionsⁱⁿ^a^2D^oordinate

systemwhile

x

^,^that^is^a^boldfae^x,^will^mean^the^diretional^vetor

(x, y)

^. ^This

willalsobeusedfurtherin integration,where:

Z

C i,j

f dx =

Z x i+1/2

x i − 1/2

Z y j+1/2

y j − 1/2

f dx dy

Wealsohavetoestablishsomenotationforderivatives. In3Dweusethenormal

notation

∂f

∂z i

as thederivativeof afuntion

f

ⁱⁿ ^diretion

z i

^, ^where

z i

^an^be

eithertime

t

ôrâ^spatial ôordinate

x

^,

y

^,

z

^. ^Further^we^have

∇f = ( ^∂f _∂x , ^∂f _∂y , ^∂f _∂z )

^.

The divergene willbe denoted

∇ · f = ^∂f _∂x + ^∂f _∂y + ^∂f _∂z

^. ^We ^will ^use^a ^vertial

equilibriumassumptiononour3Dmodeltogeta2Dmodel,duringthisproess

wewillintroduethederivatives

∇ || f

^and

∇ || · f

^. ^Thesederivativesisthesame asfor3D,butintwodimensions,andthediretionsarenowparalleltoasurfae,

in thisasethetopsurfaeofthereservoir.

(12)

Bakground

Here wewill givetheneessarybakgroundfor therest ofthe thesis. We will

start bya short introdutionto CCS, before givingabakgroundin reservoir

simulation,thevertialequilibriumassumptionandGeneral-Purposeomputa-

tiononGraphisProessingUnits(GPGPU).

2.1 CO

2

^Capture ^and ^Storage

CO

2

^Captureând^Storageônsistsôf^three^parts,âptureôf^CO

2

^,transportation ofCO

2

^,ândînjetionôf ^CO

2

^.

TheaptureofCO

2

^is^likely^to^be^applied^to^large^point^soures^like^fossil^fuel

plants,fuelproessingplants,andlargeindustrialplants. CapturingCO

2

^from

smallandmobilesoures,suhasintransportationandthebuildingindustry,is

likelytobemorediulttoimplement,aswellastooexpensivewithregardsto

theontributionthiswillgivetotheenvironment. Therearefourbasisystems

forapturingCO

2

^fromûsesôf^fossil^fuelsând^biomass: ^Capture^fromîndustrial

proessstreams,post-ombustionapture,pre-ombustionaptureandoxy-fuel

apture[3℄.

CO

2

^an^betransportedineithergas,liquid,orsolidstate. Themostommon transportationalternativeis touse pipelines orshipsfor transportof gas and

liquidCO

2

^. ^In^atmospheri^pressure,^CO

2

^takes^up^a^large^volume,^whih^means

thatlargefailitiesareneeded. Toreduetransportosts,oneaninreasethe

pressure,whih meansthat moreCO

2

^an^betransportedin thesamevolume.

TofurtherdereasethevolumeneededoneanliquefytheCO

2

^,^whih^is^needed

if shipsare used fortransportation. Liquefationis an establishedtehnology

for naturaland petroleum gases, and the existing tehnology an be used to

liquefyCO

2

^. SolidiationofCO

2

îsûrrently^tooêxpensive^to^be^feasible^[17℄.

The dierenes in transportation ost using ships and pipelines an be seen

in Figure 2.1. One an learly see that using pipelines is the most eonomi-

al solutionif thedistane is lessthan

∼

^1,000 ^km, ^whih ^means ^that ^pipeline

transportationisthemosteonomialfortheCO

2

^storage^failitiesⁱⁿ ^use.

(13)

Figure2.1: CostanalysisofCO

2

^transport. Îmage^from ^Colemanêtâl.^[17℄.

Figure 2.2: Illustration of methods forstoring CO

2

ⁱⁿ underground geologial formations. ImagefromAndersonetal.[4℄.

(14)

CO

2

^storageân^be^doneⁱⁿêitherâ^geologial^formation,^likeôilând^gasêlds,

salineaquifersandoalseams,ordiretlyintheoean. WheninjetingCO

2

^into

the oean, weneed to injet it in depths greater than800 meter. Over time,

most of the injeted CO

2

^will ^dissolve ând ^beome ^part ôf ^the ^global ârbon

yle. Bothoeanstorageandinjetionintooalseamsis,todate,inaresearh

phaseandhasyetto bedemonstratedatapilotsale[4,9℄.

Figure 2.2 illustratesbothoshoreand onshoreCO

2

^storage. ^This^also ^shows

howCO

2

^storageân^beûsedⁱⁿênhanedôilând^gas^reovery^. În^this^thesis^we

willonlyfousongeologialstoragein oilandgasreservoirsbelow800meters.

2.1.1 Trapping mehanisms

OnetheCO

2

^is^injetedunderground,theCO

2

^must^be^trapped^to^be^seurely

stored. TheCO

2

ân^be^subjet^to^strutural,^residual,^solubilityâs^wellâs^min-

eraltrapping[15℄.Wewillpresentallfourmehanismshere,butonlystrutural

andresidualtrappingwillbeaountedforin oursimulations.

Struturaltrappingisthemostommontrappingmehanisms,andmeansthat

theCO

2

^perolates^upwardⁱⁿ^the^reservoir,^sine^the^CO

2

^is^more^buoyant^than

theotherliquids. UltimatelytheCO

2

^will^reah^the^top^of^the^reservoir^where

it meets an impermeable ap-rok,and is trapped. The CO

2

ânâlso ^meetâ

fault,whihtheCO

2

ân^notôw^through,^whih^might^trap^someôf^the^CO

2

^.

AstheCO

2

ôws^through^the^porous^rok,ît^displaes^theûidâlready^present.

While theCO

2

ôntinues ^to ôw, ôther ûids^will ^replaeît, ^but ^someôf ^the

CO

2

^will^stay^behind^as^residual^dropletsⁱⁿ ^the^pore ^spae,^thereby^the^name

residualtrapping. Thisanbeomparedto howwateristrappedinasponge.

WhenCO

2

îsînjetedînto^brine,^someôf^the^CO

2

^will^be^dissolvedⁱⁿ^the^brine.

ThiswillmakethedissolvedCO

2

^slightly^heavier^than^the^brine,^and^it^will^be

trappedbysinkingtothebottom. Thisisknownassolubilitytrapping.

Mineral trappingis aslowproess that goesonfor thousandsofyears. When

CO

2

îs^dissolvedⁱⁿ^water,â^weakârbonâidîs^formed. Ôver^time^thisâidân

reat with theminerals in therok and form solidarbonate minerals, whih

eetivelytrapsCO

2

^by^binding^it^to^the^rok.

2.2 Reservoir Basis

Reservoirsimulation haisbeen performed for nearly half aentury to aid the

petroleum industry in deiding where oneshould plae, and how one should

operate, injetion and extration wells to optimize extration of fossil fuels.

There areseveral hallenges inreservoirsimulation,inludingupsaling ofthe

rok parameters, transport models, and numerial models [1℄. In this setion

wewillfousonthebasitransportandpressuremodel, whilesomeupsaling

usingthevertialequilibriumassumption willbehandledlater.

(15)

This partisasummarizedversionofanintrodutiongiven by Aarnes,Gimse,

andLie[1℄.

Porosityisthevoidvolumefrationofthemedium. Theporosityisdenotedby

φ

^, ^and^isⁱⁿ ^the^range

0 ≤ φ < 1

^. ^Sine^the^porosity^is ^dependent^on^the^pres-

sure, therokis atuallyompressible. Thisompressibilityis oftennegleted

however,andoneassumesthat

φ

^only^depends^on^the^spatialoordinates. For aNorthSeareservoirthepermeabilityistypiallyintherange

0.1 − 0.3

^.

Thepermeabilityisameasureoftherok'sabilitytotransmitasingleuidat

ertainonditions. Theporeorientationandinteronnetionbetweenporesare

essentialfortheow,andthepermeabilityisstronglyorrelatedtotheporosity,

but notneessarily proportional to it. Thepermeability is denoted

K

^and ^is

generally a tensor, however we are often able to diagonalize

K

^by ^a ^hange

ofbasis. TheSI-unitforpermeabilityism

2

,but itisommonlyrepresentedin

Dary(D)ormillidary(mD).OneDaryorrespondstoabout

9.869×10 ⁻¹³

^m

²

^.

The void pores in the medium are lled with dierent phases, likeoil, water,

CO

2

ând^brine. ^The^volume^frationôupied^byâ^phaseîsâlled^the^saturation

of that phase,denoted

s i

^for ^phase

i

^. ^Together^the ^phases^will ^oupy ^all^the

voidvolume,andthesumofsaturationsinaporewillthereforealwayssumto

1

^.

Eah phase haveadensity (

ρ

⁾ ^and ^a^visosity⁽

µ

^), ^whih ^are ^funtions ^of ^the

phasepressureandtheompositionofeahphase. Thedensityandvisosityare

ompressible,buttheompressibilityismostimportantforgasphasesandare

thereforeoftenignoredin uidphases. Dueto interfaialtension,wealsohave

to dene a apillary pressure between phases, being the dierene in density

betweenthetwophases:

p cij = p i − p j

^.

Assumingthatallphasesmaybepresentatthesameloation,itturnsoutthat

theowofonephasedependsontheenvironmentinthatloation. Thismeans

thatthepermeabilityexperienedbyonephasedependsonthesaturationofthe

otherphasesat thatloation,aswellassthe phases'interation withthepore

walls. Wethereforehavetointroduearelativepermeability

k i

^,^whih^desribes

how phase

i

ôws ⁱⁿ ^the ^presene ôf ^the ôthers. ^The êetive permeability experienedbyphase

i

ⁱⁿ^the^presseneôfôther^phasesîs^thereby^given^by

k i K

^.

2.2.2 Mass onservation

Theinompressibletransportequationdesribeshowthesaturationofaphase

hanges overtime. Oneimportantpartof this transportequation isthat it is

mass onservative, whih means that the total volume of aphase should not

hangeexeptforthevolumeinjetedorextratedofthatphase[1℄.Onesimple

modelforthetransportofaninompressiblephase

i

îs^theôntinuityêquation:

∂ρ i φs i

∂t + ∇ · (ρ i v i ) = q i

^(2.1)

(16)

ImagefromDary[19℄.

Here wehaveintrodued asoureterm,

q i

^, ^whih^is ^the ^volume ^rate^of ^phase

i

^. ^If ^we^dene

v = P n

i=1 v _i

^to ^be^the^total ûxând âssumeâninompressible ow,meaning theuid densityis onstant,wean sum Equation(2.1)for all

phases,whih givesus:

∇ · v = X q i

ρ i

(2.2)

The veloities will be modeled using an empirialrelation alled Dary's law,

aftertheFrenhengineerHenryDary.

Dary's law

WhereasFourier'slawdesribesheatondutivityandFik'slawdesribesdif-

fusion,Dary'slawdesribesowinaporousmedium. Dary'slawwasformu-

lated by the Frenh engineer Henry Dary in 1856 [19℄.While experimenting

with waterleaning througha sandlter, Dary foundout that theltration

veloityis proportional to aombinationof the gradient of theuid pressure

andpull-downeetsdue tothegravity,meaningtheveloityis relatedtothe

pressureandgravityforesthroughtherelation:

v _i = − k i

µ i

K(∇p i − ρ i g)

^(2.3)

Where

p i

^is^the^phase^pressure^and

g

^is^the^gravity^vetor.In^Figure ^2.3^we^see

askethof the apparatusused byDary. He lled the largertube with sand,

(17)

twosmallertubesto therightandusedthistoformulatehis law.

SimilartoFourier'slaw,whihstatesthatheatowsfromwarmtooldplaes,

andFik'slaw,whihstatesthattheuxgoesfromregionsofhighonentration

tolowonentration,Dary'slawstatesthattheveloitygoesfromregionswith

highpressuretoregionswithlowpressure.

Dary's lawwill be used to onvert (2.1)into an equation only dependent on

thetotalveloity,aswellasbeingusedtoalulatethepressuredrivenveloity.

Assumingatwo-phase,inompressibleowwithequalpressure,whilenegleting

gravity,anddening

λ i = _µ ^k ⁱ

i

,and

λ t = λ n + λ w

^one^an^write^the^total^veloity

as:

v = −(λ n + λ w )K∇p

^(2.4)

andusethistoalulate thephaseveloitygiventhetotalveloityby:

v i = −λ i K∇p

= − λ i

λ t

Kλ t ∇p

= λ i

λ t

v

Puttingthisinto (2.1)givesusthenewequation:

∂ρ i φs i

∂t + ∇ · (ρ i

λ i

λ t

v) = q i

^(2.5)

Thiswillbedoneinmoredetail,withoutnegletingthegravity,in hapter 3.

2.3 The Vertial Equilibrium assumption

Fortheriskassessmentneededin CO

2

^storage,^fastsimulationsarevital. The vertialequilibriumassumption,inhydrologyknownastheDupuitassumption

[20℄,is an assumptionof equilibrium of ow in thevertialdiretion, andan

be used to simplify the problem and aelerate the simulation. Theassump-

tionhaslongtraditionsin reservoirengineering, andwasearly onextendedto

handle multi-phase ow [16, 36, 37℄. Equilibrium of vertial ow means that

theowupwardin thereservoirisequalto the owdownward. Oneommon

misoneption is that this equalsto no ow in the vertial diretion,but the

assumptionisatuallyequivalenttoanassumptionofinnitevertialow[16℄.

Usingthis assumption,weansolveforthedepth-averagedlateralow,trans-

formingthe3Dmodelintoa2Dmodel. Thisresultsintwobenetswithrespet

to performane: Lessdata to ompute anda loosertime step restrition[34℄.

Thederease in datato omputeonisthemostobvious,aswegofrom 3Dto

(18)

butomesfromthefatthatthetimestepoftenisrestritedbytheoarsening

ofthevertialdiretion. Thebenetsofusingthevertialequilibriumassump-

tion with respet to the timestep restritionhas beenexplored by Ligaarden

andNilsen [34℄.

Another aspet of the risk assessmentis to use reliable simulations, so we do

not introdue more errorsthan neessary. At rst glane, the removalof one

dimensioninthevertialequilibriumassumptionmayappeartointroduelarge

errorswithrespettothe3Dmodel,buttheerrorintroduedisoftenlessthan

the errors introdued by the oarse resolution needed to make the 3D model

omputationallytratable[34℄.Thismeansthat, inmanyases,amodelbased

on the vertial equilibrium assumption is both a fast and reliable simulation

tool,giventhevalidityoftheassumption.

As atypial aquifertargeted forCO

2

^injetion ^is^several ^kilometers ^wide,^but

only20-100metershigh, itseemsnaturalto assumeequilibriumin thevertial

diretion. Ithasbeenshownthatthevertialequilibriumassumptionisvalidif

thequantity

ω = K x /K z (∇h) ² ≪ 1

^[34℄,^whih ^will^be^valid^for^many^aquifers

exeptforasmall areaaroundtheinjetionwell.

IthasalreadybeenshownbyLigaardenandNilsen[34℄,thatonewillgaingood

performanebenets, with respetto the full 3Dmodel, byusing the vertial

equilibrium assumption on amodel forCO

2

^migration. ^F^urther ⁱⁿ ^this ^thesis

wewilltakeadvantageoftheparallelarhitetureoftheomputertoaelerate

thesimulationsfurther,bothusingthemulti-orearhitetureoftheCPU,and

theextremeparallelismof theGPU.

2.4 Parallel omputation

Theideaofparallelomputationisto splitthetaskathandintosmallertasks,

whihanbeperformedinparallelonseparateores,therebytakingadvantage

of all the ores of the mahine. Highly parallel tasks, suh as Monte-Carlo

simulations and expliit stenil omputation, an often perform many times

betteronhighly parallelarhitetures.

Overthelastvedeades,omputationalperformanehaveinreasedexponen-

tially, by an almost ontinuous inrease in lok frequeny, losely following

Moore'slaw[40℄.Afewyearsago,thistrendeasedduetophysialrestritions

ofwhatsilionhipsanwithstandwithregardstoheat,andnowperformane

inreasesthroughmultipleoresatlowerfrequeniesonthesamehip. Another

limitationenounteredwasthevonNeumannbottlenek,whihmeansthatthe

inreaseinbandwidthbetweentheCPUandthemainmemoryhavetobepro-

portionalto theperformane inrease,whihhasnothappened[7℄. Byadding

moreoresonthesamehip,oneangainbetter performane usingthesame

amountofenergy,ormoreperformaneonthesameenergybudget[45℄.Along

with moretheoretialperformane,youalsogainfastommuniationbetween

ores,asalloresresideonthesamehip,andtherebyshareon-hipmemory.

Even though standard CPUs are beoming inreasingly parallel, most of the

(19)

tetures. One ore on the CPU is marked, and the transistors dediated to

omputations are marked in the "Exeution Units" setion of the ore. The

green partof the Nvidia hip are the CUDA ores, whih eah have one FP

Unit and oneINT Unit whih are dediated for oat-point and integerarith-

metirespetively. Imagesfrom Intel[25℄ andNvidia[44℄.

(20)

CPUsandGPUs. ImagefromNvidia[44℄.

transistorson aCPU arestill used for non-omputationaltasks likelogi and

ahe. InFigure 2.4aCPUoreisompared toaGPUhip. Weseethat the

CPU ore has alot less transistors dediated to pure omputations than the

GPU.Usingthemulti-oreproessortogetherwithoneormorehighly parallel

aeleratorunits,liketheGPU,givesusaheterogeneousarhiteturewhihhas

proventobebeneialomparedto ahomogeneousarhiteturewhereaCPU

didalltheomputationalwork. Theaeleratorunitstypiallyusemanyores

runningatalowerfrequenythantheCPU,butmaximiseperformanegivena

xedpowerortransistorbudget[11℄.

2.5 General-Purpose omputation on Graphis

Proessing Units

TheuseoftheGPUasanaeleratorunitforproblems notdiretly relatedto

rendering,isalledGeneral-PurposeomputationonGraphisProessingUnits

(GPGPU). These kindof problemsinlude,butare notlimitedto: stenilal-

ulations,linearalgebra,audioproessing,andvideoproessing. Thereasonfor

usingtheGPUasanaeleratorunit,istheextremeparallelarhiteturewhih

is also needed for the main purpose of a GPU, 3D rendering. By transform-

ingthealgorithmsusedintoparallelalgorithms,oneantakeadvantageofthe

GPUtoaeleratetheproblem. Otherhyperboliproblems,suhastheshallow

waterequation, andsingle-phaseowofoilinaporousmedia, havepreviously

beenshowntobesuitableforGPUaeleration[10,12℄,so weexpetthatour

problemwill alsobesuitableforGPUaeleration.

Thetheoretial performane ofGPUs andCPUs areshownin Figure 2.5. We

learly see that the GPU will outperforma CPU if wean takeadvantageof

all thepoweronaGPU. Oneother importantobservationis that theGPU is

notaetedbythevonNeumannbottlenek[7℄,thisisbeauseoftheseparate

memory hanneleah multi-proessorhaveto themain memory,and that the

main memory is on the same hip as the proessors. The GPU is also an

inexpensiveaelerator unit,ifwe omparetheNvidia GeforeGTX480with

thefastestCPUontheWestmerearhiteture,IntelCorei7Extreme990X,the

(21)

respetively 1

.

Sine the GPU was originally made for 3D rendering, for whih there is no

needforhighpreision,thesedevieshaveuntilreentlyonlysupportedsingle-

preisionoating-pointnumbers. Thisisthereasonwhytherewasno support

forhigher preisionupuntil NvidiareleaseditsGT200 arhiteture June2008

[42℄. The main drawbak with this arhiteture was its double-preision per-

formane, whih was

1/4

^of ^the single-preision performane. Inmarh 2010, NvidiareleaseditsGF100arhiteture,bestknownasFermi, whih featureda

double-preisionperformaneof

1/2

^of ^thesingle-preision performane, along withafullyIEEE754-2008ompliantoating-pointimplementation[42℄.Look-

ing at Figure 2.5, we an see this sudden inreasein double-preision perfor-

manewhihame withthelatestrelease.

TheGPUarhitetureisdierentfromvendortovendor,andherewewillonly

explain the arhiteture of Nvidia GPUs, for the newest Nvidia arhiteture

youan useFigure 2.4 for referene. The newest Nvidia arhiteture onsists

of everything from 1 to 16 multiproessors, whih eah onsists of 32 ores.

Previousarhiteturesonsistedofupto30multiproessors,whiheahonsists

of8ores. TheseoresarethemainomputationalunitsontheGPU,onsisting

of oneoating point unit and oneinteger unit. The warpshedulers are the

mainontrolunits ofthemultiproessor,andtellseahorewhihinstrutions

to perform, as well as swith threads on the ores. Eah warp sheduler an

manage 16K registers, giving a theoretial 1024 threads for eah warp, but

the number of threads managed by one warp sheduler will derease as the

number of registers used inreases. Theoretially, the Fermi arhiteture an

holdupto32,768threads,whiletheGT200arhitetureanhold upto30,720

threads. The GT200 arhiteture also ontain one double preision unit on

eahmultiproessor,whiletheFermiarhitetureatuallyusestwooresasone

doublepreisionunit.

There are twowarpshedulers for eah multiproessor on Fermi, and one on

previous arhitetures. The newest arhiteture also onsists of four speial

funtion units(SFU)formathematialoperations,liketrigonometrifuntions

andsquareroots, wherethepreviousarhiteturesonsistedoftwosuhunits.

Lastlywehaveloadandstoreunits, whih read andwrites to globalmemory.

Thereare16suhunitsontheFermiarhiteture,whiletheseunitswasplaed

ontheoreonolderarhitetures.

SeveralAPIsonsists fortaking advantageoftheGPU asanaeleratorunit.

A fewyearsago,onehad tousethegraphis pipelinealongwitha3Drender-

ing API,like OpenGL and Diret3D,to getaess to the GPU. This requires

knowledgeof3Drendering,to solveproblemswhihmight havenothingto do

with 3D rendering. Several attempts for makingan APIto theGPU without

using 3Drendering APIs havebeen made [28, 51℄, but none of them reahed

thewantedpopularity. In2006Nvidia releaseditsown, platformindependent,

API, the Compute Unied Devie Arhiteture (CUDA) [44℄, and sine then

theinterestinGPGPU haveinreasedbothin theresearhandprivatesetor.

The main drawbakwith CUDA is the requirement of aNvidia GPU, and in

1

Priesfromhttp://www.komplett.noat2011-04-09

(22)

vendorindependentalternativetoCUDA[30℄.

Inthisthesiswewill useCUDAasourAPI, andtherearemainly tworeasons

for us not hoosing OpenCL instead. One of them is that we already have

a good knowledge of CUDA, and therefor do not have to learna brand new

API. Another, and more important reason, is that even though OpenCL is

vendorindependent,theperformaneisnotvendorindependent. OpenCLonly

guaranteesthattheprogramompilesandrunsondierentplatforms,but the

performanewillnotbethesame,eventhoughthearhitetureshavethesame

theoretial performane. In OpenCL one still haveto write platform spei

odetogainthesameperformane. Beauseofthis,webelievethatusingCUDA

asourAPIwillbebeneialinshowinghowourproblemanbeaeleratedby

usingtheGPU.

2.6 CUDA

In this setionwewill giveabrief introdution to the programmingmodel in

CUDA,and howtooptimizetheoderunningontheGPU[44℄.

2.6.1 Programming model

SinetheGPUhasnooperatingsystem,CUDAusestheCPUtoinitializedata,

opydatatotheGPU,andstarttheGPUalulations.

Hostand devies

IntheCUDAprogrammingmodel,theCPUontrolstheowofdata,whilethe

GPUonlyexeutessmallprogramswhentoldtobytheCPU.Thismaster-slave

relationship is also reeted in the jargon used in CUDA, where the CPU is

referredtoasthehostandtheGPUisthedevie. Itisalsopossibleforthehost

toontrolseveraldevies,butadevie anonlybeusedbyonehostatatime.

Kernels

Funtions exeuted on the GPU are referredto askernels. The kernelshave

aesstotheGPUmemoryareas,suh asglobalmemory,texturememoryand

onstantmemory,anddatamustbeopiedfromthehostmemorytothedevie

memory in order for the kernels to use the data. Whena kernelnishes the

results must be written bak to the main memory to be available for further

proessing,ortobeopiedbaktothehostmemory.

(23)

The interfaeprovided makesit easyto movedata betweenthe GPUand the

CPU,howeverthisisanexpensiveoperationanditisreommendedtokeepthe

numberofopyoperationstoandfromtheGPU ataminimum. Alternatively

oneoulduseasynhronousdatatransfers,andtherebybeabletoontinuethe

programowwhilethedataisopiedin thebakground.

2.6.2 Thread hierarhy

When starting a kernel, the multiproessor reates, manages, shedules and

exeutesthreadsin groupsof32 threads,alled awarp. Thesewarpsstartthe

sameprogram,butarefreeto branhandexeuteindependentlyofeahother.

Thewarpsarenotpartoftheprogramingmodelitself,butarevitaltobeaware

ofsinethebloksaremadeupofwarps,andeahblokishandledbyonewarp

sheduler, meaningthe numberof warps in ablokis equal to thesize of the

blok divided by

32

^. Êah ^thread ⁱⁿ â ^blok ^have îtsôwn ^thread îd, ^starting

with

0

ⁱⁿ ^the^rst ^warp. Âll^threads ⁱⁿâ^blok^haveâess^to ^the^same^shared

memoryspae, whih anbeusedto synhronizeexeutionsandommuniate

betweenthreads. Asthethreadsinablokhavetheirownthreadid,thebloks

inagridalsohavetheirownblokid. Theseidsareoftenusedtondthedata

athreadshouldworkwith.

The bloks form thegrid, whih basially only ontainsthe dimensionsin x,y

and zdiretionof thedata,as oneanorderthebloksandthreads in botha

1D,2D,and3Dordering.

Along with the thread hierarhy, there is also a orresponding memory hier-

arhy. The memoryspaes are physial spaes, andreside oneither the GPU

hip,themultiproessor,orontheomputationaloresthemselves. Thismeans,

threads ondierentmultiproessors,annotusethememoryonthemultipro-

essors to ommuniate. There are three main memory spaes on the GPU,

loalmemory,sharedmemoryandglobalmemory. Theloalmemoryresideon

the omputational ores, and no ommuniationan bedone in this memory

spae. Onewould atually tryto not use this memory spae at all, and only

usetheregistersas,these arealotfaster thanusingtheloal memoryspae.

Themainmemoryusedforommuniationisthesharedmemory. Thismemory

isonaper-blokbasis,soeahblokanusethismemoryspaetoommuniate

internally and synhronize the data if needed. Theshared memory is ahed

andresidesonthemultiproess,whihmakesitalmostasfastasusingtheloal

memoryspae.

Thelastmainmemoryspaeistheglobalmemory. Thisisthememoryspaethe

whole gridanaess,anditanbeusedto ommuniate betweenallthreads;

however, this memory spae is slow and it is reommended to minimize read

andwrite operationsin theglobalmemoryspae.

Figure 2.6showshowthe threads belong toablok, whih againbelongsto a

grid. Italsoshowstheorrespondingmemoryhierarhywhereathreadaesses

the per-thread loal memory, eah thread in ablok anaess the per-blok

(24)

[44℄

shared memory, and everythread anaess the global memory. We also see

that twodierent gridsanaess theglobal memoryat the sametime. This

is something that rst ame with the Fermi arhiteture, whih allowed for

exeutionoftwokernelsatthesametime. OnGPUsreleasedbeforeFermi,this

wasnotpossible,andonlyonegridouldaess theglobalmemoryat atime.

Aonsequeneofthisisthat onehavetobearefulonthenewerarhiteture,

so two gridsdo not work on the same set of data in global memory, asthat

ouldleadtoorruptdata,aswellraeonditionsthatinhibitperformane.

2.6.3 Performane optimization

Even though a straightforward GPU implementation of highly parallel prob-

lems often will givegoodperformane gains byitself, the performaneanbe

inreasedfurtherbyavoidingextensivebranhing,memoryfething,andmem-

oryopies[43℄.

Branhing

Eah warp started by the multiproessor runs the same instrutions simulta-

neously. Sowhen one,orseveral, threads divergefrom therest ofthethreads

beauseof somedata-dependentbranhing,therest ofthethreads inthewarp

(25)

willaetthethroughputofdatainanegativeway,andonehavetomakesure

toavoiddatadependentbranhingasmuh aspossible.

As the warp size is 32, whih is the number of ores on a multiproessor, it

will bebeneial to keep thesize of thebloksamultiple of32. This wayall

theoresonamultiproessoranworkwith thesameblokat thesametime,

makingsurenothreads areleft idle.

Avoiding branhes is not neessarilyan easy task, as thebranhesdepend on

the data. What often is done is to alulate the outome of all the possible

branhes,and thendeidewhihshould beusedforthisthread. Thisway,one

will stillhaveabranh attheend, but thisis shortand youwill minimizethe

numberofylestheotherthreadswillhavetowait.Onsomeproblemsonean

atuallyremovebranhingalltogetherbyusingmax/minoperationsaswellas

bitoperationsinstead.

Memoryfething

OneoftheslowestoperationsontheGPUitselfisreadingandwritingtoglobal

memory, sine the global memory is not ahed. There are, however, ahed

memoryspaesthatanbeused: Constantandtexture. Bothofthesememory

spaes are read-only, whih meansyouwill haveto aess the global memory

spae when you are storing you result, but for reading it is preferred to use

eitheronstantortexturememory.

It is also important to do oalesed memory reads and writes. This means

that you align the data so the data needed for one blok lays onseutivein

memory. By doing this, you an read more data at the same time, up to 64

bytes at a time on arhitetures before Fermi and up to 128 bytes on Fermi

arhitetures,drastiallyreduingthenumberofmemoryfethesandinreasing

theperformane.

Anothertrikoneandotoinreasespeed,istowritebaktoglobalmemoryas

soonasyouhavealulatedthedata. SinethereadandwritesontheGPUare

asynhronous,thewritebaktoglobalmemorywillhappeninthebakground

whilethethreadsareontinuingtheiromputations.

Memoryopies

As already stated, oneshould minimize the numberof memoryopies. What

this meansisthat datathat neverhanges duringexeution, should beopied

over when starting the appliation, and never again. This gives an overhead

when starting the simulations, but minimizes the number of opy operations

during thesimulations. As anexample,weopythe whole grid,wells, et. to

the GPU when starting the simulation and only swap the new ux and well

dataduring thesimulations.

(26)

Even thoughCUDA is madeto inreaseperformane, the rules of ode read-

abilityand performane optimizationthat exists forprogrammers, should still

apply when writing GPU aelerated programs. It is, as with any other pro-

grams, highly unlikely that no one will ever touh your ode again, so it is

importantthatyouwriteodethatispossibleforotherstoread. Withregards

toperformaneoptimizationitisstated: Weshouldforgetaboutsmalleien-

ies,sayabout97%ofthetime: premature optimizationistherootofallevil.

Knuth[32,page268℄.

This does not mean optimization should not be done at all, but wait until

youhaveaworking solutionbefore startingtheoptimization. Anotherbenet

of waiting with optimizing, is that it is easy to optimize smarter. You an

analyse theprogram by runningit throughaproler, and then onlyoptimize

the bottleneks of your ode, instead of randomly optimizing the parts you

believeare theslowest. Further it meansthat youin 97%of theasesshould

negletsmall inreasesin performane, whihleaves3%where youshould not

neglet small inreases in performane. Sine wewish to run the simulations

several times, a speedup of 2% will give us one extra simulation every 50th

simulation,whihin thelongrun willgrowand makeitpossibleto workmore

eiently.

(27)

Mathematial models and

numerial simulations

In this hapter we will introdue the full 3D model of our problem. Wewill

then use the vertial equilibrium assumption and integrate the model into a

orresponding 2D equation before we derive a numerial sheme of the new

model, using anite volumemethod. Next, we introdue theCFL ondition,

beforedisussing dierentalgorithmsforimplementingthesheme. Lastly, we

willdisussinmoredetailthehoiesmadetoinreaseperformaneofourGPU

solver.

3.1 The Vertial Equilibrium model

Toderiveourmodel,westartwiththebasitransportequationfromEquation

(2.1), along with Dary's law for onephase ow from Equation (2.3). Inour

problem weareonlyinterestedintwophases,CO

2

^and^brine. ^We^also^neglet

bothrokanduidompressibility,aswellasassumeasharpinterfaebetween

CO

2

ând^the ^brine. ^The^reason^weânâssume^that ^CO

2

^is inompressible, is that CO

2

îsâtuallyⁱⁿâ^liquid^stateât^the^depths^weâreînterestedîn.

Assuming a two-phase, inompressible ow with equal phase pressures, and

dening

λ i = _µ ^k ⁱ _i K

^,

λ t = λ co 2 + λ w

^, ^and

f (S) = ^λ _λ ^co _t ²

^, ^we ^an^write ^the ^total

veloityas:

v = −λ t ∇p + (λ co 2 ρ co 2 + λ w ρ w ) g

^(3.1)

whihgivesusanexpressionfortheveloityofCO

2

^,^given^the^total^veloity:

(28)

Figure3.1: IllustrationoftheCO

2

^plumeâsâssumed^by^the^VE^model. Îmage

fromLigaardenandNilsen [34℄.

v _co ₂ = −λ co 2 (∇p − ρ co 2 g)

= −f (s)(λ t ∇p − λ t ρ co 2 g)

= f (s)(−λ t ∇p + λ t ρ co 2 g)

= f (s)(v + λ w ρ w g − λ w ρ co 2 g)

= f (s)( v − λ w ∆ρ g )

where

∆ρ = ρ co 2 − ρ w

^. ^By^adding^and ^subtrating

λ co 2 ρ w g

ⁱⁿ ^Equation^(3.1),

andletting

S

^be^the^saturation^of^CO

2

^,^weân^rewriteôur^systemînto:

∂φS

∂t + ∇ · f (S)(v + λ w (S)∆ρg) = q co 2

v = −λ t (∇p − (f (S)ρ co 2 + (1 − f (S))ρ w ) g )

∇ · v = q tot

(3.2)

whih istheproblem we areinterestedin. Our fouswill beonimplementing

theontinuityequation,while Dary'slawwillbeusedtoupdate thepressure.

The next stepis to vertially integrate Equation(3.2). Todo this, we dene

theaveragedsaturation

s = h/H

^to^be^the^relative^height^of^the^CO

2

^plume^as

illustratedinFigure3.1. Thus

s

îsâ^funtionôf ^time

t

^and^spatial^oordinates

x

^of^the^reservoir.

Thevertialequilibriumassumptionanbeusedinseveraldierentways;where

themoststraightforwardisavertialaveragingoftherokparameters. Aswe

ansee from Figure 3.1, the CO

2

ônly^takes ûpâ^small ^portion ôf ^the ^height

of theaquifer. This mightresultin largeerrorswhere thepermeabilityis not

homogeneous in the vertial diretion, whih easily an be seenby using the

Johansen formation, where the permeability inreases towards the top of the

formation. Thismeansowwouldgotooslowbyonlyaveragingthepermeabil-

ity. Byslightlymodifyingtheassumptionto avertialequilibriumin theCO

2

(29)

lledpartoftheaquifer;weaninsteadintegratethepermeabilityintherange

0

to

h

^,^whih^is^thepermeabilityatuallyinueningtheCO

2

^ow. ^The^same^ar-

gumentholdsfortheporosity,butsinetheporosityisrelativelyhomogeneous,

theerrorwill notbeaslarge. It hasbeenshownbyLigaardenand Nilsen [34℄

that integrating the permeability will givea better approximation than aver-

agingthepermeability. Itwillalso requireus toalulate thepermeabilityfor

everytimestep,meaningweinreasetheomputationalostsome.

3.1.1 Vertial Integration

Forthe vertial integration, we haveto dene avertially integrated porosity

andintegratedpseudomobilities[34℄.Theintegratedporosityisgivenby:

Φ(s, x) =

Z sH(x) 0

φ(z, x) dz

^(3.3)

whiletheintegratedpseudomobilitiesaregivenby:

λ ˜ co 2 (s, x) =

Z sH(x) 0

k co 2 (1) µ co 2

K || (z, x) dz, λ ˜ w (s, x) = Z H(x)

sH (x)

k w (1) µ w

K || (z, x) dz

(3.4)

Thesepseudomobilitiesgivetheorrespondingfrationalowfuntions:

f ˜ (s, x ) = ˜ λ co 2 (s, x )

λ ˜ co 2 (s, x) + ˜ λ w (s, x) , f ˜ g (s, x ) = ˜ λ w (s, x ) ˜ f (s, x )

^(3.5)

LastlyourVE equivalentofthe apillaryterms reads,when disregardingap-

illary fores in theunderlying model,

p c (s, x) = H (x)g ⊥ ∆ρs

^, ^where

g ⊥

^is ^the

gravity omponent perpendiular to thesurfae. This givesus thenew, verti-

allyintegratedmodel:

∂Φ(s, x)

∂t + ∇ || · h

( ˜ f (s, x)v ^ve + ˜ f g (s, x)[g || (x) + ∇p c (s, x)] i

= q co 2 (x, t)

∇ || · v ^ve = q tot (x, t) v ^ve = − ˜ λ t (s, x) h

∇ || p t −

f ˜ (s, x)ρ co 2 + (1 − f(s, ˜ x))ρ w

g _|| (x) i

−

˜ λ w (s, x)∇ || p c (s, x)

(3.6)

Toonsider residualtrapping, the evaluation of the relative permeabilities in-

volvestheresidualsaturations

S rw

^and

S rco 2

^for^brine ^and^CO

2

^. ^Then

k co 2

^is

evaluated at

1 − S rw

^, ^while

k w

îs êvâluated ât

1 − S rco 2

^where ^the ^historial

(30)

tion. Moreover

Φ(s, x )

^is ^multiplied^by

1 − (S rw + S rco 2 )

^where ^the^historial

maximumislargerthantheurrentaveragedsaturation,and

1 −S rw

^elsewhere.

If theporosity is onstant in thevertialdiretion,weansimplify

Φ(s, x)

^to

φ(x) · sH(x)

^. ^Using^this,^the^transport^equation^takes^the^simpler^form:

φ( x ) ∂H( x )s

∂t + ∇ · h

f ˜ (s, x ) v ^ve + ˜ f g (s, x )[ g _|| ( x ) + ∇p c (s, x )] i

= q co 2 ( x , t)

^(3.7)

Tosimplify notationsin thenextsetion,wedenethetotaluxas:

f (s, x) := ˜ f(s, x)v ^ve + ˜ f g (s, x)[g || (x) + ∇p c (s, x)]

3.2 The Finite Volume Method

Inthissetionwewillusethenitevolumemethodtoderiveashemewhihan

beusedtosolveourmodelprobleminEquation(3.7). Thismeanspartitioning

the model into ells, and integrating overthese ells. Looking at ell

C i,j

^we

get:

Z

C i,j

φ(x) ∂H(x)s(x, t)

∂t dx +

Z

C i,j

∇ · f (s, x) dx = Z

C i,j

q co 2 (x, t) dx

^(3.8)

If weassumethe sizeof theellstends towardszero,weanalsoassumethat

theporosityisonstantinsidetheell.

φ i,j

Z

C i,j

∂H(x)s(x, t)

∂t dx +

Z

C i,j

∇ · f (s, x) dx = Z

C i,j

q co 2 (x, t) dx

^(3.9)

IntegratingEquation(3.9)intime,wegetanexpressionforthenexttimestep:

φ i,j

Z

C i,j

H(x)(s(x, t n+1 )) − s(x, t n )) dx + Z t n+1

t n

Z

C i,j

∇ · f (s, x) dx dt

= Z t n+1

t n

Z

C i,j

q co 2 (x, t) dx dt

(3.10)

Theintegrationoftheuxin spae,anbewritten as:

(31)

Z

C i,j

∇ · f (s, x) dx =

Z x i+1/2

x i − 1/2

f (s, x, y j+1/2 ) − f(s, x, y j−1/2 ) dx +

Z y j+1/2

y j − 1/2

f (s, x i+1/2 , y) − f (s, x i−1/2 , y) dy

(3.11)

Nowwesimplifytheequationsfurther,bydeningtheaveragedsaturationina

ell

S i,j

^,^the^fae^uxes

F

^and

G

^,âs^wellâs^the^total^volume^rateôf^soureⁱⁿâ

ell

Q

^by:

S _i,j ⁿ = 1

|C i,j | Z

C i,j

H (x)s(x, t n ) dx

F _i,j ⁿ = 1

|C i,j | Z t n+1

t n

Z x i+1/2

x _i− _1/2

f (s, x, y j ) dx

G _i,j ⁿ = 1

|C i,j | Z t n+1

t n

Z y j+1/2

y j − 1/2

f (s, x i , y) dy

Q ⁿ _i,j = 1

|C i,j | Z t n+1

t n

Z

C i,j

q co 2 ( x , t) d x

Thesedenitions simplifyournumerialshemeinto:

S _i,j ⁿ⁺¹ = S _i,j ⁿ − 1 φ i,j

h F _i,j+1/2 ⁿ − F _i,j−1/2 ⁿ + G _i+1/2,j ⁿ − G _i−1/2,j ⁿ − Q ⁿ _i,j i

(3.12)

Asweanseefromtheequation,thehangeinellsaturationonlydependson

thesoure inside theellitselfand theuxesovertheellfaes. Thisis, asit

shouldbe,adisreteequivalenttoEquation(3.7),whihstatesthatthehange

at apointin thereservoironlydependsonthesouresatthatpointaswellas

ontheuxesintoandoutofthatpoint.

Up tothispoint,noapproximationexeptfordisretizationofthedomainhas

beendone,whihmeansthatthisequationisexatasthesizeoftheellstends

towardszero. Fromhereonout,however,wewill havetoapproximatethefae

uxesaswellasthetimeintegralsto getafullyexpliitsheme.

Therststepinmakinganexpliitshemeistoapproximatethetimeintegrals

bythevalueat theprevioustimestep. Thismeansusing:

Z t n+1

t n

f (t) dt ≈ ∆tf (t n )

Alternatively we ould have used the same time step as we are solving for,

whih would have made an impliit sheme, or a Gaussian quadrature rule

to approximate the integral,whih would make aRunge-Kutta time-stepping

(32)

Raphson method to solve the orresponding disrete nonlinear system, whih

is notthat easy to transforminto aparallelalgorithm, this isthe reasonwhy

we stik to an expliit sheme. In reent publiations, however, the impliit

sheme hasbeenusedwith suesswheretheGPUwasusedbothtoalulate

thepreonditionerandsolvethelinearsystem[5,31℄.

Ourhangeoverafaeisgivenby:

F _i,j−1/2 ⁿ =

Z x i+1/2

x i − 1/2

f ˜ i,j−1/2

h

v ^ve _i,j−1/2 + ˜ λ ^w _i,j−1/2 (g || i,j − 1/2 + ∇p c i,j− 1/2 ) i

(3.13)

Sinetheveloityandgravityarealulatedonaperfaebasis,theproblemlay

inndinganapproximationtothefrationalowrateandthepseudomobilities.

Sine thefrational owrate isexpressed from thepseudomobilities, weonly

havetondanexpressionforthepseudomobilitiesoverafae.

3.2.1 The Upstream Mobility Weighting Sheme

Theupstreammobilityweightingshemeassumesthattheowoveraellonly

travelsin onediretion,and themobilityoverafaeis onlydependent onthe

mobilitiesoverthetwoneighboringells. Wealsoassumeapieewiseonstant

saturationovereahell. Thisgivesthefaepseudomobilities:

˜ λ ^co _i,j−1/2 ² =



 

 

˜ λ ^co _i,j ²

if

(v ^ve > 0

^and

(g || + ∇p c ) > 0)

^or

( v ^ve − λ ˜ ^w _i,j−1/2 ( g _|| + ∇p c ) > 0)

˜ λ ^co _i,j−1 ²

^else

(3.14)

˜ λ ^w _i,j−1/2 =



 

 

˜ λ ^w _i,j

if

(v ^ve > 0

^and

(g || + ∇p c ) > 0)

^or

(v ^ve − λ ˜ ^co _i,j−1/2 ² (g || + ∇p c ) > 0)

˜ λ ^w _i,j−1

^else

(3.15)

This alsoholdsfor

λ ˜ i,j+1/2

^,

λ ˜ i−1/2,j

^and

˜ λ i+1/2,j

ând^will ^beûsed^toâlulate

thefrationalowarossthefae.

3.3 The CFL ondition

The CFL ondition is a neessary ondition whih must be satised for the

numerialmethodtobestable,andtherebyonvergetothesolutionasthegrid

isrened.

CFL Condition [33℄: A numerial method an be onvergent only if its nu-

merial domain of dependene ontains the true domain of dependene of the

PDE, atleastinthe limitas

∆t

^and

∆x

^go ^to^zero.

UNIVERSITYOFOSLODepartmentofInformatics VerticallyintegratedmodelsofCO migration:GPUacceleratedsimulations MartinErtsås May6,2011