UNIVERSITY OF OSLO Department of Informatics
Vertically
integrated models of CO 2 migration:
GPU accelerated simulations
Martin Ertsås
May 6, 2011
StorageofCO
2
ingeologialformations,suhasoilandgasreservoirs,isonsid- eredanimportmeanstoredueemissionsofCO2
intotheatmosphere. Aurate modelingoftheCO2
migrationisanimportanttooltoanalysetheriskofleakageinpotentialinjetionsites. Toaountforunertaintiesinthegeologialmodel
weneedto runthesimulationsseveral times, withhanges in theparameters,
for therisk analysis. Alongwith several simulationswealso needlonger time
salesthanonenormallyhaswhensimulationowinporousmedium,sinethe
CO
2
should stay underground a lot longer than the time it takes to extrat fossilfuels. Beauseofthis,wedonotonlyrequireanauratemodel,butalsoafast model forsimulation. Usingthe vertial equilibrium assumption, anda
vertiallyintegratedmodel, hasbeenshown togivegood performanebenets
withrespettothefull3Dmodelsusedinthepetroleumindustrytoday,aswell
asbeinganauratemodel.
InthisthesiswewillinvestigatehowtheGPUanbeusedasanaeleratorunit
forsuhvertiallyintegratedmodels. Wewill omparetheperformanegained
from using aGPU aeleratedsolverand amulti oresolver,with respetto
the performane from a serial solver. Fromthis wewill demonstrate that the
GPUisagoodaeleratorunitforthese model.
The solverswill be demonstrated to sale well both on simple grids with no
faults, as well as on real world data with faults and geologialtraps for the
CO
2
. Lastly we will ompare the errorobtained onthe GPU by using singlepreision oating point numbers instead of the double preision used on the
CPU,and showthat thiserrorisnegligible.
This master thesis has been written at SINTEF ICT as part of the MAT-
LABReservoirSimulationToolboxprojet[38℄.Itorrespondstoapproximately
twelvemonthsworkoveraperiodofoneandahalfyear,withinreasingfous
onthisthesiseveryhalf year. Theworkpresentedhereisarriedoutindividu-
ally,andbuildsupontheworkdonebyLigaardenandNilsen[34℄ onvertially
integratedmodelsforCO
2
migration.Toworkwithsuhaprojetoveralongertimesalehasbeenagreatexperiene,
andhasbroughtmemanylessons,bothpersonallyandprofessionally,that will
help me further in life. The thesis has resultedin long days and nights with
littlesleep,butIfeelproud oftheresultpresentedintheend.
The omputerodesdevelopedhavebeentested torun on threedierentma-
hines running dierent GNU/Linux distros (CentOS 5.5, Ubuntu 10.04 and
ArhLinux)using g,aswellasonmahinesrunning32and 64bitWindows
7ompiledwithVisual Studio2005. Everytesthavebeendoneusing aMAT-
LABinterfae,alled fromMATLABR2010a. Asaonsequenetheomputer
odesare onlyveried to work usingMATLAB R2010a,but should alsowork
onnewerMATLABreleases,andpossiblysomeolder.
Tothe reader: Thisthesishasbeensplit intoseveralhapters,whihshould
be possible to read by themselves. However it is reommended to read the
thesis sequentially, to get agrasp of theonept most eiently. I havetried
to giveathoroughexplanationasto whyweuseavertiallyintegratedmodel,
but reommend readersto alsoread Ligaardenand Nilsen [34℄ to getabetter
understanding,asthisthesisbuildsuponthat paper.
Aknowledgements: First of all, I wish to thank my supervisors at SIN-
TEFICT,Knut-AndreasLieandJosteinR.Natvig,whohasprovidedmewith
guidane and advie whenever I needed it, I also appreiate all the red pens
Knut-Andreashas saried to make mythesis what it is today. Myinternal
supervisor, MartinReimers, alsodeservesto bethanked foralwaystaking the
time to sit down with me when needed. For providing me with a well writ-
ten bakground paper, aswell as help spei to CO
2
migration,I also haveto thankHalvorMøllNilsenand Ingeborg Ligaarden. This thesiswould never
have been possible in its urrentstate without your help. A big thanks also
goestoSINTEFICTforprovidingmewithagreatworkingenvironment,both
withregardstohardwareandpeople, andtheUniversityofOsloforholdingfar
toomanyinterestingoursestotinto veyears,aswellasprovidingmewith
I wish to thank my friends for keeping me motivated, as well as helping me
during mywhole studiesat UiO.I willespeiallythankmybelovedMarenfor
keepingme saneduring mymoststressfulperiods, andmakingsureI alsohad
alifeonthesideofmymasterthesis.
Lastly,Iwishtothankmyfamilyforallthelove,supportandinterestyouhave
shown. A speialthanksgoestomymomanddad whoalwayshavesupported
meinmydeisionsandhelpedmeoutwhenneeded. Iwouldnothavebeenthe
personIamtodayhaditnotbeenforyou.
Prefae 1
1 Introdution 5
1.1 Introdution. . . 5
1.2 Preisionandauray . . . 7
1.2.1 Unitofleastpreision . . . 7
1.3 Researhquestions . . . 8
1.4 Notation. . . 9
2 Bakground 10 2.1 CO
2
CaptureandStorage . . . . . . . . . . . . . . . . . . . . . . 102.1.1 Trappingmehanisms . . . 12
2.2 ReservoirBasis . . . 12
2.2.1 Parameters . . . 13
2.2.2 Massonservation . . . 13
2.3 TheVertialEquilibriumassumption . . . 15
2.4 Parallelomputation . . . 16
2.5 General-PurposeomputationonGraphisProessingUnits . . . 18
2.6 CUDA . . . 20
2.6.1 Programmingmodel . . . 20
2.6.2 Threadhierarhy . . . 21
2.6.3 Performaneoptimization . . . 22
3 Mathematialmodelsand numerialsimulations 25 3.1 TheVertialEquilibriummodel . . . 25
3.2 TheFiniteVolumeMethod . . . 28
3.2.1 TheUpstreamMobilityWeightingSheme . . . 30
3.3 TheCFLondition . . . 30
3.4 Implementation . . . 31
3.4.1 Algorithm . . . 31
3.4.2 GPUoptimizations. . . 34
4 Resultsand Disussion 36 4.1 Hardwareandtesting . . . 36
4.1.1 Test setup. . . 36
4.1.2 Note . . . 37
4.2 Idealizedgrids . . . 39
4.2.1 Horizontalresolution . . . 39
4.3 Vertialresolution . . . 40
4.4 Synthetitestsuite. . . 40
4.5 Auray . . . 43
4.6 Migrationpressure . . . 45
4.6.1 Auray . . . 45
4.6.2 Performane . . . 49
4.7 Hardwareutilization . . . 49
5 Conlusion 51 5.1 SummaryofResults . . . 51
5.1.1 Performanebenets . . . 52
5.1.2 Auray . . . 52
5.2 Furtherwork . . . 52
5.2.1 MultipleGPUs . . . 53
5.2.2 Higher-ordersolvers . . . 53
5.2.3 Adaptivity . . . 53
Introdution
In lateryears, publiawarenessonlimate hanges hasinreased, and several
tehnologieshavebeenproposed to help stopthe,possiblyhumanintrodued,
limatehanges. OneofthesetehnologiesisCO
2
CaptureandStorage,whihshould help redue CO
2
emissions to the atmosphere. For seure storage of CO2
it isimportant withthorough riskanalysis, to minimizetherisk of leak-agefrom thestoragesite, whih meansthat simulationsare important forthe
implementationofCO
2
storage.Inthisthesis, westudytheuseofaGPUasan aeleratorunit forsimulating
CO
2
migrationinaporousmedium. WewillompareGPUaeleratedsimula-tionsofavertiallyintegratedmodeltosingle-andmulti-oreCPUsimulations.
1.1 Introdution
Today, weare releasing inreasing amounts of CO
2
into the atmosphere, and while the CO2
onentration has risen signiantly sine the Industrial Rev- olution, the temperature on earth has also risen slightly. There is reason tobelievethatthesetwophenomenaarelinked,andin1992theinternationalon-
ern about limate hange led to the United Nations Framework Convention
onClimateChange. Theultimateobjetiveofthisonventionisthestabiliza-
tionofgreenhousegasonentrationintheatmosphereatalevelthatprevents
dangerousanthropogeniinterferenewiththelimate system [47℄.
One of the tehniquesonsidered to help redue emissions to the atmosphere
is CO
2
Capture andStorage (CCS).Thistehniqueonsistsof apturing CO2
duringtheombustionandextrationproesses,transportingtheapturedCO
2
toaninjetionsite,andinjetingitintothesubsurfae. TheUNhaveonluded
that CCS is an important step to redue CO
2
emissions, aswell asbeing aneonomiallyfeasiblesolution[47℄.Theyalso pointoutthat there areseurity
risks onehave to take into onsideration, likeleakagesfrom the injetion site
and raks made from the inrease in pressure. This alls for thorough risk
analysis,whereomputersimulationsplayahugerole. Thefousofthisthesis
will beon CO
2
storage, and morespeially simulating themigration of theinjetedCO
2
.CO
2
storagemeans, forourproblem, injetingCO2
into ageologialrokfor-mation,likeoilandgasreservoirsorsalineaquifers. Thetehnologyrequiredis
thesameasusedwheninjetingwaterorgasforenhanedoilandgasreovery,
whih means that the problem is alreadywellunderstood and the tehnology
present[4℄. CO
2
storagehas also been deployed in large-saleommerial ap-pliations,aswellassmallerresearh projets. Todate, thereexists fourlarge
failities for CO
2
storage: Weyburn-Midale, Sleipner, Snøhvit, and In Salah,whih ombinedinjet
∼
6.4 million tonnes of CO2
every year [8, 13, 46, 49℄.Anotherformation onsideredfor CO
2
storage istheJohansen formationout-side of Norway, whih an sustain a yearly injetion of
∼
3 million tonnes ofCO
2
[2℄.Clearly,theamountofCO2
ventedintotheatmospherewillbegreatlyreduedbyimplementingCO
2
storagein largesales. Oneexampleoftheim-patobtainedbyCCSisanonshoreeldoutsideofLongyearbyenonSvalbard.
ThiseldisakeypartinthegoaltomakeLongyearbyentheworld'srstCO
2
neutral ity by 2025, where CO
2
will be aptured from the oal-fueled powerplantandinjetedintothisformation[48,50℄.
AlternativestoinjetingCO
2
intorokformationsasdesribed,istoinjetCO2
intooalseamsandintotheoeanatdepthsgreaterthan1,000m. Bothofthese
tehniquesarestillinaresearhphase,andwillnotbedisussedfurtherinthis
thesis[4,9℄.
Reservoir simulations have been apart of oil-reservoirmanagement for more
than fty years[1℄, andthese kind ofsimulationsanbe usedto simulate mi-
gration of CO
2
in oil and gas reservoirs. The biggest dierene between the reservoirsimulationsdonein oil andgas reovery, andsimulating CO2
migra-tion,isthedierenttimesales. WhensimulatingCO
2
migrationitisimportanttomakesurethereisnoleakages,andonehavetosimulateuntilalmostallthe
CO
2
is trapped, whih might take several thousand years. When simulatingforoilandgasreoveryontheotherhand,oneseldomsimulatesformorethan
one entury, This means we need both fast and reliable simulations for CO
2
migrationproblems.
Modelsbasedonthevertialequilibrium(VE)assumptionhavelongtraditions
fordesribingowin porousmedia, andwasearlyonextended tohandletwo-
phaseandthree-phaseow[16,36,37℄.Asomputationalpowerhasinreased,
modelsbasedontheVE assumptionhavebeenlessused, andonehaveinstead
used the full 3D model. Reently, there have been a renewed interest in VE
based models for fast simulation of CO
2
migration, when an assumption of asharpinterfaebetweenCO
2
andbrine maybereasonable[14, 22,23,34,35℄.The usageof graphisproessing units (GPUs) asaeleratorunits for hyper-
bolipartialdierentialequations,forwhihexpliitshemesarefeasible,have
provento givegood performane benetsoveraCPU onlyimplementation[6,
10,12℄.TherehasalsobeendonesomeworkonGPUaeleratingimpliittrans-
port,whihhaveproventogivegoodperformanebenetsbothwithrespetto
omputingthepreonditionerandsolvingthearisinglinearsystem[5,31℄,but
not asgood as for the expliitase. It is reason to believe that the equation
that arisesin ourproblem, willbewell suitedforexpliittransport, aswellas
Furtherinthisthesis,wewillusethevertialequilibriumassumptiontomakea
fastsimulationmodel. Thearisingmodelwillbedisretizedandimplementedon
theCPUforbothserialandparallelexeution,togetherwithaGPUaelerated
implementation. The main goalis to see howsuited this problem is for GPU
aeleration,andhowtheusageofsingle-preisionoating-pointnumbersaet
theresults. ThethesisisbasedonapaperbyLigaardenandNilsen[34℄,where
theyexploredthenumerialaspetsofusingthevertialequilibriumassumption
for simulatingCO
2
sequestration. Theybelievethat makingGPU aelerated solversforthisproblemwillgivethepossibilityoflarge-saledesktopsimulationsofVE modelsonoarsegrids.
1.2 Preision and auray
Thewordspreision and auray willbekeywordsinparts ofthis thesis,we
willthereforestatetheirdenitionhere. Inthedomainofomputing,Websters
ditionarydenes:
Auray: Howlose tothe real value ameasurement is.
Preision: Thenumberof deimalplaes towhih anumberisomputed.
Inthis thesiswewill taitlyassumethat thedoublepreisionvalueisthe real
answer,andusethewordauraytodesribehowlosetothedoublepreision
answeroursinglepreisionsolutionis. Thexedpreisiononaomputermeans
that we only have a nite set of numbers to represent the innite set of real
numbers. Further thismeansthat the lowerthepreisionis, themorelimited
the nite set is, and so the set made up of the singlepreision oating point
numbersisalot smallerthanthe setmade upofthedouble preisionoating
pointnumbers. Clearlytheaurayishighlydependentonthepreision,and
we do not expet our single preision answer to be the same as the double
preisionanswer.
1.2.1 Unit of least preision
Unitofleastpreision(ULP),isameasureofpreisioninnumerialalulations.
Itsdenition wasoriginallydened as:
ULP [29℄:ULP(x)isthegapbetween thetwooating-pointnumbersnearestx,
even ifxisone ofthem.
Bytheintrodution oftheIEEE 754oating-pointstandard,the denition of
ULPhadtohange tohandleNaNandInf. Thenewdenition was:
ULP [29℄: ULP(x) is the gap between the two nite oating-point numbers
nearestx,evenif xisone ofthem. (ButULP(NaN)isNaN.)
TheIEEE754oating-pointstandardrequiresthattheresultsofanelementary
arithmetioperation(addition,subtration,multipliation,division,andsquare
roots)shouldbewithing0.5ULPofthemathematiallyexatanswer,meaning
Eahtimewedoasinglepreisionarithmetioperation,wewillgainaround-o
error with respet to the double preision,due to the ULP. The errorwill be
small for eah operation, but as the number of operations inrease this error
will be aumulated, and thereby grow. This means we mightend upwith a
largeerror,eventhoughtheerrorintroduedforeahoperationissmall.
1.3 Researh questions
Thegoalofthisthesisis togiveompleteandpreise answersto thefollowing
questions. Thequestionswillbesummedupin Chapter5.
i. Howgoodanasimulationusingthevertial equilibriumassumptionutilize
the GPUhardware?
ii. Howdoesthe performane sale onthe GPU withrespettothe horizontal
andvertialresolution?
iii. How does the performane sale on the GPU with respet to the inlusion
of faultsaswell asvaryinggeometry inthe reservoir?
iv. Aretheresultsobtainedusingsinglepreision ontheGPUaurateenough
for pratial purposes?
The rst question will be answeredby looking at the oupany of the GPU
during exeution. Theseond andthird questionwill beansweredbyrunning
severaltest-asesusingbothidealizedgridsandgridswithrealistifeatures,and
ompared to botha serial and amulti-ore CPU solution. The last question
will beansweredbyomparing thesingle-preisionGPU solutionto adouble-
preisionmulti-oresolution,usingagridoftheJohansenformationoutsideof
Norway.
Thethesisispartitionedintovehapters:
Chapter 1: Ashort introdutiontothethesis.
Chapter 2: Gives the bakground needed for the work done in this thesis.
Thisinludes moreonCCS,reservoirsimulation,thevertialequilibrium
assumption,andGPUprogramming.
Chapter 3: Showshowthestandardtransportandpressureequationsanbe
vertiallyintegrated,andusesthenitevolumemethodtogetanexpliit
sheme. Also disusses alternative algorithms, as well as GPU spei
problems.
Chapter 4: ReportsanddIsussesthetestresultsfromournumerialtests.
Chapter 5: Conludesthendingsfromourtests,anddisussesfurtherwork
onthissubjet.
Thenotationused throughoutthethesis, willmostlybestandardnotationfor
PDEs. Topreventonfusionhowever,wewillstillestablishitformally.
Our domain will be denoted
Ω
, and the size of the domain will be denoted|Ω|
. Further,adisreteellin a2Ddomain willbedenotedC i,j
,whereC i,j = [x i−1/2 , x i+1/2 ]×[y j−1/2 , y j+1/2 ]
willbethei
thellinthex
-diretionandthej
thell in the
y
-diretion. We alsohave thedisrete sizes∆x i = x i+1/2 − x i−1/2
and
∆y j = y j+1/2 − y j−1/2
, whih is the size of a ellinx
andy
diretions,respetively, as well as the size of a ell
|C i,j | = ∆x i ∆y j
. In our disreteproblem,weareonlyinterestedintwodimensions,hereisthereforenoneedto
establishanequivalentnotationfora3Dspae.
Along with the spae domain, the time domain is also disretized. We let
t n = P n−1
l=0 ∆t l
,where∆t l = t l+1 − t l
.Inthisthesis,
x
andy
willmeanthenormalx
andy
diretionsina2Doordinatesystemwhile
x
,thatisaboldfaex,willmeanthediretionalvetor(x, y)
. Thiswillalsobeusedfurtherin integration,where:
Z
C i,j
f dx =
Z x i+1/2
x i − 1/2
Z y j+1/2
y j − 1/2
f dx dy
Wealsohavetoestablishsomenotationforderivatives. In3Dweusethenormal
notation
∂f
∂z i
as thederivativeof afuntion
f
in diretionz i
, wherez i
anbeeithertime
t
oraspatial oordinatex
,y
,z
. Furtherwehave∇f = ( ∂f ∂x , ∂f ∂y , ∂f ∂z )
.The divergene willbe denoted
∇ · f = ∂f ∂x + ∂f ∂y + ∂f ∂z
. We will usea vertialequilibriumassumptiononour3Dmodeltogeta2Dmodel,duringthisproess
wewillintroduethederivatives
∇ || f
and∇ || · f
. Thesederivativesisthesame asfor3D,butintwodimensions,andthediretionsarenowparalleltoasurfae,in thisasethetopsurfaeofthereservoir.
Bakground
Here wewill givetheneessarybakgroundfor therest ofthe thesis. We will
start bya short introdutionto CCS, before givingabakgroundin reservoir
simulation,thevertialequilibriumassumptionandGeneral-Purposeomputa-
tiononGraphisProessingUnits(GPGPU).
2.1 CO
2
Capture and StorageCO
2
CaptureandStorageonsistsofthreeparts,aptureofCO2
,transportation ofCO2
,andinjetionof CO2
.TheaptureofCO
2
islikelytobeappliedtolargepointsoureslikefossilfuelplants,fuelproessingplants,andlargeindustrialplants. CapturingCO
2
fromsmallandmobilesoures,suhasintransportationandthebuildingindustry,is
likelytobemorediulttoimplement,aswellastooexpensivewithregardsto
theontributionthiswillgivetotheenvironment. Therearefourbasisystems
forapturingCO
2
fromusesoffossilfuelsandbiomass: Capturefromindustrialproessstreams,post-ombustionapture,pre-ombustionaptureandoxy-fuel
apture[3℄.
CO
2
anbetransportedineithergas,liquid,orsolidstate. Themostommon transportationalternativeis touse pipelines orshipsfor transportof gas andliquidCO
2
. Inatmospheripressure,CO2
takesupalargevolume,whihmeansthatlargefailitiesareneeded. Toreduetransportosts,oneaninreasethe
pressure,whih meansthat moreCO
2
anbetransportedin thesamevolume.TofurtherdereasethevolumeneededoneanliquefytheCO
2
,whihisneededif shipsare used fortransportation. Liquefationis an establishedtehnology
for naturaland petroleum gases, and the existing tehnology an be used to
liquefyCO
2
. SolidiationofCO2
isurrentlytooexpensivetobefeasible[17℄.The dierenes in transportation ost using ships and pipelines an be seen
in Figure 2.1. One an learly see that using pipelines is the most eonomi-
al solutionif thedistane is lessthan
∼
1,000 km, whih means that pipelinetransportationisthemosteonomialfortheCO
2
storagefailitiesin use.Figure2.1: CostanalysisofCO
2
transport. Imagefrom Colemanetal.[17℄.Figure 2.2: Illustration of methods forstoring CO
2
in underground geologial formations. ImagefromAndersonetal.[4℄.CO
2
storageanbedoneineitherageologialformation,likeoilandgaselds,salineaquifersandoalseams,ordiretlyintheoean. WheninjetingCO
2
intothe oean, weneed to injet it in depths greater than800 meter. Over time,
most of the injeted CO
2
will dissolve and beome part of the global arbonyle. Bothoeanstorageandinjetionintooalseamsis,todate,inaresearh
phaseandhasyetto bedemonstratedatapilotsale[4,9℄.
Figure 2.2 illustratesbothoshoreand onshoreCO
2
storage. Thisalso showshowCO
2
storageanbeusedinenhanedoilandgasreovery. Inthisthesiswewillonlyfousongeologialstoragein oilandgasreservoirsbelow800meters.
2.1.1 Trapping mehanisms
OnetheCO
2
isinjetedunderground,theCO2
mustbetrappedtobeseurelystored. TheCO
2
anbesubjettostrutural,residual,solubilityaswellasmin-eraltrapping[15℄.Wewillpresentallfourmehanismshere,butonlystrutural
andresidualtrappingwillbeaountedforin oursimulations.
Struturaltrappingisthemostommontrappingmehanisms,andmeansthat
theCO
2
perolatesupwardinthereservoir,sinetheCO2
ismorebuoyantthantheotherliquids. UltimatelytheCO
2
willreahthetopofthereservoirwhereit meets an impermeable ap-rok,and is trapped. The CO
2
analso meetafault,whihtheCO
2
annotowthrough,whihmighttrapsomeoftheCO2
.AstheCO
2
owsthroughtheporousrok,itdisplaestheuidalreadypresent.While theCO
2
ontinues to ow, other uidswill replaeit, but someof theCO
2
willstaybehindasresidualdropletsin thepore spae,therebythenameresidualtrapping. Thisanbeomparedto howwateristrappedinasponge.
WhenCO
2
isinjetedintobrine,someoftheCO2
willbedissolvedinthebrine.ThiswillmakethedissolvedCO
2
slightlyheavierthanthebrine,anditwillbetrappedbysinkingtothebottom. Thisisknownassolubilitytrapping.
Mineral trappingis aslowproess that goesonfor thousandsofyears. When
CO
2
isdissolvedinwater,aweakarbonaidisformed. Overtimethisaidanreat with theminerals in therok and form solidarbonate minerals, whih
eetivelytrapsCO
2
bybindingittotherok.2.2 Reservoir Basis
Reservoirsimulation haisbeen performed for nearly half aentury to aid the
petroleum industry in deiding where oneshould plae, and how one should
operate, injetion and extration wells to optimize extration of fossil fuels.
There areseveral hallenges inreservoirsimulation,inludingupsaling ofthe
rok parameters, transport models, and numerial models [1℄. In this setion
wewillfousonthebasitransportandpressuremodel, whilesomeupsaling
usingthevertialequilibriumassumption willbehandledlater.
This partisasummarizedversionofanintrodutiongiven by Aarnes,Gimse,
andLie[1℄.
Porosityisthevoidvolumefrationofthemedium. Theporosityisdenotedby
φ
, andisin therange0 ≤ φ < 1
. Sinetheporosityis dependentonthepres-sure, therokis atuallyompressible. Thisompressibilityis oftennegleted
however,andoneassumesthat
φ
onlydependsonthespatialoordinates. For aNorthSeareservoirthepermeabilityistypiallyintherange0.1 − 0.3
.Thepermeabilityisameasureoftherok'sabilitytotransmitasingleuidat
ertainonditions. Theporeorientationandinteronnetionbetweenporesare
essentialfortheow,andthepermeabilityisstronglyorrelatedtotheporosity,
but notneessarily proportional to it. Thepermeability is denoted
K
and isgenerally a tensor, however we are often able to diagonalize
K
by a hangeofbasis. TheSI-unitforpermeabilityism
2
,but itisommonlyrepresentedin
Dary(D)ormillidary(mD).OneDaryorrespondstoabout
9.869×10 −13
m2
.The void pores in the medium are lled with dierent phases, likeoil, water,
CO
2
andbrine. Thevolumefrationoupiedbyaphaseisalledthesaturationof that phase,denoted
s i
for phasei
. Togetherthe phaseswill oupy allthevoidvolume,andthesumofsaturationsinaporewillthereforealwayssumto
1
.Eah phase haveadensity (
ρ
) and avisosity(µ
), whih are funtions of thephasepressureandtheompositionofeahphase. Thedensityandvisosityare
ompressible,buttheompressibilityismostimportantforgasphasesandare
thereforeoftenignoredin uidphases. Dueto interfaialtension,wealsohave
to dene a apillary pressure between phases, being the dierene in density
betweenthetwophases:
p cij = p i − p j
.Assumingthatallphasesmaybepresentatthesameloation,itturnsoutthat
theowofonephasedependsontheenvironmentinthatloation. Thismeans
thatthepermeabilityexperienedbyonephasedependsonthesaturationofthe
otherphasesat thatloation,aswellassthe phases'interation withthepore
walls. Wethereforehavetointroduearelativepermeability
k i
,whihdesribeshow phase
i
ows in the presene of the others. The eetive permeability experienedbyphasei
inthepresseneofotherphasesistherebygivenbyk i K
.2.2.2 Mass onservation
Theinompressibletransportequationdesribeshowthesaturationofaphase
hanges overtime. Oneimportantpartof this transportequation isthat it is
mass onservative, whih means that the total volume of aphase should not
hangeexeptforthevolumeinjetedorextratedofthatphase[1℄.Onesimple
modelforthetransportofaninompressiblephase
i
istheontinuityequation:∂ρ i φs i
∂t + ∇ · (ρ i v i ) = q i
(2.1)ImagefromDary[19℄.
Here wehaveintrodued asoureterm,
q i
, whihis the volume rateof phasei
. If wedenev = P n
i=1 v i
to bethetotal uxand assumeaninompressible ow,meaning theuid densityis onstant,wean sum Equation(2.1)for allphases,whih givesus:
∇ · v = X q i
ρ i
(2.2)
The veloities will be modeled using an empirialrelation alled Dary's law,
aftertheFrenhengineerHenryDary.
Dary's law
WhereasFourier'slawdesribesheatondutivityandFik'slawdesribesdif-
fusion,Dary'slawdesribesowinaporousmedium. Dary'slawwasformu-
lated by the Frenh engineer Henry Dary in 1856 [19℄.While experimenting
with waterleaning througha sandlter, Dary foundout that theltration
veloityis proportional to aombinationof the gradient of theuid pressure
andpull-downeetsdue tothegravity,meaningtheveloityis relatedtothe
pressureandgravityforesthroughtherelation:
v i = − k i
µ i
K(∇p i − ρ i g)
(2.3)Where
p i
isthephasepressureandg
isthegravityvetor.InFigure 2.3weseeaskethof the apparatusused byDary. He lled the largertube with sand,
twosmallertubesto therightandusedthistoformulatehis law.
SimilartoFourier'slaw,whihstatesthatheatowsfromwarmtooldplaes,
andFik'slaw,whihstatesthattheuxgoesfromregionsofhighonentration
tolowonentration,Dary'slawstatesthattheveloitygoesfromregionswith
highpressuretoregionswithlowpressure.
Dary's lawwill be used to onvert (2.1)into an equation only dependent on
thetotalveloity,aswellasbeingusedtoalulatethepressuredrivenveloity.
Assumingatwo-phase,inompressibleowwithequalpressure,whilenegleting
gravity,anddening
λ i = µ k i
i
,and
λ t = λ n + λ w
oneanwritethetotalveloityas:
v = −(λ n + λ w )K∇p
(2.4)andusethistoalulate thephaseveloitygiventhetotalveloityby:
v i = −λ i K∇p
= − λ i
λ t
Kλ t ∇p
= λ i
λ t
v
Puttingthisinto (2.1)givesusthenewequation:
∂ρ i φs i
∂t + ∇ · (ρ i
λ i
λ t
v) = q i
(2.5)Thiswillbedoneinmoredetail,withoutnegletingthegravity,in hapter 3.
2.3 The Vertial Equilibrium assumption
Fortheriskassessmentneededin CO
2
storage,fastsimulationsarevital. The vertialequilibriumassumption,inhydrologyknownastheDupuitassumption[20℄,is an assumptionof equilibrium of ow in thevertialdiretion, andan
be used to simplify the problem and aelerate the simulation. Theassump-
tionhaslongtraditionsin reservoirengineering, andwasearly onextendedto
handle multi-phase ow [16, 36, 37℄. Equilibrium of vertial ow means that
theowupwardin thereservoirisequalto the owdownward. Oneommon
misoneption is that this equalsto no ow in the vertial diretion,but the
assumptionisatuallyequivalenttoanassumptionofinnitevertialow[16℄.
Usingthis assumption,weansolveforthedepth-averagedlateralow,trans-
formingthe3Dmodelintoa2Dmodel. Thisresultsintwobenetswithrespet
to performane: Lessdata to ompute anda loosertime step restrition[34℄.
Thederease in datato omputeonisthemostobvious,aswegofrom 3Dto
butomesfromthefatthatthetimestepoftenisrestritedbytheoarsening
ofthevertialdiretion. Thebenetsofusingthevertialequilibriumassump-
tion with respet to the timestep restritionhas beenexplored by Ligaarden
andNilsen [34℄.
Another aspet of the risk assessmentis to use reliable simulations, so we do
not introdue more errorsthan neessary. At rst glane, the removalof one
dimensioninthevertialequilibriumassumptionmayappeartointroduelarge
errorswithrespettothe3Dmodel,buttheerrorintroduedisoftenlessthan
the errors introdued by the oarse resolution needed to make the 3D model
omputationallytratable[34℄.Thismeansthat, inmanyases,amodelbased
on the vertial equilibrium assumption is both a fast and reliable simulation
tool,giventhevalidityoftheassumption.
As atypial aquifertargeted forCO
2
injetion isseveral kilometers wide,butonly20-100metershigh, itseemsnaturalto assumeequilibriumin thevertial
diretion. Ithasbeenshownthatthevertialequilibriumassumptionisvalidif
thequantity
ω = K x /K z (∇h) 2 ≪ 1
[34℄,whih willbevalidformanyaquifersexeptforasmall areaaroundtheinjetionwell.
IthasalreadybeenshownbyLigaardenandNilsen[34℄,thatonewillgaingood
performanebenets, with respetto the full 3Dmodel, byusing the vertial
equilibrium assumption on amodel forCO
2
migration. Further in this thesiswewilltakeadvantageoftheparallelarhitetureoftheomputertoaelerate
thesimulationsfurther,bothusingthemulti-orearhitetureoftheCPU,and
theextremeparallelismof theGPU.
2.4 Parallel omputation
Theideaofparallelomputationisto splitthetaskathandintosmallertasks,
whihanbeperformedinparallelonseparateores,therebytakingadvantage
of all the ores of the mahine. Highly parallel tasks, suh as Monte-Carlo
simulations and expliit stenil omputation, an often perform many times
betteronhighly parallelarhitetures.
Overthelastvedeades,omputationalperformanehaveinreasedexponen-
tially, by an almost ontinuous inrease in lok frequeny, losely following
Moore'slaw[40℄.Afewyearsago,thistrendeasedduetophysialrestritions
ofwhatsilionhipsanwithstandwithregardstoheat,andnowperformane
inreasesthroughmultipleoresatlowerfrequeniesonthesamehip. Another
limitationenounteredwasthevonNeumannbottlenek,whihmeansthatthe
inreaseinbandwidthbetweentheCPUandthemainmemoryhavetobepro-
portionalto theperformane inrease,whihhasnothappened[7℄. Byadding
moreoresonthesamehip,oneangainbetter performane usingthesame
amountofenergy,ormoreperformaneonthesameenergybudget[45℄.Along
with moretheoretialperformane,youalsogainfastommuniationbetween
ores,asalloresresideonthesamehip,andtherebyshareon-hipmemory.
Even though standard CPUs are beoming inreasingly parallel, most of the
tetures. One ore on the CPU is marked, and the transistors dediated to
omputations are marked in the "Exeution Units" setion of the ore. The
green partof the Nvidia hip are the CUDA ores, whih eah have one FP
Unit and oneINT Unit whih are dediated for oat-point and integerarith-
metirespetively. Imagesfrom Intel[25℄ andNvidia[44℄.
CPUsandGPUs. ImagefromNvidia[44℄.
transistorson aCPU arestill used for non-omputationaltasks likelogi and
ahe. InFigure 2.4aCPUoreisompared toaGPUhip. Weseethat the
CPU ore has alot less transistors dediated to pure omputations than the
GPU.Usingthemulti-oreproessortogetherwithoneormorehighly parallel
aeleratorunits,liketheGPU,givesusaheterogeneousarhiteturewhihhas
proventobebeneialomparedto ahomogeneousarhiteturewhereaCPU
didalltheomputationalwork. Theaeleratorunitstypiallyusemanyores
runningatalowerfrequenythantheCPU,butmaximiseperformanegivena
xedpowerortransistorbudget[11℄.
2.5 General-Purpose omputation on Graphis
Proessing Units
TheuseoftheGPUasanaeleratorunitforproblems notdiretly relatedto
rendering,isalledGeneral-PurposeomputationonGraphisProessingUnits
(GPGPU). These kindof problemsinlude,butare notlimitedto: stenilal-
ulations,linearalgebra,audioproessing,andvideoproessing. Thereasonfor
usingtheGPUasanaeleratorunit,istheextremeparallelarhiteturewhih
is also needed for the main purpose of a GPU, 3D rendering. By transform-
ingthealgorithmsusedintoparallelalgorithms,oneantakeadvantageofthe
GPUtoaeleratetheproblem. Otherhyperboliproblems,suhastheshallow
waterequation, andsingle-phaseowofoilinaporousmedia, havepreviously
beenshowntobesuitableforGPUaeleration[10,12℄,so weexpetthatour
problemwill alsobesuitableforGPUaeleration.
Thetheoretial performane ofGPUs andCPUs areshownin Figure 2.5. We
learly see that the GPU will outperforma CPU if wean takeadvantageof
all thepoweronaGPU. Oneother importantobservationis that theGPU is
notaetedbythevonNeumannbottlenek[7℄,thisisbeauseoftheseparate
memory hanneleah multi-proessorhaveto themain memory,and that the
main memory is on the same hip as the proessors. The GPU is also an
inexpensiveaelerator unit,ifwe omparetheNvidia GeforeGTX480with
thefastestCPUontheWestmerearhiteture,IntelCorei7Extreme990X,the
respetively 1
.
Sine the GPU was originally made for 3D rendering, for whih there is no
needforhighpreision,thesedevieshaveuntilreentlyonlysupportedsingle-
preisionoating-pointnumbers. Thisisthereasonwhytherewasno support
forhigher preisionupuntil NvidiareleaseditsGT200 arhiteture June2008
[42℄. The main drawbak with this arhiteture was its double-preision per-
formane, whih was
1/4
of the single-preision performane. Inmarh 2010, NvidiareleaseditsGF100arhiteture,bestknownasFermi, whih featuredadouble-preisionperformaneof
1/2
of thesingle-preision performane, along withafullyIEEE754-2008ompliantoating-pointimplementation[42℄.Look-ing at Figure 2.5, we an see this sudden inreasein double-preision perfor-
manewhihame withthelatestrelease.
TheGPUarhitetureisdierentfromvendortovendor,andherewewillonly
explain the arhiteture of Nvidia GPUs, for the newest Nvidia arhiteture
youan useFigure 2.4 for referene. The newest Nvidia arhiteture onsists
of everything from 1 to 16 multiproessors, whih eah onsists of 32 ores.
Previousarhiteturesonsistedofupto30multiproessors,whiheahonsists
of8ores. TheseoresarethemainomputationalunitsontheGPU,onsisting
of oneoating point unit and oneinteger unit. The warpshedulers are the
mainontrolunits ofthemultiproessor,andtellseahorewhihinstrutions
to perform, as well as swith threads on the ores. Eah warp sheduler an
manage 16K registers, giving a theoretial 1024 threads for eah warp, but
the number of threads managed by one warp sheduler will derease as the
number of registers used inreases. Theoretially, the Fermi arhiteture an
holdupto32,768threads,whiletheGT200arhitetureanhold upto30,720
threads. The GT200 arhiteture also ontain one double preision unit on
eahmultiproessor,whiletheFermiarhitetureatuallyusestwooresasone
doublepreisionunit.
There are twowarpshedulers for eah multiproessor on Fermi, and one on
previous arhitetures. The newest arhiteture also onsists of four speial
funtion units(SFU)formathematialoperations,liketrigonometrifuntions
andsquareroots, wherethepreviousarhiteturesonsistedoftwosuhunits.
Lastlywehaveloadandstoreunits, whih read andwrites to globalmemory.
Thereare16suhunitsontheFermiarhiteture,whiletheseunitswasplaed
ontheoreonolderarhitetures.
SeveralAPIsonsists fortaking advantageoftheGPU asanaeleratorunit.
A fewyearsago,onehad tousethegraphis pipelinealongwitha3Drender-
ing API,like OpenGL and Diret3D,to getaess to the GPU. This requires
knowledgeof3Drendering,to solveproblemswhihmight havenothingto do
with 3D rendering. Several attempts for makingan APIto theGPU without
using 3Drendering APIs havebeen made [28, 51℄, but none of them reahed
thewantedpopularity. In2006Nvidia releaseditsown, platformindependent,
API, the Compute Unied Devie Arhiteture (CUDA) [44℄, and sine then
theinterestinGPGPU haveinreasedbothin theresearhandprivatesetor.
The main drawbakwith CUDA is the requirement of aNvidia GPU, and in
1
Priesfromhttp://www.komplett.noat2011-04-09
vendorindependentalternativetoCUDA[30℄.
Inthisthesiswewill useCUDAasourAPI, andtherearemainly tworeasons
for us not hoosing OpenCL instead. One of them is that we already have
a good knowledge of CUDA, and therefor do not have to learna brand new
API. Another, and more important reason, is that even though OpenCL is
vendorindependent,theperformaneisnotvendorindependent. OpenCLonly
guaranteesthattheprogramompilesandrunsondierentplatforms,but the
performanewillnotbethesame,eventhoughthearhitetureshavethesame
theoretial performane. In OpenCL one still haveto write platform spei
odetogainthesameperformane. Beauseofthis,webelievethatusingCUDA
asourAPIwillbebeneialinshowinghowourproblemanbeaeleratedby
usingtheGPU.
2.6 CUDA
In this setionwewill giveabrief introdution to the programmingmodel in
CUDA,and howtooptimizetheoderunningontheGPU[44℄.
2.6.1 Programming model
SinetheGPUhasnooperatingsystem,CUDAusestheCPUtoinitializedata,
opydatatotheGPU,andstarttheGPUalulations.
Hostand devies
IntheCUDAprogrammingmodel,theCPUontrolstheowofdata,whilethe
GPUonlyexeutessmallprogramswhentoldtobytheCPU.Thismaster-slave
relationship is also reeted in the jargon used in CUDA, where the CPU is
referredtoasthehostandtheGPUisthedevie. Itisalsopossibleforthehost
toontrolseveraldevies,butadevie anonlybeusedbyonehostatatime.
Kernels
Funtions exeuted on the GPU are referredto askernels. The kernelshave
aesstotheGPUmemoryareas,suh asglobalmemory,texturememoryand
onstantmemory,anddatamustbeopiedfromthehostmemorytothedevie
memory in order for the kernels to use the data. Whena kernelnishes the
results must be written bak to the main memory to be available for further
proessing,ortobeopiedbaktothehostmemory.
The interfaeprovided makesit easyto movedata betweenthe GPUand the
CPU,howeverthisisanexpensiveoperationanditisreommendedtokeepthe
numberofopyoperationstoandfromtheGPU ataminimum. Alternatively
oneoulduseasynhronousdatatransfers,andtherebybeabletoontinuethe
programowwhilethedataisopiedin thebakground.
2.6.2 Thread hierarhy
When starting a kernel, the multiproessor reates, manages, shedules and
exeutesthreadsin groupsof32 threads,alled awarp. Thesewarpsstartthe
sameprogram,butarefreeto branhandexeuteindependentlyofeahother.
Thewarpsarenotpartoftheprogramingmodelitself,butarevitaltobeaware
ofsinethebloksaremadeupofwarps,andeahblokishandledbyonewarp
sheduler, meaningthe numberof warps in ablokis equal to thesize of the
blok divided by
32
. Eah thread in a blok have itsown thread id, startingwith
0
in therst warp. Allthreads inablokhaveaessto thesamesharedmemoryspae, whih anbeusedto synhronizeexeutionsandommuniate
betweenthreads. Asthethreadsinablokhavetheirownthreadid,thebloks
inagridalsohavetheirownblokid. Theseidsareoftenusedtondthedata
athreadshouldworkwith.
The bloks form thegrid, whih basially only ontainsthe dimensionsin x,y
and zdiretionof thedata,as oneanorderthebloksandthreads in botha
1D,2D,and3Dordering.
Along with the thread hierarhy, there is also a orresponding memory hier-
arhy. The memoryspaes are physial spaes, andreside oneither the GPU
hip,themultiproessor,orontheomputationaloresthemselves. Thismeans,
threads ondierentmultiproessors,annotusethememoryonthemultipro-
essors to ommuniate. There are three main memory spaes on the GPU,
loalmemory,sharedmemoryandglobalmemory. Theloalmemoryresideon
the omputational ores, and no ommuniationan bedone in this memory
spae. Onewould atually tryto not use this memory spae at all, and only
usetheregistersas,these arealotfaster thanusingtheloal memoryspae.
Themainmemoryusedforommuniationisthesharedmemory. Thismemory
isonaper-blokbasis,soeahblokanusethismemoryspaetoommuniate
internally and synhronize the data if needed. Theshared memory is ahed
andresidesonthemultiproess,whihmakesitalmostasfastasusingtheloal
memoryspae.
Thelastmainmemoryspaeistheglobalmemory. Thisisthememoryspaethe
whole gridanaess,anditanbeusedto ommuniate betweenallthreads;
however, this memory spae is slow and it is reommended to minimize read
andwrite operationsin theglobalmemoryspae.
Figure 2.6showshowthe threads belong toablok, whih againbelongsto a
grid. Italsoshowstheorrespondingmemoryhierarhywhereathreadaesses
the per-thread loal memory, eah thread in ablok anaess the per-blok
[44℄
shared memory, and everythread anaess the global memory. We also see
that twodierent gridsanaess theglobal memoryat the sametime. This
is something that rst ame with the Fermi arhiteture, whih allowed for
exeutionoftwokernelsatthesametime. OnGPUsreleasedbeforeFermi,this
wasnotpossible,andonlyonegridouldaess theglobalmemoryat atime.
Aonsequeneofthisisthat onehavetobearefulonthenewerarhiteture,
so two gridsdo not work on the same set of data in global memory, asthat
ouldleadtoorruptdata,aswellraeonditionsthatinhibitperformane.
2.6.3 Performane optimization
Even though a straightforward GPU implementation of highly parallel prob-
lems often will givegoodperformane gains byitself, the performaneanbe
inreasedfurtherbyavoidingextensivebranhing,memoryfething,andmem-
oryopies[43℄.
Branhing
Eah warp started by the multiproessor runs the same instrutions simulta-
neously. Sowhen one,orseveral, threads divergefrom therest ofthethreads
beauseof somedata-dependentbranhing,therest ofthethreads inthewarp
willaetthethroughputofdatainanegativeway,andonehavetomakesure
toavoiddatadependentbranhingasmuh aspossible.
As the warp size is 32, whih is the number of ores on a multiproessor, it
will bebeneial to keep thesize of thebloksamultiple of32. This wayall
theoresonamultiproessoranworkwith thesameblokat thesametime,
makingsurenothreads areleft idle.
Avoiding branhes is not neessarilyan easy task, as thebranhesdepend on
the data. What often is done is to alulate the outome of all the possible
branhes,and thendeidewhihshould beusedforthisthread. Thisway,one
will stillhaveabranh attheend, but thisis shortand youwill minimizethe
numberofylestheotherthreadswillhavetowait.Onsomeproblemsonean
atuallyremovebranhingalltogetherbyusingmax/minoperationsaswellas
bitoperationsinstead.
Memoryfething
OneoftheslowestoperationsontheGPUitselfisreadingandwritingtoglobal
memory, sine the global memory is not ahed. There are, however, ahed
memoryspaesthatanbeused: Constantandtexture. Bothofthesememory
spaes are read-only, whih meansyouwill haveto aess the global memory
spae when you are storing you result, but for reading it is preferred to use
eitheronstantortexturememory.
It is also important to do oalesed memory reads and writes. This means
that you align the data so the data needed for one blok lays onseutivein
memory. By doing this, you an read more data at the same time, up to 64
bytes at a time on arhitetures before Fermi and up to 128 bytes on Fermi
arhitetures,drastiallyreduingthenumberofmemoryfethesandinreasing
theperformane.
Anothertrikoneandotoinreasespeed,istowritebaktoglobalmemoryas
soonasyouhavealulatedthedata. SinethereadandwritesontheGPUare
asynhronous,thewritebaktoglobalmemorywillhappeninthebakground
whilethethreadsareontinuingtheiromputations.
Memoryopies
As already stated, oneshould minimize the numberof memoryopies. What
this meansisthat datathat neverhanges duringexeution, should beopied
over when starting the appliation, and never again. This gives an overhead
when starting the simulations, but minimizes the number of opy operations
during thesimulations. As anexample,weopythe whole grid,wells, et. to
the GPU when starting the simulation and only swap the new ux and well
dataduring thesimulations.
Even thoughCUDA is madeto inreaseperformane, the rules of ode read-
abilityand performane optimizationthat exists forprogrammers, should still
apply when writing GPU aelerated programs. It is, as with any other pro-
grams, highly unlikely that no one will ever touh your ode again, so it is
importantthatyouwriteodethatispossibleforotherstoread. Withregards
toperformaneoptimizationitisstated: Weshouldforgetaboutsmalleien-
ies,sayabout97%ofthetime: premature optimizationistherootofallevil.
Knuth[32,page268℄.
This does not mean optimization should not be done at all, but wait until
youhaveaworking solutionbefore startingtheoptimization. Anotherbenet
of waiting with optimizing, is that it is easy to optimize smarter. You an
analyse theprogram by runningit throughaproler, and then onlyoptimize
the bottleneks of your ode, instead of randomly optimizing the parts you
believeare theslowest. Further it meansthat youin 97%of theasesshould
negletsmall inreasesin performane, whihleaves3%where youshould not
neglet small inreases in performane. Sine wewish to run the simulations
several times, a speedup of 2% will give us one extra simulation every 50th
simulation,whihin thelongrun willgrowand makeitpossibleto workmore
eiently.
Mathematial models and
numerial simulations
In this hapter we will introdue the full 3D model of our problem. Wewill
then use the vertial equilibrium assumption and integrate the model into a
orresponding 2D equation before we derive a numerial sheme of the new
model, using anite volumemethod. Next, we introdue theCFL ondition,
beforedisussing dierentalgorithmsforimplementingthesheme. Lastly, we
willdisussinmoredetailthehoiesmadetoinreaseperformaneofourGPU
solver.
3.1 The Vertial Equilibrium model
Toderiveourmodel,westartwiththebasitransportequationfromEquation
(2.1), along with Dary's law for onephase ow from Equation (2.3). Inour
problem weareonlyinterestedintwophases,CO
2
andbrine. Wealsonegletbothrokanduidompressibility,aswellasassumeasharpinterfaebetween
CO
2
andthe brine. Thereasonweanassumethat CO2
is inompressible, is that CO2
isatuallyinaliquidstateatthedepthsweareinterestedin.Assuming a two-phase, inompressible ow with equal phase pressures, and
dening
λ i = µ k i i K
,λ t = λ co 2 + λ w
, andf (S) = λ λ co t 2
, we anwrite the totalveloityas:
v = −λ t ∇p + (λ co 2 ρ co 2 + λ w ρ w ) g
(3.1)whihgivesusanexpressionfortheveloityofCO
2
,giventhetotalveloity:Figure3.1: IllustrationoftheCO
2
plumeasassumedbytheVEmodel. ImagefromLigaardenandNilsen [34℄.
v co 2 = −λ co 2 (∇p − ρ co 2 g)
= −f (s)(λ t ∇p − λ t ρ co 2 g)
= f (s)(−λ t ∇p + λ t ρ co 2 g)
= f (s)(v + λ w ρ w g − λ w ρ co 2 g)
= f (s)( v − λ w ∆ρ g )
where
∆ρ = ρ co 2 − ρ w
. Byaddingand subtratingλ co 2 ρ w g
in Equation(3.1),andletting
S
bethesaturationofCO2
,weanrewriteoursysteminto:∂φS
∂t + ∇ · f (S)(v + λ w (S)∆ρg) = q co 2
v = −λ t (∇p − (f (S)ρ co 2 + (1 − f (S))ρ w ) g )
∇ · v = q tot
(3.2)
whih istheproblem we areinterestedin. Our fouswill beonimplementing
theontinuityequation,while Dary'slawwillbeusedtoupdate thepressure.
The next stepis to vertially integrate Equation(3.2). Todo this, we dene
theaveragedsaturation
s = h/H
tobetherelativeheightoftheCO2
plumeasillustratedinFigure3.1. Thus
s
isafuntionof timet
andspatialoordinatesx
ofthereservoir.Thevertialequilibriumassumptionanbeusedinseveraldierentways;where
themoststraightforwardisavertialaveragingoftherokparameters. Aswe
ansee from Figure 3.1, the CO
2
onlytakes upasmall portion of the heightof theaquifer. This mightresultin largeerrorswhere thepermeabilityis not
homogeneous in the vertial diretion, whih easily an be seenby using the
Johansen formation, where the permeability inreases towards the top of the
formation. Thismeansowwouldgotooslowbyonlyaveragingthepermeabil-
ity. Byslightlymodifyingtheassumptionto avertialequilibriumin theCO
2
lledpartoftheaquifer;weaninsteadintegratethepermeabilityintherange
0
to
h
,whihisthepermeabilityatuallyinueningtheCO2
ow. Thesamear-gumentholdsfortheporosity,butsinetheporosityisrelativelyhomogeneous,
theerrorwill notbeaslarge. It hasbeenshownbyLigaardenand Nilsen [34℄
that integrating the permeability will givea better approximation than aver-
agingthepermeability. Itwillalso requireus toalulate thepermeabilityfor
everytimestep,meaningweinreasetheomputationalostsome.
3.1.1 Vertial Integration
Forthe vertial integration, we haveto dene avertially integrated porosity
andintegratedpseudomobilities[34℄.Theintegratedporosityisgivenby:
Φ(s, x) =
Z sH(x) 0
φ(z, x) dz
(3.3)whiletheintegratedpseudomobilitiesaregivenby:
λ ˜ co 2 (s, x) =
Z sH(x) 0
k co 2 (1) µ co 2
K || (z, x) dz, λ ˜ w (s, x) = Z H(x)
sH (x)
k w (1) µ w
K || (z, x) dz
(3.4)
Thesepseudomobilitiesgivetheorrespondingfrationalowfuntions:
f ˜ (s, x ) = ˜ λ co 2 (s, x )
λ ˜ co 2 (s, x) + ˜ λ w (s, x) , f ˜ g (s, x ) = ˜ λ w (s, x ) ˜ f (s, x )
(3.5)LastlyourVE equivalentofthe apillaryterms reads,when disregardingap-
illary fores in theunderlying model,
p c (s, x) = H (x)g ⊥ ∆ρs
, whereg ⊥
is thegravity omponent perpendiular to thesurfae. This givesus thenew, verti-
allyintegratedmodel:
∂Φ(s, x)
∂t + ∇ || · h
( ˜ f (s, x)v ve + ˜ f g (s, x)[g || (x) + ∇p c (s, x)] i
= q co 2 (x, t)
∇ || · v ve = q tot (x, t) v ve = − ˜ λ t (s, x) h
∇ || p t −
f ˜ (s, x)ρ co 2 + (1 − f(s, ˜ x))ρ w
g || (x) i
−
˜ λ w (s, x)∇ || p c (s, x)
(3.6)
Toonsider residualtrapping, the evaluation of the relative permeabilities in-
volvestheresidualsaturations
S rw
andS rco 2
forbrine andCO2
. Thenk co 2
isevaluated at
1 − S rw
, whilek w
is evaluated at1 − S rco 2
where the historialtion. Moreover
Φ(s, x )
is multipliedby1 − (S rw + S rco 2 )
where thehistorialmaximumislargerthantheurrentaveragedsaturation,and
1 −S rw
elsewhere.If theporosity is onstant in thevertialdiretion,weansimplify
Φ(s, x)
toφ(x) · sH(x)
. Usingthis,thetransportequationtakesthesimplerform:φ( x ) ∂H( x )s
∂t + ∇ · h
f ˜ (s, x ) v ve + ˜ f g (s, x )[ g || ( x ) + ∇p c (s, x )] i
= q co 2 ( x , t)
(3.7)Tosimplify notationsin thenextsetion,wedenethetotaluxas:
f (s, x) := ˜ f(s, x)v ve + ˜ f g (s, x)[g || (x) + ∇p c (s, x)]
3.2 The Finite Volume Method
Inthissetionwewillusethenitevolumemethodtoderiveashemewhihan
beusedtosolveourmodelprobleminEquation(3.7). Thismeanspartitioning
the model into ells, and integrating overthese ells. Looking at ell
C i,j
weget:
Z
C i,j
φ(x) ∂H(x)s(x, t)
∂t dx +
Z
C i,j
∇ · f (s, x) dx = Z
C i,j
q co 2 (x, t) dx
(3.8)If weassumethe sizeof theellstends towardszero,weanalsoassumethat
theporosityisonstantinsidetheell.
φ i,j
Z
C i,j
∂H(x)s(x, t)
∂t dx +
Z
C i,j
∇ · f (s, x) dx = Z
C i,j
q co 2 (x, t) dx
(3.9)IntegratingEquation(3.9)intime,wegetanexpressionforthenexttimestep:
φ i,j
Z
C i,j
H(x)(s(x, t n+1 )) − s(x, t n )) dx + Z t n+1
t n
Z
C i,j
∇ · f (s, x) dx dt
= Z t n+1
t n
Z
C i,j
q co 2 (x, t) dx dt
(3.10)
Theintegrationoftheuxin spae,anbewritten as:
Z
C i,j
∇ · f (s, x) dx =
Z x i+1/2
x i − 1/2
f (s, x, y j+1/2 ) − f(s, x, y j−1/2 ) dx +
Z y j+1/2
y j − 1/2
f (s, x i+1/2 , y) − f (s, x i−1/2 , y) dy
(3.11)
Nowwesimplifytheequationsfurther,bydeningtheaveragedsaturationina
ell
S i,j
,thefaeuxesF
andG
,aswellasthetotalvolumerateofsoureinaell
Q
by:S i,j n = 1
|C i,j | Z
C i,j
H (x)s(x, t n ) dx
F i,j n = 1
|C i,j | Z t n+1
t n
Z x i+1/2
x i− 1/2
f (s, x, y j ) dx
G i,j n = 1
|C i,j | Z t n+1
t n
Z y j+1/2
y j − 1/2
f (s, x i , y) dy
Q n i,j = 1
|C i,j | Z t n+1
t n
Z
C i,j
q co 2 ( x , t) d x
Thesedenitions simplifyournumerialshemeinto:
S i,j n+1 = S i,j n − 1 φ i,j
h F i,j+1/2 n − F i,j−1/2 n + G i+1/2,j n − G i−1/2,j n − Q n i,j i
(3.12)
Asweanseefromtheequation,thehangeinellsaturationonlydependson
thesoure inside theellitselfand theuxesovertheellfaes. Thisis, asit
shouldbe,adisreteequivalenttoEquation(3.7),whihstatesthatthehange
at apointin thereservoironlydependsonthesouresatthatpointaswellas
ontheuxesintoandoutofthatpoint.
Up tothispoint,noapproximationexeptfordisretizationofthedomainhas
beendone,whihmeansthatthisequationisexatasthesizeoftheellstends
towardszero. Fromhereonout,however,wewill havetoapproximatethefae
uxesaswellasthetimeintegralsto getafullyexpliitsheme.
Therststepinmakinganexpliitshemeistoapproximatethetimeintegrals
bythevalueat theprevioustimestep. Thismeansusing:
Z t n+1
t n
f (t) dt ≈ ∆tf (t n )
Alternatively we ould have used the same time step as we are solving for,
whih would have made an impliit sheme, or a Gaussian quadrature rule
to approximate the integral,whih would make aRunge-Kutta time-stepping
Raphson method to solve the orresponding disrete nonlinear system, whih
is notthat easy to transforminto aparallelalgorithm, this isthe reasonwhy
we stik to an expliit sheme. In reent publiations, however, the impliit
sheme hasbeenusedwith suesswheretheGPUwasusedbothtoalulate
thepreonditionerandsolvethelinearsystem[5,31℄.
Ourhangeoverafaeisgivenby:
F i,j−1/2 n =
Z x i+1/2
x i − 1/2
f ˜ i,j−1/2
h
v ve i,j−1/2 + ˜ λ w i,j−1/2 (g || i,j − 1/2 + ∇p c i,j− 1/2 ) i
(3.13)
Sinetheveloityandgravityarealulatedonaperfaebasis,theproblemlay
inndinganapproximationtothefrationalowrateandthepseudomobilities.
Sine thefrational owrate isexpressed from thepseudomobilities, weonly
havetondanexpressionforthepseudomobilitiesoverafae.
3.2.1 The Upstream Mobility Weighting Sheme
Theupstreammobilityweightingshemeassumesthattheowoveraellonly
travelsin onediretion,and themobilityoverafaeis onlydependent onthe
mobilitiesoverthetwoneighboringells. Wealsoassumeapieewiseonstant
saturationovereahell. Thisgivesthefaepseudomobilities:
˜ λ co i,j−1/2 2 =
˜ λ co i,j 2
if
(v ve > 0
and(g || + ∇p c ) > 0)
or( v ve − λ ˜ w i,j−1/2 ( g || + ∇p c ) > 0)
˜ λ co i,j−1 2
else(3.14)
˜ λ w i,j−1/2 =
˜ λ w i,j
if
(v ve > 0
and(g || + ∇p c ) > 0)
or(v ve − λ ˜ co i,j−1/2 2 (g || + ∇p c ) > 0)
˜ λ w i,j−1
else(3.15)
This alsoholdsfor
λ ˜ i,j+1/2
,λ ˜ i−1/2,j
and˜ λ i+1/2,j
andwill beusedtoalulatethefrationalowarossthefae.
3.3 The CFL ondition
The CFL ondition is a neessary ondition whih must be satised for the
numerialmethodtobestable,andtherebyonvergetothesolutionasthegrid
isrened.
CFL Condition [33℄: A numerial method an be onvergent only if its nu-
merial domain of dependene ontains the true domain of dependene of the
PDE, atleastinthe limitas