Machine Learning for Exploration of Silica Structures Using Molecular
Dynamics
Hans Erlend Bakken Glad
Thesis submitted for the degree of
Master in Computational Science: Materials Science 60 credits
Department of Physics
Faculty of Mathematics and Natural Sciences
UNIVERSITY OF OSLO
Machine Learning for Exploration of Silica Structures Using
Molecular Dynamics
Hans Erlend Bakken Glad
© 2021 Hans Erlend Bakken Glad
Machine Learning for Exploration of Silica Structures Using Molecular Dynamics
http://www.duo.uio.no/
Printed: Reprosentralen, University of Oslo
Abstract
We use molecular dynamics and machine learning for exploring structures in alpha quartz, a crystalline form of silica. This material is simulated at the molecular level, and structures are created by carving out atoms from different locations in the material. By removing atoms, we can change the physical properties of the material, such as the yield stress. We use simplex noise (a type of pseudo-random noise) to sample different structures in alpha quartz, and we demonstrate that these structures represent materials with a wide range of yield stresses. The yield stresses are obtained by simulating tensile strain on the systems.
Using data from many systems of alpha quartz, we train a convolutional neural network (CNN) for predicting the yield stress of alpha quartz structures.
This CNN is used in a search algorithm that tries to find structures that are stronger than the structures used in training the CNN. The structures are restricted so that they have roughly the same porosity. The search algorithm was successful in finding stronger structures, and the yield stress of the strongest material is estimated at 3.412 GPa. We show that the search algorithm can be modified to instead look for weaker materials, and also that it can be used to search for materials with a specific yield stress. The weakest material found was estimated to have a yield stress of 2.103 GPa.
We find that the strongest alpha quartz structures follow a distinct pattern.
The cuts in these structures are arranged in a hexagonal shape, thus maximizing the minimum distance between individual cuts. In weak structures, we find that there are usually many cuts that are placed close to each other. These results indicate that the yield stress of alpha quartz has a notable dependence on how the cuts are placed relative to each other. To maximize the yield stress, cuts should be placed as far apart as possible.
Acknowledgements
First of all, I would like to thank my supervisors Anders Malthe-Sørenssen and Henrik Andersen Sveinsson. Their guidance in the form of concise feedback and suggestions on how I should go about this study has been invaluable. Our weekly meetings have been very important for the progression of this thesis. I would also like to thank my fellow student and collaborator Christer Dreierstad.
We had many fruitful discussions during the time we worked together.
Finally, I want to thank my friends and family for their support, and for helping me with staying motivated and sane throughout these years.
Contents
1 Introduction 1
1.1 Properties of graphene and silica . . . 1
1.2 Estimating material strength . . . 2
1.3 Designing material structures . . . 2
1.4 My contribution . . . 3
I Theory and Methods 5 2 Background 6 2.1 Stress . . . 6
2.2 Strain . . . 7
2.3 Griffith’s theory for fracture . . . 8
2.4 Material porosity and cut density . . . 8
2.5 Machine learning in materials science . . . 9
3 Molecular dynamics 10 3.1 Time integration . . . 11
3.1.1 Velocity-Verlet . . . 11
3.1.2 Statistical ensembles and time integration . . . 12
3.2 Simulating stress and strain . . . 13
3.2.1 Savgol smoothing of stress data . . . 14
3.3 MD precision . . . 15
3.4 MD potentials . . . 16
3.4.1 AIREBO . . . 17
3.4.2 Vashishta . . . 17
3.5 Thermostats . . . 18
3.5.1 Langevin . . . 18
3.5.2 Nose-Hoover . . . 19
4 Machine learning 20 4.1 Neural networks . . . 20
4.1.1 Feedforward neural networks . . . 20
4.2 Forward pass and activation functions . . . 21
4.3 Cost and loss functions . . . 23
4.4 Backpropagation . . . 24
4.5 Gradient descent . . . 27
4.5.1 Stochastic Gradient Descent (SGD) . . . 28
4.5.2 The Adam optimizer . . . 28
4.6 Convolutional neural network . . . 29
4.6.1 The convolutional layer . . . 30
4.6.2 Pooling . . . 30
4.6.3 Choosing a CNN architecture . . . 31
4.7 Measuring the performance of a neural network . . . 31
4.8 Implementing saliency maps for visualizing CNN decision-making 32 4.8.1 Comparison with von Mises stress . . . 33
5 Generating new structures by introducing cuts in materials 34 5.1 Predicting yield stress with a convolutional neural network . . . . 34
5.2 Types of systems studied . . . 35
5.2.1 Graphene sheet . . . 35
5.2.2 Crystalline silica (alpha quartz) . . . 36
5.2.3 Amorphous silica . . . 36
5.3 Sampling methods . . . 37
5.3.1 Cut sequences . . . 37
5.3.2 Random, uniform sampling . . . 37
5.3.3 Manual cuts . . . 38
5.3.4 Simplex noise . . . 40
5.3.5 Shuffling cuts in a cut configuration . . . 45
5.4 Results from graphene sheet deformation . . . 46
5.5 Overview of silica datasets . . . 46
5.5.1 Amorphous silica datasets . . . 47
5.5.2 Alpha quartz datasets . . . 52
6 Searching for systems with specific properties 56 6.1 Search algorithm for finding stronger structures . . . 56
6.2 Generating samples with desired yield stress . . . 57
7 Methods for analysis of structures 58 7.1 Radial distribution of cuts . . . 58
7.2 Heatmap of combined configurations . . . 60
8 Software and hardware 61 8.1 LAMMPS . . . 61
8.2 Python . . . 61
8.3 Computing clusters . . . 61
II Results 62 9 Circular cut in silica (Griffith’s theory test) 63 10 Exploration of simplex noise structures in alpha quartz 65 10.1 Training CNNs on structures generated with simplex noise . . . 65
10.2 Simplex scale and characteristic cluster length . . . 67
10.3 Configurations found using search algorithm . . . 67
10.3.1 Strongest performers . . . 69
10.3.2 Weakest performers . . . 70
10.3.3 Configurations generated with target yield stress . . . 70
10.3.4 Shuffle placement of cuts in strongest configuration . . . 72
11 Characterizations of structures in alpha quartz systems 74 11.1 Radial distribution of cuts . . . 74
11.2 Combined image of strongest and weakest configurations . . . 74
12 Saliency maps 77 III Summary and conclusions 81 13 Discussion 82 13.1 Crystalline vs. amorphous silica . . . 82
13.2 Comparisons with Griffith’s theory of fracture . . . 82
13.3 Effect of simplex scale on yield stress . . . 83
13.4 Average structure of strong and weak configurations . . . 83
13.5 Distribution of cuts . . . 84
13.6 Performance of search algorithm . . . 84
13.7 Designs with highest and lowest yield stress . . . 85
13.8 Search algorithm on shuffled cuts . . . 85
13.9 Samples generated with desired yield stress . . . 85
13.10Interpreting CNN predictions using saliency maps . . . 86
14 Conclusion 88 14.1 Usability of the search algorithm . . . 88
14.2 Effect of structure on material strength . . . 88
A Results from using poor Savgol smoothing 90
Chapter 1
Introduction
In everyday life, we are surrounded by materials of different types. A material is the matter from which an object is made or can be made out of. Wood, sand, gravel and oil are examples of naturally occurring materials, while some materials are man-made, such as steel and plastics. The applications of a material and its reliability depends on its physical properties.
In a modern home, everything from furniture and kitchen appliances, to the latest in smartphone and computer hardware is made with some combination of materials. On a larger scale, enormous amounts of different materials are used for building apartment complexes, skyscrapers, roads and bridges. In creating these various tools, appliances and large-scale constructions, the choice of material is important since different materials have different physical properties. A highway bridge should be built using materials that can sustain a high load without breaking, while at the same time the costs and practicality of working with the materials must be considered. When engineering electrical components and computer parts, other properties such as electrical conductivity and resistivity are more important, the material strength less so.
1.1 Properties of graphene and silica
Clearly, the ability to control material properties is valuable, as everything from mundane furniture to computer processors are limited by the materials they are comprised of. We will be looking at the properties of graphene and silica, and how these properties can be changed by altering their structure.
Graphene is a man-made material consisting of an atomically thin layer of carbon atoms [1], and was first synthesized by Novoselov et al. in 2004. In that study, graphene as thin as a single layer of atoms was produced. In addition to being very lightweight, this material has other desirable properties such as high tensile strength and high thermal conductivity [2]. Silica, or silicon dioxide, is found naturally in sand, rocks and even inside living organisms, and it is used in industries such as the production of computers and electronics [3]. The material is important to the semiconductor industry because it is an excellent insulator [4].
1.2 Estimating material strength
We would like to investigate how the properties of graphene sheets and silica blocks can be changed by making changes to their structures. Specifically, we are interested in the strength of the materials. A suitable experiment for investigating these properties is to stretch the material until it breaks. At the molecular level, materials break due to the breakage of interatomic bonds. This makes molecular dynamics (MD) simulations a reasonable choice for studying material strength and stretchability. Studies such as Bauchy et al. [5] and Wu et al. [6] have used MD simulations for estimating the strength (in addition to other properties) of cement and crystalline cellulose, respectively. We would like to use similar MD methods for estimating the strength of different graphene and silica structures.
1.3 Designing material structures
Recently, machine learning models have proven to be useful tools in designing new materials [7, 8]. Machine learning models are statistical models that learn from the data that they are provided, without being given explicit instructions on how the data should be fitted [9]. In the paperAccelerated search and design of stretchable graphene kirigami using machine learningby Hanakata et al. [8], designs that maximize the stretchability of graphene sheets were found with a search algorithm that made use of machine learning methods. Different designs were made by cutting out rectangular pieces of the material.
In this thesis, we start by implementing the same molecular dynamics and machine learning methods as Hanakata et al. [8]. This will be a sort of preparation that allows us to familiarize ourselves with the methods before moving on to more original work. In the main part of the thesis we will use the same procedure and similar methods on different materials, specifically blocks of silica. Instead of stretchability, we will focus on the strength, oryield stress, of the material. The schematic in Figure 1.1 provides an overview of the process of designing new materials with these methods. We will be using a more advanced method than Hanakata et al. [8] for sampling the possible cut shapes. The shape of the cuts in each structure will be sampled using simplex noise, an algorithm that has been used for generating textures in computer graphics [10]. With this type of noise, we will generate a wide range of unique silica structures. We will then investigate the degree at which these structures affect the yield stress of the material. For example, if we create multiple structures that all have the same amount of silica removed from them, are there structures that are significantly stronger than others? What distinguishes a weak structure from a strong structure? The search algorithm by Hanakata et al. [8] will also be implemented, and in the same vein we will use it to find optimal structures that maximize the yield stress. We will modify the search algorithm so that it can be used to search for weak structures as well, and we also make modifications that allow us to search for structures with a specific yield stress.
Figure 1.1:The process of designing new materials using machine learning. Illustration obtained from Liu et al. [7].
1.4 My contribution
In this thesis I have estimated the yield stresses of silica structures by performing thousands of molecular dynamics simulations. I generated numerous different silica structures by cutting out parts of the material, and the yield stress was obtained for each of these structures. Simplex noise [10] was used for sampling the cut patterns. I used this data to train a machine learning algorithm for predicting a silica structure’s yield stress with the structure itself as the input, in the form of an image. I then used the model in a search algorithm for finding stronger structures than previously seen. The implementation of the search algorithm was inspired by the search algorithm in Hanakata et al. [8], where the authors used it to search for graphene structures with high yield strain. I was also able to extend the functionality of my version of the search algorithm, allowing the search for weaker structures or structures with a specific yield stress.
As far as I’m aware, the application of simplex noise for creating new structures in atomic systems has not been explored to a significant degree. In this thesis, it has been demonstrated that simplex noise can be used to sample many different types of structures. I show that these material structures have a wide range of yield stresses, even when controlling for the amount of material that is carved out in order to create the structures. Tools for characterizing strong and weak structures were developed, and they were used to show that typical strong and weak structures in silica have different cut patterns.
The search algorithm was successful in finding structures that were stronger than previously seen. Using the search algorithm to search for weaker structures also worked well. Structures with specific (target) yield stresses were also found using the search algorithm. The precision at a given target yield stress seems to be proportional to the number of training samples the model has seen with
similar yield stresses.
Part I
Theory and Methods
Chapter 2
Background
This chapter contains definitions of central physical quantities such as stress and strain and theory on elastic fracture including Griffith’s theory on fracture. The application of machine learning in materials science is also introduced. The theory and definitions of stress and strain are fromAtomistic Modeling of Materials Failureby Buehler [11], and Griffith’s theory on fracture is covered using the book Fracture Mechanicsby Zehnder [12].
2.1 Stress
The stress in a material is a physical quantity that measures the internal forces present in the material. A material that has a forceFapplied to one of its surfaces with areaAgives rise to a stressσin the material:
σ= F
A , (2.1)
which is given in units of pascals, Pa=N/m2. Note that the above equation does not account for the direction of the force or which surface it is applied on. When studying materials in a three-dimensional space, it is suitable to instead work with a more general expression for stress. This is known as the stress tensor and contains multiple components that account for the directions of the forces and the orientations of each surface.
Defining the stress tensor: In the three-dimensional case, we use the index ito refer to the direction of surface normal iandj refers to the direction of the force. Each index can be either x, y or z, as in a familiar Cartesian coordinate system. The stress tensorσijis then defined as
σij = Fij
Ai , (2.2)
whereFijis the force acting on the surface with normal vectorniin the directionj andAiis the area of that surface. This is illustrated in Figure 2.1. The components in the tensor wherei = jrepresent normal stresses whilei 6= jrepresent shear stresses. Fori6= j, conservation of angular momentum leads to symmetry of the stress tensor, so that
σij =σji. (2.3)
In this thesis, we focus on tensile stress in they-direction. Note that the y- and z-axes are swapped in our simulations compared to Figure 2.1. Tensile stress is normal stress where the direction of the force is positive, meaning that the material is stretched.
Figure 2.1: The components in the stress tensor with surface normalsi = xandi = y.
Obtained from Buehler [11].
The yield stress: Zehnder [12] writes that the yield stress (also called yield strength) is the amount of stress required for the material to irreversibly deform.
We take this to be the highest amount of stress recorded during a molecular dynamics simulation, with some smoothing applied to the data beforehand (section 3.2.1 describes how a Savgol filter can be used for this purpose).
2.2 Strain
The straineon a material is a unitless quantity that expresses the relative change in the dimensions of the material. If we have a hypothetical one-dimensional material with initial lengthLthat is stretched so that the length increases by∆L, the strain becomes
e= ∆L
L = L+∆L
L −1 . (2.4)
Similar to the stress tensor, the strain tensor provides a general expression for strain in three dimensions. The strain tensor can be written in terms of the
deformationseij ,
eij = 1
2 eij+eji
= 1 2
∆Li Lj + ∆Lj
Li
. (2.5)
For tensile strain we have thati= j, so the strain tensor reduces to eii= ei = 1
2 ∆Li
Li + ∆Li Li
= ∆Li
Li . (2.6)
Tensile strain is the focus in this study. The yield strain is the amount of strain required for the material to irreversibly deform, which will be at the same point as the yield stress in a molecular dynamics simulation.
2.3 Griffith’s theory for fracture
The theoretical maximum strength of a material is limited by the interatomic bonds in the material. In practice, flaws in the material are what limits their strength. Griffith demonstrated that the fracture stress in glass is on the order of 100 times lower than its theoretical strength.
For a plate undergoing tension with a crack of length 2a, the fracture stressσf can be expressed as
σf =
rGcE
πa , (2.7)
where Gc is the toughness of the material and E is Young’s modulus. To test if the relation between fracture stress and half crack lengtha holds, A. Griffith performed an experiment using glass tubes and spheres. A glass cutter was used to introduce a crack in each of the samples, then pressure was applied until the object cracked. The pressure at this point is the fracture stress σf. Figure 2.2 shows the results from Griffith’s experiment, where we see that fracture stress is indeed linear with respect to 1/√
a.
We will later show that this relation also holds for alpha quartz systems simulated at a molecular level. By introducing circular cuts into the material and straining the system until breakage, we can obtain a measure of the yield stress.
2.4 Material porosity and cut density
The porosity φ of a material is the volume of empty space Vp in the material divided by the total volumeVof the material [13],
φ= Vp
V . (2.8)
In this study, new material structures are created by cutting out parts of the material, changing its porosity. When working with blocks of silica, cuts are only added to a small region in the material. The overall change in porosity is thus not very high, so it is more meaningful to talk about the porosity of the region where cuts are allowed (thecut region). We will refer to this local porosity as the
Figure 2.2:Fracture stress as function of half crack length, taken from Zehnder [12]. The points are recorded data points while the line is a linear fit to the data.
cut densityof the system. This is a useful measure because it lets us control for the overall reduction in material strength from removing parts of it. This makes it easier to investigate the properties of different structures by comparing structures with the same cut density.
2.5 Machine learning in materials science
Machine learning has opened up for new approaches in the exploration of physical properties of materials [7]. The use of molecular dynamics (MD) simulations for predicting the behaviour of materials is a well-established method with many uses, such as determining material strength (Bauchy et al.
[5], Wu et al. [6]), but there are limitations to this method. MD simulations can be computationally expensive and time-consuming, depending on factors such as the number of atoms in the system, the type of interactions being modelled, and the total number of timesteps to simulate.
With machine learning, statistical models can be built using data obtained from MD simulations. These models can be used to search for new materials with desirable properties more efficiently compared to only using traditional MD [7, 8]. In this thesis, machine learning will be used for similar purposes.
Chapter 3
Molecular dynamics
In this chapter, we go over the basics of molecular dynamics and its uses. The theory on this subject is primarily provided by Buehler [11]. Our implementation of molecular dynamics simulations using the software LAMMPS [14] is also detailed.
Molecular dynamics (MD) simulations is a valuable tool for studying materials. In these simulations, materials are represented by a system of atoms.
From Newton’s 2nd law, a particle with massmsubjected to a forceFexperiences an acceleration a = F/m. In MD this force is the result of interactions with other atoms and it is modelled using interatomic potentials. A few of these potentials are introduced later in this chapter. Assuming that the particle masses are known, the accelerationa on every particle in the system can be calculated with a given interatomic potential. Once the acceleration is known, the velocities and positions can be obtained through numerical integration. This is covered in the next section.
When the motion of all the particles is known, properties such as pressure, temperature and stress can be calculated using the contributions from every atom. For example, the temperature T in units kelvin can be calculated using the mean kinetic energyhKiof the particles:
T = 2 3
hKi
Nkb , (3.1)
where N is the number of atoms and kB is the Boltzmann constant, kB = 1.3806503·10−23m2kgs−2K−1.
An atomic system can be made to represent a material undergoing some physical process, allowing the use of MD to simulate that process at the molecular level. Material deformation is the process that is studied in this thesis and can be induced by forcing the atomic system to change its volume and/or shape [15]. The point of material failure where the material deforms irreversibly is of particular interest in this study, as we aim to use molecular dynamics to investigate the yield stress of materials.
Studies such as Bauchy et al. [5] and Wu et al. [6] have shown that MD simulations can provide a better understanding of material failure. In Bauchy et al. [5], the fracture toughness of calcium-silicate-hydrate (C-S-H) is evaluated at the atomic level. C-S-H is responsible for the mechanical properties of cement,
and studies like this can aid with the design of tougher cement paste. Wu et al.
[6] studied the effects of tensile deformation on crystalline cellulose, providing insight into the mechanical properties of the material such as the yield stress and strain.
3.1 Time integration
When the atomic system is initialized in a MD simulation, the current positions and velocities of each atom are set. To advance the molecular dynamics system forward in time, we require a set of equations that can be numerically integrated in order to compute the new positions and velocities. For each new set of computations, the simulation is advanced in time by a timestep ∆t. Figure 3.1 illustrates this flow in a MD simulation.
Figure 3.1: Numerical scheme for performing molecular dynamics simulations. Figure obtained from Buehler [11].
3.1.1 Velocity-Verlet
The velocity-Verlet algorithm is a finite difference method used for integrating Newton’s equations of motion. This is the default integration scheme that LAMMPS uses for MD simulations [16]. Using this method, the position and velocity of a single particle in a one-dimensional system is updated with the following equations [17]:
xn+1 =xn+vn∆t+ 1
2an(∆t)2, (3.2)
and
vn+1=vn+1
2(an+1+an)∆t. (3.3) Here,x is the position, v is the velocity and a is the acceleration. The timestep
∆t defines the time between the current step n and the next step n+1 in the
simulation. The acceleration for a particle with massmis calculated using a given potentialV(x). As long as the potential is only position-dependent, we get
an=−1 m
dV(xn)
dx and an+1= −1 m
dV(xn+1)
dx . (3.4)
The equations (3.2) and (3.3) can be extended to three dimensions by rewriting the quantities as vectors [11]:
rn+1 =rn+vn∆t+ 1
2an(∆t)2, (3.5) vn+1=vn+ 1
2(an+1+an)∆t. (3.6) The acceleration is calculated using the gradient of the potential:
an=−1
m∇V(rn), (3.7)
and similarly for the terman+1. As long as the initial positions and velocities of the particles are known, this numerical scheme can compute the motion for every particle over a given number of timestepsN.
3.1.2 Statistical ensembles and time integration
The equations of motion can be time integrated in LAMMPS while keeping the statistical quantities of the system in line with a given statistical ensemble.
According to Schlick [18], commonly used ensembles are the NVT, NVE and NPT ensembles, whereNis the number of particles,Vis the system volume,Tis the temperature,Eis the energy andPis the pressure. When using a given ensemble, the physical quantities indicated by the name of the ensemble are kept constant in the system. For example, if a system follows the NVT ensemble, the number of particles, the volume and the temperature is held constant at all times.
To perform time integration in LAMMPS, an appropriate ensemble must also be chosen. This is because the equations of motion are dependent on which ensemble is used [19]. If we want to simulate a system following the NVT ensemble, we can use thefix nvtcommand:
fix ID group - ID nvt temp T s t a r t T s t o p T d a m p
IDis the fix label andgroup-IDis the group of atoms that the fix is applied to. This command performs time integration of the equations of motion at each timestep.
The positions and velocities generated with these equations are sampled from the NVT ensemble. The parametertempindicates that the temperature should be controlled using a Nose-Hoover thermostat (more on this thermostat in section 3.5.2). Tstart andTstop indicate the desired temperature at the start and end of the run, respectively. Tdamp is a damping parameter given in time units that determines how rapidly the temperature should be relaxed.
We use this command to ensure that the graphene system introduced in section 5.2.1 is at the desired temperature before it is deformed:
fix f x n v t all nvt temp 4.2 4.2 ( 1 . 0 * dt )
wheredt=0.001 ps is the duration of a single timestep andallindicates to apply the fix to all atoms in the system.
The fix npt command can be used to keep the temperature and pressure at desired levels. This is used to equilibrate the silica systems before deformation begins. Equilibration with NPT ensures that the system is stable in the sense that fluctuations in pressure and temperature are minimized [18]. One implementation of thefix nptcommand is
fix ID group - ID npt temp T s t a r t T s t o p T d a m p a n i s o P s t a r t P s t o p P d a m p
where theanisokeyword indicates that the box dimensions are controlled inde- pendently of one another, using the corresponding pressure tensor components of each dimension. Pstart, Pstop and Pdamp are the pressure analogues to the previously-mentioned temperature parameters. Specifically, we want to keep the system at 300 K and the pressure at 1 Pa:
fix f x n p t all npt temp 300 300 $ ( 1 0 0 . 0 * dt ) a n i s o 1.0 1.0
$ ( 1 0 0 0 . 0 * dt )
During the deformation phase of the system,fix nve[20] is used. By integrating the equations of motion according to this scheme, the system is prevented from expanding or contracting on its own since the volume is held constant. The fix is applied to the system with
fix f x n v e all nve
wherefxnveis the label andallindicates to apply the fix to all atoms. There are no additional arguments to this command.
3.2 Simulating stress and strain
In LAMMPS, a system can be strained by using thefix deform-command [15]. This fix changes the volume and/or the shape of the system over time, allowing us to strain the system and consequently induce stresses in the material. The syntax for this command is
fix ID group - ID d e f o r m N p a r a m e t e r args ...
IDis the fix label and group-ID is the group of atoms that the fix is applied to.
Deformation is performed every N timesteps along the dimension associated with parameter, which can be one of the following: x, y, z, xy, xz, yz. Tensile strain is simulated when using x, y, or z, while xy, xz and yz represent shear deformation. Additional arguments such as deformation rate is represented by the general argumentargs.
For this study we are interested in deformation along the y-axis. The material should be deformed at a constant rate which can be achieved by using thetrate deformation style. This represents a "constant true strain rate" and the value is in units of 1/time. Setting a strain rate of 0.002 1/ps, we can apply this deformation at every timestep (N=1) on all atoms with
fix f i x d e f o r m all d e f o r m 1 y t r a t e 0 . 0 0 2
Obtaining strain and stress values in the simulation: Using thetratedeforma- tion style means that the box lengthLyin the y-dimension changes non-linearly over time,
Ly(t) =Ly(0)erate·t, (3.8) whererate is the strain rate and t is the time passed since deformation started.
The straineyat timetis computed with ey(t) = Ly(t)
Ly(0)−1 , (3.9)
i. e. strain starts at 0 and increases over time. A strain of 1 indicates that the system has grown in size by a factor of 2 in the strain direction. Note that in practice, ey and Ly are not actually computed as functions of time, instead the length Ly is recorded at every timestep by LAMMPS so that it can be used to computeey.
The stress σy along the y-dimension is obtained by flipping the sign of the pressure tensor (which LAMMPS automatically stores as a variable),
σy =−Pyy. (3.10)
Here, the vector notation indicates thatσy contains the stresses for all recorded timesteps in the simulation.
The yield strength of the simulated material is obtained by finding the highest recorded stress. At this point, the material has deformed irreversibly [12]. Figure 3.2 shows how the stress changes as a function of strain in a system undergoing tensile deformation, and we see that the yield strength of this material is around 3.2 GPa. Before the stress is used in machine learning later, the stress data is smoothed using Savgol smoothing as detailed below.
3.2.1 Savgol smoothing of stress data
The output stress from LAMMPS fluctuates over short time scales, making it difficult to determine exactly what the yield stress is. Simply taking the highest stress value recorded can give a misleading value for the yield stress. One way to avoid this issue is to apply a smoothing filter to the data, in an attempt to filter out the noise and approximate the true value of the function in question. In this case, we choose to use a Savitzky-Golay (Savgol) filter.
According to the authors of this method, Savitzky and Golay [21], the Savgol filter attempts to smooth out the noise fluctuations by first dividing the input data into multiple subsets, then using the least squares method to fit a polynomial curve to each subset of data. In practice, this filter is applied on our data using thesavgol_filter function from the Python packageSciPy [22]. Given raw stress data obtained from a simulation, we smoothen the data with 501 data points per fit and a polynomial degree of 3. These parameters were chosen by visually confirming that the smoothened data followed the trend of the raw data. In Python code, the smoothening is performed with
from s c i p y . s i g n a l i m p o r t s a v g o l _ f i l t e r
s t r e s s _ s m o o t h = s a v g o l _ f i l t e r ( stress , 501 , 3)
0.00 0.02 0.04 0.06 0.08
y
2 1 0 1 2 3
y [G Pa ]
Stress and strain during simulated tensile deformation
Data
Figure 3.2: Example of a stress-strain curve of an atomic system during tensile deformation.
Here,stressis the raw stress data obtained from a simulation and stress_smooth is the smoothened data. Figure 3.3 demonstrates the result of this procedure.
Figure 3.4 shows multiple stress-strain curves with accompanying yield stresses, using smoothened stress data.
3.3 MD precision
The MD precision for the yield stressσYis computed as in Hanakata et al. [8],
ησ = s
∑ni=−01(σiY−σ¯Y)
n , (3.11)
where ¯σYis the mean yield stress of thensamples. We calculate the MD precision for the yield stress in alpha quartz (see chapter 5.2.2 for more specifications on this system) since this is the material that this study is most focused on. The samples used in calculatingησ are 100 identical systems where no atoms have been removed. After running strain simulations on the systems and obtaining the 100 yield stresses, the precision is calculated to beησ=0.11 GPa.
0.063 0.064 0.065 0.066 0.067 0.068 0.069
y
3.35 3.36 3.37 3.38 3.39 3.40 3.41 3.42 3.43
y
[G Pa ]
Raw vs. smoothened stress curves
Raw data
Smoothened data
Figure 3.3: Raw stress data and smoothened stress plotted against strain. A Savitzky- Golay filter with window size of 501 and polynomial degree 3 was used for smoothening.
The plot is zoomed in to show the behaviour around the yield point. See Figure 3.2 for the zoomed-out version with only raw data plotted.
0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07
y
0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0
y
[G Pa ]
Stress-strain curves for alpha quartz systems
Yy
= 3.14 GPa
Yy
= 2.854 GPa
Yy
= 1.722 GPa
Yy
= 1.299 GPa
Figure 3.4: Stress vs. strain for 4 different alpha quartz systems. Yield stresses are indicated by dots. The stress values have been smoothened using Savgol smoothing.
3.4 MD potentials
In a molecular dynamics simulation, atomic interactions are modelled using interatomic potentials [11]. These potentials approximate the potential energy
of an atom as some function of other atoms in the system. There are no potentials that are suitable for all kinds of materials and physics problems. Instead, many types of potentials with different purposes have been developed, some of which are explained in this chapter.
3.4.1 AIREBO
The Adaptive Intermolecular Reactive Empirical Bond Order (AIREBO) potential presented by Stuart et al. [23] is a potential for modelling intermolecular interactions in graphene systems. This potential "... has been designed for molecular systems such as liquid hydrocarbons and thin films" (Steven J. Stuart, 2000, p. 10), thus making it suitable for modelling thin graphene sheets. AIREBO is based on the Reactive Empirical Bond-Order (REBO) potential by Brenner et al.
[24]. By including intermolecular interactions in addition to the intramolecular interactions in the REBO potential, Stuart et al. model the total energy in the system with the following terms:
E= EREBO+ELJ+Etors. (3.12) The three energy contributions are (in order of appearance) intramolecular interactions as modelled by the REBO potential [24], repulsive and attractive interactions provided by the Lennard-Jones potential, and finally the energy contribution from torsional interactions.
3.4.2 Vashishta
A suitable and well-established potential for MD simulations on SiO2systems is the Vashishta potential. As presented in Vashishta et al. [25], the potential consists of two-body and three-body interactions:
V =
∑
i<j
V2(rij) +
∑
i<j<k
V3(rij,rjk,rik), (3.13) whererij is the distance between two particlesiandjfori,jin[0 . . .N−1]. The two-body termV2determines the effect of steric repulsion, Coulomb interactions and charge-dipole interaction:
V2 = Hij
rηij + ZiZj
r −
1
2(αiZ2j +αjZ2i)
r4 e−r/r4s . (3.14) Hij andηij determine the strength and exponents of the steric repulsion between particles iand j, separated by a distance r. The terms Zi andαi determine the effective charge and electronic polarizability of theith particle. The decay length r4sis a constant that scales the magnitude of the charge-dipole interaction.
The three-body term is expressed as
V3= Bjikf(rij,rik)p(θjikθ¯jik). (3.15) Here,Bjik is a scalar that represents the strength of the interaction and f and p are functions that determine the effects of bond stretching and bond bending,
respectively. θjikis the angle subtended by rji andrki at vertexi. If this angle is equal to ¯θjik, the three-body term becomes 0.
The values forZandαdepend on atom type. The steric repulsion termsηand Hdepend on which pair of atoms we are looking at (Si-Si, Si-O or O-O). Lastly,B and ¯θ depend on the three-body configuration, Si-O-Si or O-Si-O. See Table II in Vashishta et al. [25] for the actual values.
Vashishta 1994: This is a version of the Vashishta potential developed using simulations of amorphous silica [26]. It is thus used when we are working with systems of amorphous silica.
Vashishta 1997: Another Vashishta potential, but developed for use with crystalline silica [27].
3.5 Thermostats
The potentials that govern the interatomic interactions are conservative, meaning that the total energy in the system is conserved. A problem with this is that the temperature will not necessarily be at the level we want it to, which is essential if we wish to study a material at a given temperature. A way to control the temperature is to modify the equations of motion so that the kinetic energy of the particles changes, thereby changing the system temperature. These modifications are known as thermostats.
3.5.1 Langevin
One method for keeping the system at a desired temperature is by using Langevin dynamics. According to Ch. 14.4 inMolecular Modeling and Simulation by Schlick [18], Langevin dynamics is a stochastic alternative to Newtonian dynamics that introduces frictional forces and random forces, with the physical motivation of representing a heat bath by accounting for molecular collisions.
Further, the Langevin equation for the forceFat a timeton a single particle with massmin 1D is given by:
F(t) =−∇V(x(t))−γmv(t) +R(t), (3.16) where∇Vis the gradient of the interaction potential dependent on the position x, γ is the damping factor, v is the velocity andR is the value of the random- force. The damping factorγis in units 1/time and determines the strength of the frictional force. The random-forceRhas the following statistical properties:
hR(t)i=0 and hR(t)R(t0)Ti=2γkBTmδ(t−t0), (3.17) where hxi represents the expected value of x, kB is Boltzmann’s constant, T is the target temperature andδ is the Dirac delta function. Langevin dynamics are implemented in LAMMPS and can be invoked using thefix langevincommand [28]. Here, the total forceFon each atom is on the form
F= Fc+Ff +Fr, (3.18)
whereFcis the force arising from a given potential and Ff =− m
dampv (3.19)
Fr ∝ s
kBTm
dtdamp . (3.20)
The damp parameter is in time units and is equivalent to 1/γ. Naturally, the timestepdt is also given in time units. The value of dampdetermines how fast the temperature is relaxed to the given target temperature. A value of 100 means that the temperature is relaxed to the target temperature in the span of roughly 100 time units. In the case of usingmetalunits in LAMMPS, this equates to 100 ps.
Implementation in LAMMPS: The syntax for thefix langevincommand is fix ID group - ID l a n g e v i n T s t a r t T s t o p damp seed ...
ID is the fix label and group-ID is the group of atoms that the fix is applied to. Tstart and Tstop are the desired temperatures (normally given in Kelvins) at the start and end of the run. The parameterdamp determines the damping factor. Setting aseedensures that we get the same values forR(t)if we perform two identical simulations with the same seed, i.e. the results are reproducible provided that other
In most of our simulations we want to relax the temperature at 300K before any other forces are applied. With a damping factor of 100 (the LAMMPS parameter damp, not γ) and an arbitrarily chosen seed of 135, the Langevin thermostat can be applied in a LAMMPS script with:
fix f x l a n g all l a n g e v i n 300 300 100 135
By specifying the group-ID asallthe fix is applied to all atoms in the system.
3.5.2 Nose-Hoover
The Nose-Hoover thermostat is applied when using the fix nvt or fix npt command (see section 3.1.2) in LAMMPS. This thermostat functions by adding an extra term to the total energy of the system, representing a heat bath coupled to the system. This can be seen as adding a friction term to the force exerted on the atoms. This friction term is dynamically changed in order to move the system towards the desired temperature [29].
Chapter 4
Machine learning
Machine learning is an area of computational science with a multitude of applications. According to Understanding Machine Learning: From Theory to Algorithms[9], "machine learning refers to the automated detection of meaningful patterns in data" (Shalev-Shwartz and Ben-David, 2014, Preface-xvi). The use of machine learning for solving classification and prediction problems has seen great success in the past years, particularly through the use of neural networks [30, 31].
In this chapter, the theory behind neural networks and our implementation of them is outlined. The theory and methods in sections 4.1 through 4.5.1 are based on the bookNeural Networks and Deep Learningby Nielsen [32]. The section of this thesis that deals with the theory behind convolutional neural networks is heavily based on the syllabus from the Stanford University courseCS231n: Convolutional Neural Networks for Visual Recognition[33].
4.1 Neural networks
A neural network (NN) is a statistical model composed of multiple artificial neurons, sometimes also called nodes. Neurons are mathematical functions that take an input and produce an output according to its own parameters. These neurons are connected to each other so that their input can be the output from other neurons, with the output from the final set of neurons being the neural network’s output. The idea is to pass many samples of a chosen dataset through the network and allow it to tune the parameters of each neuron in order to fit the data better. After the network has learned from enough samples, the parameters are hopefully tuned in such a way that the model outputs are close to the true values.
In this chapter, the hat symbol ˆ is used to indicate predictions. The output value of a model with an input vectorxis written as ˆywhile the true value isy.
4.1.1 Feedforward neural networks
A regular feedforward neural network typically consists of an input layer, one or more hidden layers and an output layer. Feedforward refers to the process of propagating the input through the neural network until it reaches the output
layer. In the input layer, the neurons use the input variables for producing a set of output values. These values are then sent to the first hidden layer, and the neurons in this layer produce another set of outputs based on that input.
This continues until the final layer, the output layer, produces the model’s output based on the input from the last hidden layer. An illustration of a feedforward neural network is shown in Figure 4.1.
For the network to make useful or accurate predictions, the network must first be trained by allowing it to learn from patterns in a given dataset. This is done by letting the network tweak its own parameters to better fit the data. This process is described in detail later.
4.2 Forward pass and activation functions
As mentioned earlier, all the neurons in a neural network reside in layers. In the input layer, each neuron is connected to every neuron in the first hidden layer.
This hidden layer is connected to hidden layer no. 2 in the same way, which is connected to hidden layer no. 3 and so on, with the final hidden layer connected to the output layer. In this context, a connection implies that the output from a neuron is passed as an input to the other neuron. For each connection, there is an associatedweight, and each neuron has an associatedbiasterm.
Used together, these are used to compute theactivation zli of neuroniat layer numberl:
zil =bli+
n−1
∑
j=0wijlalj−1, l≥1 . (4.1) Here, bi is the bias term, wij is the weight term that connects neuron j in the previous layer with neuroniin the current layer andais the output vector from the previous layer. The superscripts indicate which layer the quantities belong to (the weights are labelled according to which layer they connectto). The layers are labelled consecutively from 0 toL−1 whereLis the total number of layers, so that the input layer is labelled 0 and the output layer is labelledL−1.
Once the neuron activation has been computed, a nonlinear activation function f is applied. This enables the model to find nonlinear dependencies on the output with respect to the input. Examples of activation functions include the sigmoid function, the hyperbolic tangent and the rectified linear unit (ReLU).
The latter is used in our neural network implementation. See Table 4.1 for the definitions of these functions.
Applying the activation function gives us the outputali of neuroniin layerl:
ali = f(zli). (4.2)
By repeating this calculation for every neuron in layer l we obtain the output vector from that layer. This output vector then becomes the input vector to the next layer, and the procedure is repeated for every hidden layer and finally the output layer. The output from the neural network is the output from the output layer, ˆy=aL−1. This constitutes a single forward pass through the network.
Example: Feedforward neural network for regression. Consider a feedfor- ward neural network designed for regression with an input layer, two hidden
Figure 4.1:Example of a feedforward neural network with two hidden layers. Generated using [34].
layers and an output layer. The purpose of this model is to predict a person’s monthly income based on 10 different inputs. This includes factors such as the person’s age, level of income and housing situation. Exactly what the other in- puts are is not important, this example is just meant to illustrate a possible use for a neural network and how it predicts an outcome based on some inputs.
The input layer has n0 = 10 inputs, the first and second hidden layers have n1 =8 andn2 = 4 neurons respectively, and the output layer hasn3 = 1 output.
See Figure 4.1 for a visual representation of this architecture. The hyperbolic tangent f(x) = tanh(x)is used as the activation function for the hidden layers.
In this example we expect thaty ≥ 0, i. e. the person’s income is 0 or higher.
This means that we cannot use a bounded activation function such as tanh(x)for the output layer, as the network would be unable to predict the values that we expect.
In order to compute the activation zi for the first neuron (i = 0) in the first hidden layer (l=1) with 10 neurons in the input layer, we have that
z10 =b01+
∑
9 j=0w10ja0j = b10+
∑
9 j=0w10jxj. (4.3)
When computing the activations in the first hidden layer,a0 is just the input to the model itself,x. The output of the neuron is calculated using the activation function,
a10=tanh z10
. (4.4)
After the output from every neuron i = (0, 1, . . . , 7) has been calculated, the activations for the second hidden layer can be computed using the same procedure with a different set of weights and biases. The input values become the outputs from the first hidden layer,a1, so for every neuroni = (0, 1, 2, 3)in the second layer we have:
a2i =tanh bi2+
∑
7 j=0w2ija1j
!
. (4.5)
Finally, the output (a single, discrete value in this example) from the network is calculated without using an activation function:
ˆ
y=b30+
∑
3 j=0w30ja2j . (4.6)
If this model is trained properly, it should be able to accurately predict a person’s incomeygiven that there is a correlation between the inputs (age, education, etc.) and their income. The next sections deal with cost functions, backpropagation and optimizers, which are all used for training a neural network.
Name Function Range
Sigmoid 1+exp1(−x) [0, 1] Hyperbolic tangent tanh(x) [−1, 1]
ReLU max(0,x) [0, x]
Table 4.1:Definitions of a few activation functions and the range of values that they can output.
4.3 Cost and loss functions
The terms cost and loss are sometimes used interchangeably. In this chapter, loss quantifies how good a single prediction from a model is. The cost will be used to refer to an average of losses. We distinguish these two so that we can be more concise later in this chapter. In the sections of this thesis that deal with results from training machine learning models, the loss is equal to the cost of the samples that the network is tested on.
Suppose we have a dataset of size Nwith labelsi = (0, 1, . . . ,N−1), where each target yi is associated with a set of predictors xi. An arbitrary model can make a prediction ˆyi given inputs xi. With a loss function we can quantify how good this prediction is compared to the true value yi. The square loss is commonly used as the loss function for regression problems [9] and is defined as Lsq(y,ˆ y) = (yˆ−y)2 . (4.7) where ˆy is the value predicted by the model and y is the true value. With this loss function, we can compute the cost function by summing over all available samples:
C(w,b) = 1 N
N−1 i
∑
=0(yˆi−yi)2 . (4.8)
4.4 Backpropagation
For the neural network to make accurate predictions based on input data, the weights and biases associated with every neuron must be adjusted so that the cost function is minimized. One way of doing this involves calculating the gradient of the loss with respec to the weights and biases of the network and using this to tweak the parameters. Backpropagation is an algorithm that calculates this gradient. The theory and derivations of the various equations in this section is from the book "Neural Networks and Deep Learning" by Michael A. Nielsen [32].
There are some slight differences in notation and the quadratic/square loss in the book has a factor 1/2 added.
We start by looking at the loss gradient with respect to the weights.Assume that we have a neural network that takes an input vectorxand predicts a set of values ˆy. A loss function L computes the loss using ˆy and the true values y.
The derivative of the loss with respect to a single weightwlijcannot be computed directly, so we rewrite the expression in terms of variables and functions with known derivatives. By multiplying with∂zli/∂zilwe get
∂L
∂wlij = ∂L
∂zli
∂zil
∂wlij . (4.9)
The second term on the RHS of the equation can be rewritten by deriving Eq.
(4.1) with respect to the weightswij. The activation for the neuron is zli =bil+
n−1 j
∑
=0wlij−1alj−1, (4.10) and we see that the derivative is just the output from neuron jin the previous layer,
∂zli
∂wij = alj−1. (4.11)
The first term on the RHS is the derivative of the loss function with respect to the activation. This will be defined as the error of neuroniin layerl:
δli = ∂L
δzli , (4.12)
so that
∂L
∂wlij =δilalj−1. (4.13) Instead of writing the loss in terms of each componentiandj, we can write it as a gradient(∇wL)l that represents the error for the entire layer with respect to the set of weights in layerl:
(∇wL)l =δlal−1. (4.14) The errorδl associated with each layer must be first be computed for the output layer, l = L−1. This is necessary because we are propagating the error in the output of the modelbackwardsthrough the network.
Output layer error: We find an expression for the output error δiL−1 by multiplying the RHS with∂aLj−1/∂aLj−1, and by using thatali = f(zli)for a given activation function f:
δiL−1= ∂L
∂aLi−1
∂aiL−1
∂ziL−1
= ∂L
∂aLi−1f0(ziL−1),
(4.15)
where f0 indicates the first derivative of f. The error is dependent on the choice of loss function as well as activation function. If we use the square error as loss function,L= Lsq, and use thatyˆ = aL−1, we get
δiL−1 = ∂
∂aiL−1 h
(aLi−1−yi)2if0(zLi−1)
=2
aiL−1−yi
f0(zLi−1).
(4.16)
This quantity is easily computed after performing a forward pass since this gives the outputaLl−1from the model. The term f0(zLi−1)will depend on the choice of activation function. For regression, the activation function for the output layer is typically just the activation itself, f(zL−1) =zL−1, so f0(zL−1) =1.
Computing the error in previous layers: In order to calculate the hidden layer errors, we cannot use the same procedure as above. If we try to use Eq.
(4.15) with layerL−2 instead, we run into a problem with the first term on the RHS because we do not know how the lossLdepends onaL−2. To work around this problem, we write the error on a different form:
δil = ∂L
∂zli = ∂L
∂zlj+1
∂zlj+1
∂zli , 1≤ l<L−1 , (4.17) The first term on the RHS is equivalent toδlj+1, the error in the next layer. We write out the last term by first expressingzlj+1in terms ofzli:
zlj+1= blj+1+
∑
i
wlji+1f(zli), (4.18) so the derivative becomes
∂zlj+1
∂zli =wlji+1f0(zli). (4.19) Using this, the error in layerlfor neuronican be written as
δli =δlj+1wlji+1f0(zli). (4.20) Now, we do the same for the loss gradient with respect to the bias.Analogous to Eq. (4.9), we have for the bias
∂L
∂bli = ∂L
∂zli
∂zli
∂bil . (4.21)