Single-molecule investigation of DNA scanning by DNA base repair proteins

Arash Ahmadi

Thesis for the degree of Philosophiae Doctor (PhD)

Department of Medical Biochemistry, Institute of Clinical Medicine

University of Oslo

2018

Series of dissertations submitted to the Faculty of Medicine, University of Oslo

ISBN 978-82-8377-

No part of this publication may be reproduced or transmitted, in any form or by any means, without permission.

Cover: Hanne Baadsgaard Utigard.

Print production: Reprosentralen, University of Oslo.

Acknowledgements

The work presented in this thesis was carried out during the period 2015-2018, at the Department of Medical Biochemistry, Institute of Clinical Medicine, University of Oslo.

Financial support was provided by the University of Oslo (MLSUiO program), South-East Norway Regional Health Authorities, the Research Council of Norway (FRIMEDBIO) and the international program DAAD.

I would like to express my sincere gratitude to my supervisors, Dr. Bjørn Dalhus, for his patience, immense knowledge and open-minded attitude in guiding this work, and Dr. Alexander Rowe, for his creative and critical thinking and constructive feedback, which were key to our progress. I would like to thank my co-supervisors, Dr. Kyrre Glette and Professor Jim Tørresen at the Department of Informatics, for all valuable discussions and suggestions as well as access to their facilities. I would also like to express my deep appreciation to Professor Magnar Bjørås for providing us with insightful vision and generously supporting us in different stages of this work.

I am indebted to all people at the Department of Medical Biochemistry for accepting me as a non-biologist member in the lab and constantly teaching me things I didn't know.

Special thanks to Dr. Paul Hoff Backe and Pernille Blicher for all fruitful discussions and the help with protein purifications. I want to thank all people at the Department of Informatics and particularly Yngve Hafting for helping me with additive manufacturing utilized in this work.

I would also like to extend my thanks to Dr. Mark Schüttpelz for great collaboration, constructive suggestions, and giving me access to their optics lab at the Department of Physics, University of Bielefeld, Germany. Thanks to Robin Diekmann and Katharina Till for their critical cooperation in the lab in Bielefeld.

I am profoundly grateful to my dear family and friends for all support I have received in these years as well as endless love and meaning they give to my life.

Oslo, November 2018 Arash Ahmadi

Abbreviations

4f-T 4f-telescope

8-oxoG 8-oxoguanine

AlkD Alkylpurine glycosylase D

AlkF AlkD-family structural homolog

AOTF Acousto-optical tunable filter

AP-site Apurinic/Apyrimidinic site

APTES (3-Aminopropyl)triethoxysilane

ATP Adenosine triphosphate

AVP Adenovirus proteinase

BamHI Type II restriction endonuclease

BER Base excision repair

bp Base pair(s)

BP Band-pass filter

d3mA N3-methyldeoxyadenosine

d3yA N3-yatakemycinyldeoxyadenosine

d7mG N7-methyldeoxyguanosine

DCM Dichroic mirror

DDR DNA Damage Responses

E. coli Escherichia coli

EcoP15I Type III restriction endonuclease

EcoRV Type II restriction endonuclease

EMCCD Electron multiplying charge-coupled device

EndoV Endonuclease V

Fpg Formamidopyrimidine DNA glycosylase

FT Focusing telescope

HILO Highly inclined and laminated optical sheet

HLR HEAT-like repeat

HMM Hidden Markov model

hOGG1 Human 8-oxoguanine DNA glycosylase 1

Ile Isoleucine

kbp Kilobase pairs

LFC Laminar flow cell

Mlh1–Pms1 MutL1-Pms1 protein complex in mismatch repair

MMR Mismatch repair

MSD Mean squared displacement

Msh2 Eukaryotic MutS homolog 2

Msh6 Eukaryotic MutS homolog 6

MutL Mismatch repair protein

MutM Formamidopyrimidine-DNA glycosylase

MutS Bacterial mismatch repair protein

MutSα Msh2-Msh6 complex

MutY Adenine DNA glycosylase

NA Numerical aperture

ND Neutral density filter

Nei Endonuclease VIII

NER Nucleotide excision repair

Nth Endonuclease III

PCNA Proliferating cell nuclear antigen

PCR Polymerase chain reaction

PDMS Polydimethylsiloxane

PEG Polyethylene glycol

PEG-NHS N-hydroxysuccinimide functionalized polyethylene glycol

PMF Polarization-maintaining fiber

PMMA Poly(methyl methacrylate)

PolE Polymerase E

Pro Proline

pVIC Viral peptide

PYIP Pro79, Tyr80, Ile81 and Pro82

Q-dot Quantum dot

RNS Reactive nitrogen species

ROS Reactive oxygen species

SME Single-molecule experiment

SP Short-pass filter

TALE Transcription activator-like effector

TIR Total internal reflection

TM Translatable mirror

Tma Thermotoga maritima

Tyr Tyrosine

UL42 Herpes simplex virus DNA polymerase

Ung Uracil-DNA glycosylase

vbSPT Variational Bayes single-particle tracking

wm-EndoV Wedge-deficient mutant Endonuclease V

wt-EndoV Wild-type Endonuclease V

XPC Xeroderma pigmentosum, complementation group C

Summary

DNA is under constant threat from environmental factors such as chemicals and ionizing radiation, as well as endogenous reactive compounds produced as by-products of respiration, inflammation and infection. Exposure to these threats leads to chemical alterations of the nucleotide building blocks or discontinuity in the DNA strands caused by single- or double-strand breaks, which are generally referred to as DNA damage. If left unrepaired, DNA damage can lead to mutations, the accumulation of which may cause genomic instability, which in turn is a primary cause of both cancer and aging. To combat the destructive effects of DNA damage, there are several DNA damage responses, among which DNA repair is central. DNA is repaired via the activity of several classes of DNA repair proteins, which scan the DNA, detecting and repairing the lesions. Despite decades of intensive research and great progress in identifying the structures and biochemical functions of DNA repair proteins, a detailed understanding of the molecular mechanisms by which they search for and recognize errors in DNA remains elusive. The most informative approach to studying DNA scanning mechanisms is an experimental assay that observes individual proteins as they interact with the DNA, widely known as a single-molecule experiment.

In this study, using a single-molecule approach, we characterize the scanning mechanisms of several DNA repair proteins including EndoV, hOGG1, AlkD and AlkF.

In our single-molecule assay, a 12-kbp fragment of λ-DNA is elongated by attaching it to a microscope coverslip at one end and to a polystyrene bead held in an optical trap at the other end. The elongated DNA is exposed to fluorescently labelled proteins, and their interaction with the DNA is recorded as trajectories of protein movement along the DNA, with images taken at rates of up to 130 Hz. From the analysis of these trajectories we find that during DNA scanning, EndoV switches between three different scanning modes, which can be broadly classified as helical sliding, hopping and base interrogation, each with a distinct range of activation energy barriers. In the hopping mode, EndoV scans at rates very close to, or even exceeding, the upper limit of diffusion for helical sliding, while in the interrogation mode the proteins apparently pause locally on the DNA, or their movement is below the spatial resolution limit of our instrumentation. This makes EndoV the first example of a monomeric, single-conformation and single-binding-site protein demonstrating the ability to switch between three scanning modes. By comparing the scanning properties of wild-type EndoV and a wedge-deficient mutant EndoV, we show that the highly conserved wedge motif in the structure of this protein plays a central role in switching DNA scanning into base interrogation mode.

DNA-associated crystal structures of DNA glycosylases, such as hOGG1, show that for base recognition, the DNA is substantially bent and the damaged bases are flipped into damage-specific recognition pockets. In our single-molecule experiments this is observed as periods of stalled movement of proteins scanning along the DNA, which introduces bi- or multi-modality in the distribution of the instantaneous diffusion rate of scanning. On the other hand, the HEAT-like repeat (HLR) protein AlkD is known as the first example of a DNA glycosylase whose activity does not depend on DNA bending or base flipping. We show that DNA scanning by AlkD resembles a single-mode random walk, in contrast to the multi-mode scanning utilized by the other DNA repair proteins in this study. This result is consistent with the lack of base flipping in AlkD. In addition, the scanning behaviour of the HLR protein AlkF, which is a structural but not a functional homologue of AlkD, is characterized, and this protein displays bimodality in DNA scanning. We show that this bimodality is due to the ability of AlkF to adopt a hopping mode, likely facilitated by a positively charged β-loop, in contrast to AlkD, which lacks this structural element.

To enable delivery of different substances to the point of observation, microfluidic laminar flow cells are crucial to the performance of single-molecule experiments. Despite substantial progress in the production of such components, the process remains relatively inefficient, inaccurate and time-consuming for most experimentalists. To overcome these challenges and limitations we introduce a new generation of laminar flow cells designed for single-molecule experiments and assembled using additive manufacturing, i.e. 3D-printing. We show compatibility with single-molecule microscopy through examples of biological experiments including the isolation and manipulation of DNA, and visualization of protein-DNA interactions, using these 3D-printed laminar flow cells. We further developed the design to produce and characterize multi-channel and reservoir-based laminar flow cells that are designed to perform more advanced experiments, including multi-component protein-DNA interactions.

1 Introduction

Errors are an inevitable aspect of processes in complex systems such as living entities; they can also affect DNA, the molecule that carries genetic information from generation to generation. Errors in the structure of DNA are referred to as DNA damage or DNA lesions, and their frequency of occurrence is surprisingly high, in some cases exceeding tens of thousands of instances per cell per day^1,2. Counteracting these errors is vital for the survival and preservation of the living entity. This defense is highly dependent upon the performance of intricate networks of enzymes that detect these errors and perform repair if needed. Failure in any part of such a network poses a considerable danger to genomic stability. Unrepaired lesions in the structure of DNA can lead to mutations in the next round of replication, putting the organism at increased risk of developing cancer, premature aging and/or neurodegenerative diseases^1,3–7. Despite being subject to extensive research, several important aspects of DNA repair mechanisms are yet to be explored and understood. These processes can be studied as chemical reactions from a bulk-scale perspective, in which two or several components of a reaction come together, react and, after a period of time, form the product. From this viewpoint, a series of appealing questions arises: How do proteins look for and recognize the errors in DNA? What are the mechanisms of DNA damage detection? How does the physical and/or chemical environment affect damage search and recognition? Can failures in the search and recognition be observed and analyzed at the molecular level? The answer to any of these questions opens a wide horizon of phenomena to explore at the single-molecule level. Thanks to recent advances in microscopy, observation of single-molecule interactions is becoming increasingly accessible. The current study is a step in this direction.

1.1 DNA damage

In a very general form DNA damage is defined as “a chemical addition or disruption to a base of DNA (creating an abnormal nucleotide or nucleotide fragment) or a break in one or both strands of the DNA duplex”². In the process of cellular proliferation, DNA is replicated to transmit the genetic code to successive cell generations. As illustrated in Figure 1.1, unrepaired DNA damage in cells can cause errors in replication, which can lead to mutation in the replicated DNA. Further, the proliferation of human cells carrying mutated DNA increases the risk of cancer^1,2. In addition to transferring genetic information through the replication process, DNA also encodes different forms of RNA with key roles in cellular processes and serves as the template for the synthesis of proteins that are crucial for life in cells. DNA damage may introduce errors in transcription, which may lead to altered gene expression, senescence at the cellular level or cell death, which in turn contribute to premature aging of the organism^1,4,8.

Figure 1.1 | DNA Damage. Illustration of some typical DNA damage types and the sources inducing them. The lower part of the image is a simple depiction of how DNA damage can lead to cancer and aging in humans. Reproduced with permission from Hoeijmakers, J. H. J. DNA damage, aging, and cancer. N. Engl. J. Med. 361, 1475–85 (2009), Copyright Massachusetts Medical Society.

DNA damage can arise either from natural cellular processes or from external factors, generally referred to as endogenous and exogenous DNA damage, respectively.

Endogenous damage caused by spontaneous hydrolytic reactions, such as depurination, depyrimidination and deamination of adenine, guanine and cytosine, can occur merely due to reaction with water^9,10. For example, hydrolytic depurination happens around 2000-10000 times per cell per day, forming apurinic sites (AP sites). If left unrepaired, these may cause miscoding during the next round of replication⁹. Endogenous oxidative lesions are the result of exposure of DNA to different reactive compounds, including reactive oxygen species (ROS) and reactive nitrogen species (RNS). These compounds can be released during normal metabolic activities, inflammatory responses or infections¹¹. Oxidative damage to DNA happens around 10000 times per cell per day¹² in human cells, and some oxidative lesions, such as 8-oxoguanine (8-oxoG), can alter base pairing and cause transversion mutations upon replication¹³. With a frequency on the order of tens of thousands of times per cell per day, single-strand breaks are the most common DNA damage in human cells¹⁴. Single-strand breaks can occur due to oxidative attack on the sugar backbone of DNA or as a secondary result of other types of oxidative damage¹⁵. Unrepaired single-strand breaks can block transcription and trigger cell death in non-proliferating cells, or lead to double-strand breaks in proliferating cells¹⁶. Although they occur around 1000 times less frequently than single-strand breaks, endogenous double-strand breaks are an even more severe form of DNA damage, and particularly hazardous since they can lead to chromosomal rearrangements¹⁷.

In addition to naturally occurring lesions, DNA damage can happen due to environmental or exogenous factors as well¹⁸. One of the main sources of this type of damage is ionizing radiation, including cosmic and solar radiation or radiation from medical imaging¹⁰. This radiation can include X-rays and the higher-energy ultraviolet part of the electromagnetic spectrum, which carry enough energy to excite atoms and liberate their electrons. Direct absorption of radiation energy by DNA can lead to ionization of the base or the sugar-phosphate backbone, which may cause a wide variety of base modifications as well as single- and double-strand breaks¹⁹. Furthermore, reaction of DNA with surrounding molecules ionized by the same radiation can also cause DNA damage²⁰. Exposure of DNA to toxic chemicals such as alkylating agents, polycyclic aromatic hydrocarbons, or aflatoxins is another main source of exogenous DNA damage¹⁰.

1.2 Biological responses to DNA damage

As mentioned in the previous section, a wide variety of DNA lesions can form a serious threat to the integrity of the genome, which can have adverse pathological consequences. To combat these threats and prevent these destructive consequences, there exist a multitude of biological mechanisms known as DNA Damage Responses (DDR). DDR comprises several complex cellular mechanisms taking place either independently or in a coordinated way, complementing each other (Figure 1.2). DDR includes the following mechanisms: cell cycle checkpoint activation, DNA repair, DNA damage tolerance and apoptosis^15,21.

Depending on the extent and type of DNA damage, the cell cycle can be arrested at different checkpoints. During cell cycle arrest, DNA repair pathways are activated, the relevant DNA repair proteins are relocated to the sites of DNA damage, and cell cycle progression is delayed until the repair process is complete²². The DNA repair process is described in more detail in section 1.3. The ability of cells to continue cell division without having the damage repaired is called the “DNA damage tolerance response” and is as biologically important as DNA repair¹⁵. There are two main mechanisms of DNA damage tolerance: translesion synthesis and template switching. Translesion synthesis is performed by specialized polymerases that can incorporate a nucleotide opposite the damaged base without requiring proper Watson-Crick base pairing. In template switching, the replication of the damaged base is avoided by strand exchange (recombination) between the affected and the unaffected newly synthesized daughter DNA duplex^21,23.

Figure 1.2 | DNA damage responses. A depiction of categories of DNA damage responses. Each of these responses can be divided into several different sub-pathways. For DNA repair, the five major repair pathways are listed.

[Figure 1.2 diagram: DNA Damage Responses (DDR) branch into cell cycle checkpoint activation (cycle delay and repair pathways activation), DNA damage tolerance (translesion synthesis, template switching), DNA repair (direct reversal of DNA damage; excision of DNA damage by base excision repair (BER), nucleotide excision repair (NER) and mismatch repair (MMR); strand break repair) and apoptosis.]


If none of the repair or damage tolerance processes are effective and the cell still carries critical damage, genomic stability is under threat. Therefore, as a last response to DNA damage, apoptosis can be activated. Apoptosis is a programmed cell death response that removes the affected cells from the organism to prevent further replication of cells carrying critical damage¹⁰.

1.3 DNA repair

In the authoritative textbook DNA repair and mutagenesis, DNA repair is defined as “cellular responses to DNA damage that result in the restoration of normal nucleotide sequence and DNA structure”¹⁰. These responses can be put into three general categories: (I) direct reversal of DNA damage, (II) excision of the damage and (III) strand break repair (Figure 1.2). Reversal of damage does not require excision of the base from the DNA; the damage is directly reversed in a single-step mechanism by specific enzymes, such as AlkB²⁴ and O⁶-methylguanine-DNA methyltransferase²⁵. The types of damage that can be repaired by this mechanism are rather limited compared to other pathways of DNA repair; however, the process is efficient and essentially error free²⁶.

A large proportion of repair processes are centered on excision of the damage from DNA. There are several pathways through which this excision is performed. Here we will discuss the three main excision pathways: base excision repair (BER), nucleotide excision repair (NER), and mismatch repair (MMR), among which the BER and NER pathways are relevant to the proteins under investigation in this study. The first common step in both of these pathways is damage recognition. There are a number of DNA glycosylases that can recognize base lesions and thus initiate the BER pathway^27–30. Damage recognition is followed by removal of the damaged base, strand incision, end trimming, gap filling and ligation to fully restore DNA^28,31–35. In NER, the damage is recognized by damage-sensing proteins such as XPC³⁶ or DDB1-DDB2 complexes³⁷. Subsequently, the process carries on with two strand incisions, removal of a larger segment containing the damage, gap filling, and ligation³⁸. Each step of these pathways is primarily performed by one or a group of monofunctional proteins, although in some cases bifunctional proteins are able to perform two steps. In MMR, the mismatched base pair is detected by the proteins MutS (in bacteria) or MutSα (in humans)^39,40. After recognition, the process is followed by cleavage of the strand containing the mismatched base, removal of a longer single-stranded patch of the DNA including the mismatched base, and completion by gap filling and ligation^40–43.

Strand break repair is the third major category that falls within the definition of DNA repair responses. Single-strand breaks are generally repaired via a specialized sub-pathway of BER, using specialized proteins such as PARP1 and XRCC1, which function as coordinating factors and scaffolds for other proteins, including BER enzymes involved in gap filling and ligation^14,16. Double-strand break repair is performed through several mechanisms, including homologous recombination and non-homologous end joining. In homologous recombination the damaged DNA is repaired using an identical or a similar sequence of the intact DNA from the sister chromatid or a homologous chromosome as a template^44,45. In contrast, non-homologous end joining occurs independently of a homologous DNA molecule. In this process, the breaks are ligated using single-stranded overhangs on the ends of double-strand breaks^44,46.

1.4 Theoretical basis of DNA scanning mechanisms

The DNA repair processes described above start with detection of the target site on DNA, which is commonly known as damage recognition. The mechanism by which this recognition occurs has been the subject of intensive research^47–54. The general perception is that proteins bind DNA nonspecifically and search for target sites by scanning along the DNA. However, several questions about this mechanism remain to be answered. For example, do proteins follow the helical path or do they slide parallel to the axis of DNA? Do proteins have the ability to circumvent obstacles or are they strictly bound to DNA while scanning? What is the role of specific structural elements of the proteins with respect to scanning behavior or the damage recognition process? In this section the theoretical basis of DNA scanning is laid out.

1.4.1 Facilitated diffusion

The first indirect experimental evidence for the idea that proteins bind DNA nonspecifically and scan along it was collected in 1970. In an experiment, Riggs et al. measured the rate constant of association of the lac repressor protein to its target site on DNA to be around 100 times higher than what is theoretically possible under the assumption that proteins move in 3D space with a diffusion-limited rate constant⁴⁷. This suggests that, to associate with their target sites, proteins use a strategy that goes beyond random 3D diffusion alone. The strategy is to first bind nonspecifically to the DNA and subsequently perform random 1D scanning along it. This increases the rate of target site association compared to free 3D diffusion by intermittently reducing the dimensionality of the search from three to one. After a decade of extensive research, a model was proposed to explain the scanning strategy⁴⁸. In this model, proteins initially undergoing free 3D diffusion in solution bind to DNA at nonspecific sites. As shown in Figure 1.3, proteins may scan DNA using different strategies, including sliding, hopping, intersegment transfer and free 3D diffusion.

Figure 1.3 | Facilitated diffusion. A depiction of the model for different translocation strategies. Adapted with permission from Berg, et al. Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and theory. Biochemistry 20, 6929–6948 (1981). Copyright (2018) American Chemical Society.

[Figure 1.3 labels: free 3D diffusion, intersegment transfer, helical sliding, hopping, protein, DNA.]

While sliding, proteins remain microscopically associated with the DNA and follow the helical path. The diffusion is thermally driven and the proteins swap positions with counterions that are bound to the DNA. In hopping, proteins are microscopically dissociated for rather short periods of time, only to the extent that permits counterion recondensation. The proteins remain in the vicinity of the DNA, within one protein molecule diameter⁵⁵, with a high probability of reassociation to a nearby site. In this state, due to the lack of close contact with the DNA, the proteins have higher mobility along the DNA. Intersegment transfer can take place if proteins transiently contact two regions of DNA that come into close proximity with each other. Once proteins fully detach from the DNA and undergo free 3D diffusion in the solution, a subsequent reassociation may occur, albeit at a completely uncorrelated site. Although some single-molecule studies show that selected proteins use either sliding or hopping for scanning^56–59, several theoretical^53,60–62 and bulk experimental^63,64 studies point out the necessity of a combination of these translocation strategies for efficient scanning of DNA.

1.4.2 Diffusion theory of scanning

Apart from proteins that use ATP to move along DNA^65,66, the movement of the scanning proteins is mainly thermally driven. At the microscopic scale, particles in a liquid or gas are constantly moving and colliding with each other. This type of movement is named Brownian motion, after Robert Brown, who observed this phenomenon for the first time in 1827. Almost eight decades later, this movement was characterized independently by Albert Einstein (in 1905)⁶⁷ and Marian Smoluchowski (in 1906)⁶⁸. According to their models, the diffusion constant can be calculated from the hydrodynamic characteristics of the particle and its surrounding medium. This is expressed in equation (1), generally known as the Einstein-Smoluchowski relation⁶⁹:

$$D = \frac{k_B T}{f} \tag{1}$$

where $k_B$ is the Boltzmann constant, $T$ is the absolute temperature and $f$ is the frictional drag coefficient, defined as the ratio of drag force to drag velocity ($F/v$). From Stokes' law, the drag force (2) is proportional to the radius of the particle and its velocity relative to the surrounding fluid,

$$F = 6\pi\eta v R \tag{2}$$

where $F$ is the force needed to drag a spherical particle with radius $R$ and velocity $v$ in a fluid with viscosity $\eta$. From equation (2) we can obtain the frictional drag coefficient as $f = F/v = 6\pi\eta R$. By inserting this expression in equation (1) we get the following equation, known as the Stokes-Einstein equation⁷⁰:

$$D = \frac{k_B T}{6\pi\eta R} \tag{3}$$

If proteins were to diffuse in one dimension (1D) only, parallel to the axis of a linearized DNA molecule, with no friction from protein-DNA surface contacts, the diffusion coefficient of this movement could be calculated from equation (3). This equation is correct only if proteins translocate parallel to the axis of the DNA (along the black line in Figure 1.4) without following the helical path. Therefore, if proteins follow the helical path the effect of this rotation should be included in equation (3).

Figure 1.4 | A model for helical sliding. Proteins are approximated as spheres with a radius of R. On the right side, the protein envelops the DNA and the protein's center of mass is aligned with the axis of the DNA (Schurr’s model). On the left side, the protein only partly surrounds the DNA and the center of mass is positioned offset from the axis of DNA.

[Figure 1.4 labels: $R$, $R_{oc}$, and one full turn ($2\pi$) per 10 BP.]

As depicted in Figure 1.4, there are approximately 10 base pairs (BP) per turn of the helix. This means sliding along DNA is coupled with a rotation; for a displacement of around 10 BP parallel to the axis of DNA, proteins rotate through an angle of $2\pi$. In a study in 1979, Schurr⁴⁹ assumed that proteins envelop the DNA such that their center of mass is aligned with the axis of the DNA (Figure 1.4, sphere on the right side), and using this model he calculated the frictional coefficient for rotation-coupled sliding of a protein moving along the DNA to be

$$f = 6\pi\eta R + \left(\frac{2\pi}{10\,BP}\right)^2 8\pi\eta R^3 \tag{4}$$

where $BP$ is the distance between base pairs along the axis of DNA, $R$ is the protein radius, and $\eta$ is the viscosity of the fluid. The first term in this equation is the translational contribution to the frictional coefficient, originating from the net displacement along the axis of DNA. The second term describes the rotational contribution to the frictional coefficient due to rotation of the protein around the axis of the DNA. This calculation was done assuming that the center of mass of the protein remains on the axis of the DNA. To consider cases in which the center of mass of the protein and the axis of the DNA are not aligned (Figure 1.4, the left side), a more recent study⁵² updated Schurr’s calculation by including an additional term for the off-axis rotational contribution:

$$f = 6\pi\eta R + \left(\frac{2\pi}{10\,BP}\right)^2 \left[8\pi\eta R^3 + 6\pi\eta R\,(R_{oc})^2\right] \tag{5}$$

where the additional parameter $R_{oc}$ is the offset between the center of mass of the protein and the axis of the DNA. The third term in this equation is the translational friction that is due to the curvilinear motion along the helix. By combining equation (5) and equation (1), the theoretical diffusion rate for helical sliding along DNA can be calculated. It should be noted that this calculation is based on a purely hydrodynamic model of diffusion; thus the friction between the DNA and the protein is not considered.
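As a rough numerical check of equations (1), (4) and (5), the sketch below evaluates the three friction models for a sphere with $R = 4.9$ nm and $R_{oc} = 5.5$ nm, the lac repressor values used in this section. The viscosity (1 mPa·s), temperature (298 K) and base-pair rise (0.34 nm) are assumptions chosen for illustration, and the function names are hypothetical; with these inputs the results come out at roughly 44.5, 0.40 and 0.21 µm²/s.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

# Assumed illustration values (not stated explicitly in the text)
ETA = 1.0e-3        # viscosity of water, Pa*s
TEMP = 298.0        # absolute temperature, K
BP_RISE = 0.34e-9   # distance between base pairs along the DNA axis, m


def friction_parallel(R, eta=ETA):
    """Stokes friction for sliding parallel to the axis: f = 6*pi*eta*R."""
    return 6.0 * math.pi * eta * R


def friction_on_axis(R, eta=ETA, bp_rise=BP_RISE):
    """Equation (4): translation plus rotation about the DNA axis."""
    twist = 2.0 * math.pi / (10.0 * bp_rise)  # rotation per unit length, rad/m
    return friction_parallel(R, eta) + twist**2 * 8.0 * math.pi * eta * R**3


def friction_off_axis(R, R_oc, eta=ETA, bp_rise=BP_RISE):
    """Equation (5): adds the term for a center of mass offset by R_oc."""
    twist = 2.0 * math.pi / (10.0 * bp_rise)
    extra = twist**2 * 6.0 * math.pi * eta * R * R_oc**2
    return friction_on_axis(R, eta, bp_rise) + extra


def diffusion_um2_per_s(f, temp=TEMP):
    """Equation (1), D = k_B*T/f, converted from m^2/s to um^2/s."""
    return K_B * temp / f * 1e12


R, R_OC = 4.9e-9, 5.5e-9  # lac repressor radius and offset, m
d_parallel = diffusion_um2_per_s(friction_parallel(R))
d_on_axis = diffusion_um2_per_s(friction_on_axis(R))
d_off_axis = diffusion_um2_per_s(friction_off_axis(R, R_OC))
print(f"parallel: {d_parallel:.1f}, on-axis: {d_on_axis:.2f}, "
      f"off-axis: {d_off_axis:.2f} um^2/s")
```

The rotational terms dominate the friction, which is why coupling the motion to the helix lowers the hydrodynamic limit by about two orders of magnitude compared to sliding parallel to the axis.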

The value obtained by combining equations (5) and (1) therefore represents the upper limit of diffusion for helical sliding. As an example, for the lac repressor with $R = 4.9$ nm, the theoretical diffusion constants for parallel-to-axis sliding, on-axis rotation and off-axis rotation (using $R_{oc} = 5.5$ nm) are calculated as 44.5 µm²/s, 0.4 µm²/s and 0.2 µm²/s, respectively. The experimental value for the diffusion rate of the lac repressor was measured to be 0.046 µm²/s, around 4-fold smaller than the upper limit of diffusion for helical sliding calculated using the off-axis rotation formula. It is suggested that, due to the protein-DNA friction, the experimental value of the diffusion rate is normally 2-5 times smaller than the theoretical upper limit of the diffusion rate for helical sliding⁵². The experimental value for the 1D diffusion coefficient $D$ can be calculated from trajectories of proteins moving along DNA. The mean square displacement (MSD) of proteins, $\langle x^2 \rangle$, in one-dimensional Brownian motion is proportional to the diffusion constant $D$ and time $t$⁶⁹:

$$\langle x^2 \rangle = 2Dt \tag{6}$$

Therefore, the experimental diffusion constant can be calculated as $D = \langle x^2 \rangle / 2t$, and in the next section it is explained how an activation energy barrier of base stepping can be obtained from comparison of the theoretical upper limit with experimental values of the diffusion rate.
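To illustrate how equation (6) can be applied to a trajectory, the following minimal sketch simulates an ideal 1D Brownian trajectory and recovers the diffusion constant from its mean square displacement. It is not the analysis pipeline used in this work: the simulated data, the helper names, the choice of $D = 0.046$ µm²/s and the 130 Hz sampling interval are assumptions for demonstration only.

```python
import math
import random


def simulate_trajectory(D, dt, n_steps, seed=0):
    """Ideal 1D Brownian motion: Gaussian steps with variance 2*D*dt."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * D * dt)
    x = [0.0]
    for _ in range(n_steps):
        x.append(x[-1] + rng.gauss(0.0, sigma))
    return x


def mean_squared_displacement(x, lag):
    """Average of (x[i+lag] - x[i])^2 over all start points i."""
    disp = [(x[i + lag] - x[i]) ** 2 for i in range(len(x) - lag)]
    return sum(disp) / len(disp)


def estimate_diffusion(x, dt, lag=1):
    """Equation (6) solved for D: D = <x^2> / (2*t), with t = lag*dt."""
    return mean_squared_displacement(x, lag) / (2.0 * lag * dt)


D_TRUE = 0.046      # um^2/s, assumed input for the simulation
DT = 1.0 / 130.0    # s, one frame at 130 Hz
trajectory = simulate_trajectory(D_TRUE, DT, n_steps=20000, seed=1)
d_est = estimate_diffusion(trajectory, DT, lag=1)
print(f"estimated D = {d_est:.4f} um^2/s")
```

For real trajectories the measured positions also contain localization error, which adds a constant offset to the MSD, so in practice the slope of the MSD over several lag times is usually a more robust estimate than a single lag.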

1.4.3 Damage recognition and binding energy landscape

A challenge that DNA-repair proteins face during DNA scanning is to recognize the target sites among an enormous access of normal bases. Once nonspecifically associated with the DNA, proteins are confined to move along the DNA using a hopping and/or sliding strategy or be intermittently involved in base interrogation. The movement of proteins and the characteristics of their interaction with DNA (hopping, sliding, base interrogation or target site association) are mainly regulated by variations in the binding energy at different positions along the DNA or due to different conformations adopted by the proteins. This variation of the binding energy is referred to as the roughness of binding energy landscape (ߪሻ⁵³ or activation energy barrier (Ea, 'G)^56,58,71 in different studies. In simple terms, the rougher this binding energy landscape is, i.e. the larger the energy barrier is for base stepping, the slower the movement of proteins along DNA will be. In a hallmark theoretical study⁵³, it was shown that for an efficient damage recognition to take place, proteins should spend half of the time performing 3D diffusion in the solution and half of the time scanning the DNA in a binding energy landscape with a ߪ of around 1 ݇_஻ܶ. For ߪ ൐ ʹ݇_஻ܶ the scanning becomes extremely slow and efficient scanning becomes impossible. On the other hand, for strong association with the target site, ߪ should be൐ ͷ݇_஻ܶ. This introduces a conflict, known as the search-speed stability paradox, in which efficient scanning requires a smooth binding energy landscape (ߪ̱݇_஻ܶ) whereas target site recognition depends on a very rough binding energy landscape (ߪ ب ݇_஻ܶ). This conflict is addressed by assuming that proteins bind to the DNA in two different modes, referred to as search and recognition mode, with two distinct ranges of binding energy roughness. 
It has been suggested that the ability to switch between these two modes mainly depends on a conformational change in the structure of the proteins, the DNA or both^48,50,53. Such conformational changes are coupled with target site association; they might be induced if the proteins spend enough time close to DNA bases which match the binding site on the protein, e.g. damage recognition pockets. Once the conformational change is stabilized, the proteins bind the target site tightly⁵³. In another theoretical study a three-mode scanning strategy is suggested for efficient target site recognition, in which the modes are referred to as unbound (hopping/jumping), search and recognition⁶².

The roughness of the binding energy landscape can be calculated by comparing the theoretical upper limit of the diffusion rate for helical sliding with the experimental value for the diffusion rate obtained from single-molecule experiments. There are two models for such calculations^53,71. In the first it is assumed that the roughness of this energy landscape for a continuous stretch of the movement of the protein along DNA is constant. Therefore, proteins need to overcome an activation energy barrier ($E_a$) for stepping between base pairs. Considering the sliding process in terms of the kinetics of a sequence of interactions between the protein and adjacent DNA bases, the rate constant can be written as

$$k = \frac{1}{t} = \frac{2D}{\langle x^2 \rangle} \tag{7}$$

where $D$ is the diffusion constant and $\langle x^2 \rangle$ is equal to $\mathrm{bp}^2$, i.e. the square of the distance between base pairs. From the Arrhenius equation we know that

$$k = A e^{-E_a / k_BT} \tag{8}$$

The ideal rate constant $k_{ideal}$ for helical sliding is reached only when the diffusion rate reaches the theoretical upper limit for a frictionless movement of the protein along the DNA, implying that $E_a = 0$, hence the pre-exponential factor $A = k_{ideal}$. By inserting the value of $A$ into equation (8) and solving for $E_a$ we get

$$E_a = \ln\left(\frac{k_{ideal}}{k}\right) k_BT \tag{9}$$

Assuming base-pair stepping with the length of 1 bp for both theoretical and experimental trajectories, and using equation (7), we can rewrite equation (9) as

$$E_a = \ln\left(\frac{D_{up.lim}}{D_{exp}}\right) k_BT \tag{10}$$

where $D_{up.lim}$ and $D_{exp}$ are the upper limit of the diffusion rate calculated with the method explained in the previous section (equations 1 and 5) and the diffusion rate calculated from the experimental trajectories of protein molecules scanning along DNA (equation 6), respectively.

In the second model for calculation of the roughness of the binding energy landscape, it is assumed that the binding energy varies with the DNA sequence, introducing variation in the activation energy barrier for stepping. The variance of this distribution ($\sigma^2$) can be calculated from the following equation⁷¹

$$D_{exp} = D_{up.lim}\left(1 + \frac{\beta^2\sigma^2}{2}\right)^{-1/2}\exp\left(-\frac{7\beta^2\sigma^2}{4}\right) \tag{11}$$

where $\beta$ is $1/k_BT$. From the variance of this distribution, $\sigma^2$, the roughness of the energy landscape can be calculated, giving a value equivalent to the average of the activation energy barrier for stepping. Although these two models use different approaches, it has been shown that they both return similar values for the average activation energy barrier for stepping⁷¹.
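The two models can be compared numerically. The following is a minimal sketch, not the analysis code used in this work: the function names are hypothetical, energies are expressed in units of $k_BT$, the first model is taken in its Arrhenius form $E_a = \ln(D_{up.lim}/D_{exp})\,k_BT$, and the second model is assumed to have the form $D_{exp}/D_{up.lim} = (1 + \beta^2\sigma^2/2)^{-1/2}\exp(-7\beta^2\sigma^2/4)$, which is solved for $\sigma$ by bisection.

```python
import math

def activation_barrier_kbt(d_up_lim, d_exp):
    """Model 1: constant barrier E_a (in k_B*T) from E_a = ln(D_up.lim / D_exp)."""
    if d_up_lim <= 0 or d_exp <= 0:
        raise ValueError("diffusion rates must be positive")
    return math.log(d_up_lim / d_exp)

def diffusion_ratio(sigma_kbt):
    """Model 2 (assumed form): D_exp / D_up.lim for roughness sigma (in k_B*T)."""
    s2 = sigma_kbt ** 2
    return (1.0 + s2 / 2.0) ** (-0.5) * math.exp(-7.0 * s2 / 4.0)

def roughness_kbt(d_up_lim, d_exp, tol=1e-10):
    """Solve the model-2 relation for sigma by bisection.

    The ratio decreases monotonically with sigma, so a bracket can be
    found by doubling the upper bound until the ratio falls below target.
    """
    target = d_exp / d_up_lim
    if not 0.0 < target <= 1.0:
        raise ValueError("D_exp must be positive and not exceed D_up.lim")
    lo, hi = 0.0, 1.0
    while diffusion_ratio(hi) > target:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diffusion_ratio(mid) > target:
            lo = mid  # landscape still too smooth, increase sigma
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example: experimental rate three times below the upper limit.
print(round(activation_barrier_kbt(3.0, 1.0), 2))  # 1.1
print(roughness_kbt(3.0, 1.0))
```

Both functions take the two diffusion rates in the same (arbitrary) units, since only their ratio enters the calculation.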

1.5 Single-molecule experiments with DNA repair proteins

Despite substantial revelations regarding the mechanism of scanning by theoretical and bulk experiment studies, the most informative approach to study the molecular choreography of DNA scanning is by directly observing the dynamics of the protein-DNA interaction. The need for direct observation of molecular processes has led to the development of single-molecule experiments (SME). The term single-molecule experiment is used for a wide variety of microscopy-based experiments that often include DNA, proteins or both. These studies can be put into different categories based on the aim of the experiment, such as mechanical characterization of DNA and RNA in various biochemical environments^72–75 or during interaction with proteins^76–84, exploring replication or transcription processes^85–89, or characterization of the DNA scanning behavior of proteins^56,90–96.

The typical size of DNA and proteins is of the order of a few nanometers with negligible absorption or scattering of visible light, while the diffraction limit of light microscopy is above 200 nanometers. Therefore, direct observation of these molecules in regular bright-field microscopy is not possible. In some of these studies single molecules are directly observed using fluorescence microscopy^56,58,59,74,92 while in others the effect of protein interactions on the length^76,78,80 or mechanical properties^82–84 of DNA is detected. In this section we focus on those studies where the DNA scanning properties of proteins are characterized by direct single-molecule detection. Normally in such experiments, the DNA is first immobilized, then linearized and finally exposed to the interacting proteins. The interaction is recorded as trajectories of proteins scanning along DNA. The proteins covered are DNA repair proteins as well as some other thermally driven proteins that perform scanning of DNA.

1.5.1 Helical sliding

The DNA molecule is around 2 nm thick and the diameter of a typical protein molecule is around 5 nm, while the resolution of a regular microscope is above 200 nm. Using fluorescence microscopy, the movement of proteins along the axis of linearized DNA can be detected with high precision. However, direct observation of helical sliding is not possible due to spatiotemporal resolution limits of regular fluorescence microscopy.

Even though recent advances in super-resolution microscopy have surpassed the diffraction limit and achieved spatial resolution down to a few nanometers, a successful direct observation of helical sliding has not been reported yet. Nevertheless, as depicted in Figure 1.5, the first indication that helical sliding is a dominant strategy of DNA scanning for a given protein is the complementarity of the protein's binding site with the DNA grooves^56,58.

Figure 1.5 | Helical sliding. Depending on the structural complementarity between protein and DNA grooves, helical sliding may be performed by following the minor (red arrow) or major groove (blue arrow) of the DNA. Adapted and modified from Dunn, A. R., Kad, N. M., Nelson, S. R., Warshaw, D. M. & Wallace, S. S. Single Qdot-labeled glycosylase molecules use a wedge amino acid to probe for lesions while scanning along DNA. Nucleic Acids Res. 39, 7487–7498 (2011) by permission of Oxford University Press.


In single-molecule studies a necessary, but insufficient, criterion for helical sliding is that the average diffusion constant for the protein is well below the theoretical upper limit of the diffusion rate for helical sliding. It has been suggested that the diffusion rate for proteins with dominant helical sliding should be 2-5 fold smaller than the theoretical upper limit of the diffusion rate for helical sliding^52,56. Proteins breaking this speed limit for helical sliding are likely to use hopping interspersed with helical sliding. Exploring the dependence of the diffusion rate on the ionic strength of the environment is another strategy to verify helical sliding^48,56,58. Since helical-sliding proteins are strictly bound to DNA and swap places with ions bound to DNA, the concentration of ions around DNA should not affect the rate of scanning^56,58. Thus, the absence of a salt-dependence of the average diffusion rate is considered a sign that proteins predominantly use helical sliding for translocation along DNA^56,58. In this respect, several DNA glycosylases including hOGG1⁵⁶, Nth, Fpg and Nei⁵⁸, as well as the LacI repressor⁹² and the mismatch repair complex Msh2-Msh6⁹⁷ have been shown to employ helical sliding.

Another strategy to verify helical sliding is to check the dependence of the diffusion rate of scanning on the radius $R$ of the protein. According to the formulas for calculation of the upper limit of the diffusion rate for helical sliding, equations 1 and 5, the diffusion rate varies as $R^{-3}$. For non-helical sliding^49,52 the diffusion rate varies as $R^{-1}$. In a study by Blainey et al., the DNA scanning of several proteins of different sizes was investigated⁵⁷. The diffusion rate was found to vary as $R^{-3}$, suggesting that helical sliding is the dominant strategy for DNA scanning by hOGG1, E. coli MutY, E. coli MutM M74A, the adenoviral AVP–pVIc complex, and the BamHI restriction endonuclease dimer. The energy barrier of sliding for these proteins has been measured to be around 1 $k_BT$, which is also consistent with theoretical predictions for helical sliding⁵³.
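The scaling argument can be illustrated by fitting the slope of $\ln D$ against $\ln R$. The sketch below is illustrative only: the function name is hypothetical and the radii and diffusion rates are synthetic values generated from exact power laws, not data from the cited study. A fitted exponent near −3 would be consistent with helical sliding, and one near −1 with non-helical sliding.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of ln(y) versus ln(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx

# Synthetic (hypothetical) protein radii in nm.
radii = [2.0, 2.5, 3.0, 4.0]
d_helical = [1.0 / r ** 3 for r in radii]  # generated as D proportional to R^-3
d_linear = [1.0 / r for r in radii]        # generated as D proportional to R^-1

print(round(loglog_slope(radii, d_helical), 3))  # -3.0
print(round(loglog_slope(radii, d_linear), 3))   # -1.0
```

With real measurements the fitted exponent carries an uncertainty that depends on the number of proteins and the spread of their radii, so a handful of points can only distinguish the two scalings if the size range is wide enough.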

1.5.2 Hopping

Having a high average diffusion rate of scanning close to the upper limit of helical sliding, or even surpassing it, is a strong indication that a protein employs hopping, either interspersed with sliding⁹⁸ or as a dominant strategy for translocation⁵⁹. However, the lack of such an observation does not exclude hopping, since slow helical sliding combined with a low frequency of hopping will lower the average diffusion rate. Hence, a widely used strategy for detection of hopping is to check the salt-dependence of the diffusion rate. Proteins undergoing hopping will microscopically dissociate from the DNA while still being confined to a volume in the vicinity of the DNA. Ions surrounding the DNA will reassociate to the vacant protein binding site on DNA. Due to loss of close contact with the DNA, and a reduction in friction in this microscopically dissociated state, the protein’s mobility along the DNA is increased. The average diffusion rate will be highly affected by the frequency of this microscopic dissociation as well as the duration of the dissociated state. A high ionic strength of the environment facilitates microscopic dissociations and reduces the likelihood of reassociation. This leads to an increase in the amount of time proteins spend in the high mobility, microscopically dissociated state, and consequently, an increase in the average diffusion rate. For example, the processivity factor of the herpes simplex virus DNA polymerase, UL42, shows a 4-fold increase in the average diffusion rate when the salt concentration is increased from 25 mM to 100 mM⁵⁹. In another study, it is shown that the diffusion rate for the transcription activator-like effector (TALE) protein increases 10-fold when the salt concentration is increased from 30 mM to 600 mM⁹⁸.
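To compare salt dependences measured over different concentration ranges, the fold changes quoted above can be condensed into an apparent exponent. The sketch below assumes, purely for the purpose of summarizing two data points, a power-law dependence of the diffusion rate on salt concentration; this functional form and the function name are assumptions and are not taken from the cited studies. An exponent close to zero would correspond to the salt-independent behavior expected for helical sliding.

```python
import math

def salt_exponent(c_low, c_high, fold_increase):
    """Apparent exponent n in D ~ [salt]^n from a two-point fold change."""
    if c_low <= 0 or c_high <= c_low or fold_increase <= 0:
        raise ValueError("need 0 < c_low < c_high and a positive fold change")
    return math.log(fold_increase) / math.log(c_high / c_low)

# Fold changes quoted in the text (concentrations in mM).
print(round(salt_exponent(25, 100, 4), 2))   # UL42: 1.0
print(round(salt_exponent(30, 600, 10), 2))  # TALE: 0.77
```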

Another approach to detect hopping is to directly observe the movement of proteins while encountering each other or facing a stationary obstacle along the DNA. In an experiment using two-color fluorescence microscopy, it was shown that two Mlh1–Pms1 protein complexes can bypass each other or stationary nucleosomes during scanning⁹⁹.

1.5.3 Mode switching; hopping – sliding

Based on a theoretical framework, it has been predicted that interspersed hopping and helical sliding is an optimum combination for target site recognition^53,60–62. DNA is a crowded molecule and there are many different proteins constantly associated with the DNA, performing various tasks. Without the ability to perform hopping for a large number of these proteins, the result would be a molecular traffic jam on the DNA¹⁰⁰. Therefore, the ability to hop, even for very short periods of time, has been considered one of the essential mechanisms of translocation in several studies^53,54,60–64,101–103. One of the ways to experimentally detect interspersed hopping and sliding is through direct visualization; this is only possible if the length of hopping is sufficiently long. The restriction endonuclease EcoRV is an example of a protein that has been observed to perform hopping, with the length of steps up to 1 μm (~ 3000 base pairs)¹⁰⁴. In most cases the length of individual steps in hopping is too short to be directly detected; therefore alternative indirect approaches are used. In the cases where an ATP-induced conformational change in the protein is responsible for scanning-mode switching, the difference in the diffusion rate in the presence and absence of ATP is used to infer a switching between modes. For example, the post-replicative mismatch repair protein MutSα¹⁰⁵ and the type III restriction enzyme EcoP15I¹⁰⁶ show 6- and 30-fold increases, respectively, in the average diffusion rate in the presence of ATP. The tumor suppressor p53 delegates helical sliding and hopping to the C-terminal domain and the core domain of the protein, respectively¹⁰⁷. This was inferred from the observation that the movement of the C-terminal domain did not show any salt dependence, whereas the diffusion rate of the core domain increased 3-fold when increasing the salt concentration from 25 to 175 mM. The transcription activator-like effector protein TALE uses the N-terminal region for helical sliding while the central repeat domain facilitates hopping by forming a loosely wrapped conformation around the DNA^95,98.

To the best of our knowledge, the ability to switch between helical sliding and hopping has so far only been attributed either to multimeric, ring-shaped proteins which encircle the DNA, to proteins with two distinct binding sites, or to proteins which undergo ATP-induced conformational changes. For example, PCNA⁹³ (Figure 1.6) and EcoRV¹⁰⁴ form a ring-like topology around DNA that keeps these multimeric proteins in the vicinity of DNA at all times; the ring-shape/clamp plays an important role in facilitating hopping without losing contact with the DNA. MutSα¹⁰⁵ and EcoP15I¹⁰⁶ change protein conformation by ATP association/hydrolysis, and p53 has two different binding sites with each mode of scanning delegated to separate binding sites¹⁰⁷. TALE uses two distinct protein conformations for hopping and helical sliding^95,98. The question remains whether helical-scanning-to-hopping mode switching is limited to proteins with the abovementioned structural characteristics or if it is a more general strategy that can also be employed by proteins lacking such structural features.


Figure 1.6 | Crystal structure of PCNA. PCNA forms a closed topology structure around the DNA upon nonspecific binding and this keeps the protein encircling the DNA even though parts of the protein are not tightly associated with the DNA grooves. The colors represent different monomers of the protein. PDB code: 1VYM ¹⁰⁸.

1.5.4 Mode switching; search – recognition

According to theoretical models, proteins are either in search or recognition mode^53,62. The two modes of translocation that were discussed above, hopping and helical sliding, both belong to the search mode. In the recognition mode proteins are performing detailed base inspection such as base flipping and/or base probing, e.g. by use of wedge motifs^58,109,110 or single amino acid residues^111,112. In this mode the diffusion rate of the movement is drastically decreased. Therefore, in addition to switching between helical sliding and hopping, proteins may switch into the recognition mode as well. Despite robust theoretical explanations behind this switching^53,54,60–62, few single-molecule experiments have focused on this issue^58,110. From structural studies of protein-DNA complexes it has been shown that in the recognition mode the proteins probe for the damage through one of these strategies: (I) flipping the base into a base recognition pocket¹¹¹; (II) inserting an intercalating residue⁵⁸ or void-filling cap to probe the base¹¹²; (III) capturing the base while being spontaneously flipped out of the DNA stack¹¹³.

In a single-molecule study of the three bacterial DNA glycosylases Fpg, Nei and Nth, intermittent periods of stalling of movement were reported during scanning⁵⁸. As shown in Figure 1.7 for Fpg, this is attributed to the probing of the DNA bases using an intercalating residue (in this case Phe111) in the structure of the protein. In Nei and Nth, the amino acids Tyr72 and Leu81 function as intercalating residues, respectively⁵⁸. This study was done using undamaged DNA, and it was argued that base probing for efficient inspection does not depend on the damage and can be performed even on undamaged DNA; the authors therefore suggested the term “interrogation phase” to describe this activity. In a later study¹¹⁰ a single-molecule experiment was performed using the same proteins while scanning damaged DNA. The intercalating residues were mutated (Fpg F111A, Nei Y72A, Nth L81A) and, from a comparison of the scanning behavior of the mutant and wild-type proteins, it was inferred that the switching to recognition mode depends on the presence of the intercalating residue.

Figure 1.7 | The intercalating residue in the Fpg glycosylase. The intercalating residue (phenylalanine 111; Phe111) is inserted between the two strands of DNA in the interrogation mode (red; left). In the search mode this residue is not intercalating the DNA (blue; right). Adapted from Dunn, A. R., Kad, N. M., Nelson, S. R., Warshaw, D. M. & Wallace, S. S. Single Qdot-labeled glycosylase molecules use a wedge amino acid to probe for lesions while scanning along DNA. Nucleic Acids Res. 39, 7487–7498 (2011) by permission of Oxford University Press.

1.6 Challenges with methods in previous studies

In a typical single-molecule experiment where the observation of DNA scanning is the objective, DNA should be immobilized, detected, linearized and exposed to interacting proteins, and the interaction should be recorded. Different studies have come up with diverse solutions to each step of this process. However, the methods in these studies have room for improvement and there are sources of errors present that should be minimized or eliminated. In this section, different approaches to perform single-molecule DNA scanning experiments and the potential sources of errors present in previous studies are briefly discussed. The efficient detection of single molecules against a substantial level of background noise from different sources is one of the major challenges that experimentalists face in single-molecule experiments. In the quest for high signal-to-noise ratio and high spatial and temporal resolution, a number of previous studies have used quantum dots (Q-dots) to label the proteins^58,93,97,99,110,114,115. As depicted in Figure 1.8, these Q-dots are considerably larger than the proteins (around 4-10 times), and are connected to them via relatively large antibodies. This combination severely affects the protein’s diffusive properties⁵⁷.

Figure 1.8 | Quantum dot labelling. The protein (shown in green) is connected to a large quantum dot via a biotin-IgG linker antibody. The payload greatly exceeds the size and mass of the protein under investigation. Adapted from Dunn, A. R., Kad, N. M., Nelson, S. R., Warshaw, D. M. & Wallace, S. S. Single Q-dot-labeled glycosylase molecules use a wedge amino acid to probe for lesions while scanning along DNA. Nucleic Acids Res. 39, 7487–7498 (2011) by permission of Oxford University Press.

After immobilization on or near the surface, the DNA molecules need to be visualized and localized. In some studies DNA intercalating dyes are used for this purpose^58,92,97,110,115. It has been shown that intercalating dyes affect the mechanical properties of DNA¹¹⁶, which in turn can affect the scanning process. Moreover, the presence of these dyes in the structure of DNA might interrupt the damage-probing function of the proteins such as base-flipping and base probing via wedge residues. As depicted in Figure 1.9, another common and potential source of error in a subset of previous studies is the use of flow to linearize the DNA^56,57,107,117. In these experiments, the speed of flow can exceed the speed of scanning by 1-2 orders of magnitude. It has been shown that the flow can interfere with the scanning process by either shortening the binding lifetime⁵⁸ or affecting thermally-driven scanning¹¹⁸.


Figure 1.9 | Flow-stretched DNA. DNA is immobilized on the surface of a coverslip and elongated using force from flow in the reaction chamber. Adapted by permission from Springer Nature, Nature Structural & Molecular Biology, Nonspecifically bound proteins spin while diffusing along DNA, Blainey, P. C. et al. [COPYRIGHT] (2009).

Another challenge common to all single-molecule studies is the production and operation of microfluidic laminar flow cells. In order to immobilize the DNA on the coverslip surface and later expose it to proteins, sequential delivery of different materials to the point of observation is needed. For this purpose, microfluidic laminar flow cells are used in almost all single-molecule studies. Laminar flow cells containing a channel with typical dimensions of 5×50×0.2 mm are designed to be placed on a microscope and connected to a pumping system for delivery of the necessary reagents and components. Some of the typical ways of fabricating laminar flow cells used in previous studies include: (I) cutting the pattern of the channels into a piece of double-sided tape^119,120 or parafilm⁸⁴ and sandwiching it between a coverslip and microscope slide; (II) etching channels on the surface of the microscope slide^121,122 or PMMA¹²³ using lasers or milling machines; (III) casting PDMS material on a mold with the pattern of the channels^124,125. A time-consuming and tedious preparation process, low accuracy and high chance of error in operation, low versatility in design and incompatibility with existing optical setups or targeted biological experiments are common challenges facing experimentalists in all three approaches.

In addition to the experimental sources of errors and challenges mentioned above, data analysis approaches used in some of the studies can also limit the interpretive power. Although it gives valuable information about the thermally driven movements, the overall average diffusion rate is not sufficient to disclose details of a transient nature during scanning^56,57. The distribution of the average diffusion rate of trajectories, and also the distribution of the instantaneous diffusion rate of segments of trajectories, can reveal interesting information about the scanning process^58,110. In previous studies, the average diffusion rate of trajectories is commonly used to investigate any salt-dependence in order to detect or exclude hopping as a translocation strategy for the protein under investigation^56–58,97,98,107,110. If hopping is the dominant strategy of scanning, the salt-dependence of the diffusion rate might be detected by looking at the average diffusion rate; however, for the cases where the ratio of hopping to sliding is low, the average diffusion rate may fail to show the salt dependence. Therefore, to check for the presence of salt-dependent variation in the diffusion rate, segmentation of trajectories into different modes of scanning, as performed in our study, may be helpful.
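The difference between a trajectory-averaged and a segment-wise diffusion rate can be sketched as follows. This is a generic illustration and not the segmentation procedure used in this work: the function names, window length, time step and diffusion rates (nominally in μm²/s) are hypothetical, the trajectory is simulated, and localization noise is ignored. Each window is assigned an instantaneous rate from the one-dimensional relation $D = \langle \Delta x^2 \rangle / (2\Delta t)$.

```python
import random

def windowed_diffusion(positions, dt, window):
    """Instantaneous 1D diffusion rate per non-overlapping window of steps."""
    steps = [b - a for a, b in zip(positions, positions[1:])]
    rates = []
    for start in range(0, len(steps) - window + 1, window):
        chunk = steps[start:start + window]
        mean_sq_step = sum(s * s for s in chunk) / window
        rates.append(mean_sq_step / (2.0 * dt))
    return rates

def simulate_track(d_values, n_steps, dt, seed=0):
    """Simulated 1D random walk with a different diffusion rate per segment."""
    rng = random.Random(seed)
    x = [0.0]
    for d in d_values:
        step_sd = (2.0 * d * dt) ** 0.5
        for _ in range(n_steps):
            x.append(x[-1] + rng.gauss(0.0, step_sd))
    return x

# Hypothetical trajectory: a slow mode followed by a ten times faster mode.
dt = 0.01  # s
track = simulate_track([0.01, 0.1], n_steps=2000, dt=dt)
rates = windowed_diffusion(track, dt, window=500)
overall = windowed_diffusion(track, dt, window=len(track) - 1)[0]

print([round(r, 3) for r in rates])  # four slow windows, then four fast windows
print(round(overall, 3))             # single average that hides the two modes
```

The single overall value lies between the two underlying rates and gives no hint that two modes are present, whereas the windowed values separate them; shorter windows improve time resolution at the cost of noisier estimates.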

1.7 Proteins used in this study

In manuscript I a wild-type Endonuclease V (wt-EndoV) and a wedge-deficient mutant of Endonuclease V (wm-EndoV) from the hyperthermophilic bacterium Thermotoga maritima were used. In addition, human 8-oxoguanine DNA glycosylase 1 (hOGG1) was included as a prototypic DNA repair protein utilizing helical sliding. In manuscript II, the DNA scanning properties of the HEAT-like repeat proteins AlkD from Enterococcus faecalis and AlkF from Bacillus cereus are characterized. In this section a brief introduction to structural elements and functions of these proteins is presented.

1.7.1 EndoV and hOGG1

Endonuclease V (EndoV) is a highly conserved DNA/RNA endonuclease with sequence homologs in all domains of life: bacteria, archaea and eukarya^126–128. In prokaryotes, the main target is deaminated adenine (hypoxanthine) lesions^129,130. However, deaminated base lesions such as xanthine¹³¹, oxanine¹³² and uracil¹³³ are also processed. In addition, EndoV has been shown to recognize and cleave the DNA strand next to a wide variety of DNA anomalies, ranging from deaminated bases^129–135, AP-sites^133,135, base mismatches^135,136, insertion-deletion (ID) loops, hairpins, flaps and pseudo-Y structures¹³⁷. Moreover, the enzyme can bind to, but not cleave, a variety of branched DNA structures such as forks, three-way junctions and Holliday junctions¹²⁷. The eukaryotic version of EndoV does not catalyse cleavage of DNA, but instead cleaves various forms of RNA containing a deaminated base^128,138–140.


EndoV does not remove the damaged base; upon encounter, it instead hydrolyzes the second phosphodiester bond 3’ to the deaminated base using Mg²⁺ as a cofactor^109,133. The enzyme remains tightly bound to the nicked product, suggesting EndoV is a monofunctional protein and that other proteins are needed to complete the repair pathway^109,130. The repair pathway initiated by EndoV is yet to be fully discovered.

EndoV from the hyperthermophilic bacterium Thermotoga maritima (Tma)¹⁴¹ has been widely used for structural and functional characterization. Crystal structures of Tma EndoV in complex with damaged DNA have revealed a strand-separating wedge motif at the protein surface, formed by the four residues Pro79, Tyr80, Ile81 and Pro82 (PYIP; Figure 1.10)¹⁰⁹. This highly conserved PYIP-wedge has been shown to be crucial for the enzyme’s ability to recognize helical distortions in DNA. In fact, the DNA strands in the complexes are split by the wedge exactly at the weak point in DNA¹⁴², and the wedge can therefore be reasonably expected to play a role in switching from search to recognition mode. The ability to split the DNA strands at points with weak base pairing might also explain the enzyme’s affinity for the above-mentioned DNA anomalies.

Figure 1.10 | Crystal structure of EndoV. Left: Crystal structure of wild-type Tma EndoV, with the PYIP wedge residues shown in dark green. PDB code 2w35¹⁴³. Right: Model of the structure of the wedge-deficient mutant Tma EndoV.

Human 8-oxoguanine DNA glycosylase 1 (hOGG1) is the primary enzyme responsible for recognition and removal of oxidized guanine (8-oxoguanine; 8-oxoG) lesions in DNA^144–147. This protein belongs to the helix-hairpin-helix structural superfamily of DNA glycosylases and the base excision repair pathway¹⁴⁸. DNA-associated crystal structures of hOGG1^148,149 show that in the recognition mode, the DNA is bent
