

Review

NanoSolveIT Project: Driving nanoinformatics research to develop innovative and integrated tools for in silico nanosafety assessment

Antreas Afantitis^a,⇑, Georgia Melagraki^a, Panagiotis Isigonis^a, Andreas Tsoumanis^a, Dimitra Danai Varsou^a, Eugenia Valsami-Jones^b, Anastasios Papadiamantis^b, Laura-Jayne A. Ellis^b, Haralambos Sarimveis^c, Philip Doganis^c, Pantelis Karatzas^c, Periklis Tsiros^c, Irene Liampa^c, Vladimir Lobaskin^d, Dario Greco^e, Angela Serra^e, Pia Anneli Sofia Kinaret^e, Laura Aliisa Saarimäki^e, Roland Grafström^f,g, Pekka Kohonen^f,g, Penny Nymark^f,g, Egon Willighagen^h, Tomasz Puzyn^i,j, Anna Rybinska-Fryca^i, Alexander Lyubartsev^k, Keld Alstrup Jensen^l, Jan Gerit Brandenburg^m,n, Stephen Lofts^o, Claus Svendsen^p, Samuel Harrison^o, Dieter Maier^q, Kaido Tamm^r, Jaak Jänes^r, Lauri Sikk^r, Maria Dusinska^s, Eleonora Longhin^s, Elise Rundén-Pran^s, Espen Mariussen^s, Naouale El Yamani^s, Wolfgang Unger^t, Jörg Radnik^t, Alexander Tropsha^u, Yoram Cohen^v, Jerzy Leszczynski^w, Christine Ogilvie Hendren^x, Mark Wiesner^x, David Winkler^y,z,aa,bb, Noriyuki Suzuki^cc, Tae Hyun Yoon^dd,ee, Jang-Sik Choi^ee, Natasha Sanabria^ff, Mary Gulumian^ff,gg, Iseult Lynch^b,⇑

^a Nanoinformatics Department, NovaMechanics Ltd, Nicosia, Cyprus

^b School of Geography, Earth and Environmental Sciences, University of Birmingham, B15 2TT Birmingham, UK

^c School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece

^d School of Physics, University College Dublin, Belfield, Dublin 4, Ireland

^e Faculty of Medicine and Health Technology, University of Tampere, FI-33014, Finland

^f Misvik Biology OY, Itäinen Pitkäkatu 4, 20520 Turku, Finland

^g Karolinska Institute, Institute of Environmental Medicine, Nobels väg 13, 17177 Stockholm, Sweden

^h Department of Bioinformatics – BiGCaT, School of Nutrition and Translational Research in Metabolism, Maastricht University, Universiteitssingel 50, 6229 ER Maastricht, the Netherlands

^i QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland

^j University of Gdansk, Faculty of Chemistry, Wita Stwosza 63, 80-308 Gdansk, Poland

^k Institutionen för material- och miljökemi, Stockholms Universitet, 106 91 Stockholm, Sweden

^l The National Research Center for the Work Environment, Lersø Parkallé 105, 2100 Copenhagen, Denmark

^m Interdisciplinary Center for Scientific Computing, Heidelberg University, Germany

^n Chief Digital Organization, Merck KGaA, Frankfurter Str. 250, 64293 Darmstadt, Germany

^o UK Centre for Ecology and Hydrology, Library Ave, Bailrigg, Lancaster LA1 4AP, UK

^p UK Centre for Ecology and Hydrology, MacLean Bldg, Benson Ln, Crowmarsh Gifford, Wallingford OX10 8BB, UK

^q Biomax Informatics AG, Robert-Koch-Str. 2, 82152 Planegg, Germany

^r Department of Chemistry, University of Tartu, Ülikooli 18, 50090 Tartu, Estonia

^s NILU-Norwegian Institute for Air Research, Instituttveien 18, 2002 Kjeller, Norway

^t Federal Institute for Material Testing and Research (BAM), Unter den Eichen 44-46, 12203 Berlin, Germany

^u Eschelman School of Pharmacy, University of North Carolina at Chapel Hill, 100K Beard Hall, CB# 7568, Chapel Hill, NC 27955-7568, USA

^v Samueli School Of Engineering, University of California, Los Angeles, 5531 Boelter Hall, Los Angeles, CA 90095, USA

^w Interdisciplinary Nanotoxicity Center, Jackson State University, 1400 J. R. Lynch Street, Jackson, MS 39217, USA

^x Center for Environmental Implications of Nanotechnologies, Duke University, 121 Hudson Hall, Durham, NC 27708-0287, USA

^y La Trobe Institute of Molecular Sciences, La Trobe University, Plenty Rd & Kingsbury Dr, Bundoora, VIC 3086, Australia

^z Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Australia

^aa CSIRO Data61, Clayton 3168, Australia

^bb School of Pharmacy, University of Nottingham, Nottingham, UK

^cc National Institute for Environmental Studies, 16-2 Onogawa, Tsukuba, Ibaraki 305-0053, Japan

^dd Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea

^ee Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Republic of Korea

https://doi.org/10.1016/j.csbj.2020.02.023

2001-0370/© 2020 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Abbreviations: AI, Artificial Intelligence; AOPs, Adverse Outcome Pathways; API, Application Programming Interface; CG, coarse-grained (model); CNTs, carbon nanotubes;

FAIR, Findable, Accessible, Interoperable and Re-usable; GUI, Graphical User Interface; HOMO-LUMO, Highest Occupied Molecular Orbital-Lowest Unoccupied Molecular Orbital; IATA, Integrated Approaches to Testing and Assessment; KE, key events; MIE, molecular initiating events; ML, machine learning; MOA, mechanism (mode) of action;

MWCNT, multi-walled carbon nanotubes; NMs, nanomaterials; OECD, Organisation for Economic Co-operation and Development; PC, Protein Corona; PBPK, Physiologically Based PharmacoKinetics; PChem, Physicochemical; PTGS, Predictive Toxicogenomics Space; QC, quantum-chemical; QM, quantum-mechanical; QSPR, quantitative structure-property relationship; QSAR, quantitative structure-activity relationship; RA, risk assessment; REST, Representational State Transfer; ROS, reactive oxygen species; SAR, structure-activity relationship; SMILES, Simplified Molecular Input Line Entry System; SOPs, standard operating procedures.

Journal homepage: www.elsevier.com/locate/csbj


^ff National Health Laboratory Services, 1 Modderfontein Rd, Sandringham, Johannesburg 2192, South Africa

^gg Haematology and Molecular Medicine, University of the Witwatersrand, Johannesburg, South Africa

Article info

Article history:

Received 12 November 2019

Received in revised form 18 February 2020

Accepted 29 February 2020

Available online 7 March 2020

Keywords:

Nanoinformatics; Computational toxicology; Hazard assessment; Engineered nanomaterials; (quantitative) Structure–activity relationships; Integrated approach for testing and assessment; Safe-by-design; Machine learning; Read across; Toxicogenomics; Predictive modelling

Abstract

Nanotechnology has enabled the discovery of a multitude of novel materials exhibiting unique physicochemical (PChem) properties compared to their bulk analogues. These properties have led to a rapidly increasing range of commercial applications; this, however, may come at a cost, if an association to long-term health and environmental risks is discovered or even just perceived. Many nanomaterials (NMs) have not yet had their potential adverse biological effects fully assessed, due to costs and time constraints associated with the experimental assessment, frequently involving animals. Here, the available NM libraries are analyzed for their suitability for integration with novel nanoinformatics approaches and for the development of NM-specific Integrated Approaches to Testing and Assessment (IATA) for human and environmental risk assessment, all within the NanoSolveIT cloud platform. These established and well-characterized NM libraries (e.g. NanoMILE, NanoSolutions, NANoREG, NanoFASE, caLIBRAte, NanoTEST and the Nanomaterial Registry (>2000 NMs)) contain physicochemical characterization data as well as data for several relevant biological endpoints, assessed in part using harmonized Organisation for Economic Co-operation and Development (OECD) methods and test guidelines.

Integration of such extensive NM information sources with the latest nanoinformatics methods will allow NanoSolveIT to model the relationships between NM structure (morphology), properties and their adverse effects and to predict the effects of other NMs for which less data is available. The project specifically addresses the needs of regulatory agencies and industry to effectively and rapidly evaluate the exposure, NM hazard and risk from nanomaterials and nano-enabled products, enabling implementation of computational ‘safe-by-design’ approaches to facilitate NM commercialization.


Contents

1. Introduction

2. Materials and methods

2.1. NMs datasets and knowledge infrastructures

2.1.1. Data generation and curation activities

2.1.2. Computational approaches for data gap filling and interpolation

2.1.3. Dedicated NMs databases organized for modelling and informatics

2.2. Toxicogenomics modelling (predictive models using omics data)

2.2.1. Deriving signatures of NM MOA to support AOP development

2.2.2. Deriving signatures of NM MOA that define robust predictive models based on NM bioactivity

2.2.3. Developing computational methods for inclusion into a robust computational platform

2.3. Multi-scale modelling framework for NMs property prediction

2.3.1. Computational NMs descriptors

2.3.2. Modelling (predicting) NM corona composition

2.3.3. Connecting NM-coronas to biological impacts/AOPs

2.4. Predictive nanoinformatics modelling (using artificial intelligence methodologies)

2.4.1. Calculated toxicity predictors for ML models

2.4.2. Meta models to integrate across scales

2.5. NM human and environmental RA

2.6. NM cloud platform

2.7. Knowledge transfer and communication with stakeholders and RA bodies

3. Summary and outlook

CRediT authorship contribution statement

Acknowledgments

Appendix A. Supplementary data

References

⇑ Corresponding authors.

E-mail addresses: [email protected] (A. Afantitis), [email protected] (I. Lynch).


1. Introduction

Nanotechnology has increased accessibility to novel, diverse materials with dimensions of ~1–100 nm, the size range within which novel physicochemical (PChem) properties appear and transport processes occur in living systems; such materials are being used in a multitude of industrial and consumer-goods applications. The advantages of engineered nanomaterials (NMs) over similar materials in bulk are well defined; however, in most cases, risk assessment (RA) of the potential hazards arising from these new material properties is incomplete or lacking. Currently, evaluation of possible NM-related risks is an expensive, slow and complicated task, usually achieved by combinations of in vitro and in vivo experiments that estimate human and environmental hazards. Clear conclusions regarding the hazards posed by NMs are often very difficult to draw because the interpretation of experimental results is influenced by the type of procedures and protocols applied, and studies are usually limited to just a few NMs, doses and timepoints. At this stage, it is clear that proper use of existing high-efficacy occupational technical and personal protection equipment can sufficiently reduce exposure to NMs [41,68,175], and that the effects from acute exposure to NMs at realistic doses mirror those from anthropogenic particle exposure. However, the challenges now involve assessing long-term low-level chronic exposures, and biological and ecological impacts from multi-component NMs where, for example, the different components may degrade at different rates. A validated, predictive in silico approach that takes into account the complexity of NMs and the diverse environments in which they are deployed is essential for timely RA and continued progress in nanosafety research.

Despite the great success of computational methods in modelling and predicting properties of conventional chemicals over many years, analogous quantitative models relating engineered NMs structure (morphology) to PChem properties and to toxic effects remain underdeveloped, due to:

• The relative paucity of reliable experimental data on the biological properties of NMs;

• The intrinsic complexity of NMs in particle size distribution, shape and degree of agglomeration compared to small organic molecules;

• A limited number of systematic studies on the dynamic interaction of NMs with available macromolecules when placed in a biological environment (such as serum, plasma and environmental compartments) to potentially form coronas, which then become the “biologically relevant entity” seen by cells and organisms, and the role thereof.

Systematic studies conducted to date are limited to only a few nanostructure descriptors and/or physicochemical properties, namely size, shape and surface properties [14,115].

Despite these limitations, significant progress has been made in recent years towards understanding the drivers of NMs toxicity and ecotoxicity [188,189]. Size has long been the defining feature of nanoscale materials along with surface area, but is not in itself sufficient as a predictor of toxicity [24]. Surface charge has also been found in many studies to drive toxicity, with cationic NMs being more toxic than negatively charged materials of similar composition, likely a result of strong electrostatic interactions between cationic NMs and negatively charged membranes [42,77]. Indeed, there are several models linking zeta potential and NMs toxicity in the literature [59,99,165]. The presence of crystalline order in materials is another key driver of toxicity; this has been known since the earliest studies with quartz silica and asbestos, and applies to NMs such as TiO2 and SiO2, where amorphous forms have low toxicity while some ordered structures are especially toxic [4,122]. Band-gap is another common NM property linked with toxicity, as overlap of NM and cellular conductance bands facilitates transfer of electrons and oxidative stress [190]. Surface bond strain arising from high curvature and high temperature synthesis is another feature strongly correlated with toxicity, which explains differences between materials of similar composition produced by different synthesis methods (e.g. with and without high temperatures) [154,189]. A less investigated parameter that may be important for carbon-based NMs such as CNTs and graphene-like materials is chirality [38]. Many of the properties of NMs are influenced by their surroundings (so-called extrinsic properties), such as dissolution and formation of a biomolecule corona [12,89]. Binding of specific proteins that facilitate receptor mediated uptake is also a much-investigated feature of NMs, as this drives much of the subsequent signalling [74,180]. In the environment, transformations such as sulfidation, oxidation and interaction with phosphate can change NM stability and toxicity [135,139]. Thus, it is increasingly clear that a wide range of NMs structure descriptors and properties, many of which are interlinked, are correlated with their toxicity. Modelling can elucidate the main drivers of NM toxicity via quantum mechanical and atomistic parameters that provide important insights into reactivity and mechanisms.

Evaluating the hazards of NMs requires the integration and assessment of currently quite disparate NMs characterization and toxicity data, categorization and grouping of NMs, as well as the derivation of exposure routes, forms and concentrations, and hazard threshold levels for human health and the environment.

Assessing the hazards of NMs solely based on laboratory tests is time-consuming, resource intensive and constrained by ethical considerations [154]. Consequently, over the past couple of decades, computational approaches for modelling the relationships between NM structure, properties and their biological effects have become a key priority. This field has been reviewed by Winkler et al. [182], but has since advanced further. The most successful computational models, capable of predicting biological properties of NMs in diverse and complex environments, are based on the quantitative structure–activity relationship (QSAR) method. This, and related methods, use statistical and machine learning (ML) algorithms to model relationships between a material's structure, molecular properties, provenance and other parameters, and its biological effects. The application of these methods to diverse materials and NMs has been comprehensively reviewed by a number of authors [15,73,183], providing some very useful tools.

Although data driven, these computational methods are still capable of modelling relatively small data sets [36]. Clearly, larger integrated data sets, and future as yet to be produced data, form the basis for more comprehensive models that will increase automation of experimental data analysis and interpretation.

These computational approaches can rapidly fill data gaps, exploit ‘read across’ prediction of biological effects of similar materials, and classify the hazards of NMs to individual species. They are also valuable adjuncts to experimental data on the biological properties of new materials, which remains the bottleneck for reliable risk predictions [1,55,97,172]. Indeed, in silico models of this type are used extensively by scientists in academia and industry to reliably calculate PChem properties, and to evaluate effects on human and environmental health, ecotoxicological behaviour and fate of a broad range of chemical substances including complex materials.

More importantly, integrating these data resources across disciplines, such as chemoinformatics, systems biology and ‘omics approaches, including with non-nanotechnology resources, will support multiple objectives including the reuse of existing information [60]. In addition, this deeper analysis may lead to new discoveries fuelling the innovation pipeline.


2. Materials and methods

Below we summarize the latest advances in five important fields of the nanoinformatics sector and how they can be further explored, expanded and incorporated in future generations of NM computational modelling tools and infrastructure. These include: i) dataset curation, quality assessment and knowledge infrastructures, ii) toxicogenomics modelling, iii) multi-scale modelling (physics-based and data driven), iv) predictive modelling (data driven) and v) NMs human and environmental RA. Despite the continuous advances in each sector, researchers face important challenges that need to be solved urgently. These advances are exploited within the European Commission Horizon 2020 funded project NanoSolveIT to create the relevant nanoinformatics e-platform to facilitate in silico NMs exposure, hazard and risk assessment. To this end, each sub-chapter includes a short analysis and review of the state-of-the-art and recent bibliography, as well as a concise presentation of how the main challenges of each field are being tackled by the authors.

2.1. NMs datasets and knowledge infrastructures

Development of nanotoxicity prediction models is becoming increasingly important in risk assessment (RA) of engineered NMs but is dependent on the availability of good datasets with high data quality, completeness [92], and quantity (in terms of numbers of different NMs, but also incorporating a range of doses, timepoints, cell or organism types and end-points). While industry is responsible for the provision of data for their specific NMs for regulatory evaluation, research is required to develop the predictive models to enable reduction of the experimental data needed for regulatory RA. However, computational researchers face significant challenges in acquiring the data needed for the development of robust models. These include the wide heterogeneity of published literature data in terms of the NMs characterization, exposure and hazard data reported, the availability of the underpinning datasets in a form useful for modelling, and the variable degrees of completeness and quality of available datasets [92,157]. Therefore, in the field of nanoinformatics, there has been special emphasis on the curation and quality assessment of nanosafety data. A small number of recent projects have thus prioritized the curation of literature data and of datasets generated in past projects that ended before the current (evolving) standards for data quality emerged (e.g. the NanoReg2 and caLIBRAte projects curating NANoREG data, and NanoSolveIT partners curating literature publications on NMs transcriptomics).

Diversity of experimental approaches in the literature data has led to many missing values (data gaps) and differing data quality (numbers of replicates, signal to noise ratio, relevance of endpoints, different experimental conditions etc.), such that assessment of the quality and completeness of the collected data is a critical issue for modellers and regulators [92]. Previously, Klimisch et al. [65] proposed criteria to assess the reliability of toxicological and eco-toxicological data based on the source of the toxicity data (i.e. whether the data were produced using international standard operating procedures (SOPs) such as the aforementioned OECD test guidelines). Thereafter, Lubinski et al. [83] expanded on these criteria to include PChem properties of NMs such as size, shape, and surface charge. Recently, Ha et al. [45] and Trinh et al. [168] further extended criteria for evaluation of the quality and completeness of NM PChem data. The quality and completeness of these data were determined using a set of rules which specifically assigned a PChem score for each attribute reported (i.e. core size, hydrodynamic size, surface charge and specific surface area). The score for each attribute was composed of two sub-scores: one for the reliability of the data source (score range: 0–3) and another for the reliability of the measurement method (score range: 0–2). To evaluate the quality and completeness of a dataset, the average values and standard deviation of scores of all data rows in a dataset are used [168].
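The scoring scheme described above can be illustrated with a short Python sketch. It is not the implementation of Ha et al. or Trinh et al.: only the sub-score ranges (0–3 for the data source, 0–2 for the measurement method) and the use of the mean and standard deviation over data rows are taken from the text. The attribute keys, the example rows, the summation of the two sub-scores, the averaging over attributes within a row, and the use of the population standard deviation are assumptions made here for illustration.

```python
from statistics import mean, pstdev

# Sub-score ranges stated in the text; everything else here is illustrative.
SOURCE_RANGE = (0, 3)   # reliability of the data source
METHOD_RANGE = (0, 2)   # reliability of the measurement method


def attribute_score(source, method):
    """Score one reported attribute as the sum of its two sub-scores (assumed)."""
    if not SOURCE_RANGE[0] <= source <= SOURCE_RANGE[1]:
        raise ValueError("source sub-score must lie in 0-3")
    if not METHOD_RANGE[0] <= method <= METHOD_RANGE[1]:
        raise ValueError("method sub-score must lie in 0-2")
    return source + method


def row_score(row):
    """Average the attribute scores of one data row (assumed aggregation)."""
    return mean(attribute_score(s, m) for s, m in row.values())


def dataset_quality(rows):
    """Return (mean, population standard deviation) of the row scores."""
    scores = [row_score(row) for row in rows]
    return mean(scores), pstdev(scores)


# Hypothetical rows: attribute -> (source sub-score, method sub-score).
example_rows = [
    {"core_size": (3, 2), "hydrodynamic_size": (2, 1),
     "surface_charge": (1, 1), "specific_surface_area": (0, 0)},
    {"core_size": (3, 2), "hydrodynamic_size": (3, 2),
     "surface_charge": (3, 2), "specific_surface_area": (3, 2)},
]

avg, spread = dataset_quality(example_rows)
print(f"mean score = {avg:.2f}, standard deviation = {spread:.2f}")
```

In this toy case the first row is incompletely characterized and the second is fully documented, so the dataset mean falls between the two row scores and the standard deviation reflects how unevenly the rows are characterized.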

2.1.1. Data generation and curation activities

To support the development of better prediction models, several data generation, data curation and quality assessment activities have been reported. For instance, Hendren et al. [50] introduced the NMs Data Curation Initiative almost 10 years ago, and explored the critical aspect of data curation within the development of informatics approaches to understand the behaviour of NMs, while Powers et al. [130] proposed a workflow for nanosafety data curation and Robinson et al. [92] discussed various issues in the evaluation of curated NM data, such as considering its completeness and quality, where the requirements for each will depend on the purpose for which the data was generated and/or will be re-used. Examples of experimentally generated and literature mined datasets suitable for modelling have also recently emerged. For example, Puzyn et al. [131] experimentally determined the toxicity of 17 different types of metal oxide NMs towards E. coli and proposed a simple equation for prediction of the effective concentration at which 50% of the organisms were killed (EC50) using the enthalpy of formation of a gaseous cation. Walkey et al. [177] published experimental data and a model utilizing protein corona fingerprints to predict NMs cellular attachment, based on 105 surface modified gold NMs, although it turns out that the corona isolation method utilized inadvertently removed albumin, which is usually a major constituent of NM protein coronas [76,177]. Furthermore, Oh et al. [118] conducted a meta-analysis of more than 300 published articles reporting the toxicity of quantum dots and found that only a few parameters (surface properties, diameter, assay type, and exposure time) contributed significantly to their toxicity. Gernand et al. [40] collected and analyzed literature data on the rodent pulmonary toxicity of uncoated, unfunctionalized carbon nanotubes (CNTs) and proposed that the main factors driving pulmonary toxicity of CNTs were metallic impurities, CNT length, CNT diameter, and aggregate size. Melagraki et al. [97] used in silico methods to investigate published datasets, constructing and validating a predictive model using an organized dataset on the cellular uptake of 109 NPs tested in pancreatic cancer cells (PaCa2). Recently, Ha et al. [45] extracted and compiled a dataset for 26 metal oxide NMs from 216 literature articles related to the toxicity of metal oxide NMs. Trinh et al. [168], on the other hand, collected cytotoxicity data of metallic NMs, which include PChem properties, their measurement methods (PChem attributes), in vitro cytotoxicity assay conditions and resultant cell viability data (Tox attributes).

Each of these datasets has been, and continues to be, re-used in the development of predictive models for NM toxicity prediction.
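To make concrete what a "simple equation" based on a single descriptor looks like, the following Python sketch fits a one-descriptor linear model of the form log(1/EC50) = a + b·x by ordinary least squares. It is an illustration only: the descriptor values and responses are synthetic numbers invented for this example, not the data or coefficients reported by Puzyn et al., and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, descriptor):
    """Predict log(1/EC50) for a new NM from its descriptor value."""
    return intercept + slope * descriptor


# Synthetic training set (hypothetical values, arbitrary units):
# one descriptor per NM and the corresponding log(1/EC50).
descriptor = [1.0, 2.0, 3.0, 4.0]
log_inv_ec50 = [2.5, 2.0, 1.5, 1.0]

a, b = fit_line(descriptor, log_inv_ec50)
print(f"log(1/EC50) = {a:.2f} + ({b:.2f}) * descriptor")
print("prediction for descriptor 2.5:", round(predict(a, b, 2.5), 2))
```

A real nano-QSAR would additionally require external validation and a defined applicability domain, neither of which is shown in this sketch.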

Importantly, a number of large EC and internationally funded projects were recently completed (e.g. [29]), describing large libraries of well characterized NMs and their accompanying hazard and/or exposure datasets. Table 1 lists these datasets, which are in various stages of curation and ontological annotation for semantic mapping and database integration, by project: eNanoMapper, NanoMILE, NanoSolutions, NANoREG, NanoReg2, caLIBRAte, NanoTEST, NanoFASE, the Nanomaterials Registry, the two NSF-funded centers for environmental implications of NMs (CEINT and CEIN) and South Korea’s S2Nano, for which several relevant PChem and biological endpoints have been assessed mostly using OECD methods.

Based predominantly on OECD documents from 2005 to 2017, Steinhauser and Sayre reviewed and summarized the key PChem properties, their preferred measurement metrics, as well as strengths/weaknesses of intrinsic and extrinsic properties (where they go, i.e. persistence, or what they do, i.e. reactivity) in terms of predicting NMs behaviour [151]. The major intrinsic properties that they focused on included particle size distribution (number average), particle shape (e.g. aspect ratio), surface area, redox potential/band gap, crystalline phase(s), hydrophobicity, chemical composition (impurities, surface chemistry), and rigidity. The extrinsic properties related to persistence included biodurability, zeta potential, density, dustiness (depends on moisture), dissolution rate (in environment, acellular), agglomeration/hydrodynamic diameter (dispersion stability) and surface affinity. Those related to reactivity included only reactive oxygen species (ROS) production and photoreactivity. A key factor in determining the extrinsic properties is the need to characterize the NMs in the relevant exposure medium and across the relevant exposure times [87,89].

2.1.2. Computational approaches for data gap filling and interpolation

Technological advances including high content and high throughput screening and omics approaches have transformed nanosafety research into a data rich field. Concurrently, nanoinformatics and ML-based in silico modelling applied to nanosafety require even larger data sets. For NM property models to be robust, predictive, and broadly applicable, large amounts of high-quality and complete experimental data are needed that are organized and accessible. A current bottleneck is the fragmentation and inaccessibility of much of the data generated to date. To overcome this data fragmentation and facilitate model development, new processes need to be developed and implemented that will allow the capturing of both the NM data and the associated metadata, making the produced datasets findable, accessible, interoperable and re-usable (FAIR) [54,181]. This two-fold process aims at extending existing nanosafety and nanoinformatics databases with interfaces that allow curation, ontological annotation and semantic mapping of the data schema, and extraction of data and knowledge about NMs, as well as implementation of a very focused set of experiments, designed to fill gaps identified in the existing large datasets. The combination of these activities will support the development of computational predictive toxicology methods, enable in silico models to perform at optimum levels, and increase the predictive power of the overall Integrated Approaches to Testing and Assessment (IATA) for human and environmental RA that is envisioned within NanoSolveIT. A strong focus is given to the production of curated, reliable NMs safety data under the FAIR principles [181].

Among the approaches available to overcome data fragmentation and data inaccessibility is data mining from literature and subsequent meta-analysis of the curated data [5,71], although this can be extremely time consuming if done manually. Text mining algorithms are under development [44,78] and their suitability for NMs is being evaluated within other ongoing nanosafety data projects such as NanoCommons. One issue arising from these literature mining approaches is the degree of data quality and completeness [83,92], as well as the comparability of datasets generated by different groups using different batches of NMs [102] or characterization protocols. More specifically, data quality concerns originate from the different methods/protocols and experimental conditions used for the data production (e.g. lack of characterization in the exposure media) and the lack of sufficient metadata to describe the data and ensure interoperability. Data completeness concerns are linked with the amount of PChem characterization of NMs performed prior to and during the experimental procedure in the relevant exposure medium, and the respective endpoints (toxicological, omics etc.) measured. The impact of these factors on modelling can be increased data variance and unreliability and decreased model robustness and predictivity. This is especially true in the development of nano-QSAR/nano-QSPR (quantitative structure–property relationship) models, where the variability observed in the PChem properties of NMs might be higher than it is in reality [83].

Table 1

Datasets from various sources contributed by NanoSolveIT partners which are currently being curated and ontologically annotated by NanoSolveIT for use in modelling and federation into a knowledge commons.

Project | Materials | Information included | Numbers

NanoMILE | Diverse NMs (i.e. ZnO, CuO, Au, Ag, CoO, SiO2, BaTiO3, AlOOH, Si, CeO2, CuO, hydroxyapatite) with nanoinformatics and in vitro toxicity data | Size dependent nanodescriptors | >3000 datapoints

NanoSolutions | A panel of 30 industrial NMs, each with surface variants: uncoated, positive, negative and PEG coated. Also CNTs, nanocellulose and more. | Omics and intrinsic properties on NM in vitro / in vivo effects; in vitro high throughput screening | >100 NMs; 31

SmartNanoTox | TiO2, SiO2, Au, carbon nanotubes etc. (binding free energies and potentials of mean force for interactions for all) | Interactions of amino acids and components of lipids and sugars with NMs (computational and experimental data) | >5 NMs

NanoFASE | TiO2, CeO2, Ag, Ag2S | Transformations of NMs in the environment (air, water, sediment, soil, waste treatment and biota) and release models |

Nanomaterial Registry | Diverse NMs | NanoMaterials Registry Database | >2000 datapoints

NanoTEST | TiO2, two fluorescent SiO2, iron oxide coated and uncoated, PLGA | Genotoxicity, cytotoxicity, uptake, oxidative stress | >5000 datapoints

S2NANO | Various engineered NMs (e.g., oxide NMs, metallic NMs, and carbonaceous NMs) 28–30. Curated from literature and experimental studies. | PChem properties / characterization; cytotoxicity assay conditions | 16 NMs datasets

CEINT NIKC | Ag/Ag2S, CuO, graphene oxide, CNTs, CeO2, nZVI, cellulose nanocrystals, TiO2, gold etc.; literature curated datasets / mesocosm datasets | NM intrinsic, extrinsic (system dependent), social (e.g. use scenarios, matrix, concentration in products) properties; system characteristics; exposure / hazard data; meta-data (protocol, temporal and spatial descriptors etc.) | 20 NMs datasets

UC-CEIN NanoDatabank | Pristine MOx NM, quantum dots, CNTs/graphene; 300 toxicological assessments, 150 investigations (curated data from over 500 publications) | PChem properties / characterization; toxicological assessments; NM fate, transport and material characterization | >1000 NMs

Modern | Metal oxide NPs of 12 sizes between 5 and 60 nm | 35 full particle nano-descriptors | 24 NMs; 10,080 datapoints

NanoTOES | Ag NMs: 3 different sizes with the same surface properties; Ag NMs of 20 nm with 6 different surface properties | PChem properties / characterization; cytotoxicity and genotoxicity endpoints | 9 NMs, >1000 datapoints

eNanoMapper | Publicly available datasets included | PChem properties / characterization; hazard data | 636 NMs, ~1750 datapoints

To overcome these issues several approaches have been proposed [36,37,45,90,168]. For example, Ha et al. [45] demonstrated the benefits of data gap filling during the meta-analysis of the cytotoxicity of metal oxide NM using data mined from 216 publications, which resulted in 6,842 data rows and 14 attributes of nanostructure descriptors, PChem, toxicological and quantum-mechanical (QM) (computational) properties. Gap-filling was achieved using information from manufacturers’ specifications, references utilizing the same NM or with estimation from other PChem properties. Data quality was assessed using a scoring system based on the presence and origin of data. Gajewicz et al. [36], on the other hand, proposed the use of read-across algorithms to predict the missing values and improve the outcome of predictive models. In all cases, data gap-filling and increased data quality resulted in increased accuracy of the nanostructure–activity relationship models.
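The gap-filling and provenance-scoring idea can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure of Ha et al. [45]: the source categories, their numeric scores and the function names `fill_gap` and `row_quality` are assumptions made for the example.

```python
# Hypothetical provenance scores: a higher score means a more trusted origin.
SOURCE_SCORES = {
    "measured": 4,       # value reported in the publication itself
    "manufacturer": 3,   # taken from the manufacturer's specification
    "reference": 2,      # taken from another study using the same NM
    "estimated": 1,      # estimated from other PChem properties
}

def fill_gap(candidates):
    """Pick the best available value for one attribute.

    `candidates` maps a source name to a value (None or absent if missing).
    Returns (value, source, score), or (None, None, 0) if nothing is available.
    """
    for source in sorted(SOURCE_SCORES, key=SOURCE_SCORES.get, reverse=True):
        value = candidates.get(source)
        if value is not None:
            return value, source, SOURCE_SCORES[source]
    return None, None, 0

def row_quality(row):
    """Fill every attribute of a data row and average the provenance scores."""
    filled = {attr: fill_gap(cands) for attr, cands in row.items()}
    score = sum(s for _, _, s in filled.values()) / len(filled)
    return filled, score

# Invented example row: one measured value, one manufacturer value, one gap.
row = {
    "core_size_nm": {"measured": None, "manufacturer": 25.0},
    "zeta_potential_mV": {"measured": -18.5},
    "surface_area_m2_g": {},
}
filled, score = row_quality(row)
```

Keeping the origin next to each filled value means a modeller can later filter or down-weight rows whose attributes were mostly estimated rather than measured.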

For such methods to be successful, detailed workflows for the experimental and literature data curation are needed [130]. Similarly, standardized experimental workflows, and complete reporting of experimental procedures including computational and database mining, need to be established to ensure data reproducibility at a wider scale [17,76,137]. NanoSolveIT aims to develop the gap-filling approaches and curation workflows through the gathering of existing datasets originating from recently completed EU funded projects (e.g. NanoMILE, NanoFASE, caLIBRAte, NanoTEST and others reported in Table 1 above), performing the necessary evaluation of the existing data and metadata and designing detailed experiments to fill the identified gaps in the NM PChem characterization and toxicity endpoints, thus increasing their quality. The resulting larger datasets will then be used by the modelling partners for the development of more robust in silico approaches with wider domains of applicability and enhanced predictive capability, while the developed standardized workflows will be made available for experimentalists to guide them in the production of the necessary scale and completeness of data for use in modelling approaches.

2.1.3. Dedicated NMs databases organized for modelling and informatics

The NanoSolveIT knowledge base will extend the NanoCommons / eNanoMapper databases with innovative, ontology-based application programming interfaces (APIs) to allow semi-automated curation and extraction of data and knowledge about NMs to support development of computational predictive toxicology methods. It will cover a wide range of data that researchers are looking for: omics data, nanodescriptors and relevant literature, as well as PChem properties and biological effects. There are two key aspects of the knowledge base. Firstly, NMs-specific datasets will be federated with other databases that are optimized for endpoints from proteomics, transcriptomics etc. for which well-established data management and deposition solutions already exist. The NanoSolveIT knowledge base will communicate via APIs and integrate the data via semantic mapping of their data schemas. Secondly, the knowledge base will be enriched by integration of data from, for example, protein structures, known signalling pathways and crystal structure information. These data will be used by physics-based modelling approaches (see Section 2.3) to computationally design NMs and their biomolecule fingerprints. Other types of data that can be enriched in the knowledge base include environmental data such as river pH, ionic strength, dissolved organic matter etc. which can support development of enhanced models for prediction of NMs’ environmental transformation and consequent ecotoxicity.

Furthermore, the NanoSolveIT knowledge base will adopt Open Science approaches to trigger open innovation with related projects and future users and collaborators. This will result in the NanoSolveIT Knowledge Infrastructure containing curated, reliable NMs safety data, accessible and reusable within the project, by the nanoinformatics community and by all stakeholders. The NanoSolveIT knowledge base will be helpful for those researchers interested not just in the structural characteristics (descriptors) and PChem properties, but also in the known biological effects of particular NMs. Researchers can use the interface to navigate through the available data or retrieve data for further exploitation.

2.2. Toxicogenomics modelling (predictive models using omics data)

Toxicogenomics modelling is a subdiscipline of pharmacology that deals with information about gene and protein activity within a particular cell or tissue of an organism in response to exposure to toxic substances. It can be used to link the safety of NMs to the underlying biological mechanisms of their toxicity. Gene expression data can be combined with biological pathway information to identify possible adverse outcomes [43,67,110]. In order to make sense of the large volume of data generated by bioinformatics analyses, SOPs for the interpretation of the results in the correct biological context need to be established [46]. Adverse Outcome Pathways (AOPs), which describe in mechanistic detail the sequences of events that are necessary for an exposure at cellular and subcellular levels to lead to an adverse event or outcome at the organ or organism level, are a useful tool for organizing bioinformatics and other types of results into a predictive framework [111].

During the last decade, multiple efforts have aimed at characterizing the mechanism (mode) of action (MOA) of toxic chemical exposures using transcriptomics profiling of the exposed biological systems. These efforts generated large reference data sets such as the Connectivity Map [152], TG-GATEs [51], DrugBank [184] and LINCS L1000 [66]. These have been extensively used for drug repositioning (e.g. [53,106]) and toxicity prediction (e.g. [67]). The general concept of this approach is that toxicogenomic experiments identify the primary molecular changes in cells and tissues as a direct consequence of toxin exposure, and hence directly inform the toxicity pathways of tested compounds [43]. As an example, Predictive Toxicogenomics Space (PTGS) components, which were developed based on Connectivity Map data to predict organ toxicity, are likely useful as descriptors of AOP-linked MOAs and key events (KEs) in the affected signalling pathways [67].

Furthermore, when multiple time points are screened, toxicogenomics data can give a robust insight into the toxicokinetics and can further assist the drafting of an AOP [35,111,185]. Toxicokinetics describes the absorption, distribution, metabolism and storage/excretion of chemical toxicants, while toxicodynamics describes the adverse (biological) effects that a toxicant has on an organism, e.g. altered structure/function and disease. Both processes are determined by the structure (morphology) and PChem properties of NMs, e.g. size, shape and surface reactivity [95]. Toxicogenomics data can also be exploited to infer similarities between different types of exposure and between exposure and human diseases [145]. Finally, new opportunities are emerging for integrating ‘omics data with intrinsic NMs exposure properties to allow hybrid quantitative structure and MOA activity relationships to be developed [146].


More recently, toxicogenomics approaches have been employed to describe the MOA of NMs in various exposure scenarios in vitro (e.g. [63,101,144]) and in vivo (e.g. [47,52,64,70,107,132,141]).

When multiple doses of toxin are screened by ‘omics technologies, dose-dependent events can be extrapolated in order to further dissect specific mechanisms of toxicity [133,158]. In fact, dose metrics are a basic requirement for any in vitro screening to assess potential health risks of NMs. Genomic dose responses can be used to define the biological potency of a material as well as point-of-departure concentrations denoting adverse levels of exposure to the organism or cell. Typically point-of-departure concentrations are set using concentration response modelling or the lowest observable effect level approach [158]. Although transcriptomic response itself is often triggered as a response to an adverse reaction to the environment, a pathway-level concentration is typically used, as this value is more robust than a gene-level response and is directly connected to a known biological response. Further modelling methods, such as the PTGS, can be helpful to differentiate between adverse, adaptive and benign (or even beneficial) biological responses to chemicals. Mechanisms or pathways connected to key event responses in known AOPs are also applied for selecting adverse responses among bioinformatics analysis results [111].

When cell culture data is utilized, there is also the need to extrapolate the cell culture concentration to a biological exposure scenario, which in the case of NMs is typically inhalation-based [111]. However, focusing on mechanistic aspects, the pathway-level benchmark concentrations can be used to rank NMs based on biological potency. Taking the concentration response into account is also helpful for biological grouping, e.g., for selecting optimal treatments for connectivity mapping-based biological similarity analyses. As the cost of toxicogenomics is steadily being reduced, its use in safety assessment and mechanistic analyses will only grow in importance.
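Ranking NMs by pathway-level benchmark concentration can be illustrated with a short sketch. It assumes that gene-level benchmark concentrations (BMCs) have already been obtained from concentration-response modelling. Summarizing a pathway by the median BMC of its responsive genes is an assumption made here for illustration, and the gene names, NM labels, values and function names are all invented.

```python
from statistics import median

def pathway_bmc(gene_bmcs, pathway_genes, min_genes=3):
    """Median BMC over the pathway genes that have a fitted BMC.

    Returns None when fewer than `min_genes` genes responded, so that a
    pathway is not called on the basis of one or two genes.
    """
    values = [gene_bmcs[g] for g in pathway_genes if g in gene_bmcs]
    if len(values) < min_genes:
        return None
    return median(values)

def rank_by_potency(nm_to_gene_bmcs, pathway_genes):
    """Order NMs from most to least potent (lowest pathway BMC first)."""
    scored = []
    for nm, gene_bmcs in nm_to_gene_bmcs.items():
        bmc = pathway_bmc(gene_bmcs, pathway_genes)
        if bmc is not None:
            scored.append((nm, bmc))
    return sorted(scored, key=lambda item: item[1])

# Invented gene-level BMCs (arbitrary concentration units).
pathway = ["G1", "G2", "G3", "G4"]
data = {
    "NM_A": {"G1": 5.0, "G2": 8.0, "G3": 12.0},
    "NM_B": {"G1": 1.0, "G2": 2.0, "G3": 4.0, "G4": 3.0},
    "NM_C": {"G1": 20.0},  # too few responsive genes, not ranked
}
ranking = rank_by_potency(data, pathway)
```

The `min_genes` threshold reflects the point made above that a pathway-level value is more robust than a single gene-level response.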

NanoSolveIT is collecting existing toxicogenomics data, then transforming, analyzing and modelling it. The project has multiple aims, including deriving signatures of NM MOAs to support AOP development; generating useful predictive models of the biological effects of NMs; and developing computational methods and software to be included into a robust computational platform for in silico nanosafety analysis (as seen in Fig. 1).

2.2.1. Deriving signatures of NM MOA to support AOP development

Toxicogenomics has provided unprecedented opportunities to clarify the MOA of many chemical exposures. The essential idea is to identify a set of genes that are significantly altered in a given biological system of interest due to exposure to a toxic agent.

While conventionally ‘omics data analysis resulted in lists of differentially expressed genes, these are not per se informative of more complex patterns of regulation that underlie broader biological functions. To elucidate these functions, systems biology approaches, such as gene network reconstruction and inference, have been used to identify complex patterns of molecular regulation and co-regulation [64,192]. Moreover, the systematic annotation of molecular changes into known biological pathways has helped define toxicity and other AOPs [70,111,144]. To date, the transcriptome has been the primary molecular focus of such studies, followed by proteomics and metabolomics. Multi-omics approaches have already been used extensively to generate more thorough landscapes of molecular changes in human diseases (e.g. The Cancer Genome Atlas) and are beginning to be used to build more general models of NM MOA [143,144]. Omics-derived biosignatures are valuable for comparing the effects of different types of exposure, such as NMs and small molecules, revealing similarities that would otherwise be difficult or impossible to detect by other means. For example, omics-derived NM biosignatures have been systematically compared to those from small molecules, drugs and human diseases in search of direct exposure-disease associations [145]. These analyses have identified biomarkers useful for biochemical assays in the zebrafish model, enabling the drafting of an AOP for metal and metal oxide NMs impacts on the central nervous system [75].
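A simple way to picture the comparison of biosignatures is to treat each signature as a vector of log fold changes and measure how well two vectors align. The sketch below uses cosine similarity over shared genes. This metric is an assumption chosen for brevity, not the method of the cited studies, and the gene names and values are invented.

```python
from math import sqrt

def signature_similarity(sig_a, sig_b):
    """Cosine similarity of two {gene: log2 fold change} signatures.

    Only genes present in both signatures are compared. The result lies in
    [-1, 1]: values near 1 indicate similar regulation, near -1 opposite.
    """
    shared = sorted(set(sig_a) & set(sig_b))
    if not shared:
        return 0.0
    dot = sum(sig_a[g] * sig_b[g] for g in shared)
    norm_a = sqrt(sum(sig_a[g] ** 2 for g in shared))
    norm_b = sqrt(sum(sig_b[g] ** 2 for g in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented signatures: an NM exposure, a small molecule and a disease state.
nm_sig = {"G1": 2.0, "G2": -1.0, "G3": 0.5}
drug_sig = {"G1": 4.0, "G2": -2.0, "G3": 1.0, "G4": 3.0}
disease_sig = {"G1": -2.0, "G2": 1.0, "G3": -0.5}

same_direction = signature_similarity(nm_sig, drug_sig)
opposite_direction = signature_similarity(nm_sig, disease_sig)
```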

Fig. 1. Schematic overview of the workflow for toxicogenomics modelling and how these models feed into the subsequent materials modelling and IATA. AO – Adverse Outcome; ENM – Engineered Nanomaterial; KE – Key Event; MIE – Molecular Initiating Event.


2.2.2. Deriving signatures of NM MOA that define robust predictive models based on NM bioactivity

Toxicogenomics data has more recently been used to generate predictive models of toxicological and pharmacological interest.

Multiple strategies have been employed: identification of specific biomarkers[145], or discovery of broader gene sets with strong predictive ability (PTGS, zebrafish-based toxicogenomic space, etc.). Typically, ‘omics data analyses use univariate statistical testing, where each molecular feature is tested for significant differences between exposed and unexposed sets in replicate samples.

However, these approaches allow derivation of only very limited individual or linear or concatenated biomarkers. The output from these analyses, importantly, does not guarantee biomarker specificity, although they are often referred to as such in the literature.

Thus, more sophisticated feature selection strategies are needed that allow non-linear combinations of molecules, or sets of biomarkers that are more specific. To this end, a number of algorithms have been proposed, including GALGO [138], DIABLO [149], MANGA [145] and MLREM [138,167,9,34,6,147].
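For contrast with these multivariate strategies, the univariate approach described above can be sketched as follows: each feature is tested on its own, here with Welch's t statistic, and features are ranked by the size of the statistic. The example computes only the statistic (no p-values or multiple-testing correction), and the expression values and function names are invented.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(exposed, control):
    """Welch's t statistic for two groups of replicate measurements."""
    se = sqrt(variance(exposed) / len(exposed) + variance(control) / len(control))
    return (mean(exposed) - mean(control)) / se

def rank_features(expression, top=2):
    """Test each feature independently and return the `top` largest |t|."""
    stats = {f: welch_t(e, c) for f, (e, c) in expression.items()}
    return sorted(stats, key=lambda f: abs(stats[f]), reverse=True)[:top]

# Invented log-expression values: (exposed replicates, unexposed replicates).
expression = {
    "G1": ([8.1, 8.3, 7.9], [5.0, 5.2, 4.8]),
    "G2": ([6.0, 6.4, 5.6], [6.1, 5.9, 6.2]),
    "G3": ([3.0, 3.1, 2.9], [4.0, 4.2, 3.8]),
}
top_features = rank_features(expression)
```

Because every feature is scored in isolation, a ranking of this kind cannot capture combinations of molecules that are informative only together, which is the limitation the feature selection algorithms listed above aim to address.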

2.2.3. Developing computational methods for inclusion into a robust computational platform

Despite the great value of toxicogenomics approaches in identifying important (NM) toxicity mechanisms in an unbiased manner, these approaches have thus far struggled to be implemented in the mainstream regulatory framework for chemical safety assessment.

One reason for this is that toxicogenomics data are usually difficult to interpret without strong skills in bioinformatics and biostatistics. Importantly, the scientific community has been successful in developing improved analytical methods but has not yet agreed on formalized standard analytical SOPs. Clearly, omics data analysis is used extensively in other biomedical research fields, so many of these standard methods should be transferable to NM toxicogenomics problems. The NM toxicogenomics community needs to ensure that this existing expertise is converted into standardized pipelines and software for nanosafety applications. Examples of useful methods are the eUTOPIA software for omics data preprocessing [94] and INfORM for gene network inference [93]. Similar efforts are being undertaken, within NanoSolveIT and several other EU projects, to further resolve dose-dependent patterns of molecular change and to benchmark the resulting toxicogenomics and AOP models to increase their utility and acceptance by the community.

2.3. Multi-scale modelling framework for NMs property prediction

Adverse human health effects can be triggered and modulated by molecular-level interactions at the bionano interface, i.e. a nanoscale layer where biological entities meet foreign materials.

These interactions are often non-specific and unintended. The currently poor understanding of the bionano interface means that RA for NMs, and for biomaterials more broadly, is largely based on empirical evidence and not on the mechanistic action of the adverse effects. In general, the NM properties primarily responsible for adverse effects are largely unknown, or are not the same as the PChem properties that can be routinely measured [89,136]. Understanding these interactions and the bionano-interface structure will assist with developing safety regulations and reducing the associated health risks, but also with achieving improved control over the surface activity in nanotech-based applications.

Steinhauser and Sayre [151] reviewed the OECD guidance documents for NMs risk assessment, which provided recommendations regarding the measurement and assessment of occupational exposure, consumer exposure, environmental fate, ecological effects and biokinetics, as well as considerations for in vitro testing of NMs, for increased reliability and relevance. However, of particular interest for NanoSolveIT were the guidance documents for exposure modelling and QSAR modelling. While QSAR models usually employ two-dimensional (2D) structural information from molecules, they can also employ three-dimensional (3D) information, making them suitable for predictions of NMs properties and behaviour. The benefit of 2D is that it provides a good visualization of the structure, where one can easily identify the connectivity of atoms, the presence of specific functional groups and predict reactivity. However, 3D focuses on the molecular level and includes additional information related to bond distances or angles, as well as connectivity or binding to ligands in relation to surface topography, and can account for extrinsic properties such as formation of a protein (biomolecule) corona, for example.

2.3.1. Computational NMs descriptors

In addition to direct correlations between the NM structure (expressed in terms of nanostructure descriptors), properties and toxicity, as described above, interactions at the bionano interface can initiate AOPs via sequestering or unfolding of proteins central to molecular initiating events (MIEs) and KEs of the corresponding pathways [21,32,86,88]. Although they may not be completely independent of the basic features of the NM (as expressed either directly by nanostructure descriptors or by their intrinsic properties), a systematic evaluation of these types of properties that express protein affinity, protein unfolding and potential formation of cryptic epitopes that can induce new signalling pathways [86] may make predictive models more compact and robust.

Simply stated, a descriptor is the final result of a logic/mathematical process that transforms chemical-based information (encoded within a symbolic representation of a chemical structure) and, in the case of NMs, also physical-based information (i.e. morphology) into a useful number that can be exploited by a predictive model. Thus, descriptors provide unique information required to draw (or build a molecular model of) a NM. A property (e.g. solubility), by contrast, should be considered as a consequence of the NM’s structure; it is impossible to deduce the structure back from such properties alone. The properties can be either measured experimentally or computed with the use of first-principles-based methods (e.g. ab initio, Density Functional Theory, molecular dynamics). The correct distinction between descriptors and properties is important, since only the structure (descriptors) can be directly controlled by a designer in the safe-by-design process.

There are various levels for chemical structure representations (descriptors) ranging from one-dimensional (1D) descriptors (e.g. basic molecular formulas), to 2D descriptors (e.g. connectivity index), as well as 3D descriptors that are conformation dependent (e.g. dihedral angles, radius, shape).
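The difference between descriptor levels can be made concrete with a small sketch: a 1D descriptor needs only the formula, whereas a 3D descriptor such as a dihedral angle needs atomic coordinates. Both functions are illustrative helpers written for this example (the formula parser handles only simple formulas without brackets), not part of any specific descriptor package.

```python
import math
import re

def atom_counts(formula):
    """1D descriptor: element counts from a simple formula such as 'TiO2'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dihedral_deg(p0, p1, p2, p3):
    """3D descriptor: dihedral angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)     # normals of the two planes
    length = math.sqrt(_dot(b2, b2))
    b2_unit = tuple(x / length for x in b2)
    m1 = _cross(n1, b2_unit)
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))

formula_descriptor = atom_counts("BaTiO3")
cis = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
```

The two conformations share the same formula and connectivity, so only the 3D descriptor distinguishes them.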

Examples of computational properties based on NMs interactions that can be related to MIE, KE or AOPs include: composition of the NM protein corona; adsorption enthalpy of amino acids, lipid molecules, or proteins onto the NM surface; adsorbed protein or NM hydrophobicity; production of ROS; and dissolution of NMs leading to release of ions, all of which must be determined in realistic environments [20]. Calculation of properties based on a full-particle molecular model, using ab initio quantum chemical (QC) or even semi-empirical methods, will remain unfeasible in the near future due to the large size of NMs and thus the associated enormous computational time needed. Therefore, a significant effort has been invested by the community to develop different approximations and simplified molecular models of NMs to derive NMs properties.

One of the first approaches for calculation of nanodescriptors was the design of optimal molecular descriptors by Toropov et al. [162,164,166]. These descriptors are calculated from SMILES structures and consider the chemical composition of NMs. Information about experimental conditions of NMs synthesis can be included in the descriptor calculation. This type of descriptor has been successfully used to model the toxicity of NMs [162,166].

QC calculations based on small clusters of atoms can be used to obtain such properties as the HOMO-LUMO gap (the band gap between the valence and conduction bands) and enthalpy of formation, which is currently not possible for full-sized NMs. QC properties can be directly used to model toxicities of NMs without taking into account the size dependency of the properties [131] or can be extrapolated to obtain property values for a specific size of NMs [56]. To take into account the effect of size on the properties of NMs, calculations based on a so-called full-particle molecular model should be performed. Calculations of this type have been performed for metal oxide NMs [157,155,91,11]. Full-particle properties were derived from molecular mechanics calculations of NMs and describe the energetics (potential energies), coordination numbers and other attributes of the NMs. Full-particle descriptors and properties have been successfully used to model the toxicity of metal oxide NMs [156].

Surface modifications, such as change in coating materials, influence the properties of NMs and should also be included in the modelling process. Xia et al. [186] proposed a biological surface adsorption index to describe competitive adsorption of proteins onto the surface of NMs [187]. The logarithm of the adsorption coefficient is expressed as a linear function of five descriptors: excess molar refraction (representing molecular force of lone-pair electrons); the polarity/polarizability parameter; hydrogen-bond acidity and basicity; and the McGowan characteristic volume describing hydrophobic interactions [186]. Experimentally obtained log K values can be used to derive five descriptors for surface forces related to adsorption. Combining all of these approaches would allow the characterization and modelling of NMs in biological systems accounting for all important aspects (electronic effects, size dependency and surface modifications) and would greatly improve the quality of nanoQSAR models.
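The biological surface adsorption index has the form of a linear free-energy relationship. Written out, with symbols that are assumed here for illustration since the text does not define them, the adsorption coefficient $K_i$ of probe compound $i$ on a given NM is modelled as:

```latex
\log K_i = c + r\,R_i + p\,\pi_i + a\,\alpha_i + b\,\beta_i + v\,V_i
```

Here $R_i$ is the excess molar refraction of the probe, $\pi_i$ its polarity/polarizability, $\alpha_i$ and $\beta_i$ its hydrogen-bond acidity and basicity, and $V_i$ its McGowan characteristic volume. The probe parameters are known in advance, so measuring $\log K_i$ for a set of probes and fitting the relationship by multiple linear regression yields the coefficients $[r, p, a, b, v]$, which serve as the five surface-force descriptors of the NM ($c$ is the regression intercept).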

2.3.2. Modelling (predicting) NM corona composition

The structure of the bionano interface can be simulated using first principles physics-based methods. Such simulations are often computationally intractable or use model systems that do not capture the complexity of real biological environments [72]. The relevant system sizes are too large for direct atomistic simulation, so the properties of interest can only be accessed using a coarse-grained (CG) representation, where sub-nanometer interactions have been integrated out. Molecular details of the NM are preserved when the CG model is constructed using a multistep approach, where each layer is parameterized from simulations at finer resolution [129]. Atomistic simulations also have challenges as they rely on accurate and validated force fields. Where such force fields are available, all-atom Molecular Dynamics (MD) methods can be used to construct a united atom representation of the NM and associated biomolecules. Computational study of new materials requires development or optimization of new atomistic force fields based on QC calculations or experimental data [8].

Coarser scale simulations also have challenges due to the nature of biological samples: the number of relevant biomolecules interacting with the NM can be enormous; for example, human plasma contains over 3,700 proteins and even larger numbers of metabolites [16]. The corona composition (lists of proteins and metabolites (lipids and other small molecules) known to interact with a specific NM) may therefore be an impractical property to be used for predictions, although meta-analysis of over 63 NMs plasma corona studies suggested that about 125 proteins form the interactome of NMs [176]. Each NM immersed in plasma typically has its own unique corona that may involve hundreds of different proteins [13,23]. Proteins in the corona reflect the functionalities on the NM that bind specific types of biomolecule [85]. This changes over time, as the most abundant proteins bind first and are subsequently replaced by less abundant but more tightly bound proteins. The corona also evolves as the NMs are internalized into cells, for example [84], and as the cells respond to the presence of the NMs [2]. Capturing this complexity in descriptors used in ML models is very challenging, and often statistical properties of descriptors for the proteins are used.

Early examples of corona-based predictive schemes exist in the literature. An extensive gold NMs protein corona dataset generated and analyzed by Walkey et al. [177] was re-analyzed in order to identify and quantify the relationships between NM-cell association and protein corona fingerprints in addition to NM PChem properties [1,80]. QSAR models were developed based on both linear and non-linear support vector regression models making use of a sequential forward selection of predictors. For example, an initial pool of 148 predictors was used, with the analyses eventually identifying 10 corona proteins and 3 PChem characteristics (NM size and zeta potential in cell culture medium) as the most significant factors correlating with NM cell association [1]. As more data emerges, including on the small molecule or metabolite corona and how these interact with proteins to form the complete corona [16], refined models and more detailed predictions of the composition and impact of the NM corona will emerge.
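Sequential forward selection is a greedy wrapper: starting from an empty set, it repeatedly adds the predictor that gives the largest improvement in a validation score. The sketch below shows the loop only. To stay dependency-free it scores candidate sets with the leave-one-out error of a 1-nearest-neighbour regressor, which is a stand-in chosen for this example and not the support vector regression used in the cited work. The data are invented.

```python
def loo_error(X, y, features):
    """Leave-one-out mean squared error of a 1-nearest-neighbour regressor
    that measures distance using only the columns listed in `features`."""
    total = 0.0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        total += (y[i] - y[best_j]) ** 2
    return total / len(X)

def forward_select(X, y, n_features):
    """Greedily add the predictor that gives the lowest error at each step."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_features:
        best = min(remaining, key=lambda f: loo_error(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Invented data: column 0 tracks the response, column 1 is unrelated noise.
X = [[0.0, 5.0], [1.0, 1.0], [2.0, 4.0], [3.0, 0.0], [4.0, 3.0], [5.0, 2.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
chosen = forward_select(X, y, n_features=1)
```

With a pool of 148 candidate predictors, as in the study described above, the same loop simply runs over more columns; the cost grows with the number of candidates evaluated at each step.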

NanoSolveIT recently proposed a multiscale modelling scheme that enables modelling of large molecular assemblies in both length and time domains, which is not achievable by traditional atomistic simulations. This allows information on NM and biomolecule specificity to be preserved [82,81,129] whilst also enabling calculations in reasonable times. The NanoSolveIT method (shown schematically in Fig. 2) uses a systematic CG method that includes:

parameterization of the atomistic force-field for the NM;

calculation of interactions of the biomolecule building blocks (amino acids, lipid segments, DNA bases) with the surface of the NM and interaction between the building blocks at the atomistic level under specified conditions;

parameterization of the CG force field for biomolecule building blocks and construction of a NM of arbitrary size and shape;

CG modelling of interaction of entire biomolecules with the NMs’ surface and calculation of preferred orientation and the mean adsorption energies for the bound biomolecules;

further coarse graining for lipids and proteins to make united amino acid blocks and study competitive adsorption and bionano-interface structure.
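The step that turns an orientation scan into a preferred orientation and a mean adsorption energy can be sketched as follows. Boltzmann weighting of the orientations is an assumption made for this illustration, since the text does not state the averaging scheme. Energies are taken in units of kT, and the scan values and function name are invented.

```python
import math

def orientation_summary(energies_kT):
    """Summarize a scan of adsorption energies over biomolecule orientations.

    `energies_kT` maps an orientation (theta_deg, phi_deg) to the adsorption
    energy in units of kT. Returns (preferred orientation, Boltzmann-weighted
    mean energy).
    """
    e_min = min(energies_kT.values())
    # Shift by the minimum before exponentiating for numerical stability.
    weights = {o: math.exp(-(e - e_min)) for o, e in energies_kT.items()}
    z = sum(weights.values())
    mean_e = sum(weights[o] * energies_kT[o] for o in energies_kT) / z
    preferred = min(energies_kT, key=energies_kT.get)
    return preferred, mean_e

# Invented scan: strongly bound near (90, 0), weakly bound elsewhere.
scan = {(0, 0): -2.0, (90, 0): -10.0, (90, 180): -9.0, (180, 0): -1.0}
preferred, mean_e = orientation_summary(scan)
```

The weighted mean is dominated by the few strongly bound orientations, so it lies close to, but slightly above, the minimum energy of the scan.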

The multiscale modelling framework proposed by NanoSolveIT enables calculation of descriptors and properties of the bionano interface for a large number of biomolecules in a short time. The new properties include those for proteins (principal moments of inertia, charge, dipole moment, hydrophobicity indices, and the solvent accessible area) and those for interaction with NM (Hamaker constants for residues, their mean adsorption energies, and the overall adsorption energy for the protein globule or lipid molecule)[82]. These properties will be used to produce interaction fingerprints for arbitrary NMs with respect to specified biological activities and will thus provide key information for toxicologically relevant predictive modelling, e.g., for predicting NM ability to induce an AOP via MIE or KE.

Several models describing how binding to NM surfaces affects protein conformations and subsequent recognition behaviour have appeared in the literature. These will be very useful for connecting corona composition to molecular initiating and other key events in AOPs. Multiscale MD simulations of a single NM with the protein ubiquitin demonstrated that ubiquitin competed with citrates for the NM surface. At a high protein/NM stoichiometry, ubiquitins formed a multi-layer corona on the particle surface, with the proteins exhibiting conformational changes that included destabilization of α-helices and increased β-sheet content of the proteins [22].

A significant challenge with this approach is that it is unknown whether this protein binds under competitive conditions (e.g. in plasma), and whether these conformational changes occur in real systems. However, the correlation between unfolding of a specific protein and receptor activation has also been demonstrated experimentally and modelled using MD. Ding et al. showed that specific sizes of negatively charged poly(acrylic acid)-conjugated gold NPs bound to, and induced unfolding of, fibrinogen (Fg). This promoted interaction with the integrin receptor, Mac-1, leading to increased NF-κB signalling and release of inflammatory cytokines [21]. Building on this work, Kharazian et al. used MD simulation to investigate how poly(acrylic acid) coats a gold NM surface. The root-mean-square deviation (RMSD), radius of gyration (Rg) and solvent accessible surface area (SASA) properties from the calculations showed that the gold surface can induce Fg conformational changes favouring an inflammation response [62]. They suggest that the integrity of coatings on ultra-small gold NMs is compromised by the large surface curvature, and that surface coatings may be degraded by physiological activity. Other modelling studies have also assessed biomolecule conformation in NM coronas and, where relevant, these approaches will be integrated into the NanoSolveIT toolbox.
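Two of the conformational measures mentioned above, RMSD and radius of gyration, are simple functions of atomic coordinates. The sketch below is a minimal illustration with invented coordinates. It assumes the two structures are already aligned and list their atoms in the same order, and it gives all atoms equal mass, whereas MD analysis tools normally superimpose the structures first and use atomic masses.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate
    sets (no superposition is performed)."""
    sq = sum((a - b) ** 2
             for pa, pb in zip(coords_a, coords_b)
             for a, b in zip(pa, pb))
    return math.sqrt(sq / len(coords_a))

def radius_of_gyration(coords):
    """Unweighted radius of gyration (all atoms given equal mass)."""
    n = len(coords)
    centre = [sum(p[k] for p in coords) / n for k in range(3)]
    sq = sum((p[k] - centre[k]) ** 2 for p in coords for k in range(3))
    return math.sqrt(sq / n)

# Invented coordinates: a compact structure and an expanded version of it.
compact = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
expanded = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)]

deviation = rmsd(compact, expanded)
rg_compact = radius_of_gyration(compact)
rg_expanded = radius_of_gyration(expanded)
```

An increase in Rg together with a large RMSD from the native structure is the kind of signal that would indicate a protein expanding or unfolding on the NM surface.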

2.3.3. Connecting NM-coronas to biological impacts/AOPs

Data exchange and reuse place significant constraints on how we represent the biology and chemistry of NMs. For example, to compare the risk of two NMs, it is necessary to know whether they are chemically similar and if they are behaving biologically in the same way (e.g. via read across studies). To ensure data reuse is possible, the NM data representations must be interoperable, i.e. both humans and machines must be able to make such chemical and biological comparisons. The bioinformatics and cheminformatics communities have developed extensive methods to perform such tasks. To be reusable, data must meet community standards around data quality [92], be interoperable (i.e. be able to interact with other databases), be machine accessible (i.e. be annotated to allow computers to understand the individual datapoints), not be hidden behind firewalls and be findable by search engines and models. Recently, these ideas were summarized in the FAIR principles [181] and the need for interoperability and data linking was outlined in a position paper [60]. A key step for FAIRness was the development of a common NMs ontology to facilitate data interoperability [48,100].

Several recent studies have focused on developing models of NM-related properties and biological effects [10,123,142,159,160,161,163]. For example, Wang et al. [178] studied a set of 18 NMs and showed that factors such as metal content, surface charge and particle morphology induce high toxicity. Melagraki et al. developed a predictive classification model, based on OECD principles, for the toxicological assessment of iron oxide NMs with different cores, coatings and surface modifications, using a number of different properties including size, relaxivities, zeta potential and type of coating [97]. More recently, a predictive nanoinformatics model, validated according to the OECD principles, was developed for the prediction of the protein binding and cytotoxicity of functionalized multi-walled carbon nanotubes [172].

Numerous other as-yet-unexplored features may also be important for generating adverse effects from NMs. For instance, there is evidence to suggest that NMs, especially in the lower nm range, penetrate biological membranes and are able to reach organs that are otherwise inaccessible to larger substances [14]. Exposure to specific NMs has also been demonstrated to cause adverse biological effects by increased production of ROS, such as oxyradicals [79,103,112]. However, some NMs may be less harmful than their corresponding bulk forms [61], and even two NMs from the same source, with similar sizes and chemical compositions, may exhibit diverse effects [121]. Clearly, factors other than size, shape and surface area are also important in controlling the interactions and effects of NMs in biological systems. For example, Zhang et al. proposed that the higher toxicity of fumed silica relative to Stöber silica stems from the former's intrinsic population of strained three-membered rings along with its chainlike aggregation and hydroxyl content [189].

Fig. 2. Schematic illustration of the NanoSolveIT approach to multi-scale modelling of NM interactions with biomolecules to form the biomolecule corona, which provides the biological identity to the NM and determines its subsequent uptake and impacts in cells and organisms.

Surface coatings are crucial for control of both useful and adverse effects of NMs. They provide a great opportunity to improve materials through rational design, by enhancing a useful property and reducing adverse effects. This approach has been demonstrated for multi-walled carbon nanotubes (MWCNTs) and asbestos fibers, where structural similarities suggested potentially harmful effects according to the so-called ‘fiber paradigm’. The similarities in the two structures guided a pilot study, which showed that when long MWCNTs are injected into the abdominal cavity of mice, asbestos-like pathogenic effects can be induced [125]. However, less rigid forms of CNTs had lower toxicity [105],