(CR) Article #29 Notes: Computational predictive approaches for interaction and structure of aptamers

Article notes should be on separate sheets

Source Title Computational predictive approaches for interaction and structure of aptamers

Source citation (APA Format)

Emami, N., Pakchin, P. S., & Ferdousi, R. (2020). Computational predictive approaches for interaction and structure of aptamers. Journal of Theoretical Biology, 497, 110268. https://doi.org/10.1016/j.jtbi.2020.110268

Original URL https://www-sciencedirect-com.ezpxy-web-p-u01.wpi.edu/science/article/pii/S0022519320301235

Source type Scientific Review Article

Keywords (From article) Aptamer; Interaction prediction; Structure prediction; Affinity; Machine learning

Summary of key points (include methodology)

This paper reviews computational methods that can be used to predict the structure of an aptamer and the interactions that it has with its corresponding ligand/antigen.

One such method uses machine learning: a sequence is translated into a set of numerical properties known as a feature vector, and that data is then processed further (methods vary) to predict the structure of the aptamer. The paper also lists numerous software tools (with URLs for some of them) for this prediction process, which can help in developing an aptamer that is more specific than similar aptamers.

Research Question/Problem/Need

What is the best method for predicting the structure of an aptamer based on the way it interacts with other molecules?

Michalak 79

Important Figures

Two different ways 3D aptamer structures can be predicted based on the antigen and other data.

Notes

INTRODUCTION

● Aptamers were first proposed in the 1990s as a novel class of molecules for detecting target molecules

● Their short sequences and 3-D structures are what make them effective at detecting target molecules

● They have been used to detect proteins, viruses, lipids, ions, carbohydrates, nucleic acids, cells, and other biological substances effectively

● Aptamers bind to antigens using their 3-dimensional structure together with van der Waals forces, hydrogen bonds, stacking, and electrostatic interactions

● Aptamers do indeed function similarly to antibodies, a well-known detection molecule used for assays and more

● Despite this similarity, the following points make them better suited for detection than antibodies:

○ their ability to function in a wider range of temperatures

○ their ability to be inexpensively replicated via PCR

○ their short length, which makes them last longer

○ their ability to be screened with in-vitro processes and libraries

● Aptamers can also be modified to serve a specific purpose (via their DNA bases, etc.)

● Using systematic evolution of ligands by exponential enrichment (SELEX), created in 1990, aptamers can be selected and amplified over and over until ideal aptamers are obtained for use


● SELEX takes many weeks to carry out, and is usually neither efficient nor very effective

● As a result, the use of computational methods in the prediction of aptamers has been growing in popularity

○ Using aptamer databases, several alternative methods for selecting and predicting aptamers have been developed for both 2-D and 3-D aptamer structures

○ Many of these can take place inexpensively on a computer, like in-silico methods for design

INTERACTION PREDICTION

1. Determine the features needed to build the feature vector of data values

a. This is the data about both the aptamer and the target

i. This data can include sequence-based features, structure-based features, and energy-based features, each of which captures different properties of the molecules in question

ii. TARGET: Casein, I can find all of that

iii. APTAMER: But I don't have it yet?

b. These features can be determined using PseKRAAC, iFeature, and/or Seq2Feature

2. Take many feature vectors of values and combine them in some sort of way so that they can be compared?

3. Then multiple feature extraction techniques need to be applied to fully represent the data

a. Four forms of feature extraction:

i. One-hot encoding: rather easy to do and very effective, especially when coupled with extended one-hot encoding (iv)

ii. K-mer frequency

iii. Continuous distributed representation

iv. Extended one-hot encoding

4. Feature selection then needs to be performed, which sifts out the irrelevant information from each feature vector to improve the quality of the data

5. Take the data that has been developed and plug it into a Machine Learning (ML) algorithm

a. 4 types of ML

i. Supervised, semi-supervised, unsupervised, and reinforcement learning (methods)

b. Examples of methods that perform such tasks include LPI-ETSLP (lncRNA-protein interaction based on eigenvalue transformation-based semi-supervised link prediction) and DLPRB (Deep Learning for Protein-RNA Binding)
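Steps 1 through 4 above can be sketched in a few lines of Python. This is only a minimal illustration, not the method of the paper or of any of the tools it lists: the function names, the DNA alphabet, the choice of k = 2, the example sequences, and the "drop constant columns" rule for feature selection are all assumptions made for this sketch.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed DNA aptamer alphabet; use "ACGU" for RNA


def one_hot(seq, alphabet=ALPHABET):
    """One-hot encoding: len(alphabet) entries of 0/1 per base, flattened."""
    vec = []
    for base in seq:
        vec.extend(1 if base == symbol else 0 for symbol in alphabet)
    return vec


def kmer_frequency(seq, k=2, alphabet=ALPHABET):
    """K-mer frequency: relative frequency of every possible k-mer, in a fixed order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        piece = seq[i:i + k]
        if piece in counts:
            counts[piece] += 1
    total = max(windows, 1)  # avoid dividing by zero for very short sequences
    return [counts[kmer] / total for kmer in kmers]


def drop_constant_features(vectors):
    """Very simple feature selection: keep only columns that vary across samples."""
    keep = [j for j in range(len(vectors[0]))
            if len({v[j] for v in vectors}) > 1]
    return [[v[j] for j in keep] for v in vectors], keep


# Hypothetical sequences, invented for illustration only.
aptamers = ["ACGTAC", "ACGTTT", "GGGTAC"]
features = [kmer_frequency(s, k=2) for s in aptamers]
selected, kept_columns = drop_constant_features(features)
```

The `selected` vectors are what would then be passed to an ML algorithm in step 5. K-mer frequency gives vectors of the same length for sequences of any length, which is one reason it may be easier to start with than one-hot encoding.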

● The paper also covers other methods for evaluating the accuracy of the models and tests developed; for now, they are not covered in this Project Notes document

● Several URLs were provided for websites and tools that predict the interactions (even the ideal ones) of specific RNA molecules with proteins

● It is also important to determine the 2-D and 3-D structures of the aptamers, if possible, to better understand them

○ This has been shown to be possible with computational methods, both sequence-based and structure-based

● The following tools (with links to them in Table 3) were proposed as ways to do this:

○ RNAComposer

○ RMDetect

○ MC-Fold/MC-Sym

○ (See more in Table 3)

2D STRUCTURE PREDICTION

● 2 main ways to do this, both usually more useful when homologous folds or enough auxiliary structural data are known for the structure in question:

○ Single-sequence analysis: finding the minimum free energy structure of a single sequence by evaluating all of the possible ways it can fold. Some of the parameters used for experiments in this category come from the nearest-neighbor model

○ Multiple sequence analysis: very similar to the above, but carried out on multiple sequences instead of one. ValFold has been used for this, especially as 2-D prediction software, as well as the TurboFold (and TurboFold II) algorithm
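To get a feel for single-sequence analysis, here is a small sketch of the Nussinov base-pair maximization algorithm. It is a deliberate simplification and an assumption of these notes, not what the paper's tools do: real minimum free energy programs score structures with nearest-neighbor energy parameters, while this sketch only counts base pairs. The pairing rules, the minimum loop size of 3, and the function name are chosen just for illustration.

```python
# Allowed RNA base pairs (Watson-Crick plus G-U wobble), assumed for this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin loop


def nussinov(seq):
    """Return (max_pairs, dot_bracket) for one RNA sequence."""
    n = len(seq)
    # best[i][j] = maximum number of pairs that can form within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + best[i + 1][k - 1] + right)
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) not in PAIRS:
                continue
            right = best[k + 1][j] if k < j else 0
            if best[i][j] == 1 + best[i + 1][k - 1] + right:
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k - 1))
                if k < j:
                    stack.append((k + 1, j))
                break
    return (best[0][n - 1] if n else 0), "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, "(((...)))")`: three stacked pairs closing a three-base hairpin loop, written in dot-bracket notation.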

3D STRUCTURE PREDICTION

● One part of it: Sampling

○ Using biophysical rules, data and results can be confirmed using Monte Carlo simulations, molecular dynamics, and other methods

● Another part of it: Scoring

○ Relies on the similarities of sequences

○ Generates a rank of the biological substances that are currently being evaluated

● Template-based modeling (and methods) has been used for 3-D structure modeling in the past. This involves using a template to model and identify the specific targets


○ Utilizes information from a template library to choose the ideal sequence needed, and after some refining it develops the resulting predicted model

● Free modeling has also been used

○ It uses physics and the knowledge from a fragments library to assemble and determine the predicted 3-D model of a molecule

● Both template-based modeling and free modeling start from a target sequence that is being looked for

● Using the computational techniques described in this paper, the aptamer databases can be developed successfully

○ These DBs are what can be used to perform all tasks on aptamers

● More tools specific to aptamer development are still greatly needed to advance this field
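The sampling and scoring parts described under 3D STRUCTURE PREDICTION can be pictured with a toy Metropolis Monte Carlo loop. Everything here is a hypothetical stand-in: a "conformation" is just a short list of angles, `toy_energy` is an invented function rather than a real biophysical force field, and candidates are ranked by that energy only, so this is a sketch of the general idea and not how any tool in the paper works.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical stand-in for a biophysical energy function (lower is better)."""
    return sum(1.0 - math.cos(a - 1.0) for a in angles)


def metropolis_sample(n_angles=5, steps=2000, temperature=0.5, seed=0):
    """Sampling: random walk over conformations with Metropolis acceptance."""
    rng = random.Random(seed)
    current = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    current_e = toy_energy(current)
    samples = [(current_e, current)]
    for _ in range(steps):
        # Propose a small random change to one angle.
        proposal = list(current)
        idx = rng.randrange(n_angles)
        proposal[idx] += rng.gauss(0.0, 0.3)
        proposal_e = toy_energy(proposal)
        delta = proposal_e - current_e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = proposal, proposal_e
            samples.append((current_e, current))
    return samples


def rank_candidates(samples, top=3):
    """Scoring: rank sampled conformations by energy, best first."""
    return sorted(samples, key=lambda s: s[0])[:top]


samples = metropolis_sample()
best = rank_candidates(samples)
```

Accepting some uphill moves is what lets the sampler escape poor local minima; the scoring step then only has to sort the candidates that sampling produced.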

Cited references to follow up on

https://www.nature.com/articles/nrg3920

Follow up Questions

● What kind of information exists in these aptamer (etc) databases that are interpreted as different labels of different targets of aptamers and determined as positive or negative results?

● Why exactly do we need a learning algorithm for this development here?

○ To actually develop a program that can effectively predict the sequence and shape of your proteins based on the one that you develop

● How much of this machine learning process will I actually have to carry out?

● What are some drawbacks of some of the current aptamers that exist (or the ones that I have been working with), such as cas1, and how can these properties be improved upon to create a more ideal aptamer?

○ Would it be a good idea to create a feature vector of current casein aptamers (for a comparison process to be carried out)?


(CR) Article #30 Notes: In silico molecular docking in

In document Project Title: Avoiding Cross-Contact: Using Bioluminescent Aptamers to Detect Casein Name: Griffin Michalak (Page 79-84)