
Structure based approach for understanding organism specific recognition of protein-RNA complexes

Abstract

Background

Protein-RNA interactions perform diverse functions within the cell. Understanding the recognition mechanism of protein-RNA complexes has been a challenging task in molecular and computational biology. In earlier works, recognition mechanisms have been studied for a specific complex or using a set of non-redundant complexes. In this work, we have constructed 18 sets of the same protein-RNA complexes belonging to different organisms from the Protein Data Bank (PDB). The similarities and differences within each set of complexes have been revealed in terms of various sequence and structure based features such as root mean square deviation, sequence homology, propensity of binding site residues, variance, conservation at binding sites, binding segments, binding motifs of amino acid residues and nucleotides, preferred amino acid-nucleotide pairs and the influence of neighboring residues on binding.

Results

We found that the proteins of mesophilic organisms have more binding sites than those of thermophiles, and that the binding propensities of amino acid residues are distinct in E. coli, H. sapiens, S. cerevisiae, thermophiles and archaea. Proteins prefer to bind RNA using single-residue segments in all the organisms, while RNA prefers to use stretches of up to six nucleotides for binding with proteins. We have developed amino acid residue-nucleotide pair potentials for different organisms, which could be used for predicting the binding specificity. Further, molecular dynamics simulation studies on aspartyl tRNA synthetase complexed with aspartyl tRNA showed specific modes of recognition in E. coli, T. thermophilus and S. cerevisiae.

Conclusion

Based on structural analysis and molecular dynamics simulations we suggest that the mode of recognition depends on the type of the organism in a protein-RNA complex.

Reviewers

This article was reviewed by Sandor Pongor, Gajendra Raghava and Narayanaswamy Srinivasan.

Background

Protein-RNA interactions play critical roles in determining the structure of the ribosome and spliceosome, and in gene expression. The interaction of proteins with RNA has generally been explained using different types of motifs such as the arginine-rich motif, RNA recognition motif, GXXG motif, double-stranded RNA binding motif and tetraloops (GX[GA]A) in RNA [1]. The recognition mechanisms of protein-RNA complexes and their functional importance have been mainly elucidated by three-dimensional structure determination of protein-RNA complexes [2] along with other molecular biology experiments such as site-directed mutagenesis, fluorescence resonance energy transfer (FRET) imaging, etc. The structures of protein-RNA complexes have been effectively used for identifying the binding sites using distance-based criteria, solvent accessibility-based methods and energy-based approaches [3-5].

The availability of protein-RNA complex structures in PDB [6] has enabled researchers to develop secondary databases [7,8] and to analyze the binding sites in terms of atomic contacts, amino acid composition, preference of residues, secondary structures, solvent accessibility, electrostatic interactions, hydrophobic contacts, hydrogen bonding, cation-π, stacking and van der Waals interactions [3,9,10]. The results obtained from the structural analysis of protein-RNA complexes have been successfully utilized for understanding their recognition mechanism and predicting the binding sites. Further, Pietal et al. developed a method for visualizing and analyzing contact and distance maps for protein-RNA complex structures [11]. Recently, Fornes et al. reviewed the applications of knowledge-based potentials for evaluating the models of protein-RNA interactions along with other complexes [12].

On the other hand, several methods based on machine learning techniques have been proposed for identifying the binding sites in protein-RNA complexes. These methods utilize different features such as side chain pKa, hydrophobicity index, molecular mass, evolutionary conservation, predicted secondary structure, solvent accessibility and PSSM profiles [13-17]. Recently, Nagarajan and Gromiha (2014) analyzed the performance of various methods for identifying the binding sites in protein-RNA complexes based on protein structural class, fold, family, superfamily, function, RNA structure, and conformation.

The structural analysis of protein-RNA complexes and prediction methods mainly use non-redundant sets of complexes to avoid bias in the analysis. This practice rests on the assumption that protein-RNA complexes have similar structures and functions if their protein sequences are homologous to each other. We have addressed this issue by analyzing the binding sites of the same protein-RNA complexes belonging to different organisms, in which the protein sequences are redundant among themselves. We have developed a dataset of protein-RNA complexes from different organisms with high sequence identity and identified the binding sites. The binding sites have been analyzed in terms of binding propensity, amino acid-nucleotide pair preference, binding motifs, etc. We found that the proteins of mesophiles contain more binding sites than those of thermophiles and that the binding propensities of amino acid residues are distinct in each organism. Positively charged residues have high preference in E. coli, aromatic residues are preferred in S. cerevisiae, polar residues in thermophiles, Gly and Trp in H. sapiens and a mixed combination of residues in archaea. The binding propensities of polar residues showed high variability among different organisms at conserved positions. The analysis of amino acid-nucleotide pair preferences revealed that amino acid residues prefer to pair with cytosine in E. coli, whereas they mainly pair with adenine in H. sapiens and S. cerevisiae. Thermophiles and archaea showed high preference to interact with cytosine and uracil, respectively. Further, molecular dynamics simulation studies on aspartyl tRNA synthetase complexed with aspartyl tRNA (AspRS-tRNAAsp) indicated distinct modes of recognition in different organisms.

Methods

Dataset

We have constructed 18 sets of protein-RNA complexes belonging to different organisms. The datasets were obtained by carefully searching for such complexes in PDB [6] with the following criteria: (i) structures of the protein-RNA complex are known for at least two organisms, (ii) the protein has a minimum of 30 residues, (iii) the RNA has at least 5 nucleotides and (iv) the sequence identity of proteins among these complexes is more than 25%. The 18 sets of complexes, along with their structural similarity (RMSD) and sequence identity, are summarized in Table 1. The crystallization temperature is 100 K for most of the complexes (>90%), and all of them are expressed in E. coli [6].
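The four selection criteria can be expressed as a simple filter. The sketch below is a minimal illustration, not the authors' pipeline: it assumes that candidate complexes have already been grouped into families of the "same" complex and that pairwise protein sequence identities have been precomputed. The `Complex` record, the `family` label, the `identity` lookup and the example IDs are all hypothetical, and reading criterion (iv) as applying to every protein pair within a set is also an assumption.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Complex:
    pdb_id: str
    family: str          # hypothetical label grouping the "same" complex across organisms
    organism: str
    protein_length: int  # number of residues
    rna_length: int      # number of nucleotides

def build_sets(complexes, identity, min_protein=30, min_rna=5, min_identity=25.0):
    """Return {family: [Complex, ...]} for families meeting criteria (i)-(iv).

    identity maps frozenset({pdb_id_a, pdb_id_b}) to percent protein sequence
    identity (assumed precomputed, e.g. from pairwise alignments).
    """
    # Criteria (ii) and (iii): minimum protein and RNA sizes
    kept = [c for c in complexes
            if c.protein_length >= min_protein and c.rna_length >= min_rna]
    families = {}
    for c in kept:
        families.setdefault(c.family, []).append(c)
    sets = {}
    for family, members in families.items():
        # Criterion (i): structures from at least two organisms
        if len({c.organism for c in members}) < 2:
            continue
        # Criterion (iv), assumed pairwise: every protein pair shares > min_identity %
        if all(identity[frozenset((a.pdb_id, b.pdb_id))] > min_identity
               for a, b in combinations(members, 2)):
            sets[family] = members
    return sets

# Hypothetical candidates (IDs and sizes are illustrative only)
candidates = [
    Complex("X1A", "fam1", "E. coli", 580, 75),
    Complex("X1B", "fam1", "T. thermophilus", 580, 75),
    Complex("X2A", "fam2", "E. coli", 200, 10),
    Complex("X3A", "fam3", "H. sapiens", 20, 10),
    Complex("X3B", "fam3", "S. cerevisiae", 120, 10),
]
pairwise_identity = {frozenset(("X1A", "X1B")): 40.0}
print(sorted(build_sets(candidates, pairwise_identity)))  # ['fam1']
```

Here `fam1` is kept, `fam2` is dropped for having a single organism, and `fam3` is dropped once its 20-residue protein fails criterion (ii), leaving only one organism.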

Table 1 List of protein-RNA complexes used in the present study

Identification of binding site residues

Generally, binding site residues in protein-RNA complex structures have been identified with three different criteria: (i) distance between contacting atoms in protein and RNA using a specific cut-off value [18,19], (ii) reduction of solvent accessibility upon binding [20] and (iii) inter-residue interaction energy [21]. We have used the distance based approach to identify the binding site residues/nucleotides for the considered protein-RNA complexes. In this method, we have calculated the distance between the heavy atoms in protein and RNA. Two atoms (one in protein and another in RNA) are considered to be interacting with each other if the distance between them is less than 3.5 Å [5]. The respective residues and nucleotides are treated as binding site residues and nucleotides.
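A minimal sketch of the distance criterion with NumPy, assuming atoms have already been parsed from the PDB file into (residue identifier, element, coordinates) tuples. This tuple layout is a hypothetical convention, not part of the original method. Hydrogens are dropped so that only heavy atoms are compared against the 3.5 Å cut-off.

```python
import numpy as np

def binding_sites(protein_atoms, rna_atoms, cutoff=3.5):
    """Return (binding residues, binding nucleotides) using a heavy-atom distance cut-off.

    Each atom is a (residue_id, element, (x, y, z)) tuple (a hypothetical layout;
    any structure parser can be used to build it). Distances are in angstroms.
    """
    # Keep heavy atoms only
    p = [a for a in protein_atoms if a[1] != "H"]
    r = [a for a in rna_atoms if a[1] != "H"]
    if not p or not r:
        return set(), set()
    pc = np.array([a[2] for a in p], dtype=float)
    rc = np.array([a[2] for a in r], dtype=float)
    # All protein-RNA atom distances, shape (n_protein_atoms, n_rna_atoms)
    d = np.linalg.norm(pc[:, None, :] - rc[None, :, :], axis=-1)
    pi, rj = np.nonzero(d < cutoff)
    return {p[i][0] for i in pi}, {r[j][0] for j in rj}
```

The full distance matrix is fine for typical complexes; for very large assemblies a spatial index such as `scipy.spatial.cKDTree` would avoid building it.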

Binding propensity

The binding propensity for the 20 amino acid residues and 4 nucleotides present in protein-RNA complexes has been calculated using following procedure [21-23]:

(i) We computed the frequency of occurrence of amino acid residues (nucleotides) in binding sites (fb) and in the protein (RNA) as a whole (ft). The binding propensity (Pbind) is calculated using the equation:

$$ {\mathrm{P}}_{\mathrm{b}\mathrm{ind}}\left(\mathrm{i}\right)={\mathrm{f}}_{\mathrm{b}}\left(\mathrm{i}\right)*100/{\mathrm{f}}_{\mathrm{t}}\left(\mathrm{i}\right) $$
(1)

where i represents each of the 20 amino acids and 4 nucleotides.

(ii) The binding propensity was normalized with the percentage of binding site residues in the considered protein-RNA complexes. The normalization factor (Norm) was calculated as follows:

$$ \mathrm{Norm}={\mathrm{f}}_{\mathrm{b}}/{\mathrm{f}}_{\mathrm{t}} $$
(2)

where fb is the total number of binding residues (nucleotides) and ft is the total number of residues (nucleotides) in the considered protein-RNA complexes.

(iii) The normalized binding propensity (Pnormbind) for the 20 amino acid residues and 4 nucleotides of RNA present in protein-RNA complexes was computed as follows:

$$ {\mathrm{P}}_{\mathrm{normbind}}\left(\mathrm{i}\right)={\mathrm{P}}_{\mathrm{bind}}\left(\mathrm{i}\right)/\mathrm{Norm} $$
(3)
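Equations (1)-(3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: f_b(i) and f_t(i) are taken as raw counts of residue (or nucleotide) type i at binding sites and in the whole chain, and Norm is expressed as a percentage, following the wording of step (ii), so that a residue type binding at exactly the overall rate scores 1. As printed, Eq. (2) omits the factor of 100; with Norm as a plain fraction every value would be 100 times larger.

```python
from collections import Counter

def normalized_propensity(all_residues, binding_residues):
    """Normalized binding propensity per residue (or nucleotide) type, Eqs. (1)-(3).

    all_residues: every residue type in the protein(s); binding_residues: the
    subset at binding sites. Assumption: Norm is a percentage (step ii wording).
    """
    ft = Counter(all_residues)      # f_t(i): occurrences in the whole chain(s)
    fb = Counter(binding_residues)  # f_b(i): occurrences at binding sites
    norm = 100.0 * sum(fb.values()) / sum(ft.values())   # Eq. (2), as a percentage
    result = {}
    for i, total in ft.items():
        p_bind = fb[i] * 100.0 / total   # Eq. (1)
        result[i] = p_bind / norm        # Eq. (3)
    return result
```

For example, with 4 Ala, 4 Arg and 2 Lys of which 3 Arg and 1 Lys bind, Norm is 40% and the propensities are 0 for Ala, 1.875 for Arg and 1.25 for Lys.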

The comparison among specific pairs of protein-RNA complexes from different organisms has been carried out using the normalized propensity of all and conserved residues, along with the propensity of residues in five groups: E. coli, H. sapiens, S. cerevisiae, thermophiles and archaea.

Conservation of amino acid residues

We have evaluated the conservation of residues in each RNA binding protein using the ConSurf server [24], available at http://consurf.tau.ac.il/. We selected the JTT evolutionary substitution model for amino acid replacements and the Bayesian method for computing the conservation score. ConSurf compares the sequence of a protein chain with the proteins deposited in UniProt and retrieves the sequences that are homologous to the query. All the sequences found to be evolutionarily related to an RNA binding protein chain in the dataset were subsequently analysed using multiple sequence alignment. These alignments were used to classify every residue in each RNA binding protein into 9 categories, from highly variable (score 1) to highly conserved (score 9).

Binding segments

The residues identified as binding sites have been studied in terms of binding segments. It is based on the number of consecutive binding residues in the amino acid sequences. For example, a 4-residue binding segment has a stretch of four consecutive binding residues. We have analyzed the binding segments with one, two, three, four, five, six and more than six residues. Similar analysis has also been carried out for nucleotides in RNA.
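The segment classification amounts to measuring runs of consecutive binding positions. A minimal Python sketch, assuming a per-position boolean binding flag in sequence order. It reports both the number of segments in each length class and the percentage of binding positions falling into each class, since the Results quote figures in terms of binding sites; which of the two the original analysis used is an assumption.

```python
from collections import Counter
from itertools import groupby

def segment_distribution(is_binding):
    """Classify runs of consecutive binding positions into length classes 1-6 and >6.

    is_binding: one boolean per residue (or nucleotide), in sequence order.
    Returns (segments per class, percentage of binding positions per class).
    """
    lengths = [sum(1 for _ in run) for flag, run in groupby(is_binding) if flag]
    classes = [str(n) if n <= 6 else ">6" for n in lengths]
    segment_counts = Counter(classes)
    # Weight each segment by its length to get the share of binding positions
    position_counts = Counter()
    for cls, n in zip(classes, lengths):
        position_counts[cls] += n
    total = sum(lengths)
    position_pct = ({cls: 100.0 * n / total for cls, n in position_counts.items()}
                    if total else {})
    return segment_counts, position_pct
```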

Preference of amino acid-nucleotide pairs

The preference of amino acid-nucleotide pairs at the interface of protein-RNA complex in specific organism has been computed using the following equation [4]:

$$ {\mathrm{Pair}}_{\mathrm{org}}\left(\mathrm{i},\mathrm{j}\right)=\varSigma {\mathrm{N}}_{\mathrm{i}\mathrm{j}}/\left(\varSigma {\mathrm{N}}_{\mathrm{i}}+\varSigma {\mathrm{N}}_{\mathrm{j}}\right) $$
(4)

where i and j stand for the interacting residues and nucleotides in proteins and RNA, respectively. Ni,j is the number of interacting pairs formed by residues of type i in the protein and nucleotides of type j in the RNA. ΣNi and ΣNj are the total numbers of residues of type i and nucleotides of type j in protein and RNA, respectively.

The amino acid-nucleotide pair preference for each organism has been normalized with the preference of all protein-RNA complexes [Pair(i,j)] to obtain the propensity of amino acid-nucleotide pairs at the interface. It is given by

$$ \mathrm{Propen}\left(\mathrm{i},\mathrm{j}\right)={\mathrm{Pair}}_{\mathrm{org}}\left(\mathrm{i},\mathrm{j}\right)/\mathrm{Pair}\left(\mathrm{i},\mathrm{j}\right) $$
(5)

The propensity has been converted into potentials for the amino acid-nucleotide pairs using standard procedures [25].

$$ \mathrm{Potential}\left(\mathrm{i},\mathrm{j}\right)=-\mathrm{R}\mathrm{T}\ \ln\ \mathrm{Propen}\left(\mathrm{i},\mathrm{j}\right) $$
(6)

where R is the gas constant and T is the temperature.
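Equations (4)-(6) chain together into a short pipeline. The Python sketch below is a minimal illustration, not the authors' code. It assumes contacts are supplied as (residue type, nucleotide type) tuples, one per interacting residue-nucleotide pair, and that residue and nucleotide totals are counted over the same complexes. The temperature is not given in the text, so 298 K is an assumed default; with R in kcal/(mol·K) the potentials come out in kcal/mol. Pairs never observed in an organism have zero propensity and an undefined logarithm, so the sketch omits them.

```python
import math
from collections import Counter

R = 1.987e-3  # gas constant, kcal/(mol*K)

def pair_preference(contacts, residue_counts, nucleotide_counts):
    """Eq. (4): sum N_ij / (sum N_i + sum N_j) for each observed pair (i, j).

    contacts: list of (residue_type, nucleotide_type) interacting pairs.
    residue_counts / nucleotide_counts: total occurrences of each type.
    """
    n_ij = Counter(contacts)
    return {(i, j): n / (residue_counts[i] + nucleotide_counts[j])
            for (i, j), n in n_ij.items()}

def pair_potentials(pair_org, pair_all, temperature=298.0):
    """Eqs. (5)-(6); temperature is an assumed value (not stated in the text).

    pair_all comes from all complexes, so it contains every pair seen in pair_org.
    """
    potentials = {}
    for key, p_org in pair_org.items():
        propen = p_org / pair_all[key]                          # Eq. (5)
        potentials[key] = -R * temperature * math.log(propen)   # Eq. (6)
    return potentials
```

A pair that is twice as frequent in one organism as in the full dataset gets a potential of −RT ln 2, about −0.41 kcal/mol at 298 K, while a pair at the dataset-wide level gets 0.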

Influence of neighboring residues and motifs for binding with RNA

We have analyzed the influence of residues neighboring the binding sites using two dipeptide patterns, *B and B*, where * is any residue and B is a binding site residue. Further, the preferred tripeptide and trinucleotide motifs have been identified with the pattern *B* [4,26]. Tetrapeptides were not considered in this work because the number of possible combinations (20⁴ = 160,000) is so large that individual tetrapeptides would yield no significant hits.
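The *B, B* and *B* patterns can all be handled by one sliding-window count. A minimal sketch, assuming the binding probability of a motif is the fraction of its occurrences in which the residue at the B position is a binding site, which is the reading suggested by the dipeptide analysis in the Results. The function names are hypothetical, and nucleotide sequences can be passed in the same way.

```python
from collections import Counter

def motif_probabilities(sequence, is_binding, pattern):
    """Probability that each k-mer occurs with a binding residue at the 'B' position.

    pattern is '*B', 'B*' (dipeptides) or '*B*' (tripeptides); '*' is any residue.
    Probability = occurrences with B at a binding site / all occurrences.
    """
    k = len(pattern)
    b = pattern.index("B")   # offset of the binding position within the window
    total, bound = Counter(), Counter()
    for start in range(len(sequence) - k + 1):
        word = sequence[start:start + k]
        total[word] += 1
        if is_binding[start + b]:
            bound[word] += 1
    return {word: bound[word] / n for word, n in total.items()}

def preferred(probabilities, threshold=0.75):
    """Motifs whose binding probability exceeds the threshold (75% in the Results)."""
    return {w: p for w, p in probabilities.items() if p > threshold}
```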

Molecular dynamics simulations

We have analyzed the mode of recognition of tRNAAsp by aspartyl tRNA synthetase (AspRS) in different organisms [27-29] using molecular dynamics simulations. The simulations were performed for 20 ns in an explicit water environment using the ff99SB force field in the AMBER suite [30-32]. The force field parameters of the modified tRNA bases were obtained from the Modifieds database [33]. Energy minimization and equilibration were performed to remove steric clashes and to bring the system to a temperature of 300 K and a pressure of 1 atm using Berendsen coupling [34]. The SHAKE algorithm [35] and the Particle Mesh Ewald (PME) method [36] were employed to constrain bonds involving hydrogen atoms and to treat long-range electrostatic interactions, respectively. Unrestrained production runs were carried out for 20 ns with a 2 fs time step for each AspRS-tRNAAsp complex. The binding free energy (ΔG°) was calculated with the MM-GB/SA method [37-39] to identify the active site amino acids that interact strongly with tRNAAsp. The per-residue ΔG° was obtained by pairwise decomposition with the mmpbsa.py module [40].
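The mmpbsa.py module performs the trajectory averaging itself; the sketch below only illustrates the arithmetic behind an MM-GB/SA estimate and the per-residue screening used in the Results. It assumes per-frame free energies (kcal/mol) for the complex, receptor and ligand are already available as lists, and per-residue contributions as a dictionary; these inputs and the residue labels are hypothetical. Whether the ± values reported in the Results are standard deviations is not stated, so reporting the sample standard deviation here is an assumption. The −3 kcal/mol residue cut-off is the one quoted in the Results.

```python
import statistics

def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Per-frame MM-GB/SA energies (kcal/mol) -> (mean, sample SD) of the binding energy.

    Each frame contributes dG = G_complex - G_receptor - G_ligand.
    """
    dg = [c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand)]
    return statistics.mean(dg), statistics.stdev(dg)

def strong_binders(per_residue_dg, threshold=-3.0):
    """Residues whose decomposed contribution is below the threshold (kcal/mol)."""
    return sorted(res for res, dg in per_residue_dg.items() if dg < threshold)
```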

Results and discussion

Percentage of binding site residues in protein-RNA complexes from different organisms

We have computed the percentage of binding site residues in all the considered protein-RNA complexes, and the results obtained for different organisms are presented in Table 1. Our analysis showed that the percentage of binding site residues varies with the organism for the same protein-RNA complex. For example, the binding site residues in AspRS are 7.12%, 3.97% and 8.57% of the total residues for E. coli, T. thermophilus and S. cerevisiae, respectively. The corresponding binding site nucleotides are 32.00%, 20.55% and 26.67%. These data reveal that the thermophilic complex has fewer binding sites than the mesophilic ones in both protein and RNA; compared with E. coli, the differences in aspartyl tRNA synthetase are about 3% and 11% for protein and RNA, respectively. A similar trend is also observed in leucyl tRNA synthetase. This may be because residues in thermophilic proteins contribute mainly towards protein stability, whereas mesophilic proteins show a higher tendency to interact with RNA. In the EF-Tu elongation factor, mesophilic E. coli has fewer binding residues but more binding nucleotides. Overall, the analysis reveals that recognition depends on the organism for a given protein-RNA complex.
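The AspRS percentages quoted above can be checked directly. The short Python snippet below recomputes the gaps in percentage points from the values in the text. The "3%" and "11%" figures correspond to the comparison with E. coli; against S. cerevisiae the gaps are about 4.6 and 6.1 percentage points.

```python
# Percentages of binding-site residues (protein) and nucleotides (RNA) in
# AspRS-tRNAAsp, as reported in the text
protein = {"E. coli": 7.12, "T. thermophilus": 3.97, "S. cerevisiae": 8.57}
rna = {"E. coli": 32.00, "T. thermophilus": 20.55, "S. cerevisiae": 26.67}

def gap(table, mesophile, thermophile="T. thermophilus"):
    """Difference in percentage points between a mesophile and the thermophile."""
    return round(table[mesophile] - table[thermophile], 2)

for organism in ("E. coli", "S. cerevisiae"):
    print(organism, gap(protein, organism), gap(rna, organism))
# E. coli 3.15 11.45
# S. cerevisiae 4.6 6.12
```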

Binding propensity of residues in protein-RNA complexes from different organisms

We have computed the normalized binding propensity of all 20 amino acid residues in different organisms (E. coli, H. sapiens, S. cerevisiae, thermophiles and archaea), and the results are shown in Figure 1. The analysis was carried out in two ways: first, we considered all the protein-RNA complexes of an organism together and computed the overall propensity; second, we computed the propensity for each complex of an organism individually and then calculated the average and deviation. Residues with no binding sites were not considered in this computation. The trends are qualitatively similar in both cases. We observed that the hydrophobic (Ala, Val, Leu, Ile) and acidic (Asp, Glu) residues have a normalized binding propensity of less than 2 and hence are not preferred at the binding sites. In contrast, Ser, Tyr, Gln, Asn, Lys, Arg and His have a binding propensity of more than 2 in all the organisms, showing their preference for the interface. These results are similar to the binding propensities obtained with an energy-based approach on a set of 81 protein-RNA complexes [4]. Interestingly, we noticed a few differences in the binding propensities of residues among different organisms. Pro, Cys and Gln show higher preference in S. cerevisiae than in other organisms. Lys, Arg and Phe are highly favored in E. coli, whereas Gly and Trp are preferred in H. sapiens. Asn shows high preference in thermophilic proteins, although its overall composition in thermophilic proteins is lower than in mesophilic ones [41]. Protein-RNA complexes from archaea prefer Ala, Pro, Met, Ser, Asp and His (Figure 1). In essence, the preference of amino acid residues at the interface of protein-RNA complexes is distinct in different organisms: positively charged residues in E. coli, aromatic residues in S. cerevisiae, polar residues in thermophiles, Gly and Trp in H. sapiens and a mixed combination of residues in archaea.
These differences in binding site residues among organisms reflect their specific modes of recognition with RNA. Further, we examined the statistical significance of these results and found that the p-value is less than 0.05.
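The two averaging schemes described above can differ whenever complexes vary in size. A minimal sketch, assuming the same count-based reading of Eqs. (1)-(3) with Norm expressed as a percentage, and assuming that "residues with no binding sites" means complexes in which the residue type never binds are skipped in the per-complex average. Both readings, and reporting the sample standard deviation as the deviation, are assumptions; the function names are hypothetical.

```python
import statistics
from collections import Counter

def propensity(all_res, bind_res, residue):
    """Normalized binding propensity of one residue type (Eqs. 1-3, Norm as a percentage)."""
    ft, fb = Counter(all_res), Counter(bind_res)
    norm = 100.0 * len(bind_res) / len(all_res)
    return (fb[residue] * 100.0 / ft[residue]) / norm

def pooled_vs_per_complex(complexes, residue):
    """complexes: list of (all_residues, binding_residues) pairs from one organism.

    Returns (pooled propensity, mean of per-complex propensities, their SD).
    Assumption: complexes in which the residue type never binds are skipped.
    """
    per_complex = [propensity(a, b, residue) for a, b in complexes if residue in b]
    if not per_complex:
        raise ValueError(f"{residue!r} is not at any binding site in these complexes")
    # Approach 1: pool all complexes of the organism, then compute one propensity
    pooled = propensity([r for a, _ in complexes for r in a],
                        [r for _, b in complexes for r in b], residue)
    # Approach 2: per-complex propensities, then their mean and sample SD
    sd = statistics.stdev(per_complex) if len(per_complex) > 1 else 0.0
    return pooled, statistics.mean(per_complex), sd
```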

Figure 1

Normalized binding propensity of amino acid residues in different organisms.

Binding propensity of nucleotides in different organisms of protein-RNA complexes

We have computed the normalized binding propensity of nucleotides in E. coli, H. sapiens, S. cerevisiae, thermophiles and archaea, and the results are presented in Figure 2. We observed that the propensity is high for adenine in H. sapiens and archaea, uracil in S. cerevisiae and cytosine in E. coli and thermophiles. Cytosine has a propensity of more than one in 4 of the 5 groups considered. The propensity of guanine lies between those of the other nucleotides in all organisms. This analysis also emphasizes the different modes of recognition in different organisms. However, it is noteworthy that the differences in propensity among the four nucleotides across organisms are smaller than those among the 20 amino acid residues.

Figure 2

Normalized binding propensity of nucleotides in different organisms.

Variations of binding propensities in conserved residues of protein-RNA complexes from different organisms

We have further analyzed the normalized binding propensities of amino acid residues at conserved positions in protein-RNA complexes of E. coli, H. sapiens, S. cerevisiae, thermophiles and archaea. We observed that the overall tendency of amino acid residues is similar at conserved and other positions, while a few residues showed remarkable differences in their propensities at the conserved binding sites. In E. coli, Glu is more preferred at conserved binding sites than at binding sites overall. Similar results were observed for Asn in H. sapiens, Glu and Lys in thermophiles and Lys in archaea. On the other hand, the opposite trend was observed for a few other residues: Cys in H. sapiens, Trp in S. cerevisiae, Tyr in thermophiles, and Gln and His in archaea. These results indicate the role of residue conservation in the interactions between protein and RNA and, specifically, the influence of polar residues at conserved positions in protein-RNA complexes from different organisms.

Influence of RNA base sequence on binding propensity

We have evaluated the influence of the RNA base sequence on the binding propensities of amino acid residues and nucleotides. The lengths of the RNA sequences are similar in almost all the complexes, and the sequence identity varies in the range of 40-100% in most of the considered complexes. We analyzed the nucleotide sequences at the binding sites in different pairs of protein-RNA complexes and observed that the binding preference is similar for all the nucleotides. Further, the changes in the propensities of amino acid residues do not vary uniformly with the corresponding changes in nucleotides. These analyses indicate that the influence of the base sequence is small compared with that of the amino acid sequence in protein-RNA complexes from different organisms. However, a firm conclusion on this effect would require systematic mutational analysis and molecular dynamics simulations.

Binding segments in protein-RNA complexes belonging to different organisms

We have analyzed the binding residues in terms of “continuous stretch” in protein and RNA sequences and the results are presented in Figure 3a and b. The length of continuous binding residues is termed as a binding segment. We observed that the single residue segments are preferred uniformly by all the organisms followed by two-residue segments in proteins, which is consistent with our previous analysis on non-redundant set of protein-RNA complexes [4].

Figure 3

Variation of binding segments in (a) proteins and (b) RNA.

At the RNA level, most of the organisms prefer single-nucleotide segments for binding with proteins. However, single segments account for approximately 30% of occurrences in RNA, compared with about 70% in proteins, so binding segments of two or more nucleotides are observed in about 70% of the binding sites in RNA. E. coli prefers binding segments of 3, 4, 5 and more than 6 nucleotides, whereas its preference is lower for 2- and 6-nucleotide segments. H. sapiens and S. cerevisiae have 20-25% of their binding sites in 2-nucleotide segments and 10-15% in long stretches of more than six nucleotides. Archaea have 25% of their binding sites in 3-nucleotide segments, followed by 4- and 5-nucleotide segments. These results reveal that the binding behavior of different organisms also varies in terms of binding segments, and the observation was found to be statistically significant (p = 0.0347).

Binding motifs in protein-RNA complexes from different organisms

The information obtained about the preference of binding site residues and nucleotides has been used to identify potential binding motifs in protein and RNA. We have computed the binding probability of all possible tripeptides and trinucleotides in different organisms. We noticed that some of the motifs are unique to particular organisms, as reported in the literature [42]. In H. sapiens and S. cerevisiae, all occurrences of the tripeptide NYV are involved in binding. In addition, the tripeptide IQK has a probability of 100% and 80% of binding with RNA in H. sapiens and S. cerevisiae, respectively. In archaea, the tripeptides RRS and LKE have binding probabilities of 100% and 75%, respectively. The numbers of binding site residues in E. coli and T. thermophilus are too small, and hence these organisms were excluded from the analysis. At the RNA level, ACA, GGU and UGU are preferred in E. coli, whereas all occurrences of the trinucleotide UUU in H. sapiens and S. cerevisiae are observed to bind with proteins.

Preference of dipeptides in the vicinity of binding sites

We have analyzed the preference of neighboring residues around the binding sites in protein-RNA complexes by comparing the occurrence of dipeptides adjacent to the binding sites with their occurrence in the whole protein. The computations have been done using all 400 possible pairwise combinations of amino acid residues for two categories, (i) *B (where '*' refers to any residue and B refers to the binding residue) and (ii) B*, and the residue pairs preferred with a probability of more than 75% in at least one organism are presented in Tables 2 and 3. We noticed that a few residue pairs (*B) are specific to a particular organism, such as Cys-His in H. sapiens, and Gly-Arg, Ser-Lys and Glu-Val in archaea (Table 2). A similar observation holds for B*; specifically, Val-Lys and His-Pro were observed in archaea (Table 3). This analysis reveals that the binding residue pairs are unique, especially in archaea. On the other hand, several residue pairs are common to two or three organisms. For example, in *B, Ser-Asn has high preference in E. coli, H. sapiens and thermophiles, and Asn-Tyr in H. sapiens and S. cerevisiae. For B*, Tyr-Val is preferred in E. coli, H. sapiens and S. cerevisiae, and His-Pro in E. coli, H. sapiens and archaea. These preferred residue pairs can be effectively used for identifying the binding sites in protein-RNA complexes. Further, we examined the statistical significance of the data; the p-values for *B and B* are 3.6 × 10⁻¹² and 1.2 × 10⁻⁹, respectively.

Table 2 Preferred residue pairs (*B) for binding with RNA
Table 3 Preferred residue pairs (B*) for binding with RNA

Preference of interacting amino acid-nucleotide pairs

We have analyzed the preference of interacting residues and nucleotides in proteins and RNA by calculating their pair preferences at the binding sites. These preferences have been converted into energy potentials to identify the preferred and avoided residue-nucleotide pairs for binding. Pairs with values less than −0.5 are considered preferred, and those with values greater than 0.5 are treated as avoided. We noticed that the preferred and avoided pairs are specific to each organism (Table 4). The preferred residue-nucleotide pairs are Gly-C, Ala-C, Ser-C, Tyr-C, Asn-C and Leu-U in E. coli; Val-A, Cys-A, Trp-G and His-U in H. sapiens; Tyr-A, Gln-A and Met-G in S. cerevisiae; Val-C, Leu-C, Ile-C, Trp-C and Trp-U in thermophiles; and Pro-C, Ile-U, Met-U, Ser-U, Cys-U and Glu-U in archaea. This analysis reveals that the preferred amino acids tend to pair with cytosine in E. coli and with adenine in H. sapiens and S. cerevisiae. Thermophiles and archaea show high preference to interact with cytosine and uracil, respectively. The potentials for all 80 possible pairs are given in Additional file 1: Table S1, and the data are statistically significant (p = 0.0126). The potentials developed in this work will be useful for predicting the binding specificity of protein-RNA complexes belonging to different organisms.
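The ±0.5 cut-offs above translate into a simple labelling over all 20 × 4 = 80 amino acid-nucleotide pairs. A minimal sketch, assuming potentials are supplied as a dictionary keyed by (three-letter residue, nucleotide) and that pairs without a derived potential are reported as unobserved; the input values below are hypothetical, and the units of the cut-off are not stated in the text.

```python
from itertools import product

AMINO_ACIDS = ["Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
               "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val"]
NUCLEOTIDES = ["A", "C", "G", "U"]

def classify_pairs(potentials, cutoff=0.5):
    """Label all 20 x 4 = 80 amino acid-nucleotide pairs from their potentials."""
    labels = {}
    for pair in product(AMINO_ACIDS, NUCLEOTIDES):
        value = potentials.get(pair)
        if value is None:
            labels[pair] = "unobserved"   # no contacts, so no potential was derived
        elif value < -cutoff:
            labels[pair] = "preferred"
        elif value > cutoff:
            labels[pair] = "avoided"
        else:
            labels[pair] = "neutral"
    return labels

# Hypothetical potentials for illustration only
labels = classify_pairs({("Gly", "C"): -0.8, ("Ala", "G"): 0.9, ("Ser", "U"): 0.1})
```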

Table 4 Preferred and avoided amino acid-nucleotide pairs in different organisms

Case study

We have studied the variation of binding site residues in different organisms for each protein-RNA complex, and the normalized binding propensities of the 20 amino acid residues for a typical complex, AspRS-tRNAAsp from E. coli, T. thermophilus and S. cerevisiae, are shown in Table 5. We observed that the binding mode and binding site residues are distinct in these organisms. Phe prefers to be at the binding sites in E. coli, whereas Gly is preferred in T. thermophilus and Pro, Met and Thr are preferred in S. cerevisiae. Although Asn, Glu and Arg show preference to be at the interface in all the organisms, the strength of this preference differs among them. Arg was preferred more than Lys in E. coli and T. thermophilus, whereas the opposite trend was observed in S. cerevisiae. The structure-based sequence alignment of AspRS from the three organisms is shown in Figure 4. We observed that the binding site residues, binding mode and binding segments differ among the three organisms in the considered complex. The analysis of binding segments showed a similar trend at the protein level; however, the behavior in RNA differs among the organisms. Single-nucleotide segments accounted for 67% of the binding sites in T. thermophilus, whereas only 33% of the binding sites are in single-nucleotide segments in E. coli.

Table 5 Propensity of amino acid residues in three different organisms of aspartyl tRNA synthetase
Figure 4

Structure-based sequence alignment of the aspartyl tRNA synthetase-tRNAAsp complexes 1ASY, 1EFW and 1IL2. The structurally conserved regions are boxed, and the interacting residues are shown in bold.

The mode of recognition for protein-RNA complexes belonging to different organisms has been further studied with a typical complex, AspRS-tRNAAsp, using molecular dynamics simulations as described in the Methods section. The overall binding free energies for the AspRS-tRNAAsp complexes from E. coli, T. thermophilus and S. cerevisiae are −212 ± 19.9 kcal/mol, −116.6 ± 14.3 kcal/mol and −190.9 ± 12.6 kcal/mol, respectively. The free energy is remarkably higher (less negative) for T. thermophilus than for its homologues, indicating a lower binding affinity. This might be because thermophilic proteins are mainly adapted for stability, which allows them to persist at high temperature. This is supported by a large conformational change in the anti-codon loop of the complex from E. coli.

Further, T. thermophilus has half the number of binding sites compared with S. cerevisiae and E. coli, indicating its major role in stabilizing the complex. The energetic analysis shows that 17, 14 and 23 residues in E. coli, T. thermophilus and S. cerevisiae, respectively, potentially bind with RNA with a free energy of less than −3 kcal/mol. The hydrogen bond analysis shows 2069, 2131 and 1826 interactions in E. coli, T. thermophilus and S. cerevisiae, respectively. Among them, 114, 116 and 124 interactions are stable, with an occupancy of >80%. Specifically, 10 and 17 interactions strongly stabilize the AspRS-tRNAAsp complexes of E. coli and S. cerevisiae, respectively, while only 5 interactions were found at the interface in the case of T. thermophilus. This is attributed to the conformational fluctuation of the cognate tRNA, which leads to fewer hydrogen bonds in T. thermophilus than in the other complexes. Conversely, the total number of interactions stabilizing the T. thermophilus AspRS (90) is higher than in E. coli (71) and S. cerevisiae (64). We have also estimated the number of stabilizing residues in these three organisms using the SRide server [43]. We found that T. thermophilus has the highest number of stabilizing residues (51), followed by E. coli (42) and S. cerevisiae (34).
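Occupancy here is the fraction of trajectory frames in which a hydrogen bond is detected. A minimal sketch of the >80% filter and of separating interface (protein-RNA) bonds from intramolecular ones, assuming the per-frame hydrogen bonds have already been extracted from the trajectory (for example with cpptraj's hbond analysis, mentioned only as one possible source). The data layout, with each atom as a (molecule, atom label) tuple, and the labels used are hypothetical.

```python
def stable_hbonds(hbond_frames, n_frames, min_occupancy=0.8):
    """Return {hbond: occupancy} for H-bonds present in more than min_occupancy of frames.

    hbond_frames maps (donor, acceptor) to the set of frame indices in which the
    H-bond exists; each atom is a hypothetical (molecule, atom_label) tuple.
    """
    occupancy = {hb: len(frames) / n_frames for hb, frames in hbond_frames.items()}
    return {hb: occ for hb, occ in occupancy.items() if occ > min_occupancy}

def interface_hbonds(hbonds):
    """Keep protein-RNA H-bonds, i.e. donor and acceptor on different molecules."""
    return {hb for hb in hbonds if hb[0][0] != hb[1][0]}
```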

In addition, Table 6 lists the positions of the high-affinity binding site residues and reveals differences in the recognition mechanism among the three organisms. These high-affinity residues span different RNA binding regions of AspRS, such as the anti-codon binding domain, the hinge region, and the catalytic and insertion domains. The tRNAAsp binding residues in the anti-codon binding region are conserved among the three organisms and show little variation. However, significant variation has been observed in the hinge and catalytic domains. Recognition of tRNA by the synthetase begins with the binding of the anti-codon bases to hydrophobic residues in the anti-codon binding domain of the protein. The variations in the hinge and catalytic domains among different organisms dictate their unique modes of tRNA recognition by AspRS.

Table 6 List of residues from different regions of AspRS strongly binding with tRNAAsp

The organism-specific recognition of protein-RNA complexes may be attributed to the following factors: (i) every stage of RNA metabolism is driven by the binding of RNA binding proteins (RBPs) through RNA binding domains; in general, RBPs have become structurally diverse as genome complexity increased during evolution, and they are recruited at different stages of the transcription and translation processes [44,45], (ii) horizontal gene transfer [46] and (iii) RBPs have evolutionarily conserved structures but differ at the sequence level within each subfamily. As discussed in the case study, these differences influence the mode of binding with the tRNA substrate. This may be further examined with detailed analyses of other pairs of protein-RNA complexes.

Conclusions

We have investigated the organism-specific recognition of protein-RNA complexes using various sequence- and structure-based features such as binding propensity, preference of residues at conserved positions, binding segments, binding motifs, neighboring residues and interacting amino acid-nucleotide pairs. The results showed that residue and nucleotide preferences are distinct in different organisms. The amino acid residue preferences obtained in the present work will be useful for predicting the binding sites of RNA binding proteins. We have developed amino acid-nucleotide pair potentials for different organisms, which can be used to predict the binding specificity of protein-RNA complexes. Molecular dynamics simulation studies on a typical complex, AspRS-tRNAAsp, showed the specific mode of recognition as well as the preferred binding sites in different organisms. These results provide insight into the recognition of protein-RNA complexes from different organisms.

Reviewers’ comments and response

Reviewer #1: Professor Sandor Pongor

In this work, the authors have analyzed the binding specificity of 18 sets of homologous protein-RNA complexes belonging to different organisms. This is a different approach from the traditional analysis with non-redundant datasets. The investigations have been carried out on various sequence and structure based features as well as molecular dynamics simulations. The results showed the similarities and differences between different organisms in the same complex. Further, distinct modes of recognition have been revealed with a typical example using MD simulations and energy calculations. The work would have further implications for understanding the recognition mechanism of protein-RNA complexes from different organisms.

1. It has been mentioned that the potentials for amino acid-nucleotide pairs derived for different sets of organisms would be helpful for predicting the binding specificity. However, the data are not shown. The potentials should be given in supplementary information.

Authors’ response: Amino acid-nucleotide pair potentials are given in supplementary Table S1.

2. The stability of aspartyl tRNA synthetase from E. coli, T. thermophilus and S. cerevisiae could be discussed in terms of the stabilizing residues in these complexes.

Authors’ response: The stability has been discussed with the number of stabilizing residues.

3. The cutoff used to select the preferred and avoided residues in Table 3 may be given.

Authors’ response: Amino acid-nucleotide pairs with values less than -0.5 are considered preferred, and those with values greater than 0.5 are considered avoided.
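The ±0.5 cutoff can be illustrated with a small Python sketch. The potential form used here, E(a, n) = −ln[f_obs(a, n) / (f(a) · f(n))], is a common knowledge-based statistical potential and is an assumption; the exact formula behind the pair preferences in Table 3 and Table S1 is defined in the Methods and may differ. Negative values mark pairs observed more often than expected by chance, which matches the convention that values below -0.5 are preferred. The function names and toy contacts are hypothetical.

```python
import math
from collections import Counter

Pair = tuple[str, str]  # (amino acid, nucleotide)


def pair_potentials(contacts: list[Pair]) -> dict[Pair, float]:
    """Assumed form: E(a, n) = -ln(f_obs(a, n) / (f(a) * f(n))).

    f_obs is the fraction of all contacts formed by pair (a, n); f(a) and f(n)
    are the marginal fractions of amino acid a and nucleotide n in the contacts.
    """
    total = len(contacts)
    pair_counts = Counter(contacts)
    aa_counts = Counter(a for a, _ in contacts)
    nt_counts = Counter(n for _, n in contacts)
    potentials = {}
    for (a, n), count in pair_counts.items():
        observed = count / total
        expected = (aa_counts[a] / total) * (nt_counts[n] / total)
        potentials[(a, n)] = -math.log(observed / expected)
    return potentials


def classify(value: float, cutoff: float = 0.5) -> str:
    """Cutoff from the authors' response: < -0.5 preferred, > 0.5 avoided."""
    if value < -cutoff:
        return "preferred"
    if value > cutoff:
        return "avoided"
    return "neutral"


# Toy contact list (hypothetical counts, not data from this study).
contacts = [("ARG", "G")] * 9 + [("LYS", "A")] * 9 + [("ARG", "A")] + [("LYS", "G")]
for pair, energy in sorted(pair_potentials(contacts).items()):
    print(pair, round(energy, 2), classify(energy))
```

Pairs falling between the two cutoffs are treated as neutral. With sparse contact data a pseudocount is often added so that unobserved pairs still get a finite value; this sketch simply omits pairs that never occur.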

Reviewer #2: Professor Narayanaswamy Srinivasan

Gromiha et al. have performed a comparative analysis of 3-D structures of homologous proteins bound to RNA. They have analysed the number of RNA binding sites, the amino acid residues involved in RNA recognition, the segments in proteins and RNAs involved in recognizing each other, etc. The most important new feature of this analysis is to view these structural attributes in terms of organisms. This led to the recognition of organism-dependent features in protein-RNA complexes. This is a new and important finding. Though the physicochemical nature of the binding sites determines the specificity and stability of the complexes, the lessons from this manuscript provide a new dimension to protein-RNA recognition based on the type of the organism. I think a round of revision is needed before this work may be published.

1. The most important outcome of this work is the "organism-dependent" features of protein-RNA complexes. This must be ensured by statistical significance tests. I hope the observed frequencies of various features, such as amino acids involved in RNA binding, and the size of the dataset will permit the authors to perform meaningful statistical significance tests. Data presented in most of the Tables and Figures must be subjected to statistical significance tests. In my view this is a crucial addition to be made in the revised version.

Authors’ response: We have performed statistical significance tests for the results presented in Tables and Figures using ANOVA, wherever possible. The p-values are less than 0.05 for most of the data, which validates the results.
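As an illustration of the test mentioned in this response, a one-way ANOVA compares the variance of a feature (for example, a residue's binding propensity) between organism groups with the variance within groups. The pure-Python sketch below computes the F statistic and its degrees of freedom; the grouping and the values are hypothetical, and which data the authors grouped for each test is not specified here. A p-value would then come from the F distribution, e.g. `scipy.stats.f.sf(F, df_between, df_within)`.

```python
from collections.abc import Sequence


def one_way_anova(groups: Sequence[Sequence[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical propensity values for one residue type in three organism groups.
groups = [[1.2, 1.4, 1.3], [0.8, 0.9, 1.0], [1.6, 1.7, 1.5]]
print(one_way_anova(groups))
```

If SciPy is available, `scipy.stats.f_oneway(*groups)` should return the same F statistic together with its p-value, so the hand-written version mainly serves to show what the test compares.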

2. I understood that dataset formation involved groups of protein-RNA complex structures with proteins being homologous. What about RNA sequences in each group? Can the observed differences in preferred amino acids which recognize RNA be explained in terms of base sequence differences in bound RNA?

Authors’ response: We have evaluated the influence of the RNA base sequence on the binding propensity of amino acid residues toward nucleotides. The lengths of the RNA sequences are nearly the same in all the complexes, and the sequence identity ranges from 40 to 100% in most of the considered complexes. We have analyzed the nucleotide sequences at the binding sites in different pairs of protein-RNA complexes and observed that the binding preference is similar for all the nucleotides. Further, the propensities of amino acid residues do not change uniformly with similar changes in nucleotides. These analyses reveal that the influence of the base sequence is not appreciable compared with that of the amino acid sequences of protein-RNA complexes from different organisms. However, a systematic analysis of mutations and molecular dynamics simulations would be needed to study this effect extensively before deriving any conclusions.
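Sequence identity between two RNAs, as quoted in this response, depends on how it is computed. One common definition, assumed here, counts identical bases over the aligned columns in which neither sequence has a gap; the alignment itself would come from a separate tool and is not shown. The function name and sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy aligned fragments (hypothetical, not tRNA sequences from the dataset).
print(percent_identity("GCUA-G", "GCAAUG"))  # 80.0
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values whenever gaps are present.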

3. While the manuscript is well organized, it requires sorting out typos and refinement throughout the manuscript. For example, in the Abstract authors mention "We have found that the mesophilic organisms have more number of binding sites than thermophiles and....". I am sure authors mean proteins of mesophilic and thermophilic organisms not organisms themselves. In another place in the Abstract authors mention "Proteins prefer to bind with RNA using a single residue in.....". It is not clear if authors mean segments with a single residue or single segment.

Authors’ response: The language corrections have been carried out.

Reviewer #3: Dr Gajendra Raghava

In this manuscript, the authors analyzed protein-RNA complexes to understand RNA binding in different organisms. They obtained protein-RNA complexes from different organisms and computed the binding preference of residues in proteins and of nucleotides in RNA. Their observation that different residues are preferred in different organisms, and that nucleotide preferences likewise differ among organisms, is interesting. This reviewer has the following points for the authors.

1. What is the impact of crystallization conditions, particularly temperature, on RNA binding? The authors should examine this issue. The authors should also examine whether the protein-RNA complexes were expressed in their own host organisms.

Authors’ response: We have checked the crystallization conditions and found that more than 90% of the structures in the dataset were determined at the same temperature (100 K). In all cases, the expression organism is E. coli.

2. The deviation in residue preferences among proteins belonging to the same organism, and similarly the variation in nucleotide preferences among RNAs belonging to the same organism, should be examined. Standard deviations in the residue/nucleotide plots may provide this information.

Authors’ response: Deviations are included in all the figures.
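The per-organism deviations discussed in this exchange can be sketched as the mean and sample standard deviation of a residue's binding propensity across the proteins of one organism. The propensity used here, (fraction of that residue type among binding residues) / (fraction among all residues), is a common definition and an assumption; the definition in the paper's Methods may differ. Binding sites are given as hypothetical 0-based positions, and the function names are illustrative only.

```python
from statistics import mean, stdev


def propensity(sequence: str, binding_sites: set[int], residue: str) -> float:
    """(share of `residue` among binding sites) / (share of `residue` in the chain)."""
    if not binding_sites or residue not in sequence:
        return 0.0
    in_sites = sum(1 for i in binding_sites if sequence[i] == residue)
    return (in_sites / len(binding_sites)) / (sequence.count(residue) / len(sequence))


def organism_summary(proteins: list[tuple[str, set[int]]], residue: str) -> tuple[float, float]:
    """Mean and sample standard deviation of the propensity across proteins."""
    values = [propensity(seq, sites, residue) for seq, sites in proteins]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Toy one-letter sequences with binding sites (hypothetical).
proteins = [("RKAR", {0, 3}), ("RAAA", {0})]
print(organism_summary(proteins, "R"))
```

Plotting one standard deviation from such a summary as an error bar shows how much proteins from the same organism disagree, which is the information requested in point 2.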

3. Statistical significance should be calculated to determine whether the preferences really differ.

Authors’ response: We have performed statistical significance tests for the results presented in Tables and Figures using ANOVA, wherever possible. The p-values are less than 0.05 for most of the data.

4. If possible, the authors should provide reasons why binding differs in protein-RNA complexes belonging to different organisms.

Authors’ response: (i) Every stage of RNA metabolism is driven by the binding of RNA binding proteins (RBPs) through RNA binding domains; in general, RBPs have become structurally diverse as genome complexity increased during evolution, and they are recruited at different stages of transcription and translation [44,45]; (ii) horizontal gene transfer [46]; and (iii) within each subfamily, RBPs adopt evolutionarily conserved structures but differ at the sequence level. As discussed in the case study, these differences influence the mode of binding to the tRNA substrate. This could be examined further by detailed analysis of various pairs of protein-RNA complexes.

Abbreviations

RNA:

Ribonucleic acid

PDB:

Protein Data Bank

FRET:

Fluorescence resonance energy transfer

PSSM:

Position-specific scoring matrices

tRNA:

Transfer RNA

AspRS:

Aspartyl tRNA synthetase

tRNAAsp:

Aspartyl tRNA

RMSD:

Root mean square deviation

AMBER:

Assisted Model Building with Energy Refinement

PME:

Particle Mesh Ewald

MM-GB/SA:

Molecular Mechanics-Generalized Born/Surface Area

References

  1. Chen Y, Varani G. Protein families and RNA recognition. FEBS J. 2005;272(9):2088–97.

  2. Tagami S, Sekine S, Kumarevel T, Hino N, Murayama Y, Kamegamori S, et al. Crystal structure of bacterial RNA polymerase bound with a transcription inhibitor protein. Nature. 2010;468(7326):978–82.

  3. Jones S, Daley DT, Luscombe NM, Berman HM, Thornton JM. Protein-RNA interactions: a structural analysis. Nucleic Acids Res. 2001;29:943–54.

  4. Gromiha MM, Yokota K, Fukui K. Understanding the recognition mechanism of protein-RNA complexes using energy based approach. Curr Protein Pept Sci. 2010;11(7):629–38.

  5. Nagarajan R, Gromiha MM. Prediction of RNA binding residues: an extensive analysis based on structure and function to select the best predictor. PLoS One. 2014;9(3):e91140.

  6. Rose PW, Bi C, Bluhm WF, Christie CH, Dimitropoulos D, Dutta S, et al. The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res. 2013;41(Database issue):D475–82.

  7. Shulman-Peleg A, Nussinov R, Wolfson HJ. RsiteDB: a database of protein binding pockets that interact with RNA nucleotide bases. Nucleic Acids Res. 2009;37(Database issue):D369–73.

  8. Lewis BA, Walia RR, Terribilini M, Ferguson J, Zheng C, Honavar V, et al. PRIDB: a Protein-RNA interface database. Nucleic Acids Res. 2011;39(Database issue):D277–82.

  9. Bahadur RP, Zacharias M, Janin J. Dissecting protein-RNA recognition sites. Nucleic Acids Res. 2008;36:2705–16.

  10. Borozan SZ, Dimitrijević BP, Stojanović SĐ. Cation-π interactions in high resolution protein-RNA complex crystal structures. Comput Biol Chem. 2013;47:105–12.

  11. Pietal MJ, Szostak N, Rother KM, Bujnicki JM. RNAmap2D - calculation, visualization and analysis of contact and distance maps for RNA and protein-RNA complex structures. BMC Bioinformatics. 2012;13:333.

  12. Fornes O, Garcia-Garcia J, Bonet J, Oliva B. On the Use of knowledge-based potentials for the evaluation of models of protein-protein, protein-DNA, and protein-RNA interactions. Adv Protein Chem Struct Biol. 2014;94:77–120.

  13. Kumar M, Gromiha MM, Raghava GP. Prediction of RNA binding sites in a protein using SVM and PSSM profile. Proteins. 2008;71:189–94.

  14. Wang L, Huang C, Yang MQ, Yang JY. BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features. BMC Syst Biol. 2010;4:S3.

  15. Wang Y, Chen X, Liu ZP, Huang Q, Wang Y, Xu D, et al. De novo prediction of RNA-protein interactions from sequence information. Mol Biosyst. 2013;9:133–42.

  16. Walia RR, Caragea C, Lewis BA, Towfic F, Terribilini M, El-Manzalawy Y, et al. Protein-RNA Interface Residue Prediction using Machine Learning: An Assessment of the State of the Art. BMC Bioinformatics. 2012;13:89.

  17. Puton T, Kozlowski L, Tuszynska I, Rother K, Bujnicki JM. Computational methods for prediction of protein-RNA interactions. J Struct Biol. 2012;179:261–8.

  18. Ahmad S, Gromiha MM, Sarai A. Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information. Bioinformatics. 2004;20:477–86.

  19. Nagarajan R, Ahmad S, Gromiha MM. Novel approach for selecting the best predictor for identifying the binding sites in DNA binding proteins. Nucleic Acids Res. 2013;41:7606–14.

  20. Tjong H, Zhou H-X. DISPLAR: An accurate method for predicting DNA-binding sites on protein surfaces. Nucleic Acids Res. 2007;35:1465–77.

  21. Gromiha MM, Selvaraj S, Jayaram B, Fukui K. Identification and analysis of binding site residues in protein complexes: Energy based approach. Lect Notes Comput Sci. 2010;6215:626–33.

  22. Gromiha MM, Yokota K, Fukui K. Energy based approach for understanding the recognition mechanism in protein-protein complexes. Mol Biosyst. 2009;5:1779–86.

  23. Gromiha MM, Fukui K. Scoring function based approach for locating binding sites and understanding the recognition mechanism of protein-DNA complexes. J Chem Inf Model. 2011;51(3):721–9.

  24. Glaser F, Pupko T, Paz I, Bell RE, Bechor D, Martz E, et al. ConSurf: identification of functional regions in proteins by surface mapping of phylogenetic information. Bioinformatics. 2003;19:163–4.

  25. Thangakani AM, Kumar S, Nagarajan R, Velmurugan D, Gromiha MM. GAP: towards almost 100 percent prediction for β-strand-mediated aggregating peptides with distinct morphologies. Bioinformatics. 2014;30(14):1983–90.

  26. Gromiha MM, Saranya N, Selvaraj S, Jayaram B, Fukui K. Sequence and structural features of binding site residues in protein-protein complexes: comparison with protein-nucleic acid complexes. Proteome Sci. 2011;9 Suppl 1:S13.

  27. Moulinier L, Eiler S, Eriani G, Gangloff J, Thierry JC, Gabriel K, et al. The structure of an AspRS-tRNA (Asp) complex reveals a tRNA-dependent control mechanism. EMBO J. 2001;20:5290–301.

  28. Briand C, Poterszman A, Eiler S, Webster G, Thierry J, Moras D. An intermediate step in the recognition of tRNA(Asp) by aspartyl-tRNA synthetase. J Mol Biol. 2000;299:1051–60.

  29. Ruff M, Krishnaswamy S, Boeglin M, Poterszman A, Mitschler A, Podjarny A, et al. Class II aminoacyl transfer RNA synthetases: crystal structure of yeast aspartyl-tRNA synthetase complexed with tRNA(Asp). Science. 1991;252:1682–9.

  30. Ponder JW, Case DA. Force fields for protein simulations. Adv Protein Chem. 2003;66:27–85.

  31. Duan Y, Wu C, Chowdhury S, Lee MC, Xiong G, Zhang W, et al. A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. J Comput Chem. 2003;24:1999–2012.

  32. Pearlman DA, Case DA, Caldwell JW, Ross WS, Cheatham TE III, DeBolt S, et al. AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comput Phys Commun. 1995;91:1–41.

  33. Aduri R, Psciuk BT, Saro P, Taniga H, Schlegel HB, SantaLucia J. AMBER force field parameters for the naturally occurring modified nucleosides in RNA. J Chem Theor Comput. 2007;3:1464–75.

  34. Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A, Haak JR. Molecular dynamics with coupling to an external bath. J Chem Phys. 1984;81:3684–90.

  35. Ryckaert J-P, Ciccotti G, Berendsen HJC. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J Comput Phys. 1977;23:327–41.

  36. Darden T, York D, Pedersen L. Particle mesh Ewald: An N.log(N) method for Ewald sums in large systems. J Chem Phys. 1993;98:10089–92.

  37. Wang J, Hou T, Xu X. Recent advances in free energy calculations with a combination of molecular mechanics and continuum models. Curr Comput Aided Drug Des. 2006;2:287–306.

  38. Wang W, Donini O, Reyes CM, Kollman PA. BIOMOLECULAR SIMULATIONS: recent developments in force fields, simulations of enzyme catalysis, protein-ligand, protein-protein, and protein-nucleic acid noncovalent interactions. Annu Rev Biophys Biomol Struct. 2001;30:211–43.

  39. Kollman PA, Massova I, Reyes C, Kuhn B, Huo S, Chong L, et al. Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc Chem Res. 2000;33:889–97.

  40. Miller BR, McGee TD, Swails JM, Homeyer N, Gohlke H, Roitberg AE. MMPBSA.py: an efficient program for End-state free energy calculations. J Chem Theor Comput. 2012;8:3314–21.

  41. Gromiha MM, Suresh MX. Discrimination of mesophilic and thermophilic proteins using machine learning algorithms. Proteins. 2008;70:1274–9.

  42. Gardner PP, Eldai H. Annotating RNA motifs in sequences and alignments. Nucleic Acids Res. 2015;43:691–8.

  43. Magyar C, Gromiha MM, Pujadas G, Tusnády GE, Simon I. SRide: a server for identifying stabilizing residues in proteins. Nucleic Acids Res. 2005;33(Web Server issue):W303–5.

  44. Hogan DJ, Riordan DP, Gerber AP, Herschlag D, Brown PO. Diverse RNA-binding proteins interact with functionally related sets of RNAs, suggesting an extensive regulatory system. PLoS Biol. 2008;6:e255.

  45. Glisovic T, Bachorik JL, Yong J, Dreyfuss G. RNA-binding proteins and post-transcriptional gene regulation. FEBS Lett. 2008;582:1977–86.

  46. Woese CR, Olsen GJ, Ibba M, Söll D. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol Mol Biol Rev. 2000;64(1):202–36.

Acknowledgements

We thank the reviewers for their helpful comments, suggestions and insights that have helped us to improve our manuscript. MMG wishes to thank Tokyo Institute of Technology, Japan for providing Visiting Professorship. MMG, SPC, CR and RN thank Bioinformatics Facility, High Performance Computing Facility and Indian Institute of Technology Madras for computational facilities. RN thanks Department of Biotechnology (DBT), Govt. of India for the award of Bioinformatics National Certification (BINC) fellowship. This research was partially supported by Department of Science and Technology, Government of India (MMG; No: SR/SO/BB-0036/2011).

Author information

Corresponding authors

Correspondence to Masakazu Sekijima or M Michael Gromiha.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MMG and MS designed the study. RN, SPC and CR carried out the calculations. MMG, RN, SPC, CR and MS analysed the data. MMG drafted the manuscript, and all authors read and approved the manuscript.

Authors’ information

MMG: Ph.D., Leader, Protein Bioinformatics Lab, Indian Institute of Technology Madras, India.

MS: Ph.D., Associate professor, Global scientific information and computing center, Tokyo Institute of Technology, Japan.

CR: Ph.D., Research Associate, Indian Institute of Technology Madras, India.

RN: M.Sc., PhD student, Indian Institute of Technology Madras, India.

SPC: M.S., Associate Research Engineer, Philips Research, USA.

Additional file

Additional file 1: Table S1.

Amino acid-nucleotide pair potentials in different organisms.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Nagarajan, R., Chothani, S.P., Ramakrishnan, C. et al. Structure based approach for understanding organism specific recognition of protein-RNA complexes. Biol Direct 10, 8 (2015). https://doi.org/10.1186/s13062-015-0039-8

  • DOI: https://doi.org/10.1186/s13062-015-0039-8

Keywords