SROOGLE

Help page

  1. SROOGLE in a nutshell
  2. Input form
  3. Annotation of 5’ss
  4. Annotation of SRSs
  5. Percentile scores
  6. Neighborhood Inference (NI) scores
  7. Displaying Matches for Mutations
  8. Mutability Index
  9. First Time Use Troubleshooting
  10. References

SROOGLE in a nutshell

Exons are typically around 140 nucleotides long and are surrounded by vast intronic oceans that are thousands of nucleotides long. Four splicing signals direct the splicing machinery to the exon/intron junctions: the 5’ and 3’ splice sites (5’ss and 3’ss), the polypyrimidine tract and the branch site (BS). In addition, many sequences known as splicing regulatory sequences (SRSs), which boost or repress the recognition of exons, have been identified in recent years. Various algorithms have been developed to identify and score the four splicing signals, and thousands of sequences have been identified as putative SRSs. SROOGLE’s goal is to make all of these data available to biologists in an integrative, easily interpretable and user-friendly manner.
SROOGLE combines the following features:

Availability of data: SROOGLE gives biologists access to large sets of published data that are not available on any other public server.
Integration of data: Most existing web servers present data for only a few types of SRSs, or for only a few splicing signals, so biologists cannot obtain at a glance an integrative overview of the signals characterizing their exons of interest.
Intuitive statistical measures: Many algorithms produce output that is not directly interpretable (e.g. delta-G scores, PSSM log-odds scores). Whenever possible, we provide percentile scores, indicating the strength of a signal relative to two large pre-compiled pools of alternative and constitutive exons.
User friendliness: Much emphasis was placed on an intuitive, interactive graphical user interface and on dynamic JavaScript programming, enabling users to modify their input interactively.

Input form

Upon entering the website, users are asked to enter their exon along with its two flanking introns. The server accepts either a consecutive stretch of DNA or stretches of DNA separated by spaces and numbers, as copied from the UCSC web browser. Users may also use our sample exon and introns by choosing the relevant link.
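The input cleanup described above can be sketched as follows. This is a minimal illustration that assumes every digit and whitespace character is simply discarded and that only A, C, G, T and N are valid bases. The function name `clean_dna` and the example text are hypothetical; this is not SROOGLE's actual parser.

```python
import re


def clean_dna(raw: str) -> str:
    """Hypothetical helper: strip spaces and position numbers from pasted DNA."""
    # Drop every run of digits and whitespace (spaces, tabs, newlines), then upper-case.
    seq = re.sub(r"[\d\s]+", "", raw).upper()
    # Assumption: only nucleotide codes are allowed (N for unknown bases).
    invalid = set(seq) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters: {''.join(sorted(invalid))}")
    return seq


# Hypothetical text in the "numbers and spaces" layout.
pasted = """
      1 gtaagtatgc tttccctgca
     21 ggtacgtag
"""
print(clean_dna(pasted))
```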

Annotation of 5’ss

Each of the four main splicing signals is marked above the sequence by a dashed line of a distinct color, and setting the cursor over a dashed line displays additional information about that splicing signal. This information, along with further details, is summarized in a table beneath the browser, on the left. SROOGLE identifies and scores the four splicing signals according to 9 different algorithms. These include:
In addition to the direct scores yielded by the different algorithms, SROOGLE also presents percentile scores (see below).
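As a generic illustration of the kind of raw score such algorithms produce (for example the PSSM log-odds scores mentioned earlier), the sketch below builds a position-specific scoring matrix from aligned sites and scores a candidate sequence. It is not one of SROOGLE's nine algorithms: the toy 9-mer sites, the pseudocount of 1 and the uniform 0.25 background are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds (base 2) position-specific scoring matrix from aligned sites."""
    pssm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # log2(observed frequency / background frequency) for each base.
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm


def score(pssm, candidate):
    """Sum of per-position log-odds scores for a candidate of matching length."""
    return sum(col[base] for col, base in zip(pssm, candidate))


# Hypothetical toy 5'ss-like 9-mers; not real training data.
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGG"]
pssm = build_pssm(toy_sites)
print(score(pssm, "CAGGTAAGT"), score(pssm, "ACTTCGCAC"))
```

A positive score means the candidate resembles the training sites more than background sequence; on their own such numbers are hard to interpret, which is the motivation for the percentile scores.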

Annotation of SRSs

SROOGLE identifies and visualizes specific splicing regulatory sequences (SRSs) based on 13 different datasets. These sequences are displayed in the browser beneath the sequence. Setting the cursor on an SRS shows its sequence, the group it belongs to and, if available, a score between 0 and 1 indicating its ranking among all the sequences identified in that dataset. In other words, a score of 1 indicates that the sequence was the highest-scoring sequence in its study, and a score of 0 indicates that it was the lowest-scoring one. This ranking is based on the P-values and Z-scores reported by the individual studies, where available.
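The 0 to 1 ranking can be sketched as a linear rescaling of rank. The exact formula applied to each dataset is not stated, so this sketch assumes that the lowest P-value maps to 1, the highest to 0, and that ties are ignored. The function name and the P-values are hypothetical.

```python
def rank_scores(pvalues):
    """Hypothetical helper: map each SRS to a 0-1 score by rank (lowest P-value -> 1)."""
    n = len(pvalues)
    if n == 1:
        return {seq: 1.0 for seq in pvalues}
    # Order from most to least significant (ascending P-value).
    ordered = sorted(pvalues, key=pvalues.get)
    return {seq: 1 - i / (n - 1) for i, seq in enumerate(ordered)}


# Hypothetical SRS hexamers with made-up P-values, for illustration only.
example = {"GAAGAA": 1e-6, "TCTCCT": 1e-3, "ACGACG": 0.04}
print(rank_scores(example))
```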

Each SRS dataset was classified as an enhancer (marked by an E in the table below the browser, on the right, and shown in red), a silencer (S, shown in green) or a regulator (R, shown in gray), based on how it is predicted to affect exon selection. In addition, some SRS datasets were originally identified only within exons, only within the upstream intron, or only within the downstream intron. Therefore, by default the browser displays each SRS only in the relevant segments; users can change this interactively.

For each group of SRSs, a density value is calculated, defined as the number of nucleotides within the exon that are covered by SRSs of that group. Percentile scores are provided for this value as well (see below).
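A minimal sketch of the density calculation, assuming each SRS group is a set of exact sequence strings, that overlapping matches are all found, and that a match straddling an exon/intron boundary contributes only its exonic nucleotides. Coordinates are 0-based and half-open; the function names and the example sequence are hypothetical.

```python
def find_matches(seq, motifs):
    """Return (start, end) half-open intervals of every motif occurrence, overlaps included."""
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, start + len(motif)))
            # Advance by one so overlapping occurrences are also found.
            start = seq.find(motif, start + 1)
    return sorted(hits)


def srs_density(seq, exon_start, exon_end, motifs):
    """Number of exonic nucleotides covered by at least one motif of the group."""
    covered = set()
    for start, end in find_matches(seq, motifs):
        # Clip each match to the exon (assumption for boundary-spanning matches).
        covered.update(range(max(start, exon_start), min(end, exon_end)))
    return len(covered)


seq = "TTTTGAAGAAGAAGTTTT"  # hypothetical: 4-nt intron, 10-nt exon, 4-nt intron
print(srs_density(seq, 4, 14, {"GAAGAA"}))
```

Positions covered by several SRSs are counted once, since the density counts covered nucleotides rather than matches.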

Percentile scores

To obtain these scores, two datasets, of >50,000 constitutive exons and >3,000 alternative exons, were compiled along with their flanking introns. In each dataset, the splicing signals were detected and scored with the different algorithms, and SRS densities were calculated as well. Based on the distribution of each of these values within each of the two datasets, percentile scores are calculated and presented, indicating where a given score ranks within these two pre-calculated distributions. Thus, a value of 0.95 indicates that 95% of the exons have lower scores and only 5% have higher ones.
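The percentile calculation can be sketched as the fraction of a reference pool scoring strictly below the query value. The two pools below are small hypothetical stand-ins for the pre-compiled constitutive and alternative distributions, and counting ties as "not lower" is an assumption.

```python
from bisect import bisect_left


def percentile_score(value, reference):
    """Fraction of reference exons whose score is strictly lower than value."""
    ordered = sorted(reference)  # in practice each pool would be sorted once, up front
    return bisect_left(ordered, value) / len(ordered)


# Hypothetical stand-ins for the real pre-compiled score distributions.
constitutive = [float(x) for x in range(1, 101)]
alternative = [float(x) for x in range(50, 150)]
for name, pool in [("constitutive", constitutive), ("alternative", alternative)]:
    print(name, percentile_score(95.5, pool))
```

Reporting the same raw score against both pools shows how typical it is for constitutive versus alternative exons.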

Neighborhood Inference (NI) scores

SROOGLE visualizes Neighborhood Inference (NI) scores, based on Stadler et al. (2006). A positive score indicates that the hexamer beginning at a given position resembles exonic splicing enhancers in sequence and is therefore predicted to act as one as well, whereas a negative score indicates that the hexamer at that position resembles splicing silencers.
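To make "resemblance in sequence" concrete, the sketch below scores a hexamer by counting how many of its 18 single-nucleotide neighbors belong to an enhancer set versus a silencer set. This is a deliberately simplified stand-in, not the published NI algorithm of Stadler et al. (2006); the enhancer and silencer sets and all function names are hypothetical.

```python
def neighbors(hexamer):
    """All sequences differing from hexamer at exactly one position (6 x 3 = 18)."""
    out = []
    for i, base in enumerate(hexamer):
        for alt in "ACGT":
            if alt != base:
                out.append(hexamer[:i] + alt + hexamer[i + 1:])
    return out


def toy_neighborhood_score(hexamer, enhancers, silencers):
    """Simplified stand-in: (enhancer neighbors - silencer neighbors) / neighbors."""
    nbrs = neighbors(hexamer)
    ese = sum(n in enhancers for n in nbrs)
    ess = sum(n in silencers for n in nbrs)
    return (ese - ess) / len(nbrs)


def position_scores(seq, enhancers, silencers):
    """Score the hexamer beginning at each position of seq."""
    return [toy_neighborhood_score(seq[i:i + 6], enhancers, silencers)
            for i in range(len(seq) - 5)]


enh = {"GAAGAA", "GAAGAT"}  # hypothetical enhancer hexamers
sil = {"TAGGTA"}            # hypothetical silencer hexamer
print(position_scores("GAAGACT", enh, sil))
```

`position_scores` returns one value per position, with positive values for enhancer-like and negative values for silencer-like neighborhoods.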

Displaying Matches for Mutations

This feature shows the SRSs that would overlap a given position if that position were mutated, and aims to guide biologists on the splicing-related consequences of introducing a point mutation there. Check the “Display Matches For Mutations” box and then point the cursor at a position to examine the SRSs that would overlap it if it were mutated.
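A minimal sketch of what this feature computes, assuming SRSs are exact sequence strings: each of the three alternative bases is substituted at the chosen position, and the SRS occurrences covering that position in the mutated sequence are listed. Positions are 0-based; the function names and the motif are hypothetical.

```python
def overlapping_motifs(seq, pos, motifs):
    """Motif occurrences (motif, start) in seq that cover position pos."""
    hits = []
    for motif in motifs:
        k = len(motif)
        # Only windows starting at most k-1 bases upstream of pos can cover it.
        for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
            if seq[start:start + k] == motif:
                hits.append((motif, start))
    return sorted(hits, key=lambda h: (h[1], h[0]))


def matches_if_mutated(seq, pos, motifs):
    """For each alternative base at pos, list the motifs that would overlap it."""
    result = {}
    for alt in "ACGT":
        if alt != seq[pos]:
            mutated = seq[:pos] + alt + seq[pos + 1:]
            result[alt] = overlapping_motifs(mutated, pos, motifs)
    return result


# Hypothetical example: a T->A change at position 6 creates a GAAGAA match.
print(matches_if_mutated("TTGAAGTATT", 6, {"GAAGAA"}))
```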

Mutability Index

This novel index aims to provide an overview of the extent to which a given nucleotide is involved in splicing regulation. It is calculated as (sum_nonmut - sum_mut) / (sum_nonmut + sum_mut), where sum_nonmut is the number of SRSs overlapping a given nucleotide, and sum_mut is the average number of SRSs overlapping that nucleotide when it is mutated to each of the three other possible nucleotides. Thus, high values of this index indicate that mutating the nucleotide tends to disrupt SRSs, whereas low values indicate that SRSs would overlap the position regardless of which nucleotide occupies it (due to the content of the neighboring positions).
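The index can be sketched directly from the formula above. This assumes SRSs are exact sequence strings, that each overlapping occurrence counts once, and that the index is set to 0 when no SRS overlaps the position in any version (the formula is otherwise undefined). Positions are 0-based; the function names and motifs are hypothetical.

```python
def count_overlapping(seq, pos, motifs):
    """Number of motif occurrences in seq that cover position pos."""
    total = 0
    for motif in motifs:
        k = len(motif)
        for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
            if seq[start:start + k] == motif:
                total += 1
    return total


def mutability_index(seq, pos, motifs):
    """(sum_nonmut - sum_mut) / (sum_nonmut + sum_mut) at position pos."""
    sum_nonmut = count_overlapping(seq, pos, motifs)
    alts = [b for b in "ACGT" if b != seq[pos]]
    # Average over the three possible point mutations at pos.
    sum_mut = sum(count_overlapping(seq[:pos] + b + seq[pos + 1:], pos, motifs)
                  for b in alts) / len(alts)
    denom = sum_nonmut + sum_mut
    # Assumption: report 0 when no SRS overlaps pos in any version.
    return 0.0 if denom == 0 else (sum_nonmut - sum_mut) / denom


motif = {"GAAGAA"}
print(mutability_index("TTGAAGAATT", 6, motif))  # every mutation destroys the SRS
print(mutability_index("TTGAAGTATT", 6, motif))  # only a mutation creates the SRS
```

Because both sums are non-negative, the index lies between -1 and 1; negative values arise when mutations create more SRS matches than they destroy.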

First Time Use Troubleshooting

To view the graphical interface, the Microsoft Silverlight browser plug-in must be installed. If it is not installed, a download link is provided automatically. Download and run the installer, then close the browser and reopen it.
 

References

  1. Kol, G., G. Lev-Maor, and G. Ast. 2005. Human-mouse comparative analysis reveals that branch-site plasticity contributes to splicing regulation. Hum Mol Genet 14: 1559-1568.
  2. Schwartz, S.H., J. Silva, D. Burstein, T. Pupko, E. Eyras, and G. Ast. 2008. Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes. Genome Res 18: 88-103.
  3. Shapiro, M.B. and P. Senapathy. 1987. RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res 15: 7155-7174.
  4. Stadler, M.B., N. Shomron, G.W. Yeo, A. Schneider, X. Xiao, and C.B. Burge. 2006. Inference of splicing regulatory activities by sequence neighborhood analysis. PLoS Genet 2: e191.
  5. Yeo, G. and C.B. Burge. 2004. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 11: 377-394.