- SROOGLE in a nutshell
- Input form
- Annotation of 5’ss
- Annotation of SRSs
- Percentile scores
- Neighborhood Inference (NI) scores
- Displaying Matches for Mutations
- Mutability Index
- First Time Use Troubleshooting
SROOGLE in a nutshell
Exons
are typically around 140 nucleotides in length, surrounded by vast
intronic oceans that are thousands of nucleotides long. Four splice
signals direct the splicing machinery to the exon/intron junctions,
termed the 5’ and 3’ splice sites (5’ss and 3’ss), the polypyrimidine
tract and the branch site (BS). In addition, many splicing regulatory
sequences (SRSs), which boost or repress the recognition of exons, have
been identified over the past years. Different
algorithms have been developed to identify and score the four splicing
signals, and thousands of sequences have been identified as putative
SRSs. SROOGLE’s goal is to make all this data available to the
biologist, in an integrative, easily interpretable, and user-friendly
manner.
SROOGLE combines the following features:
Availability of data: SROOGLE gives biologists access to large sets of published data that are not available on any other public server.
Integration of data:
The majority of existing web servers present data for only a few
particular types of SRSs, or for only a few splicing signals, and do not
allow biologists to obtain, at a glance, an integrative overview
of the signals characterizing their exons of interest.
Intuitive statistical measures:
Many algorithms provide output that is not directly interpretable
(e.g. delta-G scores, PSSM log-odds scores). Whenever possible, we have
provided percentile scores, indicating the strength of a signal with
respect to two large pre-compiled pools of alternative and constitutive
exons.
User-friendly interface:
Much emphasis was laid on an intuitive, interactive, graphical user
interface and on dynamic JavaScript programming, enabling users to
interactively modify their input.
Input form
Upon entering the website, users are requested to enter their exon along
with the two introns flanking it. The server will accept either
consecutive stretches of DNA, or stretches of DNA separated by spaces
and numbers, as obtained from the UCSC Genome Browser. Users may also use
our sample exons and introns, by choosing the relevant link.
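The two accepted input styles can be reduced to one plain sequence with a small cleaning step. The following is a minimal sketch of such a step, not SROOGLE's actual code; the function name `clean_sequence` and the decision to allow `N` are assumptions made for illustration.

```python
import re


def clean_sequence(raw: str) -> str:
    """Strip whitespace and digits from a pasted DNA stretch and upper-case it."""
    # Remove spaces, line breaks and position numbers, as found in
    # sequence listings copied from a genome browser.
    seq = re.sub(r"[\s\d]+", "", raw).upper()
    # Assumption: only A, C, G, T (and N for unknown bases) are valid input.
    bad = set(seq) - set("ACGTN")
    if bad:
        raise ValueError(f"unexpected characters: {sorted(bad)}")
    return seq


print(clean_sequence("1 gtaagtctgc 11 ttcagGTACC"))
```

Lower-case and upper-case letters are merged here; a real implementation might instead keep the case to distinguish introns from exons.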
Annotation of 5’ss
SROOGLE identifies and scores the four main splicing signals based on 9
different algorithms. Each of these signals is marked above the sequence
by a dashed line in a different color, and placing the cursor over a
dashed line displays additional information regarding the splicing
signal. This and additional information is summarized in a table beneath
the browser, on the left. In addition to the direct scores yielded by
the different algorithms, SROOGLE also presents percentile scores (see
below). The algorithms are:
- Branch site and polypyrimidine tract:
For both these signals, the algorithms developed by (Kol, Lev-Maor and
Ast 2005) and (Schwartz, Silva, Burstein, Pupko, Eyras and Ast 2008)
are implemented and visualized.
- 3’ and 5’ splice sites:
For both these signals we implement two methods: a maximum entropy
based scoring method developed by (Yeo and Burge 2004) and a
position-specific scoring matrix (PSSM) method as described in (Shapiro
and Senapathy 1987). For the 5’ss we implement an additional method,
based on calculation of the free energy (delta-G) in the binding
between U1 snRNA and a given 5’ss.
Annotation of SRSs
SROOGLE identifies and visualizes specific splicing regulatory sequences
(SRSs) based on 13 different datasets. These sequences are visualized in
the browser beneath the sequence, and placing the cursor on them will
provide information regarding the SRS sequence, the group it pertains
to, and, if available, a score between 0 and 1 indicating the ranking
of a given sequence among the sequences identified in a given dataset.
In other words, a score of 1 indicates that a sequence was the
highest-scoring sequence in a study, whereas a score of 0 indicates that
it was the lowest-scoring sequence. This ranking was performed based on
the P-values and Z-scores reported by the different studies, where
available.
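The 0-to-1 ranking described above can be illustrated with a simple rank scaling. This is a sketch under assumptions: the text does not say how the ranks are scaled or how ties are handled, so linear scaling of the rank is assumed here, and the function name and toy Z-scores are hypothetical.

```python
def rank_scores(raw_scores: dict[str, float]) -> dict[str, float]:
    """Map each sequence to its rank scaled to [0, 1] (0 = lowest, 1 = highest)."""
    # Order sequences from lowest to highest raw score (e.g. a Z-score).
    ordered = sorted(raw_scores, key=raw_scores.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 1.0}
    # Assumption: ranks are spread evenly between 0 and 1.
    return {seq: i / (n - 1) for i, seq in enumerate(ordered)}


toy_z_scores = {"GAAGAA": 6.2, "TTCGAT": 1.5, "CAGCAG": 3.1}  # made-up values
print(rank_scores(toy_z_scores))
```

For studies that report P-values, where smaller means stronger, the values would need to be negated (or the order reversed) before ranking.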
Each SRS dataset was classified as an enhancer (marked by an E in the
table below the browser, to the right, and visualized in red), a silencer (S, visualized in green) or a regulator (R, visualized in gray),
based on how it is predicted to affect exon selection. In addition,
some datasets of SRSs were originally identified only within exons, or
only within the upstream intron, or only within the downstream one. Thus,
the default view of the browser is to visualize each SRS only in the
relevant segments. However, the user can interactively modify this.
For each group of SRSs, density values are calculated, defined as the
number of nucleotides within the exon covered by that group of SRSs.
Percentile scores for this value are provided as well (see below).
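The density definition above (number of exonic nucleotides covered by a group of SRSs) can be sketched as follows. The function name `srs_density` and the toy exon and motif are hypothetical, and matching by exact substring search is an assumption about how SRS occurrences are located.

```python
def srs_density(exon: str, motifs: set[str]) -> int:
    """Count exon nucleotides covered by at least one motif occurrence."""
    covered = [False] * len(exon)
    for motif in motifs:
        # Find every occurrence, including overlapping ones.
        start = exon.find(motif)
        while start != -1:
            for i in range(start, start + len(motif)):
                covered[i] = True
            start = exon.find(motif, start + 1)
    # A nucleotide covered by several occurrences is counted once.
    return sum(covered)


# Toy example: "GAAGAA" occurs twice, overlapping, in this made-up exon.
print(srs_density("TGAAGAAGAACT", {"GAAGAA"}))
```

Because overlapping hits are merged before counting, the value never exceeds the exon length.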
Percentile scores
To obtain these scores, two datasets of >50,000 constitutive exons and
>3,000 alternative exons were compiled, along with their flanking
introns. In each of these datasets, the splicing signals were detected
and scored based on the different algorithms, and SRS densities for
each dataset were calculated as well. Based on the distribution of
values for each of these signals within each of these two datasets,
percentile scores are calculated and presented, indicating the ranking
of a given score within these two pre-calculated distributions. Thus, a
value of 0.95 indicates that 95% of the exons have lower scores, and
only 5% have higher ones.
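As an illustration of the percentile calculation described above, the following sketch looks up a score in a pre-sorted pool of scores. The pool values are made up and the function name is hypothetical; counting only strictly lower scores is an assumption, since the text does not say how ties are treated.

```python
from bisect import bisect_left


def percentile_score(value: float, sorted_pool: list[float]) -> float:
    """Return the fraction of pool scores that are strictly lower than value."""
    if not sorted_pool:
        raise ValueError("pool is empty")
    # bisect_left gives the number of pool entries smaller than value.
    return bisect_left(sorted_pool, value) / len(sorted_pool)


# Toy stand-in for a pre-calculated distribution of splice-site scores.
constitutive_pool = sorted([2.0, 4.0, 5.5, 7.0, 9.0])
print(percentile_score(8.0, constitutive_pool))
```

In practice the same lookup would be run once against the constitutive pool and once against the alternative pool.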
Neighborhood Inference (NI) scores
SROOGLE
visualizes Neighborhood Inference (NI) scores, based on (Stadler,
Shomron, Yeo, Schneider, Xiao and Burge 2006). Positive scores indicate
that a hexamer beginning at a given position resembles exonic splicing
enhancers in terms of sequence, and is therefore predicted to be one as
well, whereas negative scores indicate that a given position resembles
exonic splicing silencers.
Displaying Matches for Mutations
This
feature presents the SRSs overlapping a given position under the
supposition that this position were mutated, and aims to guide
biologists on the splicing related consequences of inducing a point
mutation at a given position. Check the “Display Matches For Mutations”
box, and then point the cursor at a given position, to examine the SRSs
overlapping that position if it were mutated.
Mutability Index
This novel index aims to provide an overview of the extent to which a
given nucleotide is involved in splicing regulation. This index is
calculated as (sum_nonmut - sum_mut) / (sum_nonmut + sum_mut), where
sum_nonmut is the number of SRSs overlapping a given nucleotide, and
sum_mut is the average number of SRSs overlapping the given nucleotide
when it is mutated to each of the three other possible nucleotides.
Thus, high values of this index indicate that mutating a given
nucleotide tends to disrupt SRSs, whereas low values indicate that no
matter the specific nucleotide at a given position, SRSs would overlap
it (due to the content of the neighboring positions).
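A minimal sketch of this calculation is given below. The function names, the toy sequence and the motif set are hypothetical; exact substring matching is assumed, and returning 0 when no SRS overlaps the position in any variant is an assumption, since the formula is undefined in that case.

```python
def count_overlapping(seq: str, pos: int, motifs: set[str]) -> int:
    """Count motif occurrences in seq that cover index pos."""
    count = 0
    for motif in motifs:
        k = len(motif)
        # Every window of length k that contains pos starts in [pos - k + 1, pos].
        for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
            if seq[start:start + k] == motif:
                count += 1
    return count


def mutability_index(seq: str, pos: int, motifs: set[str]) -> float:
    """Compute (sum_nonmut - sum_mut) / (sum_nonmut + sum_mut) for one position."""
    sum_nonmut = count_overlapping(seq, pos, motifs)
    # Average the overlap count over the other possible nucleotides.
    alternatives = [b for b in "ACGT" if b != seq[pos]]
    sum_mut = sum(
        count_overlapping(seq[:pos] + b + seq[pos + 1:], pos, motifs)
        for b in alternatives
    ) / len(alternatives)
    total = sum_nonmut + sum_mut
    # Assumption: the index is reported as 0 when nothing overlaps the position.
    return 0.0 if total == 0 else (sum_nonmut - sum_mut) / total


# Position 3 lies inside the only "GAAGAA" hit; any mutation destroys it.
print(mutability_index("TGAAGAACT", 3, {"GAAGAA"}))
```

With a second motif such as "GACGAA" in the set, one of the three mutations would create a new overlapping SRS, and the index for the same position would drop to about 0.5.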
First Time Use Troubleshooting
To
view the graphical interface, Microsoft Silverlight (a browser plug-in)
must be installed. A link for this program is automatically provided if
this application is not installed. The application must be downloaded
and run, following which the browser must be closed, and reloaded.
References
Kol, G., G. Lev-Maor, and G. Ast. 2005. Human-mouse comparative analysis
reveals that branch-site plasticity contributes to splicing regulation.
Hum Mol Genet 14: 1559-1568.
Schwartz, S.H., J. Silva, D. Burstein, T. Pupko, E. Eyras, and G. Ast. 2008.
Large-scale comparative analysis of splicing signals and their
corresponding splicing factors in eukaryotes. Genome Res 18: 88-103.
Shapiro, M.B. and P. Senapathy. 1987. RNA splice junctions of different classes
of eukaryotes: sequence statistics and functional implications in gene
expression. Nucleic Acids Res 15: 7155-7174.
Stadler, M.B., N. Shomron, G.W. Yeo, A. Schneider, X. Xiao, and C.B. Burge.
2006. Inference of splicing regulatory activities by sequence
neighborhood analysis. PLoS Genet 2: e191.
Yeo, G. and C.B. Burge. 2004. Maximum entropy modeling of short sequence
motifs with applications to RNA splicing signals. J Comput Biol 11: