SAPS evaluates a wide variety of protein sequence properties
by statistical criteria. Properties considered include
compositional biases; clusters and runs of charge and other
amino acid types; different kinds and extents of repetitive
structures; locally periodic motifs; and anomalous spacings
between identical residue types. The statistics are computed
for any single (or appropriately concatenated) protein
sequence input. Statistically significant sequence features
highlighted by SAPS in the input sequence may suggest
promising regions for experimental investigation. The program
also finds application in the description of conserved
features of families of proteins as well as in the inverse
problem of deriving protein groupings based upon sequence
features.
Short sequences are subject to larger statistical fluctuations
than longer sequences. The statistical evaluations of
SAPS are reliable only for sequences of at least about 200
residues. Shorter sequences may in some cases be appropriately
concatenated and analyzed as a representative combined
sequence (e.g., histones, or Ras family proteins).
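The length guideline above can be checked before a sequence is submitted. The following is a minimal sketch, not part of SAPS: the function names and the example fragment are hypothetical, and the threshold of 200 is taken from the approximate figure quoted in the text.

```python
# Minimal sketch (not part of SAPS): combine related short sequences and
# check the result against the approximate 200-residue guideline.
MIN_RELIABLE_LENGTH = 200  # approximate threshold quoted in the text


def concatenate_sequences(sequences):
    """Join sequences into one string, dropping whitespace and uppercasing."""
    return "".join("".join(seq.split()).upper() for seq in sequences)


def is_long_enough(sequence, minimum=MIN_RELIABLE_LENGTH):
    """Return True if the sequence meets the length guideline."""
    return len(sequence) >= minimum


# Hypothetical fragment, repeated only to give the example enough length.
fragments = ["MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSY"] * 6
combined = concatenate_sequences(fragments)
print(len(combined), is_long_enough(combined))
```

Whether a given set of sequences is appropriate to concatenate is a biological judgment that a length check cannot make.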
The SAPS program was developed in the group of Prof. Samuel
Karlin at Stanford University. The program is available via
anonymous ftp from gnomic.stanford.edu. Correspondence
relating to SAPS should be addressed to Volker Brendel at
the Department of Mathematics, Stanford University, Stanford
CA 94305, U.S.A.; phone: (415) 723-9256; fax: (415) 725-2040;
email: volker@gnomic.stanford.edu. Users of the program
should cite the following reference:
Brendel, V., Bucher, P., Nourbakhsh, I., Blaisdell, B.E., Karlin, S. (1992)
Methods and algorithms for statistical analysis of protein sequences.
Proc. Natl. Acad. Sci. USA 89: 2002-2006.
An input sequence can be pasted into the box, or a sequence file can be uploaded through the web interface. The sequence file should contain a single sequence of at most 10,000 residues in any of the following formats:
Raw, Plain, EMBL, SwissProt, Genbank, PIR, Fasta, NBRF, GCG. The web interface runs the 'fmtseq' program to convert the sequence into EMBL format.
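The single-sequence and 10,000-residue constraints can be checked locally before upload. The sketch below handles only Fasta or raw text and is an illustration under that assumption; `read_single_fasta` is a hypothetical helper and does not reproduce what 'fmtseq' does.

```python
# Minimal sketch (hypothetical helper, Fasta or raw text only): verify that
# the input holds exactly one sequence of at most 10,000 residues.
MAX_RESIDUES = 10_000  # limit stated for the web interface


def read_single_fasta(text):
    """Return (header, sequence); header is None for raw input."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected a single sequence, found more than one")
            header = line[1:]
        else:
            chunks.append(line)
    sequence = "".join(chunks).upper()
    if not sequence:
        raise ValueError("no residues found")
    if len(sequence) > MAX_RESIDUES:
        raise ValueError(f"sequence has {len(sequence)} residues; limit is {MAX_RESIDUES}")
    return header, sequence


header, sequence = read_single_fasta(">demo protein\nmkvl\nAAGS\n")
print(header, sequence)
```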
This web interface to SAPS has the following options:
- Output type
- Normal - regular output
- Terse - Limited output confined to the analysis of the charge distribution and of high scoring segments.
- Verbose - A more detailed output providing additional details for several of the analysis functions.
- Documented - A completely documented output that annotates each part of the program; this should be selected when SAPS is used for the first time as it provides helpful explanations with respect to the statistics being used and the layout of the output.
- Species
- Uses the specified species table for quantile comparisons.
The residue composition of the input protein may be evaluated relative to standard sets of proteins grouped by species, size class, subcellular location, function, or other criteria. Specifically, the composition of the input protein is compared with the quantile table of residue usage for the user-specified standard set. Extreme usages that fall in the tails of the reference distribution are indicated for individual amino acids and for charged and hydrophobic residues. For each reference set, only proteins at least 200 residues long were included; redundant entries were culled.
If no species is selected, the input sequence is evaluated with the quantile table 'swp23', a random sample of proteins from SwissProt Release 23. Available species are:
- human
- mouse
- rat
- chicken
- xenopus (frog)
- drosophila (Drosophila melanogaster)
- yeast (Saccharomyces cerevisiae)
- E.coli (Escherichia coli)
- B.subt. (Bacillus subtilis)
- Positive residues
- By default, SAPS treats only lysine (K) and arginine (R) as positively charged residues. Alternatively, histidine (H) can also be treated as positively charged in all parts of the program involving the charge alphabet.
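The Species and Positive residues options can be illustrated with a small sketch. It is a simplified illustration, not SAPS's implementation: the function names are hypothetical, and the quantile bounds below are invented placeholders, not values from 'swp23' or any species table.

```python
# Minimal sketch (not SAPS's implementation): residue composition, the
# optional inclusion of histidine in the positive charge class, and
# flagging of usages outside a reference range.
from collections import Counter


def composition(sequence):
    """Return the fraction of each residue type in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def positive_fraction(sequence, histidine_positive=False):
    """Fraction of positively charged residues: K and R, optionally H."""
    positive = set("KRH") if histidine_positive else set("KR")
    seq = sequence.upper()
    return sum(1 for aa in seq if aa in positive) / len(seq)


# Placeholder (low, high) bounds for illustration only; NOT real table values.
HYPOTHETICAL_QUANTILES = {"K": (0.02, 0.11), "A": (0.03, 0.13)}


def flag_extremes(sequence, quantiles):
    """Mark residues whose usage falls below or above the given bounds."""
    freqs = composition(sequence)
    flags = {}
    for aa, (low, high) in quantiles.items():
        usage = freqs.get(aa, 0.0)
        if usage < low:
            flags[aa] = "low"
        elif usage > high:
            flags[aa] = "high"
    return flags


demo = "KKKKKGGGGG"
print(positive_fraction("KRHA"), positive_fraction("KRHA", histidine_positive=True))
print(flag_extremes(demo, HYPOTHETICAL_QUANTILES))
```

In this toy input, lysine makes up half the sequence and alanine is absent, so both fall outside the placeholder bounds.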
Questions? Problems? Send email to webtools@helix.nih.gov
Helix Systems,
CIT,
NIH.
Analysis of specified amino acid distribution
Clusters of particular amino acid types may be evaluated
by means of the same tests that are used to detect clustering
of charged residues (binomial model and scoring statistics).
These tests are invoked by setting the `-a' flag; for
example, to test (separately) for clusters of alanine (A)
and serine (S), set `-a AS'. The binomial test is also
programmed for certain combinations of amino acids: AG (flag
`-a a'), PEST (flag `-a p'), QP (flag `-a q'), ST (flag
`-a s').
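The idea behind a binomial cluster test can be sketched as follows. This is a simplified illustration and not the test SAPS performs: it finds the window richest in one residue type and computes an uncorrected binomial tail probability from the overall frequency of that residue. The function names and window size are assumptions, and no correction is made for having scanned many windows.

```python
# Simplified illustration of a binomial cluster check (not SAPS's actual
# test): find the window richest in one residue type, then compute the
# uncorrected binomial tail probability of a count that high.
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def densest_window(sequence, residue, window):
    """Return (start, count) of the first window with the most `residue`."""
    seq = sequence.upper()
    if window > len(seq):
        raise ValueError("window is longer than the sequence")
    count = sum(1 for aa in seq[:window] if aa == residue)
    best_start, best_count = 0, count
    for start in range(1, len(seq) - window + 1):
        count -= seq[start - 1] == residue          # residue leaving the window
        count += seq[start + window - 1] == residue  # residue entering the window
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Test alanine and serine separately, analogous to `-a AS'.
demo = "GGGGAAAAAGGGGSGSGG"
for residue in "AS":
    p = demo.count(residue) / len(demo)
    start, count = densest_window(demo, residue, window=5)
    print(residue, start, count, binomial_tail(count, 5, p))
```

Because the densest window is chosen after scanning the whole sequence, the printed probability understates how often such a window arises by chance; it is shown only to make the binomial model concrete.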