DisProt - Database of Protein Disorder

DisProt Help

Linking to DisProt
FASTA Format
The Single-Letter Amino Acid Code
Homologues

Linking to DisProt

Every protein in the DisProt database can be viewed in HTML format directly via

http://www.disprot.org/protein.php?id=disprot_id.

The same information can be accessed in XML format via

http://www.disprot.org/protein.php?id=disprot_id&view=xml

and in FASTA format via

http://www.disprot.org/protein.php?id=disprot_id&view=fasta

A disprot_id is a 7-character string consisting of DP followed by the 5-digit protein number, for example DP00016.
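
As a minimal sketch of the URL scheme above, the three links for one identifier could be assembled in Python as follows. The function name disprot_urls and the validation pattern are illustrative assumptions, not part of DisProt itself.

```python
import re

BASE_URL = "http://www.disprot.org/protein.php"

# Assumed pattern: "DP" followed by exactly five digits, per the description above.
DISPROT_ID = re.compile(r"DP\d{5}")


def disprot_urls(disprot_id):
    """Return the HTML, XML and FASTA links for one DisProt identifier."""
    if not DISPROT_ID.fullmatch(disprot_id):
        raise ValueError(f"not a valid DisProt id: {disprot_id!r}")
    html = f"{BASE_URL}?id={disprot_id}"
    return {
        "html": html,
        "xml": f"{html}&view=xml",
        "fasta": f"{html}&view=fasta",
    }
```

Calling disprot_urls("DP00016") returns a dictionary with the keys "html", "xml" and "fasta"; a malformed identifier raises ValueError before any URL is built.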

FASTA Format

FASTA format consists of a one-line header followed by the corresponding amino acid sequence, written in the single-letter amino acid code.

Example:

>DisProt|DP00036|sp|P22121|gi|123686|pir|S13365 #1-195:Protein-DNA binding:Transactivation (transcriptional activation) #217-222:Protein-DNA binding:Transactivation (transcriptional activation) #260-276:Protein-DNA binding:Transactivation (transcriptional activation) #268-271:Protein-DNA binding:Transactivation (transcriptional activation)
MGHNDSVETMDEISNPNNILLPHDGTGLDATGISGSQEPYGMVDVLNPDSLKDDSNVDEPLIEDIVNPSLDPEGVVSAEP
SNEVGTPLLQQPISLDHVITRPASAGGVYSIGNSSTSSAAKLSDGDLTNATDPLLNNAHGHGQPSSESQSHSNGYHKQGQ
SQQPLLSLNKRKLLAKAHVDKHHSKKKLSTTRARPAFVNKLWSMVNDKSNEKFIHWSTSGESIVVPNRERFVQEVLPKYF
KHSNFASFVRQLNMYGWHKVQDVKSGSMLSNNDSRWEFENENFKRGKEYLLENIVRQKSNTNILGGTTNAEVDIHILLNE
LETVKYNQLAIAEDLKRITKDNEMLWKENMMARERHQSQQQVLEKLLRFLSSVFGPNSAKTIGNGFQPDLIHELSDMQVN
HMSNNNHNNTGNINPNAYHNETDDPMANVFGPLTPTDQGKVPLQDYKLRPRLLLKNRSMSSSSSSNLNQRQSPQNRIVGQ
SPPPQQQQQQQQQQGQPQGQQFSYPIQGGNQMMNQLGSPIGTQVGSPVGSQYGNQYGNQYSNQFGNQLQQQTSRPALHHG
SNGEIRELTPSIVSSDSPDPAFFQDLQNNIDKQEESIQEIQDWITKLNPGPGEDGNTPIFPELNMPSYFANTGGSGQSEQ
PSDYGDSQIEELRNSRLHEPDRSFEEKNNGQKRRRAA

Disordered regions are denoted by the symbol "#" in the format:

#<starting residue>-<ending residue>

In the example above, residues 1 to 195, 217 to 222, 260 to 276, and 268 to 271 are disordered.
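
The "#" markers can be pulled out of a header with a regular expression. The sketch below is illustrative only: the function name disordered_regions is an assumption, and the header is an abbreviated version of the example above, with the functional subclasses left out for brevity.

```python
import re

# "#<starting residue>-<ending residue>", as described above.
REGION = re.compile(r"#(\d+)-(\d+)")


def disordered_regions(header):
    """Return (start, end) pairs for every "#start-end" marker in a header."""
    return [(int(start), int(end)) for start, end in REGION.findall(header)]


# Abbreviated form of the example header (annotations shortened).
header = (
    ">DisProt|DP00036|sp|P22121|gi|123686|pir|S13365 "
    "#1-195:Protein-DNA binding #217-222:Protein-DNA binding "
    "#260-276:Protein-DNA binding #268-271:Protein-DNA binding"
)
print(disordered_regions(header))
# [(1, 195), (217, 222), (260, 276), (268, 271)]
```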

Ordered (structurally determined) parts of proteins are denoted by the symbol "&" in the format:

&<starting residue>-<ending residue>

The functional class(es) and subclass(es) (if known) of each structurally determined region follow
the starting-ending residue. Functional classes are denoted by the symbol "*", functional subclasses
are denoted by the symbol ":". For example:

<starting residue>-<ending residue>*Molecular recognition effectors:Protein-protein binding

All remaining residues are structurally undetermined (and may contain very short disordered regions).

In the example above, residues 196 to 216, 223 to 259, and 277 to 677 are structurally undetermined.
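
Under the definition above, the structurally undetermined residues are whatever the annotated regions leave uncovered. A minimal sketch of that calculation, assuming 1-based inclusive residue ranges and a known sequence length (the function name undetermined_regions and the toy input are hypothetical):

```python
def undetermined_regions(annotated, length):
    """Return the (start, end) gaps left uncovered by the annotated regions.

    `annotated` holds 1-based inclusive (start, end) pairs for both ordered
    and disordered regions; `length` is the sequence length.
    """
    gaps = []
    next_free = 1  # first residue not yet covered by any region
    for start, end in sorted(annotated):
        if start > next_free:
            gaps.append((next_free, start - 1))
        # Overlapping or nested regions must not move the pointer backwards.
        next_free = max(next_free, end + 1)
    if next_free <= length:
        gaps.append((next_free, length))
    return gaps


# Toy input: a 10-residue sequence with regions 1-3 and 6-8 annotated.
print(undetermined_regions([(1, 3), (6, 8)], 10))
# [(4, 5), (9, 10)]
```

Sorting the regions first lets overlapping annotations, such as a region nested inside a longer one, be handled in a single pass.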


The Single-Letter Amino Acid Code

One-Letter Code	Amino Acid	Three-Letter Code
A	Alanine	Ala
C	Cysteine	Cys
D	Aspartic Acid	Asp
E	Glutamic Acid	Glu
F	Phenylalanine	Phe
G	Glycine	Gly
H	Histidine	His
I	Isoleucine	Ile
K	Lysine	Lys
L	Leucine	Leu
M	Methionine	Met
N	Asparagine	Asn
P	Proline	Pro
Q	Glutamine	Gln
R	Arginine	Arg
S	Serine	Ser
T	Threonine	Thr
V	Valine	Val
W	Tryptophan	Trp
Y	Tyrosine	Tyr
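
For scripted use, the table can be held as a lookup dictionary. This is a small sketch; the names THREE_LETTER and to_three_letter are illustrative, and the hyphen separator is an arbitrary choice.

```python
# One-letter to three-letter codes, copied from the table above.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence):
    """Translate a one-letter sequence; raises KeyError on an unknown code."""
    return "-".join(THREE_LETTER[residue] for residue in sequence)


print(to_three_letter("MGHND"))  # Met-Gly-His-Asn-Asp
```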


Homologues

Homologues are obtained using the CD-HIT clustering program with a 50% identity threshold.

References:
Weizhong Li, Lukasz Jaroszewski & Adam Godzik, "Clustering of highly homologous sequences to reduce the size of large protein databases", Bioinformatics (2001) 17:282-283
Weizhong Li, Lukasz Jaroszewski & Adam Godzik, "Tolerating some redundancy significantly speeds up clustering of large protein databases", Bioinformatics (2002) 18:77-82
