DisProt Help
Linking to DisProt
Every protein in the DisProt database can be viewed in HTML format directly
via
http://www.disprot.org/protein.php?id=disprot_id.
The same information can be accessed in XML format via
http://www.disprot.org/protein.php?id=disprot_id&view=xml
and in FASTA format via
http://www.disprot.org/protein.php?id=disprot_id&view=fasta
A disprot_id is a 7-character string consisting of DP followed
by the 5-digit protein number, for example DP00016.
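As a minimal sketch, the three URL forms above can be built from a disprot_id as follows. The function name disprot_url and its view parameter are illustrative assumptions, not part of DisProt; only the URL patterns and the ID format come from this page.

```python
import re

# Base address and ID format as described on this page.
BASE_URL = "http://www.disprot.org/protein.php"
ID_PATTERN = re.compile(r"DP\d{5}")  # "DP" followed by exactly 5 digits


def disprot_url(disprot_id, view=None):
    """Return the URL of a DisProt entry (hypothetical helper).

    view is None for HTML, or "xml" / "fasta" for the other formats.
    """
    if not ID_PATTERN.fullmatch(disprot_id):
        raise ValueError(f"not a valid disprot_id: {disprot_id!r}")
    if view not in (None, "xml", "fasta"):
        raise ValueError(f"unsupported view: {view!r}")
    url = f"{BASE_URL}?id={disprot_id}"
    if view is not None:
        url += f"&view={view}"
    return url


print(disprot_url("DP00016"))
print(disprot_url("DP00016", "fasta"))
```

Validating the ID before building the URL catches malformed identifiers such as DP16 early, instead of producing a link that leads nowhere.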
FASTA Format
A FASTA record consists of a one-line header followed by the corresponding
amino acid sequence, written in the single-letter amino
acid code.
Example:
>DisProt|DP00036|sp|P22121|gi|123686|pir|S13365 #1-195:Protein-DNA binding:Transactivation (transcriptional activation) #217-222:Protein-DNA binding:Transactivation (transcriptional activation) #260-276:Protein-DNA binding:Transactivation (transcriptional activation) #268-271:Protein-DNA binding:Transactivation (transcriptional activation)
MGHNDSVETMDEISNPNNILLPHDGTGLDATGISGSQEPYGMVDVLNPDSLKDDSNVDEPLIEDIVNPSLDPEGVVSAEP
SNEVGTPLLQQPISLDHVITRPASAGGVYSIGNSSTSSAAKLSDGDLTNATDPLLNNAHGHGQPSSESQSHSNGYHKQGQ
SQQPLLSLNKRKLLAKAHVDKHHSKKKLSTTRARPAFVNKLWSMVNDKSNEKFIHWSTSGESIVVPNRERFVQEVLPKYF
KHSNFASFVRQLNMYGWHKVQDVKSGSMLSNNDSRWEFENENFKRGKEYLLENIVRQKSNTNILGGTTNAEVDIHILLNE
LETVKYNQLAIAEDLKRITKDNEMLWKENMMARERHQSQQQVLEKLLRFLSSVFGPNSAKTIGNGFQPDLIHELSDMQVN
HMSNNNHNNTGNINPNAYHNETDDPMANVFGPLTPTDQGKVPLQDYKLRPRLLLKNRSMSSSSSSNLNQRQSPQNRIVGQ
SPPPQQQQQQQQQQGQPQGQQFSYPIQGGNQMMNQLGSPIGTQVGSPVGSQYGNQYGNQYSNQFGNQLQQQTSRPALHHG
SNGEIRELTPSIVSSDSPDPAFFQDLQNNIDKQEESIQEIQDWITKLNPGPGEDGNTPIFPELNMPSYFANTGGSGQSEQ
PSDYGDSQIEELRNSRLHEPDRSFEEKNNGQKRRRAA
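A record in this layout could be read with a few lines of Python. This is only a sketch: it assumes that the header occupies a single line beginning with ">" and that every following line up to the next ">" belongs to the sequence. The function name read_fasta and the entry DP99999 in the sample are made up for illustration.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-line sequence, not a real DisProt entry.
sample = """>DisProt|DP99999 #1-4
MGHN
DSVE
"""
records = list(read_fasta(sample))
print(records)
```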
Disordered regions are denoted by the symbol "#" in the format:
#<starting residue>-<ending residue>
In the example above, residues 1 to 195, 217 to 222, 260 to 276, and 268 to 271 are
disordered.
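The "#" notation lends itself to extraction with a regular expression. The sketch below assumes that every disordered region appears in the header as "#" immediately followed by two integers joined by a hyphen; the function name disordered_regions is an illustrative choice.

```python
import re

# "#" followed by <starting residue>-<ending residue>
REGION = re.compile(r"#(\d+)-(\d+)")


def disordered_regions(header):
    """Return the (start, end) residue ranges marked with '#' in a header."""
    return [(int(start), int(end)) for start, end in REGION.findall(header)]


# Header of the DP00036 example.
header = (
    "DisProt|DP00036|sp|P22121|gi|123686|pir|S13365 "
    "#1-195:Protein-DNA binding:Transactivation (transcriptional activation) "
    "#217-222:Protein-DNA binding:Transactivation (transcriptional activation) "
    "#260-276:Protein-DNA binding:Transactivation (transcriptional activation) "
    "#268-271:Protein-DNA binding:Transactivation (transcriptional activation)"
)
print(disordered_regions(header))
# [(1, 195), (217, 222), (260, 276), (268, 271)]
```

Note that regions may overlap, as 260-276 and 268-271 do here, so downstream code should not assume the ranges are disjoint.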
Ordered (structurally determined) parts of proteins are denoted by the symbol "&"
in the format:
&<starting residue>-<ending residue>
The functional class(es) and subclass(es), if known, of each structurally determined region follow
the residue range. Functional classes are introduced by the symbol "*" and functional subclasses
by the symbol ":". For example:
<starting residue>-<ending residue>*Molecular recognition effectors:Protein-protein binding
All remaining residues are structurally undetermined (possibly containing very short
disordered regions).
In the example above, residues 196 to 216, 223 to 259, and 277 to 677 are
structurally undetermined.
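The structurally undetermined residues are simply those not covered by any annotated region, so they can be computed as the complement of the annotated ranges over the sequence length. The following sketch assumes 1-based inclusive ranges; the function name undetermined_regions is hypothetical.

```python
def undetermined_regions(determined, length):
    """Return the 1-based inclusive ranges not covered by any annotated region.

    determined: (start, end) ranges of disordered and ordered regions
    length:     number of residues in the sequence
    """
    gaps = []
    next_free = 1  # first residue not yet covered by a region
    for start, end in sorted(determined):
        if start > next_free:
            gaps.append((next_free, start - 1))
        # max() keeps the position correct when regions overlap or nest
        next_free = max(next_free, end + 1)
    if next_free <= length:
        gaps.append((next_free, length))
    return gaps


# Disordered regions and sequence length of the DP00036 example.
disordered = [(1, 195), (217, 222), (260, 276), (268, 271)]
print(undetermined_regions(disordered, 677))
# [(196, 216), (223, 259), (277, 677)]
```

The first gap begins at residue 196, the first residue after the 1-195 region, and the nested region 268-271 does not create an extra gap.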
The Single-Letter Amino Acid Code
Code |
Amino Acid |
Code |
A | Alanine | Ala |
C | Cysteine | Cys |
D | Aspartic Acid | Asp |
E | Glutamic Acid | Glu |
F | Phenylalanine | Phe |
G | Glycine | Gly |
H | Histidine | His |
I | Isoleucine | Ile |
K | Lysine | Lys |
L | Leucine | Leu |
M | Methionine | Met |
N | Asparagine | Asn |
P | Proline | Pro |
Q | Glutamine | Gln |
R | Arginine | Arg |
S | Serine | Ser |
T | Threonine | Thr |
V | Valine | Val |
W | Tryptophan | Trp |
Y | Tyrosine | Tyr |
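For programmatic use, the table above can be held in a dictionary. The sketch below uses it to translate a sequence into three-letter codes and to flag characters outside the 20 listed codes; the helper names are illustrative assumptions.

```python
# Single-letter code -> three-letter code, as listed in the table above.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence):
    """Translate a single-letter sequence into hyphen-joined three-letter codes."""
    return "-".join(THREE_LETTER[aa] for aa in sequence)


def invalid_residues(sequence):
    """Return the characters of a sequence that are not in the table."""
    return sorted(set(sequence) - set(THREE_LETTER))


print(to_three_letter("MGHN"))   # Met-Gly-His-Asn
print(invalid_residues("MGXB"))  # ['B', 'X']
```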
Homologues
Homologues are obtained using the CD-HIT clustering program with a 50% identity threshold.
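This page does not state the exact CD-HIT options used by DisProt. As an assumption-laden sketch, a clustering run at a 50% identity threshold could be prepared as shown below, using the standard cd-hit options -i (input), -o (output), -c (identity threshold), and -n (word size). The file names are hypothetical, and the word size of 3 follows the CD-HIT user guide's suggestion for thresholds between 0.5 and 0.6.

```python
def cdhit_command(input_fasta, output_path, identity=0.5, word_size=3):
    """Build the argument list for a cd-hit run (illustrative defaults)."""
    return [
        "cd-hit",
        "-i", input_fasta,      # FASTA file with the sequences to cluster
        "-o", output_path,      # output file for the cluster representatives
        "-c", str(identity),    # sequence identity threshold
        "-n", str(word_size),   # word size matched to the threshold
    ]


# Hypothetical file names, chosen only for this example.
cmd = cdhit_command("disprot.fasta", "disprot_50")
print(" ".join(cmd))
# To execute (requires CD-HIT to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```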
References:
"Clustering of highly homologous sequences to reduce the size of large protein databases", Weizhong Li, Lukasz Jaroszewski & Adam Godzik. Bioinformatics (2001) 17:282-283.
"Tolerating some redundancy significantly speeds up clustering of large protein databases", Weizhong Li, Lukasz Jaroszewski & Adam Godzik. Bioinformatics (2002) 18:77-82.