SWISS-PROT RELEASE 26.0 RELEASE NOTES
1. INTRODUCTION
1.1 Evolution
Release 26.0 of SWISS-PROT contains 31808 sequence entries, comprising
10'875'091 amino acids abstracted from 30967 references. This represents
an increase of 6.5% over release 25. The recent growth of the data bank
is summarized below.
Release  Date   Number of entries  Number of amino acids
    3.0  11/86               4160                969 641
    4.0  04/87               4387              1 036 010
    5.0  09/87               5205              1 327 683
    6.0  01/88               6102              1 653 982
    7.0  04/88               6821              1 885 771
    8.0  08/88               7724              2 224 465
    9.0  11/88               8702              2 498 140
   10.0  03/89              10008              2 952 613
   11.0  07/89              10856              3 265 966
   12.0  10/89              12305              3 797 482
   13.0  01/90              13837              4 347 336
   14.0  04/90              15409              4 914 264
   15.0  08/90              16941              5 486 399
   16.0  11/90              18364              5 986 949
   17.0  02/91              20024              6 524 504
   18.0  05/91              20772              6 792 034
   19.0  08/91              21795              7 173 785
   20.0  11/91              22654              7 500 130
   21.0  03/92              23742              7 866 596
   22.0  05/92              25044              8 375 696
   23.0  08/92              26706              9 011 391
   24.0  12/92              28154              9 545 427
   25.0  04/93              29955             10 214 020
   26.0  07/93              31808             10 875 091
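The growth figure quoted in section 1.1 can be checked against the last two rows of the table. The Python sketch below is illustrative only; the function name is an assumption and is not part of any SWISS-PROT tool.

```python
def percent_increase(previous: int, current: int) -> float:
    """Return the growth from `previous` to `current`, in percent."""
    return (current - previous) / previous * 100.0


# Figures copied from the table rows for releases 25.0 and 26.0.
entries_25, entries_26 = 29955, 31808
residues_25, residues_26 = 10214020, 10875091

entry_growth = percent_increase(entries_25, entries_26)
residue_growth = percent_increase(residues_25, residues_26)

print(f"entries:     {entry_growth:.1f}%")
print(f"amino acids: {residue_growth:.1f}%")
```

With these inputs the amino acid count grows by about 6.5%, which matches the figure in the text, while the number of entries grows by about 6.2%.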
1.2 Source of data
Release 26.0 has been updated using protein sequence data from release
36.0 of the PIR (Protein Identification Resource) protein data bank, as
well as translation of nucleotide sequence data from release 35.0 of the
EMBL Nucleotide Sequence Database.
As an indication of the source of the sequence data in the SWISS-PROT
data bank, we list here the statistics concerning the DR (Database
cross-references) pointer lines:
Entries with pointer(s) to only PIR entry(ies): 4485
Entries with pointer(s) to only EMBL entry(ies): 4471
Entries with pointer(s) to both EMBL and PIR entry(ies): 22181
Entries with no pointer lines: 671
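A tally of this kind could be produced along the following lines. This is a minimal Python sketch, not the procedure actually used to build the statistics. It assumes the usual flat-file layout in which each line starts with a two-letter code, DR lines read "DR   DATABASE; ...", and each entry ends with a "//" line. The sample entries and identifiers are invented, and treating entries whose DR lines name only other databases as "neither" is also an assumption.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def iter_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of lines per entry; a '//' line ends an entry."""
    entry: List[str] = []
    for line in lines:
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line.rstrip("\n"))
    if entry:
        yield entry


def classify(entry: List[str]) -> str:
    """Classify an entry by the databases named on its DR lines."""
    databases = set()
    for line in entry:
        if line.startswith("DR "):
            # Assumed layout: 'DR   DATABASE; identifier; ...'
            databases.add(line[2:].split(";")[0].strip().upper())
    has_embl, has_pir = "EMBL" in databases, "PIR" in databases
    if has_embl and has_pir:
        return "both"
    if has_embl:
        return "EMBL only"
    if has_pir:
        return "PIR only"
    return "neither"


# Invented sample data, for illustration only.
SAMPLE = """\
ID   TEST1_HUMAN
DR   EMBL; X00001; HSTEST1.
DR   PIR; A00001; A00001.
//
ID   TEST2_ECOLI
DR   EMBL; X00002; ECTEST2.
//
ID   TEST3_YEAST
DR   PIR; A00003; A00003.
//
ID   TEST4_MOUSE
//
"""

counts = Counter(classify(e) for e in iter_entries(SAMPLE.splitlines()))
for category in ("PIR only", "EMBL only", "both", "neither"):
    print(f"{category}: {counts[category]}")
```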
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 25
2.1 Sequences and annotations
About 1875 sequences have been added since release 25, the sequence data
of 286 existing entries has been updated and the annotations of 4100
entries have been revised. In particular, we have used review articles
to update the annotations of the following groups or families of
proteins:
- 6-hydroxy-D-nicotine oxidase and reticuline oxidase
- ABC-2 type transport system integral membrane proteins
- Acyl-CoA-binding protein
- Beta-eliminating lyases
- Calsequestrins
- CAP-Gly domain protein
- Carbamoyl-phosphate synthase
- Chaperonins clpA/B
- Chloroplast encoded proteins
- Cys/Met metabolism enzymes
- D-alanine--D-alanine ligases
- Dihydroxy-acid and 6-phosphogluconate dehydratases
- Galanin
- General secretion pathway (GSP) proteins
- Guanylate kinase family
- GTP cyclohydrolase I
- Hox complex proteins
- Indoleamine 2,3-dioxygenase
- MCM2/3/5 family
- Nickel-dependent hydrogenases b-type cytochrome subunit
- Ornithine/DAP/arginine decarboxylases family 2
- Osteopontin
- 'POU' domain proteins
- Prephenate dehydratase
- Proteins containing cyclic nucleotide-binding domain(s)
- Proteasome A-type and B-type subunits
- Sodium:alanine symporters
- Sodium:galactoside symporters
- Sodium:neurotransmitter symporters
2.2 Weekly updates of SWISS-PROT
Since release 24, we have provided weekly updates of SWISS-PROT. These updates
are available by anonymous FTP. Three files are updated every week:
new_seq.dat   Contains all the new entries since the last full release.
upd_seq.dat   Contains the entries for which the sequence data has been
              updated since the last release.
upd_ann.dat   Contains the entries for which one or more annotation
              fields have been updated since the last release.
Currently, these files are available on the following anonymous FTP
servers:
Organization  EMBL FTP server
Address       ftp.embl-heidelberg.de (or 192.54.41.33)
Directory     /pub/databases/swissprot/new
Organization  Basel Biozentrum Biocomputing server (EMBnet SWISS node)
Address       bioftp.unibas.ch (or 131.152.8.1)
Directory     /archive_data/brand_new/swissprot/updates
Organization  ExPASy (Geneva University Expert Protein Analysis System)
Address       expasy.hcuge.ch (or 129.195.254.61)
Directory     /databases/swiss-prot/updates
Organization  National Center for Biotechnology Information (NCBI)
Address       ncbi.nlm.nih.gov (or 130.14.20.1)
Directory     /repository/swiss-prot/updates
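For users who prefer to script the retrieval, the following Python sketch shows one possible way to fetch the three update files by anonymous FTP using the standard ftplib module. The host names and directories are copied from the list above and may no longer be reachable; the dictionary keys and function names are assumptions made for this example.

```python
from ftplib import FTP
from typing import Dict, List, Tuple

# (host, directory) pairs copied from the server list above.
MIRRORS: Dict[str, Tuple[str, str]] = {
    "EMBL": ("ftp.embl-heidelberg.de", "/pub/databases/swissprot/new"),
    "Basel": ("bioftp.unibas.ch", "/archive_data/brand_new/swissprot/updates"),
    "ExPASy": ("expasy.hcuge.ch", "/databases/swiss-prot/updates"),
    "NCBI": ("ncbi.nlm.nih.gov", "/repository/swiss-prot/updates"),
}
UPDATE_FILES: List[str] = ["new_seq.dat", "upd_seq.dat", "upd_ann.dat"]


def remote_paths(mirror: str) -> List[str]:
    """Return the full remote path of each update file on one mirror."""
    _, directory = MIRRORS[mirror]
    return [f"{directory}/{name}" for name in UPDATE_FILES]


def fetch_updates(mirror: str, timeout: float = 30.0) -> None:
    """Download the three update files into the current directory."""
    host, directory = MIRRORS[mirror]
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()  # no arguments means an anonymous login
        ftp.cwd(directory)
        for name in UPDATE_FILES:
            with open(name, "wb") as out:
                ftp.retrbinary(f"RETR {name}", out.write)


# Listing the paths needs no network access; fetch_updates("ExPASy")
# would attempt the actual transfer.
for path in remote_paths("ExPASy"):
    print(path)
```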
!!! Important notes !!!
Although we try to follow a regular schedule, we do not promise to
update these files every week. In some cases two weeks will elapse
between two updates.
Due to the current mechanism used to build a release, the entries that
are provided in these updates are not guaranteed to be error free. Also,
for the same reason, new entries do not contain an OC (Organism
Classification) line.
3. ENZYME AND PROSITE
3.1 The ENZYME data bank
3.1.1 Statistics
Release 13.0 of the ENZYME data bank is distributed with release 26 of
SWISS-PROT. ENZYME release 13.0 contains information on 3489
enzymes.
3.1.2 Important change to the User's manual
A form is now provided that can be used to supply the information that
the Nomenclature Committee of the IUBMB needs to assign an EC number
or to correct errors in existing entries. This form can be found
at the end of the User's manual of the ENZYME data bank (Appendix 1).
The NC-IUBMB will regularly send us updates and additions to the
nomenclature so that they can be integrated into the data bank in a
timely manner.
3.1.3 Citation
A paper has been published that briefly describes the ENZYME data bank.
You can use it if you want to cite ENZYME in a publication:
Bairoch A.
The ENZYME data bank
Nucleic Acids Res. 21:3155-3156(1993).
3.2 The PROSITE data bank
Release 10.2 of the PROSITE data bank is distributed with release 26 of
SWISS-PROT. Release 10.2 contains 635 documentation chapters that
describe 803 different patterns. Release 10.2 does not really represent
a new release; the only changes between releases 10.0, 10.1 and 10.2 are
updates of the pointers to the SWISS-PROT entries whose names have been
modified between releases 25 and 26. The next release of PROSITE (11.0)
will be distributed with release 27 of SWISS-PROT.
Normally, release 11 should have been distributed with release 26 of
SWISS-PROT, but technical difficulties have delayed this new release,
which will offer many new patterns.
4. WE NEED YOUR HELP!
We welcome feedback from our users. We would especially appreciate it
if you would notify us when you find that sequences belonging to your
field of expertise are missing from the data bank. We would also like to
be notified about annotations to be updated, if, for example, the function
of a protein has been clarified or if new post-translational information
has become available.
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.68    Gln (Q) 4.03    Leu (L) 9.19    Ser (S) 7.06
Arg (R) 5.25    Glu (E) 6.26    Lys (K) 5.80    Thr (T) 5.83
Asn (N) 4.43    Gly (G) 7.08    Met (M) 2.35    Trp (W) 1.31
Asp (D) 5.26    His (H) 2.25    Phe (F) 3.99    Tyr (Y) 3.22
Cys (C) 1.79    Ile (I) 5.52    Pro (P) 5.04    Val (V) 6.53
Asx (B) 0.005   Glx (Z) 0.005   Xaa (X) 0.02
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
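The ordering in A.1.2 follows directly from sorting the percentages in A.1.1. The Python sketch below reproduces that ordering and also shows how a percent composition could be computed for a single sequence; the function names and the short test sequence are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List

# Percentages copied from table A.1.1 (the 20 standard residues only).
COMPOSITION: Dict[str, float] = {
    "Ala": 7.68, "Gln": 4.03, "Leu": 9.19, "Ser": 7.06,
    "Arg": 5.25, "Glu": 6.26, "Lys": 5.80, "Thr": 5.83,
    "Asn": 4.43, "Gly": 7.08, "Met": 2.35, "Trp": 1.31,
    "Asp": 5.26, "His": 2.25, "Phe": 3.99, "Tyr": 3.22,
    "Cys": 1.79, "Ile": 5.52, "Pro": 5.04, "Val": 6.53,
}


def rank_by_frequency(composition: Dict[str, float]) -> List[str]:
    """Return residue names ordered from most to least frequent."""
    return sorted(composition, key=lambda name: composition[name], reverse=True)


def percent_composition(sequence: str) -> Dict[str, float]:
    """Return the percentage of each one-letter residue code in a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.items()}


ranking = rank_by_frequency(COMPOSITION)
print(", ".join(ranking))
print(percent_composition("AAGL"))
```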
A.2 Distribution of the sequences by their organism of origin
Total number of species represented in this release of SWISS-PROT: 4052
A.2.1 Table of the frequency of occurrence of species
Species represented 1x: 1844
2x: 662
3x: 385
4x: 243
5x: 179
6x: 135
7x: 87
8x: 76
9x: 81
10x: 45
11- 20x: 147
21- 50x: 98
51-100x: 32
>100x: 38
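A table of this shape can be derived by counting entries per species and then counting species per frequency class. The Python sketch below illustrates the idea with the same class boundaries as the table; the function names and the toy input are assumptions, and the final check only confirms that the published counts add up to the total of 4052 species given above.

```python
from collections import Counter
from typing import Dict, Iterable


def bin_label(n: int) -> str:
    """Map a per-species entry count to its frequency class."""
    if n <= 10:
        return f"{n}x"
    if n <= 20:
        return "11-20x"
    if n <= 50:
        return "21-50x"
    if n <= 100:
        return "51-100x"
    return ">100x"


def species_frequency_table(species_per_entry: Iterable[str]) -> Dict[str, int]:
    """Count how many species fall into each frequency class."""
    entries_per_species = Counter(species_per_entry)
    return dict(Counter(bin_label(n) for n in entries_per_species.values()))


# Counts copied from table A.2.1.
TABLE_A21 = {
    "1x": 1844, "2x": 662, "3x": 385, "4x": 243, "5x": 179, "6x": 135,
    "7x": 87, "8x": 76, "9x": 81, "10x": 45, "11-20x": 147,
    "21-50x": 98, "51-100x": 32, ">100x": 38,
}

# Toy input: one species name per entry.
toy = ["Human"] * 3 + ["Mouse"] * 3 + ["Rat"]
print(species_frequency_table(toy))
print(sum(TABLE_A21.values()))
```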
A.2.2 Table of the most represented species
Number Frequency Species
1 2454 Human
2 2222 Escherichia coli
3 1439 Mouse
4 1339 Rat
5 1220 Baker's yeast (Saccharomyces cerevisiae)
6 634 Bovine
7 560 Fruit fly (Drosophila melanogaster)
8 477 Chicken
9 454 Bacillus subtilis
10 362 African clawed frog (Xenopus laevis)
11 340 Salmonella typhimurium
12 333 Rabbit
13 298 Pig
14 251 Vaccinia virus (strain Copenhagen)
15 222 Maize
16 193 Human cytomegalovirus (strain AD169)
17 177 Arabidopsis thaliana (Mouse-ear cress)
177 Rice
19 176 Vaccinia virus (strain WR)
20 167 Bacteriophage T4
21 161 Pea
22 159 Tobacco
159 Wheat
24 151 Pseudomonas aeruginosa
25 142 Caenorhabditis elegans
26 141 Fission yeast (Schizosaccharomyces pombe)
27 133 Barley
28 129 Staphylococcus aureus
29 127 Spinach
30 125 Soybean
31 123 Sheep
32 122 Slime mold (Dictyostelium discoideum)
33 119 Marchantia polymorpha (Liverwort)
34 118 Rhodobacter capsulatus
35 115 Dog
36 113 Pseudomonas putida
37 110 Neurospora crassa
110 Klebsiella pneumoniae
A.3 Distribution of the sequences by size
From   To  Number      From   To  Number
   1-  50    1915      1001-1100     306
  51- 100    3248      1101-1200     188
 101- 150    4568      1201-1300     151
 151- 200    3081      1301-1400      95
 201- 250    2653      1401-1500      80
 251- 300    2358      1501-1600      41
 301- 350    2165      1601-1700      43
 351- 400    2244      1701-1800      36
 401- 450    1657      1801-1900      40
 451- 500    1851      1901-2000      31
 501- 550    1247      2001-2100      12
 551- 600     871      2101-2200      37
 601- 650     614      2201-2300      47
 651- 700     445      2301-2400      16
 701- 750     437      2401-2500      17
 751- 800     351          >2500      94
 801- 850     256
 851- 900     270
 901- 950     166
 951-1000     177
Currently the ten largest sequences are:
RYNR_RABIT 5037 a.a.
RYNR_HUMAN 5032 a.a.
APB_HUMAN 4563 a.a.
APOA_HUMAN 4548 a.a.
RRPA_CVMJH 4488 a.a.
DYHC_TRIGR 4466 a.a.
GRSB_BACBR 4451 a.a.
PLEC_RAT 4140 a.a.
POLG_BVDV 3988 a.a.
VGF1_IBVB 3951 a.a.
APPENDIX B: ON-LINE EXPERTS
B.1 List of on-line experts for PROSITE and SWISS-PROT
Field of expertise Name Email address
--------------------------- ------------------ ----------------------------
Alcohol dehydrogenases Joernvall H. hans.jornvall@k1m.ki.se
Persson B. bengt.persson@embl-heidelberg.de
Aldehyde dehydrogenases Joernvall H. hans.jornvall@k1m.ki.se
Persson B. bengt.persson@embl-heidelberg.de
Alpha-crystallins/HSP-20 Leunissen J.A.M. jackl@caos.caos.kun.nl
de Jong W. u629000@hnykun11.bitnet
Alpha-2-macroglobulins Van Leuven F. fred@blekul13.bitnet
AA-tRNA synthetases class II Leberman R. leberman@frembl51.bitnet
Apolipoproteins Boguski M.S. boguski@ncbi.nlm.nih.gov
AraC family HTH proteins Ramos J.L. jlramos@cnbvx3.cnb.uam.es
Arrestins Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
ATP synthase c subunit Recipon H. recipon@ncbi.nlm.nih.gov
Band 4.1 family proteins Rees J. jrees@vax.oxford.ac.uk
Beta-lactamases Brannigan J. jab5@vaxa.york.ac.uk
Beta-transducin family Boguski M.S. boguski@ncbi.nlm.nih.gov
C-type lectin domain Drickamer K. drick@cuhhca.hhmi.columbia.edu
Chalcone/stilbene synthases Schroeder J. raf@sun1.ruf.uni-freiburg.de
Chaperonins cpn10/cpn60 Georgopoulos C. georgopo@cmu.unige.ch
Chaperonins TCP1 family Willison K.R. willison@icr.ac.uk
Chitinases Henrissat B. bernie@cermav.grenet.fr
Clusterin Peitsch M.C. peitsch@ulbio1.unil.ch
Cold shock domain Landsman D. landsman@ncbi.nlm.nih.gov
CTF/NF-I Mermod N. nmermod@ulys.unil.ch
Gronostajski R. gronosr@ccsmtp.ccf.org
Cytochromes P450 Holsztynska E.J. ela@netcom.uucp
netcom!ela@apple.com
DEAD-box helicases Linder P. linder@urz.unibas.ch
dnaJ family Kelley W. kelley@cmu.unige.ch
EF-hand calcium-binding Cox J.A. cox@sc2a.unige.ch
Kretsinger R.H. rhk5i@virginia.bitnet
Elongation factor 1 Amons R. wmbamons@rulgl.leidenuniv.nl
Enoyl-CoA hydratase Hofmann K.O. khofmann@biomed.biolan.uni-koeln.de
Fatty acid desaturases Piffanelli P. piffanelli@jii.afrc.ac.uk
fruR/lacI family HTH proteins Reizer J. jreizer@ucsd.edu
GATA-type zinc-fingers Boguski M.S. boguski@ncbi.nlm.nih.gov
GDP/GTP dissociation stimul. Boguski M.S. boguski@ncbi.nlm.nih.gov
GltP family of transporters Hofmann K.O. khofmann@biomed.biolan.uni-koeln.de
Glucanases Henrissat B. bernie@cermav.grenet.fr
Beguin P. phycel@pasteur.bitnet
Glutamine synthetase Tateno Y. ytateno@genes.nig.ac.jp
G-protein coupled receptors Chollet A. arc3029@ggr.co.uk
Attwood T.K. bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins Boguski M.S. boguski@ncbi.nlm.nih.gov
HIT family Seraphin B. seraphin@embl-heidelberg.de
HMG1/2 and HMG-14/17 Landsman D. landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
Integrases Roy P.H. 2020000@saphir.ulaval.ca
Kringle domain Ikeo K. kikeo@genes.nig.ac.jp
Lipocalins Boguski M.S. boguski@ncbi.nlm.nih.gov
Peitsch M.C. peitsch@ulbio1.unil.ch
lysR family HTH proteins Henikoff S. henikoff@sparky.fhcrc.org
MAC components / perforin Peitsch M.C. peitsch@ulbio1.unil.ch
Malic enzymes Glynias M. mglynias@ncsa.uiuc.edu
MAM domain Bork P. bork@embl-heidelberg.de
MIP family proteins Reizer J. jreizer@ucsd.edu
Myelin proteolipid protein Hofmann K.O. khofmann@biomed.biolan.uni-koeln.de
Pancreatic trypsin inhibitor Ikeo K. kikeo@genes.nig.ac.jp
PEP requiring enzymes Reizer J. jreizer@ucsd.edu
pfkB carbohydrate kinases Reizer J. jreizer@ucsd.edu
Phytochromes Partis M.D. partis@gcri.afrc.ac.uk
Protein kinases Hanks S. hanks@vuctrvax.bitnet
Hunter T. hunter@salk-sc2.sdsc.edu
PTS proteins Reizer J. jreizer@ucsd.edu
Restriction-modification Bickle T. bickle@urz.unibas.ch
enzymes Roberts R.J. roberts@neb.com
Ribosomal protein S15 Ellis S.R. srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases Harayama S. sharayam@ddbj.nig.ac.jp
Signal sequence peptidases von Heijne G. gvh@csb.ki.se
Dalbey R.E. rdalbey@magnus.acs.ohio-state.edu
Sodium symporters Reizer J. jreizer@ucsd.edu
Subtilases Brannigan J. jab5@vaxa.york.ac.uk
Siezen R.J. nizo@caos.caos.kun.nl
Thiol proteases Turk B. turk@ijs.ac.mail.yu
Thiol protease inhibitors Turk B. turk@ijs.ac.mail.yu
TNF family Jongeneel C.V. vjongene@isrecmail.unil.ch
TPR repeats Boguski M.S. boguski@ncbi.nlm.nih.gov
Transit peptides von Heijne G. gvh@csb.ki.se
Type-II membrane antigens Levy S. levy@cellbio.stanford.edu
Uracil-DNA glycosylase Aasland R. aasland@bio.uib.no
Vitamin K-depend. Gla domain Price P.A. pprice@ucsd.edu
XPGC protein Clarkson S.G. clarkson@cmu.unige.ch
Xylose isomerase Jenkins J. jenkins@frira.afrc.ac.uk
WAP-type domain Claverie J.-M. jmc@ncbi.nlm.nih.gov
ZP domain Bork P. bork@embl-heidelberg.de
African swine fever virus Yanez R.J. ryanez@cbm2.uam.es
Bacteriophage P4 Halling C. chh9@midway.uchicago.edu
Chloroplast encoded proteins Hallick R.B. hallick@arizona.edu
Drosophila Ashburner M. ma11@phx.cam.ac.uk
Escherichia coli Rudd K. rudd@ncbi.nlm.nih.gov
Salmonella typhimurium Rudd K. rudd@ncbi.nlm.nih.gov
Snakes Stocklin R. stocklin@cmu.unige.ch
Requirements to fulfill to become an on-line expert
===================================================
An expert should be a scientist who works with specific family(ies) of
proteins (or specific domains) and who would:
a) Review the protein sequences in SWISS-PROT and the patterns/matrices
in PROSITE relevant to their field of research.
b) Agree to be contacted by people who have obtained new sequence(s)
which seem to belong to "their" family(ies) of proteins.
c) Have access to electronic mail and be willing to use it to send and
receive data.
If you are willing to be part of this scheme, please contact Amos Bairoch
at the following electronic mail address:
bairoch@cmu.unige.ch
APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES
The current status of the relationships (cross-references) between some
biomolecular databases is shown in the following schematic:
                                                       **********************
                        ***********************        * EPD [Euk. Promot.] *
                        *   EMBL Nucleotide   * <----> **********************
                        *   Sequence Data     *
******************      *   Library           *        **********************
* FLYBASE        * <--> *********************** <----- * ECD [E. coli map]  *
* [Drosophila    *                 ^        ^          **********************
* genomic d.b.]  * <------+        |        |
******************        |        |        +--------- **********************
                          |        |                   * TFD [Trans. fact.] *
                          |        |        +--------> **********************
                          |        |        |
******************        v        v        v          **********************
* REBASE         *      ***********************        * ENZYME [Nomencl.]  *
* [Restriction   * <--- *   SWISS-PROT        * <----- **********************
* enzymes]       *      *   Protein Sequence  *                  |
******************      *   Data Bank         *                  v
                        ***********************        **********************
******************       ^  ^  |  |  ^   ^  |          * OMIM [Diseases]    *
* EcoGene/EcoSeq *       |  |  |  |  |   |  +--------> **********************
* [E. coli]      * <-----+  |  |  |  |   |
******************          |  |  |  |   +-----------> **********************
                            |  |  |  |                 * ECO2DBASE [2D]     *
                            |  |  |  |                 **********************
******************          |  |  |  |
* PROSITE        * <--------+  |  |  +---------------> **********************
* [Patterns]     *             |  |                    * SWISS-2DPAGE [2D]  *
******************             |  +---------------+    **********************
             |                 v                  |
             |          ***********************   |    **********************
             +--------> * PDB [3D structures] *   +--> * Aarhus/Ghent [2D]  *
                        ***********************        **********************