UniProtKB/Swiss-Prot protein knowledgebase release 57.13 statistics
1. INTRODUCTION
Release 57.13 of 19-Jan-10 of UniProtKB/Swiss-Prot contains 514212 sequence entries,
comprising 180900945 amino acids abstracted from 186149 references.
393 sequences have been added since release 57.12, the sequence data of
105 existing entries has been updated and the annotations of
473244 entries have been revised.
Number of fragments: 8438
Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 28782
Protein existence (PE): entries %
1: Evidence at protein level 68107 13.2%
2: Evidence at transcript level 66344 12.9%
3: Inferred from homology 363910 70.8%
4: Predicted 14325 2.8%
5: Uncertain 1526 0.3%
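The percentages in the protein existence table can be reproduced from the entry counts alone. The following is a minimal sketch in Python; the variable names are illustrative and not part of any UniProt tooling, and rounding to one decimal place is an assumption inferred from the printed values.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 68107,
    "2: Evidence at transcript level": 66344,
    "3: Inferred from homology": 363910,
    "4: Predicted": 14325,
    "5: Uncertain": 1526,
}

# The five levels partition the release, so their sum is the entry total.
total_entries = sum(pe_counts.values())

# Share of each level in percent, rounded to one decimal (assumed convention).
pe_percent = {
    level: round(100 * count / total_entries, 1)
    for level, count in pe_counts.items()
}

for level, pct in pe_percent.items():
    print(f"{level:34s} {pe_counts[level]:7d} {pct:5.1f}%")
```

The sum of the five counts equals the 514212 entries quoted in the introduction, which suggests every entry carries exactly one PE level.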
The growth of the database is summarized below.
2. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 12023
The first twenty species represent 107121 sequences, or 20.8% of the total
number of entries.
2.1 Table of the frequency of occurrence of species
Species represented 1x: 5228
2x: 1699
3x: 895
4x: 573
5x: 418
6x: 342
7x: 245
8x: 210
9x: 179
10x: 105
11- 20x: 575
21- 50x: 367
51-100x: 176
>100x: 1011
2.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 20276 Homo sapiens (Human)
2 16214 Mus musculus (Mouse)
3 8823 Arabidopsis thaliana (Mouse-ear cress)
4 7469 Rattus norvegicus (Rat)
5 6552 Saccharomyces cerevisiae (Baker's yeast)
6 5740 Bos taurus (Bovine)
7 4974 Schizosaccharomyces pombe (Fission yeast)
8 4367 Escherichia coli (strain K12)
9 4248 Bacillus subtilis
10 4089 Dictyostelium discoideum (Slime mold)
11 3278 Caenorhabditis elegans
12 3187 Xenopus laevis (African clawed frog)
13 3052 Drosophila melanogaster (Fruit fly)
14 2597 Danio rerio (Zebrafish) (Brachydanio rerio)
15 2350 Oryza sativa subsp. japonica (Rice)
16 2206 Pongo abelii (Sumatran orangutan)
17 2151 Gallus gallus (Chicken)
18 1993 Escherichia coli O157:H7
19 1782 Methanocaldococcus jannaschii (Methanococcus jannaschii)
20 1773 Haemophilus influenzae
21 1752 Salmonella typhimurium
22 1668 Escherichia coli O6
23 1665 Shigella flexneri
24 1550 Mycobacterium tuberculosis
25 1503 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
26 1360 Sus scrofa (Pig)
27 1341 Salmonella typhi
28 1273 Pseudomonas aeruginosa
29 1213 Mycobacterium bovis
30 1159 Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
31 1015 Synechocystis sp. (strain PCC 6803)
32 995 Yersinia pestis
33 991 Archaeoglobus fulgidus
34 940 Vibrio cholerae
35 929 Salmonella paratyphi A
36 922 Staphylococcus aureus (strain N315)
37 922 Staphylococcus aureus (strain Mu50 / ATCC 700699)
38 911 Rhizobium meliloti (Sinorhizobium meliloti)
39 909 Acanthamoeba polyphaga mimivirus (APMV)
40 896 Staphylococcus aureus (strain COL)
41 894 Staphylococcus aureus (strain MW2)
42 888 Staphylococcus aureus (strain MSSA476)
43 885 Staphylococcus aureus (strain MRSA252)
44 882 Oryctolagus cuniculus (Rabbit)
45 879 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
46 879 Salmonella choleraesuis
47 869 Shigella sonnei (strain Ss046)
48 863 Yersinia pseudotuberculosis
49 835 Escherichia coli O9:H4 (strain HS)
50 829 Escherichia coli O139:H28 (strain E24377A / ETEC)
51 823 Shigella boydii serotype 4 (strain Sb227)
52 818 Escherichia coli (strain UTI89 / UPEC)
53 817 Ashbya gossypii (Yeast) (Eremothecium gossypii)
54 814 Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
55 800 Shigella dysenteriae serotype 1 (strain Sd197)
56 795 Candida albicans (Yeast)
57 794 Vibrio parahaemolyticus
58 789 Kluyveromyces lactis (Yeast) (Candida sphaerica)
59 785 Escherichia coli (strain SMS-3-5 / SECEC)
60 778 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
61 776 Pasteurella multocida
62 771 Aquifex aeolicus
63 771 Neurospora crassa
64 765 Escherichia coli (strain K12 / DH10B)
65 764 Canis familiaris (Dog)
66 759 Escherichia coli O127:H6 (strain E2348/69 / EPEC)
67 759 Escherichia coli (strain K12 / BW2952)
68 757 Escherichia coli (strain 55989 / EAEC)
69 757 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
70 756 Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
71 756 Escherichia coli O8 (strain IAI1)
72 756 Staphylococcus epidermidis (strain ATCC 12228)
73 750 Escherichia coli (strain SE11)
74 750 Shigella flexneri serotype 5b (strain 8401)
75 750 Escherichia coli O45:K1 (strain S88 / ExPEC)
76 748 Escherichia coli O7:K1 (strain IAI39 / ExPEC)
77 747 Candida glabrata (Yeast) (Torulopsis glabrata)
78 742 Escherichia coli O157:H7 (strain EC4115 / EHEC)
79 738 Streptomyces coelicolor
80 738 Photorhabdus luminescens subsp. laumondii
81 731 Vibrio vulnificus
82 730 Bacillus halodurans
83 726 Escherichia coli O81 (strain ED1a)
84 722 Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
85 721 Bacillus anthracis
86 719 Salmonella enteritidis PT4 (strain P125109)
87 715 Vibrio vulnificus (strain YJ016)
88 715 Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
89 713 Salmonella paratyphi A (strain AKU_12601)
90 712 Yersinia pestis bv. Antiqua (strain Nepal516)
91 712 Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
92 711 Staphylococcus aureus (strain NCTC 8325)
93 710 Salmonella newport (strain SL254)
94 709 Salmonella heidelberg (strain SL476)
95 709 Salmonella agona (strain SL483)
96 708 Yersinia pestis bv. Antiqua (strain Antiqua)
97 708 Salmonella schwarzengrund (strain CVM19633)
98 705 Escherichia coli O1:K1 / APEC
99 699 Salmonella dublin (strain CT_02021853)
100 697 Enterobacter sp. (strain 638)
101 696 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
102 696 Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
103 687 Mycoplasma pneumoniae
104 685 Escherichia fergusonii (strain ATCC 35469 / DSM 13698 / CDC 0568-73)
105 684 Pseudomonas syringae pv. tomato
106 683 Pan troglodytes (Chimpanzee)
107 682 Salmonella gallinarum (strain 287/91 / NCTC 13346)
108 682 Klebsiella pneumoniae (strain 342)
109 676 Anabaena sp. (strain PCC 7120)
110 670 Pseudomonas putida (strain KT2440)
111 665 Staphylococcus aureus (strain USA300)
112 665 Yersinia pestis (strain Pestoides F)
113 664 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
114 661 Mycobacterium leprae
115 658 Rhizobium sp. (strain NGR234)
116 653 Serratia proteamaculans (strain 568)
117 645 Escherichia coli
118 645 Bradyrhizobium japonicum
119 642 Zea mays (Maize)
120 641 Staphylococcus aureus (strain bovine RF122 / ET3-1)
121 638 Bacillus cereus (strain ATCC 14579 / DSM 31)
122 637 Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
123 634 Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
124 633 Yersinia pseudotuberculosis serotype IB (strain PB1/+)
125 620 Shewanella oneidensis
126 617 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
127 615 Treponema pallidum
128 612 Ralstonia solanacearum (Pseudomonas solanacearum)
129 608 Staphylococcus haemolyticus (strain JCSC1435)
130 608 Enterobacter sakazakii (strain ATCC BAA-894)
131 602 Rhizobium loti (Mesorhizobium loti)
132 602 Staphylococcus saprophyticus subsp. saprophyticus
133 600 Methanobacterium thermoautotrophicum
134 598 Yersinia pestis bv. Antiqua (strain Angola)
135 598 Salmonella paratyphi C (strain RKS4594)
136 598 Emericella nidulans (Aspergillus nidulans)
137 596 Listeria monocytogenes
138 595 Photobacterium profundum (Photobacterium sp. (strain SS9))
139 593 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
140 592 Yarrowia lipolytica (Candida lipolytica)
141 590 Bacillus cereus (strain ATCC 10987)
142 589 Xanthomonas campestris pv. campestris
143 588 Listeria innocua
144 585 Rickettsia prowazekii
145 584 Helicobacter pylori (Campylobacter pylori)
146 582 Pectobacterium carotovorum subsp. carotovorum (strain PC1)
147 581 Lactococcus lactis subsp. lactis (Streptococcus lactis)
148 579 Neisseria meningitidis serogroup B
149 576 Brucella suis
150 572 Brucella melitensis
151 572 Buchnera aphidicola subsp. Acyrthosiphon pisum
152 567 Bacillus thuringiensis subsp. konkukian
153 565 Helicobacter pylori J99 (Campylobacter pylori J99)
154 562 Buchnera aphidicola subsp. Schizaphis graminum
155 560 Bacillus cereus (strain ZK / E33L)
156 560 Pseudomonas syringae pv. syringae (strain B728a)
157 557 Pseudomonas aeruginosa (strain UCBPP-PA14)
158 556 Neisseria meningitidis serogroup A
159 555 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
160 555 Xanthomonas axonopodis pv. citri (Citrus canker)
161 553 Vibrio fischeri (strain ATCC 700601 / ES114)
162 551 Pseudomonas fluorescens (strain Pf0-1)
163 549 Oceanobacillus iheyensis
164 545 Caulobacter crescentus (Caulobacter vibrioides)
165 545 Clostridium acetobutylicum
166 545 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
167 538 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
168 529 Listeria monocytogenes serotype 4b (strain F2365)
169 523 Erwinia tasmaniensis (strain DSM 17950 / Et1/99)
170 522 Sodalis glossinidius (strain morsitans)
171 521 Bordetella bronchiseptica (Alcaligenes bronchisepticus)
172 521 Xylella fastidiosa
173 519 Streptococcus pneumoniae
174 512 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
175 509 Chromobacterium violaceum
176 509 Thermotoga maritima
177 509 Vibrio cholerae serotype O1 (strain ATCC 39541 / Ogawa 395 / O395)
178 507 Bordetella parapertussis
179 507 Buchnera aphidicola subsp. Baizongia pistaciae
180 507 Pseudomonas aeruginosa (strain PA7)
181 505 Bordetella pertussis
182 504 Haemophilus ducreyi
183 503 Staphylococcus aureus (strain Newman)
184 503 Geobacillus kaustophilus
185 500 Pseudomonas entomophila (strain L48)
186 498 Brucella abortus
187 497 Rickettsia conorii
188 496 Bacillus clausii (strain KSM-K16)
189 492 Haemophilus influenzae (strain 86-028NP)
190 491 Deinococcus radiodurans
191 490 Xanthomonas campestris pv. campestris (strain 8004)
192 490 Vibrio harveyi (strain ATCC BAA-1116 / BB120)
193 490 Clostridium perfringens
194 488 Bacillus amyloliquefaciens (strain FZB42)
195 487 Burkholderia pseudomallei (Pseudomonas pseudomallei)
196 487 Shewanella sp. (strain MR-7)
197 485 Aspergillus fumigatus (Sartorya fumigata)
198 484 Pseudomonas aeruginosa (strain LESB58)
199 484 Shewanella sp. (strain MR-4)
200 483 Mannheimia succiniciproducens (strain MBEL55E)
201 483 Mycoplasma genitalium
202 483 Staphylococcus aureus (strain Mu3 / ATCC 700698)
203 482 Streptomyces avermitilis
204 481 Corynebacterium glutamicum (Brevibacterium flavum)
205 479 Proteus mirabilis (strain HI4320)
206 476 Caenorhabditis briggsae
207 475 Oryza sativa subsp. indica (Rice)
208 475 Synechococcus elongatus (strain PCC 7942) (Anacystis nidulans R2)
209 474 Methanosarcina acetivorans
210 472 Burkholderia sp. (strain 383) (Burkholderia cepacia
211 472 Pseudomonas putida (strain F1 / ATCC 700007)
212 472 Brucella abortus (strain 2308)
213 472 Thermosynechococcus elongatus (strain BP-1)
214 468 Enterococcus faecalis (Streptococcus faecalis)
215 465 Acinetobacter sp. (strain ADP1)
216 465 Pseudomonas putida (strain GB-1)
217 464 Rhodopseudomonas palustris
218 464 Xanthomonas campestris pv. vesicatoria (strain 85-10)
219 464 Shewanella frigidimarina (strain NCIMB 400)
220 462 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
221 462 Shewanella sp. (strain ANA-3)
222 461 Burkholderia mallei (Pseudomonas mallei)
223 461 Pyrococcus horikoshii
224 460 Ralstonia eutropha (Cupriavidus necator
225 458 Lactobacillus plantarum
226 457 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
227 457 Pyrococcus abyssi
228 457 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
229 455 Methanosarcina mazei (Methanosarcina frisia)
230 454 Staphylococcus aureus (strain JH1)
231 454 Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
232 453 Rickettsia felis (Rickettsia azadi)
233 453 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
234 452 Shewanella baltica (strain OS185)
235 452 Pseudomonas putida (strain W619)
236 452 Halobacterium salinarium (Halobacterium halobium)
237 448 Staphylococcus aureus (strain JH9)
238 448 Thermoanaerobacter tengcongensis
239 448 Streptococcus mutans
240 446 Methylococcus capsulatus
241 446 Ovis aries (Sheep)
242 446 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
243 446 Aeromonas salmonicida (strain A449)
244 444 Vibrio fischeri (strain MJ11)
245 443 Hahella chejuensis (strain KCTC 2396)
246 443 Pseudomonas mendocina (strain ymp)
247 441 Streptococcus pyogenes serotype M6
248 441 Chlamydia trachomatis
249 440 Dechloromonas aromatica (strain RCB)
250 439 Rickettsia bellii (strain RML369-C)
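The figure quoted at the start of section 2 for the first twenty species can be checked against rows 1 to 20 of the table above. A short Python sketch, with illustrative variable names and the release total of 514212 entries taken from the introduction:

```python
# Frequencies of the 20 most represented species (rows 1-20 of table 2.2).
top20 = [
    20276, 16214, 8823, 7469, 6552, 5740, 4974, 4367, 4248, 4089,
    3278, 3187, 3052, 2597, 2350, 2206, 2151, 1993, 1782, 1773,
]

total_entries = 514212  # entries in release 57.13

top20_sequences = sum(top20)
# Share of the whole release, rounded to one decimal place.
top20_share = round(100 * top20_sequences / total_entries, 1)

print(top20_sequences, top20_share)
```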
2.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 18172 ( 4%)
Bacteria 322942 ( 63%)
Eukaryota 158269 ( 31%)
Viruses 14829 ( 3%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 20277 ( 13%) ( 4%)
Other Mammalia 44526 ( 28%) ( 9%)
Other Vertebrata 15906 ( 10%) ( 3%)
Viridiplantae 28551 ( 18%) ( 6%)
Fungi 25076 ( 16%) ( 5%)
Insecta 7624 ( 5%) ( 1%)
Nematoda 4028 ( 3%) ( 1%)
Other 12281 ( 8%) ( 2%)
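The two percentage columns in the Eukaryota breakdown use different denominators: the Eukaryota subtotal and the complete database. The sketch below recomputes both in Python; the helper `percent` is hypothetical, and rounding to the nearest whole percent is an assumption inferred from the printed values.

```python
# Sequence counts per kingdom and per eukaryotic category, from the tables above.
kingdoms = {"Archaea": 18172, "Bacteria": 322942, "Eukaryota": 158269, "Viruses": 14829}
eukaryota = {
    "Human": 20277, "Other Mammalia": 44526, "Other Vertebrata": 15906,
    "Viridiplantae": 28551, "Fungi": 25076, "Insecta": 7624,
    "Nematoda": 4028, "Other": 12281,
}

database_total = sum(kingdoms.values())
eukaryota_total = sum(eukaryota.values())


def percent(part: int, whole: int) -> int:
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


for name, count in eukaryota.items():
    # First column: share of Eukaryota; second column: share of the database.
    print(f"{name:17s} {count:6d} ({percent(count, eukaryota_total):3d}%)"
          f" ({percent(count, database_total):3d}%)")
```

Because each value is rounded independently, the kingdom percentages (4, 63, 31 and 3) add up to 101 rather than 100.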
3. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 8371 1001-1100 3459
51- 100 39770 1101-1200 2389
101- 150 55711 1201-1300 1902
151- 200 55767 1301-1400 1771
201- 250 54341 1401-1500 1394
251- 300 47811 1501-1600 626
301- 350 48188 1601-1700 492
351- 400 41207 1701-1800 407
401- 450 33658 1801-1900 388
451- 500 26943 1901-2000 321
501- 550 19059 2001-2100 192
551- 600 13679 2101-2200 261
601- 650 11441 2201-2300 268
651- 700 8143 2301-2400 168
701- 750 6782 2401-2500 128
751- 800 4766 >2500 1000
801- 850 4115
851- 900 4735
901- 950 3601
951-1000 2520
The average sequence length in UniProtKB/Swiss-Prot is 351 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.
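Two consistency checks follow from the figures in this section and the introduction. The Python sketch below assumes that the quoted average is the total number of amino acids divided by the number of entries and truncated to an integer (the exact quotient is about 351.8); this is an inference from the numbers, not a documented rule. It also checks that the size table covers every entry that is not a fragment.

```python
total_amino_acids = 180900945  # from the introduction
total_entries = 514212         # from the introduction
fragments = 8438               # from the introduction

exact_mean = total_amino_acids / total_entries       # about 351.8
truncated_mean = total_amino_acids // total_entries  # integer part only

# Counts per size bin: 1-50 ... 951-1000, then 1001-1100 ... 2401-2500, >2500.
size_bins = [
    8371, 39770, 55711, 55767, 54341, 47811, 48188, 41207, 33658, 26943,
    19059, 13679, 11441, 8143, 6782, 4766, 4115, 4735, 3601, 2520,
    3459, 2389, 1902, 1771, 1394, 626, 492, 407, 388, 321,
    192, 261, 268, 168, 128, 1000,
]

non_fragment_entries = total_entries - fragments

print(round(exact_mean, 1), truncated_mean)
print(sum(size_bins), non_fragment_entries)
```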
4. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 2037
4.1 Table of the frequency of journal citations
Journals cited 1x: 653
2x: 287
3x: 132
4x: 108
5x: 84
6x: 60
7x: 35
8x: 40
9x: 39
10x: 24
11- 20x: 161
21- 50x: 162
51-100x: 96
>100x: 156
4.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 17621 Journal of Biological Chemistry
2 8170 Proceedings of the National Academy of Sciences of the U.S.A.
3 4976 Journal of Bacteriology
4 4487 Gene
5 4453 Biochemical and Biophysical Research Communications
6 4279 Nucleic Acids Research
7 3915 FEBS Letters
8 3754 Biochemistry
9 3699 The EMBO Journal
10 3355 Molecular and Cellular Biology
11 3178 Nature
12 3078 European Journal of Biochemistry
13 2979 Journal of Molecular Biology
14 2952 Biochimica et Biophysica Acta
15 2628 Cell
16 2471 Genomics
17 2146 Biochemical Journal
18 2080 Science
19 2007 Journal of Virology
20 1739 Molecular Microbiology
21 1544 Journal of Cell Biology
22 1486 Plant Molecular Biology
23 1339 Virology
24 1336 Genes and Development
25 1302 Molecular and General Genetics
26 1299 Nature Genetics
27 1289 Human Molecular Genetics
28 1272 Plant Physiology
29 1195 The American Journal of Human Genetics
30 1161 Oncogene
31 1153 Journal of Biochemistry
32 1124 Development
33 1065 Human Mutation
34 999 Molecular Biology of the Cell
35 993 Journal of Immunology
36 971 Genetics
37 876 Structure
38 861 Journal of General Virology
39 857 Infection and Immunity
40 834 The Plant Cell
41 810 Archives of Biochemistry and Biophysics
42 786 Molecular Cell
43 782 Blood
44 755 Yeast
45 738 Microbiology
46 711 The Plant Journal
47 707 Journal of Cell Science
48 707 Developmental Biology
49 658 Cancer Research
50 647 FEMS Microbiology Letters
51 630 Current Biology
52 590 Human Genetics
53 582 Nature Structural Biology
54 577 Mechanisms of Development
55 526 Acta Crystallographica, Section D
56 524 Protein Science
57 523 Current Genetics
58 522 Journal of Neuroscience
59 517 Applied and Environmental Microbiology
60 500 Toxicon
61 497 Journal of Clinical Investigation
62 490 Neuron
63 469 Mammalian Genome
64 449 American Journal of Physiology
65 440 Immunogenetics
66 438 The Journal of Experimental Medicine
67 431 Molecular Endocrinology
68 419 Molecular and Biochemical Parasitology
69 405 Journal of Neurochemistry
70 396 The Journal of Clinical Endocrinology and Metabolism
71 380 Endocrinology
72 375 Journal of Molecular Evolution
73 362 DNA and Cell Biology
74 354 DNA Sequence
75 351 Molecular Biology and Evolution
76 350 Bioscience, Biotechnology, and Biochemistry
77 345 Journal of Medical Genetics
78 344 Proteins
79 313 Brain Research. Molecular Brain Research
80 289 Biological Chemistry Hoppe-Seyler
81 289 Plant and Cell Physiology
82 285 Nature Cell Biology
83 284 Comparative Biochemistry and Physiology
84 283 Experimental Cell Research
85 282 Peptides
86 278 Antimicrobial Agents and Chemotherapy
87 275 Journal of Investigative Dermatology
88 274 Cytogenetics and Cell Genetics
89 263 Molecular Pharmacology
90 253 Biology of Reproduction
91 248 Tissue Antigens
92 246 Journal of General Microbiology
93 245 Genome Research
94 240 Neurology
95 237 RNA
96 235 Developmental Dynamics
97 231 Virus Research
98 227 Developmental Cell
99 215 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
100 205 DNA Research
101 202 Planta
102 202 European Journal of Immunology
103 201 Molecular Plant-Microbe Interactions
104 199 Biochimie
105 195 Annals of Neurology
106 192 European Journal of Human Genetics
107 191 Genes to Cells
108 186 Eukaryotic Cell
109 180 Immunity
110 178 Journal of Human Genetics
111 171 The New England Journal of Medicine
112 170 Molecular and Cellular Endocrinology
113 164 Archives of Microbiology
114 164 Investigative Ophthalmology and Visual Science
115 163 American Journal of Medical Genetics
116 163 Molecular Phylogenetics and Evolution
117 160 Nature Structural and Molecular Biology
118 159 DNA
119 155 Insect Biochemistry and Molecular Biology
120 155 EMBO Reports
121 153 Hemoglobin
122 149 The FASEB Journal
123 148 Bioorganicheskaia Khimiia
124 148 Molecular Reproduction and Development
125 148 Diabetes
126 146 Molecular Immunology
127 144 The FEBS Journal
128 143 Archives of Virology
129 142 Glycobiology
130 140 Clinical Genetics
131 136 General and Comparative Endocrinology
132 135 Animal Genetics
133 134 Molecular Genetics and Metabolism
134 134 International Journal of Cancer
135 131 Molecular and Cellular Neuroscience
136 128 British Journal of Haematology
137 128 Journal of Cellular Biochemistry
138 122 Molecular Genetics and Genomics
139 121 American Journal of Medical Genetics. Part A
140 121 Biological Chemistry
141 120 Agricultural and Biological Chemistry
142 118 Nature Immunology
143 118 Journal of Lipid Research
144 116 BMC Genomics
145 116 Journal of the American Chemical Society
146 113 Thrombosis and Haemostasis
147 113 Journal of Protein Chemistry
148 112 Proteomics
149 110 Circulation Research
150 109 Journal of Neuroscience Research
5. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
References (RL) 911549 1.77
Journal 719498 383525 1.40 1
Submitted to EMBL/GenBank/DDBJ 179482 166329 0.35 2
Submitted to other databases 10529 9158 0.02 3
Book citation 632 618 <0.01 4
Plant Gene Register 559 547 <0.01 5
Thesis 394 392 <0.01 6
Unpublished observations 292 288 <0.01 7
Patent 157 155 <0.01 8
Worm Breeder's Gazette 6 6 <0.01 9
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 283983
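The "Average per entry" column appears to be the total number of lines divided by all 514212 entries in the release, not by the "Number of entries" column. For journal references, 719498 / 383525 would give about 1.88, whereas the table shows 1.40. A minimal Python sketch under that assumption follows; the function name and the 0.005 cut-off used for "<0.01" are illustrative choices inferred from the printed values.

```python
TOTAL_ENTRIES = 514212  # entries in release 57.13


def average_per_entry(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format lines-per-entry the way the tables print it (assumed convention)."""
    avg = total_lines / entries
    # Values that would round to 0.00 are shown as "<0.01".
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"


# Totals copied from the references table above.
reference_totals = {
    "References (RL)": 911549,
    "Journal": 719498,
    "Submitted to EMBL/GenBank/DDBJ": 179482,
    "Submitted to other databases": 10529,
    "Book citation": 632,
}

for line_type, total in reference_totals.items():
    print(f"{line_type:32s} {total:8d} {average_per_entry(total):>6s}")
```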
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Comments (CC) 2156141 4.19
ALLERGEN 457 457 <0.01 26
ALTERNATIVE PRODUCTS 18620 18620 0.04 12
BIOPHYSICOCHEMICAL PROPERTIES 2876 2876 0.01 22
BIOTECHNOLOGY 254 252 <0.01 28
CATALYTIC ACTIVITY 214422 195652 0.42 5
CAUTION 6752 6615 0.01 19
COFACTOR 97463 89486 0.19 7
DEVELOPMENTAL STAGE 8656 8656 0.02 16
DISEASE 4492 3076 0.01 20
DISRUPTION PHENOTYPE 2352 2352 <0.01 23
DOMAIN 30691 27389 0.06 10
ENZYME REGULATION 7662 7662 0.01 18
FUNCTION 380132 364407 0.74 2
INDUCTION 11375 11375 0.02 15
INTERACTION 12027 12027 0.02 14
MASS SPECTROMETRY 4194 3165 0.01 21
MISCELLANEOUS 29608 27326 0.06 11
PATHWAY 125209 114260 0.24 6
PHARMACEUTICAL 83 83 <0.01 29
POLYMORPHISM 765 735 <0.01 24
PTM 34974 28347 0.07 8
RNA EDITING 589 589 <0.01 25
SEQUENCE CAUTION 12637 12637 0.02 13
SIMILARITY 596557 489527 1.16 1
SUBCELLULAR LOCATION 294783 289774 0.57 3
SUBUNIT 217540 217540 0.42 4
TISSUE SPECIFICITY 32275 32275 0.06 9
TOXIC DOSE 409 398 <0.01 27
WEB RESOURCE 8287 6577 0.02 17
Total number of comment topics: 29
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Features (FT) 3168390 6.16
ACT_SITE 127270 75683 0.25 9
BINDING 193500 55547 0.38 4
CA_BIND 3645 1475 0.01 35
CARBOHYD 95663 24520 0.19 13
CHAIN 520761 509574 1.01 1
COILED 18169 12210 0.04 26
COMPBIAS 48463 25319 0.09 18
CONFLICT 115165 40415 0.22 10
CROSSLNK 4680 3053 0.01 34
DISULFID 93802 24771 0.18 14
DNA_BIND 10858 9993 0.02 29
DOMAIN 141430 84104 0.28 6
HELIX 130330 13633 0.25 8
INIT_MET 14724 14724 0.03 27
LIPID 10512 6694 0.02 30
METAL 263239 64718 0.51 3
MOD_RES 173214 58100 0.34 5
MOTIF 31637 20442 0.06 22
MUTAGEN 29521 7062 0.06 24
NON_CONS 1540 631 <0.01 36
NON_STD 347 272 <0.01 38
NON_TER 11465 8699 0.02 28
NP_BIND 102200 66954 0.20 12
PEPTIDE 8407 5352 0.02 32
PROPEP 10294 8668 0.02 31
REGION 88772 49269 0.17 15
REPEAT 87537 12968 0.17 16
SIGNAL 33555 33545 0.07 21
SITE 36315 21612 0.07 20
STRAND 130873 12742 0.25 7
TOPO_DOM 114529 23491 0.22 11
TRANSIT 6434 6348 0.01 33
TRANSMEM 332026 68306 0.65 2
TURN 31103 10769 0.06 23
UNSURE 1079 348 <0.01 37
VAR_SEQ 38563 16551 0.07 19
VARIANT 78797 16434 0.15 17
ZN_FING 27971 12301 0.05 25
Total number of feature keys: 38
Total Number of Average
Line type / subtype number entries per entry Rank Category
------------------------------------ -------- --------- --------- ---- -------------------------------------------
Cross-references (DR) 12073112 23.48
2DBase-Ecoli 84 84 <0.01 113 2D gel databases
Aarhus/Ghent-2DPAGE 126 96 <0.01 110 2D gel databases
AGD 823 817 <0.01 87 Organism-specific databases
ANU-2DPAGE 23 23 <0.01 120 2D gel databases
ArachnoServer 428 423 <0.01 96
ArrayExpress 58014 58014 0.11 35 Gene expression databases
Bgee 37622 37621 0.07 41 Gene expression databases
BindingDB 297 297 <0.01 104 Other
BioCyc 160422 147567 0.31 18 Enzyme and pathway databases
BRENDA 65152 62356 0.13 30 Enzyme and pathway databases
BuruList 330 330 <0.01 103 Organism-specific databases
CAZy 5645 5024 0.01 62 Protein family/group databases
CGD 554 550 <0.01 92 Organism-specific databases
CleanEx 30224 29576 0.06 43 Gene expression databases
COMPLUYEAST-2DPAGE 59 59 <0.01 115 2D gel databases
Cornea-2DPAGE 67 67 <0.01 114 2D gel databases
CTD 61369 60823 0.12 34 Organism-specific databases
CYGD 6628 6522 0.01 61 Organism-specific databases
dictyBase 4211 4089 0.01 70 Organism-specific databases
DIP 10378 10272 0.02 54 Protein-protein interaction databases
DisProt 397 394 <0.01 98 3D structure databases
DOSAC-COBS-2DPAGE 150 150 <0.01 109 2D gel databases
DrugBank 5317 1626 0.01 64 Other
EchoBASE 4159 4124 0.01 72 Organism-specific databases
ECO2DBASE 351 299 <0.01 102 2D gel databases
EcoGene 4353 4350 0.01 68 Organism-specific databases
eggNOG 216331 216331 0.42 15 Phylogenomic databases
EMBL 844747 504537 1.64 3 Sequence databases
Ensembl 89983 69615 0.17 25 Genome annotation databases
euHCVdb 55 44 <0.01 116 Organism-specific databases
FlyBase 5390 5014 0.01 63 Organism-specific databases
Gene3D 235270 193172 0.46 14 Family and domain databases
GeneCards 21083 19821 0.04 47 Organism-specific databases
GeneDB_Spombe 4976 4931 0.01 66 Organism-specific databases
GeneFarm 2682 2667 0.01 79 Organism-specific databases
GeneID 466502 447505 0.91 6 Genome annotation databases
Genevestigator 64306 64306 0.13 33 Gene expression databases
GenomeReviews 369828 350327 0.72 9 Genome annotation databases
GermOnline 41931 41324 0.08 40 Gene expression databases
GlycoSuiteDB 280 280 <0.01 105 PTM databases
GO 2156980 480722 4.19 1 Ontologies
Gramene 4269 4269 0.01 69 Organism-specific databases
H-InvDB 11249 9556 0.02 53 Organism-specific databases
HAMAP 306962 306819 0.60 13 Family and domain databases
HGNC 19531 19359 0.04 48 Organism-specific databases
HOGENOM 358862 358836 0.70 10 Phylogenomic databases
HOVERGEN 75023 75023 0.15 28 Phylogenomic databases
HPA 8707 6564 0.02 56 Organism-specific databases
HSC-2DPAGE 85 85 <0.01 112 2D gel databases
HSSP 28846 28846 0.06 44 3D structure databases
InParanoid 65616 65616 0.13 29 Phylogenomic databases
IntAct 21322 21322 0.04 46 Protein-protein interaction databases
InterPro 1593469 488549 3.10 2 Family and domain databases
IPI 88171 63236 0.17 26 Sequence databases
KEGG 437439 415763 0.85 8 Genome annotation databases
LegioList 759 757 <0.01 88 Organism-specific databases
Leproma 664 661 <0.01 91 Organism-specific databases
ListiList 1185 1177 <0.01 84 Organism-specific databases
MaizeGDB 471 466 <0.01 94 Organism-specific databases
MEROPS 8465 8206 0.02 57 Protein family/group databases
MGI 16093 16042 0.03 50 Organism-specific databases
MIM 15795 12437 0.03 52 Organism-specific databases
MypuList 203 203 <0.01 108 Organism-specific databases
NextBio 48682 48681 0.09 38 Other
NMPDR 129910 129906 0.25 21 Genome annotation databases
OGP 377 377 <0.01 100 2D gel databases
OMA 352715 352715 0.69 11 Phylogenomic databases
Orphanet 3675 2132 0.01 75 Organism-specific databases
OrthoDB 55287 55287 0.11 36 Phylogenomic databases
PANTHER 184715 169553 0.36 17 Family and domain databases
Pathway_Interaction_DB 4569 1666 0.01 67 Enzyme and pathway databases
PDB 64800 15302 0.13 32 3D structure databases
PDBsum 64800 15302 0.13 31 3D structure databases
PeptideAtlas 5168 5168 0.01 65 Proteomic databases
PeroxiBase 674 662 <0.01 90 Protein family/group databases
Pfam 678247 477490 1.32 4 Family and domain databases
PharmGKB 15817 15806 0.03 51 Organism-specific databases
PHCI-2DPAGE 244 244 <0.01 107 2D gel databases
PhosphoSite 19295 19295 0.04 49 PTM databases
PhosSite 267 267 <0.01 106 PTM databases
PhotoList 738 738 <0.01 89 Organism-specific databases
PhylomeDB 120962 120962 0.24 22 Phylogenomic databases
PIR 114910 104966 0.22 23 Sequence databases
PIRSF 79953 79953 0.16 27 Family and domain databases
PMAP-CutDB 1395 1395 <0.01 82 Other
PMMA-2DPAGE 52 52 <0.01 117 2D gel databases
PptaseDB 34 34 <0.01 118 Protein family/group databases
PRIDE 53136 53136 0.10 37 Proteomic databases
PRINTS 136242 117832 0.26 20 Family and domain databases
ProDom 27768 27439 0.05 45 Family and domain databases
ProMEX 437 437 <0.01 95 Proteomic databases
PROSITE 454615 290111 0.88 7 Family and domain databases
PseudoCAP 1212 1203 <0.01 83 Organism-specific databases
Rat-heart-2DPAGE 28 28 <0.01 119 2D gel databases
Reactome 6997 4095 0.01 59 Enzyme and pathway databases
REBASE 376 355 <0.01 101 Protein family/group databases
RefSeq 486655 447770 0.95 5 Sequence databases
REPRODUCTION-2DPAGE 1030 942 <0.01 86 2D gel databases
RGD 7353 7349 0.01 58 Organism-specific databases
SagaList 389 388 <0.01 99 Organism-specific databases
SGD 6640 6537 0.01 60 Organism-specific databases
Siena-2DPAGE 102 102 <0.01 111 2D gel databases
SMART 141568 109276 0.28 19 Family and domain databases
SMR 345533 345533 0.67 12 3D structure databases
STRING 203356 203353 0.40 16 Protein-protein interaction databases
SubtiList 4191 4182 0.01 71 Organism-specific databases
SWISS-2DPAGE 1183 1183 <0.01 85 2D gel databases
TAIR 8907 8794 0.02 55 Organism-specific databases
TCDB 3282 3242 0.01 77 Protein family/group databases
TIGR 33886 33120 0.07 42 Genome annotation databases
TIGRFAMs 2888 2865 0.01 78 Family and domain databases
TubercuList 1578 1542 <0.01 81 Organism-specific databases
UCSC 48481 39493 0.09 39 Genome annotation databases
UniGene 92395 81394 0.18 24 Sequence databases
VectorBase 403 389 <0.01 97 Genome annotation databases
World-2DPAGE 507 507 <0.01 93 2D gel databases
WormBase 3809 3724 0.01 74 Organism-specific databases
WormPep 4045 3269 0.01 73 Organism-specific databases
Xenbase 3615 3542 0.01 76 Organism-specific databases
ZFIN 2506 2495 <0.01 80 Organism-specific databases
Total number of cross-referenced databases: 120
6. AMINO ACID COMPOSITION
6.1 Composition in percent for the complete database
Ala (A) 8.28 Gln (Q) 3.94 Leu (L) 9.67 Ser (S) 6.50
Arg (R) 5.54 Glu (E) 6.77 Lys (K) 5.86 Thr (T) 5.32
Asn (N) 4.05 Gly (G) 7.09 Met (M) 2.42 Trp (W) 1.07
Asp (D) 5.45 His (H) 2.27 Phe (F) 3.86 Tyr (Y) 2.91
Cys (C) 1.35 Ile (I) 5.99 Pro (P) 4.68 Val (V) 6.88
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
Legend: gray = aliphatic, red = acidic, green = small hydroxy,
blue = basic, black = aromatic, white = amide, yellow = sulfur
6.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
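The ordering in 6.2 follows directly from the percentages in 6.1. As a small Python sketch (the dictionary name is illustrative), sorting the twenty standard amino acids by decreasing frequency reproduces the list above:

```python
# Composition in percent for the complete database, from section 6.1.
composition = {
    "Ala": 8.28, "Gln": 3.94, "Leu": 9.67, "Ser": 6.50,
    "Arg": 5.54, "Glu": 6.77, "Lys": 5.86, "Thr": 5.32,
    "Asn": 4.05, "Gly": 7.09, "Met": 2.42, "Trp": 1.07,
    "Asp": 5.45, "His": 2.27, "Phe": 3.86, "Tyr": 2.91,
    "Cys": 1.35, "Ile": 5.99, "Pro": 4.68, "Val": 6.88,
}

# Most frequent first; no two residues share the same percentage, so the order is unambiguous.
ranked = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranked))
```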
7. MISCELLANEOUS STATISTICS
4444 entries are encoded on a mitochondrion, and 3549 are encoded on a plasmid.
12104 entries are encoded on a plastid,
of which 21 are encoded on apicoplasts,
11546 on chloroplasts,
44 on organellar chromatophores,
145 on cyanelles,
149 on non-photosynthetic plastids and
199 on unspecified types of plastid.
Number of entries with at least one sequence correction: 68193