ExPASy Home page Site Map Search ExPASy Contact us Swiss-Prot
Hosted by NCSC US. Mirror sites: Canada, China, Korea, Switzerland, Taiwan

         SWISS-PROT PROTEIN KNOWLEDGEBASE RELEASE 41.4 STATISTICS


1.  INTRODUCTION

Release 41.4 of 11-Apr-2003 of Swiss-Prot contains 124464 sequence entries,
comprising 45704421 amino acids abstracted from 104368 references. 

1905 sequences have been added since release 41, the sequence data of
178 existing entries has been updated, and the annotations of
10818 entries have been revised. The added sequences represent an
increase of about 2% in the number of entries.
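
The growth figure can be reproduced from the numbers quoted above. The
sketch below is illustrative only: it assumes that the percentage refers
to the number of entries and that no entries were deleted between
releases, neither of which is stated in the text. The names
growth_percent, TOTAL_ENTRIES and ADDED_ENTRIES are made up for this
example.

```python
# Figures quoted in the introduction for release 41.4.
TOTAL_ENTRIES = 124464   # sequence entries in release 41.4
ADDED_ENTRIES = 1905     # sequences added since release 41

def growth_percent(total, added):
    """Percentage growth relative to the previous release.

    Assumption: no entries were deleted, so the previous release
    held (total - added) entries.
    """
    previous = total - added
    return 100.0 * added / previous

# Prints the unrounded growth; the text quotes it rounded to a whole percent.
print(f"{growth_percent(TOTAL_ENTRIES, ADDED_ENTRIES):.2f}%")
```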

The growth of the database is summarized below.

   



2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database

   Ala (A) 7.72   Gln (Q) 3.92   Leu (L) 9.57   Ser (S) 6.98
   Arg (R) 5.24   Glu (E) 6.55   Lys (K) 5.97   Thr (T) 5.51
   Asn (N) 4.28   Gly (G) 6.90   Met (M) 2.37   Trp (W) 1.18
   Asp (D) 5.28   His (H) 2.26   Phe (F) 4.06   Tyr (Y) 3.12
   Cys (C) 1.59   Ile (I) 5.89   Pro (P) 4.87   Val (V) 6.66

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.01

   

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Ser, Gly, Val, Glu, Lys, Ile, Thr, Asp, Arg, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
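
The ordering above follows directly from the percentages in section 2.1.
As a minimal sketch (the dictionary and function names are illustrative,
not part of Swiss-Prot), the ranking can be rebuilt by sorting the
composition table in descending order:

```python
# Composition in percent, copied from section 2.1.
COMPOSITION = {
    "Ala": 7.72, "Gln": 3.92, "Leu": 9.57, "Ser": 6.98,
    "Arg": 5.24, "Glu": 6.55, "Lys": 5.97, "Thr": 5.51,
    "Asn": 4.28, "Gly": 6.90, "Met": 2.37, "Trp": 1.18,
    "Asp": 5.28, "His": 2.26, "Phe": 4.06, "Tyr": 3.12,
    "Cys": 1.59, "Ile": 5.89, "Pro": 4.87, "Val": 6.66,
}

def rank_by_frequency(composition):
    """Return residue names ordered from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)

print(", ".join(rank_by_frequency(COMPOSITION)))
```

No two residues share the same percentage in this release, so the order
does not depend on how ties are broken.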


3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of Swiss-Prot: 7830

   The twenty most represented species account for 52260 sequences, or 42%
   of the total number of entries.


   3.1 Table of the frequency of occurrence of species

        Species represented 1x: 3701
                            2x: 1210
                            3x:  631
                            4x:  406
                            5x:  261
                            6x:  256
                            7x:  198
                            8x:  147
                            9x:  124
                           10x:   66
                       11- 20x:  330
                       21- 50x:  248
                       51-100x:   91
                         >100x:  161
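
A table of this kind can be derived from a mapping of species to entry
counts. The sketch below is an assumption about how such a table could
be built, not the procedure actually used for the release; the sample
input is hypothetical. It also checks that the published bins add up to
the species total given at the start of this section.

```python
from collections import Counter

# Bin labels with inclusive bounds, mirroring table 3.1.
# An upper bound of None marks the open-ended last bin.
BINS = [(f"{n}x", n, n) for n in range(1, 11)] + [
    ("11-20x", 11, 20),
    ("21-50x", 21, 50),
    ("51-100x", 51, 100),
    (">100x", 101, None),
]

def occurrence_table(entries_per_species):
    """Count how many species fall into each bin."""
    table = Counter()
    for count in entries_per_species.values():
        for label, low, high in BINS:
            if count >= low and (high is None or count <= high):
                table[label] += 1
                break
    return table

# Hypothetical input: two species seen once, one seen 9294 times.
sample = {"species A": 1, "species B": 1, "Homo sapiens": 9294}
print(occurrence_table(sample))

# Values published in table 3.1, in row order.
PUBLISHED = [3701, 1210, 631, 406, 261, 256, 198, 147, 124, 66,
             330, 248, 91, 161]
print(sum(PUBLISHED))  # should match the total number of species
```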


   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1       9294  Homo sapiens (Human)
       2       6247  Mus musculus (Mouse)
       3       4893  Saccharomyces cerevisiae (Baker's yeast)
       4       4832  Escherichia coli
       5       3465  Rattus norvegicus (Rat)
       6       2406  Bacillus subtilis
       7       2321  Caenorhabditis elegans
       8       2130  Schizosaccharomyces pombe (Fission yeast)
       9       2092  Arabidopsis thaliana (Mouse-ear cress)
      10       1794  Drosophila melanogaster (Fruit fly)
      11       1773  Haemophilus influenzae
      12       1530  Methanococcus jannaschii
      13       1506  Escherichia coli O157:H7
      14       1394  Bos taurus (Bovine)
      15       1375  Mycobacterium tuberculosis
      16       1257  Salmonella typhimurium
      17       1064  Gallus gallus (Chicken)
      18       1006  Escherichia coli O6
      19        961  Shigella flexneri
      20        920  Synechocystis sp. (strain PCC 6803)
      21        880  Archaeoglobus fulgidus
      22        846  Pseudomonas aeruginosa
      23        845  Xenopus laevis (African clawed frog)
      24        823  Sus scrofa (Pig)
      25        796  Salmonella typhi
      26        716  Aquifex aeolicus
      27        705  Oryctolagus cuniculus (Rabbit)
      28        687  Mycoplasma pneumoniae
      29        681  Rhizobium meliloti (Sinorhizobium meliloti)
      30        624  Vibrio cholerae
      31        599  Treponema pallidum
      32        586  Mycobacterium leprae
      33        572  Buchnera aphidicola (subsp. Acyrthosiphon pisum) 
      34        560  Buchnera aphidicola (subsp. Schizaphis graminum)
      35        544  Yersinia pestis
      36        537  Helicobacter pylori (Campylobacter pylori)
      37        535  Rickettsia prowazekii
      38        526  Streptomyces coelicolor
      39        520  Helicobacter pylori J99 (Campylobacter pylori J99)
      40        498  Bacillus halodurans
      41        495  Methanobacterium thermoautotrophicum
      42        491  Zea mays (Maize)
      43        489  Pasteurella multocida
      44        486  Mycoplasma genitalium
      45        469  Anabaena sp. (strain PCC 7120)
      46        437  Lactococcus lactis (subsp. lactis) (Streptococcus lactis)
      47        421  Thermotoga maritima
      48        417  Oryza sativa (Rice)
      49        406  Borrelia burgdorferi (Lyme disease spirochete)
      50        405  Chlamydia trachomatis
      51        403  Rhizobium sp. (strain NGR234)
      52        394  Canis familiaris (Dog)
      53        394  Neisseria meningitidis (serogroup B)
      54        392  Chlamydia pneumoniae (Chlamydophila pneumoniae)
      55        390  Neisseria meningitidis (serogroup A)
      56        382  Chlamydia muridarum
      57        370  Pyrococcus horikoshii
      58        369  Caulobacter crescentus
      59        367  Listeria monocytogenes
      60        367  Clostridium acetobutylicum
      61        363  Pyrococcus abyssi
      62        363  Rhizobium loti (Mesorhizobium loti)
      63        362  Ralstonia solanacearum (Pseudomonas solanacearum)
      64        360  Listeria innocua
      65        358  Streptococcus pneumoniae
      66        356  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
      67        343  Nicotiana tabacum (Common tobacco)
      68        342  Xylella fastidiosa
      69        339  Deinococcus radiodurans
      70        335  Xanthomonas campestris (pv. campestris)
      71        333  Ovis aries (Sheep)
      72        327  Staphylococcus aureus (strain N315)
      73        326  Halobacterium sp. (strain NRC-1 / ATCC 700922 / JCM 11081)
      74        326  Campylobacter jejuni
      75        324  Staphylococcus aureus (strain Mu50 / ATCC 700699)
      76        319  Clostridium perfringens
      77        316  Dictyostelium discoideum (Slime mold)
      78        312  Corynebacterium glutamicum (Brevibacterium flavum)
      79        306  Sulfolobus solfataricus
      80        304  Staphylococcus aureus (strain MW2)
      81        299  Xanthomonas axonopodis (pv. citri)
      82        291  Streptococcus pyogenes
      83        289  Aeropyrum pernix
      84        289  Pisum sativum (Garden pea)
      85        285  Pyrococcus furiosus
      86        279  Staphylococcus aureus
      87        276  Brucella melitensis
      88        268  Bacteriophage T4
      89        266  Neurospora crassa
      90        265  Triticum aestivum (Wheat)
      91        264  Rickettsia conorii
      92        264  Candida albicans (Yeast)
      93        263  Thermoanaerobacter tengcongensis
      94        258  Hordeum vulgare (Barley)
      95        254  Vaccinia virus (strain Copenhagen)
      96        254  Methanosarcina mazei (Methanosarcina frisia)
      97        254  Glycine max (Soybean)
      98        253  Methanosarcina acetivorans
      99        251  Lycopersicon esculentum (Tomato)
     100        248  Rhodobacter capsulatus (Rhodopseudomonas capsulata)
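
The share of the database held by the twenty most represented species,
quoted at the start of section 3, can be checked against the first
twenty rows of this table. A minimal sketch (variable and function
names are illustrative):

```python
TOTAL_ENTRIES = 124464  # entries in release 41.4

# Frequencies of the first twenty species in table 3.2.
TOP_20 = [9294, 6247, 4893, 4832, 3465, 2406, 2321, 2130, 2092, 1794,
          1773, 1530, 1506, 1394, 1375, 1257, 1064, 1006, 961, 920]

def share_of_database(counts, total):
    """Return (number of sequences, percentage of all entries)."""
    subtotal = sum(counts)
    return subtotal, 100.0 * subtotal / total

sequences, percent = share_of_database(TOP_20, TOTAL_ENTRIES)
print(sequences, f"{percent:.1f}%")
```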


   
   3.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea            7218 (  6%)
    Bacteria          47533 ( 38%)
    Eukaryota         61240 ( 49%)
    Viruses            8473 (  7%)


   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                   9294 ( 15%)           (  7%)
     Other Mammalia         16152 ( 26%)           ( 13%)
     Other Vertebrata        5824 ( 10%)           (  5%)
     Viridiplantae           9814 ( 16%)           (  8%)
     Fungi                   9364 ( 15%)           (  8%)
     Insecta                 3394 (  6%)           (  3%)
     Nematoda                2535 (  4%)           (  2%)
     Other                   4863 (  8%)           (  4%)
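
Both percentage columns can be recomputed from the sequence counts. The
sketch below assumes the published figures are rounded to the nearest
whole percent, which is consistent with every row above but is not
stated in the text; the names used are illustrative.

```python
TOTAL_ENTRIES = 124464  # entries in release 41.4

# Sequence counts within Eukaryota, copied from the table above.
EUKARYOTA = {
    "Human": 9294,
    "Other Mammalia": 16152,
    "Other Vertebrata": 5824,
    "Viridiplantae": 9814,
    "Fungi": 9364,
    "Insecta": 3394,
    "Nematoda": 2535,
    "Other": 4863,
}

def percent_columns(categories, database_total):
    """Map each category to (% of its group, % of the whole database)."""
    group_total = sum(categories.values())
    return {
        name: (round(100 * n / group_total), round(100 * n / database_total))
        for name, n in categories.items()
    }

for name, (of_group, of_db) in percent_columns(EUKARYOTA, TOTAL_ENTRIES).items():
    print(f"{name:<18}{of_group:>3}% {of_db:>3}%")
```

The category counts sum to 61240, the Eukaryota total in the kingdom
table, and the four kingdom counts sum to the 124464 entries of the
release.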


   3.4  Annotation of high-priority organisms


   


   



4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50    2287             1001-1100     1141
                 51- 100    8563             1101-1200      812
                101- 150   12707             1201-1300      573
                151- 200   11465             1301-1400      387
                201- 250   11626             1401-1500      312
                251- 300   10218             1501-1600      217
                301- 350   10234             1601-1700      168
                351- 400    9944             1701-1800      120
                401- 450    7585             1801-1900      129
                451- 500    6631             1901-2000      108
                501- 550    5184             2001-2100       59
                551- 600    3439             2101-2200       97
                601- 650    2786             2201-2300       99
                651- 700    2031             2301-2400       57
                701- 750    1777             2401-2500       56
                751- 800    1486             >2500          330
                801- 850    1122
                851- 900    1163
                901- 950     830
                951-1000     712

   


   The average sequence length in Swiss-Prot is 367 amino acids.

   The shortest sequence is  GRWM_HUMAN (P01157):     3 amino acids.
   The longest sequence is   SNE1_HUMAN (Q8NF91):  8797 amino acids.
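
The figures in this section can be cross-checked against totals given
elsewhere in this document. The sketch below assumes that the average
length is the total number of amino acids divided by the total number
of entries (fragments included); the text does not say how the average
was computed, but this reading reproduces the quoted value. Names are
illustrative.

```python
TOTAL_ENTRIES = 124464     # from section 1
TOTAL_RESIDUES = 45704421  # amino acids, from section 1
FRAGMENTS = 8009           # from section 7

# Size-distribution counts, left column then right column of the table.
SIZE_BINS = [
    2287, 8563, 12707, 11465, 11626, 10218, 10234, 9944, 7585, 6631,
    5184, 3439, 2786, 2031, 1777, 1486, 1122, 1163, 830, 712,
    1141, 812, 573, 387, 312, 217, 168, 120, 129, 108,
    59, 97, 99, 57, 56, 330,
]

def average_length(residues, entries):
    """Mean sequence length in amino acids."""
    return residues / entries

print(round(average_length(TOTAL_RESIDUES, TOTAL_ENTRIES)))

# The table excludes fragments, so its bins should cover every
# non-fragment entry.
print(sum(SIZE_BINS), TOTAL_ENTRIES - FRAGMENTS)
```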


5.  JOURNAL CITATIONS

   Note: the following citation statistics reflect the number of distinct
         journal citations.

   Total number of journals cited in this release of Swiss-Prot: 1325


   5.1 Table of the frequency of journal citations

        Journals cited 1x:  498
                       2x:  167
                       3x:   89
                       4x:   60
                       5x:   49
                       6x:   41
                       7x:   26
                       8x:   27
                       9x:   21
                      10x:   10
                  11- 20x:  102
                  21- 50x:   99
                  51-100x:   40
                    >100x:   96


   5.2  List of the most cited journals in Swiss-Prot

   Nb    Citations   Journal name
   --    ---------   -------------------------------------------------------------
    1         9201   Journal of Biological Chemistry
    2         5039   Proceedings of the National Academy of Sciences of the U.S.A.
    3         3641   Nucleic Acids Research
    4         3631   Journal of Bacteriology
    5         3401   Gene
    6         2675   FEBS Letters
    7         2614   Biochemical and Biophysical Research Communications
    8         2436   European Journal of Biochemistry
    9         2393   Biochemistry
   10         2180   The EMBO Journal
   11         2056   Nature
   12         2033   Biochimica et Biophysica Acta
   13         1829   Journal of Molecular Biology
   14         1764   Genomics
   15         1598   Cell
   16         1560   Molecular and Cellular Biology
   17         1257   Biochemical Journal
   18         1167   Science
   19         1124   Plant Molecular Biology
   20         1118   Molecular and General Genetics
   21         1069   Molecular Microbiology
   22          858   Journal of Biochemistry
   23          832   Virology
   24          754   Human Molecular Genetics
   25          702   Journal of Cell Biology
   26          652   Nature Genetics
   27          600   Journal of Virology
   28          593   Plant Physiology
   29          590   Genes and Development
   30          582   Human Mutation
   31          554   Oncogene
   32          542   The American Journal of Human Genetics
   33          531   Infection and Immunity
   34          530   Yeast
   35          520   Journal of Immunology
   36          498   Journal of General Virology
   37          474   Archives of Biochemistry and Biophysics
   38          456   Structure
   39          447   FEMS Microbiology Letters
   40          433   Microbiology
   41          408   Development
   42          383   Human Genetics
   43          380   Nature Structural Biology
   44          377   Current Genetics
   45          352   Genetics
   46          350   Molecular and Biochemical Parasitology
   47          336   Blood
   48          318   Applied and Environmental Microbiology
   49          314   Journal of Clinical Investigation
   50          303   Molecular Endocrinology
   51          284   DNA and Cell Biology
   52          284   Mammalian Genome
   53          283   Journal of Molecular Evolution
   54          282   Protein Science
   55          279   Developmental Biology
   56          271   Biological Chemistry Hoppe-Seyler
   57          256   Cancer Research
   58          249   Journal of Experimental Medicine
   59          248   Neuron
   60          244   Immunogenetics
   61          244   Mechanisms of Development
   62          231   Endocrinology
   63          229   Journal of General Microbiology
   64          223   DNA Sequence
   65          219   Acta Crystallographica, Section D
   66          215   The Plant Cell
   67          213   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
   68          212   Molecular Biology of the Cell
   69          210   Journal of Cell Science
   70          193   Molecular Biology and Evolution
   71          192   Brain Research. Molecular Brain Research
   72          189   The Plant Journal
   73          185   Journal of Neurochemistry
   74          183   Journal of Neuroscience
   75          161   Comparative Biochemistry and Physiology
   76          160   Cytogenetics and Cell Genetics
   77          156   DNA
   78          155   The Journal of Clinical Endocrinology and Metabolism
   79          155   Bioscience, Biotechnology, and Biochemistry
   80          150   Molecular Pharmacology
   81          145   Toxicon
   82          144   Antimicrobial Agents and Chemotherapy
   83          141   American Journal of Physiology
   84          131   Biochimie
   85          127   Bioorganicheskaia Khimiia
   86          125   Proteins
   87          125   Virus Research
   88          124   DNA Research
   89          123   Molecular Plant-Microbe Interactions
   90          119   Hemoglobin
   91          117   Peptides
   92          115   Current Biology
   93          114   Agricultural and Biological Chemistry
   94          112   Journal of Investigative Dermatology
   95          111   Molecular and Cellular Endocrinology
   96          106   Genome Research
   97          100   Molecular Cell


6.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
average number of such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                     236510              1.90
   Journal                          202609    114005    1.63
   Submitted to EMBL/GenBank/DDBJ    31371     26413    0.25
   Unpublished observations            535       531   <0.01
   Plant Gene Register                 468       458   <0.01
   Submitted to Swiss-Prot             468       466   <0.01
   Book citation                       460       450   <0.01
   Thesis                              192       190   <0.01
   Submitted to other databases        190       189   <0.01
   Unpublished results                 123       121   <0.01
   Patent                               92        91   <0.01
   Worm Breeder's Gazette                2         2   <0.01

Comments (CC)                       414220              3.33
   SIMILARITY                       120602    105431    0.97
   FUNCTION                          78373     77063    0.63
   SUBCELLULAR LOCATION              56267     56267    0.45
   CATALYTIC ACTIVITY                40450     38045    0.32
   SUBUNIT                           34492     34492    0.28
   PATHWAY                           18129     17598    0.15
   TISSUE SPECIFICITY                13929     13929    0.11
   COFACTOR                          12482     12482    0.10
   MISCELLANEOUS                      7944      7312    0.06
   PTM                                7243      6648    0.06
   ALTERNATIVE PRODUCTS               4037      4037    0.03
   DOMAIN                             3613      3320    0.03
   INDUCTION                          3606      3606    0.03
   CAUTION                            3444      3261    0.03
   DEVELOPMENTAL STAGE                3395      3395    0.03
   DISEASE                            2265      1877    0.02
   ENZYME REGULATION                  1766      1766    0.01
   MASS SPECTROMETRY                   896       813    0.01
   DATABASE                            842       775    0.01
   POLYMORPHISM                        348       339   <0.01
   BIOTECHNOLOGY                        50        50   <0.01
   PHARMACEUTICAL                       47        47   <0.01

Features (FT)                       664824              5.34
   DOMAIN                            97486     29473    0.78
   TRANSMEM                          78072     17192    0.63
   CONFLICT                          48130     16912    0.39
   CARBOHYD                          45883     11232    0.37
   DISULFID                          42140     10973    0.34
   TURN                              39177      2956    0.31
   METAL                             38095     10271    0.31
   STRAND                            36304      2644    0.29
   HELIX                             27742      2845    0.22
   ACT_SITE                          24809     15515    0.20
   VARIANT                           23819      4458    0.19
   CHAIN                             23600     19302    0.19
   REPEAT                            22771      3776    0.18
   NP_BIND                           15790     11115    0.13
   SIGNAL                            14915     14913    0.12
   MOD_RES                           13433      7558    0.11
   NON_TER                           10331      7882    0.08
   BINDING                            8202      6348    0.07
   ZN_FING                            7875      2806    0.06
   VARSPLIC                           7174      3336    0.06
   SITE                               6444      4425    0.05
   INIT_MET                           5628      5591    0.05
   PROPEP                             4748      4059    0.04
   MUTAGEN                            4485      1386    0.04
   DNA_BIND                           4284      4030    0.03
   CA_BIND                            4051      1150    0.03
   LIPID                              2955      2400    0.02
   TRANSIT                            2617      2596    0.02
   PEPTIDE                            2534      1017    0.02
   NON_CONS                            806       413    0.01
   UNSURE                              290       123   <0.01
   SE_CYS                              116        78   <0.01
   THIOETH                              94        32   <0.01
   THIOLEST                             24        24   <0.01

Cross-references (DR)              1039217              8.35
   EMBL                             234705    118155    1.89
   InterPro                         197565    105320    1.59
   Pfam                             134362    100588    1.08
   PROSITE                          107203     67869    0.86
   PIR                               47040     35734    0.38
   PRINTS                            39696     35082    0.32
   SMART                             39012     29665    0.31
   HSSP                              38193     38193    0.31
   GO                                36337     12037    0.29
   TIGRFAMs                          31733     29373    0.25
   ProDom                            30371     29067    0.24
   HAMAP                             25218     25122    0.20
   PDB                               11817      3567    0.09
   TIGR                              11289     11243    0.09
   MIM                                8235      7136    0.07
   Genew                              7947      7899    0.06
   MGD                                5909      5894    0.05
   SGD                                4937      4883    0.04
   EcoGene                            4228      4226    0.03
   MEROPS                             3325      3231    0.03
   TRANSFAC                           2474      2224    0.02
   WormPep                            2466      2277    0.02
   SubtiList                          2366      2365    0.02
   FlyBase                            2275      2212    0.02
   GeneDB_SPombe                      2145      2115    0.02
   TubercuList                        1404      1367    0.01
   StyGene                            1213      1210    0.01
   SWISS-2DPAGE                        810       809    0.01
   ListiList                           728       673    0.01
   Leproma                             590       586   <0.01
   Gramene                             414       412   <0.01
   MaizeDB                             405       401   <0.01
   HIV                                 370       354   <0.01
   REBASE                              358       353   <0.01
   ECO2DBASE                           351       299   <0.01
   DictyDb                             319       316   <0.01
   GlycoSuiteDB                        259       259   <0.01
   ZFIN                                229       229   <0.01
   PHCI-2DPAGE                         211       211   <0.01
   MypuList                            135       135   <0.01
   Aarhus/Ghent-2DPAGE                 128        98   <0.01
   Siena-2DPAGE                        104       104   <0.01
   HSC-2DPAGE                           85        85   <0.01
   PhosSite                             53        53   <0.01
   SagaList                             52        52   <0.01
   COMPLUYEAST-2DPAGE                   50        50   <0.01
   PMMA-2DPAGE                          47        47   <0.01
   Maize-2DPAGE                         39        39   <0.01
   ANU-2DPAGE                           15        15   <0.01
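
The "Average per entry" column is consistent with dividing each total
by the number of entries in the release (124464), not by the number of
entries that contain the line. The sketch below reproduces the column
under that assumption; the rule that averages rounding to 0.00 are
printed as "<0.01" is inferred from the table, not documented, and the
function name is illustrative.

```python
TOTAL_ENTRIES = 124464  # entries in release 41.4

def format_average(total_lines, entries=TOTAL_ENTRIES):
    """Format lines-per-entry the way the table appears to."""
    average = total_lines / entries
    # Inferred rule: anything that would round to 0.00 is shown as "<0.01".
    if average < 0.005:
        return "<0.01"
    return f"{average:.2f}"

# A few rows from the table: (line type, total number of lines).
ROWS = [
    ("References (RL)", 236510),
    ("Journal", 202609),
    ("Comments (CC)", 414220),
    ("Features (FT)", 664824),
    ("Cross-references (DR)", 1039217),
    ("DATABASE", 842),
    ("Leproma", 590),
]

for name, total in ROWS:
    print(f"{name:<24}{total:>9}  {format_average(total):>6}")
```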


7.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in Swiss-Prot: 165729

Total number of entries encoded on a chloroplast: 3185
Total number of entries encoded on a mitochondrion: 2386
Total number of entries encoded on a cyanelle: 145
Total number of entries encoded on a plasmid: 2643

Number of fragments: 8009
Number of additional sequences encoded on splice variants: 5845

