There are 994 sequences for which at least one Pfam (or Pfam B) domain can be found through association by BLAST.
These 994 sequences have a total of 3110 domains according to Pfam, just over three domains per protein.
How you calculate the percentage of broken domains depends on your point of view.
Here we have provided as many calculations as possible.
The 3110 domains are split by N-terminal trimming, C-terminal trimming, deletion or insertion on 485 occasions, which
computes to 16.1% of the domains. Of the 994 sequences with Pfam domains, 42.5% (423 sequences) have at least one broken domain.
Fifty-three sequences have two broken domains, and in three sequences three of the domains have been split.
However, many of the sequences are fragments of other, larger sequences. There is
weak evidence for their extension. In the absence of further evidence, these sequences
should be treated not as separate, whole sequences, but as if they are identical to
the larger sequences. When these sequence fragments are removed, only 304 sequences
contain split domains. This is 39.5% of the sequences with Pfam domains, because
in this case there are 772 different sequences with Pfam domains (not 994).
When only those isoforms that vary in sequence are considered (identical
sequences and loci with just a single representative are ignored), this percentage rises to 46.5% (271 sequences with broken domains out of 584 non-identical sequence variants).
Another way to calculate the percentage of sequences with broken domains would be
to look at just those sequences that are splice variants of the primary sequence. There are 210 loci with at least one splice variant that can be considered as having a
distinct sequence from the primary sequence. Discounting the primary sequence in each
locus, there are 373 non-identical splice variants in total, of which 261 have broken
domains (in 10 of the loci with splice variants that differ in sequence, all the
sequences, including the primary sequences, split a domain). So 70% of the splice
variants that differ in sequence and are recorded in the 1% of the human genome we
have looked at split at least one Pfam domain.
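The sequence-level percentages above differ only in the choice of denominator, which can be made explicit with a short calculation. The sketch below is illustrative only: the counts are the ones quoted in the text, while the labels and the function name `percent_broken` are our own and hypothetical. The printed values may differ in the last decimal place from the rounded figures quoted above.

```python
# Counts quoted in the text, as (sequences with a broken domain, sequences considered).
# The dictionary labels are illustrative names, not terms defined by the study.
counts = {
    "all sequences with Pfam domains": (423, 994),
    "fragments removed": (304, 772),
    "non-identical sequence variants": (271, 584),
    "splice variants, primary sequence excluded": (261, 373),
}


def percent_broken(broken, total):
    """Return the percentage of sequences that have at least one broken domain."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * broken / total


# Print one line per denominator so the effect of each choice is visible.
for label, (broken, total) in counts.items():
    print(f"{label}: {broken}/{total} = {percent_broken(broken, total):.1f}%")
```

Each row uses a stricter definition of which sequences count, so the numerator and the denominator both shrink, and the percentage depends on which of the two shrinks faster.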
Of course, there are caveats. In 30 of the 304 cases of domain splitting the insertion or deletion is four residues or fewer. Some of the domains are Pfam B domains, and so less
well defined, and many of these split
domains do not have equivalent PDB structures, so it is impossible to know whether it would make sense to split the Pfam-defined domain in these cases.
However, there are examples of domains being split where we do have structural
information.
Separate work using comparisons with the nearest PDB structure suggests that at least 50% of these splits will not allow the domain to refold as in the PDB structural template.