Everything2

How many bits are in the human genome?

created by The Alchemist

(idea) by The Alchemist, Thu Nov 02 2000 at 20:10:37

This question has a simple answer and a complicated one. The calculation starts like this: there are about 3 billion base pairs, each of which can be one of four bases (A, C, G, T). So biology uses base 4, not base 2: each base carries log2(4) = 2 bits, and the result is around the 6 billion bit mark (roughly 750 megabytes), comparable to the space on a hard drive.
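The arithmetic can be checked with a few lines of shell. This is a minimal sketch: `bits_per_symbol` and `genome_bits` are hypothetical helper names, and the script assumes a shell with 64-bit integer arithmetic (bash, or a modern dash), because 6 billion does not fit in 32 bits.

```shell
#!/bin/sh
# Bits needed per symbol for an alphabet of n letters: the smallest b with 2^b >= n.
bits_per_symbol() {
  n=$1
  b=0
  while [ $((1 << b)) -lt "$n" ]; do
    b=$((b + 1))
  done
  echo "$b"
}

# Raw bits for a sequence of `length` symbols drawn from an `alphabet`-letter alphabet.
genome_bits() {
  length=$1
  alphabet=$2
  echo $((length * $(bits_per_symbol "$alphabet")))
}

genome_bits 3000000000 4                                # prints 6000000000
echo $(( $(genome_bits 3000000000 4) / 8 / 1000000 ))   # prints 750 (megabytes)
```

This counts raw storage only; it says nothing about how much of that storage is meaningful information, which is the harder question discussed below.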

However, this is where the analogy with computers breaks down. The genome's 'bits' can be further modified by methylation and acetylation. Considering only methyl groups (a base is either methylated or not), each base can take on eight, not four, possible forms. That would seem to add one bit per base (3 bits instead of 2, so about 9 billion bits in total), but it doesn't work like that. Such meta-data is used for other purposes, like chromatin assembly. Simply put, this is a little like data archiving or compression, although the reality is much more complex.

Indeed, how much information any eukaryotic genome contains is a difficult question. On the analogous hard drive, how much of the data is information? Some bits represent files and others are empty, but there are also files that have been 'deleted' and partially overwritten. Are these remnants pieces of information, or junk littering the drive? Oddly, the same situation is found in the genome: repetitive elements, old viruses, 'deleted' genes and so on. Nature is much more parsimonious than we are and finds a use for everything, even so-called junk.

What can you do with junk? Again, the computer analogy falls short here. DNA might seem like a digital medium, but it has a definite physical basis. A particular sequence of bases has a characteristic shape, which can be subtly different from that of another sequence. This, in turn, might affect how the molecule coils in the local region and perhaps in neighbouring ones. More importantly, noncoding bases can be methylated, which allows them to alter chromatin structure.


(idea) by -brazil-, Mon May 14 2001 at 10:28:25

The Alchemist's comparison of the genome with a hard drive is a very good metaphor, but it actually goes much further: how much of the information resides on the disk itself, and how much in the drive mechanics, electronics and firmware?

Recent research suggests that a significant part of the information that makes our bodies work the way they do is not contained in the DNA itself, but in the metabolism of the cells that contain it. That metabolism is not only necessary to interpret the information, but actually modifies it and offers "added value" (like the cache in a hard drive).


(idea) by ariels, Mon May 14 2001 at 14:01:02

(Unlike ariels,) The Alchemist and -brazil- above say important, interesting things.

I can only offer you this. The human genome is ~3Gbp long, or 6Gbits. Luckily, I have a copy (HGP output!) stashed away somewhere...

% cat ./NCBI/DNAinput*[^x] | gzip -c > /home/ariels/abc.gz
% gzip -l /home/ariels/abc.gz
compressed  uncompr. ratio uncompressed_name
875526472 2940172494  70.2% ../../abc
That is really pathetic. Even allowing for the housekeeping information we store in the files alongside the genomic sequence, I'd expect a ratio of at least 75%, i.e. compressing to 25% of the original size, which is what straightforward 2-bit-per-bp scrunching of one-byte-per-base text achieves.
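For reference, here is roughly what "straightforward 2-bit per bp scrunching" means, as a shell/awk sketch. `pack2bit` is a hypothetical helper, not part of the HGP or NCBI tooling. It assumes bare upper-case A/C/G/T input (no FASTA headers, no N or lower-case soft-masked bases) and prints hex rather than raw bytes so the output stays readable. Every four letters of input text (four bytes) become one packed byte (two hex digits), which is where the 25% figure comes from.

```shell
#!/bin/sh
# Hypothetical helper: pack A/C/G/T into 2 bits each, four bases per byte, printed as hex.
# Input may span several lines; packing continues across line breaks.
pack2bit() {
  awk '
    BEGIN { code["A"] = 0; code["C"] = 1; code["G"] = 2; code["T"] = 3 }
    {
      for (i = 1; i <= length($0); i++) {
        b = substr($0, i, 1)
        if (!(b in code)) continue      # silently skip anything that is not upper-case ACGT
        byte = byte * 4 + code[b]       # shift left by 2 bits and add the new base
        n++
        if (n == 4) { printf "%02x", byte; byte = 0; n = 0 }
      }
    }
    END {
      if (n > 0) {                      # zero-pad a final partial byte
        while (n < 4) { byte = byte * 4; n++ }
        printf "%02x", byte
      }
      printf "\n"
    }'
}

echo ACGT | pack2bit           # prints 1b
echo ACGTACGTTTTT | pack2bit   # prints 1b1bff
```

Because the last byte is zero-padded and skipped characters leave no trace, a real format would also have to store the sequence length and the positions of N runs and masked regions, which is part of the "housekeeping information" mentioned above.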

bzip2 doesn't do much better: it gives a compressed size of 803381365 bytes, for a ratio of ~72.7%, still short of the minimal decent ratio of 75%!
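The ratios quoted above follow the convention of gzip -l: space saved as a percentage of the uncompressed size, i.e. (1 - compressed/uncompressed) × 100. A small awk call reproduces them; the `ratio` function name is only for illustration, and awk is used because plain sh has no floating point.

```shell
#!/bin/sh
# Space saved as a percentage of the uncompressed size: (1 - compressed/uncompressed) * 100.
ratio() {
  awk -v c="$1" -v u="$2" 'BEGIN { printf "%.1f%%\n", (1 - c / u) * 100 }'
}

ratio 875526472 2940172494   # gzip:  70.2%
ratio 803381365 2940172494   # bzip2: 72.7%
ratio 1 4                    # ideal 2-bit packing of one-byte-per-base text: 75.0%
```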


Everything 2 is brought to you by the letter C and The Everything Development Company