Expressed Sequence Tag (EST).
A copy of a short segment of RNA (or its reverse complement), usually towards its 3' end. ESTs can be produced and sequenced cheaply and rapidly, because each one is read only once. An EST is around 400 bp long and, in the absence of alternative splicing, is supposed to contain enough information to identify the RNA molecule from which it originated.
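The identification idea can be sketched in a few lines. This is a minimal illustration, not how dbEST or any real pipeline works: it assumes error-free reads written in the DNA alphabet (A, C, G, T), and it uses exact substring search on both strands where a real tool would use error-tolerant alignment. The transcript names and sequences are made up.

```python
# Complement table for the DNA alphabet (ESTs are read from cDNA).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def identify(est: str, transcripts: dict[str, str]) -> list[str]:
    """Names of transcripts containing the EST on either strand.

    Exact matching is a simplifying assumption; sequencing errors
    would make it fail on real data.
    """
    rc = reverse_complement(est)
    return [name for name, t in transcripts.items() if est in t or rc in t]


# Hypothetical toy transcripts, far shorter than real mRNAs.
transcripts = {
    "geneA": "ATGGCCATTGTAATGGGCCGC",
    "geneB": "ATGCCCTTTAAAGGGTTTCCC",
}
print(identify("GCCATTGTA", transcripts))  # forward-strand hit
print(identify("AAACCCTTT", transcripts))  # hit via reverse complement
```

An EST drawn from an exon shared by two splice variants would match both, which is exactly why the identification argument only holds in the absence of alternative splicing.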
A public database of ESTs, dbEST, contains more than 2 × 10^6 ESTs.
Problems with this naïve approach:
- Alternative splicing is more common than was thought.
- Sequence quality deteriorates along the EST; sometimes the dependable bases are few.
- It is not known how many types of RNA exist.
- ESTs may also come from hnRNA (instead of the intended mRNA), and may well be from a non-coding region.
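To make the point about deteriorating quality concrete, here is a small sketch that trims the unreliable tail of a read. It assumes Phred-style per-base quality scores (Q = -10 log10 of the error probability, so Q20 means a 1% chance the base call is wrong) and a cutoff of 20; the text does not say how quality is measured, and both the scoring scheme and the threshold are assumptions for illustration.

```python
def trim_low_quality_tail(seq: str, quals: list[int], threshold: int = 20) -> str:
    """Drop trailing bases whose quality is below `threshold`.

    Assumes Phred-style scores; the default threshold of 20 is an
    illustrative convention, not a fixed rule.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must be the same length")
    end = len(seq)
    # Walk back from the 3' end of the read while the bases are unreliable.
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return seq[:end]


def dependable_bases(quals: list[int], threshold: int = 20) -> int:
    """Count bases at or above the quality threshold."""
    return sum(1 for q in quals if q >= threshold)


# Hypothetical read whose quality falls off towards the end.
print(trim_low_quality_tail("ACGTAC", [40, 38, 30, 25, 12, 8]))
print(dependable_bases([40, 38, 30, 25, 12, 8]))
```

Trimming only the tail reflects the observation that quality degrades along the read; a read whose scores are low throughout would be left with few or no bases.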
At work, I am part of Compugen's LEADS project, which aims (among other things) to reconstruct the original RNA from a given set of ESTs. This is nontrivial.
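One way to see why reconstruction is nontrivial is a textbook greedy overlap merge, sketched below. This is not the LEADS method, which is not described here; it is a generic illustration that assumes error-free reads, all on the same strand, with no splice variants. Even under those assumptions the reads overlap spuriously (the last read below shares "ATG" with the start of the first), and the greedy choice of the longest overlap is what avoids the wrong merge.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(ests: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(ests))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical overlapping reads from one made-up transcript.
print(greedy_assemble(["ATGGCCATT", "CCATTGTAATG", "GTAATGGGCCGC"]))
```

With real ESTs, sequencing errors, mixed strands, splice variants, and reads from repeated or paralogous sequence all break the assumptions this sketch relies on.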