All posts by Wei-Yin Chen

Taking apart a door under the deck

I was cleaning up the broken glass left by a failed glass-cutting experiment on a junk door that had unwisely been stored under my deck.

The door still had a lot of broken glass on it, so I had to dismantle the part of the door where most of the glass was trapped.

I used a 35,000 rpm Dremel tool plus a jigsaw. I wish I had a reciprocating saw, but these two tools got a lot of the job done. If I can find a reciprocating saw with a metal-cutting blade, then I can take all sorts of things apart. BWAHAHA!

Table of Contents

250 Million Year-Old Bacteria Shows Intraspecies Molecular “clock” has stopped

Steganography in Biology

RNA Antennas from nucleus to cytoplasm

Protein structure prediction via sequence comparisons

non coding DNA may create RNA computers and factories in nucleus

Ribosome Assembly Takes Place Partly Inside Cell

linc RNA serves as cellular air-traffic controllers

3mers from Synthetic Genomes vs. 3mers from Real Genomes as Viewed in Skittle

Test Download DNA Polyconstraint Viewer Mockup

Short and Simple Introduction to 3mers and DFTs in Engineering and DNA

Exons in Eukaryotes Identified by Improved 3mer Detector

Mission of Bio Languages Website

Protein structure prediction via sequence comparisons

This isn’t the clearest article, but it shows that many mysteries can be unlocked by comparing proteins and DNA between species. It suggests biology is optimized for scientific discovery and reverse engineering.

At the moment, we have a large list of archaea to analyze, but have switched priorities due to some extremely exciting new ideas regarding protein function prediction based on machine learning techniques (which sounds AI-cool, but is more statistics-cool) which we have developed in house, and on revised proteome data for Mouse and Human.
We have decided to re-run this new mouse and human data through our domain prediction pipeline and send results to the grid in order to get the best possible protein structure data. With improvements and updates to our pre- and post-processing methods and increased sampling on the grid (we’re now folding 100,000 structures per domain, up from 30,000!), we will be able to approach the problem of protein structure prediction in a novel and potentially game-changing way with the best data available.

250 million year-old bacteria shows INTRASpecies molecular “clock” has stopped

Paradox of Ancient Bacterium

The isolation of microorganisms from ancient materials and the verification that they are as old as the materials from which they were isolated continue to be areas of controversy. Almost without exception, bacteria isolated from ancient material have proven to closely resemble modern bacteria at both morphological and molecular levels. This fact has historically been used by critics to argue that these isolates are not ancient but are modern contaminants introduced either naturally after formation of the surrounding material (for further details, see Hazen and Roeder 2001 and the reply by Powers, Vreeland, and Rosenzweig 2001) or because of flaws in the methodology of sample isolation (reviewed recently in Vreeland and Rosenzweig 2002). Such criticism has been addressed experimentally by the development of highly rigorous protocols for sample selection, surface sterilization, and contamination detection and control procedures. Using the most scrupulous and well-documented sampling procedures and contamination-protection techniques reported to date, Vreeland, Rosenzweig, and Powers (2000) reported the isolation of a sporeforming bacterium, Bacillus strain 2-9-3, from a brine inclusion within a halite crystal recovered from the 250-Myr-old Permian Salado Formation in Carlsbad, NM.

The evidence presented here clearly indicates that isolate 2-9-3 should be considered a strain of S. marismortui under the established standards of 16S rRNA systematics, which state that isolates sharing >97% identity should be considered as the same species (Stackebrandt and Goebel 1994). But does such a close relationship to modern bacteria mean that isolate 2-9-3 is itself modern? The answer to this question must be sought by resolving what appears to be an increasingly common paradox. We have a large set of rigorous geological and microbiological data which can be interpreted in favor of the antiquity of these organisms, and an equally large set of rigorously obtained molecular data which can be interpreted in favor of their modernity. As it stands, our present molecular work can neither confirm nor disprove the age of isolate 2-9-3.
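To make the >97% rule quoted above concrete, here is a minimal Python sketch of how percent identity between two already-aligned 16S sequences might be computed and compared against that threshold. The function names and the toy sequences are my own illustrations, not anything from the paper, and real studies work from proper alignments of near full-length 16S rRNA genes rather than short made-up strings.

```python
# Threshold taken from the quoted passage (Stackebrandt and Goebel 1994).
SAME_SPECIES_THRESHOLD = 97.0


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns.

    Assumes the two sequences are already aligned (equal length, '-' for gaps).
    Columns where either sequence has a gap are skipped, which is one common
    convention among several.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def same_species(seq_a: str, seq_b: str,
                 threshold: float = SAME_SPECIES_THRESHOLD) -> bool:
    """Apply the >threshold rule to two aligned sequences."""
    return percent_identity(seq_a, seq_b) > threshold


# Toy data: a 100-base reference and a copy with two substitutions.
reference = "ACGT" * 25
isolate = "C" + reference[1:50] + "A" + reference[51:]

print(percent_identity(reference, isolate))
print(same_species(reference, isolate))
```

Note that a result like this only places an isolate within a species. As the authors themselves say, it cannot settle how old the isolate is.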

Steganography in biology


Finally, we come to the research theme that I find most intriguing. Steganography, if you look in the dictionary, is an archaism that was subsequently replaced by the term “cryptography.” Steganography literally means “covered writing.” With the rise of digital computing, however, the term has taken on a new life. Steganography belongs to the field of digital data embedding technologies (DDET), which also include information hiding, steganalysis, watermarking, embedded data extraction, and digital data forensics. Steganography seeks efficient (that is, high data rate) and robust (that is, insensitive to common distortions) algorithms that can embed a high volume of hidden message bits within a cover message (typically imagery, video, or audio) without their presence being detected. Conversely, steganalysis seeks statistical tests that will detect the presence of steganography in a cover message.

Consider now the following possibility: What if organisms instantiate designs that have no functional significance but that nonetheless give biological investigators insight into functional aspects of organisms? Such second-order designs would serve essentially as an “operating manual,” of no use to the organism as such but of use to scientists investigating the organism. Granted, this is a speculative possibility, but there are some preliminary results from the bioinformatics literature that bear it out in relation to the protein-folding problem (such second-order designs appear to be embedded not in a single genome but in a database of homologous genomes from related organisms).

While it makes perfect sense for a designer to throw in an “operating manual” (much as automobile manufacturers include operating manuals with the cars they make), this possibility makes no sense for blind material mechanisms, which cannot anticipate scientific investigators. Research in this area would consist in constructing statistical tests to detect such second-order designs (in other words, steganalysis). Should such second order designs be discovered, the next step would be to seek algorithms for embedding these second-order designs in the organisms. My suspicion is that biological systems do steganography much better than we, and that steganographers will learn a thing or two from biology, though not because natural selection is so clever, but because the designer of these systems is so adept at steganography.

Such second-order steganography would, in my view, provide decisive confirmation for ID. Yet even if it doesn’t pan out, first-order steganography (i.e., the embedding of functional information useful to the organism rather than to a scientific investigator) could also provide strong evidence for ID. For years now evolutionary biologists have told us that the bulk of genomes is junk and that this is due to the sloppiness of the evolutionary process. That is now changing. For instance, Amy Pasquinelli at UCSD, in commenting on long stretches of seemingly barren DNA sequences, asks us to reconsider the contents of such junk DNA sequences in the light of recent reports that a new class of non-coding RNA genes are scattered, perhaps densely, throughout these animal genomes. (microRNAs: Deviants no Longer. Trends in Genetics 18(4) (4 April 2002): 171-3.) ID theorists should be at the forefront in unpacking the information contained within biological systems. If these systems are designed, we can expect the information to be densely packed and multi-layered (save where natural forces have attenuated the information). Dense, multi-layered embedding of information is a prediction of ID.

It’s time to bring this talk to an end. I close with two images (both from biology) and a final quote. The images describe two perspectives on how the scientific debate over intelligent design is likely to play out in the coming years. From the vantage of the scientific establishment, intelligent design is in the position of a mouse trying to move an elephant by nibbling at its toes. From time to time the elephant may shift its feet, but nothing like real movement or a fundamental change is about to happen. Let me emphasize that this is the perspective of the scientific establishment. Yet even adopting this perspective, the scientific establishment seems strangely uncomfortable. The mouse has yet to be squashed, and the elephant (as in the cartoons) has become frightened and seems ready to stampede in a panic.

The image that I think more accurately captures how the debate will play out is, ironically, an evolutionary competition where two organisms vie to dominate an ecological niche (think of mammals displacing the dinosaurs). At some point, one of the organisms gains a crucial advantage. This enables it to outcompete the other. The one thrives, the other dwindles. However wrong Darwin might have been about selection and competition being the driving force behind biological evolution, these factors certainly play a crucial role in scientific progress. It’s up to ID proponents to demonstrate a few incontrovertible instances where design is uniquely fruitful for biology. Scientists without an inordinate attachment to Darwinian evolution (and there are many, though this fact is not widely advertised) will be only too happy to shift their allegiance if they think that intelligent design is where the interesting problems in biology lie.

Bill Dembski
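For readers unfamiliar with the digital sense of steganography used in the talk above, here is a minimal Python sketch of one classic technique, least-significant-bit (LSB) embedding. It is only an illustration of hiding message bits inside a cover message. It says nothing about biology, the function names `embed` and `extract` are my own, and the cover here is a made-up byte string standing in for image or audio samples.

```python
def embed(cover: bytes, message: bytes) -> bytes:
    """Hide message bits in the least-significant bit of each cover byte."""
    # Unpack the message into individual bits, most-significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract(stego: bytes, n_bytes: int) -> bytes:
    """Recover n_bytes of hidden message from the low bits of stego."""
    if 8 * n_bytes > len(stego):
        raise ValueError("stego data is too short for the requested length")
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for sample in stego[8 * i: 8 * i + 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


# Toy cover: 64 "samples". A real cover would be pixel or audio data.
cover = bytes(range(64))
secret = b"hi"
stego = embed(cover, secret)

print(extract(stego, len(secret)))
# Each cover byte changes by at most 1, which is why the change is hard to see.
print(max(abs(a - b) for a, b in zip(cover, stego)))
```

Steganalysis, the detection side mentioned in the talk, would look for statistical irregularities in those low bits. This sketch does not attempt that.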

non coding DNA may create RNA computers and factories in nucleus

I’ve posted some examples of the roles that transcribed but untranslated RNAs play in the cell. It is reasonable, then, to speculate that large amounts of non-coding DNA create RNAs within the cell that perform computation and manufacturing. The computational requirements for implementing multicellular architecture may be greatly underestimated. If multicellular creatures require a large amount of computation for manufacturing and maintenance, then RNA may be an ideal molecule to perform it, given recent discoveries.

Computing with RNA
Devices that self-assemble from biological molecules could represent the future of drug delivery.

By Duncan Graham-Rowe on October 17, 2008

Scientists in California have created molecular computers that are able to self-assemble out of strips of RNA within living cells. Eventually, such computers could be programmed to manipulate biological functions within the cell, executing different tasks under different conditions. One application could be smart drug delivery systems, says Christina Smolke, who carried out the research with Maung Nyan Win and whose results are published in the latest issue of Science.

The use of biomolecules to perform computations was first demonstrated by the University of Southern California’s Leonard Adleman in 1994, and the approach was later developed by Ehud Shapiro of the Weizmann Institute of Science, in Rehovot, Israel. But according to Shapiro, “What this new work shows for the first time is the ability to detect the presence or absence of molecules within the cell.”

That opens up the possibility of computing devices that can respond to specific conditions within the cell, he says. For example, it may be possible to develop drug delivery systems that target cancer cells from within by sensing genes used to regulate cell growth and death. “You can program it to release the drug when the conditions are just right, at the right time and in the right place,” Shapiro says.

Ribosome assembly takes place partly inside cell nucleus

Many of the transcribed but untranslated RNAs in the nuclear region participate in the manufacture of ribosomes. One such RNA is the snoRNA, which is associated with the snoRNP. Wikipedia describes snoRNAs:

Small nucleolar RNAs (snoRNAs) are a class of small RNA molecules that primarily guide chemical modifications of other RNAs, mainly ribosomal RNAs, transfer RNAs and small nuclear RNAs. There are two main classes of snoRNA, the C/D box snoRNAs, which are associated with methylation, and the H/ACA box snoRNAs, which are associated with pseudouridylation. SnoRNAs are commonly referred to as guide RNAs but should not be confused with the guide RNAs that direct RNA editing in trypanosomes.

Here is a description of snoRNAs and snoRNPs

These snoRNPs are involved in the assembly of ribosomes inside the nucleus.

After transcription of the pre-ribosomal RNAs, most steps in eukaryotic ribosome synthesis occur within the nucleolus. Here, the pre-rRNAs are processed to yield the mature rRNA species (Fig. 3), which also undergo extensive covalent modification. In bacteria, rRNA modifications are made by conventional enzymes, but in eukaryotes most modification involves methylation of the sugar 2′ hydroxyl group (2′-O-methylation) or pseudouridine (psi) formation, which occur at sites that are selected by base pairing with a host of SMALL NUCLEOLAR RIBONUCLEOPROTEIN (snoRNP) particles. Human cells contain over 100 species of snoRNP, and each pre-rRNA molecule must transiently associate with a member of each species. During pre-rRNA transcription and processing, many of the 80 or so ribosomal proteins assemble onto the mature rRNA regions of the pre-RNA. Many mutations known to inhibit ribosome synthesis in yeast are believed to act mainly at the level of ribosome assembly, but this process is poorly characterized.

linc RNA serves as cellular Air-Traffic Controllers

BOSTON – Earlier this year, a scientific team from Beth Israel Deaconess Medical Center (BIDMC) and the Broad Institute identified a class of RNA genes known as large intervening non-coding RNAs or “lincRNAs,” a discovery that has pushed the field forward in understanding the roles of these molecules in many biological processes, including stem cell pluripotency, cell cycle regulation, and the innate immune response.
But even as one question was being answered, another was close on its heels: What, exactly, were these mysterious molecules doing?

They now appear to have found an important clue. Described in the July 14 issue of the Proceedings of the National Academy of Sciences (PNAS) the scientific team from BIDMC and the Broad Institute shows that lincRNAs – once dismissed as “genomic junk” – have a global role in genome regulation, ferrying proteins to assist their regulation at specific regions of the genome.

“I like to think of them as genetic air traffic controllers,” explains co-senior author John Rinn, PhD, a Harvard Medical School Assistant Professor of Pathology at BIDMC and Associate Member of the Broad Institute. “It has long been a mystery as to how widely expressed proteins shape the fate of cells. How does the same protein know to regulate one genomic location in a brain cell and regulate a different genomic region in a liver cell? Our study suggests that in the same way that air traffic controllers organize planes in the air, lincRNAs may be organizing key chromatin complexes in the cell.”