Archive for the ‘genetics’ Category

Protein alignments, ART, and buggy Nexus Files

July 12, 2008

Here are some notes about a strange problem with ART and Nexus files.

If you attempt to run ART, and see an error message like this:

ART is veriyfing that your tree file corresponds with your sequence file. .

-> ART says: some of the taxa in your treefile do not exist in your sequence file.

Missing taxa:
SpADH2 KlADH2 SkADH4 ScADH3 KmADH2 KLADH4 KwADH3 SkADH3 SpADH5 SbADH2 SbADH5 SpADH1 SkADH2 PsADH2 KlADH3 KwADH4 SbADH3 PsADH1 KmADH1 ScADH2 SpADH3 ScADH5 KlADH1 SkADH1 KwADH1 SbADh1 ScADH1

Error!
ART was unable to import your data.  See previous errors.

The message is telling you that NONE of the taxa in your tree file were found in your sequence file.  You might feel like you’re going crazy, because your alignment looks fine.

SOLUTION: In your Nexus alignment, find a line which looks like this:

FORMAT DATATYPE=PROTEIN  SYMBOLS = " 1 2 3 4"  MISSING=? GAP=- ;

Remove the “SYMBOLS” parameter.  Apparently, this parameter trips up the BioNexus parser.  The line should then look like this:

FORMAT DATATYPE=PROTEIN MISSING=? GAP=- ;

. . . and hopefully that should solve the problem!
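If you have many alignments to fix, the edit above can be scripted. Here is a minimal sketch in Python, assuming the FORMAT statement sits on a single line and the SYMBOLS value is wrapped in plain double quotes. The function name strip_symbols is made up for this example; it is not part of ART or of any Nexus library.

```python
import re

# Matches a SYMBOLS parameter with a double-quoted value,
# e.g.  SYMBOLS = " 1 2 3 4"  (plus any whitespace in front of it).
SYMBOLS_RE = re.compile(r'\s*SYMBOLS\s*=\s*"[^"]*"', re.IGNORECASE)


def strip_symbols(nexus_text):
    """Remove the SYMBOLS parameter from every line that starts with FORMAT.

    Hypothetical helper: assumes the FORMAT statement is on one line.
    """
    cleaned = []
    for line in nexus_text.splitlines():
        if line.strip().upper().startswith("FORMAT"):
            line = SYMBOLS_RE.sub("", line)
        cleaned.append(line)
    return "\n".join(cleaned)


example = 'FORMAT DATATYPE=PROTEIN  SYMBOLS = " 1 2 3 4"  MISSING=? GAP=- ;'
print(strip_symbols(example))
```

Only FORMAT lines are touched, so the taxon names and the sequence matrix pass through unchanged. Keep a copy of the original file before overwriting anything.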

“DNA in a tight squeeze”

October 16, 2007

Today Rob Phillips (http://www.rpgroup.caltech.edu/) gave a talk titled, “DNA in a tight squeeze: the other life of a macromolecular assembly.”

Phillips et al. study the atomic-level physics of DNA.  Their recent work focuses on DNA loops which are formed when transcription factors bind to regulatory sites.  Specifically, they modified the lac operon such that they can insert sequences of arbitrary length and content between the Oid and O1 sites.  A surprising result is that DNA is happiest to make loops about 75.5 base pairs long, whereas the persistence length for DNA is 115 base pairs.  In other words, the favored loop is shorter than the length scale below which DNA is expected to behave like a stiff rod.  I think this result has ramifications for our understanding of the fitness of regulatory regions.  I would like to see an experiment that explores evolutionary fitness with regard to loop length between cis-regulatory sites.

Phillips also showed results from his study correlating the osmotic pressure inside a viral capsid to the speed of genetic ejection.  His results show that viruses eject genetic material (into their host cell) very quickly at first.  However, as the osmotic pressure inside the capsid declines, the rate of ejection also drops.  Eventually, the pressure reaches a point where no genetic material is ejected at all.  I have several questions about the content of the genetic material which may or may not be ejected.  Does it encode a functional protein?  Or, does this “tail” material contain garbage which can safely be left inside the virus?

I find this research exciting because it intersects physics, chemistry, biology, statistics, and computer science.

(Finally: check out the VIPER project for atomic-level exploration of viral structures)

ART version 1.4

August 6, 2007

A new version of ART (an ancestral reconstruction tool) is now available. This software package is a labor of love, and lately it consumes my “spare” time. I have big plans for ART (including a web interface and visualization tools), but in the meantime this project is stable and functional.

What does ART do?

  • ART wraps CodeML into an easier-to-use and error-safe tool.
  • ART calculates the maximum a posteriori (MAP) ancestor. The MAP concept is a new statistical practice currently being developed in Joe Thornton’s lab.
  • ART manages your phylogenetic reconstruction projects in a SQL database.
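For readers who have not met the term: a MAP ancestor is usually assembled by taking, at each site, the state with the highest posterior probability. The sketch below only illustrates that idea. It is not ART's code, and the function name and the posterior values are invented for the example.

```python
def map_sequence(site_posteriors):
    """Pick, at each site, the state with the highest posterior probability.

    site_posteriors: one dict per site, mapping a state (e.g. an amino acid)
    to its posterior probability at the ancestral node.
    """
    return "".join(max(post, key=post.get) for post in site_posteriors)


# Hypothetical posteriors for a three-site ancestor (values are made up).
posteriors = [
    {"A": 0.90, "G": 0.10},
    {"L": 0.55, "I": 0.45},
    {"K": 0.20, "R": 0.80},
]
print(map_sequence(posteriors))  # ALR
```

Note that the second site is a close call (0.55 versus 0.45), which is the kind of uncertainty a single MAP sequence hides.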

Introduction to Computational Proteomics, latest issue of PLoS

August 1, 2007

Here is a TERRIFIC article in the recent issue of PLoS:

“Introduction to Computational Proteomics” by Jacques Colinge and Keiryn L. Bennett.

I like this article because it summarizes a large body of research, and it’s written for a non-CompBio audience. Introductory articles in CompBio are rare. Well-written introductory articles are even rarer. Enjoy.

