This video is exciting:
Pevzner and his team address the challenge of determining a protein’s sequence from mass spectrometry data. The problem is hard because proteins commonly undergo post-translational modifications, so a mass spectrometer actually measures proteins that deviate (slightly) from the sequences predicted by their nucleotide sequences.
Inferring a protein’s sequence directly from mass-spec data remains an unsolved problem. However, we CAN compare a mass spectrum against a database of existing (and known) spectra in order to guess the sequence. This process is computationally expensive because it requires a linear search of the database for every queried spectrum.
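To make the cost of that linear search concrete, here is a minimal Python sketch. It is my own toy illustration, not anything shown in the video: the binning scheme, the cosine score, and the names `binned`, `cosine`, and `linear_search` are all assumptions, and the peptide labels are made up.

```python
from math import sqrt


def binned(peaks, bin_width=1.0):
    """Turn a list of (m/z, intensity) peaks into a sparse {bin: intensity} vector."""
    vec = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def linear_search(query_peaks, database):
    """Score the query against EVERY database entry and keep the best match.

    `database` is a list of (label, peaks) pairs (a hypothetical layout).
    The loop touches each entry once, so the cost grows linearly with the
    database size, and it is paid again for every query spectrum.
    """
    query = binned(query_peaks)
    best_label, best_score = None, -1.0
    for label, peaks in database:
        score = cosine(query, binned(peaks))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

With N known spectra and Q queries this does roughly N × Q comparisons, which is the expense the talk is trying to avoid.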
Pevzner presents a more efficient algorithm for “guessing” a protein’s sequence, given its mass spectrum. His technique involves constructing a network of spectral data, and then using that network as the basis for a search. This is markedly faster than traditional database searches. Pevzner et al. apply this technique to whole-genome spectrometric data and report promising results.
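As a rough picture of what “a network of spectral data” could look like, here is a toy Python sketch. It is only my guess at the general idea, not the algorithm from the talk: the shared-peak rule, the thresholds, and the names `shared_peaks`, `build_network`, and `propagate` are all assumptions.

```python
from collections import deque


def shared_peaks(a, b, tol=0.5):
    """Count peaks in `a` that have a partner in `b` within `tol` (m/z units)."""
    return sum(1 for x in a if any(abs(x - y) <= tol for y in b))


def build_network(spectra, min_shared=3):
    """Link two spectra when they share at least `min_shared` peaks.

    `spectra` maps a spectrum name to a list of peak m/z values.
    Returns an undirected graph as {name: set of neighbor names}.
    """
    names = list(spectra)
    graph = {name: set() for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if shared_peaks(spectra[u], spectra[v]) >= min_shared:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def propagate(graph, known):
    """Breadth-first walk outward from identified spectra.

    Each unlabeled spectrum reachable through the network is tagged with the
    label of the identified spectrum it was reached from, meaning "probably
    related to this peptide", not "identical to it".
    """
    labels = dict(known)
    queue = deque(known)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in labels:
                labels[v] = labels[u]
                queue.append(v)
    return labels
```

Note that this toy version still compares all pairs of spectra when building the network, so it says nothing about where the real speed-up comes from. It only shows how one identified spectrum can help annotate its neighbors.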
My obscure notes:
17:00 – Why is the signal-to-noise ratio reduced “six-fold”?
18:01 – Look up a reference for the “anti-symmetric pass approach” to aligning two sequences of unequal length. Can we use this storage technique for other information domains, such as phylogenetic trees or electroencephalographic data?
28:07 – The use of “snake venom” makes any science project sound cool.



