Friday, December 18, 2009

[TECH] Loops in Bi-directed De Bruijn Graphs

De Bruijn graphs recently found applications in Sequence Assembly($ SA$). $ SA$ problem is close to Shortest Common super String($ SCS$) problem, however there are some fundamental differences. Given a set of input strings $ S=\{s_1,s_2,s_3\ldots s_n\}$ strings the $ SCS$ problem outputs a string $ s^*$ such that every $ s_i$ is a sub-string in $ s^*$ and $ \nexists s': length(s') < length(s^*)$. On the other hand the input to the $ SA$ problem is a set $ R=\{r_1,r_2,r_3\ldots r_n\}$ of strings, every $ r_i$ is a randomly chosen sub-string of a bigger string $ G$ - called the genome. Unfortunately we do not know what $ G$ is, so given $ R$ the $ SA$ problem asks to find the $ G$ such that we can explain every $ r_i$. Adding to the complexity of $ SA$ problem $ G$ may contain several repeating regions and the direct appliction of $ SCS$ to solve $ SA$ problem would result in collapsing the repeating regions in $ G$ - so we cannot directly reduce the $ SA$ problem to $ SCS$. Also the string $ G$ is special string coming from the $ DNA$ so its double stranded and each $ r_i$ may be originating from the forward or reverse compliments of $ G$. \includegraphics[scale=0.5]{loops_debrujin.eps}

A $ k-$De Bruijn graph $ B^k(R)$ on a set of strings $ R$ is a graph whose vertices correspond to the unique $ k-$mers (a string of length $ k$) from all the $ r_i \in R$ and whose edges correspond to a some $ k+1-$mer in some $ r_j \in R$. Also to consider the double stranded-ness we also include a reverse compliment of every $ k-$mer into the graph $ B^k(R)$. In the following simple example I show how a loop (i.e. both the forward and reverse compliments overlapping each other by $ k-1$) can originate into $ B^k(R)$. This interesting situation seems to be referred to as hairpin-loop.

No comments: