De Bruijn graphs recently found applications in
Sequence Assembly(

).

problem is close to
Shortest Common super String(

) problem, however
there are some fundamental differences. Given a set of input strings

strings the

problem outputs a string

such that every

is a sub-string in

and

. On the other hand the input to the

problem
is a set

of strings, every

is a randomly chosen
sub-string of a bigger string

- called the
genome. Unfortunately we do not
know what

is, so given

the

problem asks to find the

such that we can
explain every

. Adding to the complexity of

problem

may contain several
repeating regions and the direct appliction of

to solve

problem would
result in collapsing the repeating regions in

- so we cannot directly reduce
the

problem to

. Also the string

is special string coming from the

so its double stranded and each

may be originating from the forward or
reverse compliments of

.
A
De Bruijn graph
on a set of strings
is a graph whose vertices
correspond to the unique
mers (a string of length
) from all the
and whose edges correspond to a some
mer in some
. Also to consider
the double stranded-ness we also include a reverse compliment of every
mer into
the graph
. In the following simple example I show how a loop (i.e. both
the forward and reverse compliments overlapping each other by
) can originate
into
. This interesting situation seems to be referred to as hairpin-loop.