My Journey Towards Emptiness!!: 2009

Thursday, December 24, 2009

[TECH] An interesting problem on the mailing list [gdb@gnu.org]

Some one asked the following question of GDB mailing list.

Hi, 

  I was wondering how come the call to "call cos(.2)" returns the wrong
answer in:

#include "math.h"
double    (*fcn)(double);
main() 
{
  double t;
  fcn = cos;
  t=cos(.2);
  t=cos(-.2);
}

(gdb) call cos(.2)
$16 = 104
(gdb) call fcn(.2)
$17 = 0.98006657784124163

Thanks,
-- 
View this message in context:  http://old.nabble.com/Why-%22call-cos%28.2%29%22-returns-garbage--tp26909516p26909516.html 
Sent from the Gnu - gdb - General mailing list archive at Nabble.com.

Here is my analysis.

Hello,

Looks like 'gdb' is missing debug information about the function 'cos' , 
and the debugger implicitly assumes all the functions return an 'int' ,
 if the function is not compiled in debug mode.

I derive my explanation from the following test case.

1. Create a function 'double test_cos(double)' and compile it in optimized mode with -O3 using gcc
 "$gcc -c -O3 test1.c"

2. Complie the file containing the 'main' function in debug mode
"$gcc -g test.c test1.o -lm"

============[gdb a.out]====================

(gdb) list 0
1       #include "math.h"
2       #include "test1.h"
3       double    (*fcn)(double);
4       main()
5       {
6                double t;
7                 fcn = cos;
8                  t=cos(.2);
9                   t=cos(-.2);
10                      test_cos(.2);
(gdb) call fcn(0.2)
$4 = 0.98006657784124163
(gdb) call cos(0.2)
$5 = -1073792992
(gdb) call test_cos(0.2)
$6 = -1073792992
=====================================

============[test1.h]=================
#ifndef _test_1_h
#define _test_1_h
double test_cos(double);
#endif
====================================

============[test1.c]==================
#include "math.h"
#include "test1.h"
double test_cos(double a){
        return cos(a);
}
====================================

============[test.c]==================
#include "math.h"
#include "test1.h"

double    (*fcn)(double);
main()
{
         double t;
          fcn = cos;
           t=cos(.2);
            t=cos(-.2);
                test_cos(.2);
}
===================================

May be the developers of gdb can you more details.

also
=======================================
(gdb) call cos
$6 = {text variable, no debug info} 0xc325d0 {cos}
=======================================

Wednesday, December 23, 2009

[TECH] Manipulating DNA sequences with bits

The alphabet of DNA sequences consists of 4 alphabets {A,T,G,C} and they exists in compliments -- A-T, G-C. Often in sequence assembly algorithms we need to squeeze these into bits for saving space since we have too many of them. For example a typical input to a sequence assembler consists of at least 10 million reads (string of fixed length, say 21). In my last post I briefly described the Sequence Assembly (SA) problem. As I pointed one of the basic operations on these reads would be to obtain a reverse compliment of a given read. For example our read is ATTACCAG , then its reverse compliment would be CTGGTAAT -- reverse the original sequence replace every symbol by its compliment (A-T, G-C). The following routines may be useful when we squeeze these symbols into bits using only 2bits per symbol. While using the bit-wise operators I realized that == operator has more precedence than & (bit-wise or). So if you a&b==b then that will always be true, so we need to be more careful use (a&b)==b.

#define MASK_A 3
#define MASK_T 0
#define MASK_G 1
#define MASK_C 2
/*Lets fix the maximum size of the read as 32 base pairs*/
typedef unsigned long long kmer_t;


/*Takes a read of length K and returns its bit represenation*/
kmer_t CreateBKmer(char *kmer, unsigned long K){
    unsigned long i;
    unsigned char bitc;
    /*A binary k-mer*/
    kmer_t bkmer = (kmer_t) 0; 
    for(i=0; i<K; i++){
        bkmer <<= 2; 
        bkmer |= Char2Bit(kmer[i]);
    }
    return bkmer;
}

/*Return the reverse compliment of the kmer_a*/
kmer_t ReverseCompliment(kmer_t a, unsigned int K){
    kmer_t rc=0;
    unsigned int i=0;
    for(i=0; i<K; i++){
        rc <<=2;
        rc |= (3-(a&3));
        a >>=2;
    }
    return rc;
}
/*Print the ASCII equivalent of the underlying bits*/
void PrintKmer(kmer_t a, unsigned int K, FILE *ptr){
    char ntide;
    kmer_t mask_a = MASK_A;
    kmer_t mask_t = MASK_T;
    kmer_t mask_g = MASK_G;
    kmer_t mask_c = MASK_C;

    mask_a <<= 2*(K-1); mask_t <<= 2*(K-1); 
    mask_g <<= 2*(K-1); mask_c <<= 2*(K-1);

    do{
        if((a&mask_a) == mask_a){
            ntide = 'A';
        }else if((a&mask_c) == mask_c){
            ntide = 'C';
        }else if((a&mask_g) == mask_g){
            ntide = 'G';
        }else{ /*This should be the last because T=0*/
            ntide = 'T';
        }
        fprintf(ptr, "%c", ntide);
        mask_a >>=2; mask_t >>=2;
        mask_g >>=2; mask_c >>=2;
    }while(mask_a); /*MASK_A = 3 */
    fprintf(ptr, "\n");
}

inline char Bit2Char(unsigned char a){
    switch(a){
        case MASK_A:
            return 'A';
        case MASK_G: 
            return 'G'; 
        case MASK_C:
            return 'C';
        case MASK_T:
            return 'T';
    }
}

inline unsigned char Char2Bit(char c){
    switch(c){
        case 'A':
            return MASK_A;
        case 'T':
            return MASK_T; 
        case 'G':
            return MASK_G;
        case 'C':
            return MASK_C; 
    }
}

Friday, December 18, 2009

[TECH] Loops in Bi-directed De Bruijn Graphs

De Bruijn graphs recently found applications in Sequence Assembly(

problem is close to Shortest Common super String(

) problem, however there are some fundamental differences. Given a set of input strings $S=\{s_1,s_2,s_3\ldots s_n\}$ strings the

problem outputs a string

such that every

is a sub-string in

and $\nexists s': length(s') < length(s^*)$ . On the other hand the input to the

problem is a set $R=\{r_1,r_2,r_3\ldots r_n\}$ of strings, every

is a randomly chosen sub-string of a bigger string

- called the genome. Unfortunately we do not know what

is, so given

the

problem asks to find the

such that we can explain every

. Adding to the complexity of

problem

may contain several repeating regions and the direct appliction of

to solve

problem would result in collapsing the repeating regions in

- so we cannot directly reduce the

problem to

. Also the string

is special string coming from the

so its double stranded and each

may be originating from the forward or reverse compliments of

. $\includegraphics[scale=0.5]{loops_debrujin.eps}$

A De Bruijn graph on a set of strings is a graph whose vertices correspond to the unique mers (a string of length ) from all the $r_i \in R$ and whose edges correspond to a some mer in some $r_j \in R$ . Also to consider the double stranded-ness we also include a reverse compliment of every mer into the graph . In the following simple example I show how a loop (i.e. both the forward and reverse compliments overlapping each other by ) can originate into . This interesting situation seems to be referred to as hairpin-loop.

Saturday, December 12, 2009

[TECH] Simplified Proof For The Application Of Freivalds' Technique to Verify Matrix Multiplication

We first give a simple and alternative proof for Theorem-

in [Randomized Algorithms, Motwani R. and Raghavan P., pp 162--163]. Later in Theorem 2 we show that the assumption on the uniformness is not necessary.
$\begin{theorem} Let $A$,$B$\ and $C$\ be three $n\times n$\ matrices such that $... ...tribution. Then $P[AB\vec{r} = C\vec{r} \vert AB \neq C] \leq 1/2$ \end{theorem}$

Proof. Let

be a $n\times n$ matrix and $\vec{x_1}, \vec{x_2}\ldots \vec{x_n}$ be the column vectors of

. Then $X\vec{r} = \sum_{i=1}^{n}r_i\vec{x_1}$ . This means that multiplying a vector with a matrix is linear combination of the columns, the coefficient

is the $i^{th}$ component of $\vec{r}$ . Since $\vec{r}$ is a boolean and

acts as an indicator variable on the selection of column $\vec{x_i}$ . So if $\vec{r}$ is chosen from a uniform distribution

Now let and $\vec{d_1},\vec{d_2}\ldots \vec{d_n}$ be the column vectors of , similarly let $\vec{c_1},\vec{c_2}\ldots \vec{c_n}$ be the column vectors of . Let $Y =\{\vec{d_j} \vert \vec{d_j}\neq \vec{c_j}$ , clearly $\vert Y\vert \geq 1$ since $C\neq D$ . Then $P[AB\vec{r} = C\vec{r} \vert AB \neq C] = \displaystyle\Pi_{\vec{d_i}\notin Y}P[r_i] = (1/2)^{n-\vert Y\vert} \leq 1/2$ since $1 \leq \vert Y\vert\leq n-1$ . Intuitively this means we select our random vector $\vec{r}$ such that for all $d_i \in Y$ , such a selection will always ensure $AB\vec{r} = C\vec{r}$ even though $AB\neq C$ . $\qedsymbol$

$\begin{theorem} Let $A$,$B$\ and $C$\ be three $n \times n$\ matrices. Let $\vec... ...$f(r)$\ is an arbitrary probability density/distribution function. \end{theorem}$

Proof. Continuing with the proof of Theorem- 1 , $P[AB\vec{r} = C\vec{r}\vert AB\neq C] = \displaystyle\Pi_{\vec{d_i}\notin Y} P[r=r_i] \leq f(r)$ . $\qedsymbol$

$\begin{corollary} There always exists an $\Theta(n^2)$\ time Monte Carlo algorit... ...creasing error probability, for the problem to check if $AB=C$. \end{corollary}$

My Journey Towards Emptiness!!