Friday, August 27, 2010

[TECH] Combinatorial counting under symmetries with Polya's Theorem

Recently I have started looking at the problem of enumerating Graphs. During my journey I was thrilled at the beauty of Polya's theorem. After applying the theorem on several examples I felt that I should write an article which explains the context and need for Polya's theorem.

This is the first blog I'm writing after taking a break over the summer. I had few things which had happened to me recently. I'm going to talk about them some other time as they are more philosophical.

Friday, April 09, 2010

[TECH] Super-fast tokenizing of sequence word patterns

C++ code colored by C++2HTML
#!/bin/perl 
#
# AHO-CORASICK-TOKENIZER on words, will also work on any delemiter.
# INPUT(1): Set of patterns (sequence of words) 'P'
# INPUT(2): Text 'T' 
# OUTPUT: report all the occuring patterns -- we don't even 
# want to know their locations. 
#
# NOTE: We don't want to worry much about the time required to 
# construct the failure function. All we need is to improve the
# scan speed.
#
# PATTERN FILE FORMAT:
# <annotation> $$$ pattern
#
# vamsik@engr.uconn.edu 04/08/2010
#

($#ARGV == 1) or die("USAGE: perl <name> pattern_file text_file");

open(PATTERN_FILE, $ARGV[0]) or die($!);

open(TEXT_FILE, $ARGV[1]) or die($!);
%AC_TREE;
%PATTERN_HASH;

$ANNOT_DELIM = ";;a;;";
$FAILED_DELIM = ";;f;;"; 
$HEIGHT_DELIM = ";;h;;"; #to move within the text

sub BuildKwordTree{

# root of the Aho-Corasick tree #
	my @params = @_;
	my $ac_root = $params[0];

	my $PFILE = $params[1];
	my ($line, $pword_list, $pattern, $curr_node, $pword);

	
	while(<$PFILE>){
		$line = $_; chomp($line);

		if($line =~ /([^\$]+)\$\$\$ (.*)/){
			$annotation = $1;

			$pattern = $2;
# split the pattern into words delimited by any #
# non-newline character. #
			@pword_list = split(/\s+/, $pattern);

			$curr_node = $ac_root; 
			foreach $pword (@pword_list){
				${$curr_node}{$pword} = 
					{} unless exists ${$curr_node}{$pword};

				$curr_node = ${$curr_node}{$pword};
			}
# put-the annotation into the last-node of the key-word tree 
			${$curr_node}{$ANNOT_DELIM} = $annotation;
		}
	}
}

#
# Build the failure function level-by level
#
#
sub BuildFailureFunction{
	my @params = @_;
	my $ac_root = $params[0];

	my ($k, @bfs_list, $keyword); 
#
# Incremental algorithm; For level-1 its the $ac_root itself  
#
	${$ac_root}{$FAILED_DELIM} = $ac_root;

	${$ac_root}{$HEIGHT_DELIM} = 0;
	foreach $k (keys %$ac_root){

		if(($k eq $FAILED_DELIM) or ($k eq $ANNOT_DELIM)
			or ($k eq $HEIGHT_DELIM)){

			next;
		}
		${${$ac_root}{$k}}{$FAILED_DELIM} =  $ac_root;

		${${$ac_root}{$k}}{$HEIGHT_DELIM} = 1;
		push(@bfs_list, ${$ac_root}{$k});
	}

	
	while($#bfs_list >= 0){
		$nref = shift(@bfs_list); #node

		foreach $keyword (keys %$nref){

			if(($keyword eq $FAILED_DELIM) or 
					($keyword eq $ANNOT_DELIM) or 
					($keyword eq $HEIGHT_DELIM)){

				next;
			}

			$nnref = ${$nref}{$keyword};

			$fpref = ${$nref}{$FAILED_DELIM}; 
			while(!(exists ${$fpref}{$keyword})

				and $fpref != $ac_root){
				$fpref = ${$fpref}{$FAILED_DELIM};
			}

			if(exists ${$fpref}{$keyword}){
				${$nnref}{$FAILED_DELIM} = ${$fpref}{$keyword};
			}else{

				${$nnref}{$FAILED_DELIM} = $ac_root; 
			}
			${$nnref}{$HEIGHT_DELIM} = ${$nref}{$HEIGHT_DELIM}+1;

			push(@bfs_list, $nnref);
		}
	}
}
#
# Reads word delimited text from the stream or a file
#
sub MatchWordDelimText{
	my @params = @_;

	my $ac_root = $params[0];
	my $stream = $params[1];

	my ($curr_state, $annot, $sword, $c, $height);

	my $line;

	while(<$stream>){
		$line = $_; chomp($line);

		$curr_state = $ac_root;
		@swords = split(/\s+/, $_);

		$c = 0; 
		while($c <= $#swords){

			$sword = $swords[$c];
# Change the state of the automaton by reading the input 
#			print "I:$sword C:$c 1:$curr_state ";
			if(exists ${$curr_state}{$sword}){

				$curr_state = ${$curr_state}{$sword};
				$c++;
			}else{

				if($curr_state == $ac_root){
					$c++;
				}
				$curr_state = ${$curr_state}{$FAILED_DELIM};
			}

#			print "2:$curr_state \n";
			if(exists ${$curr_state}{$ANNOT_DELIM}){
				$annot = ${$curr_state}{$ANNOT_DELIM};

				print "$annot \n";
			}
		}
	}
}

sub StressTestMatchWordDelimText{
	my @params = @_;

	my $ac_root = $params[0];
	my $stream = $params[1];

	my ($curr_state, $annot, $sword, $c, $height);

	my $line;
	my ($stress_me, $ss);
	$line = <$stream>;

	print "ABSTRACT: $line\n";
	chomp($line);
	$stress_me = 1000000;

	for($s=0; $s< $stress_me; $s++){
		if($s%100000 == 0){

			print "iteration $s \n";
		}
		$curr_state = $ac_root;
		@swords = split(/\s+/, $_);

		$c = 0; 
		while($c <= $#swords){

			$sword = $swords[$c];
# Change the state of the automaton by reading the input 
#			print "I:$sword C:$c 1:$curr_state ";
			if(exists ${$curr_state}{$sword}){

				$curr_state = ${$curr_state}{$sword};
				$c++;
			}else{

				if($curr_state == $ac_root){
					$c++;
				}
				$curr_state = ${$curr_state}{$FAILED_DELIM};
			}

#			print "2:$curr_state \n";
			if(exists ${$curr_state}{$ANNOT_DELIM}){
				$annot = ${$curr_state}{$ANNOT_DELIM};

#				print "$annot \n";
			}
		}
	}
}

#
# Just to see how the tree looks -- for viewing pleasure :)
#
sub PrintKwordTree{
	my @params = @_;

	my $ac_root = $params[0]; 
	my $level = $params[1];

	my ($k, $aref, $annot, $j, $nref, $fref, $height);

	foreach $k (keys %$ac_root){
		if(($k eq $ANNOT_DELIM) 
			or ($k eq $FAILED_DELIM) or ($k eq $HEIGHT_DELIM)){

			next;
		}
		for($j=0; $j<$level; $j++){

			print " ";
		}
		$fref = ${${$ac_root}{$k}}{$FAILED_DELIM};

		$nref = ${$ac_root}{$k};
		$height = ${$nref}{$HEIGHT_DELIM};

		print "-$k-> N=$nref F=$fref h=$height\n";
		PrintKwordTree(${$ac_root}{$k}, $level+1);
	}

	

	if(exists ${$ac_root}{$ANNOT_DELIM}){
		$annot = ${$ac_root}{$ANNOT_DELIM};

		for($j=0; $j<$level; $j++){
			print " ";
		}

		print "; $annot ;\n\n";
		return;
	}
}
#
# A simple unit-test
#
sub main{
	BuildKwordTree(\%AC_TREE, PATTERN_FILE);

#	PrintKwordTree(\%AC_TREE, 0);
	BuildFailureFunction(\%AC_TREE);
#	PrintKwordTree(\%AC_TREE, 0);
# Now build the failure function.
	print "please enter the words\n";
#	MatchWordDelimText(\%AC_TREE, STDIN);
	StressTestMatchWordDelimText(\%AC_TREE, TEXT_FILE);

	print "\n....THANK YOU......\n";
}
main();

Thursday, March 11, 2010

[PHIL] My best lines in "The zen and art of motor cycle maintenance"

We often read a lot. On the other hand we also tend to forget most of what we read. However the essence/flavor/curx what ever you might call may still persist. Some time if were asked to recall about something -- say a novel for example. Then some very distinctively marked paragraphs would normally on top of your head. So whenever I think about "The zen and the art of motorcycle maintenance" the following few paragraphs are always on my mind. You can read the full book here

In time...six months; five years, perhaps...a change could easily begin to take place. He would become less and less satisfied with a kind of dumb, day-to-day shopwork. His creative intelligence, stifled by too much theory and too many grades in college, would now become reawakened by the boredom of the shop. Thousands of hours of frustrating mechanical problems would have made him more interested in machine design. He would like to design machinery himself. He’d think he could do a better job. He would try modifying a few engines, meet with success, look for more success, but feel blocked because he didn’t have the theoretical information. He would discover that when before he felt stupid because of his lack of interest in theoretical information, he’d now find a brand of theoretical information which he’d have a lot of respect for, namely, mechanical engineering.

So he would come back to our degreeless and gradeless school, but with a difference. He’d no longer be a grade-motivated person. He’d be a knowledge-motivated person. He would need no external pushing to learn. His push would come from inside. He’d be a free man. He wouldn’t need a lot of discipline to shape him up. In fact, if the instructors assigned him were slacking on the job he would be likely to shape them up by asking rude questions. He’d be there to learn something, would be paying to learn something and they’d better come up with it.

Motivation of this sort, once it catches hold, is a ferocious force, and in the gradeless, degreeless institution where our student would find himself, he wouldn’t stop with rote engineering information. Physics and mathematics were going to come within his sphere of interest because he’d see he needed them. Metallurgy and electrical engineering would come up for attention. And, in the process of intellectual maturing that these abstract studies gave him, he would he likely to branch out into other theoretical areas that weren’t directly related to machines but had become a part of a newer larger goal. This larger goal wouldn’t be the imitation of education in Universities today, glossed over and concealed by grades and degrees that give the appearance of something happening when, in fact, almost nothing is going on. It would be the real thing.

Wednesday, March 10, 2010

[TECH] Tutte matrix and Sufficiency of Perfect Matching.

I know that this is a standard results. But a hands on proof is really useful because it makes things more clear. I knew the fact that any permutation could be broken into disjoint cycles. However I did not know how the sign of the permutation changes if we swap the elements with in a disjoint cycle. It turns out that some of the fundamental properties of the signs of the permutation can be used to prove the sufficiency of a perfect matching in graph. A Tutte Matrix is defined as follows.

A_{ij} = \begin{cases} x_{ij}\;\;\mbox{if}\;(i,j) \in E \mbox{ and } i<j\\
-x_{ji}\;\;\mbox{if}\;(i,j) \in E \mbox{ and } i>j\\
0\;\;\;\;\mbox{otherwise} \end{cases}

Given a Tutte matrix we now prove the sufficiency of perfect matching

Determinant of Tutte matrix is as follows $ det(T) = \Sigma_{\pi \in S_n} (-1)^{sgn(\pi)}\Pi_{i=1}^{n} t_{i,\pi(i)}$. We know that every permutation can be expressed as disjoint cycles. For example permutation $ \{5, 1, 4, 3, 2\}$ can be expressed as two cycles $ \{(1\rightarrow 5 \rightarrow 2 \rightarrow 1) , (3 \rightarrow 4 \rightarrow 3) \}$. Also $ sgn(\pi)$ is the number of swaps we need do required to convert $ \{1, 2, \ldots n\}$ to the permutation $ \pi$. We observe the following important properties which help solve the problem.
  • PROPERTY-1 If $ c_i = (t_1\rightarrow t_2 \rightarrow t_3 \ldots t_n\rightarrow t_1)$ is a cycle in $ \pi$ then if we replace $ c_i$ with reverse cycle $ c'_i = (t_n\rightarrow t_1\rightarrow t_2 \rightarrow \ldots t_n)$ then $ sgn(\pi)$ remains the same - because we did not change the number of swaps required but just changed the order of swaps. For instance, let $ \pi=\{5, 1, 4, 3, 2\}$ then cycle $ c_1 = \{(1\rightarrow 5 \rightarrow 2 \rightarrow 1)\}$ now we replace it with $ c'_1 =\{2\rightarrow 5 \rightarrow 1 \rightarrow 2\}$. This results in $ \pi'=\{2, 5, 4, 3, 1\}$.
  • PROPERTY-2 Let $ c_i$ is a cycle of odd length in $ \pi$ and let $ \pi'$ be the permutation by replacing $ c_i$ with its reverse cycle $ c'_i$. Then the corresponding terms in the determinant for $ c_i$ and $ c'_i$ are $ \Pi_{\pi(i)\in c_i} t_{i,\pi(i)}$ and $ \Pi_{\pi(i)\in c_i} t_{\pi(i),i}$. For example in the previous example $ c_1 = \{(1\rightarrow 5 \rightarrow 2 \rightarrow 1)\}$ so the corresponding term in the determinant for $ c_1$ is $ t_{1,5}t_{5,2}t_{2,1}$. On the other hand the reverse cycle $ c'_1 \{2 \rightarrow 5 \rightarrow 1 \rightarrow 2\}$ has the following term $ t_{2,5}t_{5,1}t_{1,2}$in the determinant.
  • PROPERTY-3 If all the cycles in the permutation $ \pi$ are even and corresponding term in the determinant (i.e. $ \Pi_{i=1}^{n} t_{i,\pi(i)}$) is non-zero then its the set $ \{(1,\pi(1)), (2,\pi(2))\ldots
(n,\pi(n))\}$ is a valid perfect match in the graph. This is because by the definition of Tutte matrix $ t_{u,v}$ represents an edge in the graph and the permutation $ \pi$ is giving a subset of $ n$ edges since the product is non-zero, on the other than this sub-set of edges is a match.

"IF $ G$ has a perfect match THEN $ det(T)$ has at least one non-zero term"

Let $ M$ be the perfect match in $ G$ then we construct a permutation $ \pi(u)=v, \pi(v)=u$ where $ (u,v)\in M$, this permutation on the other hand has even cycles - because we created a cycle for every pair. Now we apply PROPERTY-3 and the term corresponding to $ \pi$ in $ det(T)$ is non-zero.

IF $ det(T)=0$ THEN $ G$ has no perfect matching

Now consider the case when $ G$ does not have no perfect matching. Then by PROPERTY-3 all the terms corresponding permutations with even cycles must be zero - otherwise our assumption leads to contradiction. Now consider the permutations with at least one odd cycle, let $ \pi$ be that permutation and $ c_i$ be the odd cycle. Now we replace $ c_i$ with its reverse cycle $ c'_i$ and create the permutation $ \pi'$. By PROPERTY-1 $ sgn(\pi) = sgn(\pi')$. Now we exploit the skew-symmetry of the Tutte matrix. By PROPERTY-2 the terms corresponding to $ \pi$, $ \pi'$ in the determinant are as follows. $ \Pi_{\pi(k)\in c_i} t_{k,\pi(k)}$, $ \Pi_{\pi(k)\in c_i} t_{\pi(k),k}$. But by skew-symmetry $ t_{\pi(k),k} = - t_{k,\pi(k)}$, on the other hand since the cycle length is odd this gives that the actual terms corresponding to $ \pi$ and $ \pi'$ are $ (-1)^{sgn(\pi)}\Pi_{i=1}^{n}t_{i,\pi(i)}$ and $ -(-1)^{sgn(\pi')}\Pi_{i=1}^{n}t_{i,\pi(i)}$. This means all the terms corresponding to permutations with at least one odd cycle can be grouped and pairs such as $ \pi$ and $ \pi'$ be cancelled. So the only contribution for the $ det(T)$ comes from permutations will all even cycles - in this case we have none, so $ det(T)=0$.