Friday, December 21, 2007

[TECH] Finding frequent itemsets faster than FP-Growth algorithm

I wanted to implement my prime-number mapping intersection algorithm into the TM(Transaction Mapping) Algorithm, I got the FP-Tree implementation from PERL FP Tree and was trying to test this on a data of around 300,000 records, it works till 10,000 records with support and confidence of 0.8 and 0.9 and fails if the number of records > 20,000. I'm not sure if its a perl bug but the following code which is the cause of the problem seems difficult to understand



===================================================================
@association_rules = map $_->[0], (sort {$b->[1] <=> $a->[1]} 
      (map [$_, $_->confidence], @association_rules));      
===================================================================


I tried to bless the reference $_ but it did'nt work. The following is the error it gives.

[vamsik@abadon Tree-FP-0.04]$ !perl
perl test_vamsi.pl
Setting support
Mining rules...
Can't call method "confidence" on an undefined value at /usr/lib/perl5/site_perl/5.8.8/Tree/FP.pm line 397,  line 20001.

I need to fix this and send it to the FPTree maintainer tomorrow. I guess its really a issue to implement the FPTree in perl because of the speed and time especially when the algorithm need to work on huge data just in my case. A good idea is to re-write FPTree in C and extend TM based on that and compare the results.

No comments: