Monday, December 31, 2007

[TECH] An observation to merge upper convex hulls in O(log(n)) time rather than O(log^2(n)).

The well known divide and conquer algorithm for finding convex hull for a given set of points, needs to merge smaller hulls H_1 and H_2 for doing this merge the algorithm uses lemma 3.2 (in Book Fundamentals of Computer Algorithms). This lemma essentially finds the tangent line (u,v) in O(log^2(n)) time.

We can in fact merge the hulls in more efficient manner, the observation is based on the fact that any point with maximum y-coordinate should definitely be on the hull. Let p1 and p2 be the points in hulls H_1 and H_2 with maximum y-coordinate, then we can safely state that either p1 or p2 will be on the combined hull of H_1 and H_2, we can determine which of these two points lie on the merged hull by comparing the y-coordinates of p1,p2. Among these two the point with maximum y-coordinate will definitely be on the merged hull so one end of the tangent line (u,v) is fixed, the other end of the tangent line can be probed in O(log(n)) time, this basically saves the extra O(log(n)) time which is spent previously in lemma 3.2 for finding the other end of tangent in H_2

Tuesday, December 25, 2007

[TECH] Hanging simulation..

I came across a interesting problem in which the entire simulation was hanging just because of an extra non-sensitive always block , in the verilog code the statement "always dummy_reg = in;" was causing both VCS and NC-VERILOG hang. Theoretically speaking irrespective of what I do in my design the simulation should stop after 500 steps because I have a "#500 $finish" in the initial block of the module "TestHangMux", although I got around this problem by removing the "dummy_reg" totally in the multiplexer, but I still don't understand why the simulation was hanging irrespective of the "#500 $finish" statement, is it because we have multiple non-sensitive "always" statement? the LRM(Language Reference Manual) says that all the "always" blocks execute in parallel just like parallel processors in that that the statement "#500 $finish" should have executed in parallel and stop the simulation but why it hangs?



===================================
module HangMux(in,sel,out);
input [1:0]in;
input sel; output out;
reg [1:0]dummy_reg;
reg out;

always @(sel) begin
    case(sel)
        0: out = dummy_reg[0];
        1: out = dummy_reg[1];
    endcase 
end

always dummy_reg = in;


endmodule

module TestHangMux(out_net);
output out_net;
wire out_net;
reg [1:0]in_reg;
reg sel;

initial begin
 in_reg = 2'b01;
 sel = 0;
#500 $finish;
end

always begin
#10 sel = ~sel;
end

HangMux hang_me (in_reg,sel,out_net);
endmodule
==============================


Friday, December 21, 2007

[TECH] Finding frequent itemsets faster than FP-Growth algorithm

I wanted to implement my prime-number mapping intersection algorithm into the TM(Transaction Mapping) Algorithm, I got the FP-Tree implementation from PERL FP Tree and was trying to test this on a data of around 300,000 records, it works till 10,000 records with support and confidence of 0.8 and 0.9 and fails if the number of records > 20,000. I'm not sure if its a perl bug but the following code which is the cause of the problem seems difficult to understand



===================================================================
@association_rules = map $_->[0], (sort {$b->[1] <=> $a->[1]} 
      (map [$_, $_->confidence], @association_rules));      
===================================================================


I tried to bless the reference $_ but it did'nt work. The following is the error it gives.

[vamsik@abadon Tree-FP-0.04]$ !perl
perl test_vamsi.pl
Setting support
Mining rules...
Can't call method "confidence" on an undefined value at /usr/lib/perl5/site_perl/5.8.8/Tree/FP.pm line 397,  line 20001.

I need to fix this and send it to the FPTree maintainer tomorrow. I guess its really a issue to implement the FPTree in perl because of the speed and time especially when the algorithm need to work on huge data just in my case. A good idea is to re-write FPTree in C and extend TM based on that and compare the results.