Saturday, January 05, 2008

[TECH] Association rules by Data Mining (TM Algorithm) on Cancer Data.

I have mined the cancer data at http://breastscreening.cancer.gov/rfdataset/ using the TM (Transaction Mapping) and FP-Growth algorithms. This is what I did to mine the association rules:

1. Randomly partitioned the data into two parts: part1 with 148458 records and part2 with 153897 records.
2. Used part1 (148458 records) to find association rules with support >= 0.4 and confidence >= 0.4. I got 72 rules from this.
3. For each rule from step 2, computed its support and confidence in part2. The values in part2 look close to the support and confidence in the training data (part1).
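
The steps above can be sketched in Python roughly as follows. This is only a minimal illustration, not the Perl code I actually used: the function names (split_records, support, rule_stats) and the toy transactions are made up for the example, and it assumes the usual definitions, where the support of a rule is the fraction of records containing both sides and the confidence is that support divided by the support of the left-hand side.

```python
import random

def split_records(records, seed=0):
    """Shuffle the records and cut them into two roughly equal parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def rule_stats(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    sup_rule = support(antecedent | consequent, transactions)
    sup_ante = support(antecedent, transactions)
    conf = sup_rule / sup_ante if sup_ante > 0 else 0.0
    return sup_rule, conf

# Toy data (hypothetical): each record is a set of "attribute = value" items.
toy = [
    frozenset({"a", "b"}),
    frozenset({"a"}),
    frozenset({"a", "b", "c"}),
    frozenset({"c"}),
]
part1, part2 = split_records(toy)
sup, conf = rule_stats({"a"}, {"b"}, toy)
```

In the real run, the rules would be mined on part1 with the 0.4 thresholds and rule_stats would then be evaluated for each of them on part2.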


The 72 rules from step 2
[ http://www.engr.uconn.edu/~vkk06001/CancerDataMining/rules.txt ]

Support and confidence of each of these rules in part2
[ http://www.engr.uconn.edu/~vkk06001/CancerDataMining/training_result.txt ]

I have made the rules human readable by removing all the encoding. Please see the rules at
[ http://www.engr.uconn.edu/~vkk06001/CancerDataMining/human_readable.txt ]

These are in the following format:
==============RULE:1=================
SUP:0.402 ,CONF:0.412,TRAIN_SUP:0.404,TRAIN_CONF:0.414
{
Diagnosis of invasive breast cancer within one year of the index
screening mammogram = no,
}
IMPLIES ===>
{
Diagnosis of invasive or ductal carcinoma in situ breast cancer within
one year of the index screening mammogram = no,
menopaus = postmenopausal or age>=55,
hispanic = no,
}
==============RULE:2=================

SUP indicates the support of this rule in part2, CONF indicates the confidence of this
rule in part2, TRAIN_SUP indicates the support of this rule in part1, and
TRAIN_CONF indicates the confidence of this rule in part1.
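
To check how close the part2 numbers are to the part1 numbers, the statistics line of each rule can be read with a small script. The sketch below is hypothetical (parse_stats is a name invented for this example, and it is not one of the Perl programs); it assumes the line always has the KEY:value layout shown in the sample above.

```python
import re

def parse_stats(line):
    """Parse a line like 'SUP:0.402 ,CONF:0.412,...' into a dict of floats."""
    pairs = re.findall(r"([A-Z_]+)\s*:\s*([0-9.]+)", line)
    return {key: float(value) for key, value in pairs}

# Statistics line copied from RULE:1 in the sample above.
stats = parse_stats("SUP:0.402 ,CONF:0.412,TRAIN_SUP:0.404,TRAIN_CONF:0.414")

# Gap between part1 (training) and part2 values for this rule.
sup_gap = abs(stats["TRAIN_SUP"] - stats["SUP"])
conf_gap = abs(stats["TRAIN_CONF"] - stats["CONF"])
```

A small gap for both values is what "close to the training data" means in step 3.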

These rules may not make much sense to me, but they might make sense to a cancer doctor. There are several useful Perl programs for people who want to do some data mining; please feel free to use them: http://www.engr.uconn.edu/~vkk06001/CancerDataMining . Let me know if you have any questions.
