My Journey Towards Emptiness!!: [TECH] Association rules by Data Mining (TM Algorithm) on Cancer Data.

Saturday, January 05, 2008

I have data mined the cancer data at http://breastscreening.cancer.gov/rfdataset/ using the TM (Transaction Mapping) and FP-Growth algorithms. This is what I did to mine the association rules:

1. Randomly partitioned the data into two parts: part1 with 148458 records and part2 with 153897 records.
2. Used part1 (148458 records) to find association rules with support >= 0.4 and confidence >= 0.4. I got 72 rules from this.
3. For each rule from step 2, computed its support and confidence in part2. These turned out to be close to the support and confidence in the training data (part1).
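The split-and-validate steps above can be sketched roughly as follows. This is only an illustration in Python, not the Perl code I actually used: the function names `split_records` and `support_confidence` are made up for this sketch, each record is assumed to be a set of already-encoded items, and the split here is a simple shuffle-and-halve, which may differ from how part1 and part2 were really produced. The rule mining itself (TM / FP-Growth) is not shown.

```python
import random


def split_records(records, seed=0):
    """Randomly partition records into two parts (hypothetical helper).

    Assumption: a shuffle followed by a cut in the middle; the original
    partition may have been produced differently.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def support_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    support    = fraction of records containing antecedent and consequent
    confidence = records containing both / records containing antecedent
    """
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    n_ante = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if both <= r)
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


if __name__ == "__main__":
    # Toy records only; these are not taken from the cancer data set.
    toy = [frozenset(s) for s in ({"a", "b"}, {"a"}, {"a", "b", "c"}, {"b"})]
    part1, part2 = split_records(toy, seed=1)
    print(len(part1), len(part2))
    print(support_confidence(toy, {"a"}, {"b"}))
```

A rule found on part1 would then be passed to `support_confidence` once with part1 and once with part2, and the two pairs of numbers compared.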


The 72 rules from step 2:
[ http://www.engr.uconn.edu/~vkk06001/CancerDataMining/rules.txt ]

Support and confidence of each of these rules in part2:
[ http://www.engr.uconn.edu/~vkk06001/CancerDataMining/training_result.txt ]

I have made the rules human readable by removing all the encoding. Please see the rules at
[ http://www.engr.uconn.edu/~vkk06001/CancerDataMining/human_readable.txt ]

These are in the following format
==============RULE:1=================
SUP:0.402 ,CONF:0.412,TRAIN_SUP:0.404,TRAIN_CONF:0.414
{
Diagnosis of invasive breast cancer within one year of the index
screening mammogram = no,
}
IMPLIES ===>
{
Diagnosis of invasive or ductal carcinoma in situ breast cancer within
one year of the index screening mammogram = no,
menopaus = postmenopausal or age>=55,
hispanic = no,
}
==============RULE:2=================

SUP is the support of this rule in part2, CONF is its confidence in part2,
TRAIN_SUP is its support in part1, and TRAIN_CONF is its confidence in part1.
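To check how closely the part2 numbers follow the part1 numbers, the statistics line of each rule can be parsed and compared. Below is a small Python sketch, assuming every rule has a line shaped like the `SUP:...,CONF:...,TRAIN_SUP:...,TRAIN_CONF:...` line in the example above. The helper names `parse_rule_stats` and `drift` are invented for this illustration and are not part of the Perl programs.

```python
import re


def parse_rule_stats(line):
    """Parse a 'SUP:0.402 ,CONF:0.412,...' line into a dict of floats."""
    pairs = re.findall(r"([A-Z_]+)\s*:\s*([0-9.]+)", line)
    return {key: float(value) for key, value in pairs}


def drift(stats):
    """Absolute part2-vs-part1 difference for (support, confidence)."""
    return (
        abs(stats["SUP"] - stats["TRAIN_SUP"]),
        abs(stats["CONF"] - stats["TRAIN_CONF"]),
    )


if __name__ == "__main__":
    # The statistics line of RULE:1 from the example above.
    stats = parse_rule_stats(
        "SUP:0.402 ,CONF:0.412,TRAIN_SUP:0.404,TRAIN_CONF:0.414"
    )
    print(stats)
    print(drift(stats))
```

For RULE:1 both differences come out at roughly 0.002, which is what "close" means in step 3.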

These rules may not make much sense to me, but they might make sense to a cancer doctor. There are several useful Perl programs for people who want to do some data mining; please feel free to use them [ http://www.engr.uconn.edu/~vkk06001/CancerDataMining ]. Let me know if you have any questions.
