I have datamined the cancer data at http://breastscreening.cancer.gov/rfdataset/ using the TM (Transaction Mapping) and FP-Growth algorithms. This is what I did to mine the association rules.
1. Randomly partitioned the data into two parts: part1 with 148458 records and part2 with 153897 records.
2. Used part1 (148458 records) as training data and found the association rules with support >= 0.4 and confidence >= 0.4. This gave 72 rules.
3. For each rule from step 2, computed its support and confidence in part2. The values in part2 are close to the support and confidence in the training data (part1).
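The three steps above can be sketched in Python roughly as follows. This is only an illustration, not the Perl programs linked in this post: the function names `split_records` and `support_confidence`, the fixed seed, and the toy records are all assumptions, and each record is assumed to be a set of already-encoded `attribute=value` items.

```python
import random


def split_records(records, part1_size, seed=0):
    """Randomly partition records into part1 (part1_size records) and part2 (the rest)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    return shuffled[:part1_size], shuffled[part1_size:]


def support_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    support    = fraction of records containing antecedent and consequent
    confidence = that count divided by the count of records containing antecedent
    """
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if both <= r)
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


if __name__ == "__main__":
    # Toy data (hypothetical items, not taken from the real dataset).
    toy = [
        frozenset({"cancer=no", "hispanic=no"}),
        frozenset({"cancer=no"}),
        frozenset({"cancer=no", "hispanic=no", "menopaus=post"}),
        frozenset({"hispanic=no"}),
    ]
    part1, part2 = split_records(toy, part1_size=2)
    rule = ({"cancer=no"}, {"hispanic=no"})
    print("part1:", support_confidence(part1, *rule))
    print("part2:", support_confidence(part2, *rule))
```

Comparing the two printed pairs for every mined rule is the check described in step 3: a rule that holds only in part1 would show a clear drop in support or confidence on part2.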
The 72 rules from step 2: [ http://www.engr.uconn.edu/~vkk06001/CancerDataMining/rules.txt ]
Support and confidence of each of these rules in part2: [ http://www.engr.uconn.edu/~vkk06001/CancerDataMining/training_result.txt ]
I have made the rules human readable by removing all the encoding; please see [ http://www.engr.uconn.edu/~vkk06001/CancerDataMining/human_readable.txt ]. The rules are in the following format:

==============RULE:1=================
SUP:0.402 ,CONF:0.412,TRAIN_SUP:0.404,TRAIN_CONF:0.414
{ Diagnosis of invasive breast cancer within one year of the index screening mammogram = no, }
IMPLIES ===>
{ Diagnosis of invasive or ductal carcinoma in situ breast cancer within one year of the index screening mammogram = no, menopaus = postmenopausal or age>=55, hispanic = no, }
==============RULE:2=================

SUP is the support of the rule in part2, CONF is the confidence of the rule in part2, TRAIN_SUP is the support of the rule in part1, and TRAIN_CONF is the confidence of the rule in part1.
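If you want to compare the part1 and part2 numbers programmatically, a small parser for the metrics line might look like the sketch below. It assumes the four values sit on one line exactly as in the example above; the names `parse_metrics` and `drift` are made up for this illustration and are not part of the Perl programs.

```python
import re

# Longer names come first so TRAIN_SUP is not read as a bare SUP.
METRIC_RE = re.compile(r"(TRAIN_SUP|TRAIN_CONF|SUP|CONF)\s*:\s*([0-9]*\.?[0-9]+)")


def parse_metrics(line):
    """Turn 'SUP:0.402 ,CONF:0.412,...' into a dict mapping metric name to float."""
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


def drift(metrics):
    """Absolute difference between part2 and part1 values of support and confidence."""
    return (
        abs(metrics["SUP"] - metrics["TRAIN_SUP"]),
        abs(metrics["CONF"] - metrics["TRAIN_CONF"]),
    )


if __name__ == "__main__":
    line = "SUP:0.402 ,CONF:0.412,TRAIN_SUP:0.404,TRAIN_CONF:0.414"
    metrics = parse_metrics(line)
    print(metrics)
    print(drift(metrics))
```

For rule 1 above, both differences come out at about 0.002, which is what "close to the training data" means in step 3.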
These rules may not mean much to me, but they might make sense to a cancer doctor. There are several useful Perl programs for people who want to do some data mining; please feel free to use them: http://www.engr.uconn.edu/~vkk06001/CancerDataMining . Let me know if you have any questions.