Mining Significant Geographical Association Rules from Uncertain Data — ASN Events

Mining Significant Geographical Association Rules from Uncertain Data (15247)

Anshu Zhang 1 , Wenzhong Shi 1
  1. The Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong

Geographical association rules are implicit patterns showing antecedent-consequent associations between entities or data fields in geographical databases. Mining such rules in geographical information systems (GIS) has become a powerful tool for decision support. However, the reliability of the resulting rules remains a major concern and a difficult problem.

This study presents a new approach for mining geographical association rules with improved reliability. It addresses two prevalent and critical threats to the quality of the resulting rules: the numerous spurious rules included in the result, and the inevitable errors in GIS data. First, to filter out spurious rules, a statistical significance test is applied to the rules, and only statistically significant rules are included in the final result. This method has been successful in general association rule mining. For the first time, this study systematically adapts the method to geographical contexts and interprets its results within them. Second, to reduce the impact of source data error on the statistical test, the study develops an original mathematical model of data error propagation in the test, and a corresponding method to correct the test for the errors.
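The filtering step can be illustrated with a minimal sketch. The abstract does not name the test it uses, so the code below assumes a one-sided Fisher exact test on each rule's 2x2 contingency table, with a Bonferroni correction to bound the risk of accepting any spurious rule. The function names, the rule tuple layout, and the example counts are all hypothetical.

```python
from math import comb


def fisher_exact_greater(n_ac, n_a, n_c, n):
    """One-sided Fisher exact p-value for the rule A -> C.

    n_ac: records containing both antecedent A and consequent C
    n_a:  records containing A
    n_c:  records containing C
    n:    total number of records
    Returns P(X >= n_ac) under the hypergeometric null of independence.
    """
    upper = min(n_a, n_c)
    total = comb(n, n_a)
    tail = sum(
        comb(n_c, k) * comb(n - n_c, n_a - k)
        for k in range(n_ac, upper + 1)
    )
    return tail / total


def significant_rules(rules, n, alpha=0.05):
    """Keep rules whose p-value passes a Bonferroni-corrected threshold.

    rules: list of (name, n_ac, n_a, n_c) tuples (assumed layout).
    Dividing alpha by the number of tested rules bounds the chance of
    accepting any spurious rule at alpha.
    """
    if not rules:
        return []
    threshold = alpha / len(rules)
    return [
        name
        for name, n_ac, n_a, n_c in rules
        if fisher_exact_greater(n_ac, n_a, n_c, n) <= threshold
    ]


# Hypothetical candidate rules mined from 100 spatial units.
candidates = [
    ("strong", 15, 20, 20),  # co-occurs far more often than chance (expected 4)
    ("chance", 4, 20, 20),   # co-occurs exactly as often as chance predicts
]
kept = significant_rules(candidates, n=100)
```

In this sketch only the rule labelled "strong" survives; the "chance" rule is pruned even though a pure support and confidence threshold might have admitted it.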

The new approach was first tested on computer-synthesized data, where the true rules are known and the correctness of the resulting rules can be evaluated conclusively. Without the statistical test, the resulting rule set contained many spurious rules. After the test was applied, the entire result had less than a 5% risk of containing any spurious rule. However, when the data contained errors, the test also caused a considerable loss of true rules. The proposed test correction method recovered more than 40% of these lost true rules.
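One way to see why data error makes a significance test discard true rules is to look at how random misclassification weakens an observed association. The sketch below is not the authors' error propagation model, which the abstract does not describe. It assumes a simple hypothetical case in which the antecedent attribute of each record is flipped independently with probability `e`, and computes the expected lift of the rule from the resulting expected counts.

```python
def expected_lift_under_error(n_ac, n_a, n_c, n, e):
    """Expected lift of rule A -> C when A is misrecorded with probability e.

    Assumption (illustrative only): each record's antecedent flag is
    flipped independently with probability e; the consequent is error-free.
    """
    # Records observed as having A: true A kept, plus non-A flipped into A.
    n_a_obs = (1 - e) * n_a + e * (n - n_a)
    # Records observed as having A and C: same reasoning restricted to C.
    n_ac_obs = (1 - e) * n_ac + e * (n_c - n_ac)
    confidence = n_ac_obs / n_a_obs
    return confidence / (n_c / n)


# Hypothetical rule: 1000 records, A and C each in 200, both in 100.
lifts = [expected_lift_under_error(100, 200, 200, 1000, e)
         for e in (0.0, 0.1, 0.2, 0.5)]
```

Under these assumptions the lift shrinks steadily toward 1 (no association) as the error rate grows, so a rule that is truly strong can fall below the significance threshold unless the test is corrected for the error.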

Experiments were also conducted on real-world GIS data on land use and socioeconomics. The results for pruning spurious rules and recovering true rules were similar to those of the synthetic data experiments. Most spurious rules conveyed false links between certain land uses and socioeconomic changes; if not pruned, these rules could mislead users into harmful decisions. The test correction method recovered some of the most meaningful rules, those involving land use changes. Such rules were very sensitive to data errors and were almost completely lost without the correction. In sum, the new approach significantly increased the value of the resulting rules for making correct land use decisions.