Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
Digital Object Identifier (DOI): 10.14569/IJACSA.2013.041128
Article published in International Journal of Advanced Computer Science and Applications (IJACSA), Volume 4, Issue 11, 2013.
Abstract: Computing the statistical dependence of terms in textual documents is a widely studied subject and a core problem in many areas of science. This study focuses on that problem and explores estimation techniques based on the expected mutual information measure. A general framework is established for tackling a variety of estimations: (i) general forms of estimation functions are introduced; (ii) a set of constraints for the estimation functions is discussed; (iii) general forms of probability distributions are defined; (iv) general forms of the measures for calculating mutual information of terms (MIT) are formalised; (v) properties of the MIT measures are studied; and (vi) relations between the MIT measures are revealed. Four estimation methods are proposed as examples, and the mathematical meaning of each method is interpreted. The methods may be applied directly to practical problems to compute dependence values for individual term pairs. Owing to its generality, our method is applicable to various areas involving statistical semantic analysis of textual data.
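For orientation, the quantity the abstract refers to can be illustrated with a minimal sketch. It assumes the textbook form of the expected mutual information between two binary term-occurrence variables, with probabilities estimated by plain relative document frequencies. This is not one of the four estimation methods proposed in the article; the function name `emim` and the set-of-terms document representation are hypothetical choices made for illustration only.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def emim(docs: Iterable[Set[str]], t1: str, t2: str) -> float:
    """Expected mutual information (in nats) between the presence/absence
    of terms t1 and t2, using relative-frequency probability estimates.

    Assumption: each document is represented as a set of its terms.
    """
    docs = list(docs)
    n = len(docs)
    if n == 0:
        raise ValueError("at least one document is required")

    # Joint counts over the four presence/absence combinations (a, b),
    # where a = 1 if t1 occurs in the document and b = 1 if t2 occurs.
    counts: Dict[Tuple[int, int], int] = {
        (a, b): 0 for a in (0, 1) for b in (0, 1)
    }
    for d in docs:
        counts[(int(t1 in d), int(t2 in d))] += 1

    total = 0.0
    for (a, b), c in counts.items():
        if c == 0:
            continue  # convention: 0 * log(0) contributes 0
        p_ab = c / n
        p_a = (counts[(a, 0)] + counts[(a, 1)]) / n  # marginal of t1
        p_b = (counts[(0, b)] + counts[(1, b)]) / n  # marginal of t2
        total += p_ab * math.log(p_ab / (p_a * p_b))
    return total
```

Under these assumptions the measure is zero when the two terms occur independently across the collection and grows as their occurrences become more strongly coupled. Relative-frequency estimates are unreliable for rare terms, which is the kind of issue that motivates studying alternative estimation functions.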
D. Cai and T.L. McCluskey, “A General Framework of Generating Estimation Functions for Computing the Mutual Information of Terms,” International Journal of Advanced Computer Science and Applications (IJACSA), 4(11), 2013. http://dx.doi.org/10.14569/IJACSA.2013.041128