Rough Approximations for Incomplete Information*

Abstract: Rough sets under incomplete information have been extensively studied. Based on the valued tolerance relation for incomplete information systems, several approaches have been presented for dealing with attribute reduction and rule extraction. We point out some drawbacks of the existing valued tolerance relation based rough approximations and propose a new kind of rough approximation operators, which generalize the Pawlak approximation operators for complete information systems. Some basic properties of the approximation operators are investigated.


I. INTRODUCTION
Rough set theory (RST), proposed by Pawlak [7], is an effective tool for data analysis. It can be used in an information system to describe the dependencies among attributes, evaluate the significance of attributes, and derive decision rules. In an information system, each object in the universe is associated with some information that is characterized by a set of attributes. Objects characterized by the same information are indiscernible in view of the available information about them. Based on the indiscernibility relation, classical rough set theory has been used successfully in attribute reduction of information and decision systems.
In many practical situations, the precise values of some of the attributes in an information system are not known, i.e., they are missing or only partially known. Such a system is called an incomplete information system. In order to deal with incomplete information systems, classical rough sets have been extended to several general models by using other binary relations or covers on the universe [1, 8, 9, 10, 14, 16]. Based on these extended rough set models, researchers have put forward several meaningful indiscernibility relations in incomplete information systems to characterize the similarity of objects. For instance, Slowinski [11] proposed two different approaches that replace an unknown attribute value by specific subsets of values. Grzymala et al. [2, 3] performed computational studies on medical data, where unknown values of attributes were replaced using probabilistic techniques. Kryszkiewicz introduced a kind of indiscernibility relation, called the tolerance relation, to handle incomplete information tables [5, 6]. Stefanowski [12] introduced two generalizations of rough set theory to handle missing values. The first generalization uses a non-symmetric similarity relation in order to formalize the idea of absent value semantics. The second proposal is based on the use of valued tolerance relations. A logical analysis and computational experiments show that, with the valued tolerance approach, it is possible to obtain more informative approximations and decision rules than with the approach based on the simple tolerance relation. The tolerance relation has also been generalized to the constrained similarity relation and the constrained dissymmetrical similarity relation [4, 13, 15].
This paper is devoted to the discussion of valued tolerance relation based rough approximation operators. We point out that the lower (upper) approximability presented in [12] is not a generalization of the Pawlak approximations. A new kind of lower (upper) approximability is proposed, and some basic properties are analyzed.

II. SIMILARITY RELATION FOR INCOMPLETE INFORMATION TABLE
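Rough sets were introduced by Pawlak [7] as an approach for analyzing vague information. Following Pawlak, an information table is a pair $IT = (U, A)$, where $U$ is a set of objects and $A$ is a set of attributes such that, for every $a \in A$, $a: U \to V_a$, where $V_a$ is the domain of $a$ and $V = \bigcup_{a \in A} V_a$. Each subset of attributes $B \subseteq A$ defines an indiscernibility relation $IND(B)$ as:
$$IND(B) = \{(x, y) \in U \times U : a(x) = a(y) \text{ for all } a \in B\}.$$
Clearly, $IND(B)$ is an equivalence relation. Let $U/B$ be the family of all the equivalence classes of the equivalence relation $IND(B)$. For each $X \subseteq U$, the lower and upper approximations of $X$ are defined by [7]:
$$\underline{B}X = \bigcup \{Y \in U/B : Y \subseteq X\}, \qquad \overline{B}X = \bigcup \{Y \in U/B : Y \cap X \neq \emptyset\}.$$
The rough set is characterized by its lower and upper approximations.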
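As a rough illustration of the Pawlak approximations recalled above, the following Python sketch computes $U/B$ and both approximations for a small complete table. The table, the attribute names and the helper function names are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def equivalence_classes(table, attrs):
    """Return U/B: objects grouped by identical values on every attribute in attrs."""
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    return list(groups.values())

def lower_approximation(table, attrs, X):
    """Union of the equivalence classes that are contained in X."""
    classes = equivalence_classes(table, attrs)
    return set().union(*(c for c in classes if c <= X))

def upper_approximation(table, attrs, X):
    """Union of the equivalence classes that intersect X."""
    classes = equivalence_classes(table, attrs)
    return set().union(*(c for c in classes if c & X))

# Hypothetical complete information table with attributes a and b.
pawlak_table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 0},
    "x3": {"a": 0, "b": 1},
    "x4": {"a": 0, "b": 0},
}
pawlak_X = {"x1", "x3"}

print(lower_approximation(pawlak_table, ["a", "b"], pawlak_X))
print(upper_approximation(pawlak_table, ["a", "b"], pawlak_X))
```

Here x1 and x2 are indiscernible, so x1 alone cannot belong to the lower approximation of the example set, while both x1 and x2 belong to its upper approximation.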

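Let $IT = (U, A)$ be an incomplete information table, in which an unknown attribute value is denoted by $*$. Kryszkiewicz [5] introduced the notion of tolerance relation. The key point in this approach is to interpret an unknown value of an attribute as similar to all other possible values of this attribute. Such an interpretation corresponds to the idea that such values are just missing, but they do exist. The tolerance relation $T_B$ with respect to $B \subseteq A$ is defined as [5]:
$$T_B = \{(x, y) \in U \times U : \forall a \in B,\ a(x) = a(y) \text{ or } a(x) = * \text{ or } a(y) = *\}.$$
Clearly $T_B$ is a reflexive and symmetric relation, but not necessarily transitive. We denote by $T_B(x) = \{y \in U : (x, y) \in T_B\}$ the tolerance class of $x$. The lower and upper approximations of $X \subseteq U$ are
$$\underline{T_B}X = \{x \in U : T_B(x) \subseteq X\}, \qquad \overline{T_B}X = \{x \in U : T_B(x) \cap X \neq \emptyset\}.$$
Stefanowski [12] introduced the absent values semantics for incomplete information tables. In this approach it is assumed that objects may be partially described not only because of our imperfect knowledge, but also because it is definitely impossible to describe them on all the attributes. Unknown values are not allowed to be compared. Based on this point, the similarity relation $S_B$ is defined as:
$$(x, y) \in S_B \iff \forall a \in B\ \bigl(a(x) = * \text{ or } a(x) = a(y)\bigr).$$
$S_B$ is a reflexive and transitive relation, but not necessarily symmetric. Based on $S_B$, lower and upper approximations of $X \subseteq U$ are defined in [12].
In order to characterize incomplete information more precisely, Stefanowski [12] introduced the notion of valued tolerance relation. Let $a \in A$ be an attribute and $V_a$ the set of its known values. Given an object $x \in U$ with $a(x) \neq *$, the probability that $a(y) = a(x)$ for an object $y$ with $a(y) = *$ is $1/|V_a|$, assuming a uniform distribution over $V_a$. Moreover, if both values are unknown, then the probability that $x$ is similar to $y$ on the attribute is $1/|V_a|^2$. Thus, the probability $R_a(x, y)$ that $x$ is similar to $y$ on $a$ is defined by:
$$R_a(x, y) = \begin{cases} 1, & a(x) = a(y) \neq *, \\ 0, & a(x) \neq *,\ a(y) \neq *,\ a(x) \neq a(y), \\ 1/|V_a|, & \text{exactly one of } a(x),\ a(y) \text{ is } *, \\ 1/|V_a|^2, & a(x) = a(y) = *. \end{cases}$$
For $B \subseteq A$, $R_B(x, y) = \prod_{a \in B} R_a(x, y)$. Based on $R_B(x, y)$, the $B$-lower and the $B$-upper approximability of $X$ by a set $Z$ are defined as:
$$\mu_{\underline{X}_B}(Z) = T_{z \in Z}\bigl(T_{x \in T_B(z)}\, I(R_B(z, x), \hat{x})\bigr), \tag{5}$$
$$\mu_{\overline{X}^B}(Z) = T_{z \in Z}\bigl(S_{x \in T_B(z)}\, T(R_B(z, x), \hat{x})\bigr), \tag{6}$$
where $T_B(z)$ is the tolerance class of element $z$, $\hat{x}$ is the membership degree of element $x$ in the set $X$ ($\hat{x} \in \{0, 1\}$), and $T$, $S$ and $I$ are a t-norm, a t-conorm and a fuzzy implication, respectively.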
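The three relations discussed above can be compared on a few objects. The sketch below is only illustrative and rests on several assumptions: unknown values are marked by `"*"`, the valued tolerance degree of a set of attributes is the product of the per-attribute probabilities, a uniform distribution over each domain is assumed, and the case where both values are unknown is taken to be $1/|V_a|^2$. The objects, the domains and the function names are hypothetical.

```python
from fractions import Fraction

MISSING = "*"  # assumed marker for an unknown attribute value

def tolerant(x, y, attrs):
    """Tolerance relation: values agree or at least one of them is unknown."""
    return all(x[a] == y[a] or MISSING in (x[a], y[a]) for a in attrs)

def similar(x, y, attrs):
    """Similarity relation: y agrees with x on every attribute known for x."""
    return all(x[a] == MISSING or x[a] == y[a] for a in attrs)

def valued_tolerance(x, y, attrs, domains):
    """Valued tolerance degree as a product of per-attribute probabilities."""
    degree = Fraction(1)
    for a in attrs:
        n = len(domains[a])
        if x[a] == MISSING and y[a] == MISSING:
            degree *= Fraction(1, n * n)   # both unknown (assumed 1/|V_a|^2)
        elif MISSING in (x[a], y[a]):
            degree *= Fraction(1, n)       # exactly one unknown
        elif x[a] != y[a]:
            return Fraction(0)             # known and different
    return degree

# Hypothetical domains and objects.
rel_domains = {"a": {0, 1, 2, 3}, "b": {0, 1}}
rel_attrs = ["a", "b"]
obj_x = {"a": 1, "b": MISSING}
obj_y = {"a": MISSING, "b": MISSING}
obj_z = {"a": 1, "b": 0}
obj_w = {"a": 2, "b": 0}

print(tolerant(obj_x, obj_y, rel_attrs), tolerant(obj_x, obj_w, rel_attrs))
print(similar(obj_x, obj_z, rel_attrs), similar(obj_z, obj_x, rel_attrs))
print(valued_tolerance(obj_x, obj_z, rel_attrs, rel_domains))
print(valued_tolerance(obj_x, obj_y, rel_attrs, rel_domains))
```

The pair `obj_x`, `obj_z` shows why the similarity relation is not symmetric: `obj_z` agrees with `obj_x` on everything known for `obj_x`, but not the other way round. Exact fractions are used so that the degrees can be compared without rounding.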
In this model, each subset of $U$ may be a lower or upper approximation of $X$, but to a different degree, which is denoted as the lower (upper) approximability.
[...] can be proved similarly.
(2) Let $\hat{x}_1$ be the membership degree of element $x$ in the set $X_1$ and $\hat{x}_2$ be the membership degree of element $x$ in the set $X_2$ ($\hat{x}_1, \hat{x}_2 \in \{0, 1\}$). By $X_1 \subseteq X_2$, it follows that [...]. [...] can be proved similarly. (3) By $B_1 \subseteq B_2$ we have [...] $IND(B)$.
We note that the lower (upper) approximability can only decrease as elements are added to $Z$. This does not coincide with the basic idea of Pawlak's rough sets. In the Pawlak rough set model, whether a set is the lower (upper) approximation is definite; it is not the case that the smaller the set, the more likely it is to be the lower (upper) approximation. Actually, (5) and (6) are based on the observation that, in the classical rough set model, every $z \in \underline{B}X$ satisfies $[z]_B \subseteq X$ and every $z \in \overline{B}X$ satisfies $[z]_B \cap X \neq \emptyset$, where $[z]_B$ is the equivalence class of $z$, and $\underline{B}X$ and $\overline{B}X$ are the lower and upper approximations of $X$, respectively. It is worth noticing that this is a necessary condition but not a sufficient one. Actually, we have [...] (2) can be proved similarly.
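The remark that the approximability shrinks as $Z$ grows can be checked numerically. The sketch below is one possible reading of the approximability of [12], under stated assumptions: the lower degree is the t-norm, over $z \in Z$ and over $x$ in the tolerance class of $z$, of $I(R_B(z, x), \hat{x})$; the upper degree is the t-norm over $z \in Z$ of the t-conorm over $x$ of $T(R_B(z, x), \hat{x})$; $T = \min$, $S = \max$ and $I(p, q) = \max(1 - p, q)$ (the Kleene-Dienes implication) are chosen here only for illustration; the tolerance class of $z$ is taken as the objects with positive valued tolerance degree. The table and all names are hypothetical.

```python
from fractions import Fraction

MISSING = "*"  # assumed marker for an unknown attribute value

def valued_tolerance(x, y, attrs, domains):
    """Valued tolerance degree as a product of per-attribute probabilities (assumed form)."""
    degree = Fraction(1)
    for a in attrs:
        n = len(domains[a])
        if x[a] == MISSING and y[a] == MISSING:
            degree *= Fraction(1, n * n)
        elif MISSING in (x[a], y[a]):
            degree *= Fraction(1, n)
        elif x[a] != y[a]:
            return Fraction(0)
    return degree

def lower_approximability(Z, X, table, attrs, domains):
    """min over z in Z and x tolerant to z of max(1 - R_B(z, x), membership of x in X)."""
    degree = Fraction(1)
    for z in Z:
        for x, row in table.items():
            r = valued_tolerance(table[z], row, attrs, domains)
            if r > 0:  # x lies in the tolerance class of z
                member = 1 if x in X else 0
                degree = min(degree, max(1 - r, member))
    return degree

def upper_approximability(Z, X, table, attrs, domains):
    """min over z in Z of max over x of min(R_B(z, x), membership of x in X)."""
    degree = Fraction(1)
    for z in Z:
        best = Fraction(0)
        for x, row in table.items():
            r = valued_tolerance(table[z], row, attrs, domains)
            member = 1 if x in X else 0
            best = max(best, min(r, member))
        degree = min(degree, best)
    return degree

# Hypothetical incomplete table; the elements used in Z have no unknown values.
vt_domains = {"a": {0, 1}, "b": {0, 1}}
vt_attrs = ["a", "b"]
vt_table = {
    "x1": {"a": 0, "b": 0},
    "x2": {"a": 0, "b": MISSING},
    "x3": {"a": 1, "b": 1},
    "x4": {"a": MISSING, "b": 1},
}
vt_X = {"x1", "x2", "x4"}
small_Z, large_Z = {"x1"}, {"x1", "x3"}

for Z in (small_Z, large_Z):
    print(sorted(Z),
          lower_approximability(Z, vt_X, vt_table, vt_attrs, vt_domains),
          upper_approximability(Z, vt_X, vt_table, vt_attrs, vt_domains))
```

With these choices, enlarging $Z$ from {x1} to {x1, x3} lowers both degrees, because the outer operation is a minimum over the elements of $Z$ and adding elements can never raise it.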
Based on this theorem, we propose the following definition.
* This work has been supported by the National Natural Science Foundation of China (Grant No. 61473239, 61175044) and the Fundamental Research Funds for the Central Universities of China (Grant No. 2682014ZT28).
www.ijarai.thesai.org