A Mathematical Model for Comparing Memory Storage of Three Interval-Based Parametric Temporal Database Models

The Interval-Based Parametric Temporal Database Model (IBPTDM) captures the historical changes of a database object in a single tuple. Such a data model violates 1NF and is difficult to implement on top of conventional Database Management Systems (DBMS), because IBPTDM cannot directly use relational storage structures or query evaluation techniques that depend on atomic attribute values, and because its attribute sizes are not fixed. A 1NF model can be used to solve this challenge. Modeling time-varying data in a 1NF model raises questions about memory storage efficiency and ease of use. The main goal of this research is to present a novel approach for representing temporal data in a 1NF model and to compare it with the other main approaches in the literature. To this end, a mathematical model for comparing three different storage models is presented to illustrate that the proposed model is more efficient than the other approaches under certain conditions. The simulation results show that the proposed model overcomes the needless redundancy of data, achieves savings in memory storage, and is easy to implement in the relational data model or to adapt to production systems that need to track temporal aspects of functioning database systems.
Keywords: Valid-time data model; N1NF; interval-based timestamping; temporal data model; 1NF


I. INTRODUCTION
Modeling temporal databases is considered a vital and highly demanding problem. That is why a variety of techniques have been proposed to address this problem from different viewpoints [1]-[3]. Approaches to modeling temporal databases in the relational framework differ in many dimensions [4]-[11]. The most frequently stated approaches are tuple timestamping with First Normal Form (1NF) and attribute timestamping with Non-First Normal Form (N1NF). Based on how the data are timestamped, the first approach (1NF) has two variants, namely Tuple Timestamping Single Relation (TTSR) and Tuple Timestamping Multiple Relations (TTMR).
Models under the TTSR approach are discussed in [1], [4], [5], [12]-[14]. Examples of these temporal data models are LEGOL 2.0 by Jones [15], the Temporally Oriented Data Model by Ariav [16], HSQL by Sarda [17], and TQuel by Snodgrass [18]. The TTSR approach introduces redundancy, because attribute values that change at different times are repeated in multiple tuples. Furthermore, Steiner [19] stated that the main disadvantage of this approach is that the facts about a real-world entity are spread over several tuples, where each tuple represents a state during a certain time period in the real world.
Models under the TTMR approach solve the problem of data redundancy in TTSR by decomposing the temporal relation as follows: time-varying attributes are distributed over multiple relations, and time-invariant attributes are gathered into a separate relation. Many temporal data models discussed in the literature are categorized under this approach [5]. Examples include the Temporal Relational Model by Navathe and Ahmed [20], and the models of Snodgrass [18], Tansel [10], and Kvet [14]. The data models under this approach need a variation of join, known as the temporal intersection join, to combine the information for an object. The temporal intersection join is generally expensive to implement.
The second approach (N1NF) violates the atomicity of single data representations; depending on the timestamp, the data can be timestamped at the level of tuples or at the level of attributes [5], [8]-[10]. An example of this approach is the parametric temporal data model, which is based on attribute timestamping and uses a temporal element as a timestamp [21]. The bitemporal conceptual data model (BCDM) is another example of this approach; it forms the basis for the temporal structured query language (TSQL) proposed by Jensen [5]. BCDM is based on tuple timestamping and uses interval-based timestamping [2].
Due to the needless redundancy of data in the TTSR approach, the expensive implementation of the TTMR approach, and the difficulty of implementing the parametric temporal data model on top of the relational data model, a new approach to model, implement, and query temporal databases (TDB) in the relational framework is proposed. The proposed approach is referred to as Tuple Timestamp Historical Relation (TTHR) [22]. This temporal data model (TTHR) is based on tuple timestamping for the lifespan time of database objects, and on attribute timestamping for the historical valid-time changes of time-varying attributes. TTHR is in 1NF, and it is an extension of, and reducible to, the Snodgrass (TQuel) temporal data model [14], [19]. TTHR mimics the features of TTSR and TTMR as well as the most common temporal database models discussed in the literature.
The storage efficiency of temporal database systems has a direct impact on system performance; therefore, in this study we compare the three approaches (TTSR, TTMR, and TTHR) from the point of view of memory storage. To measure the storage costs, we establish a mathematical model (formulas) for the three approaches. This gives a reasonable basis for judging whether TTHR is suitable for implementing the parametric temporal data model on top of a conventional DBMS. Throughout our investigation of storage efficiency, we show that the TTHR approach is comparable to the other approaches and is even better under certain conditions. A similar study on the efficiency of memory storage was done by Atay [2]; that study compared the Snodgrass model (which falls under the TTMR-based approach in our study) with Tansel's N1NF nested relational model. A study by Noh [23] introduced a new platform for modeling temporal databases based on XML. He compared the relational model, as he named it (which falls under the TTSR-based approach in our study), with XML-based and object-oriented approaches.

II. INTERVAL-BASED PARAMETRIC TEMPORAL DATABASE MODEL
The interval-based parametric temporal database model uses N1NF and an attribute-timestamping data model with interval-based timestamps. A time interval $[t_1, t_2]$ is the set of time points $t \in T$ with $t_1 \le t \le t_2$, where $T$ is defined as a set of time points in the domain $D$. The interval start point $t_1$ and the interval end point $t_2$ are the minimum and maximum boundaries of the interval, and both belong to the interval. Intervals can be defined as open, half-closed, or closed. In this study, the left-right bounded (closed) representation for periods of validity is considered. Intervals can be compared to show their relative positions using Allen's interval logic [24], [25]. Fig. 1 shows the Emp relation, which represents the historical changes of employees' time-varying data with Address, Tel_no, Supervssn, Dno (department number), Salary, and Rank. SSN, Name, and Birth_date are considered time-invariant attributes. Fig. 1 shows that the information about a database object is modeled in one tuple and that each time-varying attribute is timestamped by one or more time intervals. Examples of time intervals are [0 4] and [5 10], which timestamp the valid time of the address of employee Nashwan. A single time instant can also be represented as an interval; for example, time instant 10 has the representation [10 10]. The three data models (TTSR, TTMR, and TTHR) are based on the schema extension approach of the conventional relational data model and can be implemented in a conventional RDBMS.
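The closed-interval timestamps described above can be illustrated with a minimal Python sketch. The class name `Interval`, the helper `value_at`, and the address strings are hypothetical names introduced only for this illustration; the `meets` test is a discrete-time reading of the corresponding relation in Allen's interval logic and is an assumption, not a definition taken from this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [start, end] over integer chronons."""
    start: int
    end: int

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not exceed end")

    def contains(self, t: int) -> bool:
        # Both boundaries belong to the interval.
        return self.start <= t <= self.end

    def intersects(self, other: "Interval") -> bool:
        # True when the two intervals share at least one chronon.
        return self.start <= other.end and other.start <= self.end

    def meets(self, other: "Interval") -> bool:
        # Assumed discrete-time reading: other starts at the next chronon.
        return self.end + 1 == other.start


def value_at(history, t):
    """Return the value whose timestamp contains chronon t, or None."""
    for interval, value in history:
        if interval.contains(t):
            return value
    return None


# Address history of one employee; the address strings are placeholders.
address = [(Interval(0, 4), "address_1"), (Interval(5, 10), "address_2")]
instant = Interval(10, 10)  # a single time instant as a degenerate interval
```

Here `value_at(address, 4)` and `value_at(address, 5)` fall on either side of the boundary between the two intervals, which is where the choice of a closed representation matters.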

III. TEMPORAL DATA REPRESENTATIONS IN THREE APPROACHES
In this section, the Emp relation shown in Fig. 1 is mapped to the TTSR, TTMR, and TTHR (proposed model) approaches, respectively.

A. TTSR data representation approach
In TTSR, the Emp relation of Fig. 1 is represented as a single 1NF relation, as shown in Fig. 2.

B. TTMR data representation approach
In TTMR, the temporal relation is represented by a snapshot relation, a separate relation for each time-varying attribute, and a relation for the lifespan time, all of which are 1NF relations [3], [18], [21], [27], [28]. The relations in Fig. 3 show the Emp relation of Fig. 1 in the TTMR representation.

C. TTHR data representation approach
In TTHR, the general representation of the temporal relational schema $R_T$ is accomplished with two relations, namely the main relation $R_T$ and the auxiliary relation $R_T\_VT$. As shown in Fig. 4, the variant-type attribute of the auxiliary relation is equivalent to Updated_V, which stores the old value of the updated time-varying attributes.
The Att_index attribute stores the position of the time-varying attribute in the main relation Employees, such that domain(Att_index) = {0, 3, 4, 5, 6, 7, 8}, where 0 is used to index the object's lifespan time and 3, 4, 5, 6, 7, and 8 are used to index the time-varying attributes, as shown in Fig. 4. The granularity of a chronon is assumed to be one month for both lifespan time and valid time. Integers are used as timestamp components that can be thought of as dates; for example, the integer 7 represents the date 'April 2012'.
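The role of Att_index can be sketched in Python as follows. This is only an illustration under stated assumptions: the column order in `COLUMNS` is inferred from the index domain {0, 3, ..., 8} and the attribute list of Emp, the function `value_at` and all data values are hypothetical, and chronons not covered by a historical row are assumed to belong to the current value in the main relation.

```python
# Assumed column positions in the main relation; positions 3..8 are the
# time-varying attributes. Att_index 0 is reserved for the lifespan time
# and is not handled in this sketch.
COLUMNS = ["SSN", "Name", "Birth_date", "Address", "Tel_no",
           "Supervssn", "Dno", "Salary", "Rank"]


def value_at(main_row, aux_rows, attribute, t):
    """Value of a time-varying attribute at chronon t.

    main_row: tuple of current values laid out as COLUMNS.
    aux_rows: iterable of (key, att_index, updated_v, vs, ve) history rows.
    """
    idx = COLUMNS.index(attribute)
    if idx < 3:
        raise ValueError(f"{attribute} is not a time-varying attribute")
    key = main_row[0]
    for row_key, att_index, updated_v, vs, ve in aux_rows:
        if row_key == key and att_index == idx and vs <= t <= ve:
            return updated_v  # an old value kept in the auxiliary relation
    return main_row[idx]      # no historical row matches: current value


# Hypothetical data: one employee with one old address and one old salary.
emp = ("E1", "Nashwan", "birth_date", "address_2", "tel_1",
       "E9", 5, 1000, "rank_1")
emp_vt = [("E1", 3, "address_1", 0, 4), ("E1", 7, 900, 0, 6)]
```

Because every historical value is addressed by the pair (key, Att_index), a single auxiliary relation can hold the history of all time-varying attributes, which is the design choice that the variant-type Updated_V attribute supports.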

IV. DISCUSSION OF MEMORY STORAGE COSTS
In this section, we formalize the storage costs of the three approaches for representing interval-based temporal data models in the relational framework. The notations used in this study are given in Table I. Let $R_T$ be a temporal relational schema with an arbitrary set of attributes, where these attributes can be classified into four groups: key attributes, time-invariant (unchangeable) attributes, time-varying (changeable) attributes, and timestamp (temporal) attributes. These groups are represented by K, U, C, and T, respectively.
Thus the schema of the temporal relation can be redefined as $R_T(A_K, A_U, A_C, A_T)$. The costs of the different attribute types are defined in (1), (2), (3) and (4). Definition 2: The update frequency of the time-varying attributes in a period of time is denoted $f(A_C)$. A cost function is defined on each attribute subset and returns the size of that attribute group in bytes.

A. TTSR data representation approach
The cost of a tuple (row) $z$ in relation instance $r_t$, $Cost(z)$, is the sum of the costs of all attribute subsets stated in (1), (2), (3) and (4):

$$Cost(z) = Cost(A_K) + Cost(A_U) + Cost(A_C) + Cost(A_T)$$

An update in any $A_C$ requires the insertion of a new row with all attributes, so each change recorded over a period of time adds the cost of a full tuple to the stored history. Using (6) and (7), the memory storage cost of one object represented by the TTSR approach can then be defined.

B. TTMR data representation approach
The temporal relation in TTMR is decomposed into several relations, as described in Section III. To calculate the memory storage efficiency of an interval-based temporal relation represented in this approach, a general formula is constructed for calculating the size of a single tuple in a temporal relation. The cost of storing one tuple $x$ in the relation instance $r_{TTMR}$, $Cost(x)$, is calculated from the attribute costs stated in (1), (2), (3) and (4). With $i$ denoting the total number of time-varying attributes in $A_C$, as in (5), the tuple cost takes the form given in (10). Using (9) and (10), the memory storage cost of one object represented by the TTMR approach can then be defined.

C. TTHR data representation approach
As stated in (1), (2), (3) and (4), the cost of a tuple in TTHR is built from the same attribute costs, extended by the two attributes of the auxiliary relation. Att_index is an attribute that indexes the time-varying attributes and has a size of one byte. The variant attribute (Updated_V in Fig. 4) is a newly added attribute of variant data type that holds data of different types; its size is assumed to be the size of the largest field in $A_C$. Using these tuple costs and (13), the memory storage cost of one object represented by the TTHR approach can then be defined.
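The kind of cost accounting described in this section can be sketched in Python. The sketch is an illustration under loudly stated assumptions and is not a transcription of equations (1)-(13): the names `Sizes`, `ttsr_cost`, `ttmr_cost`, and `tthr_cost` are hypothetical, one timestamp pair is assumed to take 10 bytes, the split of the time-varying attributes into six fields is invented, and the exact composition of each relation is assumed from the prose descriptions of the three approaches.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sizes:
    key: int                 # bytes for the key attributes (K)
    invariant: int           # bytes for the time-invariant attributes (U)
    varying: Sequence[int]   # bytes of each time-varying attribute (sums to C)
    interval: int            # assumed bytes of one [start, end] timestamp pair


def ttsr_cost(s, updates):
    # Assumption: each row carries a lifespan and a valid-time interval,
    # and every update re-inserts the whole row.
    row = s.key + s.invariant + sum(s.varying) + 2 * s.interval
    return (1 + sum(updates)) * row


def ttmr_cost(s, updates):
    # Assumption: one snapshot row with the lifespan, plus one relation per
    # time-varying attribute holding its initial version and every update.
    snapshot = s.key + s.invariant + s.interval
    history = sum((1 + n) * (s.key + size + s.interval)
                  for size, n in zip(s.varying, updates))
    return snapshot + history


def tthr_cost(s, updates):
    # Assumption: the main row keeps the current values and the lifespan;
    # each update adds one auxiliary row (key, 1-byte Att_index, variant
    # value sized as the largest time-varying field, valid-time interval).
    main_row = s.key + s.invariant + sum(s.varying) + s.interval
    aux_row = s.key + 1 + max(s.varying) + s.interval
    return main_row + sum(updates) * aux_row


# Hypothetical split of C = 37 bytes over six attributes, 9 updates in total.
sizes = Sizes(key=9, invariant=110, varying=(15, 8, 4, 2, 4, 4), interval=10)
updates = (1, 2, 1, 2, 2, 1)
costs = {"TTSR": ttsr_cost(sizes, updates),
         "TTMR": ttmr_cost(sizes, updates),
         "TTHR": tthr_cost(sizes, updates)}
```

The figures returned in `costs` depend entirely on these assumptions and are not the values plotted in the paper's figures; the sketch only shows how full-row duplication, per-attribute relations, and a single auxiliary relation lead to different growth with the number of updates.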

V. COMPARISONS OF MEMORY STORAGE COST AND RESULT ANALYSIS OF THE THREE APPROACHES
In this section, we simulate the storage cost of the three models under various settings of the parameters that have a direct impact on temporal data storage. The default values are chosen with general cases in mind: K = 9, U = 110, C = 37, T = 20, and an update frequency of 9. Fig. 5 shows the memory storage cost for these initial values of the parameters that construct the temporal relation. For these values, the TTSR-based approach shows worse storage costs than the TTMR-based and TTHR-based approaches. The graph also indicates that TTHR provides more efficient storage than the TTMR-based approach until the update frequency reaches 40. After this point, TTHR and TTMR appear to have the same storage efficiency.
Fig. 6 shows the storage costs of the temporal relational approaches after freezing all other parameters and varying the size of the time-varying attributes (C). For these values, the TTSR-based approach again shows worse storage costs than the TTMR-based and TTHR-based approaches. The graph indicates that TTHR provides more efficient storage than the TTMR-based approach until C = 150 bytes. After this point, TTHR and TTMR appear to have the same storage efficiency.
In Fig. 7, the K value is increased from 9 to 300 bytes. The TTHR-based approach shows the best storage efficiency of the three. However, the difference in storage efficiency between the TTHR-based approach and the TTMR-based approach is marginal.
A similar study on the efficiency of memory storage was done by Atay [2]; that study compared the Snodgrass model (which falls under the TTMR-based approach in our study) with Tansel's N1NF nested relational model. A study by Noh [23] introduced a new platform for modeling temporal databases based on XML. He compared the relational model, as he named it (which falls under the TTSR-based approach in our study), with XML-based and object-oriented approaches.

VI. CONCLUSION
A new approach for representing temporal databases in the relational data model has been presented in this work. The proposed model (TTHR) has been compared mathematically with the main models in the literature (TTSR and TTMR) with respect to memory storage efficiency. To measure the storage costs, we established a mathematical model (formulas) for the three approaches. Performance is measured by the size of the whole stored temporal data, as stated in [22], [29], [31]. The results show that TTHR achieves significant savings in memory storage, ranging between 68% and 81% over the TTSR approach and between 10% and 32% over TTMR. The memory storage saving depends on the average change of the time-varying attributes [29], [30], [31]. A validation and verification study of the correctness and expressiveness of the TTHR model is presented in [32]. Finally, TTHR mimics TTMR in data representation by removing the needless redundancy of data. Moreover, TTHR mimics TTSR in representing the current valid data in one relation, which benefits queries on the current snapshot data; such queries are costly in TTMR, as stated in [22].

ACKNOWLEDGMENT
This paper was supported by the Deanship of Scientific Research (DSR), King Abdulaziz University. The authors, therefore, acknowledge with thanks DSR's technical support.

Fig. 2
Fig. 2 demonstrates the Emp relation after transformation from the parametric interval-based representation shown in Fig. 1, that is, the evolution of the data in the Emp relation as represented in the TTSR approach. The semantics of the update operation follow the temporal update operation introduced in the literature [2], [3], [26]. Updating any time-varying attribute results in inserting a new tuple with the new updated values and new time points, as shown in Fig. 2. Deleting a tuple is accomplished by updating $T_{ls}$ to an instant time point. The tuple highlighted in red in Fig. 2 is an example of a logical delete.
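The update semantics described for TTSR can be sketched in Python as follows. The function `ttsr_update`, the `OPEN` marker for a still-valid row, the dictionary layout, and all data values are hypothetical; lifespan timestamps and logical deletion are left out, so this is a simplified illustration rather than the operation defined in [2], [3], [26].

```python
import copy

OPEN = None  # assumed marker for a valid time that has not ended yet


def ttsr_update(rows, key, attribute, new_value, t):
    """Record a change at chronon t by inserting a new full row."""
    current = next(r for r in rows if r["key"] == key and r["ve"] is OPEN)
    new_row = copy.copy(current)   # all other attribute values are repeated
    current["ve"] = t - 1          # close the valid time of the old version
    new_row[attribute] = new_value
    new_row["vs"] = t
    rows.append(new_row)
    return rows


# Hypothetical employee with two time-varying attributes.
rows = [{"key": "E1", "Name": "Nashwan", "Address": "address_1",
         "Salary": 900, "vs": 0, "ve": OPEN}]
ttsr_update(rows, "E1", "Address", "address_2", 5)
ttsr_update(rows, "E1", "Salary", 1000, 7)
```

After the two updates the relation holds three rows for one employee, and the unchanged values (for example the name) are stored in each of them, which is the redundancy attributed to TTSR in the Introduction.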

Fig. 3. Emp relation in TTMR approach.
The temporal relation schema (Emp) in Fig. 3 corresponds to $R_{TTMR}$. In TTHR, $R_T\_VT$ is a new auxiliary relation schema whose attributes have the following meaning. Att_index is a variable that identifies the time-varying attribute $A_{C_m}$ being updated, such that $1 \le m \le j$. The variant attribute is a new attribute that corresponds to the attribute Updated_V shown in Fig. 4; it stores the updated value of any attribute in the set $A_C$. $T_{vs}$ represents the Valid Start Time (VST), and $T_{ve}$ represents the Valid End Time (VET). The purpose of this representation is to keep the latest updated data (the current snapshot) in one relation, $R_T$, and the historical changes of the validity of the time-varying data in the auxiliary relation $R_T\_VT$ [22], [29]. The relation instances are denoted by $r_t$ and $r_t\_vt$. The symbols x, y and z are used for tuples. A tuple x in $r_t$ carries the $T_{ls}$ and $T_{le}$ of that particular object, and the tuple(s) in $r_t\_vt$ reference tuple x in $r_t$. The tuple(s) in $r_t\_vt$ consist of the primary key of x, the identity (Att_index) of the time-varying attribute in x, the updated value of that time-varying attribute in x, and the time validity of the updated attribute, $T_{vs}$ and $T_{ve}$.
A subset of the domain of lifespan times is associated with each tuple in $R_T$, indicating that the existence of the object recorded by the tuple is true in the modeled reality during each lifespan chronon in that subset. A subset of the domain of valid times is associated with each tuple in $R_T\_VT$, representing the fact that the tuple $r_t\_vt[x]$ records the change of the validity of $a_{C_m}$ in x. This fact is considered true in the modeled reality, such that the time of validity is strictly contained in the lifespan of x. Thus, the time associated with a tuple in TTHR is an interval-based temporal timestamp. The tuples in $r_t$ are timestamped by the lifespan time of the object, denoted by $t_{ls}$, whereas the tuples in $r_t\_vt$ are timestamped by the valid time, denoted by $t_v$, both consisting of temporal chronons in the time dimension spanned by lifespan and valid time.

Fig. 4. Emp relation in TTHR approach.
The example in Fig. 4 uses two relations: Emp, which describes employee information and corresponds to $R_T$ in TTHR, and the auxiliary relation Emp_VT, which records the changes of the validity of the time-varying attributes in Emp as well as the changes of the lifespan of the objects in Emp. In TTSR, a tuple with its different types of attributes is timestamped by both valid time and lifespan time. In each approach, the quantity of interest is the cost of storing the history of the changes of $A_C$, which occur $f(A_C)$ times in a period (lifespan interval) of time.

Fig. 7
Fig. 7 shows the storage efficiency after freezing all other parameters and varying the size of the key attributes (K).


TABLE I
$A_K$: Set of key attributes