International Journal of Advanced Computer Science and Applications (IJACSA), Volume 11 Issue 2, 2020.
Abstract: Data Accuracy is one of the main dimensions of Data Quality; it measures the degree to which data are correct. Knowing the accuracy of an organization's data indicates how much the organization can rely on them in decision-making processes. Measuring data accuracy in a Big Data environment involves comparing the data under assessment with "reference data" that the system considers correct. However, this process can be complex, or even impossible, when no appropriate reference data are available. In this paper, we focus on this problem and propose an approach that takes advantage of Big Data technologies to obtain the reference data. Our approach is based on the upstream selection of a set of criteria that we define as "Accuracy Criteria". We also use a set of techniques such as Big Data Sampling, Schema Matching, Record Linkage, and Similarity Measurement. The proposed model and the experimental results give us greater confidence in the importance of a data quality assessment solution and of configuring the accuracy criteria to automate the selection of reference data in a Data Lake.
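The comparison against reference data described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes that reference records have already been selected, links records on an exact shared key (a strong simplification of Record Linkage), and uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in similarity measure. The names `similarity`, `accuracy_score`, `key`, `fields`, and `threshold` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case and outer spaces.

    Assumption: difflib's ratio stands in for whatever measure the paper uses.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def accuracy_score(records, reference, key, fields, threshold=0.9):
    """Fraction of compared field values that agree with the reference data.

    records, reference: lists of dicts; key: name of the field used for linkage;
    fields: names of the fields to compare; threshold: minimum similarity for a
    value to count as accurate (an illustrative default, not from the paper).
    Returns None when no record could be linked to a reference record.
    """
    # Naive record linkage: exact match on a shared key (assumed to exist).
    ref_by_key = {r[key]: r for r in reference}
    compared = 0
    matched = 0
    for rec in records:
        ref = ref_by_key.get(rec[key])
        if ref is None:
            continue  # no reference record, so this row cannot be assessed
        for field in fields:
            compared += 1
            if similarity(str(rec[field]), str(ref[field])) >= threshold:
                matched += 1
    return matched / compared if compared else None
```

Records without a linked reference record are skipped rather than counted as inaccurate, which mirrors the abstract's point that accuracy cannot be measured where reference data are missing.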
Mohamed TALHA, Nabil ELMARZOUQI and Anas ABOU EL KALAM, “Towards a Powerful Solution for Data Accuracy Assessment in the Big Data Context,” International Journal of Advanced Computer Science and Applications (IJACSA), 11(2), 2020. http://dx.doi.org/10.14569/IJACSA.2020.0110254
@article{TALHA2020,
title = {Towards a Powerful Solution for Data Accuracy Assessment in the Big Data Context},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2020.0110254},
url = {http://dx.doi.org/10.14569/IJACSA.2020.0110254},
year = {2020},
publisher = {The Science and Information Organization},
volume = {11},
number = {2},
author = {Mohamed TALHA and Nabil ELMARZOUQI and Anas ABOU EL KALAM}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.