International Journal of Advanced Computer Science and Applications (IJACSA), Volume 16, Issue 9, 2025.
Abstract: As clinical cancer research increasingly depends on large, diverse datasets, concerns about data duplication have grown. Duplicates can undermine data integrity, skew analytical results, and reduce the reproducibility of studies. This review explores how visualization can play a critical role in identifying and managing duplicates in non-image clinical cancer data. Drawing from literature in biomedical informatics, data quality, and visual analytics, it synthesizes current approaches and highlights key challenges. Using a scoping review methodology, we analyzed studies published over the past two decades, focusing on non-image clinical datasets. Studies were selected based on relevance to duplicate detection and visualization, excluding those centered on image or video data. Major datasets like The Cancer Genome Atlas (TCGA), The Cancer Imaging Archive (TCIA), and the North American Association of Central Cancer Registries (NAACCR) are examined to show how duplication occurs across genomic, clinical, and registry data. The review assesses existing visualization techniques based on their scalability, interactivity, integration with deduplication algorithms, and how well they address core data quality dimensions. While some tools offer scalable and interactive features, few provide clear visual representations of duplicates, especially those involving complex temporal and multidimensional patterns. Several methodological gaps are identified, including limited integration of data quality metrics, inadequate support for tracking changes over time, and a lack of standardized evaluation frameworks. To address these issues, the review advocates for the development of practical, user-friendly visualization tools that combine duplicate detection with key indicators of data quality. 
By offering a more complete and intuitive view of clinical datasets, such tools can help researchers and clinicians make better-informed decisions, ultimately improving the reliability and impact of cancer research. Bridging the gap between technical detection and visual understanding is essential for advancing data-driven healthcare and ensuring high-quality, reproducible outcomes.
Nurul A. Emran and Ruhaila Maskat. “A Review of Visualization Techniques for Duplicate Detection in Cancer Datasets”. International Journal of Advanced Computer Science and Applications (IJACSA) 16.9 (2025). http://dx.doi.org/10.14569/IJACSA.2025.0160958
@article{Emran2025,
title = {A Review of Visualization Techniques for Duplicate Detection in Cancer Datasets},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2025.0160958},
url = {http://dx.doi.org/10.14569/IJACSA.2025.0160958},
year = {2025},
publisher = {The Science and Information Organization},
volume = {16},
number = {9},
author = {Nurul A. Emran and Ruhaila Maskat}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.