Abstract: This paper considers how to advance web-scale knowledge extraction and alignment by integrating a few sources, exploring different methods of aggregation and attention in order to focus on image information. An improved model, Wrapper Extraction of Image using DOM and JSON (WEIDJ), is proposed to extract images and their related information as quickly as possible. Four models are discussed: Document Object Model (DOM), Wrapper using Hybrid DOM and JSON (WHDJ), WEIDJ, and WEIDJ (no-rules). Experimental results on real-world websites demonstrate that the proposed models outperform the others, namely DOM and WHDJ, in mining a higher volume of web data across various image formats, while also taking into account web data extraction from the deep web.
Keywords: Data extraction; Document Object Model; Web data extraction; Wrapper using Hybrid DOM and JSON; Wrapper Extraction of Image using DOM and JSON