Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
Digital Object Identifier (DOI): 10.14569/IJACSA.2010.010209
Article Published in International Journal of Advanced Computer Science and Applications (IJACSA), Volume 1 Issue 2, 2010.
Abstract: The massive adoption of social media has provided new ways for individuals to express their opinions online. The blogosphere, an inherent part of this trend, contains a vast array of information about a variety of topics. It is a huge think tank that creates an enormous and ever-changing archive of open source intelligence. The higher-level aim of the research presented here is to mine and model this vast pool of data in order to extract, exploit and describe meaningful knowledge, and thereby to leverage the structures and dynamics of emerging networks within the blogosphere. Our proprietary, tailor-made feed-crawler framework meets exactly this need. While the main concept, the basic techniques and the implementation details of the crawler have already been covered in earlier publications, this paper focuses on several recent optimizations of the crawler framework that proved crucial for its overall performance.
Justus Bross, Patrick Hennig, Philipp Berger and Christoph Meinel, “RSS-Crawler Enhancement for Blogosphere-Mapping,” International Journal of Advanced Computer Science and Applications (IJACSA), 1(2), 2010. http://dx.doi.org/10.14569/IJACSA.2010.010209