Usage of Dedicated Data Structures for URL Databases in a Large-scale Crawling
Computer Science Journal AGH
Jan 2009
The article discusses the use of Berkeley DB data structures, such as hash tables and B-trees, to implement a high-performance URL database. It presents a formal model of a data-structure-oriented URL database, which can be used as an alternative to a relational URL database.
Keywords: crawling, crawler, large-scale, Berkeley DB, URL database, URL repository, data structures.
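The abstract does not spell out the model itself. As a rough illustration of the general idea, the following hypothetical Python sketch pairs a hash table (URL to id, for duplicate detection) with an ordered index keyed by (host, id), which supports range scans the way a B-tree does. Python's `dict` and a sorted list searched with `bisect` stand in for Berkeley DB's hash and B-tree access methods. The class name `UrlDatabase`, its methods, and the choice of host as the ordered key are illustrative assumptions, not the article's design.

```python
import bisect
from urllib.parse import urlsplit


class UrlDatabase:
    """Hypothetical in-memory sketch of a data-structure-oriented URL database.

    A dict stands in for a Berkeley DB hash table; a sorted list searched with
    bisect stands in for a Berkeley DB B-tree (ordered keys, range scans).
    """

    def __init__(self) -> None:
        self._url_to_id: dict[str, int] = {}          # hash table: URL -> id (dedup)
        self._id_to_url: dict[int, str] = {}          # hash table: id -> URL
        self._host_index: list[tuple[str, int]] = []  # ordered index: sorted (host, id) keys

    def add(self, url: str) -> tuple[int, bool]:
        """Insert url if unseen; return (id, True if it was newly added)."""
        existing = self._url_to_id.get(url)
        if existing is not None:
            return existing, False
        uid = len(self._url_to_id)  # ids are assigned sequentially from 0
        self._url_to_id[url] = uid
        self._id_to_url[uid] = url
        host = urlsplit(url).hostname or ""
        bisect.insort(self._host_index, (host, uid))  # keep the index sorted
        return uid, True

    def __contains__(self, url: str) -> bool:
        # Constant-time (expected) membership test via the hash table.
        return url in self._url_to_id

    def urls_for_host(self, host: str) -> list[str]:
        """Range scan over the ordered index: all URLs whose host equals host."""
        # (host,) sorts before every (host, id) tuple, so this finds the first match.
        i = bisect.bisect_left(self._host_index, (host,))
        result = []
        while i < len(self._host_index) and self._host_index[i][0] == host:
            result.append(self._id_to_url[self._host_index[i][1]])
            i += 1
        return result
```

The hash lookup answers a crawler's most frequent question, whether a URL has already been seen, in expected constant time, while the ordered index groups URLs by host so they can be scanned together (for example, for per-host scheduling). A real B-tree keeps inserts logarithmic; `bisect.insort` on a Python list is linear and is used here only for brevity.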