Recently I have worked on several web scraping projects, so I thought I would write up my tips in case they are useful to others. I am also writing a library for grabbing content from a few popular article sites such as www.articlesnatch.com, www.articlebase.com, and www.ezinearticles.com.
Initially I used Simple HTML DOM for traversing the HTML. It is easy and nice, but the script is a memory hog. It would sometimes fail even with 256MB of RAM allocated to PHP, especially when running such traversals in a loop. So I dropped it entirely and switched to PHP's DOMDocument.
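For reference, here is a minimal sketch of loading HTML with DOMDocument; the markup in the example is made up for illustration:

```php
<?php
// Parse an HTML snippet with PHP's built-in DOMDocument.
// The markup below is a made-up sample for illustration only.
$html = '<html><body><div class="article"><a href="/a1">First</a></div></body></html>';

$dom = new DOMDocument();
// Real-world HTML is rarely valid; suppress the parser warnings.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Collect the href of every anchor tag.
$links = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}
print_r($links);
```

Note the libxml_use_internal_errors(true) call: without it, DOMDocument emits a warning for nearly every real-world page it parses.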
In my projects I used cURL to fetch content from remote URLs, but here I will demonstrate using the simpler file_get_contents() function.
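For comparison, a cURL-based fetch might look like the sketch below (the function name and options are my own choices, not part of the original code):

```php
<?php
// A minimal cURL-based fetcher, shown for comparison with file_get_contents().
// fetch_url() is a hypothetical helper name chosen for this example.
function fetch_url($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0'); // some sites block blank user agents
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

cURL gives you control over timeouts, redirects, and headers that file_get_contents() lacks unless you build a stream context by hand.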
Getting Article Links under Any Category
A category page lists a number of article links, each with a few lines of excerpt. We will fetch only the links.
First of all, retrieve the contents of the remote URL:
// prepare URL
$category = 'Marketing';
$page = 1;
$url = "http://www.articlebase.com/" . strtolower($category) . "-articles/$page/";
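With the URL prepared, the next step is to download the page and pull out the article links. A sketch using DOMDocument with DOMXPath follows; the "//h2/a" XPath and the extract_links() helper are assumptions for illustration, so inspect the real category page and adjust the query to match its markup:

```php
<?php
// Extract link hrefs from an HTML page using DOMDocument + DOMXPath.
// extract_links() and the '//h2/a' query are assumptions for this sketch,
// not the actual structure of the target site.
function extract_links($html, $xpathQuery = '//a')
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query($xpathQuery) as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Usage (the network call is commented out; sample HTML stands in for the page):
// $html = file_get_contents($url);
$html  = '<html><body>'
       . '<h2><a href="/article-1">One</a></h2>'
       . '<h2><a href="/article-2">Two</a></h2>'
       . '</body></html>';
$links = extract_links($html, '//h2/a');
print_r($links);
```

Passing the XPath query as a parameter keeps the helper reusable across sites whose category pages wrap links in different elements.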