Articlesnatch.com scraping tutorial, getting full article

In my last three tutorials I have discussed how to scrap contents from www.articlebase.com. In this part, I will show how to scrap contents from www.articlesnatch.com. However, unlike the previous tutorials, I will not use DOMDocument in this part. I will not use regular expressions either.

I will show how to get full article. I won’t show how to get articles/links under any category as articlesnatch.com offers feed for each category. So it is easy to get article summary and links of any category. As the feed does not include full text, I will just show how to get it.

Getting Article Body

$html = file_get_contents($link);

We need the contents that is within the div with a class named “KonaBody”. That mean, our target contents are within:

<div class="KonaBody">

......

......

</div>

So, we may remove anything before this div.

$desc = strstr($html,'<div class="KonaBody">');

Continue reading →