scraping tutorial – part 2, getting search links

In the first part, I have shown how to get links under any category. Now, we will get links when you search with a search term.

Getting HTML

$keyword = ‘beauty’;

$page = intval($page);
$url = “”.strtolower(urlencode($keyword)).”&page=”.urlencode($page);

$html = file_get_contents($url);
if(!$html) return false;

Initialize objects

$dom = new DOMDocument();
$dom = new DOMXPath($dom);

Continue reading → scraping tutorial – part 1, getting links under category

Recently I have worked with several web scraping projects. I though I can write my tips so that it comes to usages of others. I am also writing a library for grabbing contents from a few popular article resources like,,

Initially I have used simple html dom for traversing the html. It is easy and nice but the script is memory hog. I even sometime would failed to work under 256MB allocated RAM for PHP, specially when you run such traversing in a few (loop) cycles. So, I totally dropped using that and used PHP’s DomDocument.
In my projects I have used cURL for getting contents from remote URL. But here I will show by using simple function file_get_contents().

Getting Articles’ Links under any Category

The category page of article page lists a number of links to articles with a few lines of excerpts. We will fetch the links only.

First of all retrieve contents from remote URL:

//prepare URL

$category = 'Marketing';

$page = 1;

$url = "".strtolower($category)."-articles/$page/";

Continue reading →