Articlebase.com scraping tutorial – part 2, getting search links

In the first part, I showed how to get the links under any category. Now we will get the links that come back when you search articlebase.com for a keyword.

Getting HTML

$keyword = 'beauty';

$page = intval($page);
$url = "http://www.articlesbase.com/find-articles.php?q=".strtolower(urlencode($keyword))."&page=".urlencode($page);

$html = file_get_contents($url);
if(!$html) return false;
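
The URL-building step above could be wrapped in a small helper so that the page number always has a sane value. This is only a sketch: the function name build_search_url() is my own invention, and it assumes the query-string format shown above is what the site accepts.

```php
<?php
// Hypothetical helper (not part of the original script): builds the search
// URL in the same format as the snippet above.
function build_search_url($keyword, $page = 1)
{
    // Force the page to an integer and never go below the first page.
    $page = max(1, intval($page));

    return "http://www.articlesbase.com/find-articles.php?q="
        . urlencode(strtolower($keyword))
        . "&page=" . $page;
}

// urlencode() turns the space into "+", so this prints
// http://www.articlesbase.com/find-articles.php?q=beauty+tips&page=2
echo build_search_url('Beauty Tips', 2), "\n";
```

Lowercasing before encoding keeps the percent-escapes untouched; either order should give an equivalent URL for plain keywords.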

Initialize objects

$dom = new DOMDocument();
@$dom->loadHTML($html);
$dom = new DOMXPath($dom);
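
Once the DOMXPath object exists, links can be pulled out with an XPath query. The sketch below shows the general idea on a tiny hand-written page. The sample markup, the "results" class name and the XPath expression are all assumptions made for illustration; the real search page will have its own structure. It also keeps the XPath object in a separate $xpath variable for readability.

```php
<?php
// Sample markup invented for this sketch; the real search page will differ.
$html = '<html><body>
  <div class="results">
    <a href="/beauty-articles/first-article.html">First article</a>
    <a href="/beauty-articles/second-article.html">Second article</a>
  </div>
  <a href="/about.html">About</a>
</body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);          // @ hides warnings about imperfect HTML
$xpath = new DOMXPath($dom);

// Assumed expression: every <a> inside the element with class "results".
$links = array();
foreach ($xpath->query('//div[@class="results"]//a') as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links);                 // two hrefs; the "About" link is skipped
```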


Articlebase.com scraping tutorial – part 1, getting links under category

Recently I have worked on several web scraping projects. I thought I would write up my tips so that others can make use of them. I am also writing a library for grabbing content from a few popular article sites such as www.articlesnatch.com, www.articlebase.com and www.ezinearticles.com.

Initially I used Simple HTML DOM to traverse the HTML. It is easy and pleasant to use, but it is a memory hog: the script would sometimes fail even with 256MB of RAM allocated to PHP, especially when the traversal runs inside a loop. So I dropped it entirely and switched to PHP's DOMDocument.
In my projects I have used cURL to fetch content from remote URLs, but here I will use the simpler file_get_contents() function.
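
For readers who would rather use cURL, a fetch helper might look roughly like the sketch below. The function name fetch_html() and the 30-second default timeout are my own assumptions, not code from the projects mentioned above.

```php
<?php
// Hypothetical helper: fetch a URL with cURL.
// Returns the response body as a string, or false if the request fails.
function fetch_html($url, $timeout = 30)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // give up after $timeout seconds
    $body = curl_exec($ch);
    curl_close($ch);

    return $body;
}
```

It can stand in wherever file_get_contents($url) is called, with the same check for a false return value.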

Getting Articles’ Links under any Category

A category page on the site lists a number of links to articles, each with a few lines of excerpt. We will fetch only the links.

First of all, retrieve the contents from the remote URL:

//prepare URL

$category = 'Marketing';

$page = 1;

$url = "http://www.articlebase.com/".strtolower($category)."-articles/$page/";


URL Shortening using PHP

Yesterday, while I was developing a PunBB Twitter extension, Invarbrass of Projanmo Forum helped me by providing URL shortening snippets for several websites. They are very simple to use, so I am sharing them for others to use as well.

to.ly
function CompressURL($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://to.ly/api.php?longurl=".urlencode($url));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $shorturl = curl_exec($ch);
    curl_close($ch);
    return $shorturl;
}
echo CompressURL("http://projanmo.com");
