You may already know that PHP’s in_array() function matches a string needle case-sensitively. Today, however, I needed a case-insensitive match. Before writing one myself, I searched Google and found a ready-made solution. It is quite simple. Here it is:
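function in_arrayi($needle, $haystack)
{
    // Compare lowercase copies so that "Apple" matches "apple"
    foreach ($haystack as $value) {
        if (strtolower($value) === strtolower($needle)) {
            return true; // stop at the first match
        }
    }
    return false; // no element matched
}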
Please note that this case-insensitive version is at least five times slower than in_array(), since it loops in PHP code and lowercases both strings on every iteration.
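If you need to test many needles against the same haystack, one option is to lowercase the haystack once and flip it into a lookup table, so each check becomes a single isset() instead of a loop. This is a minimal sketch, not the solution I found; make_lookup_i() and in_lookup_i() are hypothetical helper names, and it assumes the haystack holds strings.

```php
<?php
// Hypothetical helpers: build a lowercase lookup table once,
// then answer each case-insensitive membership test with isset().
// Assumes every haystack element is a string.
function make_lookup_i(array $haystack): array
{
    // Keys are the lowercased values; the value true is just a placeholder.
    return array_fill_keys(array_map('strtolower', $haystack), true);
}

function in_lookup_i(string $needle, array $lookup): bool
{
    return isset($lookup[strtolower($needle)]);
}

$lookup = make_lookup_i(['Apple', 'Banana', 'Cherry']);
var_dump(in_lookup_i('APPLE', $lookup)); // bool(true)
var_dump(in_lookup_i('grape', $lookup)); // bool(false)
```

The table costs one pass over the haystack, so it only pays off when the same haystack is searched repeatedly. Note that strtolower() works byte by byte; for UTF-8 text, mb_strtolower() (from the mbstring extension) is the usual choice.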
In the first part, I showed how to get the article links under any category. Now we will get the links returned when you search articlebase.com for a term.
$keyword = 'beauty';
$page = 1; // results page to fetch
$url = "http://www.articlesbase.com/find-articles.php?q=" . urlencode(strtolower($keyword)) . "&page=" . $page;
$html = file_get_contents($url);
if (!$html) return false;
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides warnings about malformed HTML
$xpath = new DOMXPath($dom); // query the loaded document with XPath
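Once the page is loaded into a DOMXPath object, pulling out the links is a matter of one XPath query. The sketch below assumes hypothetical markup (each result link sitting inside an `h3` with class `article-title`); the real results page will differ, so inspect it and adjust the query. extract_links() is a hypothetical helper name.

```php
<?php
// Sketch: pull article links out of a results page with DOMXPath.
// The markup and the class name "article-title" are hypothetical
// placeholders; inspect the real page and adjust the XPath query.
function extract_links(string $html, string $query): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid: collect parser warnings
    // internally instead of printing them, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query($query) as $node) {
        // Each matched node is expected to be an <a> element.
        $links[] = $node->getAttribute('href');
    }
    return $links;
}

$html = '<html><body>
  <h3 class="article-title"><a href="/marketing/first.html">First</a></h3>
  <h3 class="article-title"><a href="/marketing/second.html">Second</a></h3>
  <a href="/about.html">About</a>
</body></html>';

print_r(extract_links($html, '//h3[@class="article-title"]/a'));
```

Scoping the query to the result container is what keeps unrelated links, such as the "About" link here, out of the list.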
Recently I have worked on several web scraping projects, so I thought I would write up some tips in case they are useful to others. I am also writing a library for grabbing content from a few popular article sites such as www.articlesnatch.com, www.articlebase.com and www.ezinearticles.com.
Initially I used Simple HTML DOM to traverse the HTML. It is easy and nice to work with, but the script is a memory hog: it sometimes failed to run even with 256MB of RAM allocated to PHP, especially when the traversal ran several times in a loop. So I dropped it entirely and switched to PHP’s DOMDocument.
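For comparison, here is a minimal sketch of the same kind of looped traversal with DOMDocument. The page strings are inline stand-ins for fetched HTML, and extracting the `<title>` is just an illustrative target. Creating a fresh DOMDocument on each pass lets PHP release the previous tree once nothing references it.

```php
<?php
// Sketch: traverse several pages with PHP's built-in DOMDocument.
// Each iteration creates a fresh DOMDocument; once $dom and $nodes are
// reassigned, the previous tree is no longer referenced and can be freed.
// The page contents here are inline stand-ins for fetched HTML.
$pages = [
    '<html><head><title>Page one</title></head><body></body></html>',
    '<html><head><title>Page two</title></head><body></body></html>',
];

$titles = [];
libxml_use_internal_errors(true); // collect warnings from messy HTML quietly
foreach ($pages as $html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $nodes = $dom->getElementsByTagName('title');
    if ($nodes->length > 0) {
        $titles[] = trim($nodes->item(0)->textContent);
    }
    libxml_clear_errors(); // discard the warnings collected for this page
}

print_r($titles);
echo 'Peak memory: ', memory_get_peak_usage(), " bytes\n";
```

Printing memory_get_peak_usage() after a long loop is a simple way to check whether a parser's memory use keeps growing with the number of pages.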
In my projects I used cURL to fetch content from remote URLs, but here I will use the simpler file_get_contents() function.
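If you prefer cURL, a minimal sketch of a fetch helper might look like this. It uses cURL when the extension is loaded and falls back to file_get_contents() otherwise. fetch_url() is a hypothetical name, and the 10-second timeout is an arbitrary default, not a recommendation from the original projects.

```php
<?php
// Sketch: fetch a URL with cURL when the extension is available,
// otherwise fall back to file_get_contents().
// fetch_url() is a hypothetical helper; the timeout default is arbitrary.
function fetch_url(string $url, int $timeout = 10)
{
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
        curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as failure
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        // string on success, false on failure; the handle is freed
        // automatically when $ch goes out of scope
        return curl_exec($ch);
    }

    // The stream-context timeout applies to http:// URLs.
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);
    return @file_get_contents($url, false, $context); // string or false
}
```

With a helper like this, `$html = fetch_url($url);` could stand in for the file_get_contents() calls in these snippets.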
Getting Article Links under Any Category
A category page on the article site lists a number of links to articles, each with a few lines of excerpt. We will fetch only the links.
First, retrieve the content from the remote URL:
$category = 'Marketing';
$page = 1;
$url = "http://www.articlebase.com/" . strtolower($category) . "-articles/$page/";
$html = file_get_contents($url);
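Since only the page number changes from one category page to the next, it is easy to generate the URLs for a range of pages. This is a minimal sketch following the URL pattern used above; category_page_urls() is a hypothetical helper, and the actual fetch is left as a comment.

```php
<?php
// Sketch: build category page URLs following the pattern above
// (http://www.articlebase.com/<category>-articles/<page>/).
// category_page_urls() is a hypothetical helper, not part of any library.
function category_page_urls(string $category, int $firstPage, int $lastPage): array
{
    $urls = [];
    for ($page = $firstPage; $page <= $lastPage; $page++) {
        $urls[] = "http://www.articlebase.com/" . strtolower($category) . "-articles/$page/";
    }
    return $urls;
}

foreach (category_page_urls('Marketing', 1, 3) as $url) {
    echo $url, "\n";
    // $html = file_get_contents($url); // fetch each page, then parse it
}
```

When fetching for real, a short sleep() between requests is kinder to the remote server.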