Today I came to the office a bit late. As soon as I arrived, I was told that a parcel had come in my name. I was expecting a dummy camera lens, so I assumed that was it. However, seeing the packet, I was a bit suspicious: the dummy lens’s package would not be so small. After I opened it, I could not believe my eyes. It was an Amazon Kindle! Yes, I had ordered it on 15th May from www.aliexpress.com, but it was still unbelievable. Do you know why? Keep reading.
On aliexpress, once an order is placed and the payment is made, the order status becomes ‘Shipment Required’. The seller was supposed to ship it, mark it as ‘Shipped’ and give me the tracking ID. However, I never received any tracking ID, and the order status was still ‘Shipment Required’ last night. Seeing that the seller was delaying unnecessarily, I canceled the order, and aliexpress.com refunded the whole amount within a few minutes. I also gave the seller negative feedback for wasting my time.
So, I guess you can now understand why I was so surprised to see it. The seller must be stupid! Why did s/he not update the status of the order?
Anyway, as I am still in the office, I have not had much time to explore it. I will explore it at home. I hope the seller will contact me so that I can arrange a way to pay him.
The benefit of aliexpress is that it keeps the money in escrow until you confirm that the product has reached you and is up to your expectations. I first heard about aliexpress.com from a member of Projanmo Forum.
My version is the Kindle 3 WiFi + 3G. The cost (including shipping) was US$191. I also had to pay 1050 tk here as excise duty and other fees.
Currently I am working with Prosperent.com’s affiliate network and making an auto blog plugin for wordpress. The plugin is intended to fetch products from prosperent and create wordpress posts. You will earn commission from the sales made through your affiliate link! The affiliate links are also provided by prosperent, so you don’t need to think about them. You just need to send requests to their API with your API key, and they will mask the product URL with your affiliate ID so that the impressions and clicks are counted under your affiliate account.
Prosperent already provides a simple and beautiful PHP class that makes working with their API enjoyable. You just need to write a few lines to get products from them.
In my last three tutorials I discussed how to scrape content from www.articlebase.com. In this part, I will show how to scrape content from www.articlesnatch.com. However, unlike the previous tutorials, I will not use DOMDocument in this part. I will not use regular expressions either.
I will show how to get the full article. I won’t show how to get the articles/links under a category, as articlesnatch.com offers a feed for each category, so it is easy to get the article summaries and links of any category. As the feed does not include the full text, I will just show how to get that.
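For completeness, here is a minimal sketch of reading summaries and links from a category feed with SimpleXML. The sample XML below is made up for illustration and assumes a standard RSS 2.0 layout; the real articlesnatch.com feed and its URL are not shown in this post, and the function name feed_items is hypothetical.

```php
<?php
// Hypothetical sample feed (assumed RSS 2.0 layout, not copied from the real site).
$rss = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Sample category</title>
    <item>
      <title>First article</title>
      <link>http://example.com/first-article</link>
      <description>Summary of the first article.</description>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.com/second-article</link>
      <description>Summary of the second article.</description>
    </item>
  </channel>
</rss>
XML;

// Return title, link and summary for every <item> in an RSS 2.0 string.
function feed_items(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return []; // the string was not well-formed XML
    }
    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title'   => (string) $item->title,
            'link'    => (string) $item->link,
            'summary' => (string) $item->description,
        ];
    }
    return $items;
}

$items = feed_items($rss);
```

For a live feed, the string would come from file_get_contents() on the category's feed URL, and each 'link' could then be fetched to get the full text.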
Getting Article Body
$html = file_get_contents($link);
We need the content that is within the div with a class named “KonaBody”. That means our target content is within:
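One possible way to pull out that div without DOMDocument or regular expressions is to use plain string functions and count nested divs. This is only a sketch of the idea, not necessarily the exact approach used in the original project. The function name extract_div_by_class and the sample markup are hypothetical, and the code assumes the opening tag is written exactly as <div class="KonaBody" with the class as the first attribute.

```php
<?php
// Return the inner HTML of the first <div class="$class" ...> using only
// string functions. Returns null if the div is missing or never closed.
function extract_div_by_class(string $html, string $class): ?string
{
    $open = stripos($html, '<div class="' . $class . '"');
    if ($open === false) {
        return null;
    }
    // Skip past the end of the opening tag.
    $start = strpos($html, '>', $open);
    if ($start === false) {
        return null;
    }
    $start++;

    // Walk forward, counting nested <div> and </div> tags until the
    // div we opened is closed again.
    $depth = 1;
    $pos = $start;
    $nextClose = false;
    while ($depth > 0) {
        $nextOpen  = stripos($html, '<div', $pos);
        $nextClose = stripos($html, '</div', $pos);
        if ($nextClose === false) {
            return null; // unbalanced markup
        }
        if ($nextOpen !== false && $nextOpen < $nextClose) {
            $depth++;
            $pos = $nextOpen + 4;
        } else {
            $depth--;
            $pos = $nextClose + 5;
        }
    }
    return trim(substr($html, $start, $nextClose - $start));
}

// Made-up page, standing in for the $html fetched with file_get_contents($link).
$sample = '<div id="top">menu</div>'
        . '<div class="KonaBody"><p>Hello</p><div class="ad">ad</div><p>World</p></div>'
        . '<div>footer</div>';
$body = extract_div_by_class($sample, 'KonaBody');
```

Counting the nesting depth matters because the article body may itself contain other divs; stopping at the first closing tag would cut the article short.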
Recently I have worked on several web scraping projects. I thought I could write up my tips so that they can be of use to others. I am also writing a library for grabbing content from a few popular article resources like www.articlesnatch.com, www.articlebase.com, www.ezinearticles.com.
Initially I used simple html dom for traversing the html. It is easy and nice, but the script is a memory hog. It would sometimes even fail to work with 256MB of RAM allocated to PHP, especially when you run such traversals in a loop for a few cycles. So, I dropped it entirely and used PHP’s DOMDocument.
In my projects I have used cURL for getting content from a remote URL. But here I will use the simple function file_get_contents().
Getting Articles’ Links under any Category
A category page lists a number of links to articles, each with a few lines of excerpt. We will fetch the links only.
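As a rough sketch, fetching only the links with DOMDocument and DOMXPath could look like this. The markup in $sample and the XPath expression are assumptions made up for illustration (the real category page structure is not shown here), and article_links is a hypothetical helper name.

```php
<?php
// Return the unique href values of all anchors matched by an XPath query.
function article_links(string $html, string $xpathQuery): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query($xpathQuery) as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

// Made-up category page: two article rows plus an unrelated link.
$sample = '<div class="article_row"><h3><a href="/a/first.html">First</a></h3><p>Excerpt...</p></div>'
        . '<div class="article_row"><h3><a href="/a/second.html">Second</a></h3><p>Excerpt...</p></div>'
        . '<a href="/about.html">About</a>';

$links = article_links($sample, '//div[@class="article_row"]//h3/a');
```

On a live page, $sample would be replaced by the result of file_get_contents() on the category URL, and the XPath query adjusted to whatever element actually wraps each article title.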
My first scraping work was www.stock.projanmo.com, where I fetched and processed stock data from www.dsebd.org and www.biasl.net. I had to scrape them as they did not have any syndication feed. I had to process the pages line by line. That was a tedious job.
Later, I worked on eBay product scraping for a few of my clients. In many cases I did not need to take much trouble, as they have web services. Still, those were the most boring tasks, as I am not good at regular expressions. So, I have turned down a lot of such tasks.
Recently, one of my old customers asked me to work on scraping again, this time to collect articles from www.articlesnatch.com and auto blog them in wordpress. It was also comparatively easy, as the site has an RSS feed for its search page. But the RSS had only a summary of each article, so I had to fetch the whole article.
Yesterday, I started a pretty big scraping project. I also took on some helping hands to complete it fast. This time, I had to scrape articles from www.articlebase.com and autoblog them in wordpress on some preselected schedules (wordpress’s native cron). As they don’t have any feed for a search keyword/category, it is a bit more complex compared to the previous one. However, as I have already gained some scraping experience, it was very easy for me. And most surprisingly, I am now getting interested in scraping.
“I have updated the DNS, but I am still getting the site from the old server.” Is this your problem?
This occurs due to propagation time. When you update the DNS, it can take up to 72 hours to propagate across the internet. ISPs cache DNS query results, so it takes some time for their caches to refresh and reflect your changes.
But on your own computer you can solve it easily by overriding the cache. Just follow the steps:
domain = ‘binodon4all.com’
new server’s ip = 126.96.36.199
Open the C:\Windows\System32\Drivers\etc\hosts file using notepad
Write the following line at the end of the file:
Save the file
Open command prompt. Type
Open your browser and browse binodon4all.com. You should now get the site from the new server.
For Linux & Mac OS:
Log in as root
Open the /etc/hosts file by typing the following command: vi /etc/hosts
If you find the vi command difficult, use: gedit /etc/hosts
At the end of the file add the same line as before: 188.8.131.52 binodon4all.com
Save the file. Restart your network with: service network restart
Try browsing binodon4all.com. You should get the site from the new server.
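If you want to double-check that the entry was saved the way you intended, a small script can read the hosts file and report which IP it maps a name to. This is only a sketch: hosts_lookup is a hypothetical helper, it only parses the text of the file (it does not ask the operating system's resolver), and it returns the first matching entry it finds.

```php
<?php
// Look up a hostname in hosts-file text ("IP name [alias ...]" per line,
// with # starting a comment). Returns the first matching IP, or null.
function hosts_lookup(string $hostsContent, string $hostname): ?string
{
    foreach (preg_split('/\R/', $hostsContent) as $line) {
        // Drop anything after a # and surrounding whitespace.
        $hash = strpos($line, '#');
        if ($hash !== false) {
            $line = substr($line, 0, $hash);
        }
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        $fields = preg_split('/\s+/', $line);
        $ip = array_shift($fields);
        foreach ($fields as $name) {
            if (strcasecmp($name, $hostname) === 0) {
                return $ip;
            }
        }
    }
    return null;
}

// Sample content using the line added in the steps above.
$sample = "127.0.0.1 localhost\n"
        . "# temporary override until DNS propagates\n"
        . "188.8.131.52 binodon4all.com\n";

$ip = hosts_lookup($sample, 'binodon4all.com');
```

To run it against the real file, pass file_get_contents('/etc/hosts') on Linux and Mac OS, or the contents of C:\Windows\System32\Drivers\etc\hosts on Windows. Remember to remove the line again once the DNS change has propagated.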