Okay, so that you understand what we are going to do, I will explain it step by step.
The code is fairly simple, but digging out the data you want can be tricky.
So, the first step is to capture the full page. For our talk, we will use the page you posted:
http://fr.aliexpress.com/item/1pcs-lot-Portable-Sonar-LCD-Fish-Finder-Alarm-100M-AP-ice/410762949.html
There are three steps to the process.
- Grab the page and store it in a variable. This step is easy. Below is the code to do this. It is just a few
lines of code; I have used it a lot, and it is fast and works well.
- Decode the page and strip out the two items we want. This is done using string functions. But, the
page coming in is complicated, and we will cheat a bit to locate the data we need. This process is really
two steps. First, we need to study the page itself and look at its HTML code to locate what we need, and
then figure out code that will strip out the items we do not need. It is only a few lines of code once we find
how the page is laid out.
- Save the picture and price in some form that is displayable. Usually, this means saving the data to your database
and the picture on your server. But, since you just need to display the picture, we can pull it straight from
their site, and that way we only have to save the name and price on your site. Or, if you just want to
display it, you can just display the values. Very little code here, too.
Now, those are the 3 simple steps. Here is step #1, which will grab that page and save it for the next step:
[php]
<?php
// Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Returns an array holding the curl_getinfo() details
// (http_code, url, and so on) plus the error number, error message, and page content.
// This is a CURL function which grabs the page for you
function get_web_page( $url )
{
    // Keep the user agent on one line; line breaks inside it would end up in the request header
    $user_agent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36";
    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",        // set request type post or get
        CURLOPT_POST           => false,        // set to GET
        CURLOPT_USERAGENT      => $user_agent,  // set user agent
        CURLOPT_COOKIEFILE     => "cookie.txt", // set cookie file
        CURLOPT_COOKIEJAR      => "cookie.txt", // set cookie jar
        CURLOPT_RETURNTRANSFER => true,         // return web page
        CURLOPT_HEADER         => false,        // don't return headers
        CURLOPT_FOLLOWLOCATION => true,         // follow redirects
        CURLOPT_ENCODING       => "",           // handle all encodings
        CURLOPT_AUTOREFERER    => true,         // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,          // timeout on connect
        CURLOPT_TIMEOUT        => 120,          // timeout on response
        CURLOPT_MAXREDIRS      => 10,           // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno'] = $err;
    $header['errmsg'] = $errmsg;
    $header['content'] = $content;
    return $header;
}
// Read the web page and check for errors:
$url = "http://fr.aliexpress.com/item/1pcs-lot-Portable-Sonar-LCD-Fish-Finder-Alarm-100M-AP-ice/410762949.html";
$result = get_web_page( $url );
if ( $result['errno'] != 0 ) echo "... error: bad url, timeout, redirect loop ...";
if ( $result['http_code'] != 200 ) echo "... error: no page, no permissions, no service ...";
$page = $result['content'];
?>
[/php]
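One note on the two error checks at the bottom of that code: they only echo a message and then carry on with whatever is in $page. If you would rather stop when the fetch fails, a small helper along these lines could do it. This is just a sketch. fetch_error() is a name I made up, and it assumes the array layout that get_web_page() returns above (errno, errmsg, http_code).

[php]
<?php
// Hypothetical helper (not part of the original code): turn the array returned
// by get_web_page() into an error message, or null when the fetch looks good.
function fetch_error( array $result )
{
    if ( $result['errno'] != 0 ) {
        // cURL itself failed: bad url, timeout, redirect loop...
        return "cURL error " . $result['errno'] . ": " . $result['errmsg'];
    }
    if ( $result['http_code'] != 200 ) {
        // The server answered, but not with the page we asked for
        return "HTTP status " . $result['http_code'];
    }
    return null;
}

// Usage with a hand-made result array, so no network access is needed here:
$sample = array( 'errno' => 0, 'errmsg' => '', 'http_code' => 404, 'content' => '' );
$problem = fetch_error( $sample );
if ( $problem !== null ) {
    echo "Could not fetch the page: " . $problem . "\n";
}
[/php]

In your real page you would pass $result instead of $sample and skip the decoding step whenever the helper returns a message.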
The variable $page now holds the entire page. Now for part two, which will strip out what you need.
So, to find your data, I pulled up your page and used VIEW-SOURCE to see the page code. In there,
I located the price for the item in this sample. It is in a “Data-Table”, in a SPAN inside of that,
with the price right after the span tag. Easy to locate.
Therefore, to grab the price, you find that span and pull out the text that follows it. For the picture, I found it inside
an element with its own class name. So, all that is needed is to pull these two out of the
page code. This next section takes the page held in the variable $page, locates these
two items, and displays them. (It is the above code with added lines!)
[php]
<?php
// Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Returns an array holding the curl_getinfo() details
// (http_code, url, and so on) plus the error number, error message, and page content.
// This is a CURL function which grabs the page for you
function get_web_page( $url )
{
    // Keep the user agent on one line; line breaks inside it would end up in the request header
    $user_agent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36";
    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",        // set request type post or get
        CURLOPT_POST           => false,        // set to GET
        CURLOPT_USERAGENT      => $user_agent,  // set user agent
        CURLOPT_COOKIEFILE     => "cookie.txt", // set cookie file
        CURLOPT_COOKIEJAR      => "cookie.txt", // set cookie jar
        CURLOPT_RETURNTRANSFER => true,         // return web page
        CURLOPT_HEADER         => false,        // don't return headers
        CURLOPT_FOLLOWLOCATION => true,         // follow redirects
        CURLOPT_ENCODING       => "",           // handle all encodings
        CURLOPT_AUTOREFERER    => true,         // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,          // timeout on connect
        CURLOPT_TIMEOUT        => 120,          // timeout on response
        CURLOPT_MAXREDIRS      => 10,           // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno'] = $err;
    $header['errmsg'] = $errmsg;
    $header['content'] = $content;
    return $header;
}
// Read the web page and check for errors:
$url = "http://fr.aliexpress.com/item/1pcs-lot-Portable-Sonar-LCD-Fish-Finder-Alarm-100M-AP-ice/410762949.html";
$result = get_web_page( $url );
if ( $result['errno'] != 0 ) echo "... error: bad url, timeout, redirect loop ...";
if ( $result['http_code'] != 200 ) echo "... error: no page, no permissions, no service ...";
$page = $result['content'];
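// We now have the page, locate the picture and price inside it.
// Each marker is the exact HTML text that sits right before (or after) the value in VIEW-SOURCE;
// fill these in from the page source.
$picture_open  = ''; // tag text just before the picture (34 characters on this page)
$picture_close = ''; // text that follows the picture
$start = strpos($page, $picture_open) + strlen($picture_open); // Position of this item plus the length of it...
$end = strpos($page, $picture_close, $start) - 2; // Position of the next item, minus 2 to drop the end of the previous item...
$picture = substr($page, $start, $end-$start); // Place this text into <>'s and it will display on site...
echo "\n" . $picture . "\n";
// We have the picture image text, now capture the price
$price_open  = ''; // span text just before the price (52 characters on this page)
$price_close = ''; // text that follows the price
$start = strpos($page, $price_open) + strlen($price_open); // Position of this item plus the length of it...
$end = strpos($page, $price_close, $start); // Position of the next item, which is where the price ends...
$price = substr($page, $start, $end-$start); // The price text, ready to display on site...
echo "\n" . $price . "\n";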
?>
[/php]
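The strpos()/substr() trick used above can also be wrapped in a small function, so you do not have to count marker lengths by hand and you get a clean null when a marker is missing. Here is a rough sketch of that idea. The function name text_between() and the sample HTML are made up for illustration; the markup is NOT the real AliExpress source, so you would swap in the markers you find with VIEW-SOURCE.

[php]
<?php
// Hypothetical helper: return the text found between $open and $close in $page,
// or null if either marker is missing.
function text_between( $page, $open, $close )
{
    $start = strpos( $page, $open );
    if ( $start === false ) {
        return null;
    }
    $start += strlen( $open );               // skip past the opening marker
    $end = strpos( $page, $close, $start );  // first closing marker after it
    if ( $end === false ) {
        return null;
    }
    return substr( $page, $start, $end - $start );
}

// Made-up sample markup, NOT the real page source:
$sample = '<div class="pic"><img src="http://example.com/fish.jpg"></div>'
        . '<span class="price">US $45.00</span>';

$picture = text_between( $sample, '<img src="', '"' );
$price   = text_between( $sample, '<span class="price">', '</span>' );
echo $picture . "\n" . $price . "\n";
[/php]

The strict === false checks matter here, because strpos() can return 0 for a marker at the very start of the page, and 0 would otherwise be mistaken for "not found".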
I tested this code as-is and it works perfectly. You will just have to fit it into your own code as needed.
So, test this page if you wish. It strips out the picture and the text for the price and displays them both.
(You only need the function code once per page.) Hope that helps!
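P.S. For step 3, if you only want to display the two values, something like the sketch below might be enough. render_item() is a name I made up, and the URL and price passed to it are sample values, not data pulled from the real page. It assumes the picture value is a plain image URL.

[php]
<?php
// Hypothetical helper: build the HTML for one item from the scraped values.
function render_item( $picture_url, $price )
{
    // Escape both values so stray quotes or tags in the scraped text cannot break your page
    $src  = htmlspecialchars( $picture_url, ENT_QUOTES, 'UTF-8' );
    $text = htmlspecialchars( $price, ENT_QUOTES, 'UTF-8' );
    return '<img src="' . $src . '" alt="Item picture"><br>' . $text;
}

// Sample values only:
echo render_item( 'http://example.com/fish.jpg', 'US $45.00' );
[/php]

Escaping with htmlspecialchars() is worth the extra line, since the text comes from someone else's page and you do not control what is in it.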