Anyone else frequent imdb.com? A couple months ago I was checking out some trailers and stumbled upon IMDB’s error 404 page. If you haven’t seen it, it’s quite awesome.
Anyways, I wanted to use that 404 page on my latest project because it seemed very humorous. So, without further ado, here’s my cURL script for scraping the quotes from IMDB’s 404.
Check out my live example here: http://andrewliesenfeld.com/pxbpanel/404
[php]<?php
//Establish URL to strip code from
$url = "http://www.imdb.com/404";
//Initialize cURL session
$ch = curl_init($url);
//Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//Execute cURL session
$content = curl_exec($ch);
//Close cURL session
curl_close($ch);
//Find and select the attribution div (class error_attrib)
preg_match('/<div class="error_attrib".*?>([^`]*?)<\/div>/', $content, $imdb_attrib);
//Find and select the quote div (class error_quote)
preg_match('/<div class="error_quote".*?>.*?<\/div>/', $content, $imdb_quote);
//Create variable of first instance of $imdb_quote
$strip = $imdb_quote[0];
//Remove code and get the text
$quote = strip_tags($strip);
//Place the IMDB quote in an HTML structure of your choice (a plain paragraph is used here as a placeholder)
echo '<p>' . $quote . '</p>';
//Create variable of first instance of $imdb_attrib
$attrib = $imdb_attrib[0];
//Create variable for root URL
$rootURL = "http://imdb.com";
//Find the link (<a>) in $attrib and extract the href value (it's a relative path, "/title/tt0032138/")
preg_match('/(?<=href=("|\'))[^"\']+(?=("|\'))/', $attrib, $urlL);
//Combine the root URL with the relative path to the movie (“http://imdb.com/title/tt0032138/”)
$combineURL = $rootURL . $urlL[0];
//Replace the relative href with the absolute URL
$finalURL = preg_replace('/(?<=href=("|\'))[^"\']+(?=("|\'))/', $combineURL, $attrib);
//Echo the movie attributor
echo $finalURL;
//I liked the structure of the code for error_attrib so I didn’t bother stripping the tags. Here’s the code structure for error_attrib:
//
?>[/php]
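If you'd rather the script not fall over when IMDB changes its markup, here's a rough sketch of the same extraction wrapped in a function that returns null when the divs aren't found. The function name extract_imdb_quote and the sample markup are made up for illustration, and it assumes the page still uses the error_quote and error_attrib classes. It also only rewrites hrefs that start with a single slash.
[php]<?php
//Hypothetical helper (not part of the script above): pulls the quote text and
//the attribution markup out of an HTML string, or returns null if either div is missing
function extract_imdb_quote($html, $rootURL = "http://imdb.com")
{
    //The s modifier lets "." match newlines, so divs spanning several lines still match
    if (!preg_match('/<div class="error_quote".*?>(.*?)<\/div>/s', $html, $quoteMatch) ||
        !preg_match('/<div class="error_attrib".*?>(.*?)<\/div>/s', $html, $attribMatch)) {
        return null;
    }
    //Prefix root-relative hrefs ("/title/...") with the root URL
    $attrib = preg_replace('/(?<=href=["\'])\/(?!\/)/', $rootURL . '/', $attribMatch[1]);
    return array(
        'quote'  => trim(strip_tags($quoteMatch[1])),
        'attrib' => trim($attrib),
    );
}

//Made-up sample markup standing in for the fetched page
$sample = '<div class="error_quote">Example quote text.</div>'
        . '<div class="error_attrib"><a href="/title/tt0032138/">Example Movie</a></div>';
$result = extract_imdb_quote($sample);
if ($result !== null) {
    echo '<p>' . $result['quote'] . '</p>';
    echo $result['attrib'];
}
?>[/php]
To use it with the real page, pass in the $content string returned by curl_exec() and check for null before echoing anything.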
I hope someone finds this useful.
Enjoy,
awl19