Simple Web Crawler Question

I’m just starting web crawling and already having problems.
I have a URL

https://www.bop.gov/PublicInfo/execute/inmateloc?todo=query&output=json&inmateNumType=IRN&inmateNum=94970004

When you open it in a browser, you get JSON back, displayed as plain text.
I’m trying to capture that result using PHP. I’ve tried fopen() and file_get_contents(), but neither seems to work for me.

I just need to figure out how to get the result into a variable. I know what to do after that. Also, I’d like to avoid using any plugin like cURL. Any ideas? Or am I just doing it wrong?

This is what I had:

$homepage = file_get_contents('https://www.bop.gov/PublicInfo/execute/inmateloc?todo=query&output=json&inmateNumType=IRN&inmateNum=94970004');
echo $homepage;

I get nothing.

Any ideas? Thanks

cURL isn’t a plugin. Why the aversion to using it? You need a browser user agent to avoid being blocked.

<?php
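// Fetch $url over HTTPS with a browser-like User-Agent and return the body.
function getSslPage($url)
{
    $agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';

    $ch = curl_init();
    // Skips certificate verification; insecure, acceptable only for quick testing.
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_VERBOSE, true);        // log transfer details to STDERR
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_URL, $url);
    $result = curl_exec($ch); // false on failure
    curl_close($ch);
    return $result;
}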

try {
    $homepage = getSslPage("https://www.bop.gov/PublicInfo/execute/inmateloc?todo=query&output=json&inmateNumType=IRN&inmateNum=94970004");
    $data = json_decode($homepage);
    print_r($data);
} catch (Exception $e) {
    echo "<p><b>{$e->getCode()}</b></p>";
    echo $e->getMessage();
}
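
If you would still rather stay with file_get_contents(), the same user-agent idea can be applied through a stream context. This is only a minimal sketch: the helper names buildBrowserContext() and fetchWithUserAgent() are made up for illustration, it assumes allow_url_fopen is enabled and the openssl extension is loaded, and I have not confirmed that this site accepts requests sent this way.

```php
<?php
// Hypothetical helper: builds a stream context that sends a browser-like
// User-Agent header with file_get_contents() requests.
function buildBrowserContext(string $agent, int $timeoutSeconds = 10)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $agent,          // sent as the User-Agent header
            'timeout'    => $timeoutSeconds, // read timeout in seconds
        ],
    ]);
}

// Hypothetical helper: fetches $url and returns the body, or null on failure.
function fetchWithUserAgent(string $url, string $agent): ?string
{
    // @ suppresses the warning file_get_contents() raises on failure;
    // the false return value is checked instead.
    $body = @file_get_contents($url, false, buildBrowserContext($agent));
    return $body === false ? null : $body;
}
```

Calling fetchWithUserAgent() with the URL from the question and the same $agent string used in getSslPage() should return the JSON text, or null if the request fails.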

Thanks… This is what I get:

This page isn’t working

bec*.pro** is currently unable to handle this request.

HTTP ERROR 500

Before, I would just get a blank page. When I took out the try/catch and echoed ‘ok’, it worked fine.

UPDATE:
I tinkered and this worked!!!

$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, "https://www.bop.gov/PublicInfo/execute/inmateloc?todo=query&output=json&inmateNumType=IRN&inmateNum=94970004");
$result = curl_exec($ch);
curl_close($ch);
echo $result;

Thanks so much for pointing me in the right direction.
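
One follow-up note: the snippet that worked echoes whatever curl_exec() returns, so a failed request just prints nothing, and cURL functions do not throw exceptions on their own. A rough sketch of how the result could be checked before decoding is below. The helper names fetchWithCurl() and decodeJsonBody() and the 15-second timeout are my own choices, not anything from the site. The sketch also leaves certificate verification at its default (on), which assumes the server's certificate validates on your host.

```php
<?php
// Hypothetical helper: fetches $url with cURL and throws if the request fails.
function fetchWithCurl(string $url, string $agent): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // assumed limit, in seconds
    $result = curl_exec($ch);
    if ($result === false) {
        // curl_error() describes the last failure on this handle.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    // The handle is freed automatically when $ch goes out of scope.
    return $result;
}

// Hypothetical helper: decodes a JSON object or array into a PHP array, or throws.
function decodeJsonBody(string $body): array
{
    $data = json_decode($body, true); // true = associative arrays, not objects
    if (!is_array($data)) {
        throw new RuntimeException(
            'Response was not a JSON object or array: ' . json_last_error_msg()
        );
    }
    return $data;
}
```

With helpers like these, a try/catch around decodeJsonBody(fetchWithCurl($url, $agent)) actually has something to catch, and the exception message says which step failed.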
