Well, you are talking about a lot of code, depending on what you need from the resulting data. I have several keyword scrapers for various sites that acquire data for me nightly, but I find that each website needs its own special handling. Scraping sites is actually easy. The hard part is pulling out the correct data you need.
Please explain what you need to scrape 10k keywords for, and why. You mentioned Amazon, so are you attempting to do price checks? This is quite easy with Amazon's site. Here is sample code for getting this info from Amazon. You will need to know the product's ASIN (Amazon Standard Identification Number), which Amazon uses for indexing products. It gives you an idea of what you can do with little code. This code was found with the help of Google.
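[php]<?php
/* Enter the Amazon product ASIN */
$amazonASIN = "B00OTWNSMM";
/* Grab the content of the HTML web page */
$html = file_get_contents("http://www.amazon.com/gp/aw/d/$amazonASIN");
if ($html === false) {
    exit("Could not fetch the page for $amazonASIN");
}
/* Clean-up: drop non-breaking space entities */
$html = str_replace("&nbsp;", "", $html);
/* Regex matching the price label (several locales) and the text that follows it */
$regex = '/<b>(Prezzo|Precio|Price|Prix Amazon|Preis):?<\/b>([^<]+)/i';
/* Print the price */
if (preg_match($regex, $html, $price)) {
    /* Keep only the digits (drops currency symbol and separators),
       then treat the last two digits as cents */
    $digits = preg_replace('/\D/', '', $price[2]);
    $price = number_format((int)$digits / 100, 2, '.', '');
    echo "The price for amazon.com/dp/$amazonASIN is $price";
} else {
    /* No price label found: item unavailable, or the page layout changed */
    echo "Sorry, the item is out-of-stock on Amazon";
}
?>[/php]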
Obviously, this code is simple and handles only one product at a time. You would need to keep a database list of the product ASINs and loop through them. Also, most websites only allow a certain number of calls to their pages in a set time. Google, for instance, will lock your IP out if you make too many requests in one minute. Often you need to have your code sleep for part of a second between scrapes to make it work correctly. In one of my scrapers, I add a half-second pause after each scrape and the site lets me keep going. Note that PHP's sleep() only takes whole seconds, so a half-second pause needs " usleep(500000); " instead. With no delay, the site locks out my IP after about 3 minutes and the code fails.
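To give a rough idea of how the loop and the pause could fit together, here is a minimal sketch. The function names extractPrice() and scrapeAll(), the $fetch callback and the $delayMicros parameter are all made up for this example, and the regex is the same one as in the sample above, so it only works while Amazon's page still uses that markup. Treat it as a starting point, not a finished scraper.

[php]<?php
/* Hypothetical helper: pull the price out of one product page.
   Returns a string like "12.99", or null when no price label is found. */
function extractPrice(string $html): ?string
{
    $html = str_replace("&nbsp;", "", $html);
    $regex = '/<b>(Prezzo|Precio|Price|Prix Amazon|Preis):?<\/b>([^<]+)/i';
    if (!preg_match($regex, $html, $m)) {
        return null;
    }
    /* Keep only the digits, then treat the last two as cents (assumption) */
    $digits = preg_replace('/\D/', '', $m[2]);
    if ($digits === '') {
        return null;
    }
    return number_format((int)$digits / 100, 2, '.', '');
}

/* Hypothetical loop: $fetch is any callable that takes an ASIN and
   returns the page HTML, or false on failure. */
function scrapeAll(array $asins, callable $fetch, int $delayMicros = 500000): array
{
    $prices = [];
    foreach ($asins as $asin) {
        $html = $fetch($asin);
        $prices[$asin] = ($html === false) ? null : extractPrice($html);
        /* Pause between requests so the site does not lock out the IP */
        usleep($delayMicros);
    }
    return $prices;
}
?>[/php]

You would call it with your list of ASINs (from your database) and a small function that wraps file_get_contents() on the product URL, for example scrapeAll($asins, $fetch). Passing the fetcher in as a callback also lets you try the loop against saved HTML before hitting the real site.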
In other words, there are a lot of things to think about for a project like this. It is not a complicated process, but you have to design it in the correct manner. Perhaps you should give us a little more info on what you are scraping.