We would like 17 sites crawled, all of which use the same or similar content management systems.
We would like just the HTML of the articles (no other type of page, and no navigation, header, footer, etc.) put into a MySQL table. This HTML will be retrieved and shown within our template, which includes our own simple stylesheet. The stored HTML will need to conform to this stylesheet, though we can adjust the stylesheet to some degree to match your needs.
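To illustrate what "just the HTML of articles" could mean in practice, here is a minimal sketch using only the Python standard library. It assumes the article body sits inside a single container element with a known id; the id `article-body` and the function name `extract_article` are hypothetical, and the real sites may need a different selector. It is an illustration of the requirement, not a prescribed implementation.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ArticleExtractor(HTMLParser):
    """Collects the inner HTML of the element whose id matches container_id."""

    def __init__(self, container_id):
        # Keep entities such as &amp; untouched so the output stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.container_id = container_id
        self.depth = 0      # 0 = outside the article container
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if tag not in VOID_TAGS and dict(attrs).get("id") == self.container_id:
                self.depth = 1  # entered the container; its own tag is not kept
            return
        self.parts.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not change the nesting depth.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth > 0:  # skip the container's own closing tag
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")


def extract_article(page_html, container_id="article-body"):
    """Return the inner HTML of the article container, or '' if not found."""
    parser = ArticleExtractor(container_id)
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts)
```

The returned string is what would go into the MySQL table; navigation, header and footer markup outside the container is dropped. Badly nested markup on the source sites would need a more forgiving parser than this sketch.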
We also need the images from within these articles saved and served from our server.
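As a rough sketch of the image requirement, the snippet below finds the `<img>` tags in an article, resolves each `src` against the page URL, and rewrites it to a path on our own server. The `/images/` prefix, the SHA-1 based file naming and the function name `localise_images` are assumptions made for illustration only; the actual download step is left out.

```python
import hashlib
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

LOCAL_PREFIX = "/images/"  # hypothetical location on our server


class ImageCollector(HTMLParser):
    """Maps each original img src to (absolute URL, local path)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.images = {}

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        absolute = urljoin(self.page_url, src)
        # Keep the original file extension; name the file by a hash of its URL.
        ext = posixpath.splitext(urlparse(absolute).path)[1].lower()
        digest = hashlib.sha1(absolute.encode("utf-8")).hexdigest()
        self.images[src] = (absolute, LOCAL_PREFIX + digest + ext)


def localise_images(article_html, page_url):
    """Return (rewritten HTML, list of (remote URL, local path) to download)."""
    collector = ImageCollector(page_url)
    collector.feed(article_html)
    for src, (_, local) in collector.images.items():
        # Naive rewrite: only handles double-quoted src values without entities.
        article_html = article_html.replace(f'src="{src}"', f'src="{local}"')
    return article_html, list(collector.images.values())
```

Naming files by a hash of the absolute URL means the same image referenced from several articles is stored only once, which matters at this page count.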
There will be at least 5 million pages to download in this project. Do not bid unless you have worked on projects of this order of magnitude.
The crawler will need to re-fetch and update all pages every month.
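For a sense of scale, here is a back-of-the-envelope calculation of the sustained crawl rate this implies. It assumes a 30-day month and pages spread evenly across time and across the 17 sites; both are simplifying assumptions, and the real distribution will differ.

```python
PAGES = 5_000_000        # minimum page count stated above
SITES = 17               # number of sites stated above
DAYS_PER_CYCLE = 30      # assumption: one "month" is 30 days
SECONDS_PER_DAY = 86_400


def required_rate(pages, days):
    """Average pages per second needed to fetch `pages` within `days`."""
    return pages / (days * SECONDS_PER_DAY)


rate = required_rate(PAGES, DAYS_PER_CYCLE)
per_site = rate / SITES
print(f"{rate:.2f} pages/s overall, {per_site:.3f} pages/s per site")
```

Under these assumptions the crawler has to average roughly 1.93 pages per second around the clock, or about one request every 9 seconds per site, before allowing for retries, downtime or image downloads.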
For the list of sites, please contact me.
In your response, please include:
B1. What hardware is required
B2. How long it will take to complete each milestone
B3. What experience you have working with large datasets
B4. What languages and technologies you plan to use