Hi,
For the last several years, we have been crawling and scraping websites to extract specific attributes and aggregates from a range of unstructured data sources. We then assemble the results and synchronize them into databases and datastores. We have used the Scrapy framework for some of this work. For example, we have extracted very specific attributes from Groupon, Wikipedia, YouTube and other product-based websites.
Please find a short summary of our experience below.
* Several years of experience in text mining, information extraction and analytics. This covers crawling, scraping, extracting and aggregating unstructured big data such as web pages and text corpora. It also covers assembling, synchronizing and loading the results into databases, datastores and search indexes (Lucene, Solr) for analysis, search, reporting and dashboards.
* Extensive experience using Perl, PHP, Python, C, Java and .NET with MySQL, Oracle and MS SQL Server.
* Information extraction tools: Scrapy framework, Weka, R, Excel and Perl CPAN packages for extraction.
Estimated budget: ~245 USD (timeline: 5-7 days)
Price, milestones and timelines are flexible and negotiable. They can be adjusted to the exact project specifications and details, or to cover any additional project work.
Would you be interested in sharing more information about your project?