I need an application that extracts the words from a table on this site: [login to view URL] into a database. The application should let users browse the entries (grouped by identical words, with the number of instances of each) and search through them (similar to Overture Search Term Suggestions [login to view URL]). Both searching and browsing should be limitable by date.
The site collects data from form submissions, and each entry disappears from the table quickly (after about 2-3 seconds). I would like to collect as much as possible, so the collection procedure should run at least every 2 seconds. The interval should be configurable; if it is driven by a cron job, that won't be difficult. The collection procedure should skip words that were already collected in past runs.
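With entries visible for 2-3 seconds and a poll every 1-2 seconds, consecutive fetches will overlap. Repeated keywords still have to be counted, because each fetched keyword is its own entry. For that reason, removing duplicates by keyword text would lose data. One option is to compare each new snapshot with the previous one and keep only the part that does not overlap.

The sketch below makes a hypothetical assumption: the table is shown newest first and has no per-entry ID or timestamp. The function name `new_entries` is illustrative. If the site exposes IDs or timestamps, deduplicating on those would be simpler and more reliable.

```python
def new_entries(previous, current):
    """Return the entries in `current` that were not already in `previous`.

    Hypothetical assumption: both snapshots list the table newest first, so the
    oldest entries of `current` are the newest entries of `previous`.
    """
    # Try the largest possible overlap first. With repeated keywords the match
    # can be ambiguous; preferring the largest overlap errs towards undercounting.
    for k in range(min(len(previous), len(current)), 0, -1):
        if current[-k:] == previous[:k]:
            return current[:-k]
    # No overlap at all: everything is new, and entries may have been missed
    # between the two fetches (the polling interval was too long).
    return list(current)


if __name__ == "__main__":
    previous = ["e", "d", "c", "b"]
    current = ["g", "f", "e", "d"]
    print(new_entries(previous, current))  # ['g', 'f']
```

An empty result means nothing new arrived between two polls. A result equal to the whole snapshot suggests the interval should be shortened.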
The application should run on Linux/MySQL, with the front end (browsing) preferably as a website.
Application:
· Extract data from [login to view URL]
· Extract every 1-2 seconds
· Eliminate duplicates
· Search entries (with grouping option and a count of instances)
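The browsing and searching described above (grouped by word, counted, limited by date) map onto a `GROUP BY` query. Below is a minimal sketch under these assumptions:

- sqlite3 from the Python standard library stands in for MySQL.
- The table and column names (`entries`, `keyword`, `fetched_at`) are hypothetical.
- "Similar to Overture" is approximated as a substring match ranked by count.

```python
import sqlite3

# Hypothetical schema: one row per fetched keyword occurrence, so repeated
# keywords are kept and can be counted. sqlite3 stands in for MySQL here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id INTEGER PRIMARY KEY,
    keyword TEXT NOT NULL,
    fetched_at TEXT NOT NULL  -- ISO-8601 timestamp, e.g. 2024-01-01T10:00:00
)
"""

# Group by keyword and count instances within [start, end), most frequent first.
BROWSE_SQL = """
SELECT keyword, COUNT(*) AS n FROM entries
WHERE fetched_at >= ? AND fetched_at < ?
GROUP BY keyword ORDER BY n DESC, keyword LIMIT ?
"""

# Same grouping, restricted to keywords containing the search term
# (% and _ in the term are not escaped and act as LIKE wildcards).
SEARCH_SQL = """
SELECT keyword, COUNT(*) AS n FROM entries
WHERE keyword LIKE ? AND fetched_at >= ? AND fetched_at < ?
GROUP BY keyword ORDER BY n DESC, keyword LIMIT ?
"""


def browse(conn, start, end, limit=50):
    return conn.execute(BROWSE_SQL, (start, end, limit)).fetchall()


def search(conn, term, start, end, limit=20):
    return conn.execute(SEARCH_SQL, ("%" + term + "%", start, end, limit)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO entries (keyword, fetched_at) VALUES (?, ?)",
        [("cheap flights", "2024-01-01T10:00:00"),
         ("cheap flights", "2024-01-01T10:00:02"),
         ("flight status", "2024-01-01T10:00:04"),
         ("hotels", "2024-01-02T09:00:00")],
    )
    print(browse(conn, "2024-01-01", "2024-01-02"))
    # [('cheap flights', 2), ('flight status', 1)]
    print(search(conn, "flight", "2024-01-01", "2024-01-03"))
    # [('cheap flights', 2), ('flight status', 1)]
```

In MySQL, `fetched_at` would more naturally be an indexed `DATETIME` column. The queries themselves carry over essentially unchanged.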
The system should work like this: the fetching script runs on several different (web) servers and fills the database on the master server. The master server runs a cron job that starts the scripts on the different servers in cycles, so the cron job needs a list of script locations: [login to view URL]. The PHP fetching script should be the same on all servers (using absolute paths to the MySQL server), so that is not a problem. Records (search results) only need to be displayed on the master server.
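Standard cron cannot schedule a job more often than once per minute. A 2-second cycle is therefore probably better handled by a long-running loop that cron (or a process supervisor) starts and keeps alive. The sketch below makes these assumptions:

- Each server's fetch script can be triggered over HTTP.
- Each script writes directly to the master MySQL server, as described above.
- `run_cycle`, `http_trigger` and the example URLs are all hypothetical.

```python
import itertools
import time
import urllib.request
from typing import Callable, Iterable, Optional


def http_trigger(url: str, timeout: float = 5.0) -> None:
    """Run a remote fetch script once by requesting its URL (hypothetical trigger)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def run_cycle(script_urls: Iterable[str],
              trigger: Callable[[str], None] = http_trigger,
              interval: float = 2.0,
              rounds: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Start the fetch script on each server in turn, one every `interval` seconds.

    Runs forever unless `rounds` limits the number of triggers.
    """
    for n, url in enumerate(itertools.cycle(script_urls)):
        if rounds is not None and n >= rounds:
            break
        started = time.monotonic()
        try:
            trigger(url)
        except Exception as exc:  # one unreachable server should not stop the cycle
            print(f"fetch via {url} failed: {exc}")
        # Wait only for whatever is left of the interval after the trigger returned.
        sleep(max(0.0, interval - (time.monotonic() - started)))
```

A call could look like `run_cycle(["http://fetch1.example.com/fetch.php", "http://fetch2.example.com/fetch.php"], interval=2.0)`.

With N servers, each server is hit only once every N × `interval` seconds. Consecutive fetches come from different servers, so any state used to detect overlap between fetches would need to live on the master (for example in the database), not in each server's memory.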
Each fetched keyword should be stored as a separate entry in the database. By "eliminating duplicates" I mean entries that were already fetched from the site in an earlier run. I do not mean the same keyword showing up again later (for example, within 15 minutes); that counts as a new entry. We want to know WHAT people search for and HOW OFTEN.
I am also open to other suggestions.
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment: Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
Linux/MySQL