Seeking a script to provide similar functionality to this website:
[login to view URL]
Essentially, it should compare files against Google, or another specified target, working at phrase level (splitting on commas, perhaps?). It should report the number of duplicated phrases, preferably the number of times each phrase was found in the target location, and the overall percentage of duplication.
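As a rough illustration of the phrase-level check described above, here is a minimal Python sketch. The language, the function names (`split_phrases`, `duplication_report`), the four-word minimum phrase length, and the `count_matches` callback that stands in for the Google or other target lookup are all assumptions made for illustration, not requirements.

```python
import re
from typing import Callable, Dict, List


def split_phrases(text: str, min_words: int = 4) -> List[str]:
    """Split text into phrases at commas and sentence punctuation.

    Phrases shorter than min_words are dropped (an assumed cutoff),
    since very short fragments would match almost anywhere.
    """
    parts = re.split(r"[,.;:!?\n]+", text)
    # Collapse whitespace and lowercase so comparison is uniform.
    phrases = [" ".join(part.split()).lower() for part in parts]
    return [p for p in phrases if len(p.split()) >= min_words]


def duplication_report(text: str, count_matches: Callable[[str], int]) -> Dict:
    """Count how many phrases of `text` are found in the target.

    count_matches is a hypothetical callback: given a phrase, it returns
    how many times that phrase was found in the target location.
    """
    phrases = split_phrases(text)
    counts: Dict[str, int] = {}
    for phrase in phrases:
        if phrase not in counts:  # look each distinct phrase up only once
            counts[phrase] = count_matches(phrase)
    matches = {p: n for p, n in counts.items() if n > 0}
    duplicated = sum(1 for p in phrases if p in matches)
    percent = 100.0 * duplicated / len(phrases) if phrases else 0.0
    return {
        "total_phrases": len(phrases),
        "duplicated_phrases": duplicated,
        "percent_duplicated": percent,
        "matches": matches,  # phrase -> times found in the target
    }
```

For testing, `count_matches` could be something as simple as `lambda phrase: reference_text.lower().count(phrase)` against a local reference text; the real lookup against Google or another target would replace it.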
However, I want to go further than the website referenced.
I want files submitted to my site to be compared automatically, with the resulting level of duplication shown in the admin panel of my site. I would also like to be able to use the tool to examine my site's current content (over 30,000 articles) with a view to cleaning it up.
I would like the facility to automatically reject content whose degree of duplication exceeds a specified (variable) threshold. If the work to include this functionality is relatively trivial, it can be provided straight away. If the work is non-trivial, we can add it in the future, but it should be borne in mind.
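The threshold rule could look something like the following Python sketch. The function name `should_reject` and the default threshold of 30 percent are hypothetical; the actual value would be configurable.

```python
def should_reject(percent_duplicated: float, threshold: float = 30.0) -> bool:
    """Return True when duplication exceeds the configured threshold.

    threshold is a percentage between 0 and 100; the default of 30.0
    is only a placeholder value.
    """
    if not 0.0 <= threshold <= 100.0:
        raise ValueError("threshold must be between 0 and 100")
    # "Exceeds" is read strictly: content exactly at the threshold passes.
    return percent_duplicated > threshold
```

With a threshold of 30, a submission that is 45 percent duplicated would be rejected, while one at exactly 30 percent would be accepted.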
I would prefer the script to be relatively standalone, i.e. a separate script called from the existing scripts rather than a part of the existing code.
I will provide the URL and scripts for the site where this project will be used.
Clarifying questions will be happily answered with a view to attaining a good outcome!