We are trying to find duplicate user-submitted content on our site that already appears in Google's search index, and we need software to do this for us!
We run a user-generated-content site and need a script, tool, or service to find (and let us remove) content on our domain that has already been published elsewhere on the web.
We need software that takes a large sitemap and checks every page for duplicate content on the web / in search engines such as Google.
It should also accept an RSS feed and run the same check daily.
Extra notes:
Just to clarify: the software/script must check each page in a list of pages, i.e. a [login to view URL] file.
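For any freelancer bidding: a minimal sketch of the first half of this workflow, assuming Python. It parses a standard sitemap.xml to get the page list, then builds an exact-phrase search query from a page's text. A real implementation would send that query to a programmatic search endpoint such as the Google Custom Search JSON API (an assumption on our part; any equivalent service would do) and flag results on other domains as duplicates.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace per the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Extract every page URL (<loc> element) from a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

def exact_match_query(page_text, length=10):
    """Pick a run of words from the middle of the page text and wrap it
    in double quotes so the search engine matches the exact phrase."""
    words = page_text.split()
    mid = max(0, len(words) // 2 - length // 2)
    return '"' + " ".join(words[mid:mid + length]) + '"'
```

For each URL returned by `sitemap_urls`, the script would fetch the page, extract its visible text, and submit `exact_match_query(text)` to the search API; daily runs over the RSS feed would reuse the same query step on newly published items.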
It would also be great if the software could scrape the HTML of the original page (not on our domain) and extract the date it was published.
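The publish-date scraping could work roughly like this sketch, which checks two common markers: the Open Graph `article:published_time` meta tag and a `<time datetime="...">` element. These markers are an assumption; many pages carry the date in other places, so a production version would need more heuristics.

```python
from html.parser import HTMLParser

class PublishDateFinder(HTMLParser):
    """Scan a page's HTML for common machine-readable publish dates:
    <meta property="article:published_time" content="..."> (Open Graph)
    or <time datetime="...">."""

    def __init__(self):
        super().__init__()
        self.date = None

    def handle_starttag(self, tag, attrs):
        if self.date:  # keep the first date found
            return
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "article:published_time":
            self.date = a.get("content")
        elif tag == "time" and a.get("datetime"):
            self.date = a["datetime"]

def find_publish_date(html):
    """Return the first publish-date string found in the HTML, or None."""
    finder = PublishDateFinder()
    finder.feed(html)
    return finder.date
```

Run against the original (off-domain) page's HTML, this gives the date needed to decide which copy of the content came first.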