We need a Bayesian filter, written as a VB.NET DLL, that accepts a string (i.e. a document or letter) and determines statistically, based on the words contained in a ‘good word file’ and a ‘bad word file’, whether the whole string should be considered junk.
This is done by counting how many times each unique word in the string appears in the ‘good word’ and ‘bad word’ files, deriving a single statistic for each unique word, and then generating a single statistic for the entire string: a calculation based on the individual statistics derived for each word.
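As a rough illustration of the calculation described above, here is a minimal sketch. It is written in Python purely for brevity; the deliverable itself is to be VB.NET. The function names, the neutral value of 0.5 for unseen words, the clamping to 0.01..0.99 and the naive Bayes combining rule are all assumptions in the style of common Bayesian spam filters; this brief does not mandate any particular formula.

```python
def word_probability(word, good_counts, bad_counts):
    """Per-word statistic: estimated chance that a string containing `word` is junk."""
    good = good_counts.get(word, 0)
    bad = bad_counts.get(word, 0)
    if good + bad == 0:
        return 0.5  # assumption: a word seen in neither file is neutral

    # Normalise by file size so a larger file does not dominate.
    good_total = max(sum(good_counts.values()), 1)
    bad_total = max(sum(bad_counts.values()), 1)
    good_freq = good / good_total
    bad_freq = bad / bad_total

    p = bad_freq / (bad_freq + good_freq)
    # Assumption: clamp so that one word can never force a verdict of 0 or 1.
    return min(max(p, 0.01), 0.99)


def junk_probability(text, good_counts, bad_counts):
    """Single statistic for the whole string, combined from the per-word statistics."""
    unique_words = set(text.lower().split())
    product_junk = 1.0
    product_ok = 1.0
    for word in unique_words:
        p = word_probability(word, good_counts, bad_counts)
        product_junk *= p
        product_ok *= 1.0 - p
    # Naive Bayes combination; very long strings may need log-space arithmetic.
    return product_junk / (product_junk + product_ok)
```

A caller would then compare the result against a threshold of its choosing (for example, treat anything above 0.9 as junk); the threshold is likewise not specified in this brief.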
Each word encountered should be analysed as described above and then, once it has been determined whether the entire string is junk, logged to the appropriate file by incrementing the existing value for that word. The appropriate file is chosen according to whether the string is junk (bad word file) or not (good word file).
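The logging step could look something like the sketch below (Python for brevity; the deliverable is to be VB.NET). The name `log_words` and the use of in-memory dictionaries standing in for the two files are assumptions; the on-disk file format is left to the coder.

```python
def log_words(text, is_junk, good_counts, bad_counts):
    """Increment the count of every word in `text` in the file matching the verdict.

    `good_counts` and `bad_counts` are dictionaries standing in for the
    good word file and bad word file (hypothetical representation).
    """
    target = bad_counts if is_junk else good_counts
    for word in text.lower().split():
        # Words not yet in the file start from zero.
        target[word] = target.get(word, 0) + 1
    return target
```

Note that this version increments once per occurrence, so a word appearing twice in the string raises its count by two.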
The project should also include an ‘excluded word’ file: any word that appears in this file is left out of the filter. This caters for words that are common in everyday English, such as: in, a, the, etc.
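A minimal sketch of the exclusion step follows (Python for brevity; the deliverable is to be VB.NET). It assumes the excluded word file holds one word per line and that words are sequences of letters, digits and apostrophes; neither detail is fixed by this brief, and the function names are hypothetical.

```python
import re


def load_excluded(lines):
    """Build the excluded-word set from an iterable of lines (one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def filter_words(text, excluded):
    """Split `text` into lower-case words and drop any that are excluded."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in excluded]
```

For instance, with the excluded words in, a and the, the string "The offer is in a letter." would be reduced to the words offer, is and letter before any statistics are calculated.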
The project should be written as a DLL (VB.NET) and should be as object-oriented as possible, e.g. with classes such as: word, words (collection), excluded word, excluded words (collection), etc.
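To give an idea of the class layout we have in mind, here is an illustrative outline (Python for brevity; the deliverable is to be VB.NET). The member names are assumptions, and for compactness the excluded words are held as plain strings here rather than as separate ‘excluded word’ objects.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A single word with its counts from the good and bad word files."""
    text: str
    good_count: int = 0
    bad_count: int = 0


class Words:
    """Collection of Word objects, keyed by lower-cased text."""

    def __init__(self):
        self._items = {}

    def get(self, text):
        """Return the Word for `text`, creating it on first use."""
        key = text.lower()
        if key not in self._items:
            self._items[key] = Word(key)
        return self._items[key]

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items.values())


class ExcludedWords:
    """Collection of words that the filter should ignore."""

    def __init__(self, words=()):
        self._items = {word.lower() for word in words}

    def __contains__(self, text):
        return text.lower() in self._items
```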
All in all it should be a pretty easy and fun project to complete. I simply do not have the time to potter away at it: I would prefer it to be completed as soon as possible, and due to other commitments I cannot achieve this myself.
Please note that we will require the code to be commented as thoroughly as possible, simply to help us with any modifications that we may need to make.
Further information on Bayesian Filters can be found here: <[login to view URL]>
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
The DLL should run on Windows 98 through to Windows XP.