Classifier

Cancelled Posted Mar 28, 2016 Paid on delivery

You must take part in the Kaggle competition and submit a 2-page report (using the template provided at the end of this description) together with your implementation code.

As part of the practical assessment, you are required to participate in the Kaggle in Class competition "Are you in the UK's sunniest place?". The assessment grade, which is worth 50% of the total grade, is split into two components: the 2-page report and the source code. The code component will be weighted based on your performance in the Kaggle competition. No submission to the competition means a weight of 0.0, so you have to make at least one submission to the Kaggle competition to get a non-zero weight!

The competition is a binary classification problem (label 1 for sunny and label 0 for not sunny). You are provided with 950 labelled training instances (550 sunny scenes and 400 not sunny scenes) and 1050 unlabelled test instances. The task is to develop a binary classifier that predicts the labels for the test set. Each data instance is represented as a 4608-dimensional feature vector, a concatenation of 4096-dimensional deep convolutional neural network (CNN) features extracted from the fc7 activation layer of CaffeNet and 512-dimensional GIST features. This representation is given, so you do not need to perform any feature extraction on the images.
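As a minimal sketch of how the 4608-dimensional vector described above could be handled, the snippet below separates one instance into its CNN and GIST blocks so that each can be scaled on its own. The posting does not state the order of the concatenation, so placing the CNN block first is an assumption, and the helper names `split_features` and `l2_normalise` are hypothetical.

```python
from typing import List, Sequence, Tuple

CNN_DIM = 4096   # fc7 CaffeNet features (size taken from the description)
GIST_DIM = 512   # GIST features (size taken from the description)


def split_features(vector: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split one 4608-dimensional instance into (cnn, gist) blocks.

    Assumption: the CNN block comes first, followed by the GIST block.
    """
    expected = CNN_DIM + GIST_DIM
    if len(vector) != expected:
        raise ValueError(f"expected {expected} features, got {len(vector)}")
    return list(vector[:CNN_DIM]), list(vector[CNN_DIM:])


def l2_normalise(block: Sequence[float]) -> List[float]:
    """Scale a block to unit Euclidean length; an all-zero block is returned unchanged."""
    norm = sum(x * x for x in block) ** 0.5
    return [x / norm for x in block] if norm > 0 else list(block)
```

Normalising the two blocks separately is one possible way to keep the much larger CNN block from dominating distance-based classifiers such as kNN; whether it helps on this data set would have to be checked by validation.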

Additionally, you are also provided with two types of information that might be useful when building your classifier: a) the proportion of positive (sunny) data points and the proportion of not sunny data points in the test set, and b) confidence of the label annotation for each training data point. You can choose to incorporate or to ignore these additional data.
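One possible way to use the known test-set proportion, sketched here under assumptions, is to rank the test instances by classifier score and label the top fraction as sunny instead of using a fixed threshold. The function name `labels_matching_proportion` is hypothetical, and the actual proportion is not given in this description, so the value in the usage line is purely illustrative.

```python
from typing import List, Sequence


def labels_matching_proportion(scores: Sequence[float],
                               positive_fraction: float) -> List[int]:
    """Label the highest-scoring instances 1 so that roughly
    `positive_fraction` of all instances are predicted positive."""
    if not 0.0 <= positive_fraction <= 1.0:
        raise ValueError("positive_fraction must lie in [0, 1]")
    n_positive = round(positive_fraction * len(scores))
    # Indices ordered from highest to lowest score; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [0] * len(scores)
    for i in ranked[:n_positive]:
        labels[i] = 1
    return labels


# Illustrative scores and an illustrative proportion of 0.5 (not from the competition).
predicted = labels_matching_proportion([0.9, 0.1, 0.5, 0.7], 0.5)
```

The annotation confidence could be handled separately, for example as per-instance training weights; many scikit-learn estimators accept a `sample_weight` argument in `fit` for this purpose.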

You can use any of your favourite classifiers. Some of the classifiers that we have discussed or will discuss in class up to the mid-term are: decision tree, random forest, linear perceptron, naive Bayes, support vector machine, and logistic regression. We will not have covered the k-nearest neighbour (kNN) algorithm by the mid-term, but it is one of the simplest algorithms and can perform very well. You are not required to code the classifier from scratch. Feel free to use machine learning toolboxes such as Weka (in Java), scikit-learn (in Python), shogun (in C++), or stats (in Matlab). I value your creativity in solving the classification problem. You have to justify which classifier or combination of classifiers you use, how you handle issues specific to the competition data set such as the high dimensionality of the data (large number of features), how you do model selection (training-validation split or cross-validation), and how you investigate further to take into account the two extra pieces of information: the test label proportion and the annotation confidence of the labels.
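For the model-selection step, a stratified k-fold split is one common form of cross-validation: each fold keeps the class balance of the training set. The sketch below uses only the Python standard library; the function name `stratified_k_fold`, the choice of 5 folds, and the fixed seed are assumptions for illustration, not part of the assignment.

```python
import random
from typing import List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[int], k: int = 5,
                      seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, validation_indices) pairs in which every
    validation fold keeps the class balance of `labels`."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across the folds.
        for position, index in enumerate(members):
            folds[position % k].append(index)
    splits = []
    for f in range(k):
        validation = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, validation))
    return splits


# Labels shaped like the training set in the description: 550 sunny, 400 not sunny.
train_labels = [1] * 550 + [0] * 400
splits = stratified_k_fold(train_labels, k=5)
```

Each candidate classifier would be trained on the `train` indices and scored on the `validation` indices of every split, with the mean validation score used to compare candidates. scikit-learn offers the same idea as `StratifiedKFold` in `sklearn.model_selection`.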

Details of Research Report

You are expected to write a 2-page report detailing your solution to the Kaggle competition problem. Please use the provided LaTeX or Word template (see the end of this description). Your report should include the following components (you are allowed to combine descriptions #2 and #3, but make sure we can easily identify them).

Matlab and Mathematica

Project ID: #10069316

About the project

2 bids Remote project Active Mar 28, 2016

2 freelancers are bidding an average of £29 / hour for this job

RafNancy

Being an experienced academic writer and well researcher. I am 100% confident I can do this project perfectly. I have already written PhD and Masters Level Paper for UK and US Students and I can easily work on it. I am …

£28 GBP / hour
(1 review)
2.5
garg7

I am a PhD student in computer science, specializing in data mining and machine learning. I am well versed in Matlab and would like to discuss your project.

£30 GBP / hour
(0 reviews)
1.2