Coursework 3: Scene Recognition
This is a group coursework: please work in teams of four people.
Due date: Thursday 16th December 2021, 16:00.
Development data download: training.zip
Testing data download: testing.zip
Handin: 2122/COMP3204/3/
Required files: report.pdf; code.zip; run1.txt; run2.txt; run3.txt
Credit: 20% of overall module mark
The goal of this project is to introduce you to image recognition. Specifically, we will examine the task of scene recognition starting with very simple methods – tiny images and nearest neighbour classification – and then move on to techniques that resemble the state-of-the-art – bags of quantized local features and linear classifiers learned by support vector machines.
This coursework will run following the methodology used in many current scientific benchmarking competitions/evaluations. You will be provided with a set of labelled development images from which you are allowed to develop and tune your classifiers. You will also be provided with a set of unlabelled images for which you will be asked to produce predictions of the correct class. We will score your predictions and there will be a prize for the team producing the best classifier.
You will need to use OpenIMAJ to write software that classifies scenes into one of 15 categories. We want you to implement three different classifiers as described below. You will then need to run each classifier against all the test images and provide a prediction of the class for each image.
You’ll need a good understanding of working with OpenIMAJ; Chapter 12 of the OpenIMAJ tutorial will be a great help.
The training data consists of 100 images for each of the 15 scene classes. These are arranged in directories named according to the class name. The test data consists of 2985 images. All the images are provided in JPEG format. All the images are grey-scale, so you don’t need to consider colour.
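The directory-per-class layout maps directly onto OpenIMAJ’s grouped dataset classes. As a minimal sketch (the paths below are placeholders; point them at wherever you unpack the two archives), the data can be loaded like this:

import org.openimaj.data.dataset.GroupedDataset;
import org.openimaj.data.dataset.VFSGroupDataset;
import org.openimaj.data.dataset.VFSListDataset;
import org.openimaj.image.FImage;
import org.openimaj.image.ImageUtilities;

public class LoadData {
    public static void main(String[] args) throws Exception {
        // Training images, grouped by the directory (i.e. class) they live in.
        GroupedDataset<String, VFSListDataset<FImage>, FImage> training =
                new VFSGroupDataset<FImage>("/path/to/training", ImageUtilities.FIMAGE_READER);

        // The unlabelled test images as a flat list; getID(i) gives the filename.
        VFSListDataset<FImage> testing =
                new VFSListDataset<FImage>("/path/to/testing", ImageUtilities.FIMAGE_READER);

        System.out.println(training.getGroups()); // the 15 class names
        System.out.println(testing.size());       // should be 2985
    }
}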
The key classification performance indicator for this task is accuracy: the proportion of correct classifications out of the total number of predictions (i.e. 2985).
As mentioned above, you need to develop and run three different classifiers. We’ll refer to the application of a classifier to the test data as a “run”.
Run #1: You should develop a simple k-nearest-neighbour classifier using the “tiny image” feature. The “tiny image” feature is one of the simplest possible image representations. One simply crops each image to a square about the centre, and then resizes it to a small, fixed resolution (we recommend 16x16). The pixel values can be packed into a vector by concatenating each image row. It tends to work slightly better if the tiny image is made to have zero mean and unit length. You can choose the optimal k-value for the classifier.
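To make that concrete, here is a minimal sketch of the tiny-image feature written as an OpenIMAJ FeatureExtractor; the class name TinyImageExtractor and the 16x16 resolution are illustrative choices, not requirements:

import org.openimaj.feature.DoubleFV;
import org.openimaj.feature.FeatureExtractor;
import org.openimaj.image.FImage;
import org.openimaj.image.processing.resize.ResizeProcessor;

// "Tiny image" feature: centre-crop to a square, shrink to 16x16,
// flatten row-by-row, then zero-mean and unit-length normalise.
public class TinyImageExtractor implements FeatureExtractor<DoubleFV, FImage> {
    private static final int SIZE = 16;

    @Override
    public DoubleFV extractFeature(FImage image) {
        // Crop the largest centred square, then resize it to SIZE x SIZE
        int dim = Math.min(image.width, image.height);
        FImage square = image.extractROI((image.width - dim) / 2, (image.height - dim) / 2, dim, dim);
        FImage tiny = ResizeProcessor.resample(square, SIZE, SIZE);

        // Concatenate the rows into a single vector
        double[] vector = new double[SIZE * SIZE];
        double mean = 0;
        for (int y = 0; y < SIZE; y++)
            for (int x = 0; x < SIZE; x++)
                mean += (vector[y * SIZE + x] = tiny.pixels[y][x]);
        mean /= vector.length;

        // Zero mean, then scale to unit length
        double norm = 0;
        for (int i = 0; i < vector.length; i++) {
            vector[i] -= mean;
            norm += vector[i] * vector[i];
        }
        norm = Math.sqrt(norm);
        if (norm > 0)
            for (int i = 0; i < vector.length; i++)
                vector[i] /= norm;

        return new DoubleFV(vector);
    }
}

An extractor of this form can then be plugged into OpenIMAJ’s KNNAnnotator (for example, KNNAnnotator.create(new TinyImageExtractor(), DoubleFVComparison.EUCLIDEAN, k)), leaving you free to tune k on the training data.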
Run #2: You should develop a set of linear classifiers (use the LiblinearAnnotator class to automatically create 15 one-vs-all classifiers) using a bag-of-visual-words feature based on fixed-size densely-sampled pixel patches. We recommend that you start with 8x8 patches, sampled every 4 pixels in the x and y directions. A sample of these should be clustered using K-Means to learn a vocabulary (try ~500 clusters to start). You might want to consider mean-centring and normalising each patch before clustering/quantisation. Note: we’re not asking you to use SIFT features here; just take the pixels from the patches, flatten them into a vector, and then use vector quantisation to map each patch to a visual word.
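As a sketch of this pipeline under the parameters above (the class and method names are our own, and 500 clusters with the KD-tree ensemble K-Means is just the suggested starting point):

import java.util.ArrayList;
import java.util.List;
import org.openimaj.image.FImage;
import org.openimaj.ml.clustering.FloatCentroidsResult;
import org.openimaj.ml.clustering.assignment.HardAssigner;
import org.openimaj.ml.clustering.kmeans.FloatKMeans;
import org.openimaj.util.pair.IntFloatPair;

public class PatchVocabulary {
    // Densely sample patchSize x patchSize patches every `step` pixels and
    // flatten each into a mean-centred, unit-length vector.
    public static List<float[]> extractPatches(FImage image, int patchSize, int step) {
        List<float[]> patches = new ArrayList<float[]>();
        for (int y = 0; y + patchSize <= image.height; y += step) {
            for (int x = 0; x + patchSize <= image.width; x += step) {
                float[] v = new float[patchSize * patchSize];
                float mean = 0;
                int i = 0;
                for (int yy = 0; yy < patchSize; yy++)
                    for (int xx = 0; xx < patchSize; xx++) {
                        v[i] = image.pixels[y + yy][x + xx];
                        mean += v[i++];
                    }
                mean /= v.length;

                float norm = 0;
                for (int j = 0; j < v.length; j++) {
                    v[j] -= mean;               // mean-centre
                    norm += v[j] * v[j];
                }
                norm = (float) Math.sqrt(norm);
                if (norm > 0)
                    for (int j = 0; j < v.length; j++)
                        v[j] /= norm;           // unit length
                patches.add(v);
            }
        }
        return patches;
    }

    // Learn a 500-word vocabulary from a sample of training patches.
    public static HardAssigner<float[], float[], IntFloatPair> trainVocabulary(List<float[]> sample) {
        FloatKMeans km = FloatKMeans.createKDTreeEnsemble(500);
        FloatCentroidsResult centroids = km.cluster(sample.toArray(new float[sample.size()][]));
        return centroids.defaultHardAssigner();
    }
}

The resulting assigner can back a BagOfVisualWords feature whose histograms feed the LiblinearAnnotator, much as Chapter 12 of the tutorial does with dense SIFT.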
Run #3: You should try to develop the best classifier you can! You can choose whatever feature, encoding and classifier you like. Potential features: the GIST feature; Dense SIFT; Dense SIFT in a Gaussian Pyramid; Dense SIFT with spatial pooling (i.e. PHOW as in the OpenIMAJ tutorial), etc. Potential classifiers: Naive Bayes; a non-linear SVM (perhaps using a linear classifier with a Homogeneous Kernel Map), … Note: you don’t have to use OpenIMAJ for this run if you don’t want to (for example, you might want to try a deep learning framework instead).
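If you take the Homogeneous Kernel Map route, the pattern is to wrap your feature extractor and then train a linear classifier on the transformed features. A sketch (here `base` stands in for whatever FImage-to-DoubleFV extractor you have built; the method name is ours):

import org.openimaj.feature.DoubleFV;
import org.openimaj.feature.FeatureExtractor;
import org.openimaj.image.FImage;
import org.openimaj.ml.kernel.HomogeneousKernelMap;

public class KernelMapExample {
    // Wrapping an extractor like this lets a linear SVM approximate a
    // Chi2-kernel SVM; `base` is a placeholder for your own extractor.
    public static FeatureExtractor<DoubleFV, FImage> kernelise(FeatureExtractor<DoubleFV, FImage> base) {
        HomogeneousKernelMap map = new HomogeneousKernelMap(
                HomogeneousKernelMap.KernelType.Chi2,
                HomogeneousKernelMap.WindowType.Rectangular);
        return map.createWrappedExtractor(base);
    }
}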
The predictions for each run must be written to a text file named runX.txt (where X is the run number) with the following format:
<image_name> <predicted_class>
<image_name> <predicted_class>
<image_name> <predicted_class>
...
For example:
0.jpg tallbuilding
1.jpg forest
2.jpg mountain
3.jpg store
4.jpg store
5.jpg bedroom
...
Each image can only appear once, and every test image must be present. Be aware that the test images are not numbered continuously!
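Writing the file needs only a few lines of Java. The sketch below is purely illustrative (the class name and hard-coded predictions are made up); the important thing is one "name class" pair per line for every test image:

import java.io.PrintWriter;
import java.util.Map;
import java.util.TreeMap;

public class WriteRun {
    public static void main(String[] args) throws Exception {
        // Hypothetical predictions: test image filename -> predicted class.
        // In practice this map would come from your classifier.
        Map<String, String> predictions = new TreeMap<String, String>();
        predictions.put("0.jpg", "tallbuilding");
        predictions.put("1.jpg", "forest");

        try (PrintWriter out = new PrintWriter("run1.txt", "UTF-8")) {
            for (Map.Entry<String, String> e : predictions.entrySet())
                out.println(e.getKey() + " " + e.getValue());
        }
    }
}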
The report must be no longer than 4 sides of A4, and must be submitted electronically as a PDF. The report must include:
You need to submit the following items to ECS Handin: report.pdf, code.zip, run1.txt, run2.txt and run3.txt.
Marks will be awarded for:
Marks will not be based on the actual performance of your approach (although you can expect to lose marks if runs 1 and 2 are way off our expectations or you fail to follow the submission instructions). There will however be a prize for the team with the best performing run 3.
Standard ECS late submission penalties apply.
Individual feedback will be given to each team covering the above points. We will also give overall feedback on the approaches taken in class when we announce the winner!
If you have any problems/questions then email or speak to Jon in his office.