The first one, "casi2002-groundtruth.arff" (102 KB), is a ground truth of
the CASI image. These data were acquired by a human expert using a
computer program, according to his own knowledge of the ground (the
saltmarsh zone of Venice), so they should be quite accurate. However,
this process is costly, which is why the dataset is small.
The second one, "casi2002-sam.arff" (17 951 KB), is a full, automatic
classification of the CASI image produced with the Spectral Angle Mapper
(SAM). This automatic classification yields a much larger dataset, but
you should not treat it as a relevant or accurate reference for
classification. You can, however, use it for validation by comparing
your own classification with this image. Note that some pixels are
unclassified (assigned to the class 'misc'), and that the same pixel may
be assigned to different classes in the Ground Truth and in the SAM
classification.
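As a rough illustration of that validation step, the sketch below measures how often your labels agree with the SAM labels while skipping the pixels SAM left in 'misc'. It assumes both label lists cover the same pixels in the same order; the function name `sam_agreement` and the class names in the example are hypothetical, not taken from the datasets.

```python
from typing import Sequence


def sam_agreement(own: Sequence[str], sam: Sequence[str], ignore: str = "misc") -> float:
    """Fraction of pixels where our label matches the SAM label.

    Assumes both sequences list the same pixels in the same order.
    Pixels that SAM left unclassified (label ``ignore``) are skipped.
    """
    if len(own) != len(sam):
        raise ValueError("label sequences must cover the same pixels")
    compared = 0
    matches = 0
    for ours, theirs in zip(own, sam):
        if theirs == ignore:
            continue  # SAM gave no class for this pixel, so it cannot be compared
        compared += 1
        if ours == theirs:
            matches += 1
    if compared == 0:
        raise ValueError("no classified pixels to compare")
    return matches / compared


if __name__ == "__main__":
    # Hypothetical class names, for illustration only.
    own = ["water", "marsh", "marsh", "soil", "water"]
    sam = ["water", "misc", "soil", "soil", "water"]
    print(sam_agreement(own, sam))  # 0.75
```

Since SAM is itself not fully accurate, a low agreement score does not necessarily mean your classifier is wrong; it only tells you where the two classifications differ.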
You can split these datasets into training and test sets as you like
(look for a good trade-off between accuracy and running time), but you
must first randomize the order of the instances. Otherwise, because the
pixels are stored in the original order of the image, some regions of
the image will be missing from one of the sets. In KEEL this is done
automatically when splitting a dataset, and in Weka you can do it with
weka.filters.unsupervised.instance.Resample.
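Outside KEEL and Weka, the same shuffle-then-split idea can be sketched in a few lines. This is a minimal sketch, not the procedure used by either tool: it assumes the instances have already been loaded into a Python list, and the helper name `shuffled_split` and the two-region example data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffled_split(rows: Sequence[T], train_fraction: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of ``rows`` and cut it into (train, test)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's image-ordered list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical image-ordered data: the first 50 pixels lie in region A,
    # the last 50 in region B.
    pixels = [("A", i) for i in range(50)] + [("B", i) for i in range(50)]

    naive_train = pixels[:50]  # cutting without shuffling first
    train, test = shuffled_split(pixels, 0.5, seed=42)

    print({region for region, _ in naive_train})  # {'A'}: region B never seen in training
    print(len(train), len(test))                  # 50 50
    print(sorted({region for region, _ in train}))
```

Fixing the seed keeps experiments repeatable; changing it gives a different random split, which is useful if you want to average results over several splits.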