Input Datasets

This project uses four standardized text classification datasets to train the neural nets and report the efficiency of the algorithms.
The idea is to evaluate both architectures on different datasets. Summary statistics for each dataset are shown in the table below; the columns from Mean through Major Class describe the distribution of class sizes.
The optimization runs as a series of trials, each choosing a set of parameters that is applied to each dataset. There are two versions of each dataset:
TFIDF and distance-based meta-features (MF).

Dataset   Size    #Features  #Classes  Mean  Minor Class  1st Quartile  Median  3rd Quartile  Major Class
20NG      18766   61050      20        938   627          952           978     988           998
4UNI      8274    40195      7         1182  13           343           929     1382          3757
REUTERS   13327   19590      90        148   2            8             29      91            3964
ACM       24897   59990      11        2263  63           761           2041    3278          6562
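
As an illustration of how class-size statistics like those in the table can be derived from a label list, here is a minimal sketch using only the Python standard library. The function name `class_size_summary` and the quartile method ("inclusive" linear interpolation) are assumptions made for this example; the project may have computed its quartiles differently, so the values need not match the table exactly.

```python
from collections import Counter
import statistics


def class_size_summary(labels):
    """Summarize the distribution of class sizes for one dataset.

    `labels` is a sequence with one class label per sample.
    """
    # Number of samples in each class, sorted from smallest to largest.
    sizes = sorted(Counter(labels).values())
    if len(sizes) >= 2:
        # Assumption: quartiles by inclusive linear interpolation.
        q1, median, q3 = statistics.quantiles(sizes, n=4, method="inclusive")
    else:
        # With a single class, every quartile equals that class size.
        q1 = median = q3 = sizes[0]
    return {
        "size": len(labels),
        "classes": len(sizes),
        "mean": statistics.mean(sizes),
        "minor_class": sizes[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "major_class": sizes[-1],
    }
```

Calling `class_size_summary(y)` on the label vector `y` of a dataset returns one row's worth of statistics as a dictionary.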

The datasets we use to create all visualizations are derived from the optimization process (with 5-fold cross-validation)
of each set of parameters applied to each dataset version. During the optimization we record every trial (we set the number of trials to 80), along with
the set of parameters that was tested, the time, and the loss.
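
The shape of such a trial log can be sketched as follows. This is not the project's optimizer: plain random search stands in for whatever search strategy was actually used, and `run_trials`, `k_fold_indices`, `objective`, and `sample_params` are hypothetical names introduced only for this example. The sketch shows one record per trial holding the parameters, the mean loss over 5 folds, and the elapsed time.

```python
import random
import time


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Split sample indices into n_folds shuffled, near-equal validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def run_trials(objective, sample_params, n_samples, n_trials=80, n_folds=5, seed=0):
    """Run a random search and return one record per trial.

    objective(params, train_idx, val_idx) -> validation loss (hypothetical signature)
    sample_params(rng) -> dict of parameters for one trial (hypothetical signature)
    """
    rng = random.Random(seed)
    folds = k_fold_indices(n_samples, n_folds, seed)
    records = []
    for trial in range(n_trials):
        params = sample_params(rng)
        start = time.perf_counter()
        fold_losses = []
        for k, val_idx in enumerate(folds):
            # Train on every fold except the k-th, validate on the k-th.
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            fold_losses.append(objective(params, train_idx, val_idx))
        records.append({
            "trial": trial,
            "params": params,
            "loss": sum(fold_losses) / n_folds,  # mean loss across folds
            "time": time.perf_counter() - start,  # seconds spent on this trial
        })
    return records
```

In this sketch, `objective` would train one architecture on `train_idx` of a given dataset version (TFIDF or MF) and return its loss on `val_idx`; the returned list of records is the kind of table the visualizations can be built from.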