In the first part of this series we found good hyperparameters for a single linear SVM classifier. In part 2, we will try another technique to improve the performance of the system: ensemble learning.
So far, we have reached 95% accuracy by tweaking the hyperparameters and the training corpuses, but the F1 score is still around 70% with the full gold standard, which leaves room for improvement. There are also situations where precision should be nearly perfect (because false positives are not acceptable) or where recall should be optimized.
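To see how a high accuracy can coexist with a much lower F1 score, here is a minimal sketch in Python. The confusion-matrix counts are hypothetical and chosen only to reproduce the same orders of magnitude on an unbalanced set; they are not the results measured in this series, and the function name `classification_metrics` is an assumption made for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Precision: how many of the predicted positives are real positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: how many of the real positives were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 100 positive documents and 1100 negative ones.
metrics = classification_metrics(tp=70, fp=30, fn=30, tn=1070)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# accuracy: 0.95
# precision: 0.70
# recall: 0.70
# f1: 0.70
```

Because the negative documents dominate, the many true negatives push accuracy up, while F1 ignores them and only reflects how well the positive class is handled.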
Here we will try to improve this situation by using ensemble learning. Ensemble learning uses multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. In our examples, each model will have one vote, and every vote will carry the same weight. We will use five different strategies to create the models that will belong to the ensemble:
- Bootstrap aggregating (bagging)
- Asymmetric bagging [1]
- Random subspace method (feature bagging)
- Asymmetric bagging + random subspace method (ABRS) [1]
- Bootstrap aggregating + random subspace method (BRS)
Which strategy to use depends on questions such as: are the positive and negative training documents unbalanced? How many features does the model have? Let’s introduce each of these strategies.
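As a rough preview, the sampling step behind the five strategies could be sketched as follows in Python. This is only an illustration under the usual definitions: bagging draws training documents with replacement, asymmetric bagging keeps every positive document and resamples only the negatives, and the random subspace method picks a random subset of features. All function names, the `n_negatives` default, and the toy data are assumptions, not the code used in this series.

```python
import random


def bagging_sample(documents, rng):
    """Bootstrap: draw len(documents) documents with replacement."""
    return [rng.choice(documents) for _ in documents]


def asymmetric_bagging_sample(positives, negatives, rng, n_negatives=None):
    """Keep all positives; bootstrap only the negatives.

    Assumption: by default as many negatives as positives are drawn.
    """
    if n_negatives is None:
        n_negatives = len(positives)
    sampled = [rng.choice(negatives) for _ in range(n_negatives)]
    return list(positives), sampled


def random_subspace(features, n_features, rng):
    """Feature bagging: pick n_features distinct features at random."""
    return rng.sample(features, n_features)


def brs_member(documents, features, n_features, rng):
    """BRS: bootstrap the documents AND restrict the features."""
    return (bagging_sample(documents, rng),
            random_subspace(features, n_features, rng))


def abrs_member(positives, negatives, features, n_features, rng):
    """ABRS: asymmetric bagging AND a random feature subset."""
    pos, neg = asymmetric_bagging_sample(positives, negatives, rng)
    return pos, neg, random_subspace(features, n_features, rng)


# Toy, unbalanced corpus (hypothetical identifiers).
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
positives = ["pos-1", "pos-2", "pos-3"]
negatives = [f"neg-{i}" for i in range(1, 21)]
features = ["feat-a", "feat-b", "feat-c", "feat-d", "feat-e"]

pos, neg, feats = abrs_member(positives, negatives, features, 3, rng)
print(len(pos), len(neg), len(feats))  # 3 3 3
```

Each call produces the training material for one member of the ensemble; calling it repeatedly yields as many differently trained models as needed.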
Note that in this article I am only creating ensembles of linear SVM learners. An ensemble can be composed of several different kinds of learners, such as SVMs with non-linear kernels, decision trees, etc. However, to simplify this article, we will stick to a single kind of learner, the linear SVM, trained with multiple different training corpuses and features.
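The equal-weight vote described above can be sketched in a few lines of Python. The "models" here are hypothetical stand-ins (simple keyword checks) rather than trained linear SVMs, and the names `majority_vote` and `ensemble_predict` are assumptions made for illustration; the only point is to show how one vote per model is combined.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label that received the most votes (equal weights)."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(models, document):
    """Ask every model for its vote, then keep the majority label."""
    return majority_vote([model(document) for model in models])


# Hypothetical stand-ins for trained classifiers: each one returns
# True when it thinks the document belongs to the domain.
models = [
    lambda doc: "music" in doc,
    lambda doc: "guitar" in doc,
    lambda doc: "band" in doc,
]

print(ensemble_predict(models, "a band playing music"))  # True
```

Using an odd number of models avoids ties in a binary vote; with an even number, a tie-breaking rule would have to be chosen explicitly.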