Why Support Vector Machines for Sentiment Analysis?

Harvey
2 min read · May 16, 2022

Example Language: Python

As defined by Jakkula (2006), a Support Vector Machine (SVM) is a classification and regression prediction tool that uses ML theory to maximize predictive accuracy while automatically avoiding overfitting to the data. SVMs can be described as systems that use a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory.

An SVM works by taking all data points and defining a ‘hyperplane’ that divides the classes. This hyperplane is termed the decision boundary. Observations can then be classified based on which side of the decision boundary they fall.
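A minimal sketch of this idea, using scikit-learn and a hypothetical two-class toy data set: the signed distance returned by `decision_function` tells you which side of the learned hyperplane a point lies on, and the sign determines the predicted class.

```python
from sklearn import svm

# Hypothetical toy data: two classes separated in feature space.
x = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = svm.SVC(kernel='linear').fit(x, y)

# decision_function gives the signed distance to the hyperplane;
# negative values fall on the class-0 side, positive on the class-1 side.
print(clf.decision_function([[0.5, 0.5], [5, 4]]))
print(clf.predict([[0.5, 0.5], [5, 4]]))
```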

Advantages:

  • Improved performance when there is a clear margin of separation between classes.
  • Effective in high dimensional spaces.

Disadvantages:

  • Not well suited to large data sets.
  • Poor performance when the data set has a considerable amount of ‘noise’.
  • Overfitting can occur if the number of observations is less than the number of features.

The SVM model of classification works well for sentiment analysis, and text categorization in general, because it handles a high-dimensional input space. SVMs use overfitting protection that does not necessarily depend on the number of features, so they have the potential to handle these large feature spaces. Competing classifiers often assume that most features are irrelevant; this is their pitfall, as the aggressive feature selection it encourages tends to result in a loss of information. Competitors such as naïve Bayes and logistic regression are ‘additive’ algorithms, meaning they are better suited to dense concepts than to sparse document vectors. Finally, the sentiment analysis problem is linearly separable, and the theory behind SVMs is about finding linear separators (Joachims 1998), deeming them highly suitable.
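To make the sparse, high-dimensional setting concrete, here is a hedged sketch of a sentiment pipeline: TF-IDF vectorization produces sparse document vectors, and a linear SVM is trained on top of them. The tiny corpus and labels below are hypothetical, purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical labelled corpus.
train_docs = [
    "great film, really enjoyed it",
    "wonderful acting and a lovely story",
    "terrible plot, a complete waste of time",
    "awful film, really boring",
]
train_labels = ["positive", "positive", "negative", "negative"]

# TF-IDF yields a high-dimensional, sparse feature space --
# exactly the setting in which SVMs are effective.
vectorizer = TfidfVectorizer()
x_train = vectorizer.fit_transform(train_docs)

model = LinearSVC().fit(x_train, train_labels)

# Classify an unseen document.
x_test = vectorizer.transform(["wonderful lovely story"])
print(model.predict(x_test))
```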

The SVM’s core hyperparameter is the ‘kernel’. The kernel is responsible for mapping the observations into some feature space, ideally making the observations more linearly separable after this transformation.

A ‘linear’ kernel is best suited to most text classification problems: it thrives on large, sparse data vectors, text classification problems tend to be linearly separable, and it takes less training time than other kernel functions.

Linear kernel function: K(xᵢ, xⱼ) = xᵢ · xⱼ

Implementation:

from sklearn import svm

# x: feature matrix, y: class labels
linearClassifier = svm.SVC(kernel='linear').fit(x, y)

Bibliography:

Jakkula, V. 2006. Tutorial on Support Vector Machine (SVM). Available At: https://course.ccs.neu.edu/cs5100f11/resources/jakkula.pdf [Accessed: 7th March 2022].

Joachims, T. 1998. Text categorization with Support Vector Machines: Learning with many relevant features. Available At: https://link.springer.com/chapter/10.1007/BFb0026683 [Accessed: 8th March 2022].
