
# Getting Friendly with SVM Algorithm

In this blog I will discuss the SVM algorithm along with frequently asked interview questions about it. If you are planning to attend interviews in the coming days, this article can be a handy resource for you.

Introduction:

Support Vector Machine (SVM) is a very powerful and flexible machine learning model, capable of performing linear or nonlinear classification, regression, and even outlier detection. It is one of the most popular models in machine learning, and anyone interested in ML should have it in their toolbox. SVMs are particularly well suited for classification of complex but small- or medium-sized datasets.

Q1. What is the mathematics behind the SVM algorithm?

SVM is a discriminative classifier formally defined by a separating hyperplane. In simple words, given labeled training data, the algorithm outputs an optimal hyperplane which categorizes new examples. Let's see the SVM hyperplane graphically.

In the graph, there clearly exist two classes: one class is represented with blue circles and the other with red squares. Now we have to find a line that separates the two classes. As you can see, there are many lines that separate the two classes, so which is the optimal line in this case?

Basically, a line is bad if it passes too close to the points, because it will be sensitive to noise and will not generalize correctly. Therefore, our goal should be to find the line passing as far as possible from all points, i.e., the hyperplane that gives the largest minimum distance to the training examples. See the graph below representing the optimum hyperplane.

So the idea behind SVM is to find a hyperplane that best separates the features into different classes.
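As a minimal sketch of this idea (assuming scikit-learn is available, and using a hypothetical toy dataset), we can fit a linear SVM and read off the learned hyperplane w·x + b = 0, whose margin width is 2/||w||:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy 2-D dataset: two linearly separable classes
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Linear SVM: finds the maximum-margin separating hyperplane w.x + b = 0
clf = SVC(kernel="linear", C=1e6)  # a very large C approximates a hard margin
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))
print("support vectors:\n", clf.support_vectors_)
print("predictions:", clf.predict([[2, 2], [7, 7]]))  # one point near each cluster
```

The `support_vectors_` attribute shows that only the points closest to the decision boundary determine the hyperplane; the rest of the training data could be removed without changing the solution.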

Q2. What's the "kernel trick" and how is it useful?

The kernel trick involves kernel functions that enable learning in higher-dimensional spaces without explicitly calculating the coordinates of points in that space: instead, kernel functions compute the inner products between the images of all pairs of data points in a feature space. The kernel trick thus lets us effectively run algorithms in a high-dimensional space while working only with the lower-dimensional data.
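To make this concrete, here is a small numerical check (a sketch with hypothetical inputs): for 2-D points, the degree-2 polynomial kernel (x·z)² equals the ordinary inner product of an explicit 3-D feature map φ(x) = (x₁², √2·x₁x₂, x₂²), yet it never computes those 3-D coordinates:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for a 2-D input:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, z):
    # Degree-2 polynomial kernel, computed entirely in the ORIGINAL 2-D space
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # inner product in the 3-D feature space
implicit = poly_kernel(x, z)       # same value, no 3-D coordinates computed

print(explicit, implicit)  # both equal (1*3 + 2*4)^2 = 121
```

This is exactly what the SVM exploits: its optimization and prediction depend on the data only through inner products, so swapping in a kernel implicitly lifts the problem into the higher-dimensional space at no extra cost.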

Q3. What is Soft and Hard Margin Classification?

In hard margin classification we assume the data is linearly separable and we strictly require all instances to be off the "street" and on the correct side of the line. As a result, a hard margin SVM often overfits a particular dataset and cannot generalize; even in the case of a linearly separable dataset, outliers well within the boundaries can strongly influence the margin.
In soft margin classification we avoid these issues by using a more flexible margin and finding a good balance between keeping the street as large as possible and limiting the margin violations. We use a formulation that allows skipping a few outliers and classifying almost linearly separable points by introducing slack variables (ξ).
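The trade-off can be seen directly through scikit-learn's `C` parameter (a sketch on a hypothetical dataset where one point squeezes the margin): a small `C` gives a soft margin that keeps the street wide at the cost of a violation, while a very large `C` approximates a hard margin:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical separable data; the class-1 point at (3, 3) squeezes the margin
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6], [3, 3]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 1])

# Small C -> soft margin: tolerate violations (slack), keep the street wide
soft = SVC(kernel="linear", C=0.1).fit(X, y)
# Very large C -> approximates a hard margin: every point must respect the margin
hard = SVC(kernel="linear", C=1e6).fit(X, y)

# Margin (street) width is 2 / ||w||
margin_soft = 2 / np.linalg.norm(soft.coef_)
margin_hard = 2 / np.linalg.norm(hard.coef_)
print("soft-margin street width: %.3f" % margin_soft)
print("hard-margin street width: %.3f" % margin_hard)
```

The soft-margin street comes out wider because the objective is allowed to trade a little slack on the awkward point for a larger margin on everything else.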

Q4. How can I avoid overfitting in SVM, and what is the purpose of the slack variable in SVM?

In general there is a variety of reasons for model overfitting, so there is no single master solution. But in SVM we can avoid overfitting by properly tuning the margin-controlling regularization parameter "λ" and the kernel's hyperparameters. Equivalently, we can try reducing the misclassification penalty, usually known as "C" in SVM, which permits more margin violations and yields a simpler decision boundary.
The slack variable in SVM allows a trade-off between learning "simple" functions and fitting the data exactly. Without slack variables the SVM would be forced to always fit the data exactly and would often overfit.
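In practice, tuning is usually done with cross-validation. A minimal sketch (assuming scikit-learn, on a synthetic stand-in dataset) that searches over `C` and the RBF kernel's `gamma`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic dataset as a hypothetical stand-in for real data
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Cross-validate over C (misclassification penalty) and the RBF kernel's gamma;
# smaller C and smaller gamma both act as regularizers against overfitting
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best params:", search.best_params_)
print("cross-validated accuracy: %.3f" % search.best_score_)
```

Selecting hyperparameters by cross-validated score, rather than training accuracy, is what keeps the tuned model from simply memorizing the training set.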

I hope this document gives you a good idea of the SVM algorithm and gets you prepared for interviews. Keep learning, and use the comment section for any queries or doubts.

February 24, 2020