6 Ways to Avoid Overfitting with Machine Learning Algorithms

The process of developing a machine learning model can be challenging and time-consuming. As such, it is paramount for data scientists to develop models that are stable and accurate. 

While not all models will work well on new data, there are steps that you can take to minimize overfitting with your chosen algorithm.

1 – Understand the Bias-Variance Tradeoff

A common way in which machine learning algorithms learn complex relationships between the input features in your dataset is by fitting a function through these points. A model that is too flexible will fit the noise in the training set along with the signal, producing an over-fitted model: low bias but high variance, and poor performance on new data. To rein in this complexity, some kind of regularization is required. On the other hand, a model that is too simple (whether by design or because of constraints on working memory or compute power) may fail to capture significant structures in your dataset, resulting in an under-fitted model with high bias. Applying even more regularization will not correct this issue, since the structures being missed are indeed present in your training set.

As a rule of thumb, tune the L2 penalty (the ridge regression coefficient) and watch your accuracy on the validation or test set: if accuracy keeps improving as you reduce the penalty, the model is still over-regularized; once accuracy peaks and begins to drop, you have likely found a good balance between bias and variance.
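
To make this concrete, here is a minimal sketch of that tuning loop, assuming scikit-learn and a synthetic regression dataset (swap in your own X and y); it scans the ridge penalty alpha from strong to weak and reports the cross-validated R² score at each setting:

```python
# Minimal sketch: scanning the L2 (ridge) penalty and watching
# cross-validated performance. Assumes scikit-learn; X and y are synthetic.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Scan from strong regularization (high bias) to weak (high variance).
for alpha in [100.0, 10.0, 1.0, 0.1, 0.01]:
    score = cross_val_score(Ridge(alpha=alpha), X, y, cv=5).mean()
    print(f"alpha={alpha:>6}: mean CV R^2 = {score:.3f}")
```

The alpha where the cross-validated score peaks is a reasonable bias-variance sweet spot.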


2 – Use Regularization for Your Algorithm

While it may be easy to interpret mean squared error or R² scores as a quantitative measure of how good a model is, this can be deceiving. For example, in the case of regression, applying too much regularization leads to underfitting and poor performance on new data (data not used for training). On the other hand, when you don't apply enough regularization, your model can become overly complex and prone to overfitting.

Therefore, it is important that you read the fine print for each algorithm and parameterize it accordingly. From cross-validation scores, learning curves, or diagnostic plots such as ROC curves and lift charts, select an appropriate measure of what makes a "good" model. This will give you some guidance on whether loosening the regularization, adding parameters, or adding observations to your dataset would lead to better results.
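
As a rough illustration, the snippet below (assuming scikit-learn, where C is the inverse of the regularization strength) compares L1 and L2 penalties across several strengths, using cross-validated ROC AUC as the measure of a "good" model:

```python
# Sketch: parameterizing the regularization of a logistic regression.
# Assumes scikit-learn; the liblinear solver supports both penalties.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=30, n_informative=10,
                           random_state=0)

for penalty in ["l1", "l2"]:
    for C in [0.01, 0.1, 1.0, 10.0]:
        clf = LogisticRegression(penalty=penalty, C=C, solver="liblinear")
        auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
        print(f"penalty={penalty}, C={C:>5}: mean CV ROC AUC = {auc:.3f}")
```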

3 – Try Different Optimization Algorithms

The optimization algorithm that you choose can have a large impact on the final model. While plain gradient descent may work well for linear and logistic regression, it can often be outperformed by more advanced algorithms such as limited-memory BFGS (L-BFGS) or conjugate gradient (CG). Luckily, nowadays there are plenty of packages that let you quickly compare different algorithms using cross-validation scores. Depending on your specific problem, you might even be able to directly tune the learning-rate hyperparameter during optimization.
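
For example, scikit-learn exposes several optimizers for the same logistic regression model, so a quick comparison is just a loop. A sketch, assuming scikit-learn (the solver names are scikit-learn's, and features are standardized so the iterative solvers converge):

```python
# Sketch: comparing optimization algorithms for one model family.
# Assumes scikit-learn; scaling helps the sag/saga solvers converge.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for solver in ["lbfgs", "newton-cg", "sag", "saga"]:
    clf = make_pipeline(StandardScaler(),
                        LogisticRegression(solver=solver, max_iter=1000))
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"solver={solver:>9}: mean CV accuracy = {acc:.3f}")
```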

4 – Try Different Loss Functions and Regularization Objectives

When performing classification, it is important to try out different loss functions or regularization objectives depending on whether you most want to avoid false positives or false negatives. For example, in multi-class classification, combining class-balanced binary losses with a one-vs-all (OvA) decomposition gives you more control over the tradeoff between true positives and false positives.
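
As a sketch of the idea (assuming a recent scikit-learn; the loss names below follow its SGDClassifier, and class_weight="balanced" is one way to shift the false-positive/false-negative tradeoff on imbalanced data):

```python
# Sketch: trying different loss functions and class weights.
# Assumes scikit-learn >= 1.1 (where the log loss is named "log_loss").
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A 90/10 imbalanced binary problem.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

for loss in ["hinge", "log_loss", "modified_huber"]:
    for weight in [None, "balanced"]:
        clf = make_pipeline(StandardScaler(),
                            SGDClassifier(loss=loss, class_weight=weight,
                                          random_state=0))
        f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
        print(f"loss={loss:>14}, class_weight={weight}: mean CV F1 = {f1:.3f}")
```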


5 – Train Different Models at the Same Time

Once you have found a good training set and model architecture, try running different kinds of models on your data at once. As an example, for text classification it might be interesting to use both stacked sparse autoencoders with word embeddings and deep belief networks or deep neural networks. Also, consider trying out different input features, even ones that may not seem necessary. Note that you should not simply train and compare every algorithm on the same split, or your measures of goodness will be biased toward whichever model happens to fit that split best. For many problems, however, you can train each algorithm on a fraction of your dataset that has been randomly sampled without replacement, as in the sketch below.
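
A minimal version of that workflow might look like the following (assuming scikit-learn; the three model families and the 50% subsample size are arbitrary choices for illustration):

```python
# Sketch: training several model families side by side, each on its own
# subsample drawn without replacement. Assumes scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "forest": RandomForestClassifier(random_state=0),
}

rng = np.random.default_rng(0)
for name, model in models.items():
    # Each model trains on its own half of the data, sampled without replacement.
    idx = rng.choice(len(X_tr), size=len(X_tr) // 2, replace=False)
    model.fit(X_tr[idx], y_tr[idx])
    print(f"{name:>7}: held-out accuracy = {model.score(X_te, y_te):.3f}")
```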

6 – Try Different Algorithms and Regularization Objectives Using the Feature Importance Metric

One of the most common tools used to assess which features are important in a regression or classification problem is called feature importance. It answers the question: "How much would you have to change this feature before the model's performance on an out-of-sample set starts declining?" Many algorithms do not provide such information directly, so it will be up to you as a practitioner to decide how important each independent variable is. However, if your algorithm does expose its regularized loss, cross-validation scores can be used to derive a feature importance metric.
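
One standard, model-agnostic way to compute such a metric is permutation importance, which scrambles one feature at a time and measures the drop in out-of-sample score. A sketch, assuming scikit-learn's permutation_importance helper:

```python
# Sketch: permutation feature importance on a held-out set.
# Assumes scikit-learn; any fitted estimator with a score method works.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=10,
                                random_state=0)

for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: mean importance = {imp:.3f}")
```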


Conclusion:

Machine learning is a vast field, and there are many pitfalls to avoid on your quest for greatness. The tips above on choosing algorithms, regularization objectives, and hyperparameters should serve as a good starting point on your journey of discovery, but remember: never stop learning!
