One of the most important questions you will ask yourself about any Machine Learning project is this:
Does my model fit?
This question is being drilled into me more and more as I go deeper on my Machine Learning journey.
So how do you go about answering that question?
Well, firstly, let’s dispel any hope you had of ever answering yes: no, your model will never ‘fit’.
In fact, the statistician George Box made this point back in 1976, in an aphorism that is usually quoted as:
"All models are wrong, but some are useful"
So let’s revise our question:
How useful is my model?
Great. But how can we answer that question with any degree of certainty?
One of the most effective tools in your arsenal is Data Visualisation. Once you have trained your model, you can visualise it by plotting its predictions on a graph alongside the training data. Of course, the kind of visualisation you use will depend on the type of model you trained.
Once you have your graph, you can see much more clearly how your model performs on your data. It’s then possible to go back to your model and tune its hyperparameters (like the knobs on an old-fashioned equalizer, hyperparameters are for fine-tuning your model).
You can do this iteratively until you’re satisfied that your model is as useful as you think it should be.
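To make that loop concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the noisy sine-wave data is made up, the model is a plain polynomial fitted with NumPy, and the polynomial degree stands in for whatever hyperparameter your own model has. The function names are hypothetical too.

```python
import numpy as np


def make_data(n=40, seed=0):
    """Hypothetical training data: a sine curve plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0.0, 6.0, n))
    y = np.sin(x) + rng.normal(0.0, 0.2, n)
    return x, y


def fit_and_predict(x, y, degree, x_new):
    """Fit a polynomial of the given degree to (x, y) and predict at x_new."""
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x_new)


def plot_fits(x, y, degrees=(1, 3, 9)):
    """Draw the training points and one prediction curve per degree."""
    # Imported here so the fitting code above works without matplotlib.
    import matplotlib.pyplot as plt

    grid = np.linspace(x.min(), x.max(), 200)
    fig, ax = plt.subplots()
    ax.scatter(x, y, color="black", label="training data")
    for degree in degrees:
        ax.plot(grid, fit_and_predict(x, y, degree, grid),
                label=f"degree {degree}")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    return fig
```

Calling `plot_fits(*make_data()).savefig("fits.png")` writes the graph to a file. Looking at the curves side by side, you can judge which setting of the degree follows the data without chasing the noise, then adjust and plot again.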
Data Visualisation is a crucial feedback loop
Using this technique is a cornerstone of good model making.
It’s an obvious point, but the more you practice and hone your skills in Data Visualisation, the more the quality of your models will improve.
There are many courses dedicated to the art of Data Visualisation. In fact, freeCodeCamp has published an excellent article that evaluates and ranks several popular courses.
Data Visualisation is not only useful once you’ve created your model, though. It should be the first tool you reach for when starting a project.
You can use visualisations to see where the natural divisions lie in your data. It’s even possible to build visualisations for data with many dimensions; it’s up to you to construct something that is capable of revealing correlations.
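As one possible way of looking at several dimensions at once, the sketch below draws a correlation heatmap. It is only an illustration under stated assumptions: the four-column dataset is invented, the column names and function names are hypothetical, and Pearson correlation is just one of many ways to look for relationships.

```python
import numpy as np


def make_features(n=200, seed=1):
    """Hypothetical dataset with four columns, some of them related."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n)
    b = 2.0 * a + rng.normal(scale=0.5, size=n)  # strongly related to a
    c = rng.normal(size=n)                       # unrelated to a and b
    d = -c + rng.normal(scale=2.0, size=n)       # weakly, negatively related to c
    return np.column_stack([a, b, c, d]), ["a", "b", "c", "d"]


def plot_correlations(data, names):
    """Show the pairwise Pearson correlations of the columns as a heatmap."""
    # Imported here so the data code above works without matplotlib.
    import matplotlib.pyplot as plt

    corr = np.corrcoef(data, rowvar=False)  # columns are the variables
    fig, ax = plt.subplots()
    image = ax.imshow(corr, cmap="coolwarm", vmin=-1.0, vmax=1.0)
    ax.set_xticks(range(len(names)))
    ax.set_xticklabels(names)
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    fig.colorbar(image, ax=ax, label="Pearson correlation")
    return fig
```

Calling `plot_correlations(*make_features()).savefig("correlations.png")` saves the picture. Strongly coloured squares away from the diagonal are the pairs of columns worth a closer look before you choose a model.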
Use visualisation in every step when building a model
We might stop here for a second just to zoom out and appreciate the wider significance of Data Visualisation.
Being ‘data-driven’ in business is often a good thing. We can drive automation from all sorts of data, and we make better decisions as humans when we are presented with data.
The importance of having the tools to visualise all the data your business generates is increasing all the time.
Imagine, for a moment, every aspect of a business being ‘data-driven’. Not automated, just ‘data-driven’. How does that look?
You can begin to work much smarter.
The power of visualisation is just that. We make smarter decisions when we can see data.
When it comes to tuning models, there are a few other tricks up our sleeve. We can look at metrics like p-values and Adjusted R². But we’ll do that in another post.
For now, I’m still enjoying my journey into Machine Learning. I hope you are too.
This article was originally published by Dan Neaves on Medium in his 'How to train a computer' series. See Data Visualization.