Predicted Highest Selling UK Electric Car Models in 2021

Whilst the traditional UK car market endured a torrid 2020, the EV market soared. Tax breaks for company cars, a greater focus on the environment, and a stream of new models all helped to push the BEV share of the UK market to a record high of 6.6% in 2020 [1].

But what about 2021? There are a large number of new models ready to enter the UK market jockeying for position with the models already on the market. So who will come out on top?

To make some predictions we have taken the Q3 2020 UK car sales data provided by the UK Government [2]. We extracted the EV data, cleaned it up a bit, and added some extra range and price data. We then fed each new model’s estimated price and range (WLTP) into the statistical model to predict the best selling UK EVs of 2021.

Feel free to jump straight to the bottom of the page if you want to skip the methodological detail.

Method Overview

Arguably 2020 is the first year where the UK EV market has looked like a traditional market. There are: a range of models; many price points; a mixture of purpose-built EVs and EVs that are built on ICE platforms; a decrease in compliance cars; and a loosening of supply constraints. As a result we are able to build prediction models and make inferences on existing data to make logical guesses about the future state of the UK EV market.

However, past performance is no guarantee of future returns. These are only predictions based on a couple of features – namely range and price. The raw predictions are interesting in themselves but there are plenty of nuances about the market which reveal themselves when you delve into the data in an attempt to design a fair and representative model of the UK’s EV market.

The details of the methods used and the assumptions made are below, but please skip to the results if you just want to see what we believe the UK’s best selling electric cars of 2021 will be.

TLDR: Overview of Methodology

Find, Clean and Pre-process Data

Firstly we need to ensure the data is in a suitable format so that it can be fed into the statistical models. This step also requires decisions about outliers and incomplete data: for example, models whose sales do not span the entire time period of the data set, or models with known supply constraints.

Create Model and Optimise

With an eye on the underlying data we need to select a model which is able to best handle that data. We are not using a very large data set so any deviations from using simple regression models have to be explicitly stated. By feeding the original data back into the model the model itself can be tuned so as to be representative of the key features of the existing EV market without simply being an exact copy and therefore unable to generalise to new EVs.

Apply to Predicted Data

Once the model has been optimised the data on new EVs can be fed into the new model. As stated, the existing EV dataset is not large so we have to be careful about how far we take the predictions. For example, predicting exact sales numbers is beyond the scope of this model, whereas predicting relative sales (i.e. where models will rank) is a more valid output.

The Data and Pre-processing

The Department for Transport (DfT) and the Driver and Vehicle Licensing Agency (DVLA) release quarterly breakdowns of the UK vehicle statistics [2]. Many of the documents describe different breakdowns of registered vehicles each quarter. We will use registrations as a proxy for sales. For this study we have used the Q3 2020 statistics as they are the latest available (published 9th Dec 2020).

To this data the quoted range of each variant was added. Although quoted WLTP ranges are rarely realistic, the data needs to be consistent between models, so the WLTP test cycle has been used for all variants. Similarly, to be consistent between EV models, the prices entered into the data include the government plug-in grant. This data was gathered between the 11th and 15th January 2021.

This data was predominantly gathered from the manufacturers’ own websites and brochures. Where the data was not readily available there, other sources such as EV comparison sites were used.

Data Consolidation and Dimensionality Reduction – Sales, Max Range, Min Price

The specific document we used – veh0160.ods – has a registration/sales breakdown by model variant – e.g. Renault Zoe Iconic/Iconic ZE 50/Play/GT/GT 50 instead of just Renault Zoe. So this data is very granular, in fact it is too granular. Therefore, each variant has been grouped into just its model.

This, however, causes an issue. Each variant has a fixed range and base price (Model 3 Standard Range Plus has 278 miles WLTP range and a base price of £40,490). But by grouping variants into each model we have a range of ranges and prices (Model 3 = 278 – 360 miles and £40,490 – £56,490). So do we base the prediction model on the min or max range, min or max price, the spread or skew?

The answer is that we based the predictive model on the maximum range and the minimum price. This may seem counter-intuitive, since the car at the minimum price is highly unlikely to have the maximum range. But that is the kind of data we will feed into the predictive model for new cars: when marketing a new EV, car companies predominantly lead with the maximum range and the minimum price. Therefore, to have representative predictions, we need to base the model on the most similar data.
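The consolidation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the notebook’s actual code: the function name consolidate, the row layout, and the "Model A" and "Model B" figures are all invented for the example and are not taken from veh0160.ods.

```python
from collections import defaultdict

# Placeholder variant-level rows: (model, variant, WLTP range in miles,
# price in GBP, registrations). All figures are invented for illustration.
VARIANT_ROWS = [
    ("Model A", "Base", 200, 30000, 300),
    ("Model A", "Long Range", 300, 38000, 150),
    ("Model B", "Standard", 150, 25000, 400),
]


def consolidate(rows):
    """Group variants into models: total sales, max range, min price."""
    models = defaultdict(
        lambda: {"sales": 0, "max_range": 0, "min_price": float("inf")}
    )
    for model, _variant, range_miles, price, sales in rows:
        entry = models[model]
        entry["sales"] += sales
        entry["max_range"] = max(entry["max_range"], range_miles)
        entry["min_price"] = min(entry["min_price"], price)
    return dict(models)


for name, entry in consolidate(VARIANT_ROWS).items():
    print(name, entry)
```

In this toy data, Model A ends up described by a 300 mile range and a £30,000 price even though no single variant offers both, which is exactly the counter-intuitive pairing discussed above.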

Dealing with Outliers

Partial or Low Sales Data

Not all EVs which appear in the Q3 2020 data were sold for the entire duration of the quarter. Furthermore, some EVs were still in the production ramp-up phase. Their sales figures are therefore less representative of true demand than those of other vehicles. These EVs, such as the VW ID3, Peugeot 2008 EV and Polestar 2 EV, were removed from the data set.

Other vehicles suffered from production constraints, so again their sales are not representative of demand. Examples are the VW e-up, Seat Mii, and Skoda Citigo-e.
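As a rough sketch, the exclusions can be expressed as a simple filter. The set and function names here are hypothetical, the sets list only the examples named above (the full exclusion list may be longer), and the sales field for the VW ID3 is left empty rather than guessed. The two sales figures shown are the Q3 2020 numbers quoted in this article.

```python
# Models named in the text as having partial-quarter or ramp-up sales.
PARTIAL_QUARTER = {"VW ID3", "Peugeot 2008 EV", "Polestar 2 EV"}
# Models named in the text as supply constrained.
SUPPLY_CONSTRAINED = {"VW e-up", "Seat Mii", "Skoda Citigo-e"}
EXCLUDED = PARTIAL_QUARTER | SUPPLY_CONSTRAINED


def split_training_set(models):
    """Return (kept, held_out) dictionaries keyed by model name."""
    kept = {name: row for name, row in models.items() if name not in EXCLUDED}
    held_out = {name: row for name, row in models.items() if name in EXCLUDED}
    return kept, held_out


# Minimal example input; only the two quoted sales figures are real.
MODELS = {
    "Tesla Model 3": {"sales": 5988},
    "Renault Zoe": {"sales": 3069},
    "VW ID3": {"sales": None},  # partial quarter, value deliberately omitted
}

kept, held_out = split_training_set(MODELS)
print(sorted(kept), sorted(held_out))
```

Keeping the held-out models in their own dictionary, rather than discarding them, makes it easy to score them later with the fitted model.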

How to Handle the Tesla Model 3?

In predictive modelling, outliers are frequently removed, ignored, or given reduced weight. The Tesla Model 3 is a significant outlier in our dataset. In Q3 2020 its sales were nearly double those of its nearest rival, the Renault Zoe (5988 vs 3069). As a result, the predictive model is at risk of being heavily influenced by the Model 3. But the Model 3’s influence on the EV market is highly representative of the huge demand for EVs. Therefore, it must be suitably factored in somehow.

This is the chief reason why the chosen predictive model is not a simple multiple linear regression and requires additional assumptions to incorporate it into the model. However, we must not overfit the data to the presence of the Model 3.

Left to its own devices, a machine learning algorithm would put so much weight on the Model 3 that only EVs with near identical prices and ranges would have comparable sales. This would mask any underlying trends.

Data Visualisation

Before making any predictions it is always useful to visualise the data. The figure below shows the spread of data on a scatter plot, with the sales data colour coded.

There is a vague correlation between range and price with the bulk of the lower priced electric cars in the bottom left quadrant and the more expensive in the top right quadrant.

The Tesla Model 3 can be clearly seen in the bottom right quadrant as the lone yellow data point. It occupies a unique spot, with nearly 6000 Q3 2020 sales and an appreciably higher range for its price.

Figure: Visualisation of UK EV Q3 2020 sales data for use in the predictive model

Design of the Prediction Model and Optimisation

It is worth repeating that the data is quite sparse, so unfortunately we can’t just chuck it into an AI or machine learning algorithm. Therefore, we do need to understand the choice of model if we want to extract meaningful conclusions.

The simplest model to start off with is a linear regression especially as we are using continuous data. This will show the broad direction of the market when looking solely at range and price.
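To make the baseline concrete, here is a minimal sketch of a two-feature linear regression, sales = b0 + b1 × range + b2 × price, solved with the normal equations in plain Python. It is an illustration only: the function name fit_linear and the toy rows are assumptions, and the notebook may well use a library routine instead.

```python
def fit_linear(rows):
    """Least-squares fit of sales = b0 + b1*range + b2*price.

    rows is a list of (range_miles, price, sales) tuples.
    Returns the coefficients [b0, b1, b2].
    """
    design = [[1.0, float(rng), float(price)] for rng, price, _ in rows]
    target = [float(sales) for _, _, sales in rows]
    n = 3
    # Normal equations: (X^T X) b = X^T y
    a = [[sum(x[i] * x[j] for x in design) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(design, target)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef


# Toy rows (range in miles, price in thousands of GBP, sales); invented values.
TOY_ROWS = [(150, 25, 400), (200, 30, 900), (250, 33, 1500),
            (280, 40, 6000), (230, 60, 700), (260, 70, 500)]

b0, b1, b2 = fit_linear(TOY_ROWS)
print(f"sales = {b0:.1f} + {b1:.2f} * range + {b2:.2f} * price")
```

The signs of b1 and b2 give the broad direction of the market with respect to range and price. A single plane like this cannot follow one very large outlier, though.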

The tricky bit is how to incorporate the non-linearities due to the Model 3. For this we again have to take into account the sparsity of the data. A k-nearest neighbour (KNN) algorithm will allow us to do this, but it cannot be too closely aligned to the current data set. Therefore, we need to broaden that model and optimise the KNN parameters. This results in a trade-off between the precision of retrofitting the model to the current data and the generality of the model to predict future market success. The KNN algorithm has been chosen as it takes individual data points into account and can be tuned to balance precision and generality. With more data we should be able to move away from the KNN algorithm in the future, as more continuous algorithms will become suitable.
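The trade-off can be illustrated with a small KNN regressor written in plain Python. This is a sketch under stated assumptions, not the notebook’s implementation: the feature standardisation, the unweighted neighbour average, the leave-one-out tuning of k, and the toy data are all choices made for this example.

```python
import math


def make_scaler(points):
    """Fit a z-score scaler on (range, price) points and return it."""
    columns = list(zip(*points))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return lambda pt: tuple((v - m) / s for v, m, s in zip(pt, means, stds))


def knn_predict(train, query, k):
    """Mean sales of the k training points nearest to query."""
    scale = make_scaler([features for features, _ in train])
    q = scale(query)
    nearest = sorted(
        (math.dist(scale(features), q), sales) for features, sales in train
    )[:k]
    return sum(sales for _, sales in nearest) / len(nearest)


def loo_error(data, k):
    """Mean absolute leave-one-out error for a given k."""
    errors = []
    for i, (features, sales) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        errors.append(abs(knn_predict(rest, features, k) - sales))
    return sum(errors) / len(errors)


# Toy data: ((range in miles, price in thousands of GBP), sales).
# Invented values with one large outlier, loosely mimicking the Model 3.
DATA = [((150, 25.0), 400), ((200, 30.0), 900), ((250, 33.0), 1500),
        ((280, 40.0), 6000), ((230, 60.0), 700), ((260, 70.0), 500)]

best_k = min(range(1, len(DATA)), key=lambda k: loo_error(DATA, k))
print("best k:", best_k)
print("prediction:", knn_predict(DATA, (270, 38.0), best_k))
```

A small k reproduces the existing data closely, including the outlier, while a large k averages it away. Scoring each k on points the model has not seen is one simple way to pick a value between those extremes.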

Predictions

Once we had a prediction model we were happy with, we created a database of upcoming EVs and plugged their data into the model. A list of new EVs coming to market, along with the month each is expected to be released, was taken from [3]. The max range and min price of each of these models were then scraped from a variety of sources: manufacturers’ websites, brochures, press releases, comparison sites, and news articles. Obviously these figures are likely to change, but for now they are a good approximation. As described above, it may seem counter-intuitive to look at the max range and min price, but that is the data we have to work with to make predictions at the moment.

Also the EVs which were removed from the initial data due to their sales not spanning the entire Q3 2020 duration can be reintroduced.

The prediction model itself will output an estimate of sales. However, this is not a reliable estimate until we are able to test these values against the Q4 2020 data. We are able, however, to provide a relative measure of sales strength/demand by ranking the models relative to one another.

There is one final calculation that must be applied to the prediction to take account of the month each electric car becomes available: the predicted sales need to be scaled depending on the release date. For example, if the release date is July then there will be no sales from January to June, and therefore the estimated sales must be halved.
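The release-date scaling and the final ranking can be sketched as follows. The function names and the three example models are hypothetical. The sketch assumes that sales are spread evenly across the year and that the release month itself counts as a selling month, which reproduces the July example above (six of twelve months on sale).

```python
def scale_for_release(full_year_estimate, release_month):
    """Scale a full-year estimate by the share of the year the car is on sale.

    release_month is 1-12; the release month counts as a selling month.
    """
    if not 1 <= release_month <= 12:
        raise ValueError("release_month must be between 1 and 12")
    months_on_sale = 13 - release_month
    return full_year_estimate * months_on_sale / 12


def rank_models(estimates):
    """Return model names ordered by scaled estimate, highest first.

    estimates maps model name -> (full-year estimate, release month).
    """
    scaled = {
        name: scale_for_release(sales, month)
        for name, (sales, month) in estimates.items()
    }
    return sorted(scaled, key=scaled.get, reverse=True)


# Hypothetical model outputs: (full-year estimate, release month).
ESTIMATES = {
    "Model A": (1000, 1),   # on sale all year
    "Model B": (1600, 7),   # released in July, estimate halved
    "Model C": (900, 10),   # released in October
}

print(rank_models(ESTIMATES))
```

Here Model B has the largest full-year estimate but drops behind Model A once its July release is taken into account.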

Results

1. Tesla Model 3
2. Volkswagen ID3
3. Hyundai Kona
4. Polestar 2
5. Kia eNiro
6. Kia Soul EV
7. Renault Zoe
8. Nissan Leaf
9. Mustang Mach-E (new 2021)
10. Peugeot 208 EV
11. Fiat 500e (new 2021)
12. Mercedes EQC
13. Audi eTron
14. BMW i3
15. Seat El-Born (new 2021)

Discussion Points

The Model 3 is currently dominant in the UK marketplace and is also predicted to dominate 2021. The Volkswagen ID3 comes in second place. This result suggests the model is working well: even though the Volkswagen ID3 data was not used to create the prediction model, the December 2020 SMMT figures show the Tesla Model 3 and VW ID3 in first and fourth place in the overall UK car market (first and second place for EVs only) [1], aligning closely with what we have predicted above.

The Kia Soul EV is predicted to outpace its 2020 sales rank. However, it is well documented that Kia Soul EV production has been restricted to favour the eNiro [4]. The Soul’s prediction is based on an unrestricted supply, so it is unlikely to meet our estimate unless the supply restrictions are lifted.

Of the new entrants into the 2021 market, the Mustang Mach-E is the highest placed, followed by the Fiat 500e and the Seat El-Born.

This prediction model suggests that range is more important than price at this moment. This is due to the presence of the Model 3. As more EVs appear in the range/price space near the Model 3, this hypothesis can be tested further. It makes qualitative sense that a £20,000 car with a 500 mile range would outsell a Tesla Model 3, but what about a car that is £10,000 more expensive with 50 more miles of range?

Conclusion

This model is nowhere near perfect for accurate predictions of sales numbers, but we believe it is suitable for estimating the relative sales strength between models. It will be interesting to compare these results to the final results for the whole year.

As more data rolls in it will be possible to improve the model quarter by quarter. The algorithms used can be updated, swapped, and optimised. For example, as much as we try to avoid it, the use of the KNN algorithm will overfit and may bias the predictions towards cars similar to the Tesla Model 3. However, that car is so much more popular than the next that it is difficult to incorporate it without skewing the data.

The current model is only based on sales, maximum range and minimum price for a given model. It doesn’t cover brand awareness, supply restrictions, car type and a whole host of other predictors which could be used to improve its accuracy. We hope to update this model as more data appears.

References

Jupyter Notebook Code
