How I Used Machine Learning To Price Used Cars

Andrew Carman is a software engineer at Shift.

At the core of any market is pricing. If you get pricing wrong, your marketplace doesn’t work. Our success at Shift—as a marketplace that makes it easy for anyone to buy and sell used cars—hinges on accurately and competitively pricing cars. 

Dealerships traditionally have a team of experts who price trade-ins for customers and manage the prices of their vehicles. They haggle over those prices with buyers who come to their dealerships, and often take advantage of consumers who have less access to car price data.

Our goal is to make selling and buying a car a fun, fair, and accessible experience by using technology to disintermediate what car dealerships do poorly. So we build software instead of back-office sales teams to price our cars, which increases efficiency, levels the playing field for our customers, and lets us systematically improve the accuracy of our pricing over time.

Read on to see exactly how we did it, and pass this along to the folks who might be interested in our approach.

The Challenge

There’s a reason traditional dealerships price cars by hand. It’s hard! Used cars are unique, and there are a lot of factors to take into account when pricing. There are three major reasons that pricing cars is hard: data is sparse, the data that exists is unstructured and messy, and car purchases are big and infrequent.

In the used car market, price information is hard to find, particularly for consumers. Unlike with houses, there is no place you can look up a list of recent sales. Listing prices do exist, but they are hard to get, messy to work with, and don’t reflect the haggling and incentives most dealerships offer. Additionally, there’s no standardization in the data, from how options are described to how model and trim names are advertised.

Even with perfect listing data, it would still be very sparse. There are around 3 million used cars listed for sale in the US at any time, but there are over 100,000 variations of cars built since 2000. And that doesn’t count different colors or options, let alone the unique history of each car. When you sell your car, it’s quite possible that it is one of only a handful just like it for sale in the whole country.
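To put rough numbers on that sparsity, here is a back-of-the-envelope calculation using only the two figures quoted above. It treats "over 100,000" as exactly 100,000, so the result is an upper bound on the average, and it assumes nothing about how listings are actually distributed.

```python
# Figures quoted in the text; both are approximate.
listed_cars = 3_000_000   # used cars listed for sale in the US at any time
variations = 100_000      # lower bound on distinct variations built since 2000

# Average listings per variation if listings were spread evenly.
listings_per_variation = listed_cars / variations
print(f"about {listings_per_variation:.0f} listings per variation on average")
```

An even spread is the optimistic case. Popular cars take far more than the average share of listings, which leaves rarer variations with far fewer than this, before color, options, or history are even considered.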

Lastly, cars aren’t just a commodity; they’re an emotionally charged part of people’s lives. People name their cars. Two customers might place different value on the same options. And since there’s no transparency in how much a particular option is worth, it means that dealers have an incentive to hold out for a buyer willing to pay an irrationally high price.

How To Price A 2012 Honda Civic

Let’s take a look at how pricing works for one model and year: the 2012 Honda Civic. If you look at the graph below of list prices, there is a huge range in how much they’re going for. As much as a $5000 difference for a single trim!

Now let’s look at the same vehicles, but add the odometer information and limit our sample to just a single trim to see how it affects pricing.

[r^2 = 0.28]

Notice that while the odometer is definitely correlated with price, it only explains a small part of the variation that you see. You can think of this line as a simple linear regression with a single feature, odometer. If we add additional features to this regression, like transmission, color, age of the listings, and location, our model still only produces an r^2 of 0.4, meaning this basic model only explains 40% of the variation we see in list prices. That’s a noisy data set! The standard error on the model is $1000. If the errors are roughly normally distributed, about a third of them fall more than one standard error from the prediction, so if we used this model in production, our prices would be off by more than $1000 about a third of the time. And this is for one of the most popular cars out there.
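For readers who want to see those three numbers computed, here is a minimal sketch of a single-feature regression of price on odometer. The listings are synthetic and generated with made-up parameters (a $16,000 starting price, 3 cents of depreciation per mile, $1000 of noise). None of it is Shift's data or code, and the function names are illustrative.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def evaluate(xs, ys, slope, intercept):
    """Return (r_squared, standard_error, share_beyond_one_standard_error)."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mean_y = statistics.fmean(ys)
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    # Residual standard error; subtract 2 for the two fitted parameters.
    std_error = (ss_res / (len(xs) - 2)) ** 0.5
    share_beyond = sum(abs(r) > std_error for r in residuals) / len(residuals)
    return r_squared, std_error, share_beyond


# Synthetic listings with made-up parameters, NOT real market data.
rng = random.Random(0)
odometers = [rng.uniform(10_000, 90_000) for _ in range(2_000)]
prices = [16_000 - 0.03 * miles + rng.gauss(0, 1_000) for miles in odometers]

slope, intercept = fit_line(odometers, prices)
r_squared, std_error, share_beyond = evaluate(odometers, prices, slope, intercept)
print(f"slope: {slope:.4f} dollars per mile")
print(f"r^2: {r_squared:.2f}, standard error: ${std_error:,.0f}")
print(f"share of listings off by more than one standard error: {share_beyond:.0%}")
```

Because the noise here is drawn from a normal distribution, the share of listings that miss by more than one standard error should land near one third, which is where the "a third of the time" figure comes from.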

For a less common car like a 2012 Mercedes-Benz CL-Class, where fewer than 100 are available for sale in the entire country, there is a price variation of almost $50,000!

So, pricing is a tricky problem, but one we’re hugely invested in solving here at Shift. Let’s dive into how we’re solving it.

Bringing Machine Learning to Pricing

Given how hard and how manual it is to price a used car, how do you even get started building technology to make this better? As the truism goes, we started with a solution that didn’t scale. Initially, we hired in-house experts who knew used car pricing inside and out and used third-party pricing tools. From that knowledge base, we began to build our models, collaborating closely with our experts every step of the way. The work we’ve done is proprietary and I’m not going to reveal all our secrets (come talk to us if you want to know more), but here’s a high-level overview.

The first thing we did was build out a clean way to represent a vehicle. What year is it? What is the make? Model? Trim? Clean data is required for any type of machine learning, so we started with the unglamorous task of cleaning up how vehicles are represented in our systems. Different data sources use different names for the exact same vehicle, so we needed to normalize everything against a single ontology.
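As a rough illustration of what normalizing against a single ontology can look like, here is a small sketch. The `Vehicle` record, the alias tables, and the `normalize` function are all hypothetical and invented for this example. They are not Shift's schema, and a real ontology would be far larger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vehicle:
    """Canonical representation of a vehicle (hypothetical schema)."""
    year: int
    make: str
    model: str
    trim: str


# Hypothetical alias tables mapping cleaned-up source spellings to canonical names.
MAKE_ALIASES = {
    "honda": "Honda",
    "mercedes": "Mercedes-Benz",
    "mercedes benz": "Mercedes-Benz",
}
MODEL_ALIASES = {"civic": "Civic", "cl": "CL-Class", "cl class": "CL-Class"}
TRIM_ALIASES = {"lx": "LX", "exl": "EX-L", "ex l": "EX-L"}


def canonical(raw, aliases, field):
    """Look up a raw value, ignoring case, hyphens, and extra whitespace."""
    key = " ".join(raw.lower().replace("-", " ").split())
    try:
        return aliases[key]
    except KeyError:
        # Fail loudly so unmapped names get reviewed instead of guessed.
        raise ValueError(f"unmapped {field}: {raw!r}") from None


def normalize(year, make, model, trim):
    """Map one source's description of a vehicle onto the canonical record."""
    return Vehicle(
        year=int(year),
        make=canonical(make, MAKE_ALIASES, "make"),
        model=canonical(model, MODEL_ALIASES, "model"),
        trim=canonical(trim, TRIM_ALIASES, "trim"),
    )


# Two sources describing the same car differently end up as equal records.
source_a = normalize("2012", " HONDA ", "civic", "ex-l")
source_b = normalize(2012, "Honda", "Civic", "EXL")
print(source_a == source_b)
```

Raising an error on an unmapped name is a design choice for this sketch: it keeps unknown spellings out of the training data until someone has mapped them.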

Next, we use our cleaned market data to train machine learning models to predict list prices. When you submit a car for a quote on shift.com, we run models that look at cars currently listed on the market to determine how much yours is worth. Our algorithm starts by predicting prices for a more generic version of your car, and then we use a series of proprietary adjustments to account for the history, condition, options, and color of your specific vehicle. This lets us address the issue of sparse data.
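The two-step shape of that approach (price a generic version of the car, then adjust for the specific vehicle) can be sketched in a few lines. Everything here is a stand-in: the median of comparable listings substitutes for a trained model, and the listing prices, adjustment names, and dollar amounts are invented for illustration. The actual models and adjustments are proprietary and not described in this post.

```python
import statistics

# Hypothetical list prices for comparable listings of one generic vehicle.
comparable_list_prices = [13_200, 13_900, 14_500, 14_800, 15_400]

# Hypothetical dollar adjustments for vehicle-specific attributes.
ADJUSTMENTS = {
    "accident_reported": -800,
    "navigation_package": 350,
    "unpopular_color": -150,
}


def generic_price(list_prices):
    """Stand-in for a trained model: the median of comparable list prices."""
    return statistics.median(list_prices)


def quote(list_prices, attributes):
    """Price the generic vehicle, then apply one adjustment per attribute."""
    unknown = set(attributes) - set(ADJUSTMENTS)
    if unknown:
        raise ValueError(f"no adjustment defined for: {sorted(unknown)}")
    base = generic_price(list_prices)
    return base + sum(ADJUSTMENTS[name] for name in attributes)


print(quote(comparable_list_prices, []))
print(quote(comparable_list_prices, ["accident_reported", "navigation_package"]))
```

The appeal of this structure for sparse data is that the base price can draw on every comparable listing of the generic vehicle, while each adjustment can be estimated across many different vehicles that share the same attribute.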

We also use models to predict the time to sale and expected sale price of your vehicle based on our historical demand and sale data for cars like yours. Because there’s so much uncertainty in the market, we can’t tell you exactly how much your car will sell for, but we can use these models to provide you with as much transparency as possible.


Used car pricing is an incredibly complex problem. We’ve laid a strong foundation, but we have a lot of exciting work ahead of us, including incorporating more advanced modeling (neural nets, anyone?), feature engineering, NLP for data mapping, and data pipeline and tooling upgrades.

Pricing is a crucial component of our business and a highly impactful problem to solve for customers. Dealerships thrive on the opacity and information imbalance in used car pricing. But at Shift, we use engineering and machine learning to level the playing field.

If building algorithmic models to bring fairness and transparency to used car pricing sounds interesting to you, come work with us!

Author: Andrew Carman

After growing up in the Bay Area and studying CS and Math at Harvey Mudd College, Andrew dove into startups. He's been an early engineer at three small startups, including Shift. One of them was acquired by Jawbone, where he built a streaming data platform to deliver millions of individualized health insights a day for their wearable product. As the second engineer at Shift, he's been involved in many projects, but most recently he led the creation of the pricing engine, auction systems, and inventory management tools.