A Very, Very Hard Problem

By Brian Henn, Flyhomes Head of Data Science

What’s the right price for a home?

How do you know?

Historically, home value was decided 100% by people. In the age of big data, we increasingly use algorithms to help. So how close are we to having certainty about home value?

A huge win

Yesterday, the data science around home pricing improved as Zillow announced the winners of a long-running competition to boost its ability to forecast home values. The Zillow Prize, with $1.2 million in prizes and thousands of teams competing, was won by an international team of machine learning experts.

The team reduced the average error between Zillow’s home value estimate (the “Zestimate”) and the actual sale price of a home by 13%, from roughly 4.5% to about 4%.

For data scientists like me, this is a huge win.

In their press release, the Zillow Prize team said it represents “… a much larger margin than Zillow anticipated at the contest onset.” This work is truly impressive.

What half a percent means to a home buyer

A 13% improvement in error rate is a massive change from a data science perspective. For the average home buyer, the difference won’t be felt immediately, because there is still a lot of work to do.

In Seattle, for example, the median home price (according to Zillow) is $725,700 at the moment. A half-percentage-point improvement cuts the average error on a home at that price from $32,657 to $29,028. That is a difference of about $3,600, which is a lot of money; however, it still leaves nearly $30,000 of error, which is a significant margin.
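The arithmetic behind those dollar figures can be sketched in a few lines of Python. This is only an illustration using the rounded numbers quoted in this post; the constant and function names are hypothetical and are not part of any Zillow or Flyhomes tool.

```python
# Rounded figures quoted in this post (names are hypothetical, for illustration only).
MEDIAN_PRICE = 725_700    # Seattle median home price, per Zillow
OLD_ERROR_RATE = 0.045    # average error before the contest, rounded
NEW_ERROR_RATE = 0.040    # average error after the winning model, rounded


def dollar_error(price: float, error_rate: float) -> float:
    """Average dollar miss for a home at `price`, given an average error rate."""
    return price * error_rate


old_miss = dollar_error(MEDIAN_PRICE, OLD_ERROR_RATE)
new_miss = dollar_error(MEDIAN_PRICE, NEW_ERROR_RATE)
difference = old_miss - new_miss

# Relative improvement implied by the rounded rates alone.
relative_improvement = (OLD_ERROR_RATE - NEW_ERROR_RATE) / OLD_ERROR_RATE

print(f"old miss: ${old_miss:,.2f}")
print(f"new miss: ${new_miss:,.2f}")
print(f"difference: ${difference:,.2f}")
print(f"relative improvement from rounded rates: {relative_improvement:.1%}")
```

Note that the rounded rates on their own imply a relative improvement of about 11%. The 13% figure quoted earlier presumably comes from the unrounded error rates, which is an assumption on my part and not something the rounded numbers can confirm.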

My point here is that this is a really, really hard problem. This achievement was the work of more than 3,800 teams devoting hundreds of hours over a two-year span. The press release acknowledged that automated home pricing tools will probably never be perfect, and as much as I’d like to see a perfect model, I agree.

Why this problem is so hard

Machine learning algorithms must be able to take into account the diverse range of homes and their conditions, and assess how those conditions will impact the price, using a limited set of information available through public records. Major aspects of the home’s value may not be captured by available information at all.

For example, public data regarding a home may include easily interpreted features such as the number of bedrooms and bathrooms, as well as features that are more difficult to interpret, like the type of heating system. Recent sales data for similar homes in the same area (comparable sales, or comps) are useful, but interpreting the differences between the comps is complex. In this area, algorithms and lots of data can help identify patterns that might not be obvious to the human eye.

In addition to the multitude of variables that need to be considered, one of the biggest challenges in pricing a home is the nature of the market. It only takes two people (a home seller and a home buyer) to agree on a final price. Unlike the securities market, where millions of shares change hands on any given day, homes ultimately trade between two unique decision makers, who each make decisions based on their own rational (or irrational!) perspective. It’s not hard to imagine a buyer falling in love with a home and paying “too much” for it, or at least more than what an algorithm predicts.

A hybrid approach

At Flyhomes, we use a combination of machine learning tools and human experience to guide our home buying clients in making winning offers and our home sellers in maximizing the value of their homes.

Because people can’t process as much information as an algorithm can, and an algorithm doesn’t capture nuance the way a person can, we’ve seen the best pricing happen with a hybrid approach that starts with data science and then uses human judgement.

Machine learning approaches are valuable, even if many factors that influence the ultimate sale price of a home may never be fully captured by automated approaches. A big backyard may be worth a certain amount of money in one buyer’s mind, while another would rather avoid the maintenance and want to pay less for the same home. But the data can still tell us what the average buyer might pay for that amenity.

I’m eager to see Zillow continue to improve and refine its price modeling, and leverage the wisdom of the crowd to find creative ways to sharpen the accuracy of the Zestimate. That said, dealing with such individualized decisions is an incredibly hard problem and something an algorithm may never be able to do, which is why we’ve taken an approach that combines machine learning and personal expertise. Art, plus science.
