In the simplest terms, real estate appraisers evaluate the fair market value of a property — one number that aggregates the value of a wide array of the property’s features, both quantitative (number of bedrooms and bathrooms, square footage, lot size, number of garages, etc.) and qualitative (views, street scene, location, etc.). Right?

One way to approximate this value is using the regression method, which is a data analysis tool for studying the relationship between a dependent variable (in this case, property value) and feature predictors (such as number of bathrooms or gross living area). This article aims to show how regression can increase appraisal accuracy by focusing on the two most common types: Simple vs. Multiple Linear Regression.

**Tackling Simple Linear Regression**

Simple linear regression (SLR) is a straightforward regression approach using only one predictor and one dependent variable. For example, if we believe that living area alone can be used to estimate a property’s value, and that the relationship between living area and value is linear, then we could use a simple linear regression to estimate the value.

A predictive structure of SLR can be expressed as: *prediction* = *m + b * feature*

In this case, *feature* is the independent variable and *prediction* is the response variable (think back to y = m + bx from high school algebra). Here “m” and “b” are the “intercept” and “slope,” respectively, of a graph made from this equation. The intercept is where the *prediction* would be if the *feature* is zero (if the subject property doesn’t have a garage, for example). The slope is the increase or decrease in the property value for each unit change in the *feature* (e.g., how much the property’s value changes for every square foot added or subtracted).
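As a rough sketch of how the intercept and slope might be estimated from data, the standard ordinary least squares formulas fit in a few lines of Python. The function name `fit_simple_linear` and the toy numbers are illustrative assumptions, not HouseCanary’s implementation or real sales data.

```python
from statistics import mean

def fit_simple_linear(features, predictions):
    """Return (m, b) for prediction = m + b * feature via ordinary least squares.

    Following the article's notation, m is the intercept and b is the slope.
    """
    x_bar = mean(features)
    y_bar = mean(predictions)
    # Slope: co-variation of feature and prediction divided by variation of feature.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(features, predictions))
    sxx = sum((x - x_bar) ** 2 for x in features)
    if sxx == 0:
        raise ValueError("feature values must not all be identical")
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    m = y_bar - b * x_bar
    return m, b

# Hypothetical toy data (not from the article): points lying exactly on y = 50 + 2x.
toy_features = [1.0, 2.0, 3.0, 4.0]
toy_predictions = [52.0, 54.0, 56.0, 58.0]
m, b = fit_simple_linear(toy_features, toy_predictions)
```

With real comps the points will not sit exactly on a line, and the fitted line is the one that minimizes the sum of squared vertical distances to the points.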

Consider the following example. We want to forecast the value of a single-family residence (SFR), and we know its basic information such as living area, number of beds, and number of baths. Using HouseCanary’s pre-computed comps analysis, we identify the five most similar properties sold within the last six months, as shown in the table. The similarity is determined by property type (SFR or others), property features, and the geographical distance between each property and the subject property. Since each property has the same number of bedrooms, this feature has no impact on the price among these properties other than as a fixed constant. The price variation, however, is clearly associated with living area and the number of baths.

The figures below show the relationship between the property prices and either the living area or the number of baths of each of the properties. In each graph, the point where the vertical brown line touches the horizontal axis shows the subject property’s value for the feature in question (3,137 for living area and 3.0 for bathrooms). The blue line is the one-predictor linear regression line, which shows the predictive relationship between the property price and the appropriate feature.

In plot 1, for example, the forecasting relationship between price and living area is expressed as:

*price = $200,323 + $147 * living area*

For the subject property, which measures 3,137 square feet, this means its predicted price is $661,462: $200,323 + $147 * 3,137 = $661,462.

This is shown as the solid blue dot in the left graph. The “slope” of the regression line, 147, measures the unit price value, i.e., the price adjustment for each additional square foot. However, if we use number of bathrooms as our predictor instead, the target property’s predicted price is $553,000 (-$1,241,000 + $598,000 * 3.0 = $553,000). This discrepancy in predicted price ($661,462 versus $553,000) is common when using simple linear regression formulas for different features of the same property.

The actual price for the target property is $607,000, which is shown as a brown star in each graph. Consequently, the errors for the two estimates are nearly identical in size but opposite in direction, whether expressed in raw dollars ($607,000 - $661,462 = -$54,462 and $607,000 - $553,000 = $54,000, respectively) or as a percentage (-$54,462/$607,000 = -9% and $54,000/$607,000 = 9%, respectively).
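The arithmetic for the two single-predictor estimates can be reproduced with a short Python sketch. The coefficients are the rounded values printed above; the helper names `predict_slr` and `error_and_pct` are illustrative and not part of any HouseCanary tool.

```python
def predict_slr(intercept, slope, feature):
    """prediction = m + b * feature, with m the intercept and b the slope."""
    return intercept + slope * feature

def error_and_pct(actual, predicted):
    """Return the raw error (actual - predicted) and the error as a percentage of actual."""
    error = actual - predicted
    return error, error / actual * 100

ACTUAL_PRICE = 607_000  # actual price of the target property, from the article

# Coefficients as printed in the article (rounded).
price_from_area = predict_slr(200_323, 147, 3137)          # living area model
price_from_baths = predict_slr(-1_241_000, 598_000, 3.0)   # bathrooms model

area_error, area_pct = error_and_pct(ACTUAL_PRICE, price_from_area)
baths_error, baths_pct = error_and_pct(ACTUAL_PRICE, price_from_baths)

print(price_from_area, area_error, round(area_pct))
print(price_from_baths, baths_error, round(baths_pct))
```

Each model captures only one feature, so the two predictions land on opposite sides of the actual price.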

**Going Further With Multiple Linear Regression**

Using simple linear regression, an appraiser can use the most important predictor of a property’s value in their appraisal analysis (or even run SLR on several features to get multiple reference points, as we did in the example above). But what if the most important predictor isn’t clear? What if, for example, the three-bedroom units in a given analysis all have different numbers of bathrooms and varying gross living areas, or only a couple of them have pools? By using multiple linear regression (MLR), appraisers can compare the effect that multiple predictors have on a property’s value with a single calculation.

A predictive structure of MLR can be expressed as:

*prediction = m + b1 * feature 1 + b2 * feature 2 + b3 * feature 3+ and so on...*

For example, with the data above, we can regress *price* on both *living area* and *number of bathrooms*: *price = -$1,309,770 - $125 * living area + $744,918 * baths*

If we plug in the target property’s feature values, the predicted price is $532,859, which yields an error of $74,141 or 12%. This error is larger than either of the SLR errors above, but don’t worry: there is an explanation in the data (and a solution that follows!).
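A minimal Python sketch of the two-predictor calculation, using the rounded coefficients printed above, is shown here. The helper name `predict_mlr` is an illustrative assumption.

```python
def predict_mlr(intercept, coefficients, features):
    """prediction = m + b1 * feature1 + b2 * feature2 + ..."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(b * x for b, x in zip(coefficients, features))

ACTUAL_PRICE = 607_000  # actual price of the target property, from the article

# Coefficients as printed in the article: living area, then baths.
mlr_price = predict_mlr(-1_309_770, [-125, 744_918], [3137, 3.0])
mlr_error = ACTUAL_PRICE - mlr_price
mlr_pct = mlr_error / ACTUAL_PRICE * 100

print(mlr_price, mlr_error, round(mlr_pct))
```

The same function extends to any number of features by appending to both lists.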

First, we must consider the relationships between the predictors themselves and understand what the coefficients in the equation above convey, which is not as straightforward in this case as it was when using SLR. For instance, the coefficient -$125 for the *living area* says that, **for a fixed number of baths**, increasing *living area* by 1 square foot actually decreases the property price by $125, which seems unnatural! However, if we look at the table above, we can see that there are only two properties with 3.5 bathrooms. The one with 3,101 square feet is actually more expensive ($951,000) than the property with 4,024 square feet. For the three properties with 3.0 bathrooms, the living areas (2,690, 3,080, and 3,155 square feet) don’t line up exactly with the prices ($521,000, $451,000, and $687,000) either.

This means that, in our data, the linear relationship between the response *price* and the predictor *living area* differs for properties with a different *number of bathrooms*. Thus, in the graph above, the black line (the regression line with *baths* = 3.0) and the red line (the regression line with *baths* = 3.5) actually show opposite trends. In such a case, the two predictors are not purely additive against the price: an increase in one does not necessarily imply an increase in the price. The remedy is to add another term, *living area * number of baths*, which represents the **interaction** of the predictors.

For our data, the predictive relation would be expressed as:

*price = -$9,590,000 + $2,552 * living area + $3,202,000 * baths - $791 * living area * baths*

This yields a target property prediction of $577,523, which has an error of $29,477 ($607,000 - $577,523 = $29,477) and a relative error of 5%, which is the best fit so far!
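The interaction model can be checked the same way. The sketch below uses the rounded coefficients printed above and also computes the living-area slope implied for a given number of baths, which follows from grouping the two *living area* terms: slope = 2,552 - 791 * *baths*. The function names are illustrative assumptions.

```python
def predict_with_interaction(living_area, baths):
    """price = m + b1 * area + b2 * baths + b3 * area * baths (article's rounded coefficients)."""
    return (-9_590_000
            + 2_552 * living_area
            + 3_202_000 * baths
            - 791 * living_area * baths)

def living_area_slope(baths):
    """Price change per extra square foot once the number of baths is held fixed."""
    return 2_552 - 791 * baths

ACTUAL_PRICE = 607_000  # actual price of the target property, from the article

interaction_price = predict_with_interaction(3137, 3.0)
interaction_error = ACTUAL_PRICE - interaction_price
interaction_pct = interaction_error / ACTUAL_PRICE * 100

slope_at_3_baths = living_area_slope(3.0)
slope_at_3_5_baths = living_area_slope(3.5)

print(interaction_price, interaction_error, round(interaction_pct))
print(slope_at_3_baths, slope_at_3_5_baths)
```

With these coefficients the implied slope is positive at 3.0 baths and negative at 3.5 baths, consistent with the opposite trends described for the two groups of comps.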

**When to Use Simple Linear Regression vs. Multiple Linear Regression**

When appraising a property or determining the value of a particular feature, appraisers can use regression analysis to derive the most accurate price adjustments by taking into account the effect of multiple features on a property’s value. New industry regulations requiring data-driven justification for appraisal decisions may make regression analysis even more necessary for appraisers in the near future.

Keep in mind that it may not always be possible to use a multiple linear regression analysis. If, for example, there isn’t a varied array of data points available for a property and its comparables, there might not be enough information to conduct an MLR analysis. But when there is enough available data to perform an MLR analysis, it often produces the most accurate results. Remember that there are times (such as when predictors are not purely additive against the price) when an MLR analysis will require an additional term that represents the interaction of the predictors to be most accurate, as in the example above. While regression analysis clearly doesn’t replace an appraiser’s expertise, it can be a valuable complement to it.

We hope you will find these insights useful for your appraisals.

Note: Coming soon, residential appraisers will be able to price adjustments using regression models built into HouseCanary’s appraisal software.