REGRESSION PROJECT
Background: real estate transactions
- You must analyze the data set described below and also submit a set of predictions. Your grade will be based on the write-up, with extra points added based on how close your predictions are to the actual values (details below). The report is expected to be 5 to 10 pages long. The data set used in this project was gathered from the records of a real estate office in Springfield, home to the famous television family The Simpsons. A picture of the town is shown below.
- To understand the data set and the project, it is necessary to understand how houses are bought and sold in Springfield and the rest of the United States.
- When an individual or a family wishes to sell their house, they often choose to get help from a real estate agent. (This is true in the majority of cases, but a substantial minority of individuals and families sell their house without using a real estate agent.) The agent helps the family decide on a price for the house, called the list price. The family signs a contract (called a listing agreement) with the real estate agent. The contract says that if the agent finds a buyer for the house who will pay the list price, then the family will sell the house at that price. The actual price when the house is sold is called the sale price. Usually the sale price is less than the list price, but not always.
- An important part of the real estate agent's job is to help set the list price. The agent looks carefully at the characteristics of the house, including its size, location, age, the amount of property that comes with the house (called the lot size), the number of bedrooms and bathrooms, and whether the house has various desirable features. These features include items such as a basement, garage, and fireplace, and appliances sold with the house, such as a dishwasher, garbage disposer, and oven. In setting the list price, the agent considers the prices of similar houses that have been sold recently.
- The real estate agent enters all of this information in a large book that is shared by real estate agents. Each page describes one house: its list price and the characteristics that are important in establishing it. Real estate agents who are helping families buy a house can then use this information to guide prospective buyers to houses that might be right for them. The data for this project were gathered from several of these books.
Background: house pricing models
- The characteristics of a house strongly affect the sale price. The direction of most of these effects is obvious: larger houses sell for more than smaller houses, houses with more bedrooms and bathrooms sell for more than houses with fewer bedrooms and bathrooms, the presence of a garage tends to raise the sale price, and so on. On the other hand, these features by no means provide a perfect prediction of the sale price of a house. In part, this is because some features are not recorded systematically by real estate agents.
- For example, whether there is a busy, noisy street in front of the house matters, as does whether the house has been kept in good repair. (The data set that you will be using contains no information about location, which is usually quite important. For example, the location of a house determines which public school any children living there will attend, and some schools are regarded as better than others.)
- Even with a complete list of all the features, there still would not be a perfect description of the price, because the price is also affected by who buys the house. Houses and buyers are all different. If a family looks at a house that has just been advertised for sale and finds it "just right" for them, the house may well sell for the list price or even a little higher. On the other hand, if the house has been for sale for a long time and a family sees that it will meet their needs only with some changes, it may well sell for quite a bit less than the list price.
INSTRUCTIONS
Analyze the Data Set
As discussed previously, you must analyze the data set described below and also submit a set of predictions. Your grade will be based on the write-up, with extra points added based on how close your predictions are to the actual values (details below). The report is expected to be 5 to 10 pages long.
Project data:
- The Excel spreadsheet for this project is called bldmodelhousedata.xls. The spreadsheet has 301 rows and 15 columns. A data description at the end of this assignment describes each of the 15 columns. A second file, validationhouse.xls, contains 84 additional observations that you will also need.
Scope:
- Imagine that you work for a statistical consulting firm that has been asked to create and estimate a hedonic pricing model for Springfield. This will be a regression equation with the house sale price as the dependent variable and characteristics of the house as regressors.
- Remember that the objective of the model is to predict the sale prices of houses with the same mix of characteristics shown in the data set. There are many models that could be set up, and there is more to decide than just which variables go on the right side of the equation. You will also need to think about the functional form of the equation. (Example: Should the dependent variable be expressed in dollars, in logarithms, or something else? Will this decision in turn affect the functional form for the regressors?)
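As a sketch, using generic placeholder symbols rather than the actual column names from the data description, the hedonic model can be written in level form as

$$\text{Price}_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, \ldots, n,$$

or, with the dependent variable in logarithms, as

$$\ln(\text{Price}_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i .$$

In the level form, $\beta_j$ is a dollar effect. In the log form, $100(e^{\beta_j} - 1)$ is the implied percentage change in price for a one-unit change in $x_{ij}$, roughly $100\beta_j$ percent when $\beta_j$ is small. One practical wrinkle of the log form is that simply exponentiating the fitted log price tends to understate the expected dollar price, so predictions for validationhouse.xls need a retransformation step. The plain-Python sketch below avoids third-party packages so that it runs as-is. It fits the log form by ordinary least squares and applies Duan's smearing estimator, which is one common correction among several. The columns (intercept, living area in thousands of square feet, bedrooms) and all numbers are hypothetical and synthetic. They are not taken from bldmodelhousedata.xls.

```python
import math


def ols(X, y):
    """OLS coefficients for y ~ X via the normal equations (X'X) b = X'y.

    X is a list of rows; put a leading 1.0 in each row for the intercept.
    Uses Gaussian elimination with partial pivoting, which is fine for a few
    regressors (a QR-based library routine is numerically safer)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda p: abs(a[p][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution on the upper-triangular system.
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


def linear_predictor(b, row):
    return sum(bj * xj for bj, xj in zip(b, row))


def fit_log_model(X, price):
    """Regress ln(price) on X; return (coefficients, Duan smearing factor)."""
    log_price = [math.log(p) for p in price]
    b = ols(X, log_price)
    resid = [lp - linear_predictor(b, row) for row, lp in zip(X, log_price)]
    # Smearing factor = sample mean of exp(residual); it is >= 1 when X has an
    # intercept, because OLS residuals then average zero (Jensen's inequality).
    smear = sum(math.exp(e) for e in resid) / len(resid)
    return b, smear


def predict_price(b, smear, row):
    """Dollar-scale prediction: exp(x'b) multiplied by the smearing factor."""
    return math.exp(linear_predictor(b, row)) * smear


if __name__ == "__main__":
    # Hypothetical columns: intercept, living area (1000s of sq ft), bedrooms.
    X = [[1.0, s, beds] for s, beds in
         [(1.2, 2), (1.5, 3), (2.0, 3), (2.4, 4), (1.8, 2), (3.0, 5)]]
    # Synthetic prices with +/-0.1 noise on the log scale (not real data).
    noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
    price = [math.exp(11 + 0.3 * row[1] + 0.05 * row[2] + u)
             for row, u in zip(X, noise)]
    b, smear = fit_log_model(X, price)
    print("coefficients:", [round(v, 4) for v in b])
    print("smearing factor:", round(smear, 4))
    print("prediction for 2,000 sq ft, 3 bedrooms:",
          round(predict_price(b, smear, [1.0, 2.0, 3]), 2))
```

Whether a mean-corrected dollar prediction is what the scoring rewards depends on the scoring details, which this section does not give. If the residuals look roughly normal on the log scale, multiplying by $\exp(s^2/2)$, where $s^2$ is the residual variance, is a common alternative correction.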
You will need to think about all of the technical problems that can arise in regression. For this data set, multicollinearity, heteroscedasticity, and outlying observations may be problems. You will probably wish to carry out some hypothesis tests as part of your work; these will be of dubious value if the assumptions of the normal multiple linear regression model are badly violated.
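Each of the three problems named above has a standard numerical check. The plain-Python sketch below runs without extra packages and computes three of them. Variance inflation factors address multicollinearity. Koenker's n·R² form of the Breusch-Pagan test addresses heteroscedasticity. Hat-matrix leverages with the common 2k/n rule of thumb flag potentially outlying observations. These are one reasonable set of choices among several; White's test, Cook's distance, and studentized residuals are common alternatives. The thresholds (VIF above about 10, and the 5% chi-square critical value 3.84 for one degree of freedom) are conventions, not requirements of this assignment. All data in the demo are synthetic and the variable names are hypothetical. In practice, a regression package such as statsmodels provides these diagnostics directly.

```python
def solve(A, rhs):
    """Solve A z = rhs (A square, nonsingular) by Gaussian elimination with partial pivoting."""
    n = len(A)
    a = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda p: abs(a[p][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (a[i][n] - sum(a[i][j] * z[j] for j in range(i + 1, n))) / a[i][i]
    return z


def gram(X):
    """X'X for X given as a list of rows."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]


def ols_fitted(X, y):
    """Fitted values from the OLS regression of y on X."""
    k = len(X[0])
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    b = solve(gram(X), xty)
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def r_squared(X, y):
    """R^2 of the OLS fit of y on X (X must contain an intercept column)."""
    fit = ols_fitted(X, y)
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(X, j):
    """VIF of regressor column j >= 1 (column 0 is the intercept): 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on all other columns."""
    target = [row[j] for row in X]
    others = [[v for c, v in enumerate(row) if c != j] for row in X]
    return 1.0 / (1.0 - r_squared(others, target))


def breusch_pagan_lm(X, y):
    """Koenker's studentized Breusch-Pagan statistic: n * R^2 from regressing the
    squared OLS residuals on X. Under homoscedasticity it is approximately
    chi-square with (number of columns - 1) degrees of freedom."""
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, ols_fitted(X, y))]
    return len(y) * r_squared(X, e2)


def leverages(X):
    """Hat-matrix diagonal h_ii = x_i' (X'X)^{-1} x_i, without forming the inverse."""
    A = gram(X)
    return [sum(xi * zi for xi, zi in zip(row, solve(A, row))) for row in X]


if __name__ == "__main__":
    # Hypothetical design: intercept, living area, and a near-copy of living area.
    area = [1.0, 2.0, 3.0, 4.0, 5.0]
    near_copy = [2.01, 3.98, 6.02, 7.99, 10.0]  # roughly 2 * area
    X = [[1.0, a, c] for a, c in zip(area, near_copy)]
    print("VIF(area):", round(vif(X, 1), 1))  # a large value signals collinearity

    # Synthetic errors whose spread grows with x (heteroscedastic).
    Xh = [[1.0, float(x)] for x in range(1, 21)]
    yh = [1 + 2 * x + 0.5 * x * (-1) ** x for x in range(1, 21)]
    print("BP LM:", round(breusch_pagan_lm(Xh, yh), 2),
          "vs. 5% chi-square(1) critical value 3.84")

    # One extreme x value gives that row high leverage.
    Xo = [[1.0, x] for x in [1.0, 2.0, 3.0, 4.0, 20.0]]
    k, n = len(Xo[0]), len(Xo)
    print("high-leverage rows (h > 2k/n):",
          [i for i, h in enumerate(leverages(Xo)) if h > 2 * k / n])
```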
Deliverable: Word Document
