You are working as a data scientist and you have received data on house prices in the Boston region. The data set contains the following variables:

• crim: per capita crime rate by town
• zn: proportion of residential land zoned for lots over 25,000 sq.ft.
• indus: proportion of non-retail business acres per town
• chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
• nox: nitric oxides concentration
• rm: average number of rooms per dwelling
• age: proportion of owner-occupied units built prior to 1940
• dis: weighted distances to five Boston employment centers
• rad: index of accessibility to radial highways
• tax: full-value property-tax rate per $10,000
• ptratio: pupil-teacher ratio by town
• b: 1000(Bk - 0.63)² where Bk is the proportion of blacks by town
• lstat: % lower status of the population
• medv: median value of owner-occupied homes in $1000s

Given this information:

1. Download the dataset boston.csv and open it as a pandas DataFrame.
2. Using 'medv' as the response variable and per capita crime rate by town (crim), proportion of owner-occupied units built prior to 1940 (age), and nitric oxides concentration (nox) as predictors, fit a linear model (OLS) and a k-nearest neighbour model (using the 5 nearest neighbours). Which one has better prediction properties under k-fold cross-validation (k=5)? Explain why.
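One possible way to approach the two tasks is sketched below, using pandas and scikit-learn. Several things here are assumptions and not part of the question: that boston.csv has lowercase column headers matching the variable names listed above, the helper names load_boston and compare_models, the use of mean squared error as the cross-validation metric, shuffling the folds with a fixed seed, and standardising the predictors before k-NN (crim, age and nox are on very different scales, and k-NN depends on distances).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Column names are assumed to match the variable list in the question.
PREDICTORS = ["crim", "age", "nox"]
RESPONSE = "medv"


def load_boston(path="boston.csv"):
    """Read the CSV into a DataFrame (hypothetical helper; path is an assumption)."""
    df = pd.read_csv(path)
    # Normalise header case in case the file uses e.g. 'CRIM' or 'MEDV'.
    df.columns = df.columns.str.strip().str.lower()
    return df


def compare_models(df, n_splits=5, seed=0):
    """Return the mean k-fold cross-validated MSE for OLS and 5-NN regression."""
    X = df[PREDICTORS]
    y = df[RESPONSE]

    # The same folds are used for both models so the comparison is fair.
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)

    models = {
        "ols": LinearRegression(),
        # Scaling happens inside the pipeline, so it is refit on each
        # training fold and never sees the held-out fold.
        "knn5": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    }

    results = {}
    for name, model in models.items():
        # scikit-learn returns negated MSE so that "higher is better".
        scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
        results[name] = float(-scores.mean())
    return results
```

Calling compare_models(load_boston("boston.csv")) returns a dictionary with one cross-validated MSE per model; the model with the lower value predicts better on held-out data under this metric. When explaining the outcome, the usual trade-off applies: OLS imposes a linear form, which gives low variance but can miss non-linear relationships, while k-NN with k=5 is more flexible but noisier and sensitive to how the predictors are scaled.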
