Key Word(s): kNN, kNN Regression, MSE, Data Plotting

Title:

Exercise: Simple kNN Regression

Description:

The goal of this exercise is to re-create the plots given below. You will have come across these graphs in the lecture as well.

[Figure: the target plots to re-create]

Data Description:

Instructions:

Part 1: KNN by hand for k=1

  • Read the Advertisement data.
  • Get a subset of the data from row 5 to row 13.
  • Apply the kNN algorithm by hand and plot the first graph as given above (a rough sketch follows this list).
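A minimal sketch of Part 1 is given below. The file name "Advertising.csv" and the column names TV and Sales are assumptions (use whatever the scaffold provides); the grid-based prediction mirrors the np.linspace, np.min, and np.max hints further down.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read the Advertisement data (file name assumed)
df = pd.read_csv("Advertising.csv")

# Subset from row 5 to row 13 (adjust the slice to match the scaffold)
df_new = df.iloc[5:13]
x = df_new['TV'].values        # predictor (column name assumed)
y = df_new['Sales'].values     # response (column name assumed)

# k=1 by hand: for every query point, predict the y of the single closest x
x_grid = np.linspace(np.min(x), np.max(x), 100)
y_pred = np.array([y[np.argmin(np.abs(x - xq))] for xq in x_grid])

plt.plot(x, y, 'x', label='data', color='k')
plt.plot(x_grid, y_pred, label='kNN prediction, k=1')
plt.xlabel('TV advertising budget')
plt.ylabel('Sales')
plt.legend()
plt.show()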

Part 2: Using sklearn package

  • Read the Advertisement dataset.
  • Split the data into train and test sets using the train_test_split() function.
  • Set k_list as the possible k values ranging from 1 to 70.
  • Use sklearn's KNeighborsRegressor() to fit the train data.
  • Predict on the test data.
  • Use the helper code to get the second plot above for k=1, 10, 70 (a rough sketch follows this list).
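A minimal sketch of Part 2 is given below. The train/test split fraction and random_state are illustrative assumptions, as are the file and column names; set them to the values given in the scaffold.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Read the Advertisement dataset (file and column names assumed as in Part 1)
df = pd.read_csv("Advertising.csv")
x = df[['TV']].values          # 2-D feature array, as sklearn expects
y = df['Sales'].values

# Split into train and test sets (split parameters are assumptions)
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.75, random_state=42)

# Possible k values ranging from 1 to 70
k_list = np.arange(1, 71)

# Fit a KNeighborsRegressor for every k, then predict on the test data
knn_models = {}
for k in k_list:
    model = KNeighborsRegressor(n_neighbors=int(k))
    model.fit(x_train, y_train)
    knn_models[k] = model

# Predictions on the test set for the k values used in the plot
y_test_pred = {k: knn_models[k].predict(x_test) for k in (1, 10, 70)}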

Hints: ¶

np.argsort() Returns the indices that would sort an array.

df.iloc[] Purely integer-location based indexing; returns the subset of the dataframe selected by the row (and optionally column) positions passed as the argument.

plt.plot() Plot y versus x as lines and/or markers.

df.values Returns a Numpy representation of the DataFrame.

df.idxmin() Returns the index of the first occurrence of the minimum over the requested axis.

np.min() Returns the minimum along a given axis.

np.max() Returns the maximum along a given axis.

model.fit() Fit the k-nearest neighbors regressor from the training dataset.

model.predict() Predict the target for the provided data.

np.zeros() Returns a new array of given shape and type, filled with zeros.

train_test_split(X,y) Split arrays or matrices into random train and test subsets.

np.linspace() Returns evenly spaced numbers over a specified interval.

KNeighborsRegressor(n_neighbors=k_value) Regression based on k-nearest neighbors.
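A quick illustration of two of the hints above on toy data (the distances are made up purely for demonstration):

import numpy as np
import pandas as pd

dists = np.array([3.2, 0.5, 1.7])
print(np.argsort(dists))             # [1 2 0] -- indices that would sort the array
print(pd.Series(dists).idxmin())     # 1 -- index label of the first minimum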

Note: This exercise is auto-graded, so please remember to set all the parameters to the values mentioned in the scaffold before marking.

Part 1: KNN by hand for $k=1$

Plotting the data

Part 2: KNN for $k \ge 1$ using sklearn

⏸ In the plotting code above, re-run ax.plot(x_train, y_train, 'x', label='train', color='k') with x_test and y_test instead. According to you, which k value is the best, and why?
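The plotting code that the question refers to is not included in this extract. A minimal sketch of what it might look like, continuing from the Part 2 sketch above (x_train, y_train, and knn_models are assumed to exist from that sketch):

import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
x_grid = np.linspace(np.min(x_train), np.max(x_train), 100).reshape(-1, 1)

for ax, k in zip(axes, (1, 10, 70)):
    # Swap in x_test and y_test here to answer the question above
    ax.plot(x_train, y_train, 'x', label='train', color='k')
    ax.plot(x_grid, knn_models[k].predict(x_grid), label=f'kNN, k={k}')
    ax.set_xlabel('TV advertising budget')
    ax.set_ylabel('Sales')
    ax.legend()
plt.show()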

Machine Learning - K-Nearest Neighbors (KNN)


KNN is a simple, supervised machine learning (ML) algorithm that can be used for classification or regression tasks - and is also frequently used in missing value imputation. It is based on the idea that the observations closest to a given data point are the most "similar" observations in a data set, and we can therefore classify unforeseen points based on the values of the closest existing points. By choosing K , the user can select the number of nearby observations to use in the algorithm.

Here, we will show you how to implement the KNN algorithm for classification, and show how different values of K affect the results.

How does it work?

K is the number of nearest neighbors to use. For classification, a majority vote is used to determine which class a new observation should fall into. Larger values of K are often more robust to outliers and produce more stable decision boundaries than very small values (K=3 would be better than K=1, which might produce undesirable results).
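For instance, a hand-rolled majority vote over the K closest points might look like the sketch below (an illustration of the idea, not the sklearn internals):

import numpy as np
from collections import Counter

def knn_classify(point, data, classes, k):
    # Euclidean distance from the query point to every labelled point
    dists = np.linalg.norm(np.array(data) - np.array(point), axis=1)
    nearest = np.argsort(dists)[:k]           # indices of the k closest points
    votes = Counter(classes[i] for i in nearest)
    return votes.most_common(1)[0][0]         # majority class among the neighbors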

Start by visualizing some data points:
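A minimal snippet to do so, using the x, y, and classes lists defined in the example further down:

import matplotlib.pyplot as plt

x = [4, 5, 10, 4, 3, 11, 14, 8, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
classes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

plt.scatter(x, y, c=classes)    # color each point by its class label
plt.show()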

[Figure: scatter plot of the data points, colored by class]


Now we fit the KNN algorithm with K=1:

from sklearn.neighbors import KNeighborsClassifier

data = list(zip(x, y))
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(data, classes)

And use it to classify a new data point:

[Figure: the new point classified with K=1]

Now we do the same thing, but with a higher K value which changes the prediction:

[Figure: the new point classified with a larger K]

Example Explained

Import the modules you need.

You can learn about the Matplotlib module in our "Matplotlib Tutorial".

scikit-learn is a popular library for machine learning in Python.

import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

Create arrays that resemble variables in a dataset. We have two input features ( x and y ) and then a target class ( class ). The input features that are pre-labeled with our target class will be used to predict the class of new data. Note that while we only use two input features here, this method will work with any number of variables:

x = [4, 5, 10, 4, 3, 11, 14, 8, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
classes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

Turn the input features into a set of points:

data = list(zip(x, y))
print(data)

[(4, 21), (5, 19), (10, 24), (4, 17), (3, 16), (11, 25), (14, 24), (8, 22), (10, 21), (12, 21)]

Using the input features and target class, we fit a KNN model on the data using 1 nearest neighbor:

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(data, classes)

Then, we can use the same KNN object to predict the class of new, unforeseen data points. First we create new x and y features, and then call knn.predict() on the new data point to get a class of 0 or 1:

new_x = 8
new_y = 21
new_point = [(new_x, new_y)]
prediction = knn.predict(new_point)
print(prediction)

When we plot all the data along with the new point and class, we can see it's been labeled with the 0 class (its single nearest neighbor, (8, 22), belongs to class 0). The text annotation is just to highlight the location of the new point:

plt.scatter(x + [new_x], y + [new_y], c=classes + [prediction[0]])
plt.text(x=new_x-1.7, y=new_y-0.7, s=f"new point, class: {prediction[0]}")
plt.show()

However, when we change the number of neighbors to 5, the number of points used to classify our new point changes. As a result, so does the classification of the new point:

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(data, classes)
prediction = knn.predict(new_point)
print(prediction)

When we plot the class of the new point along with the older points, we note that the color has changed based on the associated class label:
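Re-running the same plotting call as before (re-using the variables from the snippets above) now picks up the new prediction made with K=5:

plt.scatter(x + [new_x], y + [new_y], c=classes + [prediction[0]])
plt.text(x=new_x-1.7, y=new_y-0.7, s=f"new point, class: {prediction[0]}")
plt.show()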

