K-Nearest Neighbours (KNN)

Description

K-Nearest Neighbours is a machine learning classification algorithm that assigns a class to an unknown data point based on the classes of the known data points nearest to it.

Simple Explanation

Fig. 1. Scatter plot of a hypothetical square class (green squares) and an x class (blue crosses) on a 2-dimensional grid. An unknown data point (red circle) is collected and classified with KNN. The purple line marks the boundary of the k=3 nearest neighbours, and the green line marks the boundary of the k=5 nearest neighbours.

The algorithm looks at the k data points nearest to the unknown and assigns the unknown the majority class among those neighbours. In Figure 1, when k is set to 3, the unknown has 2 x neighbours and 1 square neighbour, so the machine classifies it as an x. Expanding k to 5 yields only 2 x neighbours against 3 square neighbours, changing the unknown's class to square.
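
The voting logic is short enough to sketch directly. Below is a minimal plain-Python sketch of that majority vote; the coordinates are invented to mirror the Figure 1 scenario rather than taken from any real dataset. With k=3 the x class wins the vote 2 to 1, and with k=5 the vote flips to 3 squares against 2 x's.

import math
from collections import Counter

def knn_classify(points, labels, query, k=3):
    # Sort the labelled points by Euclidean distance to the query point
    by_distance = sorted(zip(points, labels),
                         key=lambda pair: math.dist(pair[0], query))
    # Majority vote among the k closest labels
    nearest = [label for _, label in by_distance[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical coordinates chosen to recreate the Figure 1 situation
points = [(1.0, 1.1), (1.2, 0.9), (2.5, 2.4), (2.6, 2.2), (2.4, 2.6)]
labels = ["x", "x", "square", "square", "square"]
print(knn_classify(points, labels, query=(1.5, 1.4), k=3))  # x (2 x's vs 1 square)
print(knn_classify(points, labels, query=(1.5, 1.4), k=5))  # square (3 squares vs 2 x's)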

Parameters

Parameter    Description
k            the number of nearest neighbours to take into account when making a classification

MLGenius will help you determine the best k to use for your dataset.
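
For readers curious how such a search can work, the sketch below scores candidate values of k by cross-validated accuracy, which is one common approach. The toy dataset and the use of scikit-learn are assumptions made for illustration; this is not a description of MLGenius's internal search.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Toy two-class dataset standing in for "your dataset"
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Score odd values of k by 5-fold cross-validated accuracy
# (odd k avoids tied votes in a two-class problem)
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 20, 2)}
best_k = max(scores, key=scores.get)
print(f"best k = {best_k} (mean accuracy {scores[best_k]:.3f})")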

Additional Information

It is possible to define a custom distance function to control how the separation between two data points is calculated. You can do so by defining your own formula for each dimension or by weighting some dimensions more heavily than others. By default, MLGenius uses Euclidean distance, the straight-line geometric distance between two points. Please contact MLGenius if you wish to customize the distance function.
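
To illustrate the idea, the sketch below defines a per-dimension weighted distance and plugs it into scikit-learn's KNeighborsClassifier, which accepts a callable metric. scikit-learn here is an assumed stand-in used for illustration; MLGenius's own customization hook may look different.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical per-dimension weights: the first dimension counts four
# times as much as the second in the squared sum (2.0 vs 0.5)
WEIGHTS = np.array([2.0, 0.5])

def weighted_distance(a, b):
    # Generalized Euclidean distance; setting WEIGHTS to all ones
    # recovers the plain Euclidean formula
    return np.sqrt(np.sum(WEIGHTS * (a - b) ** 2))

model = KNeighborsClassifier(n_neighbors=3, metric=weighted_distance)
model.fit([[1.0, 1.1], [1.2, 0.9], [2.5, 2.4], [2.6, 2.2], [2.4, 2.6]],
          ["x", "x", "square", "square", "square"])
print(model.predict([[1.5, 1.4]]))  # -> ['x'] under this weighting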

Pros and Cons

Pros
  • flexible to implement your own distance function
  • can learn complex non-linear datasets without extra work
  • insensitive to outliers

Cons
  • computationally expensive
  • requires effort to determine a good distance function
  • poor performance on noisy data