bitknn

import "github.com/keilerkonzept/bitknn"

bitknn is a fast k-nearest neighbors (k-NN) library for uint64s, using (bitwise) Hamming distance.

If you need to classify binary feature vectors that fit into uint64s, this library might be useful. It is fast mainly because distances between uint64 values can be computed with cheap bitwise operations (XOR + POPCNT). For smaller datasets, the performance of the neighbor heap also matters, so that part has been tuned as well.
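For intuition: the Hamming distance between two packed uint64 keys is the popcount of their XOR. A minimal standalone sketch of this computation using only the standard library (an illustration, not bitknn's internal code):

package main

import (
    "fmt"
    "math/bits"
)

// hamming counts the bits in which a and b differ: XOR, then POPCNT.
func hamming(a, b uint64) int {
    return bits.OnesCount64(a ^ b)
}

func main() {
    fmt.Println(hamming(0b101010, 0b101011)) // 1
}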

If your vectors are longer than 64 bits, you can pack them into []uint64 and classify them using the "wide" model variants. On ARM64 with NEON vector instruction support, bitknn can be a bit faster still on wide data.

You can optionally weigh class votes by distance, or specify different vote values per data point.

Contents

    Usage
        Basic usage
        Packing wide data
        ARM64 NEON Support
    Options
    Benchmarks
    License

Usage

There are just three methods you'll typically need:

    Fit: fit a model on a dataset of packed vectors and their class labels
    Find: return the distances and indices of the k nearest neighbors to a query point
    Predict: classify a query point by tallying the class votes of its k nearest neighbors

Each of the above methods is available on either model type:

    bitknn.Model (via bitknn.Fit): for feature vectors that fit into a single uint64
    bitknn.WideModel (via bitknn.FitWide): for longer feature vectors, packed into []uint64

Basic usage

package main

import (
    "fmt"

    "github.com/keilerkonzept/bitknn"
)

func main() {
    // feature vectors packed into uint64s
    data := []uint64{0b101010, 0b111000, 0b000111}
    // class labels
    labels := []int{0, 1, 1}

    model := bitknn.Fit(data, labels, bitknn.WithLinearDistanceWeighting())

    // one vote counter per class
    votes := make([]float64, 2)

    k := 2
    model.Predict(k, 0b101011, bitknn.VoteSlice(votes))
    // or, just return the nearest neighbors' distances and indices:
    // distances,indices := model.Find(k, 0b101011)

    fmt.Println("Votes:", bitknn.votes)

    // you can also use a map for the votes.
    // this is good if you have a very large number of different labels:
    votesMap := make(map[int]float64)
    model.Predict(k, 0b101011, bitknn.VoteMap(votesMap))
    fmt.Println("Votes for 0:", votesMap[0])
}
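Predict tallies votes into the counter you pass in; to get a predicted label, take the class with the highest vote count. Continuing the example above (inside main, after the Predict call):

    // pick the class with the most votes (argmax over the vote slice)
    predicted := 0
    for class, v := range votes {
        if v > votes[predicted] {
            predicted = class
        }
    }
    fmt.Println("Predicted class:", predicted)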

Packing wide data

If your vectors are longer than 64 bits, you can still use bitknn if you pack them into []uint64. The pack package provides helper functions to pack strings and []byte values into []uint64s.

It's faster to use a [][]uint64 allocated using a flat backing slice, laid out in one contiguous memory block. If you already have a non-contiguous [][]uint64, you can use pack.ReallocateFlat to re-allocate the dataset using a flat 1d backing slice.
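For example (a sketch; the exact pack.ReallocateFlat signature is an assumption here, including whether it returns the re-allocated dataset or modifies it in place, so check the pack package docs):

package main

import (
    "fmt"

    "github.com/keilerkonzept/bitknn/pack"
)

func main() {
    // packed row by row: each row may have its own backing array (non-contiguous)
    data := [][]uint64{
        pack.String("foo"),
        pack.String("bar"),
        pack.String("baz"),
    }

    // re-allocate into one contiguous backing slice before fitting
    // (assumption: ReallocateFlat returns the flattened dataset)
    data = pack.ReallocateFlat(data)

    fmt.Println("rows:", len(data))
}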

The wide model fitting function is bitknn.FitWide and accepts the same Options as the "narrow" one:

package main

import (
    "fmt"

    "github.com/keilerkonzept/bitknn"
    "github.com/keilerkonzept/bitknn/pack"
)

func main() {
    // feature vectors packed into uint64s
    data := [][]uint64{
    	pack.String("foo"),
    	pack.String("bar"),
    	pack.String("baz"),
    }
    // class labels
    labels := []int{0, 1, 1}

    model := bitknn.FitWide(data, labels, bitknn.WithLinearDistanceWeighting())

    // one vote counter per class
    votes := make([]float64, 2)

    k := 2
    query := pack.String("fob")
    model.Predict(k, query, bitknn.VoteSlice(votes))

    fmt.Println("Votes:", votes)
}

ARM64 NEON Support

For ARM64 CPUs with NEON instructions, bitknn has a vectorized distance function for []uint64s that is about twice as fast as what the compiler generates.

When run on such a CPU, the *V methods WideModel.FindV and WideModel.PredictV are noticeably faster than the regular Find/Predict:

Bits N k Find s/op FindV s/op diff
128 1000 3 2.374µ ± 0% 1.792µ ± 0% -24.54% (p=0.000 n=10)
128 1000 10 2.901µ ± 1% 2.028µ ± 1% -30.08% (p=0.000 n=10)
128 1000 100 5.472µ ± 3% 4.359µ ± 1% -20.34% (p=0.000 n=10)
128 1000000 3 2.273m ± 3% 1.380m ± 2% -39.27% (p=0.000 n=10)
128 1000000 10 2.261m ± 1% 1.406m ± 1% -37.84% (p=0.000 n=10)
128 1000000 100 2.289m ± 0% 1.425m ± 2% -37.76% (p=0.000 n=10)
640 1000 3 6.201µ ± 1% 3.716µ ± 0% -40.07% (p=0.000 n=10)
640 1000 10 6.728µ ± 1% 3.973µ ± 1% -40.96% (p=0.000 n=10)
640 1000 100 10.855µ ± 2% 6.917µ ± 1% -36.28% (p=0.000 n=10)
640 1000000 3 5.832m ± 2% 3.337m ± 1% -42.78% (p=0.000 n=10)
640 1000000 10 5.830m ± 5% 3.339m ± 1% -42.73% (p=0.000 n=10)
640 1000000 100 5.872m ± 1% 3.361m ± 1% -42.77% (p=0.000 n=10)
8192 1000000 10 72.66m ± 1% 30.96m ± 3% -57.39% (p=0.000 n=10)
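To use the vectorized path explicitly, call the *V methods on a fitted wide model. A minimal sketch, assuming FindV and PredictV take the same arguments as Find and Predict:

package main

import (
    "fmt"

    "github.com/keilerkonzept/bitknn"
    "github.com/keilerkonzept/bitknn/pack"
)

func main() {
    data := [][]uint64{
        pack.String("foo"),
        pack.String("bar"),
        pack.String("baz"),
    }
    labels := []int{0, 1, 1}

    model := bitknn.FitWide(data, labels, bitknn.WithLinearDistanceWeighting())

    votes := make([]float64, 2)
    k := 2
    // PredictV: the vectorized variant of Predict (NEON distance kernel on ARM64)
    model.PredictV(k, pack.String("fob"), bitknn.VoteSlice(votes))
    fmt.Println("Votes:", votes)
}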

Options

The examples above use bitknn.WithLinearDistanceWeighting(); see the package documentation for the full set of model options, including distance weighting and per-data-point vote values.

Benchmarks

goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/bitknn
cpu: Apple M1 Pro
Bits N k Model Op s/op B/op allocs/op
64 100 3 Model Predict 99.06n ± 2% 0 0
64 100 3 WideModel Predict 191.6n ± 1% 0 0
64 100 3 Model Find 88.09n ± 0% 0 0
64 100 3 WideModel Find 182.8n ± 1% 0 0
64 100 10 Model Predict 225.1n ± 1% 0 0
64 100 10 WideModel Predict 372.0n ± 1% 0 0
64 100 10 Model Find 202.9n ± 1% 0 0
64 100 10 WideModel Find 345.2n ± 0% 0 0
64 1000 3 Model Predict 538.2n ± 1% 0 0
64 1000 3 WideModel Predict 1.469µ ± 1% 0 0
64 1000 3 Model Find 525.8n ± 1% 0 0
64 1000 3 WideModel Find 1.465µ ± 1% 0 0
64 1000 10 Model Predict 835.4n ± 1% 0 0
64 1000 10 WideModel Predict 1.880µ ± 1% 0 0
64 1000 10 Model Find 807.4n ± 0% 0 0
64 1000 10 WideModel Find 1.867µ ± 2% 0 0
64 1000 100 Model Predict 3.718µ ± 0% 0 0
64 1000 100 WideModel Predict 4.935µ ± 0% 0 0
64 1000 100 Model Find 3.494µ ± 0% 0 0
64 1000 100 WideModel Find 4.701µ ± 0% 0 0
64 1000000 3 Model Predict 458.8µ ± 0% 0 0
64 1000000 3 WideModel Predict 1.301m ± 1% 0 0
64 1000000 3 Model Find 457.9µ ± 1% 0 0
64 1000000 3 WideModel Find 1.302m ± 1% 0 0
64 1000000 10 Model Predict 456.9µ ± 0% 0 0
64 1000000 10 WideModel Predict 1.295m ± 2% 0 0
64 1000000 10 Model Find 457.6µ ± 1% 0 0
64 1000000 10 WideModel Find 1.298m ± 1% 0 0
64 1000000 100 Model Predict 474.5µ ± 1% 0 0
64 1000000 100 WideModel Predict 1.316m ± 1% 0 0
64 1000000 100 Model Find 466.9µ ± 0% 0 0
64 1000000 100 WideModel Find 1.306m ± 0% 0 0
128 100 3 WideModel Predict 296.7n ± 0% 0 0
128 100 3 WideModel Find 285.8n ± 0% 0 0
128 100 10 WideModel Predict 467.4n ± 1% 0 0
128 100 10 WideModel Find 441.1n ± 1% 0 0
640 100 3 WideModel Predict 654.6n ± 1% 0 0
640 100 3 WideModel Find 640.3n ± 1% 0 0
640 100 10 WideModel Predict 850.0n ± 1% 0 0
640 100 10 WideModel Find 825.0n ± 0% 0 0
128 1000 3 WideModel Predict 2.384µ ± 0% 0 0
128 1000 3 WideModel Find 2.374µ ± 0% 0 0
128 1000 10 WideModel Predict 2.900µ ± 0% 0 0
128 1000 10 WideModel Find 2.901µ ± 1% 0 0
128 1000 100 WideModel Predict 5.630µ ± 1% 0 0
128 1000 100 WideModel Find 5.472µ ± 3% 0 0
128 1000000 3 WideModel Predict 2.266m ± 0% 0 0
128 1000000 3 WideModel Find 2.273m ± 3% 0 0
128 1000000 10 WideModel Predict 2.269m ± 0% 0 0
128 1000000 10 WideModel Find 2.261m ± 1% 0 0
128 1000000 100 WideModel Predict 2.295m ± 1% 0 0
128 1000000 100 WideModel Find 2.289m ± 0% 0 0
640 1000 3 WideModel Predict 6.214µ ± 2% 0 0
640 1000 3 WideModel Find 6.201µ ± 1% 0 0
640 1000 10 WideModel Predict 6.777µ ± 1% 0 0
640 1000 10 WideModel Find 6.728µ ± 1% 0 0
640 1000 100 WideModel Predict 11.16µ ± 2% 0 0
640 1000 100 WideModel Find 10.85µ ± 2% 0 0
640 1000000 3 WideModel Predict 5.756m ± 4% 0 0
640 1000000 3 WideModel Find 5.832m ± 2% 0 0
640 1000000 10 WideModel Predict 5.842m ± 1% 0 0
640 1000000 10 WideModel Find 5.830m ± 5% 0 0
640 1000000 100 WideModel Predict 5.914m ± 6% 0 0
640 1000000 100 WideModel Find 5.872m ± 1% 0 0

License

MIT License