This work was sponsored by enioka.
Given a finite set of objects pool
, and an object
query
. Look within approximately t
time, at
most n
objects subset of pool
that are near
query
according to the metric m
, call that
subset candidates
.
That is the task.
Most popular approach nowadays, and the one I could start to get under control is to rely on Hierarchical Navigable Small Worlds. HNSW is reminiscent of Ben Goertzel glocals from opencog fame.
HNSW datastructure use a deterministic heuristic with the distance
function to construct clusters, still, it use random to spread pression
on memory, and compute. It looks like a skip-list with multiple path
ways. A regular natural sort will not speed queries, because of that
glocals are not sorted, instead nodes are related to points in a higher
level layer. The higher the level the less points they are. The highest
level has only one point. Each point has a maximum x
neighbor in a lower level. Horizontal navigation must be done through
the parent point.
While I put that whole on a flashcard, you can delice with the
following brute force algorithm that will generate a pool of points
nearest to a point query
:
(import (chezscheme))
(import (letloop r999))
(import (scheme generator))
;; brute force algorithm
define total (string->number (or (getenv "HYPER_TOTAL") "65536")))
(define exponent (string->number (or (getenv "HYPER_EXPONENT") "8")))
(
define pk
(lambda args
(write args)
(newline)
(car (reverse args))))
(
define point-distance
(lambda (a b) (abs (- a b))))
(
define point-random (lambda () (random (expt 2 exponent))))
(
define pool (map (lambda _ (point-random)) (iota total)))
(define query (point-random))
(
define ~.two
(lambda (x)
(inexact (/ (round (* x 100)) 100))))
(
define candidates
(let* ((objects (map (lambda (x) (cons (point-distance query x) x)) pool))
(lambda (u v) (< (car u) (car v))) objects))
(objects* (sort (max (map car objects*))))
(distance-max (apply lambda (x) (cons (~.two (/ (- distance-max (car x)) distance-max))
(map (cdr x)))
(
objects*)))
string-upcase (number->string query 36)) query)
(pk 'query (newline)
(for-each pk (map (lambda (x) (list (car x) (number->string (cdr x) 36) (cdr x))) candidates)) (