This work was sponsored by enioka.
The task one is:
Given a finite set of objects
pool
, and an objectquery
. Look, which ever comes first, within approximatelyt
time, or at mostn
objects subset ofpool
that are nearquery
according to the metricm
, call that subsetcandidates
.
Most popular approach nowadays, and the one I could start to get under control is to rely on Hierarchical Navigable Small Worlds. HNSW is reminiscent of Ben Goertzel glocals from opencog fame.
HNSW datastructure use a deterministic heuristic with the distance
function to construct clusters, still, it use random to spread pression
on memory, and compute. It looks like a skip-list with multiple path
ways. A regular natural sort will not speed queries, because of that
glocals are not sorted, instead nodes are related to points in a higher
level layer. The higher the level the less points they are. The highest
level has only one point. Each point has a maximum x
neighbor in a lower level. Horizontal navigation must be done through
the parent point.
While I put that whole on a flashcard, you can delice with the
following brute force algorithm that will generate a pool of points
nearest to a point query
:
(import (chezscheme))
(import (letloop r999))
(import (scheme generator))
;; brute force algorithm
define total (string->number (or (getenv "HYPER_TOTAL") "65536")))
(define exponent (string->number (or (getenv "HYPER_EXPONENT") "8")))
(
define pk
(lambda args
(write args)
(newline)
(car (reverse args))))
(
define point-distance
(lambda (a b) (abs (- a b))))
(
define point-random (lambda () (random (expt 2 exponent))))
(
define pool (map (lambda _ (point-random)) (iota total)))
(define query (point-random))
(
define ~.two
(lambda (x)
(inexact (/ (round (* x 100)) 100))))
(
define candidates
(let* ((objects (map (lambda (x) (cons (point-distance query x) x)) pool))
(lambda (u v) (< (car u) (car v))) objects))
(objects* (sort (max (map car objects*))))
(distance-max (apply lambda (x) (cons (~.two (/ (- distance-max (car x)) distance-max))
(map (cdr x)))
(
objects*)))
string-upcase (number->string query 36)) query)
(pk 'query (newline)
(for-each pk (map (lambda (x) (list (car x) (number->string (cdr x) 36) (cdr x))) candidates)) (
home.