SIMD Support for Proximity Based Queries in Cottontail DB (Bachelor Thesis, Finished)

Author

Colin Saladin

Description

Cottontail DB is a database for multimedia retrieval and has served as a backend for the vitrivr stack for several years now. It seamlessly combines the execution of Boolean as well as similarity search in a unified query- and execution-model. Mostly, similarity search relies on the evaluation of distances between often high-dimensional feature vectors (Nearest Neighbor Search / Farthest Neighbor Search).

The ever-increasing volume of unstructured data poses a challenge to modern multimedia retrieval systems like Cottontail DB. The soaring size of stored data leads to a degradation in query execution times for content-based search. Approximation methods like Locality Sensitive Hashing (LSH) or Product Quantization (PQ) provide an uneasy trade-off at best and even though, both methods significantly accelerate the execution times, they lack the precision required for certain use cases in Nearest Neighbor Search (NNS).

The problem with exact methods usually comes from the fact that they are very time-consuming and computationally challenging. Therefore, in this Bachelor Thesis, we exploit data-level parallelism through a CPU’s Single Instruction, Multiple Data (SIMD) operations to achieve improved performance for the computation of NNS queries on Cottontail DB without losing any of the accuracy. Upon successful implementation, we evaluate the performance of the implementations with and without SIMD, as well as the impact of intra-query parallelism in both versions. The results reveal substantial differences between the two distance function versions. Additionally, we were able to write a command that determines a threshold value, corresponding to a feature-vector dimension, at which the usage of the vectorized version of the distance function is advantageous. As a result, we wrote a planner rule that decides on which version to use, based on the dimension of the input vector.

Start / End Dates

2022/03/07 - 2022/07/06

Supervisors

Research Topics