SanDBoX - A Modular Component-based Multimedia Information Retrieval Framework (Master Thesis, Finished)
Author
Pascal Düblin
Description
Database research has a long history and tradition. Database systems that are still in prominent use, such as PostgreSQL and MySQL, date back to the 1980s and 1990s and rely on software engineering concepts and architectures that are no longer up to date. This project addresses an active topic in current database and information retrieval research. The goal is to lay out the concepts and take a hybrid approach to bridging the gap between database management systems and NoSQL/NewSQL technologies for information retrieval purposes. This includes, on the one hand, modeling the search problem from a theoretical perspective and, on the other hand, implementing a software layer that is able to take over this task. Through the use of distribution techniques, the system should be able to scale to Big Data sizes.
Technologies
Java, possibly: Apache UIMA, Memcached, Apache Spark, Galago, HBase, LevelDB, Lucene, Solr, MongoDB, Hadoop, FlumeJava, etc.
Literature
Fuhr, N. (2014). Bridging Information Retrieval and Databases.
Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U. (1999). When Is “Nearest Neighbor” Meaningful?
de Vries, A. (1999). Content and Multimedia Database Management Systems.
Yui, M., Kojima, I. (2014). A Database-Hadoop Hybrid Approach to Scalable Machine Learning.
Moise, D., Shestakov, D., Gudmundsson, G., Amsaleg, L. (2014). Indexing and Searching 100M Images with Map-Reduce.
Kiranyaz, S., Gabbouj, M. (2014). Content-Based Management of Multimedia Databases. http://baselbern.swissbib.ch/Record/31573194X