Hybrid Human-Machine Information Systems for Data Classification

Shaban Shabani
PhD Thesis
Appears in: PhD Thesis, Department of Mathematics and Computer Science
University of Basel, Switzerland

Over the last decade, we have seen intense development of machine learning approaches for solving various tasks in diverse domains. Despite the remarkable advancements in this field, there are still task categories for which machine learning models fall short of the required accuracy. This is the case with tasks that require human cognitive skills, such as sentiment analysis and emotional or contextual understanding. On the other hand, human-based computation approaches, such as crowdsourcing, are popular for solving such tasks. Crowdsourcing enables access to a vast number of groups with different expertise and, if managed properly, generates high-quality results. However, crowdsourcing as a standalone approach is not scalable due to the latency and cost it incurs.

Addressing the distinct challenges and limitations of the human-based and machine-based approaches requires bridging the two fields into a hybrid intelligence, which is seen as a promising approach to solving critical and complex real-world tasks. This thesis focuses on hybrid human-machine information systems, which combine machine and human intelligence and leverage their complementary strengths: the data processing efficiency of machine learning and the data quality generated by crowdsourcing.

In this thesis, we present hybrid human-machine models to address challenges along three dimensions: accuracy, latency, and cost. Data classification tasks in different domains have different requirements with respect to these criteria. Motivated by this fact, we present a master component that evaluates these criteria to find a suitable model as a trade-off solution. In hybrid human-machine information systems, incorporating human judgments is expected to improve the accuracy of the system. To ensure this, we focus on the human intelligence component, integrating profile-aware crowdsourcing for task assignment and data quality control mechanisms into the hybrid pipelines.

The proposed conceptual hybrid human-machine models are realized in a series of experiments. Motivated by challenging scenarios and using real-world datasets, we implement the hybrid models in three experiments. Evaluations show that the implemented hybrid human-machine architectures for data classification tasks lead to better results than either of the two approaches individually, improving the overall accuracy at an acceptable cost and latency.
