The Case for a Learned Sorting Algorithm

Authors

Ani Kristo, Kapil Vaidya, Ugur

Brown University; MIT; Intel Labs

Abstract

Sorting is one of the most fundamental algorithms in Computer Science and a common operation in databases, not just for sorting query results but also as part of joins (e.g., sort-merge join) and indexing. In this work, we introduce a new type of distribution sort that leverages a learned model of the empirical CDF of the data. Our algorithm uses the model to efficiently approximate the scaled empirical CDF of each record key and map the key to the corresponding position in the output array. We then apply a deterministic sorting algorithm that works well on nearly-sorted arrays (e.g., Insertion Sort) to establish a totally sorted order. We compared this algorithm against common sorting approaches and measured its performance for up to 1 billion normally-distributed double-precision keys. The results show that our approach yields an average 3.38x performance improvement over C++ STL sort, which is an optimized Quicksort hybrid, a 1.49x improvement over sequential Radix Sort, and a 5.54x improvement over a C++ implementation of Timsort, which is the default sorting function for Java and Python.
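The pipeline described in the abstract (predict a scaled CDF per key, place the key near its predicted position, then repair with Insertion Sort) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the model, so a sorted random sample queried by binary search stands in for the learned CDF model, and per-position buckets are an assumed way of handling keys that map to the same slot. All function names and the `sample_size` parameter are hypothetical.

```python
import bisect
import random


def fit_cdf_model(keys, sample_size=1000, seed=0):
    """Stand-in for the learned model (assumption): a sorted random sample."""
    rng = random.Random(seed)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    sample.sort()
    return sample


def predict_cdf(model, key):
    """Approximate the empirical CDF of `key`, in [0, 1], from the sample."""
    return bisect.bisect_right(model, key) / len(model)


def insertion_sort(a):
    """In-place insertion sort; cheap when `a` is already nearly sorted."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a


def cdf_distribution_sort(keys, sample_size=1000):
    """Sort a list of keys by CDF-predicted placement plus a repair pass."""
    n = len(keys)
    if n < 2:
        return list(keys)
    model = fit_cdf_model(keys, sample_size)
    # One small bucket per output position absorbs collisions (assumption).
    buckets = [[] for _ in range(n)]
    for k in keys:
        # Scale the CDF estimate to an output index, clamped to the array.
        pos = min(n - 1, int(predict_cdf(model, k) * n))
        buckets[pos].append(k)
    # The predictor is monotone, so only keys within a bucket are out of order.
    nearly_sorted = [k for b in buckets for k in b]
    return insertion_sort(nearly_sorted)
```

For example, `cdf_distribution_sort([random.gauss(0, 1) for _ in range(2000)])` returns the same list as Python's built-in `sorted`. The repair pass stays cheap only while the CDF estimate is accurate; with a coarse sample, buckets grow and Insertion Sort does proportionally more work.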
