I found an adversarial input for std::sort when looking at the visualization of this permutation. The idea is to let STL algorithms degenerate and call std::partial_sort on a subarray, and we want the subarray to be as long as possible (for random inputs these algorithms seldom do that). The code for the generating algorithm is as follows:
void generate(std::vector& arr) {
for (size_t i = 0; i < arr.size(); i++) arr[i] = i + 1;
size_t i = 4, j = arr.size() - 1;
while (i < j) {
std::swap(arr[i], arr[j]);
i += 2;
j -= 2;
}
std::swap(arr[0], arr[arr.size() - 1]);
}
According to my benchmark, it pushes both GCC std::sort and MSVC std::sort into heap sort case. The following table shows CPU time (clock()), array access count and comparison count in the format time (access/comparison). The criteria is: If the array access count is greater than 3.5*n*log2(n) then we assume that the algorithm ran into heap sort.
| n | GCC (-O3) | MSVC (/O4) |
|---|---|---|
| 131072 | 27.023ms (18566233 / 6889195) | 19ms (8202054 / 3152744) |
| 1048576 | 158.886ms (173777132 / 64573900) | 100ms (78129920 / 30096638) |
| 16777216 | 2499.614ms (3317506540 / 1234567896) | 1774ms (1498187752 / 580803613) |
| 134217728 | 22777.134ms (29766865113 / 11086205472) | 15704ms (13527065651 / 5255846034) |
I have three questions:
Can it degenerate C
qsortinto O(n^2)?What happens with other libraries from other compilers (e.g. clang)?
Do
std::sorts give better performance on this permutation or the adversarial one in the paper I linked?
Remark. The adversarial input in the paper could also be generated using the following code:
void generate(std::vector& arr) {
size_t num = 1;
std::map > positions;
auto lowbit = [=](size_t x) -> size_t {
return x & (-x);
};
for (size_t i = 0; i < arr.size() / 2; i++) {
size_t lb = lowbit(i + 1);
positions[lb].push_back(i);
}
for (auto& p : positions) {
for (size_t pos : p.second) {
arr[pos] = num;
num += 2;
}
}
for (size_t i = arr.size() / 2; i < arr.size(); i++) arr[i] = (i - arr.size() / 2) * 2 + 2;
}