Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assertEquals is very slow for large Set/Map obj due to O(n^2) comparison algorithm #5903

Open
lionel-rowe opened this issue Sep 4, 2024 · 6 comments
Labels
assert enhancement New feature or request expect PR welcome A pull request for this issue would be welcome

Comments

@lionel-rowe
Copy link
Contributor

Describe the bug

assertEquals is very slow when comparing large Set and Map objects due to using an O(n^2) comparison algorithm:

std/assert/equal.ts

Lines 87 to 99 in 7e26160

for (const [aKey, aValue] of a.entries()) {
for (const [bKey, bValue] of b.entries()) {
/* Given that Map keys can be references, we need
* to ensure that they are also deeply equal */
if (
(aKey === aValue && bKey === bValue && compare(aKey, bKey)) ||
(compare(aKey, bKey) && compare(aValue, bValue))
) {
unmatchedEntries--;
break;
}
}
}

It looks like there's the same issue in the expect version too.

Steps to Reproduce

import { assertEquals } from "jsr:@std/assert";

const arr1 = Array.from({ length: 1e4 }, (_, i) => i)
const arr2 = [...arr1]
const set1 = new Set(arr1)
const set2 = new Set(arr2)

console.time('compare arr')
assertEquals(arr1, arr2);
console.timeEnd('compare arr')

console.time('compare set')
assertEquals(set1, set2);
console.timeEnd('compare set')

Results:

compare arr: 14ms
compare set: 1178ms

Results get exponentially worse as the size increases.

Expected behavior

assertEquals of Set and array (or Map and obj) to take approximately equal time.

Environment

  • OS: Ubuntu 20.04, WSL
  • deno version: 1.46.0
  • std version: assert@1.0.3
@lionel-rowe lionel-rowe added bug Something isn't working needs triage labels Sep 4, 2024
@iuioiua iuioiua added enhancement New feature or request assert expect PR welcome A pull request for this issue would be welcome and removed bug Something isn't working needs triage labels Sep 4, 2024
@iuioiua
Copy link
Contributor

iuioiua commented Sep 4, 2024

PRs that improve the performance of assertEquals() for this scenario are welcome.

@lionel-rowe
Copy link
Contributor Author

Two possible directions to explore:

const aEntries = [...a.entries()].sort();
const bEntries = [...b.entries()].sort();

for (const [idx, [aKey, aValue]] of aEntries.entries()) {
  const [bKey, bValue] = bEntries[idx]!;
  if (!compare(aKey, bKey) || !compare(aValue, bValue)) {
    return false;
  }
}

return true;

This passes for all current tests but I think it'll fail for more complex examples as sort() just stringifies its args and doesn't modify order if the strings are equal (so [{ b: 2 }, { a: 1 }].sort() is [{ b: 2 }, { a: 1 }] because `"[object Object]" === "[object Object]").

const keySet = new Set<unknown>();
const valSet = new Set<unknown>();

for (const input of [a, b]) {
  for (const [key, value] of input.entries()) {
    keySet.add(key);
    valSet.add(value);
  }
}

return keySet.size === a.size && valSet.size === a.size;

This fails some of the current tests as sets are only unique by reference, not value.

Basically, the problem in both cases is not having a unique-by-value, reference-equal, serialized form.

@Leokuma
Copy link

Leokuma commented Sep 4, 2024

I made this comparison with Lodash isEqual(). The difference is not as big I expected. Lodash code looks hacky.

import "https://raw.githubusercontent.com/lodash/lodash/4.17.10-npm/lodash.js";
import { assertEquals } from "jsr:@std/assert";

const arr1 = Array.from({ length: 10_000 }, (_, i) => i)
const arr2 = [...arr1]
const set1 = new Set(arr1)
const set2 = new Set(arr2)

let result: boolean

console.time('compare set std')
assertEquals(set1, set2);
console.timeEnd('compare set std')

console.time('compare set lodash')
result = _.isEqual(set1, set2)
console.timeEnd('compare set lodash')
console.log(result)
❯ deno run -A set.ts
compare set std: 616ms
compare set lodash: 317ms
true

@lionel-rowe
Copy link
Contributor Author

A partial fix would be to have a fast path for cases where all keys of both are primitives, which would certainly have worked for the use case where I discovered this limitation, and maybe for most use cases where you'd be likely to want to compare large Sets/Maps using assertEquals? Typically where the keys aren't primitive, you don't want to compare keys by value (because equality is determined by reference for purposes of setting/getting/etc).

@Leokuma
Copy link

Leokuma commented Sep 4, 2024

Talking only about Sets, if all the values in either Set of the comparison has only primitives (you don't even need to check both Sets, just one of them), then you can compare them instantly with:

set1.symmetricDifference(set2).size == 0

I think this optimization is worth it because a Set usually has only primitives or only objects. I believe it's not common that a Set is populated with both primitives and objects. But I could be wrong.

@lionel-rowe
Copy link
Contributor Author

lionel-rowe commented Sep 5, 2024

Talking only about Sets, if all the values in either Set of the comparison has only primitives (you don't even need to check both Sets, just one of them), then you can compare them instantly with:

set1.symmetricDifference(set2).size == 0

I think this optimization is worth it because a Set usually has only primitives or only objects. I believe it's not common that a Set is populated with both primitives and objects. But I could be wrong.

Yeah I misspoke, only one set/map needs checking for exclusively primitive keys (because if the other has some non-primitive keys, at least one of the primitive keys must be missing due to having already passed same-size check). Probably worth adding symmetricDifference as an extra-fast-path though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assert enhancement New feature or request expect PR welcome A pull request for this issue would be welcome
Projects
None yet
Development

No branches or pull requests

3 participants