Skip to content

Commit

Permalink
scripts: west_commands: fix and optimize ncs-compare heuristic
Browse files Browse the repository at this point in the history
The heuristics currently used have two problems:

problem 1. the comments falsely claim we match commits by author
problem 2. heuristic 2 is no longer necessary now that we do not
           truncate downstream commit titles to make room for
           sauce tags

A corollary of problem 1 is that they're also unnecessarily slow: we
could just compare each downstream commit against upstream commits
with the same author, but instead we waste time with unnecessary
comparisons against (potentially thousands of) commits with different
authors.

Fix both issues, using the email field in a pygit2.Signature object as
the author identifier. Emails will have normalized for any diacritic
marks that may occur in people's names.

Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
  • Loading branch information
mbolivar-nordic committed Mar 7, 2023
1 parent e80d285 commit f428958
Showing 1 changed file with 27 additions and 34 deletions.
61 changes: 27 additions & 34 deletions scripts/west_commands/ncs_west_helpers.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@

from __future__ import annotations

import collections
from dataclasses import dataclass
from pathlib import Path
import subprocess
Expand Down Expand Up @@ -274,59 +275,51 @@ def _downstream_outstanding_commits(self) -> list[pygit2.Commit]:
def _likely_merged_commits(self) -> dict[pygit2.Commit,
list[pygit2.Commit]]:
# Compute patches which are downstream and probably were
# merged upstream, using the following heuristics:
# merged upstream, by looking for downstream patches with
# small title edit distances from upstream patches.
#
# 1. downstream patches with small title edit distances
# from upstream patches
# Using edit distance catches patches with typos in the titles
# that reviewers asked to be fixed.
#
# 2. downstream patches with titles that are prefixes of
# upstream patches
#
# Heuristic #1 catches patches with typos in the titles
# that reviewers asked to be fixed, etc. E.g. upstream
# title
# For example, we want this upstream title:
#
# Bluetoth: do foo
#
# matches downstream title
# to be matched with this downstream title:
#
# [nrf fromlist] Bluetooth: do foo
#
# Heuristic #2 catches situations where we had to shorten our
# downstream title to fit the "[nrf xyz]" sauce tag at the
# beginning and still fit within CI's title length
# restrictions. E.g. upstream title
#
# subsys: do a thing that is very useful for everyone
#
# matches downstream title
#
# [nrf fromlist] subsys: do a thing that is very
# assuming both commits have the same author.
#
# The return value is a map from pygit2 commit objects for
# downstream patches, to a list of pygit2 commit objects that
# are upstream patches which have similar titles and the
# same authors.
# are upstream patches which have similar titles and the same
# authors.

email2title_commit = collections.defaultdict(list)
for upstream_commit in self.upstream_new:
email = upstream_commit.author.email
email2title_commit[email].append(
(commit_title(upstream_commit), upstream_commit))

likely_merged = {}
for dc in self.downstream_outstanding:
sl = commit_title(dc)
for downstream_commit in self.downstream_outstanding:
downstream_email = downstream_commit.author.email

def ed(upstream_commit):
return editdistance.eval(
title_no_sauce(sl, self._downstream_sauce),
commit_title(upstream_commit))
def is_match(upstream_title):
edit_dist = editdistance.eval(
title_no_sauce(commit_title(downstream_commit)),
upstream_title)
return edit_dist < self._edit_dist_threshold

matches = [
uc for uc in self.upstream_new if
# Heuristic #1:
ed(uc) < self._edit_dist_threshold or
# Heuristic #2:
commit_title(uc).startswith(sl)
upstream_commit for upstream_title, upstream_commit in
email2title_commit[downstream_email] if
is_match(upstream_title)
]

if len(matches) != 0:
likely_merged[dc] = matches
likely_merged[downstream_commit] = matches

return likely_merged

Expand Down

0 comments on commit f428958

Please sign in to comment.