Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eats all memory and goes into swap #1087

Closed
svpv opened this issue Feb 23, 2021 · 5 comments · Fixed by #1088
Closed

eats all memory and goes into swap #1087

svpv opened this issue Feb 23, 2021 · 5 comments · Fixed by #1088
Labels

Comments

@svpv
Copy link

svpv commented Feb 23, 2021

Recently I upgraded from tig-2.5.0 to tig-2.5.2. I can no longer view big commits: tig eats up all memory and goes into swap.

Steps to reproduce:

  1. Clone e.g. this repo.
  2. In another terminal, run top and press M to sort by memory usage.
  3. In yet another terminal, prepare killall tig.
  4. Run tig --all and position to the second to last commit, the one with 0ad-0.0.24b-alpha-unix-build.tar.xz.
  5. Now press Enter to view the commit.
  6. You'll only have a few seconds to peek at top and execute killall tig. After those few seconds, you won't be able to abort tig gracefully, it'll send the whole machine into swap.

The problem is exacerbated by the fact that tig traps signals. So by the time you realize that C-C doesn't work, and you want to open another terminal, it's already too late. I think it must be possible to trap signals in such a way that the first arrival resets the handler, so that the second C-C is a sure-fire way to kill the program.

I think I was able to bisect the problem to 484aa21.

@koutcher
Copy link
Collaborator

koutcher commented Feb 23, 2021

Thanks for reporting. To avoid the loop until we have a fix, the workaround is set wrap-lines = no. The problem is not with the size of the commit, the offending character is © in 0ad/build/premake/premake5/binmodules/luasocket/LICENSE. It doesn't show even with 2.5.0 and as wrapping is checking for multibytes characters since 2.5.2, it causes the loop.
The strange thing is that it's not even a multibyte character:

0000010: 6365 6e73 650a 436f 7079 7269 6768 7420  cense.Copyright 
0000020: a920 3230 3034 2d32 3031 3320 4469 6567  . 2004-2013 Dieg
0000030: 6f20 4e65 6861 620a 0a50 6572 6d69 7373  o Nehab..Permiss

@krobelus
Copy link
Contributor

krobelus commented Feb 23, 2021

The file is encoded ISO-8859 and we likely try to read it as UTF-8.
In UTF-8, the © would be multibyte: a9 and c2.

Related: .gitattributes supports configuring encoding, but apparently supports ISO-8859 out of the box.

@koutcher
Copy link
Collaborator

koutcher commented Feb 23, 2021

@krobelus, with the following, LICENSE doesn't loop anymore, but mime.html still does. We have some translation in place when showing the blob view but we don't have any when reading the output of git show in the diff view. Maybe we need some kind of sanitizing... Not easy though as we have no indication of the encoding.

diff --git a/src/string.c b/src/string.c
index 2638d92..3e8f7d9 100644
--- a/src/string.c
+++ b/src/string.c
@@ -313,7 +313,7 @@ utf8_length(const char **start, int max_chars, size_t skip, int *width, size_t m
 		unicode = utf8_to_unicode(string, bytes);
 		/* FIXME: Graceful handling of invalid Unicode character. */
 		if (!unicode)
-			break;
+			unicode = '?';
 
 		ucwidth = unicode == '\t' ? tab_size - (*width % tab_size) :
 					    utf8proc_charwidth((utf8proc_int32_t) unicode);

@krobelus
Copy link
Contributor

The fix looks good.
I haven't been able to reproduce a loop with mime.html.
Maybe it's enough to make utf8_length more robust, so it always makes progress regardless of input.
If other places are affected, a separate sanitization step might make more sense.

koutcher added a commit to koutcher/tig that referenced this issue Feb 23, 2021
@koutcher
Copy link
Collaborator

I definitely still have the loop with mime.html (make sure you have set wrap-lines = yes). I managed to avoid it by checking if the string contains valid UTF-8 and by using the old string_expanded_length() if it doesn't.

koutcher added a commit to koutcher/tig that referenced this issue Feb 26, 2021
Changes utf8_char_length(), utf8_to_unicode() and utf8_length()
implementation to rely on utf8proc.

Fixes jonas#1087
koutcher added a commit to koutcher/tig that referenced this issue Feb 27, 2021
Changes utf8_char_length(), utf8_to_unicode() and utf8_length()
implementation to rely on utf8proc.

Fixes jonas#1087
koutcher added a commit to koutcher/tig that referenced this issue Feb 27, 2021
Changes utf8_char_length(), utf8_to_unicode() and utf8_length()
implementation to rely on utf8proc.

Fixes jonas#1087
koutcher added a commit to koutcher/tig that referenced this issue Feb 27, 2021
Changes utf8_char_length(), utf8_to_unicode() and utf8_length()
implementation to rely on utf8proc.

Fixes jonas#1087
koutcher added a commit to koutcher/tig that referenced this issue Feb 28, 2021
Changes utf8_char_length(), utf8_to_unicode() and utf8_length()
implementation to rely on utf8proc.

Fixes jonas#1087
koutcher added a commit to koutcher/tig that referenced this issue Mar 1, 2021
Changes utf8_char_length(), utf8_to_unicode() and utf8_length()
implementation to rely on utf8proc.

Fixes jonas#1087
koutcher added a commit to koutcher/tig that referenced this issue Mar 5, 2021
Changes utf8_char_length(), utf8_to_unicode() and utf8_length()
implementation to rely on utf8proc.

Fixes jonas#1087
koutcher added a commit that referenced this issue Mar 5, 2021
Changes utf8_char_length(), utf8_to_unicode() and utf8_length()
implementation to rely on utf8proc.

Fixes #1087
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants