Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use git gc --auto in maybe_gc_repo #8196

Closed
wants to merge 2 commits into from

Conversation

glandium
Copy link
Contributor

@glandium glandium commented May 2, 2020

Fixes #8195.

@rust-highfive
Copy link

r? @Eh2406

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 2, 2020
@ehuss
Copy link
Contributor

ehuss commented May 2, 2020

Thanks! It looks like it broke some tests, though.

Also, do you have any information about how safe this is to do? Does git work well concurrently with libgit2? The git gc man page says "users who run commands concurrently have to live with some risk of corruption", which doesn't instill a lot of confidence. I also have no idea if libgit2 uses the same locking and anti-corruption strategies as the official client. Since Cargo immediately runs a fetch operation afterwards, it seems like it would be risky.

@glandium
Copy link
Contributor Author

glandium commented May 3, 2020

Thanks! It looks like it broke some tests, though.

Also, do you have any information about how safe this is to do? Does git work well concurrently with libgit2? The git gc man page says "users who run commands concurrently have to live with some risk of corruption", which doesn't instill a lot of confidence. I also have no idea if libgit2 uses the same locking and anti-corruption strategies as the official client. Since Cargo immediately runs a fetch operation afterwards, it seems like it would be risky.

git gc --auto will not remove anything from the repo that is not older than 2 weeks by default (although someone could have set gc.pruneExpire to now). It will also only remove unreferenced loose objects. Those don't happen simply by fetching. They happen when you git add something, but never commit it, for example, which is not how cargo uses repositories. I don't know whether libgit2 follows the same locking as git (one would sure hope it does), but even if it doesn't, I wouldn't expect corruptions here.

@glandium
Copy link
Contributor Author

glandium commented May 3, 2020

I don't know what the remaining rustc_info_cache errors are, but they happen locally even without the patch...

@alexcrichton
Copy link
Member

This is something I ran into myself recently too, so it's neat to see what git has a feature for this! (running in the background)

I'd echo the same concerns as @ehuss though in that it's not clear to me whether we'd function correctly with a gc happening concurrently in the background. I don't really know how the git gc works myself or how roots are determined. Cargo's management of the index is somewhat nonstandard as well since we don't even have a checkout, and I forget if we have branches/refs/etc tracking commits.

Would it be possible to run git gc --auto in the foreground as a stopgap for now? That way we could start respecting global git configuration immediately, and we could figure out in parallel if it's safe to run the gc concurrently.

@glandium
Copy link
Contributor Author

glandium commented May 4, 2020

It's quite simple to do so: just use git -c gc.autoDetach=false gc --auto instead of git gc --auto.

@glandium
Copy link
Contributor Author

glandium commented May 4, 2020

An alternative would be to implement gc manually with the git_packbuilder_* API from libgit2.

@alexcrichton
Copy link
Member

I'd be fine with either route, I think it'd probably just be best to run in the foreground for now until we're sure that this is an operation which can happen concurrently.

@alexcrichton
Copy link
Member

We discussed this in the Cargo meeting and @joshtriplett spoke about the concurrency aspect and said that libgit2 is highly likely to work since git gc is intended to be extremely defensive about concurrent accesses. To hopefully improve our resilience a little more, we were also thinking that the git gc could perhaps be run after the fetch instead of just before?

@glandium would you be up for making that change and we can r+ with it running in the background as well?

@glandium
Copy link
Contributor Author

@glandium would you be up for making that change and we can r+ with it running in the background as well?

Just to be clear, are you saying the patch as-is + a move of the maybe_gc_repo call to after the fetch?

@alexcrichton
Copy link
Member

Yeah, @joshtriplett spoke to how it's highly likely to continue to work given the defensiveness of git gc, and moving it after the fetch should make it even less likely to not work by accident because we'd only gc after fetching.

@glandium
Copy link
Contributor Author

There's an interesting error in the azure tests, where the panic message seems split between stdout and stderr.

@glandium
Copy link
Contributor Author

glandium commented May 14, 2020

Ah, the current code actually relies on reinitializing the repo entirely when git is not available (and thus git gc), and then fetch, so that the number of packs is reduced. Which doesn't work if git gc happens after fetching.

Should we go with something like:

  • check that git is available
  • if not, reinitialize
  • in any case, fetch
  • maybe gc

If so, is it okay to add a dependency on the which crate to do the check? Or does cargo use something else?

@alexcrichton
Copy link
Member

Ah right, that's because the function is currently reliant on only running sometimes but it's now being changed to run unconditionally.

We don't want to reinitialize the entire repository on each fetch so if git isn't available I think we'll still want some sort of a guard like we have today, but otherwise what you've written down sounds pretty reasonable to me.

@bors
Copy link
Collaborator

bors commented Jun 18, 2020

☔ The latest upstream changes (presumably #8363) made this pull request unmergeable. Please resolve the merge conflicts.

@ehuss
Copy link
Contributor

ehuss commented Jan 13, 2021

Closing due to inactivity. We're still interested in improving the automatic garbage collection. If you want to continue pursuing this, feel free to reopen or post a new PR.

@ehuss ehuss closed this Jan 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cargo should not run git gc manually
6 participants