Don't check out the crates.io index locally #4026

Merged (1 commit) on May 12, 2017


alexcrichton
Member

This commit moves working with the crates.io index to operating on the git
object layers rather than actually literally checking out the index. This is
aimed at two different goals:

  • Improving the on-disk file size of the registry
  • Improving cloning times for the registry as the index doesn't need to be
    checked out

The on-disk size of my registry folder of a fresh checkout of the index went
from 124M to 48M, saving a good chunk of space! The entire operation took about
0.6s less on a Unix machine (out of 4.7s total for current Cargo). On Windows,
however, the clone operation went from 11s to 6.7s, a much larger improvement!

Closes #4015

@rust-highfive

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Member Author

r? @matklad

@rust-highfive rust-highfive assigned matklad and unassigned brson May 10, 2017

// Note that this `'static` lifetime here is actually a lie, it's actually a
// borrow into the `repo` object below. We're guaranteed, though, that if
// filled in, `tree` will be destroyed first, so this should be ok.
Member

What guarantees that tree will be destroyed first? Unspecified drop order? In that case, maybe this is a good case for ManuallyDrop? Or is that not yet usable in cargo due to instability?

Member

@alexcrichton ping. Should there be a crate for this sort of stuff? https://github.com/Kimundi/owning-ref-rs perhaps?

Member Author

Oh sorry missed this! Yes the unspecified drop order is what guarantees this. Lots of projects are relying on this so I don't think it's necessary to go out of the way and use ManuallyDrop, and yeah I'd also prefer to keep Cargo on stable.

@matklad unfortunately that crate won't help as it's targeted at Rust pointers, whereas here it's all phantom lifetimes through libgit2 :(

Member

Since it's in an Option, you could explicitly take() it first in an impl Drop for RemoteRegistry.

Member Author

Indeed! I'll do that.
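
The `take()`-in-`Drop` suggestion above can be sketched without libgit2. Everything here is hypothetical: `Noisy` only logs when it is dropped so the order becomes observable, `Registry` stands in for `RemoteRegistry`, and `drop_order` is a helper for inspecting the result. The only assumption carried over from the thread is that the dependent value sits in an `Option`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Records its name in the shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Hypothetical stand-in for `RemoteRegistry`. `repo` is declared first, so
/// relying on field order alone would drop it before `tree`.
struct Registry {
    repo: Noisy,
    tree: RefCell<Option<Noisy>>,
}

impl Drop for Registry {
    fn drop(&mut self) {
        // Runs before any field is dropped: release the dependent value first.
        drop(self.tree.get_mut().take());
    }
}

/// Builds a `Registry`, drops it, and returns the order in which the two
/// `Noisy` values were destroyed.
fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let registry = Registry {
        repo: Noisy { name: "repo", log: log.clone() },
        tree: RefCell::new(Some(Noisy { name: "tree", log: log.clone() })),
    };
    drop(registry);
    let order = log.borrow().clone();
    order
}
```

`drop_order()` returns `["tree", "repo"]`: the explicit `take()` makes the ordering independent of how the fields happen to be declared.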

@bors
Collaborator

bors commented May 10, 2017

☔ The latest upstream changes (presumably #4024) made this pull request unmergeable. Please resolve the merge conflicts.

// interpretation of each line here and older cargo will simply
// ignore the new lines.
let lines = contents.split(|b| *b == b'\n')
.filter_map(|b| str::from_utf8(b).ok())
Member

Ideally we should not swallow utf8-decoding errors here.

Member

Or are we planning to switch to a binary format some day?

Member Author

Nah this was mostly inspired from discussion on the RFC about schema versioning. I have no plans to break this personally, but it seems reasonable to be somewhat defensive about future changes to the index just for maximal flexibility of future cargo's implementation.

Member

I'm totally ok with self.parse_registry_package(line).ok(), it's only str::from_utf8(b).ok() that feels overly defensive.

Member Author

Sounds reasonable!
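
The distinction agreed on in this thread can be sketched as follows: fail loudly on invalid UTF-8, but keep skipping individual lines that this version does not understand. This is a minimal illustration, not the code from the PR. `parse_line` is a hypothetical stand-in for `parse_registry_package` that accepts any line shaped like a JSON object, and `load_lines` is an invented name.

```rust
/// Hypothetical stand-in for `parse_registry_package`: accepts only lines
/// that look like a JSON object and rejects everything else.
fn parse_line(line: &str) -> Option<&str> {
    let line = line.trim();
    if line.starts_with('{') && line.ends_with('}') {
        Some(line)
    } else {
        None
    }
}

/// Decodes the whole buffer strictly, then parses line by line leniently.
fn load_lines(contents: &[u8]) -> Result<Vec<&str>, std::str::Utf8Error> {
    // A decoding failure is reported to the caller instead of being dropped.
    let text = std::str::from_utf8(contents)?;
    // Lines that fail to parse are skipped, so a newer index format with
    // extra line kinds would still load.
    Ok(text.lines().filter_map(|line| parse_line(line)).collect())
}
```

With this shape, a corrupted index surfaces as an error, while an unrecognized but well-encoded line is ignored.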

@@ -34,7 +35,11 @@ impl<'cfg> RegistryData for LocalRegistry<'cfg> {
&self.index_path
}

fn config(&self) -> CargoResult<Option<RegistryConfig>> {
fn load(&self, root: &Path, path: &str) -> CargoResult<Vec<u8>> {
Member

Why not path: &Path? We convert str to Path both for Local and Remote registry anyway. Those slashes format!("{}/{}/{}", &fs_name[0..2], &fs_name[2..4], fs_name) make me nervous :)

Member

And do we need root here? Can't we reconstruct it from index_path?

Member Author

Ah my thinking of passing in root is that index_path returns a Filesystem which is an "unlocked path", but here we've always got a locked path (locked elsewhere) so looking at a Path is proof of that.

I was originally unsure what would happen if we pass a \-separated path down to libgit2; I'm not sure whether it handles the separator differences internally. Only one way to find out!
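
One way to reconcile `&Path` arguments with a lookup that expects `/` separators is to join the path components by hand. The sketch below is an assumption-laden illustration, not the code from the PR: `index_rel_path` and `to_git_path` are hypothetical names, and `index_rel_path` only covers the `format!` quoted above, returning `None` for shorter or non-ASCII names, which it does not attempt to handle.

```rust
use std::path::{Component, Path};

/// Index-relative path for a name of at least four ASCII characters,
/// following the `format!` quoted in the review comment.
fn index_rel_path(fs_name: &str) -> Option<String> {
    if fs_name.len() < 4 || !fs_name.is_ascii() {
        return None;
    }
    Some(format!("{}/{}/{}", &fs_name[0..2], &fs_name[2..4], fs_name))
}

/// Joins the components of a relative path with `/`, whatever separator the
/// host platform uses. Returns `None` for absolute paths, `..`, or
/// components that are not valid UTF-8.
fn to_git_path(path: &Path) -> Option<String> {
    let mut parts = Vec::new();
    for component in path.components() {
        match component {
            Component::Normal(part) => parts.push(part.to_str()?),
            _ => return None,
        }
    }
    Some(parts.join("/"))
}
```

Building the path with `Path::join` and converting it at the boundary keeps the platform separator out of anything handed to the git layer.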

// Note that this `'static` lifetime here is actually a lie, it's actually a
// borrow into the `repo` object below. We're guaranteed, though, that if
// filled in, `tree` will be destroyed first, so this should be ok.
tree: LazyCell<RefCell<Option<git2::Tree<'static>>>>,
Member

@matklad matklad May 11, 2017

Why do we need LazyCell on top of RefCell? LazyCell is useful to return &T and not Ref<T>, but we are returning a Ref anyway, so just RefCell<Option<git2::Tree<'static>>> should be enough levels of indirection...

Member Author

Hm excellent point!
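
The difference between the two cell types can be shown with the standard library alone. In this sketch `std::cell::OnceCell` stands in for `LazyCell` (a substitution, not what Cargo used), and `Cache` with its methods is hypothetical. A write-once cell can hand out a plain `&T`, while a resettable `RefCell<Option<T>>` has to return a `Ref` guard, so stacking the first on top of the second adds nothing once a `Ref` is being returned anyway.

```rust
use std::cell::{OnceCell, Ref, RefCell};

struct Cache {
    // Write-once slot: hands out a plain `&T` after initialisation.
    once: OnceCell<String>,
    // Resettable slot: every access goes through a `Ref` guard.
    slot: RefCell<Option<String>>,
}

impl Cache {
    fn new() -> Cache {
        Cache { once: OnceCell::new(), slot: RefCell::new(None) }
    }

    /// Lazily initialised; the borrow is tied to `&self`, no guard needed.
    fn once(&self) -> &str {
        self.once.get_or_init(|| "tree".to_string())
    }

    /// Lazily initialised and resettable; callers receive a `Ref`.
    fn slot(&self) -> Ref<'_, String> {
        let empty = self.slot.borrow().is_none();
        if empty {
            *self.slot.borrow_mut() = Some("tree".to_string());
        }
        Ref::map(self.slot.borrow(), |s| s.as_ref().unwrap())
    }

    /// Clears the slot; panics if a `Ref` from `slot()` is still alive.
    fn reset(&self) {
        *self.slot.borrow_mut() = None;
    }
}
```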

@alexcrichton alexcrichton force-pushed the bare-registry branch 2 times, most recently from 1ea815a to a4e850f Compare May 11, 2017 14:51
@alexcrichton
Member Author

Pushed some updates

let handle = ops::http_handle(self.config)?;
self.handle.fill(RefCell::new(handle)).ok().unwrap();
Ok(self.handle.borrow().unwrap())
}
Member

Looks like easy and repo could use LazyCell::get_or_try_init instead of manually unwrapping things.

Member Author

Aha yes indeed! I thought I tried that but thanks for catching
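
The shape of a fallible lazy initialiser, as opposed to `fill(...).ok().unwrap()` followed by `borrow().unwrap()`, can be sketched on top of `std::cell::OnceCell`. This is a hypothetical free function written for illustration; it is not the `LazyCell::get_or_try_init` referred to in the comment, and its signature is an assumption.

```rust
use std::cell::OnceCell;

/// Returns the cached value, or runs `init` and caches its result.
/// If `init` fails, the error is returned and the cell stays empty.
fn get_or_try_init<T, E>(
    cell: &OnceCell<T>,
    init: impl FnOnce() -> Result<T, E>,
) -> Result<&T, E> {
    if let Some(value) = cell.get() {
        return Ok(value);
    }
    let value = init()?;
    // If the cell was filled in the meantime, the existing value wins and
    // the freshly computed one is dropped.
    Ok(cell.get_or_init(|| value))
}
```

A caller can then write `get_or_try_init(&cell, || make_handle())?` (with `make_handle` being whatever fallible constructor applies) and never touch `unwrap`.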

@alexcrichton alexcrichton force-pushed the bare-registry branch 2 times, most recently from e51d729 to c2e82c2 Compare May 11, 2017 19:07
@alexcrichton
Member Author

Updated

@matklad
Member

matklad commented May 11, 2017

LGTM, though there was a seemingly legitimate failure on AppVeyor in the previous build.

@alexcrichton alexcrichton force-pushed the bare-registry branch 2 times, most recently from 7ff4bca to b7414ce Compare May 11, 2017 20:52
@alexcrichton
Member Author

Bah looks like libgit2 cares about \ vs /

@alexcrichton
Member Author

@bors: r=matklad

@bors
Collaborator

bors commented May 11, 2017

📌 Commit b7414ce has been approved by matklad

@bors
Collaborator

bors commented May 11, 2017

🔒 Merge conflict

@bors
Collaborator

bors commented May 11, 2017

☔ The latest upstream changes (presumably #4032) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton
Member Author

@bors: r=matklad

@bors
Collaborator

bors commented May 11, 2017

📌 Commit 15cc376 has been approved by matklad

@bors
Collaborator

bors commented May 11, 2017

⌛ Testing commit 15cc376 with merge d8fa3eb...

bors added a commit that referenced this pull request May 11, 2017
Don't check out the crates.io index locally

@bors
Collaborator

bors commented May 12, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: matklad
Pushing d8fa3eb to master...

@bors bors merged commit 15cc376 into rust-lang:master May 12, 2017
@alexcrichton alexcrichton added the relnotes Release-note worthy label May 12, 2017
@alexcrichton alexcrichton deleted the bare-registry branch May 12, 2017 16:49
nabijaczleweli added a commit to nabijaczleweli/cargo-update that referenced this pull request May 16, 2017
Allows for seamless transition for when
rust-lang/cargo#4026 lands

Closes #32
@ehuss ehuss added this to the 1.19.0 milestone Feb 6, 2022
Successfully merging this pull request may close these issues.

Use a bare clone of crates.io-index
8 participants