Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network retry issue 1602 #2396

Merged
merged 1 commit into from
May 12, 2016
Merged

Conversation

sbeckeriv
Copy link
Contributor

@sbeckeriv sbeckeriv commented Feb 19, 2016

Dearest Reviewer,

This branch resolves #1602 which relates to retrying network
issues automatically. There is a new utility helper for retrying
any network call.

There is a new config called net.retry value in the .cargo.config
file. The default value is 2. The documentation has also been
updated to reflect the new value.

Thanks
Becker

@alexcrichton
Copy link
Member

Thanks for the PR @sbeckeriv! Some thoughts of mine:

  • Could this logic be centralized as much as possible? Currently the retry loop is encoded in two separate places and ideally it'd only be defined in one.
  • I'm pretty wary of retrying on any error, I've generally had bad experiences with tools that have that kind of behavior. Most of the time an error is impossible to resolve until something changes, it's only pretty rarely that an intermittent error can be retried to fix behavior. Can we whitelist specific errors to retry on?
  • I think that this logic needs to be attached directly to the calls to the network themselves, right now it's placed very high up in the chain and far away from where some updates may happen. There's also network updates throughout the rest of cargo (e.g. when publishing or installing) which won't get affected in the locations this is added.
  • Can you be sure to add some tests for this? It may be easiest to just mirror the HTTPS tests where some local sever is spun up, connected to, and immediately dropped to generate an error.
  • Can you be sure to document any new options in src/doc/config.md?

@sbeckeriv
Copy link
Contributor Author

Thanks for the reply.

I will move the net_retry logic in to the utils config object. For the logic it self I believe I know how to do this. I am going to write a function that takes a closure and handles the logic.

I will figure out a white list enum too.

logic needs to be attached directly to the calls to the network themselves

I think with the closure concept I could push the logic down the stack, however, that would not blanket all network calls. Do you have some thoughts on how I might go about that? I think most network calls are else where crates. I could update curl-rust to support a retry option pretty easy. Make it an http setting like proxy and timeout. I read that libcurl does not directly support it. I need to read more around the ssh side. Let me know your thoughts.

@alexcrichton
Copy link
Member

Do you have some thoughts on how I might go about that? I think most network calls are else where crates. I could update curl-rust to support a retry option pretty easy.

Nah I can help audit after the infrastructure in place, and then I can also help to ensure that we always push network requests through the same location. I'd also recommend that even if this were an option in curl-rust that we'd still do it in Cargo as we'd likely want to give a UI update as part of it as well anyway.

@sbeckeriv
Copy link
Contributor Author

@alexcrichton
Copy link
Member

Yeah, that looks great to me!

I think we may want to have another callback for "was this error one we can retry from", but other than that it looks like the right infrastructure to me

@sbeckeriv
Copy link
Contributor Author

Hey,
I have updated to the new callback. I updated the docs and added in some comments in the code. I copied a https test and I am checking the stdout.

For the "was this error one we can retry from"; error handling is new to me. Should I pass it the CargoResult value or the unwrapped Err? What might that callback look like? How do I check its a Err I care about vs any other error?

Thanks!

@@ -134,7 +140,7 @@ impl<'cfg> PackageRegistry<'cfg> {
Some(&(ref previous, _)) => {
if previous.precise() == namespace.precise() {
debug!("load/match {}", namespace);
return Ok(())
return Ok(());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you have rustfmt on auto-run? Could you back out these changes for now? They tend to clutter up the diff unfortunately :(

@alexcrichton
Copy link
Member

I think that it may be good to introduce a new trait:

trait NetworkError {
    pub fn may_be_spurious(&self) -> bool;
}

That way you can define the retry function with a NetworkError bound on the E type coming out. I would only implement NetworkError for types like git2::Error and curl::Error rather than a blanket CargoError because that'll help target the errors individually.

@bors
Copy link
Collaborator

bors commented Feb 26, 2016

☔ The latest upstream changes (presumably #2397) made this pull request unmergeable. Please resolve the merge conflicts.

@sbeckeriv
Copy link
Contributor Author

I pushed down the network call in to the checkout command.

I added the NetworkError trait. I am unsure how to apply the bounds in the retry method. I added it as a trait bound to T but the T is the return Ok value for the cargo result not the error. Any advice or examples of how to do this would be great!

Thanks again
Becker

/// Example:
/// use util::network;
/// cargo_result = network.with_retry(&config, || something.download());
pub fn with_retry<T, F>(config: &Config, mut callback: F) -> CargoResult<T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the signature you want for this method is something along the lines of:

pub fn with_retry<T, E, F>(config: &Config, mut callback: F) -> CargoResult<T>
    where F: FnMut() -> Result<T, E>,
          E: NetworkError + CargoError

This would then match on the result of callback and call maybe_spurious to see whether it is indeed a spurious error or not.

@alexcrichton
Copy link
Member

@sbeckeriv yeah the point of the bounds was to highlight the fact that Box<CargoError> is opaque and very difficult to see if we need to retry it or not. It basically just means that the retry needs to be pushed farther down in the stack.

@sbeckeriv
Copy link
Contributor Author

The bounds make perfect sense now. I pushed the config down to the bottom fetch call. I am still getting "mismatched type" errors on the result I am returning. The generics at this level are beyond me. The error text https://gist.github.com/sbeckeriv/3866bd2ab7241da32b7c I have tried many combinations. I thought the error came from how I initialized the result value but I removed that. The normal human err is does not impl network error. I then thought it was because the shell command internally could cause an error but I removed the try with no luck.

Do i need to rebuild the result or implement a from?

Thanks again,
Becker

@alexcrichton
Copy link
Member

No worries! Here's how I might write this function:

pub fn with_retry<T, E, F>(config: &Config, mut callback: F) -> CargoResult<T>
    where F: FnMut() -> Result<T, E>,
          E: NetworkError
{
    let mut remaining = try!(config.net_retry());
    loop {
        match callback() {
            Ok(ret) => return Ok(ret),
            Err(ref e) if e.maybe_spurious() && remaining > 0 => {
                // print to shell
                remaining -= 1;
            }
            Err(e) => return Err(Box::new(e)),
        }
    }
}

Perhaps that can help you out?

@sbeckeriv
Copy link
Contributor Author

That does! When I first started I decomposed but I went the other way for reasons I do not remember. I need to update my test but it compiles now. Thanks!

@sbeckeriv
Copy link
Contributor Author

I have the retry at the lowest level fetch. I think this covers all the git actions. I am looking in to the interactions with the registry.

@sbeckeriv
Copy link
Contributor Author

As I figure out the code I am not sure I want to extern cargo in to crates-io. I am guessing there are separated in to there own project for a reason. Now I would require Config and network to cross back in to creates. I could just copy the concept but there is no config system for cargo-io helpers. Maybe this is figured out in another pull request? Let me know your thoughts.

Thanks again!
Becker

}

impl NetworkError for git2::Error {
fn maybe_spurious(&self) -> bool { true }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For git errors this may want to at least check the class of the error for Net, and perhaps look at the code as well to ensure it's not a known non-spurious error.

@alexcrichton
Copy link
Member

Looking good! I think though that this may not catch where we download from the registry (using curl), just git so far. API calls with crates.io also look like they may need instrumentation as well, and you can probably just implement NetworkError for the error type coming out of the crates-io library

@sbeckeriv
Copy link
Contributor Author

I updated the error checking.

For git2::ErrorCode its not clear which errors relate to network activity. Maybe EOF for a disconnect? In my test I found that not connecting to a host gave an git2::ErrorClass::Os error. I am checking for both Os and Net now. Maybe my test is invalid?

For curl errors I left a comment to what I believe the number maps to based off of this website https://curl.haxx.se/libcurl/c/libcurl-errors.html

Thanks again for all the help.
Becker

.file("src/lib.rs", "");
.file("src/lib.rs", "").file(".cargo/config", r#"
[net]
retry=0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory these modifications to tests shouldn't be necessary. All of these are indicative of errors that can never be resolved, so they shouldn't print out warnings anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. These are leftover from before the trait. I should be able to remove these.

@@ -1,3 +1,4 @@
extern crate curl_sys;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be added to the crate root?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it to src/cargo/lib.rs I think thats the "root". Let me know if that is wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah that's it!

@alexcrichton
Copy link
Member

Looking good @sbeckeriv, thanks!

Err(ref e) if e.maybe_spurious() && remaining > 0 => {
try!(config.shell()
.warn(format!("{} {} tries remaining", e,
remaining)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this could be along the lines of:

let msg = format!("spurious network error ({} tries remaining): {}", remaining, e);
try!(config.shell().warn(msg));

@alexcrichton
Copy link
Member

Awesome! Just a minor tweak to some wording, and then perhaps some squashing? Other than that r=me

@sbeckeriv
Copy link
Contributor Author

@alexcrichton I made it one commit. Thanks for working with me and all the help! I learned a lot.

Becker

Ok(ret) => return Ok(ret),
Err(ref e) if e.maybe_spurious() && remaining > 0 => {
let msg = format!("spurious network error ({} tries \
remaining): {}", remaining, e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the remaining count here is actually remaining - 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry config is 2 by default.

count is 2 
I run once and fail i output 
  2 tries remains 
    subtract 1 count is 1
run a second time 
  1 tries remaining 
    subtract 1 count is 0
retry again
just error. 

The retry count is the count of retries not the count of total tries. I can subtract one from the config or update the docs to be explicitly clear it is the count of retries and not the total network tries?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right yeah that makes sense, nevermind!

@alexcrichton
Copy link
Member

Looks like the Windows CI is failing for non-spurious reasons?

@alexcrichton
Copy link
Member

@bors: r+ 2f04d00ae2b5ddce01b7f5890e5311e287ee198f

@bors
Copy link
Collaborator

bors commented May 12, 2016

⌛ Testing commit 2f04d00 with merge e2648ce...

@bors
Copy link
Collaborator

bors commented May 12, 2016

💔 Test failed - cargo-win-msvc-64

@sbeckeriv
Copy link
Contributor Author

Sorry I dont have access to a windows box right now. I might tonight. Does windows not load the .cargo/config file or does it output extra information?

@alexcrichton
Copy link
Member

It should load that, yeah, but I suspect the problem here is that the exact error message is slightly different. It should be possible to just use with_stderr_contains perhaps?

Dearest Reviewer,

This branch resolves rust-lang#1602 which relates to retrying network
issues automatically. There is a new utility helper for retrying
any network call.

There is a new config called net.retry value in the .cargo.config
file. The default value is 2. The documentation has also been
updated to reflect the new value.

Thanks
Becker
@sbeckeriv
Copy link
Contributor Author

Works locally. Lets see!

@alexcrichton
Copy link
Member

@bors: r+ 7b03532

bors added a commit that referenced this pull request May 12, 2016
Network retry issue 1602

Dearest Reviewer,

This branch resolves #1602 which relates to retrying network
issues automatically. There is a new utility helper for retrying
any network call.

There is a new config called net.retry value in the .cargo.config
file. The default value is 2. The documentation has also been
updated to reflect the new value.

Thanks
Becker
@bors
Copy link
Collaborator

bors commented May 12, 2016

⌛ Testing commit 7b03532 with merge a4a9491...

@bors
Copy link
Collaborator

bors commented May 12, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cargo fetch --retry N
4 participants