feat: recovering preHandleMetadata failure from sniffing #769

Merged
merged 1 commit into from
Sep 24, 2023

@imkiva imkiva commented Sep 24, 2023

Motivation

This PR tries to handle connection failures caused by Fake DNS cache invalidation. This usually happens in setups where caching DNS servers (such as AdGuard Home) use Clash's Fake DNS server as their upstream, with policy-based routing implemented by examining the results of DNS queries.

The cache invalidation occurs if any of the following conditions is met:

  • the cache.db is corrupted for any reason (for example, by switching back and forth between Clash Premium and Meta; I recently migrated from Premium to Meta)
  • Clash has just restarted without properly storing the fake DNS cache
  • store-fake-ip is set to false and some caching DNS servers have not yet synchronized with the new values.

Admittedly, for simpler setups the best solution is to clear the DNS cache everywhere, or at least on these caching DNS servers. But that requires someone to track the state carefully by hand, which is distracting. By comparison, I prefer an automated solution like the one proposed here (honestly, more of a workaround), since it costs less in more complicated scenarios such as home labs or small teams.

Approach

In this PR, we adapt the existing "sniffing" feature to resolve the issue described above. This is done in the following steps:

  1. when preHandleMetadata fails to find the reverse mapping of a destination IP address, do not return early. Instead, record the failure in a variable and defer the error reporting.
  2. when sniffing is enabled, try to sniff the data on that connection. This is "free", since we always need to call TCPSniff anyway; we only make it return a flag indicating whether a domain was discovered.
  3. if TCPSniff returns true, clear the failure flag set in step 1 and continue the connection. Otherwise, give up, exactly as before this PR.

I have tested this locally and am quite satisfied with the result: both the "tls certificate error" and the "no route to host" errors reported by Chrome or the OS occur much less often while the whole policy-based-routing system is recovering from an unexpected restart.

@wwqgtxx wwqgtxx merged commit 67d7e53 into MetaCubeX:Alpha Sep 24, 2023
imkiva commented Sep 24, 2023

Wow! Thanks for the quick reply!!
