Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure HTTPS is used for definition cache download #5184

Closed
aleksandrs-ledovskis opened this issue Jan 25, 2021 · 21 comments · Fixed by #5464
Closed

Ensure HTTPS is used for definition cache download #5184

aleksandrs-ledovskis opened this issue Jan 25, 2021 · 21 comments · Fixed by #5464
Labels
bug Issues with our clients or rendering of pages, etc. clients Issues pertaining to a particular client or the clients as whole. decision A (possibly breaking) decision regarding tldr-pages content, structure, infrastructure, etc.

Comments

@aleksandrs-ledovskis
Copy link

aleksandrs-ledovskis commented Jan 25, 2021

Basically this is request to re-evaluate findings from #2253 (comment)

When default tldr client requests definition cache it does download https://tldr-pages.github.io/assets/tldr.zip

Problem: The link has an unsecure/plain HTTP hop which can be seen using cURL:

$ curl -vL 'https://tldr-pages.github.io/assets/tldr.zip'
*   Trying 185.199.111.153...
* TCP_NODELAY set
* Connected to tldr-pages.github.io (185.199.111.153) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* SSL stuff...
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7ff05000d800)
> GET /assets/tldr.zip HTTP/2
> Host: tldr-pages.github.io 
> User-Agent: curl/7.54.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 301
< content-type: text/html
< server: github.com
< location: http://tldr.sh/assets/tldr.zip
< x-github-request-id: 15D0:08FF:4A88CE:4FC8C2:600F05FC
< accept-ranges: bytes
< date: Mon, 25 Jan 2021 18:14:51 GMT
< via: 1.1 varnish
< age: 1182
< x-served-by: cache-bma1635-BMA
< x-cache: HIT
< x-cache-hits: 1
< x-timer: S1611598492.730806,VS0,VE1
< vary: Accept-Encoding
< x-fastly-request-id: 27a137c450f28daff2f2e5216128860de8798473
< content-length: 162

The problem, obviously is location: http://tldr.sh/assets/tldr.zip reply header.

As unsecure redirect is advised, the tldr.zip file can be trivially MITM'ed and poisoned by unknown party to do some nefarious command suggestion or exploit a (yet unknown) Markdown processing bug. Definition cache file is not cryptographically signed, therefore a malicious file detection on client side is impossible.

Cause: Reading the discussion linked above it seems that tldr-pages.github.io repository's settings does not have "Enforce HTTPS" checkbox value selected due to a configuration conflict with tldr.sh domain being fronted by Cloudflare CDN.

Suggestion: Unless there's calculated benefit in using Cloudflare CDN to front apex tldr.sh domain a security conscious choice would be to use direct A record assignment for GitHub Pages (as described here). It would thus skip Cloudflare altogether and allow to toggle "Enforce HTTPS" in GitHub panel.

@bl-ue bl-ue added architecture Organization of the pages per language, platform, etc. bug Issues with our clients or rendering of pages, etc. clients Issues pertaining to a particular client or the clients as whole. decision A (possibly breaking) decision regarding tldr-pages content, structure, infrastructure, etc. labels Jan 25, 2021
@bl-ue
Copy link
Contributor

bl-ue commented Jan 25, 2021

@sbrl
Copy link
Member

sbrl commented Jan 31, 2021

Just checked the repository settings. Apparently Enforce HTTPS is not available "because your domain is not configured to support HTTPS:

image

I think this has come up before. I guess we're waiting on @Ostera to reply here. According to the GitHub docs:

GitHub Pages sites using custom domains that are correctly configured with CNAME, ALIAS, ANAME, or A DNS records can be accessed over HTTPS. For more information, see "Securing your GitHub Pages site with HTTPS."

I didn't set it up, but I think the idea of using CloudFlare is due to the large number of requests that will likely hit archive.zip, Cloudflare can alleviate the probably high load from hitting GitHub.

@aleksandrs-ledovskis
Copy link
Author

aleksandrs-ledovskis commented Jan 31, 2021

If TLDR's CloudFlare traffic is reasonably below 100 GB/month, then it fits within GitHub page usage guidelines "GitHub Pages sites have a soft bandwidth limit of 100GB per month."

Alternatively (for high traffic situations), GitHub suggests "... making use of other GitHub features such as releases".
Per docs "There is no limit on the total size of a release, nor bandwidth usage".

So it seems that CloudFlare [proxying] isn't a must in any scenario.

@leostera
Copy link
Contributor

The domain is currently using the CloudFlare DNS, but I didn't do that set up. I only pointed the domain there. 🤔

Perhaps @waldyrious or @agnivade know more about that part of the infra?

@aleksandrs-ledovskis
Copy link
Author

aleksandrs-ledovskis commented Jan 31, 2021

The DNS part of CloudFlare can stay. You just need to ensure that "orange cloud"/proxying part of CloudFlare is disabled for said domain record.

See https://community.cloudflare.com/t/step-3-enabling-the-orange-cloud/52715

@agnivade
Copy link
Member

I have access to Cloudflare. Will take a look.

@agnivade
Copy link
Member

agnivade commented Feb 1, 2021

Total bandwidth last month was 47.88 GB, in which 11.91 GB was cached. So CF is essentially serving 23% of the traffic only. We can bypass Cloudflare for sure, but wondering if it's possible to just point to https://tldr.sh/assets/tldr.zip instead?

@aleksandrs-ledovskis
Copy link
Author

aleksandrs-ledovskis commented Feb 1, 2021

@agnivade

If by "point to" you mean exchange link for definition cache in clients - that's doable. Several TLDR clients (official Python and Node included) currently use "http://tldr-pages.github.io/assets/tldr.zip".

On another hand, I think(?) that TLDR client's are not auto-updated so anyone stuck on with older versions (pre such link change) would not benefit.

Also from different angle, it would seem to be mandated to update CLIENT-SPECIFICATION making it a httpS://tldr.sh/assets/tldr.zip

@agnivade
Copy link
Member

agnivade commented Feb 1, 2021

Great points. Agree with all of them.

  • We can open an issue ccing all client owners to update the cache location to the CF one. (We did this before already once)
  • We would also need to update the client spec.

Since this is anyways a cache, temporary outages in CF, if any, should be bearable.

@sbrl @mebeim @owenvoke - if the above sounds reasonable to you, we should go ahead with the changes.

@sbrl
Copy link
Member

sbrl commented Feb 2, 2021

This sounds like a reasonable solution. tldr.sh still points at GitHub - it's just a different domain name, correct?

@agnivade
Copy link
Member

agnivade commented Feb 3, 2021

Yes, a different domain name fronted by CloudFlare.

@sbrl
Copy link
Member

sbrl commented Feb 12, 2021

Cool. Let's do it!

/cc @owenvoke, @mebeim

@mebeim
Copy link
Member

mebeim commented Mar 9, 2021

Not sure I understand what exactly you guys are suggesting to do. Do you mean we should contact all clients telling them to use another link? Which one is the "good" one precisely?

@agnivade
Copy link
Member

Yes, that was my suggestion. The good link is "https://tldr.sh/assets/tldr.zip". The alternative is to bypass CloudFlare altogether.

@sbrl
Copy link
Member

sbrl commented Mar 13, 2021

Perhaps an update to the client spec is in order.

Specifically in this section: https://github.com/tldr-pages/tldr/blob/master/CLIENT-SPECIFICATION.md#caching

....we mention tldr.sh OR raw.githubusercontent?

@bl-ue
Copy link
Contributor

bl-ue commented Mar 13, 2021

Haha then we should delay #5428 :)

@MasterOdin
Copy link
Collaborator

MasterOdin commented Mar 14, 2021

Perhaps an update to the client spec is in order.

Specifically in this section: https://github.com/tldr-pages/tldr/blob/master/CLIENT-SPECIFICATION.md#caching

....we mention tldr.sh OR raw.githubusercontent?

I would vote strongly for raw.githubusercontent.com. It is a well-known source of truth for GH resources and very unlikely to suffer issues (e.g. domain renewal, DNS, etc.) that wouldn't also affect tldr.sh, while tldr.sh can suffer from them.

@MasterOdin
Copy link
Collaborator

Actually, using tldr.sh is probably the better move given that it's served from behind the CloudFlare CDN, which helps to avoid the soft limits GitHub has in place on its own services (I think it sort of exists for raw.githubusercontent.com, definitely exists for GH pages).

@mebeim
Copy link
Member

mebeim commented Mar 15, 2021

I'd say that we can simply add an s to the current client spec:

-If appropriate, it is RECOMMENDED that clients implement a cache of pages. If implemented, clients MUST download the archive either from **[http://tldr.sh/assets/tldr.zip](http://tldr.sh/assets/tldr.zip)** or [https://raw.githubusercontent.com/tldr-pages/tldr-pages.github.io/master/assets/tldr.zip](https://raw.githubusercontent.com/tldr-pages/tldr-pages.github.io/master/assets/tldr.zip) (which is pointed to by the first link).
+If appropriate, it is RECOMMENDED that clients implement a cache of pages. If implemented, clients MUST download the archive either from **[https://tldr.sh/assets/tldr.zip](https://tldr.sh/assets/tldr.zip)** or [https://raw.githubusercontent.com/tldr-pages/tldr-pages.github.io/master/assets/tldr.zip](https://raw.githubusercontent.com/tldr-pages/tldr-pages.github.io/master/assets/tldr.zip) (which is pointed to by the first link).

We can do this before merging #5428 (then update #5428 with the correct permalink) as @bl-ue suggests.

In any case, a mailing list for tldr clients' developers seems like a good idea to notify of changes like this one, we could do this through a Google group.

@bl-ue
Copy link
Contributor

bl-ue commented Mar 15, 2021

@mebeim see #5377 (comment). What do you think about it?

@bl-ue bl-ue removed the architecture Organization of the pages per language, platform, etc. label Mar 15, 2021
@mebeim
Copy link
Member

mebeim commented Mar 16, 2021

@bl-ue I think it makes sense, I like the idea!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issues with our clients or rendering of pages, etc. clients Issues pertaining to a particular client or the clients as whole. decision A (possibly breaking) decision regarding tldr-pages content, structure, infrastructure, etc.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants