Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

curl::has_internet() breaks for those who work behind a proxy #150

Closed
cbizzo opened this issue Jan 16, 2020 · 16 comments
Closed

curl::has_internet() breaks for those who work behind a proxy #150

cbizzo opened this issue Jan 16, 2020 · 16 comments

Comments

@cbizzo
Copy link

cbizzo commented Jan 16, 2020

I left a comment on the commit also:
19a22f3#commitcomment-36824920

Perhaps first check if has_internet and if not, if http_proxy is defined?

@muuankarski
Copy link
Contributor

muuankarski commented Jan 16, 2020

Thanks, for bringing this up. Do you think it is still the case after this fix jeroen/curl@6fc0305 in curl::has_internet()

@JensXII
Copy link

JensXII commented Jan 19, 2020

If the fix is
has_internet <- function(){
!is.null(nslookup("google.com", error = FALSE))
}
then it works fine.
When and where will this fix be avaiable?

@cbizzo
Copy link
Author

cbizzo commented Jan 20, 2020

Hi I was away from the corporate proxy so couldn't text these over the weekend. Unfortunately both the update to curl and the original leave the same outcome: FALSE. I have time to test some other approaches today. If I think of something I will inform you.

@cbizzo
Copy link
Author

cbizzo commented Jan 20, 2020

OK, I've done some digging. If you like I can submit a pull request with the changes. I describe the process below.

First, after looking deeper into the issue, curl_fetch_memory also does not work inside my work, nor do most curl functions. That means eurostat::get_eurostat_json() and similar functions which make such a call would not work for me also before this change.

My usage of eurostat is typically of the form eurostat::get_eurostat(id, filters). Looking under the hood, I've realised that this in turn will call get_eurostat_raw() which relies on download.file().

On Windows, unless otherwise configured download.file() will use wininet.dll to download a file over http and https. This is the similar to using Internet Explorer to download a file.

From what I can tell, that my workplace allows Internet Explorer to access the internet, but does not allow RStudio to, without recourse to proxy information.

My solution is to remove the curl::has_internet() call from get_eurostat() , but to leave it in any function that directly uses curl. This way when get_eurostat_raw() is called, which does not currently have a has_internet() call inside the function will proceed to download.

If a function is needed to check if the internet is available via wininet (or more specifically, download.file() you might consider pingr::is_online() or just to use download.file on something you always expect to be up, e.g. "http://captive.apple.com/hotspot-detect.html". Like so:

# heavily influenced by https://github.com/r-lib/pingr/blob/master/R/http.R

has_internet_download.file <- function(){ 
  temp <- tempfile()
     suppressWarnings(try(download.file("http://captive.apple.com/html", temp, quiet= TRUE), silent = TRUE))
     if (is.na(file.info(temp)$size)){
           FALSE
    }
     else{
         data<- readChar(temp, file.info(temp)$size)
         grepl("Success", readChar(temp, file.info(temp)$size), data)
    }
}

@antagomir
Copy link
Member

Doing some testing now. If @muuankarski has any comments that might help.

@muuankarski
Copy link
Contributor

muuankarski commented Jan 22, 2020

Ok, had a look at this using curl version 4.3 in Windows behind a proxy. As pointed out by @cbizzo eurostat uses download.file() for data downloads that do work in Windows behind a proxy with no need for http_proxy/https_proxy- environment variables. Therefore eurostat has worked for people using it in such settings.

However, we need to comply with CRAN policy...

Packages which use Internet resources should fail gracefully with an
informative message if the resource is not available (and not give a
check warning nor error)

...and need a way to prevent error when resource is not available,

I was not able to produce FALSE with curl::has_internet() when proxy-variables were set empty. curl::curl_download() only works with proper proxy-settings. (In linux, behind a proxy, you need to define the proxy-settings even with download.file().) This is how it was checked.

# EU country level geojson
url <- "https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson"
tmpfile <- tempfile()

# Empty proxy settings
Sys.setenv(http_proxy = "")
Sys.setenv(https_proxy = "")

curl::has_internet()
# [1] TRUE
download.file(url, tmpfile, mode = "wb")
# Content type 'text/plain' length 141040 bytes (137 KB)
# downloaded 137 KB

curl::curl_download(url, tmpfile)
# Error in curl::curl_download(url, tmpfile) : 
#   Failed to connect to ec.europa.eu port 443: Timed out

# Corporate proxy settings
Sys.setenv(http_proxy = "put your proxy settings here")
Sys.setenv(https_proxy = "put your proxy settings here")

curl::has_internet()
# [1] TRUE
download.file(url, tmpfile, mode = "wb")
# Content type 'text/plain' length 141040 bytes (137 KB)
# downloaded 137 KB

curl::curl_download(url, tmpfile)
# success

Although I can't reproduce the behaviour, I agree that we should find an replacement for has_internet(). However, there are users with offline environments that can only access to few url's in internet, http://captive.apple.com not being one of them. For them it would always produce an error although they do have access to ec.europa.eu. Therefore, would something like this do the trick?

has_internet_download.file <- function(){ 
  temp <- tempfile()
  suppressWarnings(try(download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson", temp, quiet= TRUE), silent = TRUE))
  if (is.na(file.info(temp)$size)){
    FALSE
  }
  else{
    # data<- readChar(con = temp, nchars = file.info(temp)$size)
    # grepl("Success", readChar(temp, file.info(temp)$size), data)
    TRUE
  }
}

Any ideas?

@antagomir
Copy link
Member

My impression is that this would do the trick, and would be ready to proceed with it. Eagerly looking fwd to comments from @cbizzo and @JensXII

@JensXII
Copy link

JensXII commented Jan 22, 2020

I am a ignorant on network setup. Hence, I cannot comment on, if I think it will work.
I have two computers. One where curl::has_internet() TRUE and one with FALSE.
I have looked at the system evironments Sys.getenv() on both of them. None of them have anything with http_proxy or anything with proxy. They are actually alike.
On the other hand, having these two computers, I might be able to test. But you have to tell me exactly what to do.

@muuankarski
Copy link
Contributor

download.file() hangs the system in unix if there is no access to resource. I implemented a slightly improved solution in branch https://github.com/rOpenGov/eurostat/tree/feature/check_internet

@JensXII perhaps you can run the lines below and report if it works for you?

remotes::install_github("ropengov/eurostat", ref = "feature/check_internet")
library(eurostat)
packageVersion("eurostat") # Should be ‘3.4.20006’

# Set proxies empty
Sys.setenv(http_proxy = "")
Sys.setenv(https_proxy = "")
# check they really are empty
Sys.getenv("http_proxy")
Sys.getenv("https_proxy")

eurostat::clean_eurostat_cache()
eurostat::search_eurostat(pattern = "Fertility")
eurostat::get_eurostat(id = "demo_r_frate2")

In unix, behind a proxy with no proxy defined, this should gracefully fail and produce a message

You have no access to ec.europe.eu. 
             Please check your connection and/or review your proxy settings

@cbizzo
Copy link
Author

cbizzo commented Jan 23, 2020

Hi I finally got around to looking. This fix fixes behind the firewall for our uses. Thanks a lot for taking care.

@cbizzo
Copy link
Author

cbizzo commented Jan 23, 2020

If it's pulled into master we can close the issue. Thanks.

@JensXII
Copy link

JensXII commented Jan 23, 2020

I have done the testing suggested by muuankarski, but changed to get_eurostat to what caused me problem in the first place.
As you can see below and cbizzo already has concluded - it work.

Hence, I agree, get the cran-package updated, and close the issue.

Thank you, for the quick solving of the problem :-)

  1. on a computer with curl::has_internet() FALSE:

curl::has_internet()
[1] FALSE

remotes::install_github("ropengov/eurostat", ref = "feature/check_internet")
Skipping install of 'eurostat' from a github remote, the SHA1 (06c1fc0) has not changed since last install.
Use force = TRUE to force installation
library(eurostat)
packageVersion("eurostat") # Should be ‘3.4.20006’
[1] ‘3.4.20006’

Set proxies empty

Sys.setenv(http_proxy = "")
Sys.setenv(https_proxy = "")

check they really are empty

Sys.getenv("http_proxy")
[1] ""
Sys.getenv("https_proxy")
[1] ""

eurostat::clean_eurostat_cache()
Deleted .rds files from C:\Users\NLS\AppData\Local\Temp\4\RtmpA3HIko/eurostat
eurostat::search_eurostat(pattern = "Fertility")

A tibble: 10 x 8

title code type last update of data last table structure change data start data end values

1 Fertility rates by age and NUTS 2 region demo_r_frate2 dataset 13.03.2019 10.05.2019 1990 2017 NA
2 Fertility indicators by NUTS 2 region demo_r_find2 dataset 15.03.2019 10.05.2019 1990 2017 NA
3 Fertility indicators by NUTS 3 region demo_r_find3 dataset 17.10.2019 28.02.2019 2013 2017 NA
4 Fertility and mortality - cities and greater cities urb_cfermor dataset 21.01.2020 21.01.2020 1990 2019 NA
5 Fertility and mortality - functional urban areas urb_lfermor dataset 21.01.2020 21.01.2020 1990 2019 NA
6 Fertility indicators demo_find dataset 06.11.2019 15.08.2019 1960 2017 NA
7 Fertility rates by age demo_frate dataset 06.11.2019 06.11.2019 1960 2017 NA
8 Fertility rates by age and NUTS 2 region demo_r_frate2 dataset 13.03.2019 10.05.2019 1990 2017 NA
9 Fertility indicators by NUTS 2 region demo_r_find2 dataset 15.03.2019 10.05.2019 1990 2017 NA
10 Fertility indicators by NUTS 3 region demo_r_find3 dataset 17.10.2019 28.02.2019 2013 2017 NA

eurostat::get_eurostat(id = "demo_r_frate2")

pop_data <- subset(eurostat::get_eurostat("demo_r_pjangrp3", time_format = "num"),

  •                (age == "TOTAL") & (sex == "T") & (nchar(trimws(geo)) == 5))[, c("time","geo","values")]
    

trying URL 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fdemo_r_pjangrp3.tsv.gz'
Content type 'application/octet-stream;charset=UTF-8' length 2294016 bytes (2.2 MB)
downloaded 2.2 MB

Table demo_r_pjangrp3 cached at C:\Users\NLS\AppData\Local\Temp\4\RtmpA3HIko/eurostat/demo_r_pjangrp3_num_code_TF.rds

  1. on a computer with curl::has_internet() TRUE:

curl::has_internet()
[1] TRUE
remotes::install_github("ropengov/eurostat", ref = "feature/check_internet")
Downloading GitHub repo ropengov/eurostat@feature/check_internet
These packages have more recent versions available.
Which would you like to update?

1: All
2: CRAN packages only
3: None
4: tidyr (0.8.3 -> 1.0.0) [CRAN]
5: hms (0.5.0 -> 0.5.3) [CRAN]

Enter one or more numbers, or an empty line to skip updates:
library(eurostat)
Enter one or more numbers, or an empty line to skip updates:
packageVersion("eurostat") # Should be ‘3.4.20006’
Enter one or more numbers, or an empty line to skip updates:
2
tidyr (0.8.3 -> 1.0.0) [CRAN]
hms (0.5.0 -> 0.5.3) [CRAN]
Installing 2 packages: tidyr, hms
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/tidyr_1.0.0.zip'
Content type 'application/zip' length 1293390 bytes (1.2 MB)
downloaded 1.2 MB

trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/hms_0.5.3.zip'
Content type 'application/zip' length 107433 bytes (104 KB)
downloaded 104 KB

package ‘tidyr’ successfully unpacked and MD5 sums checked
package ‘hms’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\nls\AppData\Local\Temp\RtmpcbV1oF\downloaded_packages
√ checking for file 'C:\Users\nls\AppData\Local\Temp\RtmpcbV1oF\remotes1bd47e63168d\rOpenGov-eurostat-06c1fc0/DESCRIPTION' (1.9s)

  • preparing 'eurostat': (1.9s)
    √ checking DESCRIPTION meta-information ...
  • checking for LF line-endings in source and make files and shell scripts (855ms)
  • checking for empty or unneeded directories
  • looking to see if a 'data/datalist' file should be added
  • building 'eurostat_3.4.20006.tar.gz' (422ms)
  • installing source package 'eurostat' ...
    ** R
    ** data
    *** moving datasets to lazyload DB
    ** inst
    ** byte-compile and prepare package for lazy loading
    ** help
    *** installing help indices
    converting help for package 'eurostat'
    finding HTML links ... done
    check_access_to_data html
    clean_eurostat_cache html
    convert_time_col html
    cut_to_classes html
    dic_order html
    eu_countries html
    eurostat-package html
    eurostat_geodata_60_2016 html
    eurotime2date html
    eurotime2num html
    get_bibentry html
    get_eurostat html
    get_eurostat_dic html
    get_eurostat_geospatial html
    get_eurostat_json html
    get_eurostat_raw html
    get_eurostat_toc html
    harmonize_country_code html
    label_eurostat html
    search_eurostat html
    set_eurostat_toc html
    tgs00026 html
    tidy_eurostat html
    ** building package indices
    ** installing vignettes
    ** testing if installed package can be loaded
    *** arch - i386
    *** arch - x64
  • DONE (eurostat)
    In R CMD INSTALL
    Warning messages:
    1: In untar2(tarfile, files, list, exdir) :
    skipping pax global extended headers
    2: In untar2(tarfile, files, list, exdir) :
    skipping pax global extended headers

Set proxies empty

Sys.setenv(http_proxy = "")
Sys.setenv(https_proxy = "")

check they really are empty

Sys.getenv("http_proxy")
[1] ""
Sys.getenv("https_proxy")
[1] ""

eurostat::clean_eurostat_cache()
The cache does not exist
eurostat::search_eurostat(pattern = "Fertility")

A tibble: 10 x 8

title code type last update of dat~ last table structure chang~ data start data end values

1 Fertility rates by age and NUTS 2 region demo_r_frate2 dataset 13.03.2019 10.05.2019 1990 2017 NA
2 Fertility indicators by NUTS 2 region demo_r_find2 dataset 15.03.2019 10.05.2019 1990 2017 NA
3 Fertility indicators by NUTS 3 region demo_r_find3 dataset 17.10.2019 28.02.2019 2013 2017 NA
4 Fertility and mortality - cities and greater ci~ urb_cfermor dataset 21.01.2020 21.01.2020 1990 2019 NA
5 Fertility and mortality - functional urban areas urb_lfermor dataset 21.01.2020 21.01.2020 1990 2019 NA
6 Fertility indicators demo_find dataset 06.11.2019 15.08.2019 1960 2017 NA
7 Fertility rates by age demo_frate dataset 06.11.2019 06.11.2019 1960 2017 NA
8 Fertility rates by age and NUTS 2 region demo_r_frate2 dataset 13.03.2019 10.05.2019 1990 2017 NA
9 Fertility indicators by NUTS 2 region demo_r_find2 dataset 15.03.2019 10.05.2019 1990 2017 NA
10 Fertility indicators by NUTS 3 region demo_r_find3 dataset 17.10.2019 28.02.2019 2013 2017 NA

eurostat::get_eurostat(id = "demo_r_frate2")
trying URL 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fdemo_r_frate2.tsv.gz'
Content type 'application/octet-stream;charset=UTF-8' length 907647 bytes (886 KB)
downloaded 886 KB

Table demo_r_frate2 cached at C:\Users\nls\AppData\Local\Temp\RtmpcbV1oF/eurostat/demo_r_frate2_date_code_TF.rds

A tibble: 431,470 x 5

unit age geo time values

1 NR TOTAL AL0 2017-01-01 1.47
2 NR TOTAL AT 2017-01-01 1.52
3 NR TOTAL AT1 2017-01-01 1.46
4 NR TOTAL AT11 2017-01-01 1.36
5 NR TOTAL AT12 2017-01-01 1.57
6 NR TOTAL AT13 2017-01-01 1.41
7 NR TOTAL AT2 2017-01-01 1.48
8 NR TOTAL AT21 2017-01-01 1.51
9 NR TOTAL AT22 2017-01-01 1.46
10 NR TOTAL AT3 2017-01-01 1.62

... with 431,460 more rows

pop_data <- subset(eurostat::get_eurostat("demo_r_pjangrp3", time_format = "num"),

  •                (age == "TOTAL") & (sex == "T") & (nchar(trimws(geo)) == 5))[, c("time","geo","values")]
    

trying URL 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fdemo_r_pjangrp3.tsv.gz'
Content type 'application/octet-stream;charset=UTF-8' length 2294016 bytes (2.2 MB)
downloaded 2.2 MB

Table demo_r_pjangrp3 cached at C:\Users\nls\AppData\Local\Temp\RtmpcbV1oF/eurostat/demo_r_pjangrp3_num_code_TF.rds

@antagomir
Copy link
Member

@muuankarski when you are ready, can you merge to devel branch; I can take care of the rest of testing, CRAN, and master then.

@muuankarski
Copy link
Contributor

merged in to devel

@antagomir
Copy link
Member

Fixed in master now & on the way to CRAN.

@antagomir
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants