Replace server address by IP in curl

I want to use the IP address instead of the server name to curl a file directly from http://.... I am on MSYS2 under Windows 10 (that's why I post here and not on askubuntu, e.g.), but I guess it would be the same on Linux.

I couldn't make this work; I post the details of what I tried below. I had a similar failure using wget, for which I wrote a separate post, as I am not sure the explanations and solutions are the same as here.

What is the correct way of doing this?

Note: Using curl ftp://<IP>/... instead of curl http://<IP>/... worked fine.


This is what I tried:

  1. Obtain the IP address for the server.
    $ ping us.archive.ubuntu.com
    
    Pinging us.archive.ubuntu.com [91.189.91.38] with 32 bytes of data:
    Reply from 91.189.91.38: bytes=32 time=173ms TTL=52
    Reply from 91.189.91.38: bytes=32 time=166ms TTL=52
    Reply from 91.189.91.38: bytes=32 time=172ms TTL=52
    
    Ping statistics for 91.189.91.38:
        Packets: Sent = 3, Received = 3, Lost = 0
        (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 166ms, Maximum = 173ms, Average = 170ms
    Control-C
  2. Try curling the file using the server name. It works OK.
$ curl -L http://us.archive.ubuntu.com/ubuntu/pool/universe/y/yudit/yudit-common_2.9.6-7_all.deb --output yudit-common_2.9.6-7_all.deb
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1599k  100 1599k    0     0   256k      0  0:00:06  0:00:06 --:--:--  344k
  3. Try curling the file using the IP address. It does not work. Appending --header "Host:us.archive.ubuntu.com" to the command line (the full command is sketched after the output below) produces exactly the same result, so I am not sure this rules out the "Host header" problem as a cause.
$ curl -L http://91.189.91.39/ubuntu/pool/universe/y/yudit/yudit-common_2.9.6-7_all.deb --output yudit-common_2.9.6-7_all.deb
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   274  100   274    0     0     76      0  0:00:03  0:00:03 --:--:--    76

$ cat yudit-common_2.9.6-7_all.deb
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
<hr>
<address>Apache/2.4.29 (Ubuntu) Server at 91.189.91.39 Port 80</address>
</body></html>
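
For reference, the variant with the Host header appended was along these lines (reconstructed here for clarity; it produced exactly the same 404 output as above):

$ curl -L --header "Host:us.archive.ubuntu.com" http://91.189.91.39/ubuntu/pool/universe/y/yudit/yudit-common_2.9.6-7_all.deb --output yudit-common_2.9.6-7_all.deb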

EDIT: Following the answer by gronostaj, I tried two commands.

A. This worked (same as item 2 above):

    $ curl -v --resolve us.archive.ubuntu.com:80:91.189.91.39 -L http://us.archive.ubuntu.com/ubuntu/pool/universe/y/yudit/yudit-common_2.9.6-7_all.deb --output yudit-common_2.9.6-7_all.deb
    ...
    <
    { [7725 bytes data]
      0 1599k    0  7725    0     0   5360      0  0:05:05  0:00:01  0:05:04  7874* STATE: PERFORM => DONE handle 0x800744e0; line 2199 (connection #0)
    * multi_done
    100 1599k  100 1599k    0     0   675k      0  0:00:02  0:00:02 --:--:--  838k
    * Connection #0 to host us.archive.ubuntu.com left intact

B. This didn't (same as item 3 above):

    $ curl -v --resolve us.archive.ubuntu.com:80:91.189.91.39 -L http://91.189.91.39/ubuntu/pool/universe/y/yudit/yudit-common_2.9.6-7_all.deb --output yudit-common_2.9.6-7_all.deb
    ...
    <
    { [274 bytes data]
    100   274  100   274    0     0    434      0 --:--:-- --:--:-- --:--:--   444* STATE: PERFORM => DONE handle 0x800744c8; line 2199 (connection #0)
    * multi_done
    100   274  100   274    0     0    430      0 --:--:-- --:--:-- --:--:--   439
    * Connection #0 to host 91.189.91.39 left intact

I wonder whether A is the right fix for the failing item 3, or whether it is actually using the server name and not the direct IP (as in item 2).


A spec-conforming HTTP client will extract the domain from the request URL and send it in a Host header:

Example 1

$ curl -v superuser.com 
* Rebuilt URL to: superuser.com/
*   Trying 151.101.1.69...
* TCP_NODELAY set
* Connected to superuser.com (151.101.1.69) port 80 (#0)
> GET / HTTP/1.1
> Host: superuser.com
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< cache-control: no-cache, no-store, must-revalidate
< location: https://superuser.com/
[...]
< 
* Connection #0 to host superuser.com left intact

The client sends a Host: superuser.com header in a request to superuser.com's IP. The server replies with a redirect to the HTTPS version of the site. There's no document body, which makes sense, since the browser should redirect you. curl won't follow the redirect without -L.
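
For comparison, adding -L makes curl follow that Location header itself, like a browser would (a minimal sketch; the verbose output is omitted):

$ curl -vL superuser.com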

Now let's try to use the IP directly:

Example 2

$ curl -v 151.101.1.69             
* Rebuilt URL to: 151.101.1.69/
*   Trying 151.101.1.69...
* TCP_NODELAY set
* Connected to 151.101.1.69 (151.101.1.69) port 80 (#0)
> GET / HTTP/1.1
> Host: 151.101.1.69
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 500 Domain Not Found
< Server: Varnish
[...]
< 

<html>
<head>
<title>Fastly error: unknown domain 151.101.1.69</title>
</head>
<body>
<p>Fastly error: unknown domain: 151.101.1.69. Please check that this domain has been added to a service.</p>
* Connection #0 to host 151.101.1.69 left intact
<p>Details: cache-ams21021-AMS</p></body></html>

curl sent the IP in the Host header, and the response is a 500 error with a body detailing the problem: the server doesn't serve the domain provided in the Host header.

Let's provide the header manually:

Example 3

$ curl -H 'Host: superuser.com' -v 151.101.1.69
* Rebuilt URL to: 151.101.1.69/
*   Trying 151.101.1.69...
* TCP_NODELAY set
* Connected to 151.101.1.69 (151.101.1.69) port 80 (#0)
> GET / HTTP/1.1
> Host: superuser.com
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< cache-control: no-cache, no-store, must-revalidate
< location: https://superuser.com/
[...]
< 
* Connection #0 to host 151.101.1.69 left intact

And we've got the redirect again, as expected. The server doesn't "just know" that the request was made by providing the IP directly, because it's always made like that: it's the client that's responsible for resolving the domain name. It turned out that the ability to serve multiple websites from a single IP would be handy, so the Host header was introduced in a revision of the HTTP standard.

Unfortunately this won't work with HTTPS. HTTPS is basically HTTP wrapped in TLS, and the TLS connection needs to be set up before anything is sent over HTTP. That setup involves the server providing the appropriate certificate for the requested domain, so knowledge of the domain is required and we're back to square one. This issue is resolved by SNI, an extension for TLS that specifies how the client can communicate the domain to the server so that the correct certificate can be used.
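
One way to watch SNI in action outside of curl is openssl's s_client (a sketch, assuming the openssl CLI is available; the IP is the one used in the example below):

$ openssl s_client -connect 151.101.65.69:443 -servername superuser.com

The -servername option sets the SNI field, and the certificate the server presents should then match superuser.com rather than some default site.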

You can simulate this with curl using --resolve:

Example 4

$ curl -v --resolve superuser.com:443:151.101.65.69 https://superuser.com
* Added superuser.com:443:151.101.65.69 to DNS cache
* Rebuilt URL to: https://superuser.com/
* Hostname superuser.com was found in DNS cache
[...]
* Connected to superuser.com (151.101.65.69) port 443 (#0)
[...]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=*.stackexchange.com
*  start date: Aug  7 13:01:00 2020 GMT
*  expire date: Nov  5 13:01:00 2020 GMT
*  subjectAltName: host "superuser.com" matched cert's "superuser.com"
*  issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
*  SSL certificate verify ok.
[...]
> GET / HTTP/2
> Host: superuser.com
> User-Agent: curl/7.58.0
> Accept: */*
> 
[...]
< HTTP/2 200 
< cache-control: private
< content-type: text/html; charset=utf-8
[...]
<!DOCTYPE html>
[...]

--resolve bypasses DNS resolution for a given host. As the manual puts it, it's "a sort of /etc/hosts alternative". The argument syntax is <host>:<port>:<ip>. So this command:

Example 5

curl -v --resolve superuser.com:443:151.101.65.69 https://superuser.com

Means:

  • -v: be verbose (print headers and TLS details)
  • --resolve superuser.com:443:151.101.65.69: if connecting to superuser.com at port 443, actually use IP 151.101.65.69
  • https://superuser.com: make request using HTTPS to superuser.com

As to why the domain has to be repeated, it makes sense when more than one request is made in a single curl invocation, for example due to redirects when -L is provided:

Example 6

$ curl -v --resolve superuser.com:443:151.101.65.69 -L http://superuser.com

This command will first resolve superuser.com using DNS. --resolve doesn't apply to this request, because it's specified for port 443 and we're connecting over HTTP, on port 80. The server responds with a 301 redirect to https://superuser.com. We've specified -L, so curl will make a second request to that URL. This time it's over HTTPS on port 443, and we've specified an IP for this host and port using --resolve, so that IP is used (the earlier DNS lookup is ignored). The Host header is generated for superuser.com in both cases, because that's what we're requesting.

Here's the actual curl output. Note that the second request results in a "Hostname superuser.com was found in DNS cache" message; that's --resolve in action.

Example 6 (continued)

* Added superuser.com:443:151.101.65.69 to DNS cache
* Rebuilt URL to: http://superuser.com/
*   Trying 151.101.65.69...
* TCP_NODELAY set
* Connected to superuser.com (151.101.65.69) port 80 (#0)
> GET / HTTP/1.1
> Host: superuser.com
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< cache-control: no-cache, no-store, must-revalidate
< location: https://superuser.com/
[...]
* Ignoring the response-body
[...]
* Connection #0 to host superuser.com left intact
* Issue another request to this URL: 'https://superuser.com/'
* Hostname superuser.com was found in DNS cache
*   Trying 151.101.65.69...
* TCP_NODELAY set
* Connected to superuser.com (151.101.65.69) port 443 (#1)
[...]

Further clarification on correct usage of --resolve

When using --resolve, you must request the domain name, not the IP directly. Requesting the IP will:

  • Generate a Host header for the IP rather than the domain,
  • At the SNI step (if HTTPS is used), declare that you're accessing the IP directly rather than through a domain name,
  • Cause --resolve not to apply, because --resolve bypasses domain name resolution, and when no domain name is provided there's nothing to resolve.

So you want this:

Example 7

curl --resolve example.com:80:93.184.216.34 http://example.com

Rather than this:

Example 8

curl --resolve example.com:80:93.184.216.34 http://93.184.216.34

In Example 7, curl will use the IP address provided with --resolve, not the one example.com would be resolved to by DNS.

When does --resolve apply

Each --resolve (multiple are allowed) consists of three components: host, port, and IP. --resolve applies to a request if the host and port match; in that case DNS resolution for that particular request is bypassed and the IP from the matching --resolve is used. In many cases a single curl invocation makes only one request, so --resolve makes sense only if its host and port match the request's host and port. This call, for example, doesn't make sense, because --resolve will never match due to the port mismatch (HTTPS uses 443 by default):

Example 9

curl --resolve example.com:80:93.184.216.34 https://example.com
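
For --resolve to match in this case, its port has to agree with the URL's, i.e. 443 for HTTPS:

curl --resolve example.com:443:93.184.216.34 https://example.com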

When does curl make more than one request per invocation? The case I'm aware of is when -L is provided and the first request results in a 3xx response (the family of redirection responses, see httpstatuses.com). These responses come with a Location header that tells the browser to make another request to the address provided in that header. Without -L, curl will simply print the 3xx response. With -L it will make another request, like a browser would. (Note that the second request can result in a 3xx response too, generating a third one, and so on.)
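
If you only want to see where a 3xx response points, without following it, curl's --write-out variables can print the status code and redirect target (a sketch; %{redirect_url} requires a reasonably recent curl):

$ curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' http://superuser.com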

For example, an HTTP request to superuser.com results in a 301 response with a redirect to the HTTPS version; see Example 1, where the Location header is shown. With -L you'd get a response identical to the one you'd get by requesting the HTTPS version in the first place. HTTP and HTTPS use different ports (80 and 443), so you need two --resolves in this case, one for each port, as in the sketch below. You can also intentionally specify only one, to override domain name resolution only for the HTTP (or HTTPS) request, leaving the other one pointing to the actual IP that DNS returns.
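
Putting that together, a redirect-following request pinned to a chosen IP on both ports would look like this (a sketch, reusing the example.com address from Examples 7 and 8):

curl -L --resolve example.com:80:93.184.216.34 --resolve example.com:443:93.184.216.34 http://example.com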