When should one use CONNECT and GET HTTP methods at HTTP Proxy Server?
TL;DR a web client uses CONNECT
only when it knows it talks to a proxy and the final URI begins with https://
.
When a browser says:
CONNECT www.google.com:443 HTTP/1.1
it means:
Hi proxy, please open a raw TCP connection to google; any following bytes I write, you just repeat over that connection without any interpretation. Oh, and one more thing. Do that only if you talk to Google directly, but if you use another proxy yourself, instead you just tell them the same
CONNECT
.
Note how this says nothing about TLS (https). In fact CONNECT
is orthogonal to TLS; you can have only one, you can have other, or you can have both of them.
That being said, the intent of CONNECT
is to allow end-to-end encrypted TLS session, so the data is unreadable to a proxy (or a whole proxy chain). It works even if a proxy doesn't understand TLS at all, because CONNECT
can be issued inside plain HTTP and requires from the proxy nothing more than copying raw bytes around.
But the connection to the first proxy can be TLS (https) although it means a double encryption of traffic between you and the first proxy.
Obviously, it makes no sense to CONNECT
when talking directly to the final server. You just start talking TLS and then issue HTTP GET
. The end servers normally disable CONNECT
altogether.
To a proxy, CONNECT
support adds security risks. Any data can be passed through CONNECT
, even ssh hacking attempt to a server on 192.168.1.*, even SMTP sending spam. Outside world sees these attacks as regular TCP connections initiated by a proxy. They don't care what is the reason, they cannot check whether HTTP CONNECT
is to blame. Hence it's up to proxies to secure themselves against misuse.
A CONNECT request urges your proxy to establish an HTTP tunnel to the remote end-point. Usually is it used for SSL connections, though it can be used with HTTP as well (used for the purposes of proxy-chaining and tunneling)
CONNECT www.google.com:443
The above line opens a connection from your proxy to www.google.com on port 443.
After this, content that is sent by the client is forwarded by the proxy to www.google.com:443
.
If a user tries to retrieve a page http://www.google.com, the proxy can send the exact same request and retrieve response for him, on his behalf.
With SSL(HTTPS), only the two remote end-points understand the requests, and the proxy cannot decipher them. Hence, all it does is open that tunnel using CONNECT, and lets the two end-points (webserver and client) talk to each other directly.
Proxy Chaining:
If you are chaining 2 proxy servers, this is the sequence of requests to be issued.
GET1 is the original GET request (HTTP URL)
CONNECT1 is the original CONNECT request (SSL/HTTPS URL or Another Proxy)
User Request ==CONNECT1==> (Your_Primary_Proxy ==CONNECT==> AnotherProxy-1 ... ==CONNECT==> AnotherProxy-n) ==GET1(IF is http)/CONNECT1(IF is https)==> Destination_URL
As a rule of thumb GET is used for plain HTTP and CONNECT for HTTPS
There are more details though so you probably want to read the relevant RFC-s
http://www.ietf.org/rfc/rfc2068.txt http://www.ietf.org/rfc/rfc2817.txt