What's the difference between EscapeUriString and EscapeDataString?

Solution 1:

I didn't find the existing answers satisfactory so I decided to dig a little deeper to settle this issue. Surprisingly, the answer is very simple:

There is (almost) no valid reason to ever use Uri.EscapeUriString. If you need to percent-encode a string, always use Uri.EscapeDataString.*

* See the last paragraph for a valid use case.

Why is this? According to the documentation:

Use the EscapeUriString method to prepare an unescaped URI string to be a parameter to the Uri constructor.

This doesn't really make sense. According to RFC 2396:

A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics.

While the quoted RFC has been obsoleted by RFC 3986, the point still stands. Let's verify it by looking at some concrete examples:

  1. You have a simple URI, like this:

     http://example.org/
    

Uri.EscapeUriString won't change it.

  2. You decide to manually edit the query string without regard for escaping:

     http://example.org/?key=two words
    

Uri.EscapeUriString will (correctly) escape the space for you:

    http://example.org/?key=two%20words

  3. You decide to manually edit the query string even further:

     http://example.org/?parameter=father&son
    

However, this string is not changed by Uri.EscapeUriString, since it assumes the ampersand signifies the start of another key-value pair. This may or may not be what you intended.

  4. You decide that you in fact want the value of parameter to be father&son, so you fix the previous URL manually by escaping the ampersand:

     http://example.org/?parameter=father%26son
    

However, Uri.EscapeUriString will escape the percent character too, leading to a double encoding:

    http://example.org/?parameter=father%2526son

As you can see, using Uri.EscapeUriString for its intended purpose makes it impossible to use & as part of a key or value in a query string instead of as a separator between multiple key-value pairs.
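The four cases above can be reproduced with a short console program. This is a minimal sketch that assumes a recent .NET SDK, where Uri.EscapeUriString is flagged as obsolete (hence the pragma); the input strings are the ones from the examples.

```csharp
using System;

// Uri.EscapeUriString is marked obsolete (SYSLIB0013) on .NET 6 and later;
// the warning is silenced here only so the demonstration compiles cleanly.
#pragma warning disable SYSLIB0013

string[] inputs =
{
    "http://example.org/",                        // unchanged
    "http://example.org/?key=two words",          // space becomes %20
    "http://example.org/?parameter=father&son",   // unchanged, '&' is reserved
    "http://example.org/?parameter=father%26son", // '%' becomes %25 (double encoding)
};

foreach (string input in inputs)
{
    Console.WriteLine($"{input} -> {Uri.EscapeUriString(input)}");
}
```

The last line of output is the interesting one: the already-escaped ampersand comes back as %2526.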

This is because, in an attempt at making it suitable for escaping full URIs, it ignores reserved characters and only escapes characters that are neither reserved nor unreserved, which, BTW, is contrary to the documentation. This way you don't end up with something like http%3A%2F%2Fexample.org%2F, but you do end up with the issues illustrated above.


In the end, if your URI is valid, it does not need to be escaped to be passed as a parameter to the Uri constructor, and if it's not valid then calling Uri.EscapeUriString isn't a magic solution either. Actually, it will work in many if not most cases, but it is by no means reliable.

You should always construct your URLs and query strings by gathering the key-value pairs, percent-encoding each key and each value, and then concatenating them with the necessary separators. You can use Uri.EscapeDataString for this purpose, but not Uri.EscapeUriString, since it doesn't escape reserved characters, as mentioned above.
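As a rough illustration of that approach, here is a minimal sketch. BuildQuery is a hypothetical helper written for this example, not a framework method, and the keys and values are taken from the examples above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: percent-encode every key and every value on its own,
// then join the pairs with the '=' and '&' separators.
static string BuildQuery(IEnumerable<(string Key, string Value)> pairs) =>
    string.Join("&", pairs.Select(p =>
        Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

var parameters = new[]
{
    (Key: "parameter", Value: "father&son"),
    (Key: "key", Value: "two words"),
};

string url = "http://example.org/?" + BuildQuery(parameters);
Console.WriteLine(url);
// http://example.org/?parameter=father%26son&key=two%20words
```

Because the separators are added after encoding, an ampersand inside a value can never be mistaken for a pair separator.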

Only if you cannot do that, e.g. when dealing with user-provided URIs, does it make sense to use Uri.EscapeUriString as a last resort. But the previously mentioned caveats apply – if the user-provided URI is ambiguous, the results may not be desirable.

Solution 2:

Always use EscapeDataString (for more on why, see Livven's answer).

Solution 3:

The plus (+) character reveals a lot about the difference between these methods. In a query string that uses form encoding (application/x-www-form-urlencoded), the plus character means "space". Consider querying Google for "happy cat":

https://www.google.com/?q=happy+cat

That's a valid URI (try it), and EscapeUriString will not modify it.

Now consider querying Google for "happy c++":

https://www.google.com/?q=happy+c++

That's a valid URI (try it), but it produces a search for "happy c", because the two pluses are interpreted as spaces. To fix it, we can pass "happy c++" to EscapeDataString and voilà*:

https://www.google.com/?q=happy+c%2B%2B

* The encoded data string is actually "happy%20c%2B%2B"; %20 is hex for the space character, and %2B is hex for the plus character.

If you're using UriBuilder as you should be, then you'll only need EscapeDataString to properly escape some of the components of your entire URI. @Livven's answer to this question further proves that there really is no reason to use EscapeUriString.
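A minimal sketch of that combination, assuming the Google search URL from the example above: only the value is passed through EscapeDataString, while the "q=" key and separator are written by hand.

```csharp
using System;

var builder = new UriBuilder("https", "www.google.com")
{
    Path = "/",
    // Escape only the data component; UriBuilder adds the leading '?' itself.
    Query = "q=" + Uri.EscapeDataString("happy c++"),
};

Console.WriteLine(builder.Uri.AbsoluteUri);
```

The query portion of the printed URI should read ?q=happy%20c%2B%2B, so the two pluses survive as literal plus characters.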

Solution 4:

Comments in the source address the difference clearly. Why this info isn't brought forward via XML documentation comments is a mystery to me.

EscapeUriString:

This method will escape any character that is not a reserved or unreserved character, including percent signs. Note that EscapeUriString will also do not escape a '#' sign.

EscapeDataString:

This method will escape any character that is not an unreserved character, including percent signs.

So the difference is in how they handle reserved characters. EscapeDataString escapes them; EscapeUriString does not.

According to RFC 3986, the reserved characters are: :/?#[]@!$&'()*+,;=

For completeness, the unreserved characters are alphanumeric and -._~

Both methods escape characters that are neither reserved nor unreserved.
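One way to check these claims is to feed each character class to both methods. This is a small sketch that assumes a current .NET runtime (older .NET Framework versions followed RFC 2396 and treated a few of these characters differently); the sample strings are my own.

```csharp
using System;
using System.Linq;

// Uri.EscapeUriString is marked obsolete (SYSLIB0013) on .NET 6 and later.
#pragma warning disable SYSLIB0013

const string reserved = ":/?#[]@!$&'()*+,;=";
const string unreserved = "abcXYZ019-._~";
const string neither = " \"<>";

// Reserved characters: escaped by EscapeDataString, left alone by EscapeUriString.
bool dataEscapesAllReserved =
    reserved.All(c => Uri.EscapeDataString(c.ToString()) != c.ToString());
bool uriKeepsReserved = Uri.EscapeUriString(reserved) == reserved;

// Unreserved characters: left alone by both methods.
bool bothKeepUnreserved =
    Uri.EscapeDataString(unreserved) == unreserved &&
    Uri.EscapeUriString(unreserved) == unreserved;

// Everything else (here: space, quote, angle brackets): escaped by both.
bool bothEscapeTheRest =
    neither.All(c => Uri.EscapeDataString(c.ToString()) != c.ToString() &&
                     Uri.EscapeUriString(c.ToString()) != c.ToString());

Console.WriteLine($"EscapeDataString escapes reserved: {dataEscapesAllReserved}");
Console.WriteLine($"EscapeUriString keeps reserved:    {uriKeepsReserved}");
Console.WriteLine($"Both keep unreserved:              {bothKeepUnreserved}");
Console.WriteLine($"Both escape everything else:       {bothEscapeTheRest}");
```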

I disagree with the general notion that EscapeUriString is evil. I think a method that escapes only illegal characters (such as spaces) and not reserved characters is useful. But it does have a quirk in how it handles the % character. Percent-encoded characters (% followed by 2 hex digits) are legal in a URI. I think EscapeUriString would be far more useful if it detected this pattern and avoided encoding % when it's immediately followed by 2 hex digits.
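To make that idea concrete, here is a rough sketch of such a variant. EscapeUriStringKeepingEscapes is a hypothetical helper of my own, not something .NET provides: it leaves existing %XX sequences untouched and hands everything else to Uri.EscapeUriString.

```csharp
using System;
using System.Text.RegularExpressions;

// Uri.EscapeUriString is marked obsolete (SYSLIB0013) on .NET 6 and later.
#pragma warning disable SYSLIB0013

// Hypothetical helper: split the input into existing percent-escapes (%XX),
// runs without '%', and stray '%' signs. Existing escapes are kept as they
// are; the other pieces go through Uri.EscapeUriString.
static string EscapeUriStringKeepingEscapes(string input) =>
    Regex.Replace(
        input,
        "%[0-9A-Fa-f]{2}|[^%]+|%",
        m => m.Length == 3 && m.Value[0] == '%'
            ? m.Value
            : Uri.EscapeUriString(m.Value));

Console.WriteLine(EscapeUriStringKeepingEscapes(
    "http://example.org/?parameter=father%26son&key=two words&pct=100%"));
// http://example.org/?parameter=father%26son&key=two%20words&pct=100%25
```

It cannot tell an intentional escape from literal text that merely looks like one, so the ambiguity described in the first answer still applies.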