URL encoding, also called percent-encoding, is a mechanism for converting characters into a URL-safe format. Defined by the IETF in RFC 3986, its purpose is to allow any character to be safely transmitted in a URL without causing parsing ambiguity.
Only a subset of ASCII characters is allowed directly in a URL. Any other character (Chinese or Japanese text, for example, or symbols outside that subset) must first be converted to a byte sequence, normally UTF-8, and each byte is then written in %XX format, where XX is the byte's two-digit hexadecimal value.
Example: The Chinese greeting "你好" has the UTF-8 bytes E4 BD A0 E5 A5 BD, which URL-encodes to %E4%BD%A0%E5%A5%BD.
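The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: `percent_encode` and `UNRESERVED` are hypothetical names chosen for this example, and the sketch assumes UTF-8 and encodes every character outside the RFC 3986 unreserved set.

```python
from urllib.parse import quote

# RFC 3986 unreserved characters, as byte values.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Hypothetical helper: percent-encode all but unreserved characters."""
    out = []
    # Step 1: convert the text to its UTF-8 byte sequence.
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))
        else:
            # Step 2: write the byte as "%" plus two uppercase hex digits.
            out.append(f"%{byte:02X}")
    return "".join(out)


greeting = "你好"
print(greeting.encode("utf-8").hex(" ").upper())  # E4 BD A0 E5 A5 BD
print(percent_encode(greeting))                   # %E4%BD%A0%E5%A5%BD

# The standard library gives the same result when no character is marked safe.
print(percent_encode(greeting) == quote(greeting, safe=""))  # True
```

In practice the standard library's `urllib.parse.quote` is the better choice; the hand-written version is only there to make the two steps visible.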
RFC 3986 (Uniform Resource Identifier: Generic Syntax) is the core specification for URLs and URIs. Published by the IETF in 2005, it remains the authoritative standard. It defines URI syntax, classifies reserved characters, and specifies when percent-encoding is required.
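According to RFC 3986, URI characters fall into three categories: unreserved characters (A-Z, a-z, 0-9, and - . _ ~), which never need encoding; reserved characters, which act as delimiters and must be encoded when they appear as literal data; and all other characters, which must always be percent-encoded.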
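RFC 3986 divides reserved characters into two groups: general delimiters (gen-delims), which separate the major components of a URI, and sub-delimiters (sub-delims), which are used within a component.

General delimiters (gen-delims):

| Character | Purpose | URL Encoded |
|---|---|---|
| : | Protocol / host separator, port separator | %3A |
| / | Path separator | %2F |
| ? | Start of query string | %3F |
| # | Start of fragment | %23 |
| [ | IPv6 address | %5B |
| ] | IPv6 address | %5D |
| @ | Userinfo separator | %40 |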

Sub-delimiters (sub-delims):

| Character | Common Use | URL Encoded |
|---|---|---|
| ! | Special marker | %21 |
| $ | Path parameter | %24 |
| & | Query parameter separator | %26 |
| ' | String marker | %27 |
| ( | Grouping | %28 |
| ) | Grouping | %29 |
| * | Wildcard | %2A |
| + | Space (legacy form encoding) | %2B |
| , | List separator | %2C |
| ; | Path parameter | %3B |
| = | key=value separator | %3D |
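
The character classes can also be written down as sets, which makes it easy to check which group a character belongs to. The sketch below uses the character lists from RFC 3986 (sections 2.2 and 2.3); the `classify` function and the constant names are hypothetical, introduced only for this illustration.

```python
import string

# Character sets as listed in RFC 3986, sections 2.2 and 2.3.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def classify(ch: str) -> str:
    """Hypothetical helper: name the RFC 3986 class of one character."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in GEN_DELIMS:
        return "gen-delim"
    if ch in SUB_DELIMS:
        return "sub-delim"
    return "other (must be percent-encoded)"


for ch in "a~:&<":
    print(repr(ch), "->", classify(ch))
```

Whether a reserved character actually needs encoding depends on where it appears: `&` is a delimiter between query parameters, but must be written as %26 when it is part of a parameter value.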

Characters outside both the reserved and unreserved sets must always be encoded:

| Character | Reason | URL Encoded |
|---|---|---|
| Space | Spaces are not allowed in URLs | %20 (or + in form-encoded query strings) |
| " | Conflicts with HTML attribute quotes | %22 |
| < | Conflicts with HTML tags | %3C |
| > | Conflicts with HTML tags | %3E |
| { } | Unsafe characters | %7B / %7D |
| \| | Unsafe character | %7C |
| \ | Conflicts with path separators | %5C |
| ^ | Unsafe character | %5E |
| ` | Unsafe character | %60 |
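
The encodings in this table, and the two ways of writing a space, can be checked with Python's standard `urllib.parse` module. This is a small sketch for verification only; the variable name `unsafe` is illustrative.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Characters from the table above; safe="" means no character is exempted.
unsafe = ' "<>{}|\\^`'
for ch in unsafe:
    print(repr(ch), quote(ch, safe=""))

# Space handling differs between the two encoders.
print(quote("hello world"))       # hello%20world
print(quote_plus("hello world"))  # hello+world

# "+" only means "space" to the form-style decoder.
print(unquote("hello+world"))       # hello+world
print(unquote_plus("hello+world"))  # hello world
```

The last two lines are why `+` and `%20` are not interchangeable: a decoder that is not form-aware leaves `+` untouched.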
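URL encoding of non-ASCII characters such as Chinese involves two steps: first convert the character to its UTF-8 byte sequence, then write each byte as % followed by its two-digit hexadecimal value.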
"中" = UTF-8: E4 B8 AD → URL Encoded: %E4%B8%AD
"文" = UTF-8: E6 96 87 → URL Encoded: %E6%96%87
// encodeURIComponent - also encodes URL structural characters such as / ? & = (recommended for parameter values)
// Note: it leaves A-Z a-z 0-9 - _ . ~ ! * ' ( ) unencoded, so "!" is not changed
encodeURIComponent('hello world!')
// → "hello%20world!"
// encodeURI - preserves URL structural characters (for complete URLs)
encodeURI('https://example.com/search?q=hello world')
// → "https://example.com/search?q=hello%20world"
// Decoding
decodeURIComponent('hello%20world%21')
// → "hello world!"
from urllib.parse import quote, unquote, urlencode
# Encode a single value (safe='' means / is also encoded)
quote('hello world!', safe='')
# → 'hello%20world%21'
# Decode
unquote('hello%20world%21')
# → 'hello world!'
# Encode a full query string
params = {'name': 'John Doe', 'city': 'New York'}
urlencode(params)
# → 'name=John+Doe&city=New+York'
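
If the receiving side expects %20 rather than + for spaces, `urlencode` accepts a `quote_via` argument. The sketch below reuses the `params` dictionary from the example above; `form_style` and `rfc_style` are illustrative names.

```python
from urllib.parse import parse_qs, quote, urlencode

params = {"name": "John Doe", "city": "New York"}

# Default: quote_plus is used, so spaces become "+".
form_style = urlencode(params)
print(form_style)  # name=John+Doe&city=New+York

# quote_via=quote switches to %20 for spaces.
rfc_style = urlencode(params, quote_via=quote)
print(rfc_style)  # name=John%20Doe&city=New%20York

# Both forms decode to the same values with parse_qs.
print(parse_qs(form_style) == parse_qs(rfc_style))  # True
```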
<?php
// rawurlencode - follows RFC 3986 (recommended)
rawurlencode('hello world!');
// → 'hello%20world%21'
// urlencode - encodes space as + rather than %20 (for HTML form data)
urlencode('hello world!');
// → 'hello+world%21'
// Decode
rawurldecode('hello%20world%21');
// → 'hello world!'
?>
import java.net.URLEncoder;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
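// Encode (URLEncoder implements HTML form encoding, application/x-www-form-urlencoded;
// the Charset overload used here requires Java 10 or later)
String encoded = URLEncoder.encode("hello world!", StandardCharsets.UTF_8);
// → "hello+world%21" (Note: spaces become +, not %20)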
// Decode
String decoded = URLDecoder.decode("hello+world%21", StandardCharsets.UTF_8);
// → "hello world!"
import "net/url"
// Query encode (space → +)
encoded := url.QueryEscape("hello world!")
// → "hello+world%21"
// Path encode (space → %20)
pathEncoded := url.PathEscape("hello world!")
// → "hello%20world%21"
// Decode
decoded, _ := url.QueryUnescape("hello+world%21")
// → "hello world!"
Mistake 1: Double Encoding
Encoding an already-encoded string a second time causes % to itself become %25.
Wrong: encodeURIComponent('%E4%BD%A0') → %25E4%25BD%25A0
Correct: Only encode the original, unencoded string once.
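
The same effect is easy to reproduce in Python, which also shows why a double-encoded value no longer decodes cleanly. This is a minimal sketch using the standard `urllib.parse` module; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

original = "你"

once = quote(original, safe="")
twice = quote(once, safe="")  # the mistake: encoding an already-encoded string

print(once)   # %E4%BD%A0
print(twice)  # %25E4%25BD%25A0

# One decode of the double-encoded value only undoes the outer layer.
print(unquote(twice))           # %E4%BD%A0
print(unquote(unquote(twice)))  # 你
```

A receiver that decodes once, as most servers do, would see the literal text %E4%BD%A0 instead of the intended character.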
Mistake 2: Using the deprecated escape() function
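escape() does not produce UTF-8 percent-encoding: it writes characters above U+00FF in the non-standard %uXXXX form and characters from U+0080 to U+00FF as a single Latin-1 byte, so its output does not match what an RFC 3986 decoder expects. The ECMAScript specification keeps it only as a legacy feature.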
Use encodeURIComponent() or encodeURI() instead.
Mistake 3: Confusing encodeURI and encodeURIComponent
Using encodeURI() on a query string value leaves &, =, and ? unencoded, which breaks parameter parsing.
Always use encodeURIComponent() for individual query string parameter values.
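
The failure mode can be reproduced outside JavaScript. The Python sketch below is only an analogy: `quote(value, safe="&=?")` stands in for the encodeURI() behaviour of leaving reserved characters alone, and `quote(value, safe="")` stands in for encodeURIComponent(). The parameter name `q` and the sample value are made up for the example.

```python
from urllib.parse import parse_qs, quote

value = "a&b=c?"  # one parameter value that happens to contain reserved characters

# Comparable to the encodeURI() mistake: reserved characters are left as-is.
broken = "q=" + quote(value, safe="&=?")
# Comparable to encodeURIComponent(): reserved characters are encoded.
correct = "q=" + quote(value, safe="")

print(broken)   # q=a&b=c?
print(correct)  # q=a%26b%3Dc%3F

print(parse_qs(broken))   # {'q': ['a'], 'b': ['c?']}
print(parse_qs(correct))  # {'q': ['a&b=c?']}
```

With the broken form the server sees two parameters instead of one, and the original value cannot be recovered.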