Complete URL Encoding Guide: Percent-Encoding, RFC 3986 & Reserved Characters

What Is URL Encoding (Percent-Encoding)?

URL encoding, also called percent-encoding, is a mechanism for representing arbitrary characters in a URL using only a limited set of ASCII characters. It is defined by the IETF in RFC 3986, and its purpose is to let data that would otherwise be illegal or ambiguous travel inside a URL without confusing the parser.

Only a subset of ASCII characters is allowed directly in a URL. Any other character, such as Chinese or Japanese text or special symbols, must first be converted to a byte sequence (UTF-8 in practice, as RFC 3986 recommends for new URI schemes). Each byte is then written as %XX, where XX is the byte's value as two hexadecimal digits.

Example: The Chinese greeting "你好" has the UTF-8 bytes E4 BD A0 E5 A5 BD, which URL-encodes to %E4%BD%A0%E5%A5%BD.

Overview of RFC 3986

RFC 3986 (Uniform Resource Identifier: Generic Syntax) is the core specification for URLs and URIs. Published by the IETF in 2005, it remains the authoritative standard. It defines URI syntax, classifies reserved characters, and specifies when percent-encoding is required.

According to RFC 3986, URI characters fall into three categories:

Unreserved Characters: May be used directly without encoding. These are English letters A-Z, a-z, digits 0-9, and the four symbols - _ . ~.
Reserved Characters: Act as delimiters with special meaning within a URL. When one of them appears as data inside a component where it would otherwise be read as a delimiter, it must be percent-encoded.
Other Characters: All characters not in either of the above categories must be percent-encoded.
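The three categories can be expressed as a small lookup. The following Python sketch hard-codes the character sets listed in RFC 3986 (sections 2.2 and 2.3); the function name `classify` is purely illustrative and not part of any library.

```python
import string

# Character sets as listed in RFC 3986, sections 2.2 and 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def classify(ch: str) -> str:
    """Return the RFC 3986 category of a single character (illustrative helper)."""
    if ch in UNRESERVED:
        return "unreserved"   # may appear as-is
    if ch in RESERVED:
        return "reserved"     # must be encoded when used as data
    return "other"            # must always be percent-encoded


# "%" lands in "other" too: as data it has to be written %25.
for ch in ["a", "~", "/", "&", " ", "%", "中"]:
    print(repr(ch), classify(ch))
```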

Complete List of Reserved Characters

RFC 3986 divides reserved characters into two groups:

General Delimiters (gen-delims)

Character	Purpose	URL Encoded
`:`	Scheme separator (https:) and port separator (:8080)	%3A
`/`	Path separator	%2F
`?`	Start of query string	%3F
`#`	Start of fragment	%23
`[`	Opens an IPv6 address literal in the host	%5B
`]`	Closes an IPv6 address literal in the host	%5D
`@`	Separates userinfo from the host	%40
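To see the general delimiters at work, Python's standard `urllib.parse.urlsplit` can split a URL into its components. This is a minimal sketch; the example URLs are made up for illustration. It also shows what happens when a `#` appears unencoded inside a query value.

```python
from urllib.parse import quote, urlsplit

# Each general delimiter marks a component boundary.
parts = urlsplit("https://user@example.com:8080/docs?lang=en#intro")
print(parts.scheme, parts.username, parts.hostname, parts.port)
print(parts.path, parts.query, parts.fragment)

# An unencoded "#" inside a query value is read as the start of the fragment.
broken = urlsplit("https://example.com/search?q=C#")
print(repr(broken.query), repr(broken.fragment))

# Percent-encoding the value keeps "#" as data inside the query.
fixed = urlsplit("https://example.com/search?q=" + quote("C#", safe=""))
print(repr(fixed.query), repr(fixed.fragment))
```

In `broken`, the query silently loses the `#`, and anything after it would be treated as a fragment, which browsers do not send to the server.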

Sub-Delimiters (sub-delims)

Character	Common Use	URL Encoded
`!`	Special marker	%21
`$`	No common use; available for scheme-specific syntax	%24
`&`	Query parameter separator	%26
`'`	String marker	%27
`(`	Grouping	%28
`)`	Grouping	%29
`*`	Wildcard	%2A
`+`	Space (legacy form encoding)	%2B
`,`	List separator	%2C
`;`	Path parameter	%3B
`=`	key=value separator	%3D
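The sub-delimiters `&` and `=` cause the most trouble in practice. A minimal Python sketch using the standard `urllib.parse.parse_qs` and `quote` functions; the parameter name `genre` and its value are hypothetical:

```python
from urllib.parse import parse_qs, quote

raw_value = "rock&roll=yes"  # hypothetical value containing & and =

# Unencoded, "&" and "=" are read as delimiters and split the value apart.
naive = parse_qs("genre=" + raw_value)

# Encoded with safe="", both sub-delimiters are kept as data.
encoded = "genre=" + quote(raw_value, safe="")
decoded = parse_qs(encoded)

print(naive)
print(encoded)
print(decoded)
```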

Unsafe Characters

Character	Reason	URL Encoded
Space	Not allowed anywhere in a URL	%20 (+ in form-encoded query strings)
`"`	Conflicts with HTML attribute quotes	%22
`<`	Conflicts with HTML tags	%3C
`>`	Conflicts with HTML tags	%3E
`{` `}`	Unsafe characters	%7B / %7D
`|`	Unsafe character	%7C
`\`	Conflicts with path separators	%5C
`^`	Unsafe character	%5E
Backtick (`)	Unsafe character	%60
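Python's `urllib.parse` module can reproduce the encodings in this table. A minimal sketch showing the difference between `quote` (space as %20) and `quote_plus` (space as +):

```python
from urllib.parse import quote, quote_plus

# The characters from the table above: space " < > { } | \ ^ `
unsafe = ' "<>{}|\\^`'

# quote() writes the space as %20, the RFC 3986 form.
rfc_form = quote(unsafe, safe="")

# quote_plus() writes the space as +, the HTML form-encoding convention.
form_form = quote_plus(unsafe, safe="")

print(rfc_form)
print(form_form)
```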

How Non-ASCII Characters Are URL Encoded

URL encoding of non-ASCII characters like Chinese involves two steps:

UTF-8 Encoding: Convert the character to its UTF-8 byte sequence (Chinese characters typically occupy 3 bytes).
Percent Representation: Each byte is represented in %XX format.

"中" = UTF-8: E4 B8 AD → URL Encoded: %E4%B8%AD
"文" = UTF-8: E6 96 87 → URL Encoded: %E6%96%87
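The two steps can be implemented by hand in a few lines. The Python sketch below is illustrative only (`percent_encode` is a hypothetical helper, not a library function); in real code, prefer `urllib.parse.quote(text, safe='')`, which applies the same rules. It leaves RFC 3986 unreserved characters untouched and writes every other byte with uppercase hex digits, as the RFC recommends.

```python
def percent_encode(text: str) -> str:
    """Percent-encode text by hand: UTF-8 bytes first, then %XX per byte."""
    # Byte values of the RFC 3986 unreserved characters (kept as-is).
    unreserved = frozenset(
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
    )
    out = []
    for byte in text.encode("utf-8"):   # step 1: UTF-8 byte sequence
        if byte in unreserved:
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")  # step 2: %XX with uppercase hex
    return "".join(out)


print("中".encode("utf-8").hex(" ").upper())  # the raw UTF-8 bytes
print(percent_encode("中文"))
```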

Implementation Examples by Language

JavaScript

// encodeURIComponent - also encodes delimiters such as / ? & = # (recommended for parameter values);
// note that it leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) unencoded
encodeURIComponent('hello world!')
// → "hello%20world!"

// encodeURI - preserves URL structural characters (for complete URLs)
encodeURI('https://example.com/search?q=hello world')
// → "https://example.com/search?q=hello%20world"

// Decoding
decodeURIComponent('hello%20world%21')
// → "hello world!"

Python

from urllib.parse import quote, unquote, urlencode

# Encode a single value (safe='' means / is also encoded)
quote('hello world!', safe='')
# → 'hello%20world%21'

# Decode
unquote('hello%20world%21')
# → 'hello world!'

# Encode a full query string
params = {'name': 'John Doe', 'city': 'New York'}
urlencode(params)
# → 'name=John+Doe&city=New+York'
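If a query string should use %20 instead of + for spaces, `urlencode` accepts a `quote_via` argument. A minimal sketch reusing the `params` dictionary from above (redefined so the block runs on its own):

```python
from urllib.parse import quote, urlencode

params = {"name": "John Doe", "city": "New York"}

# Default quote_via=quote_plus: spaces become "+" (form-encoding style).
form_style = urlencode(params)

# quote_via=quote: spaces become %20 (RFC 3986 style).
rfc_style = urlencode(params, quote_via=quote)

print(form_style)
print(rfc_style)
```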

PHP

<?php
// rawurlencode - follows RFC 3986 (recommended)
rawurlencode('hello world!');
// → 'hello%20world%21'

// urlencode - encodes space as + rather than %20 (for HTML form data)
urlencode('hello world!');
// → 'hello+world%21'

// Decode
rawurldecode('hello%20world%21');
// → 'hello world!'
?>

Java

import java.net.URLEncoder;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Encode (the Charset overloads used here require Java 10 or later)
String encoded = URLEncoder.encode("hello world!", StandardCharsets.UTF_8);
// → "hello+world%21"  (Note: Java encodes spaces as +)

// Decode
String decoded = URLDecoder.decode("hello+world%21", StandardCharsets.UTF_8);
// → "hello world!"

Go

import "net/url"

// Query encode (space → +)
encoded := url.QueryEscape("hello world!")
// → "hello+world%21"

// Path encode (space → %20)
pathEncoded := url.PathEscape("hello world!")
// → "hello%20world%21"

// Decode
decoded, _ := url.QueryUnescape("hello+world%21")
// → "hello world!"

Common Mistakes and How to Fix Them

Mistake 1: Double Encoding
Encoding an already-encoded string a second time causes % to itself become %25.
Wrong: encodeURIComponent('%E4%BD%A0') → %25E4%25BD%25A0
Correct: Only encode the original, unencoded string once.
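The effect is easy to reproduce in Python with `urllib.parse.quote` and `unquote`. A minimal sketch:

```python
from urllib.parse import quote, unquote

once = quote("你", safe="")    # correct: encode the raw text once
twice = quote(once, safe="")   # mistake: encode the already-encoded text again

print(once)
print(twice)

# One round of decoding only undoes the second encoding.
print(unquote(twice))
```

A server that decodes once therefore receives the literal text %E4%BD%A0 instead of 你.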

Mistake 2: Using the deprecated escape() function
escape() encodes non-ASCII characters as non-standard %uXXXX sequences instead of UTF-8 bytes (escape('中') returns '%u4E2D'), and it is kept in JavaScript only for legacy compatibility.
Use encodeURIComponent() or encodeURI() instead.

Mistake 3: Confusing encodeURI and encodeURIComponent
Using encodeURI() on a query string value leaves &, =, ?, and # unencoded, which breaks parameter parsing or cuts the URL off at the #.
Always use encodeURIComponent() for individual query string parameter values.
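Python has a related trap: `urllib.parse.quote` leaves `/` unencoded by default (`safe='/'`), which suits a whole path but not a single path segment or query value. A minimal sketch; the `/files/` prefix and the file name are hypothetical:

```python
from urllib.parse import quote

file_name = "reports/2024 Q1.pdf"  # hypothetical value that contains a slash

# Default safe="/": the slash is kept, so the value splits into two segments.
default_url = "/files/" + quote(file_name)

# safe="": the slash becomes %2F, so the value stays a single segment.
segment_url = "/files/" + quote(file_name, safe="")

print(default_url)
print(segment_url)
```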
