Understanding URL Encoding
How Uniform Resource Identifiers represent characters safely, and how to encode without breaking links.
Updated April 9, 2026
What URLs are
A Uniform Resource Identifier tells software where to go on the network or how to name a resource. The common web form you paste into a browser combines a scheme (https:), authority (host), path, optional query string, and fragment. RFC 3986 defines the grammar: certain characters are structural, like the question mark that begins a query or the slash that separates path segments.
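As a rough illustration of those components, the sketch below splits a made-up address with the `URL` class available in browsers and Node.js. The address is an assumption for the example, and `URL` follows the WHATWG URL Standard, which differs from RFC 3986 in some edge cases.

```javascript
// Split a URL into the components named above.
// The address is a hypothetical example, not one from this article.
const url = new URL("https://example.com/docs/guide?q=encoding#intro");

const parts = {
  scheme: url.protocol, // "https:" (the URL API includes the trailing colon)
  authority: url.host,  // "example.com"
  path: url.pathname,   // "/docs/guide"
  query: url.search,    // "?q=encoding"
  fragment: url.hash,   // "#intro"
};

console.log(parts);
```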
Because URIs are plain text, any byte that could be mistaken for syntax—or that is not reliably transmitted—must be represented carefully. That is the job of percent-encoding, also called URL encoding in everyday developer language.
Reserved and unreserved characters
RFC 3986 divides characters into groups. Unreserved characters—letters, digits, hyphen, period, underscore, and tilde—can appear literally in many URI parts without encoding. Reserved characters have meaning in the URI syntax: delimiters like :, /, ?, #, [, ], @, and others. If you need a reserved character as data rather than syntax, you percent-encode it.
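A small sketch of the difference, using JavaScript's built-in `encodeURIComponent` on two sample strings chosen for this example. Note that this function also leaves `!`, `*`, `'`, `(`, and `)` unencoded, so it is close to, but not identical to, the RFC 3986 unreserved set.

```javascript
// Unreserved characters pass through unchanged.
const unreservedSample = encodeURIComponent("Report-v2.final_draft~1");

// Reserved characters used as data are percent-encoded.
const reservedAsData = encodeURIComponent("a/b?c#d");

console.log(unreservedSample); // Report-v2.final_draft~1
console.log(reservedAsData);   // a%2Fb%3Fc%23d
```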
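Anything outside those two sets also needs encoding in typical HTTP URIs: spaces, control characters, and all non-ASCII text such as Unicode symbols or letters with diacritics. Internationalized Resource Identifiers (IRIs, RFC 3987) allow Unicode directly; when an IRI is converted to a URI, the host is mapped through IDNA and non-ASCII characters in paths and queries become percent-encoded UTF-8 bytes.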
Percent-encoding mechanism
Percent-encoding writes a percent sign followed by two hexadecimal digits: %XX. The digits give the byte value; decoders accept either case, and RFC 3986 recommends that encoders emit uppercase. For UTF-8 text, each UTF-8 byte becomes its own %XX sequence, which is why a single emoji may expand to four or more %XX sequences.
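The byte-by-byte mechanism can be sketched by hand. The helper name `percentEncode` is hypothetical and exists only for illustration; production code should rely on a library function such as `encodeURIComponent`.

```javascript
// RFC 3986 unreserved set: letters, digits, hyphen, period, underscore, tilde.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Hypothetical helper: percent-encode every UTF-8 byte that is not unreserved.
function percentEncode(text) {
  let out = "";
  for (const byte of new TextEncoder().encode(text)) {
    const ch = String.fromCharCode(byte);
    out += UNRESERVED.test(ch)
      ? ch
      : "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

console.log(percentEncode("a b")); // a%20b
console.log(percentEncode("é"));   // %C3%A9 (two UTF-8 bytes)
console.log(percentEncode("😀"));  // %F0%9F%98%80 (four UTF-8 bytes)
```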
This is reversible: decode by scanning for %, reading two hex nibbles, emitting the byte, then interpreting the byte sequence as UTF-8. Libraries handle edge cases such as invalid sequences or mixed encodings; hand-rolling parsers is a common source of security bugs.
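One way to lean on the library for those edge cases is sketched below. `decodeURIComponent` throws a `URIError` on malformed input, and the wrapper name `safeDecode` is a hypothetical helper for this example.

```javascript
// Hypothetical wrapper: decode, but report malformed input instead of throwing.
function safeDecode(encoded) {
  try {
    return { ok: true, value: decodeURIComponent(encoded) };
  } catch (err) {
    if (err instanceof URIError) {
      return { ok: false, value: null };
    }
    throw err; // anything else is unexpected, so let it propagate
  }
}

console.log(safeDecode("caf%C3%A9")); // { ok: true, value: 'café' }
console.log(safeDecode("%C3"));       // truncated UTF-8 sequence: ok is false
console.log(safeDecode("100%"));      // stray percent sign: ok is false
```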
When encoding is needed
You encode when user-supplied or dynamic text becomes part of a URI. Typical cases include query parameters (q=hello world needs the space encoded), HTML form submissions serialized to the query string or request body, and file names uploaded or linked where spaces or punctuation would break parsing.
API clients should encode each parameter value before concatenation. Server frameworks usually decode once when parsing the request. Mismatches appear as mysterious 404s, OAuth state errors, or signatures that never verify because the bytes signed differ from the bytes sent.
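A minimal sketch of encoding each key and value before concatenation. The function name `buildSearchUrl`, the host, and the parameter names are all assumptions made up for this example.

```javascript
// Hypothetical helper: encode every key and value once, then join with
// literal delimiters.
function buildSearchUrl(base, params) {
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return query ? `${base}?${query}` : base;
}

const searchUrl = buildSearchUrl("https://api.example.com/search", {
  q: "hello world",
  tag: "c&c=fun",
});

console.log(searchUrl);
// https://api.example.com/search?q=hello%20world&tag=c%26c%3Dfun

// A single decode on the receiving side recovers the original value.
console.log(new URL(searchUrl).searchParams.get("tag")); // c&c=fun
```

Without the per-value encoding, the `&` and `=` inside `tag` would be read as extra delimiters and the parameter would be split.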
encodeURI vs encodeURIComponent in JavaScript
ECMAScript provides two helpers with different scopes. encodeURI is meant for a string that is already a complete URI: it encodes characters that are not legal in URIs, such as spaces and non-ASCII text, but preserves delimiters such as ?, #, /, &, and = so the structure stays intact.
encodeURIComponent is stricter. Use it for individual query parameter values, path segments you embed inside a template, or anything that must not leak delimiter characters. Pair it with decodeURIComponent when reading data back on the client. Choosing the wrong function is a classic way to produce broken redirects or truncated hashes.
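The contrast is easiest to see side by side. The sample strings below are invented for the example.

```javascript
const rawUrl = "https://example.com/a b?x=1&y=2#top";

// encodeURI keeps the URI structure and only fixes the space.
const wholeUri = encodeURI(rawUrl);

// encodeURIComponent treats the whole string as data and escapes delimiters.
const asComponent = encodeURIComponent(rawUrl);

console.log(wholeUri);    // https://example.com/a%20b?x=1&y=2#top
console.log(asComponent); // https%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1%26y%3D2%23top

// For a single value, encodeURI leaves the ampersand alone, which would split
// the parameter; encodeURIComponent escapes it.
console.log(encodeURI("rock&roll"));          // rock&roll
console.log(encodeURIComponent("rock&roll")); // rock%26roll
```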
Common encoded characters
| Character | Percent | Note |
|---|---|---|
| space | %20 | Also + in form-urlencoded queries |
| & | %26 | Separates query pairs when literal |
| = | %3D | Separates key and value when literal |
| ? | %3F | Starts query when used as data |
| # | %23 | Fragment delimiter when literal |
| / | %2F | Path segment when literal inside a segment |
Double encoding pitfalls
Double encoding means running percent-encoding on text that already contains % sequences. The percent sign itself becomes %25, so %2F turns into %252F. Servers then decode to %2F instead of /, which breaks routing and caching keys.
Avoid encoding an entire URL with encodeURIComponent unless you truly intend to embed that string as an opaque value. Prefer assembling URLs with a builder or template that encodes each piece exactly once. Log raw and encoded forms during debugging to spot extra layers quickly.
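A short sketch of how the extra layer appears and why a single decode no longer recovers the original text. The segment value is made up for the example.

```javascript
const segment = "a/b";

const encodedOnce = encodeURIComponent(segment);      // a%2Fb
const encodedTwice = encodeURIComponent(encodedOnce); // a%252Fb

// A server that decodes once sees the encoded form, not the slash.
const decodedOnce = decodeURIComponent(encodedTwice); // a%2Fb

console.log(encodedOnce, encodedTwice, decodedOnce);
console.log(decodedOnce === segment); // false
```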
URL encoding in different languages
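Python offers urllib.parse.quote, which emits %20 for spaces and leaves / unencoded by default, and quote_plus, which emits + for spaces. Java's URLEncoder produces application/x-www-form-urlencoded output, so spaces become +, and you should pass the charset explicitly, normally UTF-8. Go has url.QueryEscape for query keys and values and url.PathEscape for path segments; the split mirrors the URI grammar.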
Rust, .NET, PHP, and Ruby all ship standard-library helpers. Whichever stack you use, read the documentation for whether spaces become + or %20, and whether the function targets components or full strings. Consistency across your gateway, microservices, and client prevents subtle production bugs.
Frequently asked questions
- Why is space %20 and not +?
- RFC 3986 uses %20 for spaces in generic URI syntax. HTML form encoding historically uses + in application/x-www-form-urlencoded bodies and sometimes in queries. Treat + as a space only where that media type applies; otherwise prefer %20.
- What is double encoding?
- Encoding already-encoded text so percent signs become %25. It usually means a pipeline applied encoding twice or the wrong encoder wrapped a full URL. Encode each component once at composition time.
- Do I need to encode the whole URL?
- Rarely. Encode pieces (segments, keys, values) and join with literal delimiters. Use encodeURI only when you must lightly fix a mostly valid URI; use encodeURIComponent for values you slot into queries or paths.
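The %20 versus + distinction from the first question shows up in JavaScript as well. The sketch below assumes the standard `URLSearchParams` class, which serializes and parses using form-urlencoded rules; the parameter name `q` is made up for the example.

```javascript
// Generic percent-encoding: space becomes %20.
const genericForm = encodeURIComponent("hello world");

// Form-urlencoded serialization: space becomes +.
const formStyle = new URLSearchParams({ q: "hello world" }).toString();

console.log(genericForm); // hello%20world
console.log(formStyle);   // q=hello+world

// Decoding must match the encoding: decodeURIComponent leaves + untouched,
// while URLSearchParams reads it as a space.
console.log(decodeURIComponent("hello+world"));              // hello+world
console.log(new URLSearchParams("q=hello+world").get("q")); // hello world
```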