Understanding URL Encoding

How Uniform Resource Identifiers represent characters safely, and how to encode without breaking links.

Updated April 9, 2026

What URLs are

A Uniform Resource Identifier (URI) names a resource or tells software where to find it on the network. The common web form you paste into a browser, the URL, combines a scheme (https), an authority (host and optional port), a path, an optional query string, and an optional fragment. RFC 3986 defines the grammar: certain characters are structural, like the question mark that begins a query or the slash that separates path segments.
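As a rough illustration of those components, the sketch below splits a URL with Python's standard urllib.parse.urlsplit. The URL itself is a made-up example, not one taken from this guide.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only to show the parts.
url = "https://example.com:8443/docs/guide?q=encoding&lang=en#reserved"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # example.com:8443  (the authority)
print(parts.hostname)  # example.com
print(parts.port)      # 8443
print(parts.path)      # /docs/guide
print(parts.query)     # q=encoding&lang=en
print(parts.fragment)  # reserved
```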

Because URIs are plain text, any byte that could be mistaken for syntax—or that is not reliably transmitted—must be represented carefully. That is the job of percent-encoding, also called URL encoding in everyday developer language.

Reserved and unreserved characters

RFC 3986 divides characters into groups. Unreserved characters (letters, digits, hyphen, period, underscore, and tilde) can appear literally in any URI component without encoding. Reserved characters have meaning in the URI syntax: the general delimiters :, /, ?, #, [, ], and @, plus sub-delimiters such as !, $, &, ', (, ), *, +, ,, ;, and =. If you need a reserved character as data rather than syntax, you percent-encode it.
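A quick way to see the two groups is to pass sample characters through Python's urllib.parse.quote with safe="" so that nothing beyond the unreserved set is left literal. This is a sketch that assumes Python 3.7 or later, where the tilde is treated as unreserved.

```python
from urllib.parse import quote

unreserved_sample = "AZaz09-._~"
reserved_sample = ":/?#[]@"

# safe="" means no extra characters are exempt from encoding.
unreserved_out = quote(unreserved_sample, safe="")
reserved_out = quote(reserved_sample, safe="")

print(unreserved_out)  # AZaz09-._~
print(reserved_out)    # %3A%2F%3F%23%5B%5D%40
```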

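Characters outside the unreserved and reserved sets, such as spaces, Unicode symbols, and letters with diacritics, also need encoding in typical HTTP URIs. Internationalized Resource Identifiers (IRIs) allow Unicode directly; when an IRI is converted to a URI, the host is mapped to ASCII with IDNA and the other components are percent-encoded as UTF-8.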
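The host and path rules can be sketched with Python's standard library. The host and path below are invented for illustration, and the built-in "idna" codec implements the older IDNA 2003 rules, so treat the result as a demonstration rather than a complete IDNA implementation.

```python
from urllib.parse import quote

# Hypothetical host and path; \u00fc is "ü" and \u00e9 is "é".
host = "b\u00fccher.example"
path = "/caf\u00e9/men\u00fc"

# Host: Unicode label -> ASCII "xn--" form via the built-in idna codec.
ascii_host = host.encode("idna").decode("ascii")

# Path: each UTF-8 byte is percent-encoded; "/" stays literal by default.
ascii_path = quote(path)

print(ascii_host)  # xn--bcher-kva.example
print(ascii_path)  # /caf%C3%A9/men%C3%BC
```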

Percent-encoding mechanism

Percent-encoding writes a percent sign followed by two hexadecimal digits: %XX. The digits give the byte value; uppercase and lowercase hex are equivalent, and RFC 3986 recommends uppercase for consistency. For UTF-8 text, each UTF-8 byte becomes its own %XX sequence, which is why a single emoji may expand to a long run of percents.
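The mechanism is small enough to write out by hand. The function below is a teaching sketch, not a replacement for a library encoder; the name percent_encode is made up for this example, and it leaves only the unreserved characters literal.

```python
# Unreserved characters per RFC 3986: letters, digits, "-", ".", "_", "~".
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Encode text as UTF-8 and write every non-unreserved byte as %XX."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in UNRESERVED else f"%{byte:02X}")
    return "".join(out)

print(percent_encode("hi there"))    # hi%20there
print(percent_encode("\u00e9"))      # %C3%A9  (two UTF-8 bytes)
print(percent_encode("\U0001F600"))  # %F0%9F%98%80  (four UTF-8 bytes)
```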

This is reversible: decode by scanning for %, reading the two hex digits that follow, emitting the corresponding byte, then interpreting the resulting byte sequence as UTF-8. Libraries handle edge cases such as invalid sequences or mixed encodings; hand-rolling parsers is a common source of security bugs.
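As one example of how a library treats those edge cases, the sketch below uses Python's urllib.parse.unquote. Other libraries may behave differently on invalid input, so check the documentation for your own stack.

```python
from urllib.parse import unquote

# Normal case: bytes are collected, then decoded as UTF-8.
decoded = unquote("caf%C3%A9%20au%20lait")
print(decoded)  # café au lait

# %FF is not valid UTF-8. By default it is replaced with U+FFFD.
replaced = unquote("%FF")

# With errors="strict" the same input raises instead.
try:
    unquote("%FF", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# A "%" not followed by two hex digits is left untouched.
malformed = unquote("100%zz")
print(malformed)  # 100%zz
```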

When encoding is needed

You encode when user-supplied or dynamic text becomes part of a URI. Typical cases include query parameters (q=hello world needs the space encoded), HTML form submissions serialized to the query string or request body, and file names uploaded or linked where spaces or punctuation would break parsing.

API clients should encode each parameter value before concatenation. Server frameworks usually decode once when parsing the request. Mismatches appear as mysterious 404s, OAuth state errors, or signatures that never verify because the bytes signed differ from the bytes sent.
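A minimal sketch of encoding each parameter before concatenation, using Python's urllib.parse.urlencode. The endpoint and parameter names are hypothetical. Passing quote_via=quote writes spaces as %20 rather than the form-style +.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Hypothetical parameters; the second value contains query delimiters.
params = {"q": "hello world", "next": "/a?b=1&c=2"}

# Each key and value is encoded separately, then joined with & and =.
query = urlencode(params, quote_via=quote)
url = "https://api.example.com/search?" + query
print(url)
# https://api.example.com/search?q=hello%20world&next=%2Fa%3Fb%3D1%26c%3D2

# The receiving side decodes exactly once while parsing.
received = parse_qs(urlsplit(url).query)
print(received)  # {'q': ['hello world'], 'next': ['/a?b=1&c=2']}
```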

encodeURI vs encodeURIComponent in JavaScript

ECMAScript provides two helpers with different scopes. encodeURI is meant for strings that are mostly a full URI: it encodes characters that are illegal in URIs but preserves delimiters such as ?, #, and / so structure remains intact.

encodeURIComponent is stricter. Use it for individual query parameter values, path segments you embed inside a template, or anything that must not leak delimiter characters. Pair it with decodeURIComponent when reading data back on the client. Choosing the wrong function is a classic way to produce broken redirects or truncated hashes.
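To keep the examples in one language, the difference can be approximated in Python by varying the safe set passed to urllib.parse.quote. The helper names like_encode_uri and like_encode_uri_component are invented here, and they are rough analogues rather than exact ports of the JavaScript functions (for instance, they do not reproduce JavaScript's handling of malformed surrogates).

```python
from urllib.parse import quote

# Punctuation that encodeURIComponent leaves alone, beyond - _ . ~
COMPONENT_SAFE = "!*'()"
# encodeURI additionally leaves the URI delimiters alone.
FULL_URI_SAFE = ";,/?:@&=+$#" + COMPONENT_SAFE

def like_encode_uri(text: str) -> str:
    """Approximate encodeURI: keep delimiters so URI structure survives."""
    return quote(text, safe=FULL_URI_SAFE)

def like_encode_uri_component(text: str) -> str:
    """Approximate encodeURIComponent: encode delimiters too."""
    return quote(text, safe=COMPONENT_SAFE)

url = "https://example.com/a b?x=1&y=2#top"
print(like_encode_uri(url))
# https://example.com/a%20b?x=1&y=2#top
print(like_encode_uri_component(url))
# https%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1%26y%3D2%23top
```

The second output is only useful when the whole URL is meant to travel as a single opaque value, for example inside another URL's query string.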

Common encoded characters

Character  Percent  Note
space      %20      Written as + in form-urlencoded queries
&          %26      Separates query pairs; encode when it is literal data
=          %3D      Separates a key from its value; encode when literal
?          %3F      Starts the query; encode when used as data
#          %23      Starts the fragment; encode when literal
/          %2F      Separates path segments; encode when literal inside a segment

Double encoding pitfalls

Double encoding means running percent-encoding on text that already contains % sequences. The percent sign itself becomes %25, so %2F turns into %252F. Servers then decode to %2F instead of /, which breaks routing and caching keys.
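The effect is easy to reproduce. This sketch encodes a value twice with Python's urllib.parse.quote and shows that a single decode no longer recovers the original.

```python
from urllib.parse import quote, unquote

original = "a/b"
once = quote(original, safe="")   # a%2Fb
twice = quote(once, safe="")      # a%252Fb  (the % itself became %25)

print(once, twice)

# One decode of the double-encoded value yields the encoded form, not "/".
after_one_decode = unquote(twice)
after_two_decodes = unquote(after_one_decode)
print(after_one_decode)   # a%2Fb
print(after_two_decodes)  # a/b
```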

Avoid encoding an entire URL with encodeURIComponent unless you truly intend to embed that string as an opaque value. Prefer assembling URLs with a builder or template that encodes each piece exactly once. Log raw and encoded forms during debugging to spot extra layers quickly.
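One possible shape for such a builder is sketched below. The function name build_url and its arguments are assumptions made for this example; the point is only that raw pieces go in and each one is encoded exactly once.

```python
from urllib.parse import quote, urlencode

def build_url(base: str, segments: list[str], params: dict[str, str]) -> str:
    """Join raw (unencoded) pieces into a URL, encoding each piece once."""
    # safe="" so a "/" inside a segment is encoded instead of splitting it.
    path = "/".join(quote(segment, safe="") for segment in segments)
    url = f"{base.rstrip('/')}/{path}"
    query = urlencode(params, quote_via=quote)
    return f"{url}?{query}" if query else url

# Hypothetical inputs: a file name containing "/" and a value containing "&".
result = build_url(
    "https://example.com",
    ["files", "Q1/Q2 report.pdf"],
    {"tag": "a&b"},
)
print(result)
# https://example.com/files/Q1%2FQ2%20report.pdf?tag=a%26b
```

Callers pass raw strings only. If a caller passes a value that is already encoded, the builder will encode it again, which is exactly the double-encoding case described above.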

URL encoding in different languages

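Python offers urllib.parse.quote, which writes spaces as %20 and leaves / unencoded by default, and quote_plus, which writes spaces as + for form-style queries. Java's URLEncoder produces application/x-www-form-urlencoded output, so spaces become +; pay attention to which charset you specify. Go has url.QueryEscape for query keys and values and url.PathEscape for path segments; the distinction mirrors the URI grammar.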
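The two Python helpers differ in ways that are easy to miss, so here is a short side-by-side sketch. The sample string is arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "a b/c+d"

# quote: space -> %20, "/" left literal by default, "+" -> %2B
as_quote = quote(text)
# quote_plus: space -> +, "/" -> %2F, "+" -> %2B
as_quote_plus = quote_plus(text)

print(as_quote)       # a%20b/c%2Bd
print(as_quote_plus)  # a+b%2Fc%2Bd

# Decoding mirrors the split: only unquote_plus treats + as a space.
plain = unquote("a+b")
form_style = unquote_plus("a+b")
print(plain)       # a+b
print(form_style)  # a b
```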

Rust, .NET, PHP, and Ruby all ship standard-library helpers. Whichever stack you use, read the documentation for whether spaces become + or %20, and whether the function targets components or full strings. Consistency across your gateway, microservices, and client prevents subtle production bugs.

Frequently asked questions

Why is space %20 and not +?
RFC 3986 uses %20 for spaces in generic URI syntax. HTML form encoding historically uses + in application/x-www-form-urlencoded bodies and sometimes in queries. Treat + as a space only where that media type applies; otherwise prefer %20.

What is double encoding?
Encoding already-encoded text so percent signs become %25. It usually means a pipeline applied encoding twice or the wrong encoder wrapped a full URL. Encode each component once at composition time.

Do I need to encode the whole URL?
Rarely. Encode pieces (segments, keys, values) and join with literal delimiters. Use encodeURI only when you must lightly fix a mostly valid URI; use encodeURIComponent for values you slot into queries or paths.
