Base64 Encoding Explained

Base64 is a reversible way to represent binary data using a small alphabet of printable characters—ideal when the channel only tolerates text.

What Base64 is (and is not)

At a high level, Base64 transforms an arbitrary stream of bytes into a string drawn from 64 safe symbols plus padding. It is an encoding, not a compression algorithm and not a cryptographic primitive. Anyone who sees the Base64 string can decode it back to the bytes that were encoded; if those bytes must stay private, they have to be encrypted before they are encoded.

That last sentence trips up newcomers: Base64 “hides” binary from protocols that choke on NUL bytes or arbitrary high-bit values, but it does not hide meaning from people. Treat it like hexadecimal with a different alphabet—useful, standardized, and utterly public.
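A minimal sketch of that reversibility, using Python's standard base64 module. The sample bytes are an arbitrary illustration chosen to include a NUL and high-bit values; note that no key or secret appears anywhere.

```python
import base64

# Arbitrary sample input: a NUL byte plus high-bit values that a
# text-only channel might mangle.
original = bytes([0x00, 0xFF, 0x10, 0x80])

# Encoding produces plain ASCII text.
encoded = base64.b64encode(original).decode("ascii")

# Decoding needs nothing except the string itself.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == original)
```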

The 64-character alphabet

RFC 4648 defines the classic alphabet: uppercase A–Z (26), lowercase a–z (26), digits 0–9 (10), and two symbols—usually + and /—bringing the total to 64. Each symbol stands for a 6-bit index from 0 to 63. Because 6 bits times 4 symbols equals 24 bits, Base64 naturally chunks binary in groups that align with three bytes at a time.
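The alphabet described above can be rebuilt in a few lines. This is only a sketch; the names ALPHABET and INDEX are illustrative and not part of any library.

```python
import string

# A-Z (indices 0-25), a-z (26-51), 0-9 (52-61), then + (62) and / (63).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Reverse lookup used when decoding: symbol -> 6-bit index.
INDEX = {symbol: i for i, symbol in enumerate(ALPHABET)}

print(len(ALPHABET))                    # number of payload symbols
print(INDEX["A"], INDEX["a"], INDEX["0"], INDEX["+"], INDEX["/"])
```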

URL- and filename-safe variants swap + and / for - and _ to avoid clashes with query strings and path separators. JWTs, for example, use Base64URL with the trailing padding stripped, another reminder that “Base64” is a family of closely related rules, not one monolithic string format.
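To see the two alphabets side by side, the sketch below encodes bytes picked so that every output symbol lands on index 62 or 63. The helper names b64url_encode_nopad and b64url_decode_nopad are hypothetical; they simply strip and restore padding around the standard library's URL-safe functions.

```python
import base64

# Bytes chosen so all four 6-bit groups are 62 or 63.
data = bytes([0xFB, 0xEF, 0xFF])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")
print(standard, url_safe)


def b64url_encode_nopad(raw: bytes) -> str:
    """URL-safe Base64 with trailing '=' removed (hypothetical helper)."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode_nopad(text: str) -> bytes:
    """Restore the padding the encoder dropped, then decode."""
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)


print(b64url_encode_nopad(b"hi"))
```

A decoder has to know which variant it is reading: the two strings above carry identical bytes but share no characters.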

Step-by-step: three bytes to four characters

Take three consecutive input bytes (24 bits). Split those bits into four groups of six bits each. Map each 6-bit value to its letter in the alphabet. Reading left to right, the first character encodes the top six bits of the first byte; the second character mixes the remaining two bits of the first byte with the top four bits of the second; and so on. Decoding reverses the process, reassembling the original byte boundaries.
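The bit shuffling above can be written out by hand. This is a teaching sketch, not a replacement for a library encoder; encode_triplet and decode_quad are illustrative names, and the result is compared against the standard library as a sanity check.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_triplet(b0: int, b1: int, b2: int) -> str:
    """Pack three bytes into 24 bits, then read them back six bits at a time."""
    n = (b0 << 16) | (b1 << 8) | b2
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))


def decode_quad(chars: str) -> bytes:
    """Reverse the process: four 6-bit indices back into three bytes."""
    n = 0
    for ch in chars:
        n = (n << 6) | ALPHABET.index(ch)
    return bytes([(n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF])


quad = encode_triplet(*b"Man")
print(quad)
print(quad == base64.b64encode(b"Man").decode("ascii"))
print(decode_quad(quad))
```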

When the input length is not a multiple of three, the final quantum is padded conceptually with zero bits so the last Base64 characters still represent complete 6-bit indices. Those artificial trailing zeros are not real data; the padding characters tell the decoder how many bytes were meaningful.

Padding with =

The equals sign is not part of the 64-symbol payload alphabet; it signals how many bytes were missing from the last triplet. One = means two input bytes were encoded into the last block; two == means only one byte remained. Some modern profiles omit padding when the length is known from context, but MIME email historically relied on padding and line wrapping every 76 characters to survive ancient SMTP limits.
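A short sketch of the padding rule for inputs of one, two, and three bytes. The function padding_count is an illustrative name, and the samples are arbitrary.

```python
import base64


def padding_count(n_bytes: int) -> int:
    """Number of '=' characters standard Base64 appends for n_bytes of input."""
    return (3 - n_bytes % 3) % 3


for sample in (b"M", b"Ma", b"Man"):
    encoded = base64.b64encode(sample).decode("ascii")
    # Input length, encoded text, and how many '=' the rule predicts.
    print(len(sample), encoded, padding_count(len(sample)))
```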

Why Base64 exists

Historical email systems assumed text. Binary attachments would break on 7-bit links or newline translation. MIME added structure on top of plain text: headers declare content types, and binary bodies are Base64-encoded into ASCII armor. The same pattern appears anywhere a text-only API must ferry opaque bytes: database text columns, JSON fields, query parameters, and clipboard snippets.

Base64 trades efficiency for universality. That trade is usually correct at boundaries; it is usually wrong for bulk storage inside your own binary-native systems, where you should keep bytes as bytes.

Real-world uses

Email MIME remains the classic teaching example: images and PDFs ride inside messages as encoded text parts. Data URIs embed small resources directly in HTML or CSS, which is handy for tiny icons, though caching and CSP considerations matter. JSON Web Tokens join Base64URL-encoded header and payload segments with dots (signing or encryption is applied as a separate step) so browsers and services can pass structured claims without binary framing.
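As one illustration of the data URI case, the sketch below builds a URI string from raw bytes. The helper to_data_uri is hypothetical, and the sample bytes are only the 8-byte PNG file signature, not a displayable image.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a data URI that carries `data` as Base64 text (hypothetical helper)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# Placeholder bytes: the PNG signature only, used here as stand-in content.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature, "image/png")
print(uri)

# The payload after the comma decodes straight back to the original bytes.
print(base64.b64decode(uri.split(",", 1)[1]) == png_signature)
```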

APIs sometimes accept file uploads as Base64 inside JSON for simplicity over multipart forms. Cryptographic keys and certificates are frequently distributed as PEM text: a header line, Base64 body, footer line. In each case, the encoding solves a transport or copy-paste problem, not a threat model.
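A rough sketch of the JSON upload pattern follows. The field names filename and content_base64 and the helpers pack_file and unpack_file are assumptions made for illustration; a real API would define its own schema.

```python
import base64
import json


def pack_file(name: str, content: bytes) -> str:
    """Serialize a file as a JSON document with a Base64 text field."""
    return json.dumps({
        "filename": name,  # hypothetical field name
        "content_base64": base64.b64encode(content).decode("ascii"),
    })


def unpack_file(body: str) -> tuple[str, bytes]:
    """Parse the JSON document and recover the original bytes."""
    doc = json.loads(body)
    # validate=True rejects characters outside the Base64 alphabet.
    return doc["filename"], base64.b64decode(doc["content_base64"], validate=True)


body = pack_file("sample.bin", bytes(range(256)))
name, content = unpack_file(body)
print(name, len(content))
```

The JSON body is pure text, so it survives any transport that JSON itself survives; the cost is the larger payload discussed under size overhead.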

Base64 vs encryption

Encryption aims for confidentiality under a secret key, with authenticated modes adding integrity; Base64 aims only for representation. Decoding Base64 requires no secret. If you “encrypt” a password by Base64-encoding it, you have not encrypted it: you have performed a public transform that slows attackers by roughly zero seconds. Real protection uses algorithms like AES with proper modes, key management, and authentication. Those topics are outside this article but firmly not optional in production systems.

Sometimes beginners confuse PEM with “encryption” because the file looks obscure. PEM is just labeled Base64 around a DER-encoded key or certificate; the privacy of a private key file comes from filesystem permissions and optional passphrases on the key material, not from Base64.

Size overhead (~33%)

Encoding three bytes as four characters expands size by one-third before counting newlines in wrapped formats. Large attachments in email inflate accordingly; inlined images in CSS grow page weight. For big binaries, prefer direct binary transfer (HTTPS, object storage, multipart uploads) and reserve Base64 for the seams where text is mandatory.
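The arithmetic can be checked directly. The sketch below computes encoded length from input length, with and without padding; padded_length and unpadded_length are illustrative names, and line-wrap newlines are not counted.

```python
import base64


def padded_length(n_bytes: int) -> int:
    """Characters produced by standard Base64: four per started 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


def unpadded_length(n_bytes: int) -> int:
    """Characters produced when trailing '=' is omitted: ceil(4n / 3)."""
    return (4 * n_bytes + 2) // 3


for n in (1, 2, 3, 1000, 3_000_000):
    print(n, padded_length(n), unpadded_length(n))

# Spot check against the standard library for a 1000-byte input.
print(len(base64.b64encode(bytes(1000))) == padded_length(1000))
```

For large inputs the ratio settles at 4/3, which is where the roughly 33% figure comes from; very small inputs fare worse because of padding.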

When you measure database or cache usage, keep the two sizes apart: the Base64 string length is what you actually store, while the decoded byte count (about three-quarters of that) is the real payload, and every read pays a decode step on the hot path.

Debugging tip: if a Base64 string fails to decode, check for accidental whitespace, missing padding, URL-safe alphabet mismatches, or corruption from copy-paste tools that wrap lines or swap characters. Validators and unit tests that round-trip sample payloads catch these issues before they reach production logs.
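One way to handle the first three failure modes in code is a lenient decoder that normalizes input before decoding. This is a sketch under the assumption that silently accepting both alphabets is acceptable for your use case; decode_lenient is a hypothetical helper, and it cannot repair genuinely corrupted characters.

```python
import base64
import binascii


def decode_lenient(text: str) -> bytes:
    """Decode Base64 after removing whitespace, mapping URL-safe symbols
    to the standard alphabet, and restoring missing padding."""
    cleaned = "".join(text.split())                        # drop spaces and line wraps
    cleaned = cleaned.replace("-", "+").replace("_", "/")  # accept the URL-safe alphabet
    cleaned += "=" * (-len(cleaned) % 4)                   # restore stripped padding
    # validate=True raises binascii.Error on characters outside the alphabet.
    return base64.b64decode(cleaned, validate=True)


print(decode_lenient("TW\nFu"))   # wrapped line
print(decode_lenient("aGk"))      # missing padding

try:
    decode_lenient("T*Fu")        # corrupted character
except binascii.Error as exc:
    print("rejected:", exc)
```

Round-trip tests over sample payloads, as suggested above, are a natural place to exercise a helper like this.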

Frequently asked questions

Is Base64 secure?
Not for secrecy. It is reversible by design. Use real cryptography when data must stay confidential.
Why does Base64 make data larger?
Each output character carries only 6 bits of data but occupies a full 8-bit byte, so three input bytes become four output characters, plus occasional padding.
What are the 64 characters?
A–Z, a–z, 0–9, and two punctuation symbols (commonly + and /), with = reserved for padding in the standard form.
Can I Base64 encode an image?
Yes—common for data URIs and inline assets. Expect larger HTML and consider caching and CSP when doing so.
