Cryptography: Encoding, Encryption, and Hashing (Concept, Use Cases, and Examples)
- Jan 19, 2025
Cryptography is a vast field in computer science and information security, and three fundamental concepts often get mixed up—encoding, encryption, and hashing.
Although these terms are related, they have distinct purposes and mechanisms. Let's break them down with clear definitions, differences, use cases, and examples in Java.
Feature | Encoding | Encryption | Hashing |
---|---|---|---|
Purpose | Transform data to a usable format | Protect data confidentiality | Ensure data integrity |
Reversible | Yes | Yes (with the right key) | No (one-way) |
Security | None | High (ensures confidentiality) | None (checks integrity) |
Use Case | Data transmission, compatibility | Securing data, protecting secrets | Verifying data integrity, password storage |
Output Size | Variable (depends on encoding scheme) | Varies (based on algorithm and key) | Fixed size (e.g., 256-bit for SHA-256) |
Example Algorithms | Base64, Base32, Base16 | AES, RSA, ChaCha20 | SHA-256, SHA-512, MD5 |
1) Encoding
Encoding is the process of transforming data from one format to another using a scheme or standard. The primary purpose of encoding is not to secure the data but to make it transportable and usable in different environments or systems.
Encoding is a reversible process, meaning you can easily get back the original data if you know the encoding scheme.
Java provides various encoding techniques to handle different character sets, file formats, and communication protocols. Below are some of the different encoding techniques in Java:
1.1) Base Encoding
How is base 10 structured?
Starting at 0, we count up to 9 in the ones column. Once the ones column holds 9, it is full, so we add 1 to the next column to the left and reset the ones column to 0.
For all intents and purposes, we can assume an infinite number of leading zeros before the first significant column. In other words, "000005" is the same as "5". As each column fills up, the next column increases by one, and the previous column starts again from 0.
Specifically, the ones column increases from 0 to 9, and then 1 is added to the tens column. This continues, and when both the tens column and the ones column are at 9, 1 is added to the hundreds column, and so forth.
Base 2 (Binary)
Binary consists of two digits, 0 and 1. Binary is the most basic system needed for all logical operations (think "true" and "false"). The same column rules as base 10 apply, with two in place of ten: each column holds 0 or 1, and each column to the left is worth twice as much.
Decimal to Binary
Decimal 50 converts to binary as "110010". Divide by 2 repeatedly and record each remainder:
50 ÷ 2 = 25 (remainder 0)
25 ÷ 2 = 12 (remainder 1)
12 ÷ 2 = 6 (remainder 0)
6 ÷ 2 = 3 (remainder 0)
3 ÷ 2 = 1 (remainder 1)
1 ÷ 2 = 0 (remainder 1)
Reading the remainders from bottom to top gives 110010.
Binary to Decimal
Each bit in the binary number represents a power of 2, starting from 2^0 at the rightmost bit. 110010 is a 6-bit binary number, where the bits represent the following powers of 2 (from left to right, highest power first):
1 ⋅ 2^5 + 1 ⋅ 2^4 + 0 ⋅ 2^3 + 0 ⋅ 2^2 + 1 ⋅ 2^1 + 0 ⋅ 2^0
= 32 + 16 + 0 + 0 + 2 + 0
= 50
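The two conversions above can be sketched in a few lines of Java. This is a minimal illustration, assuming non-negative inputs; the class and method names are made up for this example, and in practice `Integer.toBinaryString` and `Integer.parseInt(s, 2)` do the same job.

```java
class BinaryConversion {
    // Decimal to binary by repeated division by 2 (non-negative input assumed).
    static String toBinary(int n) {
        if (n == 0) {
            return "0";
        }
        StringBuilder bits = new StringBuilder();
        while (n > 0) {
            bits.append(n % 2); // each remainder is the next bit, least significant first
            n /= 2;
        }
        // Remainders were collected bottom-up, so reverse them.
        return bits.reverse().toString();
    }

    // Binary to decimal: doubling the running value and adding the next bit
    // is equivalent to summing bit * 2^position.
    static int toDecimal(String binary) {
        int value = 0;
        for (char c : binary.toCharArray()) {
            value = value * 2 + (c - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toBinary(50));        // 110010
        System.out.println(toDecimal("110010")); // 50
    }
}
```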
Hexadecimal (Base 16)
Base 16, also known as hexadecimal, is widely used in computer systems for various purposes. It employs the digits 0 to 9, followed by the letters a to f (case-insensitive).
One common application of hexadecimal is in defining RGB color values in CSS, where each channel (red, green, and blue) is represented by two hexadecimal digits.
Hexadecimal is also frequently used in assembly languages, the lowest-level programming languages, as it simplifies the conversion to binary. This makes it a more convenient way to write assembly code instructions.
Similarly, Base 32 and Base 64 encodings are preferred for handling binary data because 32 and 64 are powers of 2, so each output character represents a whole number of bits (5 and 6 respectively). Their alphabets are limited to 32 or 64 characters that are safe to transmit and widely supported on most computers.
50 in decimal is 32 in hexadecimal (3 ⋅ 16^1 + 2 ⋅ 16^0 = 50).
Decimal to Hexadecimal
50 ÷ 16 = 3 (remainder 2)
3 ÷ 16 = 0 (remainder 3)
Reading the remainders from bottom to top gives us 32 in hexadecimal.
Hexadecimal to Decimal
3 ⋅ 16^1 + 2 ⋅ 16^0
= (3 ⋅ 16) + (2 ⋅ 1)
= 48 + 2
= 50
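In Java, the standard library already handles these conversions. The sketch below uses `Integer.toHexString`, `Integer.parseInt` with radix 16, and `String.format` to build a CSS-style color; the class name and the `rgbToHex` helper are hypothetical names chosen for illustration.

```java
class HexConversion {
    // Formats three 0-255 channel values as a CSS-style hex color,
    // two hexadecimal digits per channel.
    static String rgbToHex(int red, int green, int blue) {
        return String.format("#%02x%02x%02x", red, green, blue);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(50));    // 32
        System.out.println(Integer.parseInt("32", 16)); // 50
        System.out.println(rgbToHex(255, 165, 0));      // #ffa500
    }
}
```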
Base32
Base32 uses 32 characters (A-Z, 2-7) to encode data, each of which represents a different combination of 5 bits (2^5).
Encoding Process for "ASHU"
The string "ASHU" in ASCII is: 65, 83, 72, 85
Convert each ASCII value to an 8-bit binary string:
65 → 01000001
83 → 01010011
72 → 01001000
85 → 01010101
Splitting them into 5-bit groups:
01000 00101 01001 10100 10000 10101 01(000) =
The specification requires the output to be produced in blocks of eight 5-bit groups. If the number of bits isn't divisible by 5, the last group is padded with 0 bits (hence the 01(000) at the end), and if the number of groups isn't divisible by 8, the output is padded with "=".
The Base32 alphabet is: A-Z, 2-7 (the encoding for 0 is A)
Each of these 5-bit groups maps to a character in the 32-character alphabet (8 → I, 5 → F, 9 → J, 20 → U, 16 → Q, 21 → V, 8 → I), so the output for "ASHU" is IFJUQVI=.
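To make the 5-bit grouping concrete, here is a minimal hand-rolled encoder sketch. It is for illustration only (the class name `Base32Sketch` is hypothetical, and there is no decoder); real code would normally rely on a tested library.

```java
import java.nio.charset.StandardCharsets;

class Base32Sketch {
    // Index 0 maps to 'A', index 31 maps to '7'.
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder();
        int buffer = 0;       // bits read from the input but not yet emitted
        int bitsInBuffer = 0; // how many low bits of buffer are valid
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bitsInBuffer += 8;
            // Emit one character for every complete 5-bit group.
            while (bitsInBuffer >= 5) {
                int index = (buffer >> (bitsInBuffer - 5)) & 0x1F;
                out.append(ALPHABET.charAt(index));
                bitsInBuffer -= 5;
            }
            buffer &= (1 << bitsInBuffer) - 1; // keep only the leftover bits
        }
        // Pad the final partial group with zero bits on the right.
        if (bitsInBuffer > 0) {
            int index = (buffer << (5 - bitsInBuffer)) & 0x1F;
            out.append(ALPHABET.charAt(index));
        }
        // Pad the output with '=' up to a multiple of 8 characters.
        while (out.length() % 8 != 0) {
            out.append('=');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] input = "ASHU".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encode(input)); // IFJUQVI=
    }
}
```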
Base64
Base64 encoding uses 64 characters (A-Z, a-z, 0-9, +, /) to encode data. More specifically, the source binary data is taken 6 bits at a time, and each group of 6 bits is mapped to one of 64 unique characters.
The process mirrors Base32. The input is handled in 24-bit blocks (three bytes), each split into four 6-bit chunks that are mapped to the Base64 alphabet. If the last block is short, the final chunk is padded with 0 bits and "=" fills the output up to a multiple of four characters.
Splitting binary for "ASHU" into 6-bit groups:
010000 010101 001101 001000 010101 01(0000) = =
010000 → 16 → Q
010101 → 21 → V
001101 → 13 → N
001000 → 8 → I
010101 → 21 → V
01(0000) → 16 → Q
The output for "ASHU" would be QVNIVQ==.
Base64 Example in Java
Java provides native support for Base64 encoding/decoding via java.util.Base64. Encoding and then decoding "Hello, World!" produces the following output:
Original text: Hello, World!
Encoded (Base64): SGVsbG8sIFdvcmxkIQ==
Decoded: Hello, World!
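A round trip of this kind might look like the sketch below. It uses the standard `Base64.getEncoder()` and `Base64.getDecoder()` from `java.util.Base64`; the class and helper method names are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Example {
    // Encodes the UTF-8 bytes of the text as a Base64 string.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes a Base64 string back into text, assuming the bytes are UTF-8.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Hello, World!";
        String encoded = encode(original);
        System.out.println("Original text: " + original);
        System.out.println("Encoded (Base64): " + encoded);
        System.out.println("Decoded: " + decode(encoded));
    }
}
```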
Base32 Example in Java
For Base32 encoding/decoding, Java doesn't provide built-in support, so you'll need an external library. One popular choice is the Apache Commons Codec library, which provides Base32 encoding and decoding.
<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.15</version>
</dependency>
Original text: Hello, World!, length: 13
Encoded Base32: JBSWY3DPFQQFO33SNRSCC===, length: 24
Decoded String: Hello, World!
1.2) Character Encoding (Text Encoding)
Character encoding is used to represent characters as byte sequences.
Unicode is a universal character encoding standard designed to represent text and symbols from all writing systems around the world. Every character is assigned a unique number, written as 4 to 6 hexadecimal digits, known as a Unicode code point.
ASCII (American Standard Code for Information Interchange) is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. ASCII has just 128 code points, of which only 95 are printable characters, which severely limits its scope.
Unicode is compatible with ASCII encoding. This means that the first 128 characters in Unicode directly correspond to the characters represented in the 7-bit ASCII table. We can also say that ASCII is a subset of Unicode.
For the character 'A', the ASCII value is 65 in decimal, and the Unicode code point is U+0041. The two are the same number: U+0041 is written in hexadecimal, and hexadecimal 41 equals decimal 65 (4 ⋅ 16 + 1), which is why Unicode is backward compatible with ASCII.
(0041)₁₆ = (0065)₁₀
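The relationship between the decimal ASCII value and the hexadecimal code point can be checked directly in Java. This is a small sketch; `toUnicodeNotation` is a hypothetical helper name used only here.

```java
class CodePointExample {
    // Formats a code point in the conventional U+XXXX notation.
    static String toUnicodeNotation(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        int codePoint = "A".codePointAt(0);
        System.out.println(codePoint);                      // 65
        System.out.println(toUnicodeNotation(codePoint));   // U+0041
        System.out.println(Integer.parseInt("0041", 16));   // 65
    }
}
```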
UTF-8, UTF-16, and UTF-32 are all different ways of encoding Unicode characters, with the key difference being the number of bits used to represent each character: UTF-8 uses 8 bits (1 byte) for basic characters and can use up to 32 bits (4 bytes) for complex ones, UTF-16 uses 16 bits (2 bytes) for most characters and can use 32 bits for extended characters, while UTF-32 always uses 32 bits (4 bytes) per character.
This makes UTF-8 the most efficient option for most text as it takes up less space when encoding common characters.
UTF-8 has an advantage when ASCII characters represent the majority of characters in a block of text because UTF-8 encodes these into 8 bits (like ASCII). It is also advantageous in that a UTF-8 file containing only ASCII characters has the same encoding as an ASCII file.
UTF-16 is better where ASCII is not predominant since it uses 2 bytes per character primarily. UTF-8 will start to use 3 or more bytes for the higher-order characters, while UTF-16 remains at just 2 bytes for most characters.
UTF-32 will cover all possible characters in 4 bytes. It is an enormous memory hog but fast to operate on. It is rarely used. Its advantage is that you don't need to decode stored data to the 32-bit Unicode code point (e.g., for character-by-character handling). The code point is already available right there in your array/vector/string.
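Java's StandardCharsets class defines constants for UTF-8, UTF-16, and a few other encodings, but not for UTF-32. UTF-32 is not among the charsets every Java platform is required to support, although common JDKs usually make it available through Charset.forName("UTF-32").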
UTF-8 Example in Java
In Java, you can work with UTF-8 encoding using the String class and methods like getBytes() and new String(), along with the Charset class from java.nio.charset.
Original text: Hello, World!, length: 13
Encoded to UTF-8: 72 101 108 108 111 44 32 87 111 114 108 100 33
Length: 13
Decoded from UTF-8: Hello, World!
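A minimal sketch of that round trip, using `StandardCharsets.UTF_8` with `getBytes` and the `String(byte[], Charset)` constructor. The class and method names are assumptions for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Example {
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Hello, World!";
        byte[] utf8 = encode(original);
        System.out.println("Original text: " + original + ", length: " + original.length());
        System.out.println("Encoded to UTF-8: " + Arrays.toString(utf8));
        System.out.println("Length: " + utf8.length); // 13, one byte per ASCII character
        System.out.println("Decoded from UTF-8: " + decode(utf8));
    }
}
```

Non-ASCII characters take more than one byte: for instance, "é" (U+00E9) encodes to 2 bytes and "€" (U+20AC) to 3 bytes.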
UTF-16 Example in Java
In Java, you can work with UTF-16 encoding using the String class and its methods like getBytes() and new String(), along with Charset (from java.nio.charset).
Original text: Hello, World!, length: 13
Encoded to UTF-16: -2 -1 0 72 0 101 0 108 0 108 0 111 0 44 0 32 0 87 0 111 0 114 0 108 0 100 0 33
Length: 28
Decoded from UTF-16: Hello, World!
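The same round trip with `StandardCharsets.UTF_16` is sketched below (class and method names are again assumptions). The leading `-2 -1` in the output is the byte order mark 0xFE 0xFF that Java writes when encoding with the plain UTF-16 charset, which is why 13 characters become 28 bytes rather than 26.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf16Example {
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_16);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String original = "Hello, World!";
        byte[] utf16 = encode(original);
        System.out.println("Encoded to UTF-16: " + Arrays.toString(utf16));
        System.out.println("Length: " + utf16.length); // 28 = 2-byte BOM + 13 * 2 bytes
        System.out.println("Decoded from UTF-16: " + decode(utf16));
        // UTF_16BE fixes the byte order and therefore writes no BOM.
        System.out.println(original.getBytes(StandardCharsets.UTF_16BE).length); // 26
    }
}
```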
2) Encryption
Encryption is the process of transforming data (plaintext) into an unreadable format (ciphertext) using an encryption algorithm and a key. The primary goal of encryption is confidentiality: ensuring that only authorized parties can read the original data.
Encryption is a reversible process. If you have the correct key, you can decrypt the ciphertext and recover the original data. The purpose of encryption is to protect the data from unauthorized access.
Symmetric and asymmetric encryption are two methods of encrypting data, each with its own strengths and weaknesses.
Symmetric Encryption
- Uses a single key for both encryption and decryption.
- Faster and easier to use than asymmetric encryption.
- Ideal for encrypting data at rest, like data stored in a database.
- Harder to share safely than asymmetric keys: every party needs the same secret key, so anyone who obtains it can decrypt the data.
Asymmetric Encryption
- Uses a pair of keys, one public and one private, for encryption and decryption.
- Easier to distribute keys safely than with symmetric encryption: the public key can be shared openly, and only the private key, which never leaves its owner, can decrypt.
- Can be used to create digital signatures.
- Slower and more complicated than symmetric encryption.
The best encryption method should provide a balance of security, performance, and ease of implementation. Here's a breakdown of some of the most popular encryption algorithms in Java, as well as their best use cases:
2.1) AES (Advanced Encryption Standard)
Advanced Encryption Standard (AES) is a highly trusted encryption algorithm used to secure data by converting it into a format that is unreadable without the proper key. AES supports key lengths of 128, 192, or 256 bits to provide strong protection against unauthorized access.
Symmetric encryption: The same key is used for both encryption and decryption. Secure key management is essential.
AES is a block cipher: it works on fixed 128-bit blocks, taking 128 bits of plaintext as input and producing 128 bits of ciphertext. Block ciphers in general typically use block sizes of 64 or 128 bits. A stream cipher, in contrast, encrypts data continuously, typically one bit or one byte (8 bits) at a time.
Encrypted (AES): AifhdFCynIgn3JA8t8djD2B8Ix9nGlKeSGbONx8c/sk=
Decrypted (AES): This is a secret message
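An AES encrypt/decrypt round trip can be sketched with the JDK's `javax.crypto` classes. The mode used here, AES/GCM with a random 12-byte IV prepended to the ciphertext, is a choice made for this example and is not necessarily the mode that produced the output above; the class and method names are hypothetical, and key storage is out of scope.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

class AesGcmExample {
    private static final int IV_BYTES = 12;  // commonly recommended IV size for GCM
    private static final int TAG_BITS = 128; // authentication tag length

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns Base64(IV || ciphertext || tag). A fresh random IV is used per call.
    static String encrypt(String plaintext, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] combined = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, combined, 0, iv.length);
            System.arraycopy(ciphertext, 0, combined, iv.length, ciphertext.length);
            return Base64.getEncoder().encodeToString(combined);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Splits off the IV, then decrypts and verifies the authentication tag.
    static String decrypt(String encoded, SecretKey key) {
        try {
            byte[] combined = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, combined, 0, IV_BYTES));
            byte[] plaintext = cipher.doFinal(combined, IV_BYTES, combined.length - IV_BYTES);
            return new String(plaintext, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String encrypted = encrypt("This is a secret message", key);
        System.out.println("Encrypted (AES): " + encrypted);
        System.out.println("Decrypted (AES): " + decrypt(encrypted, key));
    }
}
```

Because the IV is random, the ciphertext differs on every run even for the same key and message.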
2.2) RSA (Rivest-Shamir-Adleman)
RSA is an asymmetric encryption algorithm that uses a public/private key pair. It is typically used for digital signatures, key exchange, and small-scale encryption (like encrypting a symmetric key for AES).
Secure communication: The sender encrypts the data with the public key, and the recipient decrypts it using the private key.
Commonly used in SSL/TLS for securing web traffic (e.g., HTTPS). Asymmetric encryption allows secure communication between parties without needing to share a common key.
Slower than symmetric algorithms (like AES), so it is typically used for encrypting small pieces of data (e.g., symmetric keys) rather than bulk data.
Requires a large key size (2048 bits or more) to be considered secure, which increases computational overhead. RSA can also be used for verifying the integrity of messages or transactions.
Encrypted (RSA): AqWB3mShwBMQSTPhNPlX5vSPYukgzfJf3GJTkhpl+fFIOfFIye89kpcENcF60Qa2elbIbXbwVKLdUb8Hij+4zUv9zlUfcIbxVnmP+G9gHkTbxUSF3+uH/0WjIOW/2CwkadwK7V735qwLd1CUOXcvJKI0a9DPPQ5Jg3euMc6i3wslB/5aemBKX+3wTTk1wya709IH+iMxckOJ7Guy89r0HNp0A1rSd3N1Pq170TWk5zuUF3A7358OZ8qupg7rULaxRtQagGAgFTWTSjomFnb0A6i6wrkQPINnHgH46rdlkyR3akAkOLISYE/XRjYg5M9GT7TQf7nzpUMFXVIclIcxbQ==
Decrypted (RSA): This is a secret message
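A public-key round trip along these lines can be sketched with the JDK's `KeyPairGenerator` and `Cipher`. The OAEP padding scheme below is an assumption made for this example (the padding behind the output above is not stated), and the class and method names are hypothetical.

```java
import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Base64;

class RsaExample {
    // OAEP padding is an illustrative choice for this sketch.
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Anyone holding the public key can encrypt.
    static String encrypt(String plaintext, PublicKey publicKey) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, publicKey);
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Only the private key holder can decrypt.
    static String decrypt(String encoded, PrivateKey privateKey) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            byte[] plaintext = cipher.doFinal(Base64.getDecoder().decode(encoded));
            return new String(plaintext, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keyPair = newKeyPair();
        String encrypted = encrypt("This is a secret message", keyPair.getPublic());
        System.out.println("Encrypted (RSA): " + encrypted);
        System.out.println("Decrypted (RSA): " + decrypt(encrypted, keyPair.getPrivate()));
    }
}
```

With a 2048-bit key the ciphertext is always 256 bytes, regardless of how short the message is.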
2.3) Elliptic Curve Cryptography (ECC)
ECC is a form of asymmetric encryption that uses elliptic curve mathematics for creating public/private key pairs. ECC provides the same level of security as RSA but with much smaller key sizes, making it more efficient.
It is increasingly used in mobile devices and environments with limited resources, such as IoT devices.
Signature (ECC): MEQCIEcghilgvVUkJ/2uOT5kxp/R+qZKyBLnVdUuv4RjuVqEAiAA1lrEnVFzqf6O27oijvf7yVEx6qln3A/V3q9ic1iOKg==
Signature Verified: true
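Signing and verifying with an elliptic curve key pair can be sketched using the JDK's `Signature` class. The curve (secp256r1) and the algorithm name (SHA256withECDSA) are assumptions chosen for this example, as are the class and method names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.ECGenParameterSpec;
import java.util.Base64;

class EcdsaExample {
    private static final String ALGORITHM = "SHA256withECDSA";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
            generator.initialize(new ECGenParameterSpec("secp256r1"));
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Signs the message with the private key; returns the Base64 signature.
    static String sign(String message, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(privateKey);
            signer.update(message.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Checks the signature against the message using the public key.
    static boolean verify(String message, String signature, PublicKey publicKey) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(publicKey);
            verifier.update(message.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(Base64.getDecoder().decode(signature));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keyPair = newKeyPair();
        String signature = sign("This is a secret message", keyPair.getPrivate());
        System.out.println("Signature (ECC): " + signature);
        System.out.println("Signature Verified: "
                + verify("This is a secret message", signature, keyPair.getPublic()));
    }
}
```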
3) Hashing
Hashing is the process of converting input data (often of variable length) into a fixed-size output, called a hash value or digest, using a mathematical function called a hash function. The primary purpose of hashing is data integrity: ensuring that data has not been tampered with.
Hashing is irreversible. Once data is hashed, you cannot retrieve the original input from the hash.
No matter the size of the input, the output will always be of a fixed length (e.g., SHA-256 produces a 256-bit hash). The same input always produces the same hash output.
Good hash functions are designed so that it is computationally infeasible to find two different inputs that produce the same hash (a collision).
In cryptography, a collision attack on a cryptographic hash tries to find two inputs producing the same hash value, i.e., a hash collision. Attackers can then replace one input with the other without changing the hash value. This allows attackers to create forged signatures, tamper with data, or crack passwords.
Here’s an overview of the best hashing algorithms in Java, their use cases, and examples:
3.1) SHA-256 (Secure Hash Algorithm 256-bit)
SHA-256 is a member of the SHA-2 family of cryptographic hash functions, producing a 256-bit (32-byte) hash value. It's widely used in cryptographic applications, blockchain technology (e.g., Bitcoin), file integrity checks, and digital signatures.
It's secure and fast, resistant to collisions and preimage attacks. It is slower than non-cryptographic checksums such as CRC32 and than the older, now-broken MD5, but this is the expected cost of its security.
SHA-256 Hash: xRUzsFJyvpvLxTyxlTTK4ep7ZTNDd+N9SSRFV5fYNEY=
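Computing a SHA-256 digest with the JDK's `MessageDigest` can be sketched as follows. The input that produced the hash above is not shown in the text, so this example uses its own inputs; the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class Sha256Example {
    // Returns the 32-byte SHA-256 digest of the text's UTF-8 bytes.
    static byte[] sha256(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return digest.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Renders bytes as lowercase hexadecimal, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] hash = sha256("Hello, World!");
        System.out.println("SHA-256 Hash: " + Base64.getEncoder().encodeToString(hash));
        System.out.println("Length in bytes: " + hash.length); // always 32
    }
}
```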
3.2) BCrypt (Blowfish-based)
BCrypt is a password hashing algorithm based on the Blowfish cipher. It includes a salt and supports adaptive complexity (the cost factor), which allows the algorithm to become more computationally intensive over time as hardware improves.
A salt is a random data value that is added to a password before it is hashed. Salting makes it harder for attackers to use precomputed tables to crack passwords. Salting also protects passwords that appear multiple times in a database.
BCrypt is commonly used in web applications and systems for securely storing passwords.
Its per-password salt defeats rainbow table attacks, and its adaptive cost factor makes brute-force attacks expensive. Hashing is slower than with general-purpose algorithms like SHA-256, which is intentional for password security.
BCrypt Hashed Password: $2a$12$YR8exCd0c3Qr25D0OxsOvua/9Q7Jw3X7lfiVp8.XMGICMt0AoNgqC
Password Match: true
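BCrypt itself is not part of the JDK and needs a third-party library. The two ideas it relies on, a random salt and a tunable work factor, can still be sketched with the JDK alone. The example below uses PBKDF2 (`PBKDF2WithHmacSHA256`), a different algorithm, purely as a stand-in to show the pattern; the class name, method names, and iteration count are assumptions for illustration.

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;

class SaltedHashSketch {
    // A fresh random salt per password, stored alongside the hash.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // PBKDF2 stands in for BCrypt here; the iteration count plays the role
    // of the cost factor (higher means slower for defender and attacker alike).
    static byte[] hash(char[] password, byte[] salt, int iterations) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 256);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Re-hashes the candidate with the stored salt and compares in constant time.
    static boolean matches(char[] candidate, byte[] salt, int iterations, byte[] expected) {
        return MessageDigest.isEqual(hash(candidate, salt, iterations), expected);
    }

    public static void main(String[] args) {
        int iterations = 100_000; // illustrative value, tune for your hardware
        byte[] salt = newSalt();
        byte[] stored = hash("s3cret".toCharArray(), salt, iterations);
        System.out.println("Password Match: "
                + matches("s3cret".toCharArray(), salt, iterations, stored));
    }
}
```

Hashing the same password with two different salts yields two different hashes, which is why identical passwords do not show up as identical rows in a database.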
3.3) Argon2
Argon2 is the latest password hashing algorithm, designed to be memory-hard and resistant to GPU-based attacks. It won the Password Hashing Competition (PHC) and is considered the most secure option for password storage today.
- Suitable for high-security applications requiring strong resistance to attacks.
- Slower than other hashing algorithms like SHA-256 or MD5 (but this is intentional for security).
- Requires more resources (memory and CPU), making it less suitable for applications with very high performance requirements.
Argon2 Hashed Password: $argon2i$v=19$m=65536,t=2,p=1$cD14AMv6awBuYy0ks72grw$iGcRB7fl2kqcm2UmKiYXh0ERVPt0qsDrNvtehGOeN3w
Password Match: true
3.4) MD5
Although MD5 is fast and commonly used for tasks like checksums, it's not recommended for cryptographic purposes (e.g., password hashing) due to its vulnerability to collision attacks. For integrity checks and signatures, it's better to use SHA-256; for password storage, use a dedicated password hashing algorithm like BCrypt or Argon2.
You can use the MessageDigest class from java.security to compute an MD5 hash of a string.
MD5 Hash: c89cba7b7df028e65cb01d86f4d27077
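A minimal sketch of computing an MD5 hash with `MessageDigest` is shown below, for legacy checksum use only. The input behind the hash printed above is not given in the text, so the example uses its own input; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Example {
    // Returns the MD5 digest of the text as 32 lowercase hex characters.
    // MD5 is broken for security purposes; use it only for non-adversarial checksums.
    static String md5Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("MD5 Hash: " + md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```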