
System Design Interview: URL Shortening Service

Mar 31, 2024


URL shortening is used to create shorter aliases for long URLs. Users are redirected to the original URL when they hit these aliases.


Tell the interviewer that you are going to support the features below. If the interviewer wants to add more features, they will mention them.

a) Functional

1) For a given URL, the application should generate a shorter and unique alias.

2) When this shorter URL is accessed or clicked, the service should redirect the user to the original link.

3) Users should be able to specify their own URL alias.

4) Users should also be able to specify an expiration time, and if no expiration time is specified, the links should expire after a default timespan automatically.

b) Non-Functional

1) The system should be highly available and scalable to accommodate an increasing user base.

2) Users should have a smooth experience with minimal latency.

3) Shortened links should not be predictable or guessable.

c) Extended Requirements

1) Analytics

2) REST access

Capacity Estimation

There will be lots of redirection requests in comparison to new URL shortening requests.

That means the system will be read-heavy. Let's assume a 100:1 ratio between reads and writes.


Assume 1 billion new URL shortenings per month, which at the 100:1 ratio means 100 billion redirections per month.

URL shortenings per second = 1B / (30 days * 24 hours * 3600 seconds) ~ 400 URLs/s.

URL redirections per second (QPS) = 100 x URL shortenings = 40K/s.


Let's assume these URLs (1 B per month) will be kept for five years and each record takes around 1 KB of storage.

Storage needed = 1 billion/month * 5 years * 12 months * 1 KB = 60 billion KB ~ 60 TB

Cache Storage: At 40K redirections per second, we serve roughly 40K * 3600 * 24 ~= 3.5 billion requests per day. If we assume that 20% of URLs generate 80% of the traffic, we would like to cache these 20% of hot URLs, i.e., 0.2 * 3.5 B * 1 KB ~= 700 GB.
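The back-of-the-envelope arithmetic above can be checked with a few lines of Python. The constants are the assumptions stated in this section; the figures in the text are rounded versions of these results.

```python
# Capacity estimates, assuming 1 billion new URLs per month
# and a 100:1 read/write ratio as stated above.

SECONDS_PER_MONTH = 30 * 24 * 3600            # ~2.6 million seconds

new_urls_per_month = 1_000_000_000
writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH   # ~400 URLs/s
reads_per_sec = 100 * writes_per_sec                      # ~40K QPS

# Storage: 5 years of URLs at ~1 KB per record
total_urls = new_urls_per_month * 12 * 5                  # 60 billion
storage_tb = total_urls * 1 / 1_000_000_000               # 60 billion KB ~ 60 TB

# Cache the hot 20% of roughly a day's worth of requests
requests_per_day = reads_per_sec * 24 * 3600              # ~3.5 billion
cache_gb = 0.2 * requests_per_day * 1 / 1_000_000         # KB -> GB, ~700 GB
```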

Hard Questions

How do I create a short, unique URL?

1) Encoding the actual URL

The total number of unique URL hashes we would need to generate in 5 years would be 1 B/month * 12 months * 5 years = 60 B.

Using base64 encoding, a six-character key would give 64^6 ~= 68.7 billion possible strings, enough to cover the 60 B keys we need.

Two potential issues arise with this solution: multiple users shortening the same input URL would get the same shortened URL, and parts of the input URL may already be URL-encoded.

The first issue can be solved by appending the (unique) user id to the input URL before hashing; for the second, we can decode all input URLs to a canonical form before generating the hash.
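As a sketch of the encode step (the function name and key length are illustrative assumptions, not a reference implementation), hashing the decoded URL together with the user id and base64-encoding the digest might look like:

```python
import base64
import hashlib
from urllib.parse import unquote

def short_key(long_url: str, user_id: str, length: int = 6) -> str:
    """Hash the URL plus the user id (so two users shortening the same
    URL get different keys) and base64-encode the digest."""
    normalized = unquote(long_url)        # decode any already-encoded parts
    digest = hashlib.md5((normalized + user_id).encode()).digest()
    # URL-safe base64 avoids '/' and '+' appearing in the generated key.
    return base64.urlsafe_b64encode(digest).decode()[:length]

key = short_key("https://example.com/some/very/long/path", user_id="42")
```

Note that truncating the digest to six characters reintroduces a small collision risk, so a real service would still check the database and retry on collision.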

2) Using Key Generation Service (KGS)

In this approach, Key Generation Service (KGS) makes sure that all the generated keys are unique.

KGS also has to make sure not to provide the same key to multiple servers or multiple requests from the same server.

The issue with this approach is that the KGS server is now the single point of failure. We can overcome this limitation by having a standby replica of KGS.
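A toy, single-process sketch of the KGS idea is below; a production KGS would pre-generate keys into persistent "used" and "unused" tables rather than in-memory sets, but the invariant is the same: each key is handed out exactly once, even to concurrent callers.

```python
import threading
import secrets
import string

ALPHABET = string.ascii_letters + string.digits   # base62-style key space

class KeyGenerationService:
    """Toy KGS: pre-generates random 6-character keys and hands each
    one out exactly once, even across concurrent requests."""

    def __init__(self, batch_size: int = 1000):
        self._lock = threading.Lock()
        self._unused = set()
        self._used = set()
        self._batch_size = batch_size

    def _refill(self):
        # Generate a fresh batch, skipping anything already handed out.
        while len(self._unused) < self._batch_size:
            key = "".join(secrets.choice(ALPHABET) for _ in range(6))
            if key not in self._used:
                self._unused.add(key)

    def next_key(self) -> str:
        with self._lock:              # one key per request, no duplicates
            if not self._unused:
                self._refill()
            key = self._unused.pop()
            self._used.add(key)
            return key

kgs = KeyGenerationService()
keys = {kgs.next_key() for _ in range(100)}
```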

Recommended Design


We would have billions of records, small objects or records of 1 KB with no relation, and read heavy queries. A "key-value based" choice like "Amazon DynamoDB", "Aerospike" or "Couchbase" would be easier to scale.

A key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier.

Both keys and values can be anything, ranging from simple objects to complex compound objects.


Key-value databases are highly partitionable and can scale horizontally to sizes that other types of databases cannot.

We can partition the data based on the first letter of the hash key, i.e., Range Based Partitioning.

The problem with this approach is that it can lead to unbalanced partitions. For example, very few URLs start with "Z", while we may have too many URLs that begin with the letter "E".

We may combine less frequently occurring letters into one database partition.

We can also use hash-based partitioning.

In this approach, the hash of the object determines the partition in which the data will be stored. The hash function will generate a server number, and we will store the key on that server.

This process can make the distribution more random.
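A minimal illustration of hash-based partitioning (the helper name is hypothetical): hashing the full key rather than taking its first letter spreads keys far more evenly across partitions.

```python
import hashlib
from collections import Counter

def partition_for(key: str, num_partitions: int) -> int:
    """Map a short-URL key to a partition via a hash of the whole key,
    instead of its first letter, for a more even spread."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# Rough check: 10,000 keys land almost evenly across 4 partitions.
counts = Counter(partition_for(f"key{i}", 4) for i in range(10_000))
```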

If this approach still leads to overloaded partitions, we need to use Consistent Hashing.

Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring. This allows servers and objects to scale without affecting the overall system.
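A minimal consistent-hash ring, assuming virtual nodes (each server placed at many positions on the ring) to smooth out the distribution; this is a sketch of the scheme described above, not a production implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent-hash ring with virtual nodes. Adding or removing a
    server only remaps the keys on the affected arcs of the ring."""

    def __init__(self, replicas: int = 100):
        self.replicas = replicas          # virtual nodes per server
        self._ring = []                   # sorted list of (position, server)

    @staticmethod
    def _pos(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, server: str):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._pos(f"{server}#{i}"), server))

    def remove(self, server: str):
        self._ring = [(p, s) for p, s in self._ring if s != server]

    def get(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key.
        pos = self._pos(key)
        idx = bisect.bisect(self._ring, (pos, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing()
for s in ("db1", "db2", "db3"):
    ring.add(s)
server = ring.get("abc123")   # the same key always maps to the same server
```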


We can use a distributed cache layer to store and retrieve frequently accessed URLs.

The application servers, before making a query to the database, may check if the cache has the desired data.

The Least Recently Used (LRU) can be a reasonable policy for our system. In this policy, we remove the least recently used URL first.
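A small LRU cache sketch using Python's `OrderedDict` (the class and its capacity are illustrative; a real deployment would use Redis's built-in eviction policies instead):

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: evicts the least recently used entry on overflow."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used

cache = LRUCache(2)
cache.put("a1", "https://example.com/1")
cache.put("b2", "https://example.com/2")
cache.get("a1")                           # touch a1, so b2 is now oldest
cache.put("c3", "https://example.com/3")  # evicts b2
```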

Redis is a great choice for implementing a highly available in-memory cache to decrease data access latency and increase throughput.

Redis is single-threaded, and measured per core it outperforms Memcached at storing small objects that need little or no processing.

Load balancer

We may add a load balancer in front of the URL shortening application server as well as in front of the key generation service.

We may use a simple round-robin approach for request distribution. In this approach, LB distributes incoming requests equally among backend servers.

If a server is dead, LB will stop sending any traffic to it.
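The round-robin-with-health-checks behaviour can be sketched as follows (class and server names are hypothetical; real load balancers such as NGINX or HAProxy provide this out of the box):

```python
import itertools

class RoundRobinBalancer:
    """Round-robin over backends; servers marked down are skipped."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def next_server(self):
        # Try at most one full rotation before giving up.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy backends")

lb = RoundRobinBalancer(["app1", "app2", "app3"])
lb.mark_down("app2")
picks = [lb.next_server() for _ in range(4)]   # app2 never receives traffic
```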


NK Chauhan

NK Chauhan is a Principal Software Engineer with one of the biggest e-commerce companies in the world.

Chauhan has around 12 years of experience with a focus on JVM-based technologies and Big Data.

His hobbies include playing Cricket, Video Games and hanging with friends.
