
Understanding Cardinality: Unlocking the Meaning Behind Data Relationships

Cardinality can help you better understand your data. Learn more about cardinality and how it’s used here.

Databases are essential for the modern world. Every business in every industry can benefit from computer resources, and those resources are often built on top of databases. If you handle any kind of transaction, you need to create a ledger to keep up with it, which falls into the world of databases.

If you want to understand the tools that manage data tracking (not to mention the analytics that can come along for the ride), then that journey starts with an understanding of database cardinality.

What is cardinality in databases?

To define cardinality, it helps to picture a database table as something like a spreadsheet, with rows of records and columns of attributes. Real databases can be massive in terms of data points and complexity, but the ledger picture is enough to get started.

With that picture in mind, cardinality in databases usually refers to the number of distinct values in a column, often compared to the number of rows in the table.

Let's stop and note that this is different from cardinality in mathematics. In set theory, cardinality is the size of a set, and the definition extends to infinite sets, including countably infinite and uncountable ones.

That might sound oddly specific and hardly useful, but if you keep in mind the idea of comparing a column's distinct values to its row count, the explanations of cardinality and its uses that come next will be easier to follow.

The first bit of clarity comes from the idea of high and low cardinality. If a column has a large number of distinct values relative to the number of rows, it has high cardinality. A column with low cardinality has few distinct values, which means that a lot of values repeat.

Comparing high and low cardinality can help you gauge how varied a column's values are and think about what kind of data it holds at a glance. Let’s consider a simple example.

Say you’re doing payroll. Your data table will have a unique identifier for each employee (usually an employee ID or tax ID number, since names can repeat). You'll most likely see the hours worked, the pay rate, and the total owed. The identifier column will have high cardinality, while the hourly pay rate will often have low cardinality because many employees share the same rate. That gives you an idea of what kind of data is present in each column.
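The payroll example can be sketched in a few lines of Python. The rows, the column names, and the helper `column_cardinality` are all made up for illustration; the point is only to show distinct values being counted against rows.

```python
# Hypothetical payroll rows: (employee_id, hours_worked, hourly_rate)
payroll = [
    ("E001", 40, 22.50),
    ("E002", 38, 22.50),
    ("E003", 40, 30.00),
    ("E004", 25, 22.50),
    ("E005", 40, 30.00),
    ("E006", 32, 22.50),
]
COLUMNS = ("employee_id", "hours_worked", "hourly_rate")


def column_cardinality(rows, index):
    """Return (distinct_values, total_rows) for the column at `index`."""
    values = [row[index] for row in rows]
    return len(set(values)), len(values)


cardinality = {
    name: column_cardinality(payroll, i) for i, name in enumerate(COLUMNS)
}

for name, (distinct, total) in cardinality.items():
    print(f"{name}: {distinct} distinct values in {total} rows")
```

In this sample, the ID column has as many distinct values as rows (high cardinality), while the pay rate column has only two distinct values (low cardinality).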

Hopefully, this example gives you a better idea of what cardinality is.

Why is cardinality beneficial in a database?

Ultimately, the value of cardinality in a database isn’t really for the person viewing the data. Instead, the value is found in searches and queries.

If you want to find a specific data point in your database, the computer system has to run the search for you, and in a custom database, someone has to build that search function. The system uses cardinality to estimate how many rows are likely to match a given value (one, a handful, or a large share of the table). That estimate helps it decide how to run the query, for example whether an index is worth using.

That means that high and low cardinality are concepts used by developers to design efficient query systems.
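As a rough illustration of why this matters for queries, here is a minimal sketch of one common rule of thumb: if values are spread evenly, an exact-match lookup returns about rows divided by distinct values. The function `estimated_matches` and the row counts are assumptions for this example, and real query planners use richer statistics.

```python
def estimated_matches(total_rows, distinct_values):
    """Estimate rows returned by `column = value`, assuming values are evenly spread."""
    if distinct_values <= 0:
        raise ValueError("distinct_values must be positive")
    return total_rows / distinct_values


rows = 10_000

# High cardinality: every employee ID is different.
by_employee_id = estimated_matches(rows, 10_000)

# Low cardinality: only five pay rates are in use.
by_pay_rate = estimated_matches(rows, 5)

print(by_employee_id)  # 1.0
print(by_pay_rate)     # 2000.0
```

A lookup on the high-cardinality column narrows the search to about one row, while a lookup on the low-cardinality column still leaves thousands of candidates to sift through.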

You can also think about how this might help with data analytics. Computer systems need to be able to distinguish records that share identical values, and cardinality can help a developer code around those redundancies.

Types of cardinalities

So far, we’ve talked about high and low cardinality in databases, but the concept also involves a deeper level. You can have cardinality in data modeling.

This goes hand in hand with the cardinality we’ve already discussed, but the focus shifts. Instead of counting distinct values in a single column, cardinality in data modeling describes how many items on one side of a relationship can be linked to items on the other side.

These relationships are typically broken up into one-to-one, one-to-many, and many-to-many cardinalities.

One-to-one

One-to-one relationships are less common in modern databases, but they do come up, and it’s important to understand how they work.

At the theoretical level, the name says it all. Any item in one column is related to at most one item in another column, and that relationship holds in both directions.

This doesn't mean that the columns have to be the same size. Two columns can maintain a one-to-one relationship even if some items don't have a pair, so comparing the number of items in each column won't reveal the relationship on its own.

Consider the following example.

Let’s say a large company has thousands of employees, and every employee needs a business email. If you were going to keep track of this in a database, you would need a unique identifier for each employee (often an ID number) and their unique email address in the ledger. Each employee is related to one email and vice versa.
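One way to enforce the employee and email pairing is with uniqueness constraints on both sides. The sketch below uses Python's built-in `sqlite3` module; the table name `employee_email`, the sample addresses, and the helper `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee_email (
        employee_id TEXT PRIMARY KEY,     -- each employee appears at most once
        email       TEXT NOT NULL UNIQUE  -- each email appears at most once
    )
    """
)
conn.execute("INSERT INTO employee_email VALUES ('E001', 'ana@example.com')")
conn.execute("INSERT INTO employee_email VALUES ('E002', 'ben@example.com')")


def try_insert(employee_id, email):
    """Return True if the row was accepted, False if it broke the one-to-one rule."""
    try:
        conn.execute(
            "INSERT INTO employee_email VALUES (?, ?)", (employee_id, email)
        )
        return True
    except sqlite3.IntegrityError:
        return False


reused_email = try_insert("E003", "ana@example.com")   # email already taken
second_email = try_insert("E001", "ana2@example.com")  # employee already has one
new_pair = try_insert("E003", "cam@example.com")       # fresh on both sides

print(reused_email, second_email, new_pair)  # False False True
```

With both constraints in place, the database itself rejects any row that would give an employee a second email or assign one email to two employees.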

One-to-many

Now that you know what cardinality means and have an idea of what a one-to-one relationship is, the one-to-many concept will be easy to grasp.

In this case, you have a column with unique entries (column “A”). Each entry can be related to multiple entries in another column (column “B”). As such, anything in column A can link to multiple things in B, but each item in B will only link to one thing in A.

An easy example to understand this would be matching sales representatives to their clients. Each rep can have dozens of clients, but to keep relationships stable, each client only has one sales representative. It’s a one-to-many relationship.
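In a relational database, a one-to-many relationship like this is commonly modeled by giving each row on the "many" side a reference to its single owner. Here is a minimal sketch using Python's built-in `sqlite3` module; the tables `reps` and `clients` and all names in them are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE reps (
        rep_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    -- Each client row points at exactly one rep.
    CREATE TABLE clients (
        client_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        rep_id    INTEGER NOT NULL REFERENCES reps(rep_id)
    );
    INSERT INTO reps VALUES (1, 'Dana'), (2, 'Eli');
    INSERT INTO clients VALUES
        (10, 'Acme', 1), (11, 'Birch', 1), (12, 'Cobalt', 1), (13, 'Delta', 2);
    """
)

clients_per_rep = conn.execute(
    """
    SELECT reps.name, COUNT(clients.client_id)
    FROM reps LEFT JOIN clients ON clients.rep_id = reps.rep_id
    GROUP BY reps.rep_id
    ORDER BY reps.rep_id
    """
).fetchall()

print(clients_per_rep)  # [('Dana', 3), ('Eli', 1)]
```

Because the `rep_id` column lives on the client row, a client can never have two reps, while a rep can appear on as many client rows as needed.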

Many-to-many

Lastly, we can cover the many-to-many relationship. Any item in column A can be paired with multiple items in column B, and the reverse is also true.

We can actually use the sales representative example to think about many-to-many relationships.

What if the company is a microchip manufacturer with massive, multi-billion-dollar clients? That company most likely has well-defined sales teams, and each team might handle a few clients. In this case, any client has multiple individual reps, and each rep might work on a couple of different contracts.

This is the idea of a many-to-many relationship.
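A many-to-many relationship is usually stored in a third table that holds one row per pairing, often called a junction table. The sketch below uses Python's built-in `sqlite3` module; the table `assignments`, the helper functions, and the sample names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reps    (rep_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Junction table: one row per (rep, client) pairing.
    CREATE TABLE assignments (
        rep_id    INTEGER NOT NULL REFERENCES reps(rep_id),
        client_id INTEGER NOT NULL REFERENCES clients(client_id),
        PRIMARY KEY (rep_id, client_id)
    );
    INSERT INTO reps VALUES (1, 'Dana'), (2, 'Eli'), (3, 'Fay');
    INSERT INTO clients VALUES (10, 'Acme'), (11, 'Birch');
    INSERT INTO assignments VALUES (1, 10), (2, 10), (2, 11), (3, 11);
    """
)

# Shared join across the three tables; only the filter differs per direction.
JOIN = """
    FROM assignments
    JOIN reps    ON reps.rep_id = assignments.rep_id
    JOIN clients ON clients.client_id = assignments.client_id
"""


def reps_for(client_name):
    """Names of all reps assigned to the given client."""
    sql = "SELECT reps.name" + JOIN + "WHERE clients.name = ? ORDER BY reps.name"
    return [name for (name,) in conn.execute(sql, (client_name,))]


def clients_for(rep_name):
    """Names of all clients the given rep works with."""
    sql = "SELECT clients.name" + JOIN + "WHERE reps.name = ? ORDER BY clients.name"
    return [name for (name,) in conn.execute(sql, (rep_name,))]


print(reps_for("Acme"))    # ['Dana', 'Eli']
print(clients_for("Eli"))  # ['Acme', 'Birch']
```

Neither the rep table nor the client table needs to know about the other; every pairing lives in `assignments`, so both sides can have as many partners as the business requires.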

Cardinality examples

In the previous sections, you learned about different types of cardinality and relationships with some simplified examples. Now, we can look at some real-world applications of cardinality to see exactly how it can function in a database.

Uber

Let’s think about designing our own database that would function for Uber.

It’s a well-known company with a well-known model, but if you're not familiar, here’s how it works. Uber is a smartphone app. You can open the app and request a ride from your current location to anywhere else. The app will connect you with a driver, assign the fare, figure out the navigation, and handle all logistics. Your driver will show up, take you where you want to go, and you'll pay for the ride via the app.

For such a large and complicated app, databases built from intent data are definitely in play, and one that's easy to imagine is a database that keeps track of customers and drivers. In this case, every driver and customer needs a unique identifier.

That means that our driver column (D) and our customer column (C) both have high cardinality (in fact, the highest possible, since the number of distinct values equals the number of rows). Every item in each column is unique.

We can also think about the relationships between the columns. At a glance, it might seem like any driver can have multiple passengers and vice versa, and across the full ride history that is true. For any individual transaction, though, the relationship is one-to-one. A driver can only have one customer at a time. Even if the customer is technically a group of people, there will only be one customer ID on the transaction. We need a one-to-one relationship to match customers and drivers for any individual transaction.
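To make the distinction concrete, here is a small sketch that classifies a relationship from a list of pairs. The ride data and the `classify` helper are hypothetical and say nothing about how Uber actually stores rides; they only contrast rides in progress with the full ride history.

```python
from collections import defaultdict


def classify(pairs):
    """Classify the relationship between the left and right values in `pairs`."""
    rights_by_left = defaultdict(set)
    lefts_by_right = defaultdict(set)
    for left, right in pairs:
        rights_by_left[left].add(right)
        lefts_by_right[right].add(left)
    left_has_many = any(len(v) > 1 for v in rights_by_left.values())
    right_has_many = any(len(v) > 1 for v in lefts_by_right.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"


# Hypothetical rides: (driver_id, customer_id, in_progress)
rides = [
    ("D1", "C1", False),
    ("D1", "C2", False),
    ("D2", "C1", False),
    ("D1", "C3", True),
    ("D2", "C4", True),
]

in_progress = classify([(d, c) for d, c, active in rides if active])
full_history = classify([(d, c) for d, c, _ in rides])

print(in_progress)   # one-to-one
print(full_history)  # many-to-many
```

The same drivers and customers produce a one-to-one relationship when you look only at rides happening right now, and a many-to-many relationship once you include every past ride.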

Amazon

If any company in the world uses databases, it’s Amazon. For this example, let’s consider the Amazon online store. There are a lot of retailers who have shops on the website, and there are millions of people who shop via Amazon.

Once again, every shop and every shopper needs a unique ID in order to manage each transaction. However, this case is more complicated than the Uber example.

When you shop on Amazon, you can purchase things from multiple retailers, all in the same transaction. Amazon handles the logistics, so you don’t need to know if your items originate from different locations. For you, it’s a single transaction.

Meanwhile, shops can also sell to multiple users simultaneously, but each transaction will only be with one user.

In this case, within a single transaction, we have a one-to-many relationship: one shopper linked to many stores. You can buy from multiple stores, but each store is only selling to you in that particular transaction.
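The same idea can be checked on a handful of order lines. The data below is invented for illustration and is not Amazon's actual schema; it simply counts how many customers and how many shops appear on each order.

```python
from collections import defaultdict

# Hypothetical order lines: (order_id, customer_id, shop_id)
order_lines = [
    ("O1", "U1", "S1"),
    ("O1", "U1", "S2"),
    ("O1", "U1", "S3"),
    ("O2", "U2", "S1"),
]

customers_per_order = defaultdict(set)
shops_per_order = defaultdict(set)
for order_id, customer_id, shop_id in order_lines:
    customers_per_order[order_id].add(customer_id)
    shops_per_order[order_id].add(shop_id)

for order_id in sorted(shops_per_order):
    n_customers = len(customers_per_order[order_id])
    n_shops = len(shops_per_order[order_id])
    print(f"{order_id}: {n_customers} customer, {n_shops} shop(s)")
```

Every order has exactly one customer but may involve several shops, which is the one-to-many shape described above. Shop S1 also shows up on two different orders, which matches the point that a shop can sell to multiple users as long as each transaction involves only one of them.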

Enhance data modeling with cardinality

Now that you know more about data cardinality and how it works with databases and data modeling, the next step is to dive in and explore tools that can deepen your understanding and empower you to work directly with databases.

Mailchimp reports make it easy to assess data and gauge cardinality with an easy-to-read dashboard. Try Mailchimp today.
