A guide to the 10 most common data modeling mistakes

Data modeling is the process through which we represent information system objects or entities and the connections between them. Such entities could be people, products or something else related to your business; regardless of the entity type, modeling them correctly results in a powerful database set up for fast information retrieval, efficient storage and more.
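
For example, here is a minimal, engine-agnostic sketch of what “entities and the connections between them” might look like in code. The entities and fields are hypothetical, purely for illustration:

```python
from dataclasses import dataclass, field
from typing import List

# A tiny, engine-agnostic sketch of a data model: two entities and the
# relationship between them. Entity and field names are illustrative only.

@dataclass
class Product:
    sku: str
    name: str
    unit_price: float

@dataclass
class Customer:
    customer_id: str
    name: str
    # One customer relates to many purchased products (a one-to-many relationship).
    purchases: List[Product] = field(default_factory=list)

alice = Customer(customer_id="C-001", name="Alice")
alice.purchases.append(Product(sku="SKU-42", name="Widget", unit_price=9.99))
print(alice)
```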

SEE: Job description: Big data modeler (TechRepublic Premium)

Given the advantages that data modeling offers for database insights, it’s important to learn how to do data modeling effectively in your organization. In this guide, I’ll point out some key mistakes to avoid when modeling your data.

Jump to:

  1. Not considering quality data models as an asset
  2. Failing to consider application usage of the data
  3. Schema-less does not mean data model-less
  4. Failing to tame semi-structured data
  5. Not planning for data model evolution
  6. Rigidly mapping UI to your data’s fields and values
  7. Incorrect or differing levels of granularity
  8. Inconsistent or nonexistent naming patterns
  9. Not separating the concept of keys from indexes
  10. Starting too late on data modeling

Not considering quality data models as an asset

As Microsoft Power BI consultant Melissa Coates has pointed out, we sometimes optimize our data models for one particular use case, such as analyzing sales data, and the model quickly becomes harder to use when analysts need to analyze more than that one thing.

For example, it can be difficult for analysts to jump to analyzing the intersection of sales and support calls if models have been optimized for sales data alone. That is not to mention the additional time, resources and possible costs that might go into making additional models when a single model would have sufficed.

To prevent this kind of model inefficiency, take the time upfront to ensure your data model offers broader applicability and makes good longer-term financial sense.
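
One practical way to build in that broader applicability is to model shared entities (such as customers) once and let every subject area reference them, so a new question becomes a join rather than a new model. A rough sketch, with all names and figures invented:

```python
# A shared customer dimension that both sales and support data reference,
# so analysts can join across subject areas without a purpose-built model.
# All names and values here are illustrative.

customers = {
    "C-001": {"name": "Alice", "segment": "Enterprise"},
    "C-002": {"name": "Bob", "segment": "SMB"},
}

sales = [
    {"customer_id": "C-001", "amount": 1200.0},
    {"customer_id": "C-002", "amount": 300.0},
]

support_calls = [
    {"customer_id": "C-001", "minutes": 45},
    {"customer_id": "C-001", "minutes": 10},
]

# Because both datasets share the same key, the "sales x support" question
# is a simple join rather than a reason to build a second model.
for cid, cust in customers.items():
    total_sales = sum(s["amount"] for s in sales if s["customer_id"] == cid)
    call_time = sum(c["minutes"] for c in support_calls if c["customer_id"] == cid)
    print(f"{cust['name']}: ${total_sales:.2f} in sales, {call_time} min of support")
```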

Failing to consider application usage of the data

One of the hardest things about data modeling is getting the balance right between competing interests, such as:

  • The data needs of application(s)
  • Performance goals
  • How data will be retrieved

It’s easy to get so consumed with considering the structure of the data that you spend insufficient time analyzing how an application will use the data and getting the balance right between querying, updating and processing data.

SEE: Hiring kit: Data scientist (TechRepublic Premium)

Another way of stating this mistake is having insufficient empathy for others who will be using the data model. A good data model considers all users and use cases of an application and builds accordingly.
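
For instance, in a document database the same order data can be shaped around how the application will read and update it. The sketch below is illustrative only; the shapes and field names are assumptions, and the right choice depends on your actual query patterns:

```python
# Two ways to model the same order, chosen by how the application uses it.
# Neither shape is "correct" without knowing the query, update and
# processing patterns; the names here are invented.

# Shape A: referenced. Suits workloads that update line items independently
# and rarely need the whole order in a single read.
order_ref = {"_id": "O-100", "customer_id": "C-001", "item_ids": ["I-1", "I-2"]}
items = {
    "I-1": {"sku": "SKU-42", "qty": 2},
    "I-2": {"sku": "SKU-7", "qty": 1},
}

# Shape B: embedded. Suits workloads where the dominant read is
# "fetch the entire order at once."
order_embedded = {
    "_id": "O-100",
    "customer_id": "C-001",
    "items": [{"sku": "SKU-42", "qty": 2}, {"sku": "SKU-7", "qty": 1}],
}

# Reading a full order costs one lookup plus N with Shape A,
# but a single lookup with Shape B.
full_order_items = [items[i] for i in order_ref["item_ids"]]
print(full_order_items)
print(order_embedded["items"])
```

Neither shape is a default; the balance the section describes is exactly the work of weighing reads, writes and processing before committing to one.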

Schema-less does not mean data model-less

NoSQL databases (document, key-value, wide-column, etc.) have become a critical component of enterprise data architecture, given the flexibility they offer for unstructured data. Though sometimes mistakenly thought of as “schema-less” databases, it’s more accurate to think of NoSQL databases as enabling flexible schema. And though some conflate data schemas with data models, the two serve different functions.

A data schema tells a database engine how data in the database is organized, whereas a data model is more conceptual, describing the data and the relationships between the data. Whatever the confusion over how a flexible schema affects data modeling, developers must model data in NoSQL databases just as they do in relational databases. Depending on the type of NoSQL database, that data model will be either simple (key-value) or more sophisticated (document).
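
A flexible schema simply shifts the modeling discipline onto the team. As a minimal sketch (the field names are hypothetical), you can still agree on a document shape and validate it in application code before writing:

```python
# Even in a "schema-less" store, the team can agree on a document model and
# check it in application code before writing. Field names are illustrative.

REQUIRED_FIELDS = {"sku": str, "name": str, "unit_price": (int, float)}

def conforms_to_model(doc: dict) -> bool:
    """Return True if the document matches the agreed product model."""
    return all(
        name in doc and isinstance(doc[name], expected)
        for name, expected in REQUIRED_FIELDS.items()
    )

good = {"sku": "SKU-42", "name": "Widget", "unit_price": 9.99}
bad = {"sku": "SKU-43", "price": "9.99"}  # wrong field name, wrong type

print(conforms_to_model(good))  # True
print(conforms_to_model(bad))   # False
```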

Failing to tame semi-structured data

Most data today is unstructured or semi-structured but, as with mistake number three, this doesn’t mean that your data model should follow those same formats. Though it can be convenient to put off thinking through how to structure your data at ingestion, this almost inevitably will hurt you. You can’t avoid semi-structured data, but the way to deal with it is to apply rigor in the data model rather than taking a hands-off approach during data retrieval.
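
In practice, applying that rigor often means normalizing semi-structured records as they arrive rather than leaving the mess to be untangled at query time. A small sketch with made-up fields:

```python
from datetime import datetime, timezone

# Incoming semi-structured events: same concept, inconsistent shapes.
# All field names and values are invented.
raw_events = [
    {"user": "alice", "ts": "2023-05-01T10:00:00Z", "value": "12"},
    {"username": "bob", "timestamp": 1682935200, "value": 7},
]

def normalize(event: dict) -> dict:
    """Coerce a raw event into the agreed shape at ingestion time."""
    user = event.get("user") or event.get("username")
    ts = event.get("ts") or event.get("timestamp")
    if isinstance(ts, (int, float)):
        ts = datetime.fromtimestamp(ts, tz=timezone.utc)
    else:
        ts = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return {"user": user, "ts": ts.isoformat(), "value": int(event["value"])}

clean = [normalize(e) for e in raw_events]
print(clean)  # every record now has the same fields and types
```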

Not planning for data model evolution

Given how much work can go into mapping out your data model, it can be tempting to assume your work is done when you’ve built the data model. Not so, noted Prefect’s Anna Geller: “Building data assets is an ongoing process,” she said, because “as your analytical needs change over time, the schema will have to be adjusted as well.”

One way to make data model evolution easier, she continued, is by “splitting and decoupling data transformations [to] make the entire process easier to build, debug and maintain in the long run.”
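
That splitting and decoupling can be as simple as keeping each transformation a small, separately testable step, so a schema change touches one function instead of one large script. A hypothetical sketch:

```python
# Decoupled transformation steps: each one is small, testable and easy to
# adjust when the schema evolves. Step names and fields are illustrative.

def extract(raw_rows):
    """Pull only the fields downstream steps rely on."""
    return [{"id": r["id"], "amount": r["amount"]} for r in raw_rows]

def clean(rows):
    """Drop rows that violate the model's assumptions."""
    return [r for r in rows if r["amount"] >= 0]

def aggregate(rows):
    """Produce the analytical output."""
    return {"total": sum(r["amount"] for r in rows), "count": len(rows)}

raw = [{"id": 1, "amount": 10.0, "extra": "x"}, {"id": 2, "amount": -5.0}]

# If the source schema changes, only extract() needs to be updated.
print(aggregate(clean(extract(raw))))
```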

Rigidly mapping UI to your data’s fields and values

As Tailwind Labs partner Steve Schoger has highlighted, “Don’t be afraid to ‘think outside the database’”. He goes on to explain that you don’t necessarily have to map your UI directly to each data field and value. This mistake tends to stem from fixating on your data model rather than the underlying information architecture, and it likely means you are missing opportunities to present data in ways that are more intuitive to the application’s audience than a one-to-one mapping of the underlying data model.
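
As a small, hypothetical illustration, the stored model can keep its normalized, low-level fields while the UI shows a single derived value that means more to the user:

```python
# The stored record keeps normalized, low-level fields; the UI presents a
# derived summary instead of echoing each field one-to-one. Names are made up.

subscription = {
    "start_date": "2023-01-15",
    "cancelled_at": None,
    "payment_failures": 2,
}

def display_status(sub: dict) -> str:
    """Translate raw fields into the single status a user actually cares about."""
    if sub["cancelled_at"]:
        return "Cancelled"
    if sub["payment_failures"] >= 3:
        return "Payment problem"
    return "Active"

print(display_status(subscription))  # "Active", rather than three raw columns
```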

Incorrect or differing levels of granularity

In analytics, granularity refers to the level of detail we can see. In a SaaS business, we might, for example, want to see the level of consumption of our service per day, per hour or per minute. Getting the right amount of granularity in a data model is important because, if it’s too granular, you can end up with all kinds of unnecessary data, making it complicated to decipher and sort through it all.

But with too little granularity, you may lack sufficient detail to tease out important details or trends. Now add in the possibility that your granularity is focused on daily numbers, but the business wants you to determine the difference between peak and off-peak consumption. At that point, you’d be dealing with mixed granularity and end up confusing users. Determining your exact data use cases for internal and external users is an important first step in deciding how much granular detail your model needs.
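
Keeping the raw data at one consistent, fine grain lets you derive coarser views (daily totals, peak versus off-peak) on demand instead of mixing granularities in the model itself. A sketch with invented numbers:

```python
from collections import defaultdict

# Raw usage events kept at a single fine grain (per hour). Values are invented.
usage = [
    {"day": "2023-05-01", "hour": 9, "requests": 120},
    {"day": "2023-05-01", "hour": 14, "requests": 480},
    {"day": "2023-05-01", "hour": 23, "requests": 30},
]

# Coarser view derived on demand: daily totals.
daily = defaultdict(int)
for u in usage:
    daily[u["day"]] += u["requests"]

# A different slice of the same grain: peak (9-17) vs. off-peak, answered
# without storing a second, mixed-granularity copy of the data.
peak = sum(u["requests"] for u in usage if 9 <= u["hour"] < 17)
off_peak = sum(u["requests"] for u in usage if not 9 <= u["hour"] < 17)

print(dict(daily), peak, off_peak)
```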

Inconsistent or nonexistent naming patterns

Rather than inventing a unique naming convention, you’re better off following standard approaches with data models. If tables, for example, lack a consistent logic in how they’re named, the data model becomes very difficult to follow. It can seem clever to come up with obscure naming conventions that relatively few people will immediately understand, but this will inevitably lead to confusion later, especially if new people are onboarded to work with these models.
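
For example (all names invented), a predictable pattern such as singular entity names and an _id suffix for keys makes a model self-explanatory, while ad hoc abbreviations force readers to guess:

```python
# Inconsistent: mixed case, abbreviations and no pattern for key columns.
inconsistent = {"tbl_Cust": ["CustNo", "nm", "SalesRepID"]}

# Consistent: one case style, full words and a predictable *_id key pattern.
consistent = {"customer": ["customer_id", "name", "sales_rep_id"]}

print(consistent)
```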

Not separating the concept of keys from indexes

In a database, keys and indexes serve different functions. As Bert Scalzo has explained, “Keys enforce business rules – it’s a logical concept. Indexes speed up database access – it’s a purely physical concept.”

Because many conflate the two, they end up not implementing candidate keys, which leaves them with fewer useful indexes and, in the process, slower performance. Scalzo went on to offer this advice: “Implement the least number of indexes [that] can effectively support all the keys.”
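
The distinction is easy to see in a small relational sketch (the table and columns are hypothetical): the keys are declared to enforce rules, the one extra index exists purely to speed up access, and no redundant index is added for a key the engine already covers:

```python
import sqlite3

# Keys enforce business rules; indexes speed up physical access.
# Table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- key: identity rule
        email       TEXT NOT NULL UNIQUE,  -- key: business rule; the engine
                                           -- backs it with an index automatically
        signup_date TEXT NOT NULL
    );
    -- A purely physical choice: this index exists only to speed up date-range
    -- queries. No extra index is added for email, because the UNIQUE key
    -- already provides one (the least number of indexes that support the keys).
    CREATE INDEX idx_customer_signup_date ON customer (signup_date);
""")

# The key does its job: a duplicate email is rejected as a rule violation.
conn.execute(
    "INSERT INTO customer (email, signup_date) VALUES ('a@example.com', '2023-05-01')"
)
try:
    conn.execute(
        "INSERT INTO customer (email, signup_date) VALUES ('a@example.com', '2023-05-02')"
    )
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```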

Starting too late on data modeling

If the data model is the blueprint to describe an application’s data and how that data interacts, it makes little sense to start building the application before a big data modeler has fully scoped out the data model. Yet this is precisely what many developers do.

Understanding the shape and structure of data is essential to application performance and, ultimately, user experience. This should be the first consideration and brings us back to mistake number one: Not considering quality data models as an asset. Failing to plan out the data model is essentially planning to fail (and planning to do a lot of refactoring later on to fix the mistakes).

Disclosure: I work for MongoDB but the views expressed herein are mine.

SEE: Top data modeling tools (TechRepublic)
