Business Data and Analytics Articles

What is a Data Warehouse? | Data Warehouse Best Practices

Written by Brian Cook | Nov 3, 2022 10:42:00 AM

In our previous blog on how to wrangle your data into a single source of truth, we discussed several methodologies for consolidating your data into one trustworthy source. For this discussion, we'll refer to the aggregating tool as a data warehouse.

This raises the question: does a data warehouse in and of itself meet the criteria of a single source of truth? Spoiler alert: it does not, unless it becomes the single source where:

  • all the business rules are defined
  • proper data governance is consistently applied
  • data sources are known, screened, and groomed
  • the data is cleansed, and enhanced where needed, before it passes muster

What is a Data Warehouse?

A data warehouse is one component of an enterprise-wide data management strategy. At its core, a data warehouse is a data storage model that consolidates and aggregates data from multiple sources into a single data repository. Still, on its own, it does not ensure the accuracy, consistency, or validity of all data sources across the enterprise.

Data Warehouse Best Practices

You can improve your data warehouse systems, and the business decisions they support, by employing the following three building blocks:

1. Utilize data modeling

2. Employ data quality tools and data governance

3. Implement a comprehensive testing approach


Building Block #1

Utilize Data Modeling for Design Based on Business Requirements

As part of the data strategy, you must start with analysis and design. Data modeling is the mechanism that enables the warehouse to be designed, and typically a tool such as Erwin is used for the modeling.

Data Definition Language (DDL) is key to data modeling. DDL is the set of commands that defines the database structure: running it creates new tables, indexes, relationships, and so on. In other words, the DDL generated from the model is what gets the actual database created on your platform of choice (for example, SQL Server or Oracle).

When you run the DDL against the database, it creates that structure. Once the structure is complete, you’ll have the “wiring” of the database ready for data to be cleansed, loaded, and sent where the business needs it for the best decision-making.
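As a minimal sketch, assuming a hypothetical customer table and sale table (none of these names come from this blog), here is the kind of generic SQL DDL a modeling tool might generate from the model; the exact syntax varies slightly by platform, such as SQL Server or Oracle:

  -- Hypothetical DDL a modeling tool might emit from the business model.
  -- All table, column, and constraint names are illustrative assumptions.
  CREATE TABLE customer (
      customer_id  INTEGER      NOT NULL,
      first_name   VARCHAR(50)  NOT NULL,
      last_name    VARCHAR(50)  NOT NULL,
      state_code   VARCHAR(20),
      CONSTRAINT pk_customer PRIMARY KEY (customer_id)
  );

  CREATE TABLE sale (
      sale_id      INTEGER       NOT NULL,
      customer_id  INTEGER       NOT NULL,
      sale_amount  DECIMAL(12,2) NOT NULL,
      sale_date    DATE          NOT NULL,
      CONSTRAINT pk_sale PRIMARY KEY (sale_id),
      -- A relationship in the model becomes a foreign key in the database.
      CONSTRAINT fk_sale_customer FOREIGN KEY (customer_id)
          REFERENCES customer (customer_id)
  );

  -- An index to support lookups identified during analysis and design.
  CREATE INDEX ix_sale_customer ON sale (customer_id);

Running a script like this is the “wiring” step: tables, keys, relationships, and indexes all come straight out of the model.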

The Benefits of Using a Data Modeling Tool

The beauty of a data modeling tool is that once the database is created, all maintenance going forward, whether adding new tables, new fields, or other changes, is handled through the model and automated through the resulting DDL. Specifically:

  • The building of the Data Model starts with business specifications. Then, database coders use the output DDL to physically maintain the database.

  • The Data Modeling tool provides a user-friendly view of the data structures and their elements without diving deep into the database's depths to look for each definition. In addition, it is an ongoing mechanism that can be iterated as updates and changes are needed, as the sketch below shows.
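As a hedged illustration, if the business later needs a loyalty tier on the hypothetical customer table from the earlier sketch, the change is made in the model and flows out as incremental DDL (the ADD COLUMN syntax varies slightly by platform):

  -- Hypothetical incremental DDL emitted after the model is updated.
  -- The loyalty_tier column is an illustrative assumption.
  ALTER TABLE customer
      ADD COLUMN loyalty_tier VARCHAR(10) DEFAULT 'STANDARD';

No one hand-edits the database; the model stays the single description of the structure, and the generated DDL keeps the database in step with it.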

Building Block #2

Employ Data Quality Tools/Processes in Conjunction with a Data Governance Strategy

Data quality is two things: cleansing and enhancing. Cleansing means the data is standardized based on defined rules, as sketched below. Enhancing means the data is enriched for use, such as through address verification. The data quality rules will evolve over time based on user feedback and ongoing analysis, thus continually increasing trust in the data.
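As a minimal sketch of a cleansing rule, assuming the hypothetical customer table from earlier with a free-form state_code field, standardization in generic SQL might look like this (enhancement, such as address verification, would typically call an external service instead):

  -- Hypothetical cleansing rule: standardize state values to two-letter codes.
  -- Table, column, and value mappings are illustrative assumptions.
  UPDATE customer
  SET state_code = CASE UPPER(TRIM(state_code))
                       WHEN 'TEXAS'      THEN 'TX'
                       WHEN 'TEX.'       THEN 'TX'
                       WHEN 'CALIFORNIA' THEN 'CA'
                       ELSE UPPER(TRIM(state_code))
                   END;

Rules like this come from governance decisions about what "standard" means, and they grow as users report new variants in the data.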

One principle of management is that for every job, there must be someone designated to do it. This also applies to data governance. If you’re building a data warehouse, you must have owners of the data. Someone has to own every piece of data within the organization. So data ownership is joined at the hip with data governance.

The governance step will generate the rules by which the data is manipulated in the extract, transform, and load (ETL) process we discussed in our blog on real-time data processing.

So, data quality tools and processes are the keys to overcoming the old “GIGO” (garbage-in, garbage-out) rule. In business data, the quality, i.e., the trustworthiness, of the data you depend on is crucial to the best decisions. If this step isn’t taken, the analysis won’t be accurate: unclean information produces outliers that make analysis difficult, if not impossible. A simple screening query, sketched below, can catch much of this early.
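As an illustrative screen, assuming the hypothetical sale table from earlier, a rule-based query can flag garbage before it reaches analysis (the thresholds here are invented placeholders a data owner would set):

  -- Hypothetical quality screen: flag rows that violate basic sanity rules.
  SELECT sale_id, sale_amount, sale_date
  FROM sale
  WHERE sale_amount <= 0           -- impossible values
     OR sale_amount > 1000000     -- outlier threshold set by the data owner
     OR sale_date > CURRENT_DATE; -- future-dated records

Rows caught here go back through cleansing or to the data owner rather than into the warehouse's reporting layer.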

Building Block #3

Implement a Comprehensive Testing Approach

The proof of the data quality, again, trustworthiness, is in the testing. The four methods for implementing a comprehensive testing approach are:

  1. Database/schema testing

    This is validation that the actual database structure was created as expected, the DDL worked properly, and the result aligns with the data model. (Sketches of checks 1, 2, and 4 appear after this list.)

  2. Data validation testing

    This usually happens as a result of running tests on the data load. You have to look at the data within the tables to ensure that the specifications have been applied correctly. The developer typically performs this testing.

  3. Performance testing to ensure SLAs are met

    You have to understand what type of performance you’re looking for, i.e., how fast the questions get answered and returned. There are two ways to ask questions:

    The front-end GUI mode is where the answer is returned immediately while the user sits in front of the screen—for example, booking an airline flight. Actions like this require a quick response time and are typically subject to a Service Level Agreement (SLA) that is pre-determined based on the business use case.

    The backend mode submits a query and the results are returned at a later time. In this mode, performance isn’t as critical because the user isn’t sitting and waiting for the answer. For example, a report is prepared and staged for you. 

  4. GUI (front-end) and business rule testing

    This is especially critical for the GUI (graphical user interface) on the front end, as answers are expected in real time. Business rule testing validates that the answers the dashboard displays match the defined business rules.

    Testers may need to go back to the source system and confirm the rules were applied, to ensure dashboards display what is expected. Business users typically do this testing to ensure the dashboard is reliable and trustworthy.

    If this testing piece isn’t completed successfully, trust may be lost in the validity of the data. Users are liable to revert to old habits of shadow IT and hoarding their data in silos.
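As hedged sketches, assuming the hypothetical customer and sale tables from earlier, here is what checks 1, 2, and 4 might look like in generic SQL. The information_schema views used in the first query exist on many platforms, though not all; Oracle, for example, uses its own catalog views:

  -- 1. Database/schema test: confirm the tables the model defines exist.
  SELECT table_name
  FROM information_schema.tables
  WHERE table_name IN ('customer', 'sale');

  -- 2. Data validation tests: required fields populated, no duplicate keys.
  SELECT COUNT(*) AS missing_last_names
  FROM customer
  WHERE last_name IS NULL;

  SELECT customer_id, COUNT(*) AS duplicate_count
  FROM customer
  GROUP BY customer_id
  HAVING COUNT(*) > 1;

  -- 4. Business rule test: reconcile a dashboard total against the warehouse.
  SELECT SUM(sale_amount) AS warehouse_total
  FROM sale
  WHERE sale_date >= DATE '2022-01-01';
  -- Compare this figure with the dashboard and the source system's own total.

Performance testing (method 3) is usually done with the platform's own timing and monitoring tools against the SLA targets rather than with a single query.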

Build a Strategy Based on Trustworthy Data

If the building blocks of an effective enterprise-wide data management system are the concrete, then trust is the steel. Without trust in the data, the questions you ask and the analysis you seek are worthless.

By following the previous steps, you can set up a trustworthy data warehouse that can be maintained and enhanced as a building block to your business growth.

Take the first step to transform your business from a state of information overload to an organized, data-driven decision-making culture. Download the eBook, "The Executive's Guide to Building a Data Strategy That Leads to Business Growth & Innovation."