Topic: Data Silos
In your quest for a data-driven culture (that is, making critical business decisions based on facts, not just gut feelings), you need quality data. The data must serve the systems and the people who need it to run their part of the business. Systems need the flow of raw data, but people need the context and format that come from the processes involved in data integration and, where necessary, data transformation.
Data transformation makes the data usable, understandable, and accurate. Processes such as data integration, data migration, data warehousing, and data wrangling may involve data transformation. This article on integrating multiple data sources highlighted several approaches to data integration. Those approaches are the first steps in using data to make business decisions based on the best analytics.
Before you can perform analytics, the data must be transformed and integrated. The analytics then reveal your business's problems and opportunities, and acting on them helps you grow.
This post focuses on why data transformation is critical to handling the flow of vital business information. The goal is to convert eclectic information and work products into a resource that can be curated, calculated, and reported on to discover trends and make the best business decisions.
We will discuss the following:
Data transformation is part of an enterprise data and analytics initiative that supports the decision-making of diverse audiences across the organization. It is often the process of converting data from one or more sources into a new source that is cleansed, validated, and in an easy-to-use format for analytics consumption.
In addition to consumption by people and uses for analytics, the product of data transformation can also be piped into other systems. Data transformation may be:
Often, before data can be delivered to the users who need it most, it must be transformed. For example, medical facilities need to transmit data from their systems to other systems or parties. The format of that data may be useful for informational purposes but not for analysis. So, to support analytics, the data must be transformed before it can be incorporated into a database.
For data analytics projects, data may be transformed at different stages of the data pipeline. For example, organizations that use on-premises data warehouses generally use an ETL (extract, transform, load) process, in which data transformation is the intermediate step.
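To make the ETL sequence concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the CSV content, the column names, and the `visits` table are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; column names and values are invented for illustration.
RAW_CSV = """patient_id,visit_date,weight_lb
 101 ,2023-01-05,150
102,2023-01-06,
103,2023-01-07,200
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, drop rows with no weight, convert lb to kg."""
    cleaned = []
    for row in rows:
        weight = row["weight_lb"].strip()
        if not weight:
            continue  # incomplete record: skipped rather than guessed
        cleaned.append((
            int(row["patient_id"].strip()),
            row["visit_date"].strip(),
            round(float(weight) * 0.45359237, 1),
        ))
    return cleaned

def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE visits (patient_id INTEGER, visit_date TEXT, weight_kg REAL)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    conn.commit()

# An in-memory SQLite database stands in for the warehouse (an assumption).
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
result = conn.execute("SELECT * FROM visits ORDER BY patient_id").fetchall()
print(result)
```

Because the transform step sits between extract and load, only cleansed and converted rows ever reach the target table.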
This intermediate step means that data transformation:
Finally, data transformation may be one or a combination of the following four approaches:
The foregoing approaches are among the 12 components of data transformation described in the next section.
Data transformation consists of the following enhancements/components:
Data transformation must include rules that define the actions and changes applied to data. Companies must define, document, and govern these rules to ensure consistency and accuracy. Otherwise, a large company doing different things in different locations will create inconsistent information across the organization.
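One way to keep such rules consistent is to write them down once, as data, and apply the same list everywhere. The sketch below assumes hypothetical field names (`country`, `revenue`) and a hypothetical `apply_rules` helper; it is not a standard rule format.

```python
# Each rule names a field and the function applied to it. The fields and
# rules are hypothetical examples, not a standard rule set.
RULES = [
    ("country", str.strip),                     # remove stray whitespace
    ("country", str.upper),                     # one casing convention everywhere
    ("revenue", lambda v: round(float(v), 2)),  # numeric, two decimal places
]

def apply_rules(record, rules=RULES):
    """Return a copy of record with every rule applied in order."""
    out = dict(record)
    for field, action in rules:
        if field in out:
            out[field] = action(out[field])
    return out

# Two locations that record the same country differently end up consistent.
berlin = apply_rules({"country": " de ", "revenue": "1200.504"})
munich = apply_rules({"country": "De", "revenue": "310"})
print(berlin, munich)
```

Keeping the rules in a single documented list means a change is made in one place and reaches every location that uses it.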
Data transformation can be expensive. The cost depends on the specific infrastructure, software, and tools used to process data. Expenses may include licensing, computing resources, and hiring necessary personnel.
Data transformation processes can be resource-intensive. Performing transformations in an on-premises data warehouse after loading, or transforming data before feeding it into applications, can create a computational burden that slows down other operations.
Finally, a lack of expertise or simple carelessness can introduce problems during transformation. Data analysts without appropriate subject matter expertise are less likely to notice typos or incorrect data because they are less familiar with the range of accurate and permissible values. For example, someone working on medical data who is unfamiliar with the relevant terms might fail to flag disease names that should be mapped to a single value, or fail to notice misspellings.
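A lookup table reviewed by a subject matter expert can reduce this risk, because anything the table does not recognize is flagged instead of passing through silently. The sketch below is illustrative only: the `CANONICAL` table and the `normalize` helper are hypothetical, and a real mapping would come from domain experts or a maintained vocabulary.

```python
# Hypothetical mapping from variant spellings to one canonical disease name.
# A real table would be curated by someone with medical expertise.
CANONICAL = {
    "diabetes mellitus type 2": "Type 2 diabetes",
    "type 2 diabetes": "Type 2 diabetes",
    "t2dm": "Type 2 diabetes",
    "hypertension": "Hypertension",
    "high blood pressure": "Hypertension",
}

def normalize(names):
    """Map each name to its canonical value; collect unknown names for review."""
    mapped, flagged = [], []
    for name in names:
        key = " ".join(name.lower().split())  # lowercase, collapse whitespace
        if key in CANONICAL:
            mapped.append(CANONICAL[key])
        else:
            mapped.append(None)   # no guess is made for unknown values
            flagged.append(name)  # left for a human to check (may be a typo)
    return mapped, flagged

mapped, flagged = normalize(["T2DM", "High  Blood Pressure", "hypertenson"])
print(mapped)
print(flagged)
```

Here the misspelled "hypertenson" is not matched, so it lands in the review list rather than being stored as a separate disease.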
Data transformation can increase the efficiency of analytic and business processes and enable better data-driven decision-making because:
Learn how your organization can begin or continue its journey to develop a data-cleansing strategy. Download our eBook, “Breaking Down Data Silos & Transforming Business Intelligence.”