Master data is any information that plays a key role in the core operation of a business. It can include data about clients, customers, employees, inventory, suppliers, transactions and more. Master data is at the heart of every business transaction, report and decision, and it is commonly stored and replicated across systems.
Master Data Management (MDM) is the process of creating and managing master data. MDM is very important in an organization because it provides a single version of the truth. It reduces the risk of having multiple, inconsistent copies of the same data.
MDM is typically more important in larger organizations, because the bigger the organization, the more disparate systems it has. When many sources are used to collect data, it becomes difficult to provide a single source of truth. The main goal of MDM is to provide processes for collecting, aggregating, matching, consolidating, quality-assuring and distributing critical data throughout an organization, to ensure consistency and control in the ongoing maintenance and application use of this information. Managing master data well is very important for improving the operation of the business.
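The matching and consolidating steps mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that records from different systems are matched on a normalized email address and that the most recently updated non-empty value wins. The field names (`email`, `updated` and so on) and both rules are hypothetical choices, not something MDM itself prescribes.

```python
from datetime import date

def match_key(record):
    # Assumed matching rule: same email, ignoring case and surrounding spaces.
    return record["email"].strip().lower()

def consolidate(records):
    """Merge matching records into one consolidated record per match key."""
    golden = {}
    # Process oldest first so that newer non-empty values overwrite older ones.
    for rec in sorted(records, key=lambda r: r["updated"]):
        merged = golden.setdefault(match_key(rec), {})
        for field, value in rec.items():
            if value not in (None, ""):
                merged[field] = value
    return golden

records = [
    {"source": "crm", "email": "Ann@Example.com ", "name": "Ann Lee",
     "phone": "", "updated": date(2023, 1, 5)},
    {"source": "billing", "email": "ann@example.com", "name": "Ann Lee",
     "phone": "5550100", "updated": date(2022, 6, 1)},
]
golden = consolidate(records)
```

Here the two source records collapse into one entry that keeps the phone number from the older record, because the newer record left that field empty.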
Data Governance (DG) is the specification of who is accountable and who has decision rights in the overall management of the availability, usability, integrity and security of data. It covers the valuation, creation and storage of data, along with the standards and metrics that establish the effective and efficient use of information in enabling an organization to achieve its goals. The first step in data governance is to appoint the owners of the data.
Why Data Governance
There are many reasons why we need data governance today. A few of them are below:
Ever-Growing Data – One reason is that we are creating more data than ever before. We are gathering data from everywhere, and it is being collected at a rate that is beyond human capability to process in traditional ways.
New Sources of Data – Much of this rapidly growing information is coming from new sources, such as web traffic logs, web pages, media feeds, smartphones, smartwatches, weather satellites and many others.
So, to deal with these different aspects of data, we need a body in the organization that is responsible for collecting and managing it.
Roles of Data Governance
The Chief Data Officer (CDO) is the senior executive role dedicated to data. The CDO is responsible for the organization’s data and information strategy, governance, control and policy development. The role also involves responsibility and accountability for information protection and privacy, data quality and lifecycle management.
Data Owners are the decision makers for the data. Their responsibilities include establishing data quality requirements, determining and approving access, and understanding the legal, compliance and regulatory issues of the data.
Data Stewards manage the processes that maintain the data on behalf of the owners. Their responsibilities include implementing the data owners’ directives, policies and standards, tracking data, and recommending ideas to the data owners.
Data quality is one of the main concerns in business. As every business has some sort of data, it has become very important to take data quality into account and improve it. In business, data quality is measured to determine whether or not data can be used as a basis for reliable business intelligence and company decisions. High-quality data leads to valuable insights and improved revenue, whereas poor-quality data leads to lost revenue.
Dimensions of Data Quality
Data quality is commonly understood to be about cleaning bad data, such as missing, incorrect or invalid data. But to be sure that data is trustworthy, it is important to understand the key dimensions of data quality. There are six main dimensions.
Completeness – Data can be complete even if optional data is missing, as long as all the mandatory data is present. For example, if a customer’s first name and last name are mandatory and both are available, the record is still considered complete even if we don’t have the customer’s middle name, because that field is optional.
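A completeness check of this kind could be sketched as follows. The field names and the choice of which fields are mandatory are assumptions made for the example.

```python
# Assumed mandatory fields for a customer record; middle_name is optional.
MANDATORY_FIELDS = ("first_name", "last_name")

def is_complete(record, mandatory=MANDATORY_FIELDS):
    """Return True when every mandatory field holds a non-blank value."""
    return all(str(record.get(field) or "").strip() for field in mandatory)

with_optional_gap = {"first_name": "Ada", "last_name": "Lovelace", "middle_name": None}
with_mandatory_gap = {"first_name": "Ada", "last_name": " "}
```

The first record counts as complete despite the missing middle name, while the second does not, because its last name is blank.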
Timeliness refers to whether the data or information is available when it is needed and expected. The timeliness of data is extremely important for organizations. Examples of data not delivered on time:
- A courier package has been dispatched, but the company has not updated the system, which still says that the package is not dispatched.
- Census data becomes available two years after the census was carried out.
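One way to make timeliness measurable is to compare when an event happened with when the system recorded it. The sketch below assumes a hypothetical one-hour tolerance; the right threshold depends entirely on the business need.

```python
from datetime import datetime, timedelta

def is_timely(event_time, recorded_time, max_lag=timedelta(hours=1)):
    """Return True if the system recorded the event within the allowed lag."""
    if recorded_time is None:
        # The event has not been recorded at all, so it cannot be timely.
        return False
    return recorded_time - event_time <= max_lag

dispatched_at = datetime(2024, 3, 1, 9, 0)
on_time = is_timely(dispatched_at, datetime(2024, 3, 1, 9, 30))
too_late = is_timely(dispatched_at, datetime(2024, 3, 1, 12, 0))
never_recorded = is_timely(dispatched_at, None)
```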
Consistency means that data across the enterprise reflects the same information and is in sync. Examples:
- A student has paid the student fees, but the system still shows that the fees are due.
- An electricity account is closed, but the customer is still receiving bills.
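A consistency check can be sketched by comparing the same entity across two systems. The system names, the `fees_due` field and the student ids below are all hypothetical.

```python
def find_inconsistencies(system_a, system_b, fields):
    """List (key, field, value_a, value_b) for records that disagree."""
    mismatches = []
    # Only entities present in both systems can be compared.
    for key in sorted(system_a.keys() & system_b.keys()):
        for field in fields:
            value_a = system_a[key].get(field)
            value_b = system_b[key].get(field)
            if value_a != value_b:
                mismatches.append((key, field, value_a, value_b))
    return mismatches

payments = {"S1": {"fees_due": 0}, "S2": {"fees_due": 0}}
portal = {"S1": {"fees_due": 500}, "S2": {"fees_due": 0}}
mismatches = find_inconsistencies(payments, portal, ["fees_due"])
```

In this example student S1 has paid according to the payments system, but the portal still shows an amount due, so S1 is reported as inconsistent.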
Validity is about whether the data itself is valid, that is, whether it conforms to the format expected for its field. Examples:
- A phone number field should contain only digits. Characters like *, #, @ should be considered invalid.
- An email field should contain an @ sign and a . to be valid.
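The two rules above can be expressed as simple pattern checks. This is a deliberately loose sketch: real phone and email validation is usually stricter, and the patterns here are assumptions that only encode the rules as stated.

```python
import re

# Digits only, at least one of them.
PHONE_RE = re.compile(r"[0-9]+")
# Something, then @, then something containing a dot; no spaces or extra @.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def is_valid_phone(value):
    return PHONE_RE.fullmatch(value) is not None

def is_valid_email(value):
    return EMAIL_RE.fullmatch(value) is not None
```

With these checks, "5550100" passes as a phone number and "555*0100" does not; "user@example.com" passes as an email and "user@example" does not.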
Integrity means the validity of data across relationships; it ensures that all the data in a database can be traced and connected to other data. Example:
- If there is an address in a customer record but no customer name, the record is considered not valid.
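An integrity check along these lines can be sketched as a search for address rows that cannot be traced back to a named customer. The two-table layout and the field names (`id`, `customer_id`, `name`) are assumptions for the example.

```python
def find_orphan_addresses(customers, addresses):
    """Return addresses that do not link to a customer with a name."""
    named_customer_ids = {c["id"] for c in customers if c.get("name")}
    return [a for a in addresses if a["customer_id"] not in named_customer_ids]

customers = [{"id": 1, "name": "Ann Lee"}, {"id": 2, "name": ""}]
addresses = [
    {"customer_id": 1, "street": "12 High St"},
    {"customer_id": 2, "street": "4 Elm Rd"},    # customer has no name
    {"customer_id": 3, "street": "9 Oak Ave"},   # customer does not exist
]
orphans = find_orphan_addresses(customers, addresses)
```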
Accuracy is one of the key factors in data quality. Once the first five dimensions have been achieved, what remains is how accurate the data is, that is, how well the data matches the real-world object it describes. Example:
- The address of a student held by the college is the real address where the student actually lives.
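Accuracy can only be measured against a trusted reference, such as addresses that have been verified with the students themselves. Assuming such a verified sample exists, a rough accuracy rate could be sketched as below; the normalization rule and the sample data are hypothetical.

```python
def normalize(text):
    # Ignore case and repeated whitespace when comparing addresses.
    return " ".join(text.lower().split())

def accuracy_rate(stored, verified):
    """Share of verified records whose stored value matches reality."""
    checked = [key for key in stored if key in verified]
    if not checked:
        return None  # nothing to compare against
    correct = sum(
        1 for key in checked
        if normalize(stored[key]) == normalize(verified[key])
    )
    return correct / len(checked)

stored = {"S1": "12 High St", "S2": "4 Elm Rd", "S3": "9 Oak Ave"}
verified = {"S1": "12 high  st", "S2": "7 Elm Rd"}
rate = accuracy_rate(stored, verified)
```

Only S1 and S2 have a verified address, and only S1 matches, so the rate reflects one correct record out of two checked.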