Scan your Data with Azure Purview

Dr. Niraj Kumari
Jul 18, 2021

Azure Purview is a unified data governance service that helps you manage and govern your on-premises, multicloud and software-as-a-service (SaaS) data. Easily create a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification and end-to-end data lineage. Empower data consumers to find valuable, trustworthy data.

Azure Purview Features

Azure Purview was recently launched in preview, primarily as a data governance and data catalog service. It has several interesting features worth knowing about before you start working with it in the Azure portal. The most important ones are listed below.

  • Support for hybrid data platforms: Purview can source metadata from supported data repositories hosted on-premises, on Azure, and on other clouds.
  • Automated data discovery: Purview scans data repositories, extracts their metadata, and applies more than 100 built-in classification rules that automatically classify attributes matching those rules.
  • Lineage identification: Purview can be connected to services such as Azure Data Factory, from which it extracts lineage automatically.
  • Purview metadata catalog and business glossary: you can create business glossary terms and map them to the metadata held in the Purview metadata catalog.
  • Data map with search: data repositories from different hosting sources can be organized into collections, which together form a data map. The metadata captured for these sources can then be discovered through search.
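To make rule-based classification concrete, here is a minimal Python sketch. It is not Purview's implementation: the two rules, their regular expressions, and the 60% match threshold are assumptions chosen for illustration. Each rule tags a column when a large enough share of its sampled, non-empty values fully match the rule's pattern.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationRule:
    """Hypothetical regex-based rule: tag a column when enough sampled values match."""
    name: str
    pattern: re.Pattern
    min_match_ratio: float = 0.6  # assumed threshold, for illustration only


# Hypothetical rules; Purview's real built-in rule set is far larger.
RULES = [
    ClassificationRule("EmailAddress", re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE)),
    ClassificationRule("USPhoneNumber", re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}")),
]


def classify_column(values: list[str], rules: list[ClassificationRule] = RULES) -> list[str]:
    """Return the names of all rules that fully match enough non-empty sample values."""
    samples = [v.strip() for v in values if v and v.strip()]
    if not samples:
        return []
    labels = []
    for rule in rules:
        hits = sum(1 for v in samples if rule.pattern.fullmatch(v))
        if hits / len(samples) >= rule.min_match_ratio:
            labels.append(rule.name)
    return labels


def classify_table(columns: dict[str, list[str]]) -> dict[str, list[str]]:
    """Classify every column of a sampled table."""
    return {col: classify_column(vals) for col, vals in columns.items()}


sample = {
    "contact": ["alice@contoso.com", "bob@fabrikam.org", "n/a"],
    "phone": ["425-555-0100", "(425) 555-0199", ""],
    "city": ["Seattle", "Redmond", "Paris"],
}
print(classify_table(sample))
# {'contact': ['EmailAddress'], 'phone': ['USPhoneNumber'], 'city': []}
```

The "contact" column is tagged even though one value is "n/a", because two of three samples (about 67%) clear the assumed 60% threshold; a threshold like this lets a rule tolerate a few dirty values.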

Azure Purview provides three main functions. The first is the Data Map, which provides fast and precise scanning across your data estate and shows lineage, i.e., where data comes from and where it goes as it is transformed. Lineage is tracked at both the asset and the column level for supported data sources.

The second is the Data Catalog, which presents all discovered data sources so that the right people can easily understand what data exists and where it is stored. Finally, Data Insights gives you reports on which assets you have, the glossary terms applied across them, and your classification and labelling results.
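Insights reports of this kind are essentially aggregations over the catalogue. As a rough sketch (the `Asset` type, the source-type labels, and the report keys are hypothetical, not Purview's schema), the following counts assets per source type, per classification, and per glossary term.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical catalogue entry; field names are illustrative only."""
    name: str
    source_type: str
    classifications: list[str] = field(default_factory=list)
    glossary_terms: list[str] = field(default_factory=list)


def insights_report(assets: list[Asset]) -> dict[str, Counter]:
    """Count assets per source type, plus classification and glossary-term usage."""
    return {
        "assets_by_source_type": Counter(a.source_type for a in assets),
        "classifications": Counter(c for a in assets for c in a.classifications),
        "glossary_terms": Counter(t for a in assets for t in a.glossary_terms),
    }


assets = [
    Asset("orders.csv", "AdlsGen2", ["EmailAddress"]),
    Asset("Customers", "AzureSqlDatabase", ["EmailAddress", "PhoneNumber"], ["Customer"]),
    Asset("ledger.parquet", "AdlsGen2"),
]
report = insights_report(assets)
print(report["assets_by_source_type"])  # Counter({'AdlsGen2': 2, 'AzureSqlDatabase': 1})
print(report["classifications"]["EmailAddress"])  # 2
```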

Steps to create an Azure Purview account

· First, log in to the Azure portal with your credentials and go to Create a resource.

· Next, search for Azure Purview and click Create.

· Once you land on the Create Purview account page, fill in the required details (Purview account name, location and capacity units) along with the target resource group, then click Review + create. Once validation passes, click Create and boom… you have your first Azure Purview account 😊
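The portal steps above can also be scripted against Azure Resource Manager. The sketch below only builds the URL and JSON body of the `PUT` request for a `Microsoft.Purview/accounts` resource; it does not authenticate or send anything. The `api-version` value, the `sku` shape (with `capacity` standing in for the capacity units chosen in the portal), and all IDs and names are assumptions to check against the current Azure REST reference before use.

```python
from __future__ import annotations

import json
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2021-07-01"  # assumed api-version; verify against the REST reference


def build_create_account_request(subscription_id: str, resource_group: str,
                                 account_name: str, location: str,
                                 capacity_units: int = 4) -> tuple[str, str]:
    """Return (url, json_body) for a PUT that would create a Purview account.

    Nothing is sent here; obtaining a bearer token and issuing the request
    are left to the caller.
    """
    url = (f"{ARM_ENDPOINT}/subscriptions/{quote(subscription_id)}"
           f"/resourceGroups/{quote(resource_group)}"
           f"/providers/Microsoft.Purview/accounts/{quote(account_name)}"
           f"?api-version={API_VERSION}")
    body = {
        "location": location,
        "identity": {"type": "SystemAssigned"},
        # Assumed sku shape: "capacity" mirrors the capacity units picked in the portal.
        "sku": {"name": "Standard", "capacity": capacity_units},
    }
    return url, json.dumps(body)


# Placeholder values; substitute your own subscription, resource group and names.
url, body = build_create_account_request(
    "00000000-0000-0000-0000-000000000000", "my-rg", "my-purview", "eastus")
print(url)
print(body)
```

Keeping request construction separate from sending makes the payload easy to inspect or reuse with whichever HTTP client and credential you already have.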

The role of a data catalogue

A data catalogue lifecycle registers the various sources, discovers the shape of the ingested datasets, traces their lineage as data flows through intermediate layers, and finally enables analysts and engineers to consume that data in downstream applications.

1. Register

Azure Purview lets you register not only cloud-native storage and databases such as Azure Data Lake Storage, Azure Cosmos DB and Azure Synapse Analytics, but also on-premises databases such as SQL Server. Click Register resource to add your databases.

You can also group assets logically, for example by project or across the organization's entire data estate. An example is shown below:
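Collections form a hierarchy, so a source registered in a child collection also belongs to its parent. The toy model below illustrates that idea in plain Python; the class names, source names, and `kind` labels are hypothetical and are not the Purview SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    kind: str  # illustrative label, e.g. "AdlsGen2" or "SqlServer"


@dataclass
class Collection:
    """Hypothetical collection node holding sources and nested sub-collections."""
    name: str
    sources: list[DataSource] = field(default_factory=list)
    children: list[Collection] = field(default_factory=list)

    def add_child(self, name: str) -> Collection:
        child = Collection(name)
        self.children.append(child)
        return child

    def register(self, name: str, kind: str) -> DataSource:
        source = DataSource(name, kind)
        self.sources.append(source)
        return source

    def all_sources(self) -> list[str]:
        """Names of sources in this collection and all nested ones (depth-first)."""
        names = [s.name for s in self.sources]
        for child in self.children:
            names.extend(child.all_sources())
        return names


root = Collection("Contoso")
sales = root.add_child("Sales project")
sales.register("sales-adls", "AdlsGen2")
sales.register("crm-sql", "AzureSqlDatabase")
finance = root.add_child("Finance")
finance.register("ledger-onprem", "SqlServer")

print(root.all_sources())     # ['sales-adls', 'crm-sql', 'ledger-onprem']
print(finance.all_sources())  # ['ledger-onprem']
```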

2. Discover

Let us take the next step. Data discovery helps ensure your data lake doesn't turn into a data swamp. Suppose you have terabytes of data in your data lake, connected to more than 100 data sources. Is that good enough? No. Data that lies undetected or unexplored is as bad as, if not worse than, not ingesting it into the analytics platform at all. The ease of discovering data empowers end users and business users, and this capability will determine how widely the data lake is adopted in an organisation.
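Search is what makes discovery practical. As a hedged illustration of the concept (a toy inverted index, not Purview's search engine or API), the sketch indexes each asset's name together with its description and classifications, then ranks assets by how many distinct query terms they match. Asset names and descriptions are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens; punctuation such as '/' and '.' splits words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class CatalogIndex:
    """Toy inverted index over catalogue assets (hypothetical, not Purview's search)."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add_asset(self, qualified_name: str, *texts: str) -> None:
        # Index the asset name plus any descriptive text (description, classifications, ...).
        for text in (qualified_name, *texts):
            for token in tokenize(text):
                self._postings[token].add(qualified_name)

    def search(self, query: str) -> list[str]:
        """Assets ranked by number of distinct matched query terms, ties broken by name."""
        scores: dict[str, int] = defaultdict(int)
        for token in set(tokenize(query)):
            for asset in self._postings.get(token, ()):
                scores[asset] += 1
        return sorted(scores, key=lambda a: (-scores[a], a))


index = CatalogIndex()
index.add_asset("adls://raw/sales/orders.csv", "daily sales orders", "EmailAddress")
index.add_asset("sql://crm/dbo/Customers", "customer master data", "EmailAddress PhoneNumber")
index.add_asset("adls://curated/finance/ledger.parquet", "general ledger")

print(index.search("customer data sales"))
# ['sql://crm/dbo/Customers', 'adls://raw/sales/orders.csv']
```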

3. Lineage

The ability to trace data as it leaves the source and passes through multiple layers (usually raw, curated, enriched and so on) creates transparency in the data organisation and encourages self-service analytics across the ecosystem. Azure Purview offers lineage all the way from the source to the destination.
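Lineage is naturally a directed graph. The following sketch, using hypothetical asset names, records edges from upstream to downstream nodes and walks them breadth-first. Column-level lineage is modelled by using `asset#column` names as nodes, an encoding chosen here for illustration only; it is not how Purview stores lineage.

```python
from __future__ import annotations

from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph; edges point from an upstream node to a downstream node."""

    def __init__(self) -> None:
        self._downstream: dict[str, set[str]] = defaultdict(set)
        self._upstream: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    def _walk(self, start: str, edges: dict[str, set[str]]) -> list[str]:
        # Breadth-first traversal; returns every reachable node except start, sorted.
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen)

    def upstream(self, node: str) -> list[str]:
        """Every node this node is derived from, directly or indirectly."""
        return self._walk(node, self._upstream)

    def downstream(self, node: str) -> list[str]:
        """Every node derived from this node, directly or indirectly (impact analysis)."""
        return self._walk(node, self._downstream)


g = LineageGraph()
# Asset-level lineage: raw -> curated -> enriched
g.add_edge("raw.orders", "curated.orders")
g.add_edge("curated.orders", "enriched.sales_summary")
g.add_edge("raw.customers", "enriched.sales_summary")
# Column-level lineage for one derived measure
g.add_edge("raw.orders#amount", "curated.orders#amount_usd")
g.add_edge("curated.orders#amount_usd", "enriched.sales_summary#total_sales")

print(g.upstream("enriched.sales_summary"))
# ['curated.orders', 'raw.customers', 'raw.orders']
print(g.downstream("raw.orders#amount"))
# ['curated.orders#amount_usd', 'enriched.sales_summary#total_sales']
```

Keeping both edge directions makes "where did this come from?" and "what breaks if this changes?" equally cheap to answer.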

4. Consume

Azure Purview lets you add glossary terms to datasets. This enriches the catalogue with business terms and definitions and links those terms to the relevant datasets or assets. The glossary acts as the glue between business decision-makers and technical engineers, which is a very common setup.
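A glossary is essentially a many-to-many mapping between defined business terms and assets. The toy class below (hypothetical names and definitions, not Purview's glossary API) refuses to assign terms that have not been defined and answers lookups in both directions.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    name: str
    definition: str


class BusinessGlossary:
    """Toy glossary linking business terms to catalogue assets (hypothetical)."""

    def __init__(self) -> None:
        self._terms: dict[str, GlossaryTerm] = {}
        self._assets_by_term: dict[str, set[str]] = defaultdict(set)

    def add_term(self, name: str, definition: str) -> GlossaryTerm:
        term = GlossaryTerm(name, definition)
        self._terms[name] = term
        return term

    def assign(self, term_name: str, asset: str) -> None:
        # Only terms that have been defined can be attached to an asset.
        if term_name not in self._terms:
            raise KeyError(f"unknown glossary term: {term_name!r}")
        self._assets_by_term[term_name].add(asset)

    def assets_for(self, term_name: str) -> list[str]:
        return sorted(self._assets_by_term.get(term_name, ()))

    def terms_for(self, asset: str) -> list[GlossaryTerm]:
        return [self._terms[t] for t in sorted(self._assets_by_term)
                if asset in self._assets_by_term[t]]


glossary = BusinessGlossary()
glossary.add_term("Net Revenue", "Gross sales minus returns and discounts.")
glossary.add_term("Customer", "A party that has placed at least one order.")
glossary.assign("Net Revenue", "enriched.sales_summary#total_sales")
glossary.assign("Customer", "raw.customers")
glossary.assign("Customer", "enriched.sales_summary")

print(glossary.assets_for("Customer"))
# ['enriched.sales_summary', 'raw.customers']
print([t.name for t in glossary.terms_for("raw.customers")])
# ['Customer']
```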

Get started with Azure Purview today

Create an Azure Purview account today and start understanding your data supply chain, from raw data to business insights, with free scanning for all your SQL Server instances, on-premises and across clouds.

More info:

https://medium.com/microsoftazure/azure-purview-83817fc50922

https://techcommunity.microsoft.com/t5/azure-purview/map-your-data-estate-with-azure-purview/ba-p/1958197

https://cloudblogs.microsoft.com/industry-blog/en-gb/technetuk/2020/12/10/unified-data-governance-using-azure-purview-preventing-data-lake-from-becoming-a-data-swamp/


Dr. Niraj Kumari

Data Analyst by profession. I love exploring data analytics using various tools and techniques. Doctorate from Banaras Hindu University.