Step-by-Step Guide to Databricks Unity Catalog Setup and Its Key Features
In this blog, I will walk you through the steps to set up Unity Catalog.
In this article, you will learn:
- A basic overview of Unity Catalog, including its key features and architecture.
- How to create a metastore.
- How to assign a metastore to a workspace.
Let's first understand Unity Catalog. It is a unified governance solution for data and AI assets on the Lakehouse.
Unity Catalog has four key functional areas:
- Data Access Control: controls who has access to which data.
- Data Access Audit: captures and records all access to data.
- Data Lineage: captures upstream sources and downstream consumers.
- Data Discovery: provides the ability to search for and discover authorized assets.
Unity Catalog Three-Level Namespace:
Metastore → Catalog → Schema (Database) → Table/View/Volume
We can reference all data in Unity Catalog using a three-level namespace: catalog.schema.table
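To make the three-level naming concrete, here is a minimal Python sketch that splits a fully qualified name into its catalog, schema, and object parts. The example name catalog1.sales.orders and the helper names are hypothetical, and the sketch assumes unquoted identifiers without dots inside them (backtick-quoted names are not handled).

```python
from typing import NamedTuple


class ObjectName(NamedTuple):
    """The three parts of a Unity Catalog table, view, or volume name."""

    catalog: str
    schema: str
    name: str


def parse_three_level_name(full_name: str) -> ObjectName:
    """Split 'catalog.schema.table' into its three parts.

    Assumes unquoted identifiers with no dots inside them.
    """
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return ObjectName(*parts)


# Hypothetical table name used for illustration.
obj = parse_three_level_name("catalog1.sales.orders")
print(obj.catalog, obj.schema, obj.name)  # catalog1 sales orders
```

A two-part name such as catalog1.sales raises a ValueError, which mirrors the fact that a table is always addressed through both its catalog and its schema.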
Let us compare the architecture before and after Unity Catalog:
Requirements:
To create a metastore, you must be an Azure Databricks account admin.
Now let us see in practice how to set up Unity Catalog:
Step:1
Create an Azure Databricks workspace with the Premium pricing tier.
Step:2
Click Manage Account and log in to the account console.
Step:3
Next, click the Data tab and then Create metastore. Here you can see the metastores I created earlier.
Step:4
Here we need to provide the metastore name, the region (best practice is to choose the same region and resource group for all the resources), the ADLS Gen2 path, and the access connector ID.
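Before filling in the form, it can help to collect the four inputs in one place and sanity-check them. The following Python sketch is purely illustrative: the MetastoreRequest class, its extra storage_region field, and the eastus region are hypothetical, and the region check only encodes the same-region best practice mentioned above. The angle-bracket parts of the access connector ID are placeholders for your own values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetastoreRequest:
    """Inputs for the Create metastore form in Step:4.

    storage_region is a hypothetical extra field (the ADLS Gen2 account's
    region), kept only so it can be compared with the metastore region.
    """

    name: str
    region: str
    adls_gen2_path: str
    access_connector_id: str
    storage_region: str

    def problems(self) -> List[str]:
        """Return a list of issues; an empty list means nothing obvious is wrong."""
        issues = []
        for attr in ("name", "region", "adls_gen2_path", "access_connector_id"):
            if not getattr(self, attr).strip():
                issues.append(f"{attr} is empty")
        # Same-region best practice from this guide.
        if self.region.strip().lower() != self.storage_region.strip().lower():
            issues.append(
                f"metastore region {self.region!r} differs from "
                f"storage region {self.storage_region!r}"
            )
        return issues


request = MetastoreRequest(
    name="metastore2",
    region="eastus",  # hypothetical region
    adls_gen2_path="uccontainer1@ucstorage5.dfs.core.windows.net/d1",
    access_connector_id=(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.Databricks/accessConnectors/accessconnector2"
    ),
    storage_region="eastus",  # hypothetical region
)
print(request.problems())  # []
```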
Step:5
In this step, let's see how to get the ADLS Gen2 path and the access connector ID. In the ADLS Gen2 storage account, open Endpoints.
Here we need to modify the Data Lake Storage endpoint https://ucstorage5.dfs.core.windows.net/ by replacing https:// with the container name followed by @. Appending the directory gives the modified path uccontainer1@ucstorage5.dfs.core.windows.net/d1, where d1 is the directory I created inside the container.
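The endpoint-to-path conversion can be sketched in Python as below. The helper names adls_gen2_path and abfss_uri are hypothetical; the sketch assumes the input is the account's Data Lake Storage (dfs) endpoint and returns the container@account form used in this guide. When the same location is referenced from Spark code, it is normally written with an abfss:// prefix, which abfss_uri adds.

```python
from urllib.parse import urlparse


def adls_gen2_path(endpoint: str, container: str, directory: str = "") -> str:
    """Turn 'https://<account>.dfs.core.windows.net/' into
    '<container>@<account>.dfs.core.windows.net/<directory>'."""
    host = urlparse(endpoint).netloc  # e.g. 'ucstorage5.dfs.core.windows.net'
    if not host.endswith(".dfs.core.windows.net"):
        raise ValueError(f"expected a Data Lake Storage (dfs) endpoint: {endpoint!r}")
    path = f"{container}@{host}"
    if directory:
        path += "/" + directory.strip("/")
    return path


def abfss_uri(path: str) -> str:
    """Full URI form of the same location, as used from Spark."""
    return "abfss://" + path


path = adls_gen2_path("https://ucstorage5.dfs.core.windows.net/", "uccontainer1", "d1")
print(path)             # uccontainer1@ucstorage5.dfs.core.windows.net/d1
print(abfss_uri(path))  # abfss://uccontainer1@ucstorage5.dfs.core.windows.net/d1
```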
For the access connector ID, we need to create an Access Connector for Azure Databricks.
Copy its Resource ID; this is the access connector ID we need.
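A copied resource ID can be checked with a small Python sketch like the one below. The regular expression assumes the standard Azure resource ID layout for the Microsoft.Databricks/accessConnectors resource type; the helper name is hypothetical, and the subscription ID and resource group in the example are placeholders (only the connector name comes from this guide).

```python
import re
from typing import Dict

# Standard Azure resource ID layout for an Access Connector for Azure Databricks.
ACCESS_CONNECTOR_ID = re.compile(
    r"^/subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Databricks/accessConnectors/(?P<name>[^/]+)$",
    re.IGNORECASE,  # Azure resource IDs are compared case-insensitively
)


def parse_access_connector_id(resource_id: str) -> Dict[str, str]:
    """Return the subscription, resource group, and connector name from an ID."""
    match = ACCESS_CONNECTOR_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"not an access connector resource ID: {resource_id!r}")
    return match.groupdict()


# Placeholder subscription and resource group.
example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/uc-demo-rg"
    "/providers/Microsoft.Databricks/accessConnectors/accessconnector2"
)
print(parse_access_connector_id(example_id)["name"])  # accessconnector2
```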
Now we can fill in all the details we left in Step:4. Note: before clicking Create, we need to give the Access Connector for Azure Databricks access to the storage account: the Storage Blob Data Contributor role (Step:6) and ACL permissions (Step:7).
Step:6
In the storage account, go to Access Control (IAM), add a role assignment, and grant the Storage Blob Data Contributor role to the Access Connector for Azure Databricks (here I created one named accessconnector2).
Click Add role assignment, search for Storage Blob Data Contributor, and click Next.
Now choose to assign access to User, group, or service principal, click Select members, search for the Access Connector for Azure Databricks (accessconnector2), and click Next.
In the role assignments list, we can now see that accessconnector2 has the Storage Blob Data Contributor role.
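For readers who prefer scripting, the portal steps above correspond roughly to a single Azure CLI call. This Python sketch only builds the command string and does not run it. It assumes you know the object (principal) ID of the access connector's managed identity; the helper names, IDs, and resource group below are hypothetical placeholders.

```python
import shlex


def storage_account_scope(subscription_id: str, resource_group: str, account: str) -> str:
    """Azure resource ID of a storage account, used as the role assignment scope."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )


def role_assignment_command(principal_object_id: str, scope: str,
                            role: str = "Storage Blob Data Contributor") -> str:
    """Build (but do not run) an 'az role assignment create' command."""
    args = [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_object_id,
        # Managed identities are assigned roles as service principals.
        "--assignee-principal-type", "ServicePrincipal",
        "--role", role,
        "--scope", scope,
    ]
    # shlex.join quotes arguments that contain spaces, such as the role name.
    return shlex.join(args)


scope = storage_account_scope("00000000-0000-0000-0000-000000000000", "uc-demo-rg", "ucstorage5")
print(role_assignment_command("11111111-1111-1111-1111-111111111111", scope))
```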
Step:7
In this step, I use Manage ACL to grant permissions to the Access Connector for Azure Databricks (here I created one named accessconnector2).
First click Manage ACL, then select Access permissions and click Add principal.
Search for the Access Connector for Azure Databricks (accessconnector2).
Click Select and grant Read, Write, and Execute permissions as shown in the screenshot below.
Similarly, for the default permissions, click Configure default permissions, click Add principal, and search for accessconnector2.
Here too, grant Read, Write, and Execute permissions as shown in the screenshot below, then click Save.
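ADLS Gen2 stores these permissions as POSIX-style ACL entries, where Read, Write, and Execute map to rwx and default entries carry a default: prefix. The Python sketch below builds the entries for one principal as a comma-separated string, the format used by ADLS Gen2 tooling such as the Azure CLI's az storage fs access commands. The helper name and object ID are hypothetical, and the sketch does not call any Azure API.

```python
def acl_entries(object_id: str, perms: str = "rwx", include_default: bool = True) -> str:
    """Build ADLS Gen2 ACL entries granting `perms` to one principal.

    The access entry ('user:<id>:rwx') applies to the item itself; the
    default entry ('default:user:<id>:rwx') is inherited by new child items.
    """
    # Each position must be its flag or '-', e.g. 'rwx', 'r-x', '---'.
    if len(perms) != 3 or any(p not in (flag, "-") for p, flag in zip(perms, "rwx")):
        raise ValueError(f"invalid permission string: {perms!r}")
    entries = [f"user:{object_id}:{perms}"]
    if include_default:
        entries.append(f"default:user:{object_id}:{perms}")
    return ",".join(entries)


# Read, Write, and Execute for both access and default ACLs, as in this step.
print(acl_entries("11111111-1111-1111-1111-111111111111"))
```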
Step:8
Now that all the necessary permissions have been granted in Step:6 and Step:7, click Create.
We can see that our metastore, metastore2, has been created successfully. The next step is to assign it to a workspace; here I selected my unitycatalogdemo workspace.
Now click Enable Unity Catalog.
Congratulations, the metastore has been created successfully.
Now let's open the assigned workspace.
Click the Data tab; the assigned metastore, metastore2, is clearly visible.
Now we can easily create catalogs. Here I created Catalog1. To create a catalog, you can use Data Explorer or a SQL command.
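As a sketch of the SQL route, the Python helpers below build a CREATE CATALOG statement with a backtick-quoted name. The helper names are hypothetical; in a Databricks notebook you would typically pass the resulting string to spark.sql(...), where spark is the session the notebook provides.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a SQL identifier, doubling any backticks inside it."""
    return "`" + name.replace("`", "``") + "`"


def create_catalog_sql(catalog: str) -> str:
    """Build a CREATE CATALOG statement that is safe to re-run."""
    return f"CREATE CATALOG IF NOT EXISTS {quote_ident(catalog)}"


print(create_catalog_sql("catalog1"))
# CREATE CATALOG IF NOT EXISTS `catalog1`
# In a Databricks notebook: spark.sql(create_catalog_sql("catalog1"))
```

IF NOT EXISTS makes the statement idempotent, so re-running the notebook does not fail once the catalog exists.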
Step:9
We can also grant various permissions on these objects to any groups or users.
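As an illustration of granting permissions with SQL, the Python sketch below generates GRANT statements. It assumes that reading a table requires USE CATALOG on its catalog, USE SCHEMA on its schema, and SELECT on the table itself; the helper names, the group data-engineers, and the table catalog1.sales.orders are hypothetical.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Backtick-quote a SQL identifier, doubling any backticks inside it."""
    return "`" + name.replace("`", "``") + "`"


def grant_sql(privilege: str, securable_type: str, securable: str, principal: str) -> str:
    """Build 'GRANT <privilege> ON <type> <name> TO <principal>'."""
    # Quote each part of a dotted name separately: `catalog1`.`sales`.`orders`.
    qualified = ".".join(quote_ident(part) for part in securable.split("."))
    return f"GRANT {privilege} ON {securable_type} {qualified} TO {quote_ident(principal)}"


def read_access_grants(table: str, principal: str) -> List[str]:
    """Statements that let `principal` query one table (catalog.schema.table)."""
    catalog, schema, _ = table.split(".")
    return [
        grant_sql("USE CATALOG", "CATALOG", catalog, principal),
        grant_sql("USE SCHEMA", "SCHEMA", f"{catalog}.{schema}", principal),
        grant_sql("SELECT", "TABLE", table, principal),
    ]


for statement in read_access_grants("catalog1.sales.orders", "data-engineers"):
    print(statement)
```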
Next steps:
- Create and manage schemas (databases)
- Create tables
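As a preview of these next steps, the Python sketch below builds CREATE SCHEMA and CREATE TABLE statements under the three-level namespace. The helper names, the schema sales, the table orders, and its columns are hypothetical; as with catalogs, the strings would typically be run with spark.sql(...) in a notebook.

```python
from typing import Dict


def quote_ident(name: str) -> str:
    """Backtick-quote a SQL identifier, doubling any backticks inside it."""
    return "`" + name.replace("`", "``") + "`"


def create_schema_sql(catalog: str, schema: str) -> str:
    """CREATE SCHEMA inside an existing catalog."""
    return f"CREATE SCHEMA IF NOT EXISTS {quote_ident(catalog)}.{quote_ident(schema)}"


def create_table_sql(catalog: str, schema: str, table: str, columns: Dict[str, str]) -> str:
    """columns maps column name -> SQL type, e.g. {'order_id': 'BIGINT'}."""
    cols = ", ".join(f"{quote_ident(n)} {t}" for n, t in columns.items())
    name = ".".join(quote_ident(p) for p in (catalog, schema, table))
    return f"CREATE TABLE IF NOT EXISTS {name} ({cols})"


print(create_schema_sql("catalog1", "sales"))
print(create_table_sql("catalog1", "sales", "orders",
                       {"order_id": "BIGINT", "amount": "DECIMAL(10, 2)"}))
```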
Useful links:
My LinkedIn profile: https://www.linkedin.com/in/saurav-kumar-919a70109/
Thank you for your valuable time.
!!!!!Happy Learning!!!!!