CI/CD on Azure Databricks using Azure DevOps

Sagar Lad
May 22, 2020

Databricks is a cloud-based data engineering platform for processing and transforming massive quantities of data and for exploring that data with machine learning models.

CI/CD refers to the process of developing and delivering software in frequent cycles through the use of automation pipelines.

We can set up a CI/CD pipeline for Azure Databricks notebook deployment as follows:

1. Create a Databricks Workspace

https://azure.microsoft.com/nl-nl/resources/templates/101-databricks-workspace/

2. Assign the Contributor role to the Azure AD Group

3. Assign the Contributor role to the Service Principal

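# $(...) tokens are Azure DevOps pipeline variables, substituted before the script runs.
# $(ServicePrincipalObjectId) is set by the script in step 4, so that script must run first.
$Context = Get-AzContext
$TenantId = $Context.Subscription.TenantId
$SubscriptionId = $Context.Subscription.Id

# Look up the Databricks workspace resource
$Databricks = Get-AzResource -ResourceGroupName "$(ResourceGroupName)" -Name "$(DatabricksWorkspace)"

# Grant the service principal Contributor, scoped to the workspace only
New-AzRoleAssignment -ObjectId "$(ServicePrincipalObjectId)" -RoleDefinitionName Contributor -Scope $Databricks.ResourceId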

4. Get Service Principal Object Id and Password

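# Object ID of the service principal, published as a pipeline variable
$ServicePrincipalObjectId = (Get-AzADServicePrincipal -DisplayNameBeginsWith "$(ServicePrincipalName)").Id
Write-Host "##vso[task.setvariable variable=ServicePrincipalObjectId]$ServicePrincipalObjectId"

# Service principal password from Key Vault. SecretValueText exists in older Az.KeyVault versions;
# newer versions provide Get-AzKeyVaultSecret -AsPlainText instead.
$ServicePrincipalPassword = Get-AzKeyVaultSecret -VaultName "$(KeyVaultName)" -Name "$(SPNPassword)"
$ServicePrincipalPwd = $ServicePrincipalPassword.SecretValueText
# Publish the plain-text value (not the secret object) and mark it secret so it is masked in logs
Write-Host "##vso[task.setvariable variable=ServicePrincipalPassword;issecret=true]$ServicePrincipalPwd"

# Application (client) ID of the service principal
$ServicePrincipalAppId = (Get-AzADServicePrincipal -DisplayNameBeginsWith "$(ServicePrincipalName)").ApplicationId.Guid
Write-Host "##vso[task.setvariable variable=ServicePrincipalAppId]$ServicePrincipalAppId"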

# Get Subscription ID
$Context = Get-AzContext
$SubscriptionId = $Context.Subscription.Id
Write-Host "##vso[task.setvariable variable=SubscriptionId]$SubscriptionId"

# Get Tenant ID
$tenantId = $Context.Subscription.TenantId
Write-Host "##vso[task.setvariable variable=tenantId]$tenantId"

5. Generate a Databricks Token using an Azure DevOps Task
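
The task configuration is not shown here. As a rough alternative sketch, a token can also be requested from the Databricks REST API (`/api/2.0/token/create`) in a PowerShell task. The function name `New-DatabricksTokenRequest`, the workspace URL, and the 90-day default lifetime are assumptions for illustration; `$AccessToken` stands for whatever credential the pipeline already holds for the workspace (for example an Azure AD token for the service principal).

```powershell
# Hypothetical helper: builds the parameters for a token-creation call.
# It only assembles the request; nothing is sent until Invoke-RestMethod runs.
function New-DatabricksTokenRequest {
    param(
        [Parameter(Mandatory)] [string] $WorkspaceUrl,   # e.g. https://<workspace>.azuredatabricks.net
        [Parameter(Mandatory)] [string] $AccessToken,    # existing credential for the workspace
        [int] $LifetimeDays = 90,                        # assumed default, adjust as needed
        [string] $Comment = 'Created by Azure DevOps pipeline'
    )
    @{
        Method      = 'Post'
        Uri         = "$($WorkspaceUrl.TrimEnd('/'))/api/2.0/token/create"
        Headers     = @{ Authorization = "Bearer $AccessToken" }
        ContentType = 'application/json'
        Body        = @{ lifetime_seconds = $LifetimeDays * 86400; comment = $Comment } | ConvertTo-Json
    }
}

# Usage inside a pipeline step (commented out so the sketch runs offline):
# $request  = New-DatabricksTokenRequest -WorkspaceUrl $WorkspaceUrl -AccessToken $AccessToken
# $response = Invoke-RestMethod @request
# Write-Host "##vso[task.setvariable variable=BearerToken;issecret=true]$($response.token_value)"
```

Publishing the result as the `BearerToken` pipeline variable matches the variable name used when the token is stored in Key Vault in the next step.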

6. Store the Databricks Token in the Key Vault

Write-Host "Store Databricks Bearer token to $(KeyVaultName)"
$DatabricksSecretName = '$(adb_token_secret_name)'
# Read the currently stored token, if any (not used further in this snippet)
$CurrentToken = (Get-AzKeyVaultSecret -VaultName "$(KeyVaultName)" -Name "$DatabricksSecretName").SecretValueText

# Key Vault expects a SecureString, so convert the token before storing it with an expiry date
$Secret = ConvertTo-SecureString -String '$(BearerToken)' -AsPlainText -Force
Set-AzKeyVaultSecret -VaultName '$(KeyVaultName)' -Name "$DatabricksSecretName" -SecretValue $Secret -ContentType "Databricks Access Token" -Expires (Get-Date).AddMonths($(adb_token_expiry_months))

7. Deploy the Databricks Notebook using an Azure DevOps Task
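
If no ready-made task is available, a notebook can also be pushed with the Databricks workspace import endpoint (`/api/2.0/workspace/import`). The sketch below is one possible way to do that from a PowerShell task; the function name `New-DatabricksNotebookImportRequest`, the example paths, and the choice of a Python source notebook are assumptions, not part of the original pipeline.

```powershell
# Hypothetical helper: builds the parameters for importing one notebook.
# The API expects the notebook source as a base64-encoded string.
function New-DatabricksNotebookImportRequest {
    param(
        [Parameter(Mandatory)] [string] $WorkspaceUrl,
        [Parameter(Mandatory)] [string] $AccessToken,   # Databricks token from the earlier steps
        [Parameter(Mandatory)] [string] $SourceCode,    # notebook source as text
        [Parameter(Mandatory)] [string] $TargetPath,    # workspace path, e.g. /Shared/etl
        [string] $Language = 'PYTHON'                   # assumed notebook language
    )
    $content = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($SourceCode))
    @{
        Method      = 'Post'
        Uri         = "$($WorkspaceUrl.TrimEnd('/'))/api/2.0/workspace/import"
        Headers     = @{ Authorization = "Bearer $AccessToken" }
        ContentType = 'application/json'
        Body        = @{
            path      = $TargetPath
            format    = 'SOURCE'
            language  = $Language
            content   = $content
            overwrite = $true      # replace the notebook if it already exists
        } | ConvertTo-Json
    }
}

# Usage inside a pipeline step (commented out so the sketch runs offline):
# $source  = Get-Content -Raw './notebooks/etl.py'     # hypothetical repo path
# $request = New-DatabricksNotebookImportRequest -WorkspaceUrl $WorkspaceUrl -AccessToken $Token `
#                -SourceCode $source -TargetPath '/Shared/etl'
# Invoke-RestMethod @request
```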

8. Create an Azure Databricks Secret Scope

You can use Azure PowerShell or the Databricks CLI to create a secret scope.
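
For the PowerShell route, a minimal sketch is to call the Databricks secrets endpoint (`/api/2.0/secrets/scopes/create`) directly. The function name `New-DatabricksSecretScopeRequest` and the scope name in the usage lines are hypothetical, and granting manage permission to `users` is an assumption that should be checked against your workspace's access requirements.

```powershell
# Hypothetical helper: builds the parameters for creating a Databricks-backed secret scope.
function New-DatabricksSecretScopeRequest {
    param(
        [Parameter(Mandatory)] [string] $WorkspaceUrl,
        [Parameter(Mandatory)] [string] $AccessToken,   # Databricks token from the earlier steps
        [Parameter(Mandatory)] [string] $ScopeName,
        [string] $InitialManagePrincipal = 'users'      # assumption: all users may manage the scope
    )
    @{
        Method      = 'Post'
        Uri         = "$($WorkspaceUrl.TrimEnd('/'))/api/2.0/secrets/scopes/create"
        Headers     = @{ Authorization = "Bearer $AccessToken" }
        ContentType = 'application/json'
        Body        = @{
            scope                    = $ScopeName
            initial_manage_principal = $InitialManagePrincipal
        } | ConvertTo-Json
    }
}

# Usage inside a pipeline step (commented out so the sketch runs offline):
# $request = New-DatabricksSecretScopeRequest -WorkspaceUrl $WorkspaceUrl -AccessToken $Token -ScopeName 'my-scope'
# Invoke-RestMethod @request
```

The call fails if the scope already exists, so a rerunnable pipeline would need to handle that case, for example by listing scopes first.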

9. Create a Databricks Cluster

First, check whether the Databricks cluster already exists using an Azure PowerShell or command-line task in Azure DevOps, and create the cluster if it doesn’t exist.
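
The check-then-create logic can be sketched in PowerShell against the Databricks clusters endpoints (`/api/2.0/clusters/list` and `/api/2.0/clusters/create`). Both function names are hypothetical, and the Spark version, node type, worker count, and auto-termination values are placeholder assumptions to be replaced with values valid for your workspace.

```powershell
# Hypothetical helper: returns $true if the clusters/list response contains a cluster with this name.
function Test-DatabricksClusterExists {
    param(
        [Parameter(Mandatory)] $ClusterListResponse,   # parsed JSON from GET /api/2.0/clusters/list
        [Parameter(Mandatory)] [string] $ClusterName
    )
    $found = @($ClusterListResponse.clusters | Where-Object { $_.cluster_name -eq $ClusterName })
    return $found.Count -gt 0
}

# Hypothetical helper: builds the JSON body for POST /api/2.0/clusters/create.
function New-DatabricksClusterBody {
    param(
        [Parameter(Mandatory)] [string] $ClusterName,
        [string] $SparkVersion = '6.4.x-scala2.11',    # placeholder, pick a runtime your workspace offers
        [string] $NodeTypeId = 'Standard_DS3_v2',      # placeholder VM size
        [int] $NumWorkers = 2,
        [int] $AutoTerminationMinutes = 30
    )
    @{
        cluster_name            = $ClusterName
        spark_version           = $SparkVersion
        node_type_id            = $NodeTypeId
        num_workers             = $NumWorkers
        autotermination_minutes = $AutoTerminationMinutes
    } | ConvertTo-Json
}

# Usage inside a pipeline step (commented out so the sketch runs offline):
# $headers = @{ Authorization = "Bearer $Token" }
# $list = Invoke-RestMethod -Method Get -Uri "$WorkspaceUrl/api/2.0/clusters/list" -Headers $headers
# if (-not (Test-DatabricksClusterExists -ClusterListResponse $list -ClusterName 'etl-cluster')) {
#     Invoke-RestMethod -Method Post -Uri "$WorkspaceUrl/api/2.0/clusters/create" -Headers $headers `
#         -ContentType 'application/json' -Body (New-DatabricksClusterBody -ClusterName 'etl-cluster')
# }
```

Checking by name first keeps the step safe to rerun, since the create call would otherwise start a second cluster with the same name.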

End-to-end Azure DevOps CD pipeline for Azure Databricks notebook deployment, cluster creation, secret scope creation, etc.

