nansravn/Databricks101
Azure Databricks Samples

Topic 1: Deployment of R models with Azure Databricks

For a deployment architecture targeting ML batch scoring scenarios with R code, the core components could be:

  1. Azure Data Factory
  2. Azure Data Lake Storage
  3. Azure Databricks

Databricks Batch Scoring Architecture

How to start deploying R/SparkR code in Databricks?

Step 0: If SparkR is new to you, this community post is worth reading first:

Important quote:

  • “The SparkR API presents a full R interface, supplemented with the {SparkR} package. As an experienced R user, you will be familiar with the R data.frame object. Here's the critical point - SparkR has its own DataFrame object, which is not the same thing as an R data.frame. You can convert between them easily (sometimes too easily), but you must respect which is which.”
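The distinction in the quote above can be sketched in a few lines of SparkR. This assumes a Databricks cluster (or any environment with an active Spark session) where the `SparkR` package is available; the data itself is illustrative.

```r
library(SparkR)

# A base-R data.frame: lives entirely in the driver's memory
local_df <- data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))

# Convert it to a SparkR DataFrame: a distributed, lazily evaluated object
spark_df <- createDataFrame(local_df)
printSchema(spark_df)

# Transformations on the SparkR DataFrame run on the cluster
filtered <- filter(spark_df, spark_df$score > 0.4)

# collect() converts back to a base-R data.frame on the driver --
# this is the "too easy" conversion the quote warns about: collecting
# a large DataFrame can exhaust driver memory
result <- collect(filtered)
class(result)  # back to a plain "data.frame"
```

Keeping track of which object type you hold determines which functions apply: `dplyr`/base-R functions work on `data.frame`, while SparkR verbs like `filter` and `select` work on the distributed `DataFrame`.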

Step 1: Create an Azure Databricks Workspace

Step 2: Create an ADLS (Azure Data Lake Storage)

  • Note: Create the ADLS account in the same region where you provisioned Azure Databricks

Step 3: Create a cluster inside Databricks

Step 4: Execute and understand this sample code (SparkR + ADLS.r). Tasks performed in this sample:

  • ADLS (Azure Data Lake Storage Gen1) mount for use with R and SparkR
  • Usage of the Databricks dbutils library
  • R and SparkR read/write tasks
  • DataFrame/data.frame mapping between R and SparkR
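The tasks above can be sketched as follows. This is a minimal outline, not the repository's sample code: the service principal credentials, account name, paths, and secret scope are placeholders you must replace with your own values (ideally pulled from a secret scope rather than hard-coded).

```r
library(SparkR)

# OAuth config for an ADLS Gen1 mount -- all values below are placeholders
configs <- list(
  "dfs.adls.oauth2.access.token.provider.type" = "ClientCredential",
  "dfs.adls.oauth2.client.id" = "<application-id>",
  "dfs.adls.oauth2.credential" = dbutils.secrets.get(scope = "adls", key = "sp-secret"),
  "dfs.adls.oauth2.refresh.url" = "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
)

# Mount the lake into DBFS using the Databricks dbutils library
dbutils.fs.mount(
  source = "adl://<adls-account>.azuredatalakestore.net/",
  mountPoint = "/mnt/adls",
  extraConfigs = configs
)

# SparkR read/write against the mount (distributed execution)
df <- read.df("/mnt/adls/input/scores.csv", source = "csv", header = "true")
write.df(df, path = "/mnt/adls/output/scored", source = "parquet", mode = "overwrite")

# Plain R sees the same mount through the local /dbfs FUSE path
local_df <- read.csv("/dbfs/mnt/adls/input/scores.csv")
```

Note the asymmetry: SparkR functions address the mount as `/mnt/...`, while single-node R functions go through the `/dbfs/mnt/...` local path.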

Orchestrating Databricks batch scoring

Here are some additional resources for understanding the orchestration of R model execution:
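In the architecture above, Azure Data Factory typically triggers the scoring notebook on a schedule via a Databricks Notebook activity. A minimal sketch of such an activity definition is shown below; the linked service name, notebook path, and parameters are placeholders for illustration.

```json
{
  "name": "RunRScoringNotebook",
  "type": "DatabricksNotebook",
  "linkedServiceName": {
    "referenceName": "AzureDatabricksLinkedService",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "notebookPath": "/Shared/batch-scoring/score-r-model",
    "baseParameters": {
      "input_path": "/mnt/adls/input",
      "output_path": "/mnt/adls/output"
    }
  }
}
```

Parameters passed in `baseParameters` can be read inside the notebook with `dbutils.widgets.get`, which keeps the scoring code independent of fixed paths.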
