
Run an ADF Pipeline when an SAP Extraction File is uploaded to Azure Storage

The following article shows how to run an event-driven pipeline in Azure Data Factory to process SAP data that was extracted with Xtract Universal and uploaded to Azure Storage.

About

The depicted example extracts SAP customer master data and uploads it to Azure Storage.
An Azure Storage event then triggers an ADF pipeline that processes the SAP Parquet file, e.g., with Databricks. Xtract Universal supports different file formats for the Azure Storage destination; the depicted example uses Apache Parquet, a columnar file format that provides optimizations to speed up queries and is more efficient than CSV or JSON.

Target audience: Customers who use Azure Data Factory (ADF) as a platform for orchestrating data movement and transformation.

Note

The following sections describe the basic principles for triggering an ADF pipeline. Keep in mind that this is not a best practice document or a recommendation.

Azure Storage

Xtract Universal extracts SAP data and loads it into Azure Storage as a Parquet file. An Azure Storage event trigger then runs an ADF pipeline for further processing of the SAP file.
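Whether the Parquet file actually arrived in the container can also be checked programmatically. The following is a minimal Python sketch using the azure-storage-blob SDK; the connection string is a placeholder, and the container name matches the ke-container used in the example below.

```python
# Minimal sketch: list the Parquet files that Xtract Universal uploaded to
# the container used in this example. The connection string is a placeholder.
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<xtractstorage-connection-string>"  # assumption: copied from the Azure portal
CONTAINER_NAME = "ke-container"

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client(CONTAINER_NAME)

for blob in container.list_blobs():
    if blob.name.endswith(".parquet"):
        print(blob.name, blob.size, blob.last_modified)
```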

ADF Pipelines and Storage Event Triggers

The Master pipeline is triggered by an Azure Storage event trigger and calls a Child pipeline for further processing.

The Master pipeline has two activities:

  • write a log entry to an Azure SQL database (optional)
  • call a Child pipeline to process the Parquet file with Databricks

This article focuses on the Master pipeline. The Child pipeline processes the Parquet file, e.g., with Databricks; in this example it is only a placeholder.
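A Databricks notebook behind the Child pipeline could look roughly like the sketch below. It assumes the notebook activity forwards the fileName and folderPath pipeline parameters (defined in the procedure below) as widget values and that the cluster can already authenticate against the storage account; paths and file names are illustrative.

```python
# Minimal sketch of a Databricks notebook the Child pipeline might run.
# Assumptions: the notebook activity passes fileName and folderPath as
# widget values, and the cluster can already access the storage account.
# `spark` and `dbutils` are provided by the Databricks runtime.

folder_path = dbutils.widgets.get("folderPath")  # e.g. "ke-container"
file_name = dbutils.widgets.get("fileName")      # e.g. "CustomerMaster.parquet" (hypothetical)

# folderPath from the storage event trigger starts with the container name;
# split it off to build an abfss URI for the storage account xtractstorage.
container, _, subfolder = folder_path.partition("/")
prefix = f"{subfolder}/" if subfolder else ""
source_path = f"abfss://{container}@xtractstorage.dfs.core.windows.net/{prefix}{file_name}"

# Parquet keeps the column names and types of the SAP extraction.
df = spark.read.parquet(source_path)
df.printSchema()
print(f"Rows extracted: {df.count()}")
```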

Use Azure SQL for logging (optional)

In the scenario depicted, the ADF pipeline executes a stored procedure to log various details of the pipeline run into an Azure SQL table.
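The stored procedure call behind the logging activity can also be reproduced outside of ADF, e.g., to test the log table. The sketch below is a rough illustration using pyodbc; the procedure name dbo.sp_pipelinelog, its parameters, and the connection details are assumptions and must be adapted to the actual Azure SQL setup.

```python
# Minimal sketch: call a logging stored procedure the way the ADF activity
# sp_pipelinelog would. Procedure name, parameters, and connection details
# are assumptions; adapt them to the actual Azure SQL database.
import pyodbc

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<server>.database.windows.net,1433;"
    "Database=<database>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
cursor.execute(
    "EXEC dbo.sp_pipelinelog @PipelineName = ?, @RunId = ?, @FileName = ?",
    "ProcessBlobStorageFile", "manual-test-run", "CustomerMaster.parquet",
)
conn.commit()
conn.close()
```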

Prerequisites

Procedure

  1. Define an SAP extraction and set the destination to Azure Storage.
    The depicted example uses a storage account xtractstorage and a container called ke-container:

    XU_Extraction_AzureDest1

  2. Define two pipelines in ADF:

    • The master pipeline ProcessBlobStorageFile contains two activities.
      The first activity sp_pipelinelog executes an SQL stored procedure that writes a log entry to an Azure SQL table. The second activity runs a dummy subpipeline. As both activities are outside the scope of this article, they are not described in further detail.
      ADF_Pipeline
    • The child pipeline ProcessWithDataBricks processes the Parquet file, e.g., with Databricks.
  3. Define the following parameters:
    • fileName: contains the file name in Azure Storage.
    • folderPath: contains the folder path in Azure Storage.
  4. Click [New/Edit] to add a new Storage Event Trigger in the ADF Pipeline.
    ADF_Pipeline_Trigger00
  5. Adjust the details and use the Storage account name and Container name defined in the Xtract Universal Azure Storage destination:
    ADF_Pipeline_Trigger01
  6. Adjust the event trigger parameters that are used as input parameters for the Master pipeline (see the sketch after this procedure for how these values map to the uploaded file):
    ADF_Pipeline_Trigger03
    • @triggerBody().fileName
    • @triggerBody().folderPath
  7. Publish the pipeline.
  8. Run the extraction in Xtract Universal.
  9. When the extraction finishes successfully, check the Azure Storage container.
    Azure_Storage_Parquet
  10. Check the log table in Azure SQL. The log table contains one entry each for the master and the child pipeline.
    SQL_log
  11. Check the trigger and pipeline runs in ADF.

    ADF_Trigger_Run

    ADF_Pipeline_Run
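To make the relation between the trigger outputs from step 6 and the uploaded file explicit, the following sketch rebuilds the blob URL from folderPath and fileName. The values are examples based on the storage account and container used above; the file name is hypothetical.

```python
# Minimal sketch: how @triggerBody().folderPath and @triggerBody().fileName
# relate to the uploaded Parquet file. The values below are examples; in ADF
# the expressions are evaluated by the storage event trigger at runtime.
ACCOUNT = "xtractstorage"

folder_path = "ke-container"          # @triggerBody().folderPath (container plus optional subfolders)
file_name = "CustomerMaster.parquet"  # @triggerBody().fileName (hypothetical file name)

blob_url = f"https://{ACCOUNT}.blob.core.windows.net/{folder_path}/{file_name}"
print(blob_url)
# https://xtractstorage.blob.core.windows.net/ke-container/CustomerMaster.parquet
```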

Download JSON Templates

Downloads for the trigger and the master pipeline are provided below:

  • Download trigger as JSON
  • Download master pipeline as JSON


Last update: September 13, 2024
Author(s): Khoder Elzein, Valerie Schipka