Copy and transform data from and to a REST endpoint - Azure Data Factory & Azure Synapse (2023)


APPLIES TO: Azure Data Factory and Azure Synapse Analytics

This article outlines how to use Copy Activity in Azure Data Factory to copy data from and to a REST endpoint. The article builds on Copy Activity in Azure Data Factory, which presents a general overview of Copy Activity.

The differences among the REST connector, the HTTP connector, and the Web table connector are:

  • The REST connector specifically supports copying data from RESTful APIs.
  • The HTTP connector is generic and retrieves data from any HTTP endpoint, for example, to download a file. Before the REST connector was available, you might have used the HTTP connector to copy data from RESTful APIs; that approach is still supported but is less functional than the REST connector.
  • The Web table connector extracts table content from an HTML webpage.

Supported capabilities

This REST connector is supported for the following capabilities:

Supported capabilities | IR
Copy activity (source/sink) | ① ②
Mapping data flow (source/sink) | ①

① Azure integration runtime ② Self-hosted integration runtime

For a list of data stores that are supported as sources/sinks, see Supported data stores.

Specifically, this generic REST connector supports:

  • Copying data from a REST endpoint by using the GET or POST methods, and copying data to a REST endpoint by using the POST, PUT, or PATCH methods.
  • Copying data by using one of the following authentications: Anonymous, Basic, Service Principal, OAuth2 Client Credential, System Assigned Managed Identity, and User Assigned Managed Identity.
  • Pagination in the REST APIs.
  • For REST as a source, copying the REST JSON response as-is or parsing it by using schema mapping. Only a response payload in JSON is supported.

Tip

To test a request for data retrieval before you configure the REST connector in Data Factory, learn about the API specification for header and body requirements. You can use tools like Postman or a web browser to validate.

Prerequisites

If your data store is located inside an on-premises network, an Azure virtual network, or Amazon Virtual Private Cloud, you need to configure a self-hosted integration runtime to connect to it.

If your data store is a managed cloud data service, you can use the Azure Integration Runtime. If the access is restricted to IPs that are approved in the firewall rules, you can add Azure Integration Runtime IPs to the allow list.

You can also use the managed virtual network integration runtime feature in Azure Data Factory to access the on-premises network without installing and configuring a self-hosted integration runtime.

For more information about the network security mechanisms and options supported by Data Factory, see Data access strategies.

Get started

To perform the Copy activity with a pipeline, you can use one of the following tools or SDKs:

  • The Copy Data tool
  • The Azure portal
  • The .NET SDK
  • The Python SDK
  • Azure PowerShell
  • The REST API
  • The Azure Resource Manager template

Create a REST linked service using UI

Use the following steps to create a REST linked service in the Azure portal UI.

  1. Browse to the Manage tab in your Azure Data Factory or Synapse workspace and select Linked Services, then click New:

  2. Search for REST and select the REST connector.

  3. Configure the service details, test the connection, and create the new linked service.

Connector configuration details

The following sections provide details about properties you can use to define Data Factory entities that are specific to the REST connector.

Linked service properties

The following properties are supported for the REST linked service:

Property | Description | Required
type | The type property must be set to RestService. | Yes
url | The base URL of the REST service. | Yes
enableServerCertificateValidation | Whether to validate the server-side TLS/SSL certificate when connecting to the endpoint. | No (the default is true)
authenticationType | The type of authentication used to connect to the REST service. Allowed values are Anonymous, Basic, AadServicePrincipal, OAuth2ClientCredential, and ManagedServiceIdentity. You can additionally configure authentication headers in the authHeaders property. See the corresponding sections below for more properties and examples. | Yes
authHeaders | Additional HTTP request headers for authentication. For example, to use API key authentication, you can select the Anonymous authentication type and specify the API key in the header. | No
connectVia | The integration runtime to use to connect to the data store. Learn more from the Prerequisites section. If not specified, this property uses the default Azure integration runtime. | No

For different authentication types, see the corresponding sections for details.

  • Basic authentication
  • Service Principal authentication
  • OAuth2 Client Credential authentication
  • System-assigned managed identity authentication
  • User-assigned managed identity authentication
  • Anonymous authentication

Use basic authentication

Set the authenticationType property to Basic. In addition to the generic properties that are described in the preceding section, specify the following properties:

Property | Description | Required
userName | The user name to use to access the REST endpoint. | Yes
password | The password for the user (the userName value). Mark this field as a SecureString type to store it securely in Data Factory. You can also reference a secret stored in Azure Key Vault. | Yes

Example

{ "name": "RESTLinkedService", "properties": { "type": "RestService", "typeProperties": { "authenticationType": "Basic", "url" : "<REST endpoint>", "userName": "<user name>", "password": { "type": "SecureString", "value": "<password>" } }, "connectVia": { "referenceName": "<name of Integration Runtime>", "type": "IntegrationRuntimeReference" } }}

Use Service Principal authentication

Set the authenticationType property to AadServicePrincipal. In addition to the generic properties that are described in the preceding section, specify the following properties:

Property | Description | Required
servicePrincipalId | Specify the Azure Active Directory application's client ID. | Yes
servicePrincipalKey | Specify the Azure Active Directory application's key. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes
tenant | Specify the tenant information (domain name or tenant ID) under which your application resides. Retrieve it by hovering the mouse in the top-right corner of the Azure portal. | Yes
aadResourceId | Specify the Azure Active Directory (Azure AD) resource you are requesting for authorization, for example, https://management.core.windows.net. | Yes
azureCloudType | For Service Principal authentication, specify the type of Azure cloud environment to which your Azure AD application is registered. Allowed values are AzurePublic, AzureChina, AzureUsGovernment, and AzureGermany. By default, the data factory's cloud environment is used. | No

Example

{ "name": "RESTLinkedService", "properties": { "type": "RestService", "typeProperties": { "url": "<REST endpoint e.g. https://www.example.com/>", "authenticationType": "AadServicePrincipal", "servicePrincipalId": "<service principal id>", "servicePrincipalKey": { "value": "<service principal key>", "type": "SecureString" }, "tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>", "aadResourceId": "<Azure AD resource URL e.g. https://management.core.windows.net>" }, "connectVia": { "referenceName": "<name of Integration Runtime>", "type": "IntegrationRuntimeReference" } }}

Use OAuth2 Client Credential authentication

Set the authenticationType property to OAuth2ClientCredential. In addition to the generic properties that are described in the preceding section, specify the following properties:

Property | Description | Required
tokenEndpoint | The token endpoint of the authorization server to acquire the access token. | Yes
clientId | The client ID associated with your application. | Yes
clientSecret | The client secret associated with your application. Mark this field as a SecureString type to store it securely in Data Factory. You can also reference a secret stored in Azure Key Vault. | Yes
scope | The scope of the access required. It describes what kind of access will be requested. | No
resource | The target service or resource to which the access will be requested. | No

Example

{ "name": "RESTLinkedService", "properties": { "type": "RestService", "typeProperties": { "url": "<REST endpoint e.g. https://www.example.com/>", "enableServerCertificateValidation": true, "authenticationType": "OAuth2ClientCredential", "clientId": "<client ID>", "clientSecret": { "type": "SecureString", "value": "<client secret>" }, "tokenEndpoint": "<token endpoint>", "scope": "<scope>", "resource": "<resource>" } }}

Use system-assigned managed identity authentication

Set the authenticationType property to ManagedServiceIdentity. In addition to the generic properties that are described in the preceding section, specify the following properties:

Property | Description | Required
aadResourceId | Specify the Azure AD resource you are requesting for authorization, for example, https://management.core.windows.net. | Yes

Example

{ "name": "RESTLinkedService", "properties": { "type": "RestService", "typeProperties": { "url": "<REST endpoint e.g. https://www.example.com/>", "authenticationType": "ManagedServiceIdentity", "aadResourceId": "<AAD resource URL e.g. https://management.core.windows.net>" }, "connectVia": { "referenceName": "<name of Integration Runtime>", "type": "IntegrationRuntimeReference" } }}

Use user-assigned managed identity authentication

Set the authenticationType property to ManagedServiceIdentity. In addition to the generic properties that are described in the preceding section, specify the following properties:

Property | Description | Required
aadResourceId | Specify the Azure AD resource you are requesting for authorization, for example, https://management.core.windows.net. | Yes
credentials | Specify the user-assigned managed identity as the credential object. | Yes

Example

{ "name": "RESTLinkedService", "properties": { "type": "RestService", "typeProperties": { "url": "<REST endpoint e.g. https://www.example.com/>", "authenticationType": "ManagedServiceIdentity", "aadResourceId": "<Azure AD resource URL e.g. https://management.core.windows.net>", "credential": { "referenceName": "credential1", "type": "CredentialReference" } }, "connectVia": { "referenceName": "<name of Integration Runtime>", "type": "IntegrationRuntimeReference" } }}

Using authentication headers

In addition, you can configure request headers for authentication along with the built-in authentication types.

Example: Using API key authentication

{ "name": "RESTLinkedService", "properties": { "type": "RestService", "typeProperties": { "url": "<REST endpoint>", "authenticationType": "Anonymous", "authHeaders": { "x-api-key": { "type": "SecureString", "value": "<API key>" } } }, "connectVia": { "referenceName": "<name of Integration Runtime>", "type": "IntegrationRuntimeReference" } }}

Dataset properties

This section provides a list of properties that the REST dataset supports.

For a full list of sections and properties that are available for defining datasets, see Datasets and linked services.

To copy data from REST, the following properties are supported:

Property | Description | Required
type | The type property of the dataset must be set to RestResource. | Yes
relativeUrl | A relative URL to the resource that contains the data. When this property isn't specified, only the URL that's specified in the linked service definition is used. The REST connector copies data from the combined URL: [URL specified in linked service]/[relative URL specified in dataset]. | No

If you set requestMethod, additionalHeaders, requestBody, and paginationRules in the dataset, they are still supported as-is, but you are encouraged to use the new model in the activity going forward.

Example:

{ "name": "RESTDataset", "properties": { "type": "RestResource", "typeProperties": { "relativeUrl": "<relative url>" }, "schema": [], "linkedServiceName": { "referenceName": "<REST linked service name>", "type": "LinkedServiceReference" } }}

Copy Activity properties

This section provides a list of properties supported by the REST source and sink.

For a full list of sections and properties that are available for defining activities, see Pipelines.

REST as source

The following properties are supported in the copy activity source section:

Property | Description | Required
type | The type property of the copy activity source must be set to RestSource. | Yes
requestMethod | The HTTP method. Allowed values are GET (default) and POST. | No
additionalHeaders | Additional HTTP request headers. | No
requestBody | The body for the HTTP request. | No
paginationRules | The pagination rules to compose next-page requests. Refer to the pagination support section for details. | No
httpRequestTimeout | The timeout (the TimeSpan value) for the HTTP request to get a response. This value is the timeout to get a response, not the timeout to read response data. The default value is 00:01:40. | No
requestInterval | The time to wait before sending the request for the next page. The default value is 00:00:01. | No

Note

The REST connector ignores any "Accept" header specified in additionalHeaders. Because the REST connector only supports responses in JSON, it automatically generates a header of Accept: application/json.

Example 1: Using the Get method with pagination

"activities":[ { "name": "CopyFromREST", "type": "Copy", "inputs": [ { "referenceName": "<REST input dataset name>", "type": "DatasetReference" } ], "outputs": [ { "referenceName": "<output dataset name>", "type": "DatasetReference" } ], "typeProperties": { "source": { "type": "RestSource", "additionalHeaders": { "x-user-defined": "helloworld" }, "paginationRules": { "AbsoluteUrl": "$.paging.next" }, "httpRequestTimeout": "00:01:00" }, "sink": { "type": "<sink type>" } } }]

Example 2: Using the Post method

"activities":[ { "name": "CopyFromREST", "type": "Copy", "inputs": [ { "referenceName": "<REST input dataset name>", "type": "DatasetReference" } ], "outputs": [ { "referenceName": "<output dataset name>", "type": "DatasetReference" } ], "typeProperties": { "source": { "type": "RestSource", "requestMethod": "Post", "requestBody": "<body for POST REST request>", "httpRequestTimeout": "00:01:00" }, "sink": { "type": "<sink type>" } } }]

REST as sink

The following properties are supported in the copy activity sink section:

Property | Description | Required
type | The type property of the copy activity sink must be set to RestSink. | Yes
requestMethod | The HTTP method. Allowed values are POST (default), PUT, and PATCH. | No
additionalHeaders | Additional HTTP request headers. | No
httpRequestTimeout | The timeout (the TimeSpan value) for the HTTP request to get a response. This value is the timeout to get a response, not the timeout to write the data. The default value is 00:01:40. | No
requestInterval | The interval time between different requests, in milliseconds. The request interval value should be a number between [10, 60000]. | No
httpCompressionType | The HTTP compression type to use while sending data with Optimal Compression Level. Allowed values are none and gzip. | No
writeBatchSize | The number of records to write to the REST sink per batch. The default value is 10000. | No

The REST connector as a sink works with REST APIs that accept JSON. The data will be sent in JSON with the following pattern. As needed, you can use the copy activity schema mapping to reshape the source data to conform to the payload expected by the REST API.

[
    { <data object> },
    { <data object> },
    ...
]

Example:

"activities":[ { "name": "CopyToREST", "type": "Copy", "inputs": [ { "referenceName": "<input dataset name>", "type": "DatasetReference" } ], "outputs": [ { "referenceName": "<REST output dataset name>", "type": "DatasetReference" } ], "typeProperties": { "source": { "type": "<source type>" }, "sink": { "type": "RestSink", "requestMethod": "POST", "httpRequestTimeout": "00:01:40", "requestInterval": 10, "writeBatchSize": 10000, "httpCompressionType": "none", }, } }]

Mapping data flow properties

REST is supported in data flows for both integration datasets and inline datasets.

Source transformation

Property | Description | Required
requestMethod | The HTTP method. Allowed values are GET and POST. | Yes
relativeUrl | A relative URL to the resource that contains the data. When this property isn't specified, only the URL that's specified in the linked service definition is used. The REST connector copies data from the combined URL: [URL specified in linked service]/[relative URL specified in dataset]. | No
additionalHeaders | Additional HTTP request headers. | No
httpRequestTimeout | The timeout (the TimeSpan value) for the HTTP request to get a response. This value is the timeout to get a response, not the timeout to read response data. The default value is 00:01:40. | No
requestInterval | The interval time between different requests, in milliseconds. The request interval value should be a number between [10, 60000]. | No
QueryParameters.request_query_parameter OR QueryParameters['request_query_parameter'] | "request_query_parameter" is user-defined and references one query parameter name in the next HTTP request URL. | No

Sink transformation

Property | Description | Required
additionalHeaders | Additional HTTP request headers. | No
httpRequestTimeout | The timeout (the TimeSpan value) for the HTTP request to get a response. This value is the timeout to get a response, not the timeout to write the data. The default value is 00:01:40. | No
requestInterval | The interval time between different requests, in milliseconds. The request interval value should be a number between [10, 60000]. | No
httpCompressionType | The HTTP compression type to use while sending data with Optimal Compression Level. Allowed values are none and gzip. | No
writeBatchSize | The number of records to write to the REST sink per batch. The default value is 10000. | No

You can set the delete, insert, update, and upsert methods as well as the relative row data to send to the REST sink for CRUD operations.

Sample data flow script

Notice the use of an alter row transformation prior to the sink to instruct the service what type of action to take with your REST sink, that is, insert, update, upsert, or delete.

AlterRow1 sink(allowSchemaDrift: true,
    validateSchema: false,
    deletable: true,
    insertable: true,
    updateable: true,
    upsertable: true,
    rowRelativeUrl: 'periods',
    insertHttpMethod: 'PUT',
    deleteHttpMethod: 'DELETE',
    upsertHttpMethod: 'PUT',
    updateHttpMethod: 'PATCH',
    timeout: 30,
    requestFormat: ['type' -> 'json'],
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> sink1

Pagination support

When you copy data from REST APIs, the REST API normally limits the size of the response payload for a single request to a reasonable number. To return a large amount of data, it splits the result into multiple pages and requires callers to send consecutive requests to get the next page of the result. Usually, the request for one page is dynamic and is composed from information returned in the response of the previous page.

This generic REST connector supports the following pagination patterns:

  • Next request’s absolute or relative URL = property value in current response body
  • Next request’s absolute or relative URL = header value in current response headers
  • Next request’s query parameter = property value in current response body
  • Next request’s query parameter = header value in current response headers
  • Next request’s header = property value in current response body
  • Next request’s header = header value in current response headers

Pagination rules are defined as a dictionary in the dataset, which contains one or more case-sensitive key-value pairs. The configuration is used to generate the requests starting from the second page. The connector stops iterating when it gets HTTP status code 204 (No Content), or when any of the JSONPath expressions in "paginationRules" returns null.

Supported keys in pagination rules:

Key | Description
AbsoluteUrl | Indicates the URL to issue the next request. It can be either an absolute URL or a relative URL.
QueryParameters.request_query_parameter OR QueryParameters['request_query_parameter'] | "request_query_parameter" is user-defined and references one query parameter name in the next HTTP request URL.
Headers.request_header OR Headers['request_header'] | "request_header" is user-defined and references one header name in the next HTTP request.
EndCondition:end_condition | "end_condition" is user-defined and indicates the condition that will end the pagination loop in the next HTTP request.
MaxRequestNumber | Indicates the maximum number of pagination requests. Leaving it empty means no limit.
SupportRFC5988 | By default, this is set to true if no pagination rule is defined. You can disable this rule by setting supportRFC5988 to false or removing this property from the script.

Supported values in pagination rules:

Value | Description
Headers.response_header OR Headers['response_header'] | "response_header" is user-defined and references one header name in the current HTTP response, the value of which will be used to issue the next request.
A JSONPath expression starting with "$" (representing the root of the response body) | The response body should contain only one JSON object. The JSONPath expression should return a single primitive value, which will be used to issue the next request.
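
For example, in a copy activity source, these keys and values are nested under the paginationRules property. The following sketch combines a query-parameter range with an end condition; the values are illustrative and mirror Example 1 and Example 4.1 later in this article, and {offset} refers to a placeholder variable defined in the base or relative URL:

"source": {
    "type": "RestSource",
    "paginationRules": {
        "QueryParameters.{offset}": "RANGE:0:10000:1000",
        "EndCondition:$.data": "Empty"
    }
}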

Note

The pagination rules in mapping data flows differ from those in the copy activity in the following aspects:

  1. Range is not supported in mapping data flows.
  2. [''] is not supported in mapping data flows. Instead, use {} to escape special characters. For example, use body.{@odata.nextLink}, whose JSON node @odata.nextLink contains a dot (.) in its name.
  3. The end condition is supported in mapping data flows, but the condition syntax differs from that in the copy activity. body is used to indicate the response body instead of $, and header is used to indicate the response header instead of headers. Here are two examples showing this difference:
    • Example 1:
      Copy activity: "EndCondition:$.data": "Empty"
      Mapping data flows: "EndCondition:body.data": "Empty"
    • Example 2:
      Copy activity: "EndCondition:headers.complete": "Exist"
      Mapping data flows: "EndCondition:header.complete": "Exist"

Pagination rules examples

This section provides a list of examples for pagination rules settings.

Example 1: Variables in QueryParameters

This example provides the configuration steps to send multiple requests whose variables are in QueryParameters.

Multiple requests:

baseUrl/api/now/table/incident?sysparm_limit=1000&sysparm_offset=0,
baseUrl/api/now/table/incident?sysparm_limit=1000&sysparm_offset=1000,
......
baseUrl/api/now/table/incident?sysparm_limit=1000&sysparm_offset=10000

Step 1: Enter sysparm_offset={offset} in either the Base URL or the Relative URL.

Step 2: Set Pagination rules as either option 1 or option 2:

  • Option 1: "QueryParameters.{offset}": "RANGE:0:10000:1000"

  • Option 2: "AbsoluteUrl.{offset}": "RANGE:0:10000:1000"

Example 2: Variables in AbsoluteUrl

This example provides the configuration steps to send multiple requests whose variables are in AbsoluteUrl.

Multiple requests:

BaseUrl/api/now/table/t1
BaseUrl/api/now/table/t2
......
BaseUrl/api/now/table/t100

Step 1: Input {id} either in Base URL in the linked service configuration page or Relative URL in the dataset connection pane.

Step 2: Set Pagination rules as "AbsoluteUrl.{id}" :"RANGE:1:100:1".

Example 3: Variables in Headers

This example provides the configuration steps to send multiple requests whose variables are in Headers.

Multiple requests:
RequestUrl: https://example/table
Request 1: Header(id->0)
Request 2: Header(id->10)
......
Request 100: Header(id->100)

Step 1: Input {id} in Additional headers.

Step 2: Set Pagination rules as "Headers.{id}" : "RARNGE:0:100:10".

Example 4: Variables are in AbsoluteUrl/QueryParameters/Headers, the end variable is not pre-defined, and the end condition is based on the response

This example provides the configuration steps to send multiple requests whose variables are in AbsoluteUrl/QueryParameters/Headers but whose end variable is not pre-defined. Different end condition rule settings for different responses are shown in Examples 4.1-4.6.

Multiple requests:

Request 1: baseUrl/api/now/table/incident?sysparm_limit=1000&sysparm_offset=0,
Request 2: baseUrl/api/now/table/incident?sysparm_limit=1000&sysparm_offset=1000,
Request 3: baseUrl/api/now/table/incident?sysparm_limit=1000&sysparm_offset=2000,
......

Two responses are encountered in this example:

Response 1:

{ Data: [ {key1: val1, key2: val2 }, {key1: val3, key2: val4 } ]}

Response 2:

{ Data: [ {key1: val5, key2: val6 }, {key1: val7, key2: val8 } ]}

Step 1: Set the range in Pagination rules as in Example 1, and leave the end of the range empty: "AbsoluteUrl.{offset}": "RANGE:0::1000".

Step 2: Set different end condition rules according to the different last responses, as shown in the following examples. A sketch that combines the range with an end condition follows the examples.

  • Example 4.1: The pagination ends when the value of the specific node in response is empty

    The REST API returns the last response in the following structure:

    { Data: []}

    Set the end condition rule as "EndCondition:$.data": "Empty" to end the pagination when the value of the specific node in response is empty.

  • Example 4.2: The pagination ends when the value of the specific node in response does not exist

    The REST API returns the last response in the following structure:

    {}

    Set the end condition rule as "EndCondition:$.data": "NonExist" to end the pagination when the value of the specific node in response does not exist.

  • Example 4.3: The pagination ends when the value of the specific node in response exists

    The REST API returns the last response in the following structure:

    { Data: [ {key1: val991, key2: val992 }, {key1: val993, key2: val994 } ], Complete: true}

    Set the end condition rule as "EndCondition:$.Complete": "Exist" to end the pagination when the value of the specific node in response exists.

  • Example 4.4: The pagination ends when the value of the specific node in response is a user-defined const value

    The REST API returns the response in the following structure:

    { Data: [ {key1: val1, key2: val2 }, {key1: val3, key2: val4 } ], Complete: false}

    ......

    And the last response is in the following structure:

    { Data: [ {key1: val991, key2: val992 }, {key1: val993, key2: val994 } ], Complete: true}

    Set the end condition rule as "EndCondition:$.Complete": "Const:true" to end the pagination when the value of the specific node in response is a user-defined const value.

  • Example 4.5: The pagination ends when the value of the header key in the response equals a user-defined const value

    The header keys in REST API responses are shown in the structure below:

    Response header 1: header(Complete->0)
    ......
    Last Response header: header(Complete->1)

    Set the end condition rule as "EndCondition:headers.Complete": "Const:1" to end the pagination when the value of the header key in response is equal to user-defined const value.

  • Example 4.6: The pagination ends when the key exists in the response header

    The header keys in REST API responses are shown in the structure below:

    Response header 1: header()
    ......
    Last Response header: header(CompleteTime->20220920)

    Set the end condition rule as "EndCondition:headers.CompleteTime": "Exist" to end the pagination when the key exists in the response header.

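Putting Example 4 together, the open-ended range from step 1 combined with, for instance, the end condition from Example 4.1 gives pagination rules like the following sketch:

"paginationRules": {
    "AbsoluteUrl.{offset}": "RANGE:0::1000",
    "EndCondition:$.data": "Empty"
}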

Example 5: Set the end condition to avoid endless requests when the range rule is not defined

This example provides the configuration steps to send multiple requests when the range rule is not used. The end condition can be set by referring to Examples 4.1-4.6 to avoid endless requests. The REST API returns a response in the following structure, in which case the next page's URL is represented in paging.next.

{ "data": [ { "created_time": "2017-12-12T14:12:20+0000", "name": "album1", "id": "1809938745705498_1809939942372045" }, { "created_time": "2017-12-12T14:14:03+0000", "name": "album2", "id": "1809938745705498_1809941802371859" }, { "created_time": "2017-12-12T14:14:11+0000", "name": "album3", "id": "1809938745705498_1809941879038518" } ], "paging": { "cursors": { "after": "MTAxNTExOTQ1MjAwNzI5NDE=", "before": "NDMyNzQyODI3OTQw" }, "previous": "https://graph.facebook.com/me/albums?limit=25&before=NDMyNzQyODI3OTQw", "next": "https://graph.facebook.com/me/albums?limit=25&after=MTAxNTExOTQ1MjAwNzI5NDE=" }}...

The last response is:

{ "data": [], "paging": { "cursors": { "after": "MTAxNTExOTQ1MjAwNzI5NDE=", "before": "NDMyNzQyODI3OTQw" }, "previous": "https://graph.facebook.com/me/albums?limit=25&before=NDMyNzQyODI3OTQw", "next": "Same with Last Request URL" }}

Step 1: Set Pagination rules as "AbsoluteUrl": "$.paging.next".

Step 2: If next in the last response is always the same as the last request URL and not empty, endless requests will be sent. The end condition can be used to avoid endless requests. Therefore, set the end condition rule by referring to Examples 4.1-4.6.
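
For this example, because the last response returns an empty data array, the end condition from Example 4.1 can be combined with the URL rule into a sketch like the following:

"paginationRules": {
    "AbsoluteUrl": "$.paging.next",
    "EndCondition:$.data": "Empty"
}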

Example 6: Set the max request number to avoid endless requests

Set MaxRequestNumber to cap the number of pagination requests and avoid endless requests.
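
For instance, a sketch that caps pagination at five requests; the value 5 and the AbsoluteUrl rule are illustrative, and the cap you need depends on your scenario:

"paginationRules": {
    "AbsoluteUrl": "$.paging.next",
    "MaxRequestNumber": "5"
}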

Example 7: The RFC 5988 pagination rule is supported by default

The backend will automatically get the next URL based on the RFC 5988 style links in the header.

Tip

If you don't want to enable this default pagination rule, you can set supportRFC5988 to false or just delete it in the script.

Example 8: The next request URL is from the response body when using pagination in mapping data flows

This example shows how to set the pagination rule and the end condition rule in mapping data flows when the next request URL comes from the response body.

In this example, the response body returns the next request URL in the @odata.nextLink property, so the pagination rule AbsoluteUrl should be set to body.{@odata.nextLink}.

By default, the pagination will stop when body.{@odata.nextLink} is null or empty.

But if the value of @odata.nextLink in the last response body is equal to the last request URL, it will lead to an endless loop. To avoid this condition, define end condition rules.

  • If the value of @odata.nextLink in the last response is empty, set an end condition rule that ends the pagination when that value is empty.

  • If a value of true for the complete key in the response header indicates the end of pagination, set an end condition rule that ends the pagination when that header value equals true.

Example 9: The response format is XML and the next request URL is from the response body when using pagination in mapping data flows

This example shows how to set the pagination rule in mapping data flows when the response format is XML and the next request URL comes from the response body. The first URL is https://<user>.dfs.core.windows.net/bugfix/test/movie_1.xml, and subsequent URLs are returned in the response body.

The pagination rule syntax is the same as in Example 8.

Export JSON response as-is

You can use this REST connector to export a REST API JSON response as-is to various file-based stores. To achieve such a schema-agnostic copy, skip the "structure" (also called schema) section in the dataset and the schema mapping in the copy activity.
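
A minimal copy activity sketch of such a schema-agnostic copy, modeled on the source example earlier in this article; the dataset names and the sink type are placeholders, and there is intentionally no schema or translator (schema mapping) section:

"activities": [
    {
        "name": "ExportRESTResponseAsIs",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<REST input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<file-based output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "RestSource"
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]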

Schema mapping

To copy data from a REST endpoint to a tabular sink, refer to schema mapping.

Next steps

For a list of data stores that Copy Activity supports as sources and sinks in Azure Data Factory, see Supported data stores and formats.
