
Azure Data Lake Services Gen2 (ADLS Gen2) Connector

This article describes how to use the DataGalaxy Azure Data Lake Services Gen2 (ADLS Gen2) Connector.

This connector is available in the following modes:

  • Desktop mode ✅
  • SaaS Online mode ✅

Connector scope

The Azure Data Lake Services Gen2 (ADLS Gen2) Connector allows you to import the following metadata from an Azure Data Lake Storage Gen2 account:

  • The set of directories in the data lake
  • All the files present in the data lake
  • The fields present in the CSV files

The imported objects and the DataGalaxy objects they map to are listed in the following table:

| ADLS Gen2 object | DataGalaxy object | Comments |
|---|---|---|
| Directory | Directory (Container) | |
| File | File (Structure) | |
| Field | Field | Column definitions are imported if the processed file is a CSV file (separator ";") |
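To illustrate the Field mapping, the following minimal sketch reads column names from CSV content that uses the ";" separator. It assumes the first line of the file is a header row. The function name `read_csv_field_names` is hypothetical, and the connector's actual parsing logic is not documented here.

```python
import csv
import io


def read_csv_field_names(text, separator=";"):
    """Return the column names found on the first line of CSV text.

    Assumption: the first line is a header row. This is an illustration,
    not the connector's own implementation.
    """
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    header = next(reader, [])  # empty input yields no fields
    return [name.strip() for name in header]


# Hypothetical sample content with the ";" separator
sample = "customer_id;first_name;signup_date\n42;Ada;2024-07-30\n"
print(read_csv_field_names(sample))
```

Each name returned this way would correspond to one Field object attached to the File (Structure) it was read from.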

Configuration of a connection

The ADLS Gen2 connector uses the Azure Data Lake Storage Gen2 REST API (https://docs.microsoft.com/en-us/rest/api/storageservices/data-lake-storage-gen2) and requires a service account configured with the following rights:

  • Authorized API (to be defined at application registration): Azure Storage (user_impersonation)
  • Role assignment (to be defined at the storage account level): Storage Blob Data Reader

You can optionally set additional restrictions using ACLs to limit the resources the service account will have access to.
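To check that a service account is set up as expected, it can help to see what requests against the REST API look like. The sketch below only builds the requests and sends nothing. It assumes the standard Microsoft identity platform client-credentials flow and the "Path - List" operation of the Data Lake Storage Gen2 REST API. The function names and the placeholder values are hypothetical, and this is not a description of how the connector itself is implemented.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id, client_id, client_secret):
    """Return (url, form_fields) for an OAuth 2.0 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Scope that targets Azure Storage
        "scope": "https://storage.azure.com/.default",
    }
    return url, form


def build_list_paths_url(storage_account, container, directory="",
                         endpoint="dfs.core.windows.net"):
    """Return the URL of a recursive path listing for a container (filesystem)."""
    query = {"resource": "filesystem", "recursive": "true"}
    if directory:
        query["directory"] = directory.strip("/")
    return f"https://{storage_account}.{endpoint}/{container}?{urlencode(query)}"


# Placeholder values, to be replaced with your own
token_url, token_form = build_token_request("my-tenant-id", "my-client-id", "my-secret")
print(token_url)
print(build_list_paths_url("mystorageaccount", "mycontainer", "/raw"))
# To send the listing request, POST the form to token_url, then call the listing
# URL with an "Authorization: Bearer <access_token>" header and an
# "x-ms-version" header.
```

If the listing call is rejected with an authorization error, the role assignment or ACLs on the storage account are the first things to review.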

The following information is required to set up a connection:

| Parameter | Mandatory | Description |
|---|---|---|
| Storage account | Yes | Storage account name |
| Tenant Id | Yes | Azure tenant identifier |
| Client Id | Yes | Client ID of the Azure service account |
| Client Secret | Yes | Client secret of the Azure service account |
| Container name | Yes | Filesystem (container) name |
| Custom endpoint | No | Custom endpoint (the default values are dfs.core.windows.net in hierarchical mode and blob.core.windows.net in blob mode) |
| API Endpoint type | No | Three types: Auto, Hierarchical, and Blob. In Auto mode, the Hierarchical and Blob modes are tried one after the other. |
| Path | No | Root path to navigate |
| Mask patterns | No | Masks allow you to define strategies for grouping and filtering folders and files according to naming patterns. Example: /datasource/{YYYYMMDD}/file_{YYYYMM}_{zz}.csv |

Masks must be absolute paths from the root, and every character is significant, so you may need to define several masks to cover all your cases.

More information about this setting is available when running the connector.
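As a rough illustration of how a mask can match file paths, the sketch below converts a mask into a regular expression. The placeholder semantics are assumptions made for this example only: placeholders made of the letters Y, M, and D are taken to match that many digits, and any other placeholder (such as {zz}) is taken to match any run of characters other than "/". The function name `mask_to_regex` is hypothetical. The connector's actual rules are the ones shown when running the connector.

```python
import re

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def mask_to_regex(mask):
    """Translate a mask into a compiled regular expression.

    Assumed semantics (not taken from the connector documentation):
    - {YYYYMMDD}-style placeholders match as many digits as they have letters
    - any other placeholder matches one or more characters other than "/"
    - all other characters must match literally
    """
    parts = []
    position = 0
    for match in PLACEHOLDER.finditer(mask):
        # Literal text before the placeholder is matched character by character
        parts.append(re.escape(mask[position:match.start()]))
        token = match.group(1)
        if set(token) <= set("YMD"):
            parts.append(r"\d{%d}" % len(token))
        else:
            parts.append(r"[^/]+")
        position = match.end()
    parts.append(re.escape(mask[position:]))
    return re.compile("".join(parts))


pattern = mask_to_regex("/datasource/{YYYYMMDD}/file_{YYYYMM}_{zz}.csv")
# Hypothetical paths: the first fits the mask, the second has a 4-digit folder
print(bool(pattern.fullmatch("/datasource/20240730/file_202407_fr.csv")))
print(bool(pattern.fullmatch("/datasource/2024/file_202407_fr.csv")))
```

Because every literal character must match, a path that differs by a single character (for example file- instead of file_) would need its own mask.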

Execution of the connector

To create a connection via the Online connector, the entry points are as follows:

  • From the Import button of the "Shortcuts" widget on the home screen of a client space or workspace 

  • From the Import button of one of the modules when it is empty

  • From the Import button in the contextual menu of one of the modules, on the right side of the filtered views

  • From the Add a connection button in the Connector tab available in the workspace setup screen

You can optionally filter (by module, by connector type, or with the search bar), then click the desired technology.

You then need to complete the connection form with the parameters described above to perform an import. For more details on the steps involved in running the Online connector, see the following article: [HowTo] Running the Online Connector

This technology is also available via the Desktop Connector. You can find more information on the procedure here: [How to] How to use the connector.

Releases

| Date | Plugin version | DataGalaxy release | Desktop Connector version (minimum) | Description |
|---|---|---|---|---|
| 19/12/2024 | 4.1.0 | | 5.3.6 | Addition of the possibility to set a custom endpoint |
| 23/08/2024 | 4.0.2 | v3.69.0 | 5.2.3 | Updated the logger to show more information when using verbose mode |
| 14/08/2024 | 4.0.1 | v3.67.0 | 5.0.4 | CVE fixes |
| 30/07/2024 | 4.0.0 | v3.63.0 | 5.0.4 | Migrated from Java 11 to Java 17 + CVE fixes |
