Talend Open Studio Connector

This article describes how the Talend connector for DataGalaxy works.

This connector is available in the following modes:

  • Desktop mode ✅
  • SaaS Online mode ❌

Connector Overview

The Talend connector imports the processes of a Talend Open Studio project from the .item and .properties files in the process folder of a workspace.

The extracted objects and their correspondences are detailed in the following table:

Talend object | DataGalaxy object | Comments
Project (workspace) | Data Flow | The project (workspace) is imported as a root flow in the Processing module and as a relational source in the Dictionary module to store the unmapped detected tables.
Folder | Data Flow | The tree of folders and subfolders is represented using Data Flow objects.
Job | Data Processing | Deleted jobs (in the Recycle bin) are not extracted.
Component | Data Processing Item | Links of type "Row" and "Iterate" are used to manipulate data flows: they are aggregated and represented by a processing unit.
  • A flow is the set of components linked by "Row" and "Iterate" type relationships (FLOW and ITERATE codes) with potentially several entry points and several exit points
  • The name of the processing unit corresponds to that of the last component of the flow (there will therefore be as many processing units as there are outputs in the flow)
  • The description of the processing unit is an aggregation of the names of the flow components
Input components of a job | Data Processing Input | In order to perform an API import, the Talend connector will import the table and column definitions detected in the components. The connector allows defining the correspondences between the object definitions identified in the Talend flows and the objects present in DataGalaxy using the correspondence rules stored in a mapping file. This file is automatically generated at the time of your first import and can be modified and reused for future imports.
Output components of a job | Data Processing Output |
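
The aggregation described for the Component row can be illustrated with a short sketch. This is not the connector's implementation: the input shapes (a list of component names and a list of (source, target, code) links) are hypothetical, and leaving out components that have no "Row" or "Iterate" link is an assumption made for the example.

```python
from collections import defaultdict

# Link codes that carry data, as listed in the table above. Any other link
# code found in a job (triggers, for instance) is ignored in this sketch.
DATA_LINK_CODES = {"FLOW", "ITERATE"}


def processing_units(components, links):
    """Return (name, description) pairs, one per exit point of each flow.

    components: component names in a stable display order (hypothetical input).
    links: (source, target, code) tuples (hypothetical input).
    """
    successors = defaultdict(set)
    neighbours = defaultdict(set)
    for source, target, code in links:
        if code in DATA_LINK_CODES:
            successors[source].add(target)
            neighbours[source].add(target)
            neighbours[target].add(source)

    units, seen = [], set()
    for start in components:
        if start in seen or start not in neighbours:
            continue  # already handled, or not part of any data flow
        # Collect every component reachable through data links, in either direction.
        flow, stack = {start}, [start]
        while stack:
            for other in neighbours[stack.pop()]:
                if other not in flow:
                    flow.add(other)
                    stack.append(other)
        seen |= flow
        ordered = [name for name in components if name in flow]
        description = ", ".join(ordered)
        # One processing unit per exit point (a component with no outgoing data link).
        units.extend((name, description) for name in ordered if not successors[name])
    return units
```

With a flow that ends in two output components, the function returns two units that share the same description, which matches the "as many processing units as there are outputs" rule.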

Step 1: Installation

  • Download the DataGalaxy connector from the portal (see here)
  • Extract the connector archive into the directory of your choice
  • Download the Talend plug-in from the portal and copy it into the /lib directory of the connector

Step 2: Run the Talend connector

  • After starting the connector, access Processing type connectors: 
  • If it has been correctly installed, the Talend plug-in appears in the list
  • The following information is requested:  
Parameter | Mandatory | Description
Workspace directory | Yes | Path to the Talend workspace directory
Context name | Yes | Context to apply to the imported processes
Mapping file | No | Path to a mapping file, which defines string substitutions for the input and output elements of processing items.

A mapping file can be generated at the import summary stage during a first import.

The Test button lets you check that all the files necessary for the import are present in the selected workspace folder.
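
The exact checks behind the Test button are not documented here. As a rough local pre-check, assuming the files in question are the .item and .properties pairs found in a process folder, something like the following sketch could be run before launching the connector. The function name and the pairing rule (same file name, different extension) are assumptions.

```python
from pathlib import Path


def check_process_files(workspace):
    """Return (pairs, missing) for job files found under any 'process' folder.

    pairs: (item_path, properties_path) tuples where both files exist.
    missing: expected .properties paths that were not found.
    """
    root = Path(workspace)
    pairs, missing = [], []
    for item in sorted(root.rglob("*.item")):
        if "process" not in item.relative_to(root).parts:
            continue  # only look at files located inside a process folder
        # Assumption: each .item file has a sibling .properties file of the same name.
        properties = item.with_suffix(".properties")
        if properties.is_file():
            pairs.append((item, properties))
        else:
            missing.append(properties)
    return pairs, missing
```

An empty pairs list, or a non-empty missing list, suggests that the selected directory is not the expected workspace folder.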

Technical information

At the summary stage of the import, you can generate a mapping file:

The file will be generated in the /out directory of the connector. It contains all the tables that the connector has detected in the Talend job components. A detection is of the following form:

1.beginsWith=\\REPORTING_INTERNE@talend\\REPORTING_INTERNE\\dbo\\deal_company\\
1.replace.path=\\REPORTING_INTERNE@talend\\REPORTING_INTERNE\\dbo\\deal_company
1.replace.type=\\Relational\\Model\\Model\\Table
If the information associated with the tables could not be extracted from the Talend component, the process IDs of the Talend job will be used to create a Model\Table tree in DataGalaxy.
  • The beginsWith attribute holds the path detected by the connector and is what the replacement is matched against, so it must not be modified for the substitution to work.
  • The replace.path and replace.type attributes let you map the detection to an existing Dictionary table: edit them to link to Dictionary objects that already exist in DataGalaxy. You can also ignore a detection so that no table is created for it by default. To do this, leave both attributes empty (see the example below).
The type must be provided in English for the mapping to work.
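
To make the structure of the file concrete, here is a minimal sketch of a reader for it. It assumes the doubled backslashes are an escaping convention of the file (as in Java .properties files) and that a rule applies when a detected path starts with its beginsWith value. The class and function names are hypothetical, and the connector's actual matching logic may differ.

```python
from dataclasses import dataclass


@dataclass
class MappingRule:
    begins_with: str = ""
    replace_path: str = ""
    replace_type: str = ""

    @property
    def ignored(self):
        # Both replacement attributes left empty: no table should be created.
        return not self.replace_path and not self.replace_type


FIELDS = {
    "beginsWith": "begins_with",
    "replace.path": "replace_path",
    "replace.type": "replace_type",
}


def parse_mapping(text):
    """Parse 'N.attribute=value' lines into {N: MappingRule}."""
    rules = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, _, value = line.partition("=")
        index, _, attribute = key.partition(".")
        if not index.isdigit() or attribute not in FIELDS:
            continue  # not a rule line in the expected form
        rule = rules.setdefault(int(index), MappingRule())
        # Assumption: each backslash is doubled in the file; collapse them.
        setattr(rule, FIELDS[attribute], value.replace("\\\\", "\\"))
    return rules


def find_rule(rules, detected_path):
    """Return the first rule whose beginsWith is a prefix of detected_path, else None."""
    for index in sorted(rules):
        rule = rules[index]
        if rule.begins_with and detected_path.startswith(rule.begins_with):
            return rule
    return None
```

Checking the ignored property of the returned rule tells you whether the detection would be mapped to a Dictionary object or skipped.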

Example

In this paragraph, we will detail an example to illustrate the use of the mapping file.

We have the job hubspot_owners in our workspace "REPORTING_INTERNE":

This job loads the "hubspot_owner" table in the REPORTING_INTERNE database from a Hubspot API call.

After a first run of the Talend connector, during which we generate a mapping file, we get the following mapping:

1.beginsWith=\\REPORTING_INTERNE@talend\\REPORTING_INTERNE\\dbo\\hubspot_owner\\
1.replace.path=\\REPORTING_INTERNE@talend\\REPORTING_INTERNE\\dbo\\hubspot_owner
1.replace.type=\\Relational\\Model\\Model\\Table
2.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tExtractJSONFields_1\\
2.replace.path=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tExtractJSONFields_1
2.replace.type=\\Relational\\Model\\Model\\Table
3.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tExtractJSONFields_3\\
3.replace.path=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tExtractJSONFields_3
3.replace.type=\\Relational\\Model\\Model\\Table
4.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tFilterRow_1\\
4.replace.path=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tFilterRow_1
4.replace.type=\\Relational\\Model\\Model\\Table
5.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tJavaRow_1\\
5.replace.path=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tJavaRow_1
5.replace.type=\\Relational\\Model\\Model\\Table
6.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tJavaRow_2\\
6.replace.path=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tJavaRow_2
6.replace.type=\\Relational\\Model\\Model\\Table
7.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tJava_1\\
7.replace.path=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tJava_1
7.replace.type=\\Relational\\Model\\Model\\Table
8.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tRESTClient_1\\
8.replace.path=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tRESTClient_1
8.replace.type=\\Relational\\Model\\Model\\Table

Following the import, a source has been created in the Dictionary module of the DataGalaxy platform so that the lineage is available automatically:

All the tables created are those detected by the connector and listed in the mapping file.

In addition, the flow is described in the processing unit:

In our case, we already have an existing REPORTING_INTERNE relational source in the Dictionary module, so we want to map our job to it. We also have the API route definitions from the Hubspot tool in a NoSQL source:

We make the following changes to the mapping file: 

  • Map the hubspot_owner table from the REPORTING_INTERNE source
  • Map the API response /crm/v3/owners/ from the Hubspot API source
  • Ignore the other tables, which correspond to technical steps of the Talend job
#Thu Oct 20 10:45:41 CEST 2022
1.beginsWith=\\REPORTING_INTERNE@talend\\REPORTING_INTERNE\\dbo\\hubspot_owner\\
1.replace.path=\\REPORTING_INTERNE\\dbo\\hubspot_owner
1.replace.type=\\Relational\\Model\\Table
2.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tExtractJSONFields_1\\
2.replace.path=
2.replace.type=
3.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tExtractJSONFields_3\\
3.replace.path=\\API Hubspot\\CRM Owners\\/crm/v3/owners/\\get\\responses\\200\\results
3.replace.type=\\NoSql\\Directory\\Directory\\Directory\\File\\SubStructure\\SubStructure
4.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tFilterRow_1\\
4.replace.path=
4.replace.type=
5.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tJavaRow_1\\
5.replace.path=
5.replace.type=
6.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tJavaRow_2\\
6.replace.path=
6.replace.type=
7.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tJava_1\\
7.replace.path=
7.replace.type=
8.beginsWith=\\REPORTING_INTERNE@talend\\default@talend\\hubspot_owners\\tRESTClient_1\\
8.replace.path=
8.replace.type=
We save the file, set it in our Talend connection, and rerun the connector to update the links. After the import, we get the expected links:

Releases

Date | Plugin version | DataGalaxy release | Desktop Connector version (minimum) | Description
23/08/2024 | 3.0.1 | v3.69.0 | 5.2.3 | Updated the logger to show more information when using verbose mode
06/08/2024 | 3.0.0 | v3.65.0 | 5.0.5 | Migrated from Java 11 to Java 17 + CVE fixes

