The URN mode

Connectors offer an optional URN mode for importing objects. This article explains what this mode is, along with its benefits and limits. A FAQ is provided at the end; if you are still missing information after reading this article, please contact the support team and we'll improve the page.

The examples in this documentation show how the connectors use URNs, which is the main use case. Since the connectors rely on the DataGalaxy API, everything described here also applies to the URN APIs, which can be used for custom developments around the platform.

What is a URN?

A URN (Uniform Resource Name) is an address that uniquely identifies an object from a data system. This address follows a syntax defined by DataGalaxy, but it is totally independent from the DataGalaxy platform and from the way these objects are organized and represented in DataGalaxy. For information, the URN syntax for each technology is described here.

The elements that make up the URN of an object (called "segments") come only from the data system itself and are known by every other data system interacting with this object. Each of these systems can therefore build the URN of this object: the URN is a unique address, whichever system manipulates it.

As an example, here is a URN of a Snowflake view:

urn:snowflake-1:xx12345.eu-central-1:SNOW_DB:CUSTOMER360:V_CUSTOMER@view

The segments which compose the URN of this view are:

  • xx12345.eu-central-1: the name of the Snowflake account
  • SNOW_DB: the name of the database
  • CUSTOMER360: the name of the schema
  • V_CUSTOMER: the name of the view
  • @view: the type of object in Snowflake.
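As an illustration, the decomposition above can be sketched in a few lines of Python. This is only a sketch based on the example URN: it assumes that ":" separates segments, that an optional "@type" suffix closes the URN, and that no segment contains either character. The names ParsedUrn and parse_urn are invented for this example and are not part of any DataGalaxy library; the label "technology" for the "snowflake-1" part is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedUrn:
    technology: str             # e.g. "snowflake-1" (assumed to identify the technology)
    segments: tuple[str, ...]   # e.g. account, database, schema, object name
    object_type: Optional[str]  # e.g. "view", or None when there is no "@type" suffix


def parse_urn(urn: str) -> ParsedUrn:
    """Split a URN into its parts, assuming ':' separates segments
    and an optional '@type' suffix closes the URN."""
    body, _, object_type = urn.partition("@")
    prefix, technology, *segments = body.split(":")
    if prefix != "urn" or not segments:
        raise ValueError(f"not a URN: {urn!r}")
    return ParsedUrn(technology, tuple(segments), object_type or None)


parsed = parse_urn(
    "urn:snowflake-1:xx12345.eu-central-1:SNOW_DB:CUSTOMER360:V_CUSTOMER@view"
)
print(parsed.segments)     # account, database, schema and view names
print(parsed.object_type)  # view
```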

Please note that the URN of an object can contain the URN of another object when these objects are linked by a parent link in the data system. So the URN of the view contains the URN of the schema:

urn:snowflake-1:xx12345.eu-central-1:SNOW_DB:CUSTOMER360

which itself contains the URN of the database:

urn:snowflake-1:xx12345.eu-central-1:SNOW_DB

which itself contains the URN of the Snowflake account:

urn:snowflake-1:xx12345.eu-central-1
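This nesting means that the chain of parents can be derived from the URN alone. The sketch below reproduces the three parent URNs listed above; it assumes, as in the example, that parent URNs carry no "@type" suffix and that every URN keeps at least one segment after the technology part. The function name parent_urns is hypothetical.

```python
def parent_urns(urn: str) -> list[str]:
    """Return the URNs of all ancestors, nearest parent first."""
    body = urn.partition("@")[0]   # drop the optional "@type" suffix
    parts = body.split(":")        # ["urn", technology, segment, ...]
    # Every ancestor keeps at least "urn", the technology and one segment.
    return [":".join(parts[:n]) for n in range(len(parts) - 1, 2, -1)]


view = "urn:snowflake-1:xx12345.eu-central-1:SNOW_DB:CUSTOMER360:V_CUSTOMER@view"
for parent in parent_urns(view):
    print(parent)
```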

Every system that needs to interact with this view must know all these elements (except perhaps the type of the object, which is not always necessary). For instance, to build a Power BI dataset that loads its data from this view, you must configure a Snowflake connection in Power BI and provide, among other information, the names of the account, the database, the schema and the view. Once this Power BI dataset is created, Power BI knows all the elements of the URN of the Snowflake view (in this example, Power BI even knows that the object is a view).

Let's imagine a tool A that communicates with Snowflake, and another tool B connected to Power BI: as both are able to build the URN of this view, they would recognize it as the same object thanks to its unique address. In DataGalaxy, these tools are the connectors.
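To make the idea concrete, here is a minimal sketch of two tools that build the URN independently, each from its own metadata. The build_urn helper and the dictionary field names are invented for the example; only the resulting URN comes from this article.

```python
def build_urn(technology: str, *segments: str, object_type: str = "") -> str:
    """Join segments with ':' and append '@type' when the type is known."""
    urn = ":".join(("urn", technology, *segments))
    return f"{urn}@{object_type}" if object_type else urn


# Tool A: metadata read from Snowflake itself (hypothetical field names).
snowflake_row = {"account": "xx12345.eu-central-1", "database": "SNOW_DB",
                 "schema": "CUSTOMER360", "name": "V_CUSTOMER", "kind": "view"}
urn_a = build_urn("snowflake-1", snowflake_row["account"], snowflake_row["database"],
                  snowflake_row["schema"], snowflake_row["name"],
                  object_type=snowflake_row["kind"])

# Tool B: connection settings stored in a Power BI dataset (hypothetical field names).
powerbi_source = {"server": "xx12345.eu-central-1", "db": "SNOW_DB",
                  "schema": "CUSTOMER360", "item": "V_CUSTOMER", "item_kind": "view"}
urn_b = build_urn("snowflake-1", powerbi_source["server"], powerbi_source["db"],
                  powerbi_source["schema"], powerbi_source["item"],
                  object_type=powerbi_source["item_kind"])

print(urn_a == urn_b)  # True: both tools designate the same object
```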

The benefit of URN mode for the connectors

How it works

Historically, DataGalaxy connectors were “siloed”: a connector for one technology would only bring up objects from that technology in DataGalaxy. With URN mode, connectors can now take advantage of all metadata contained in data systems, including objects from other technologies with which these systems interact. Connectors complement each other to give DataGalaxy a complete view of the data landscape and the interconnections between objects, including lineage.

To continue the previous example, let's add the Sifflet observability platform, on which we have created monitors to track the quality of Snowflake data:

  • The Power BI connector brings up the dataset, as well as the Snowflake V_CUSTOMER view from which it loads its data, linking the two objects by a lineage link,
  • The Snowflake connector brings up the Snowflake objects, completing the object sheet of the V_CUSTOMER view with its description, and creating all the other Snowflake objects that Power BI doesn't know about,
  • The Sifflet connector does not retrieve any Sifflet technology objects, but does retrieve data quality information on all Snowflake objects in the SNOW_DB database.

We therefore have three connectors that have contributed to the construction of the V_CUSTOMER view object sheet.
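Conceptually, the three contributions can be merged because they all carry the same URN. The sketch below only illustrates that idea and is not the platform's actual implementation; the attribute names and values are invented.

```python
from collections import defaultdict

VIEW = "urn:snowflake-1:xx12345.eu-central-1:SNOW_DB:CUSTOMER360:V_CUSTOMER@view"

# (connector, URN, attributes) tuples; attribute names and values are invented.
contributions = [
    ("Power BI", VIEW, {"feeds": "a Power BI dataset"}),
    ("Snowflake", VIEW, {"description": "Customer 360 view"}),
    ("Sifflet", VIEW, {"quality_status": "monitored"}),
]


def merge_by_urn(items):
    """Group attributes by URN and remember which connectors contributed."""
    sheets = defaultdict(lambda: {"attributes": {}, "contributors": []})
    for connector, urn, attributes in items:
        sheets[urn]["attributes"].update(attributes)
        sheets[urn]["contributors"].append(connector)
    return dict(sheets)


sheet = merge_by_urn(contributions)[VIEW]
print(sheet["contributors"])
print(sorted(sheet["attributes"]))
```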

Changes compared with Standard (“non-URN”) mode

In Standard mode, connectors create objects by specifying their path and typepath in the DataGalaxy platform. In addition to retrieving information from the source system, connectors in Standard mode therefore have the following responsibilities:

  • Define which DataGalaxy metamodel object type is used to represent a data system object
  • Create missing technologies, if required
  • Arrange objects in DataGalaxy.

This is why, for a connection in Standard mode, it is necessary to configure a root object for each module, in which the connector will store objects imported from the source data system. As only this connection knows about these root objects, interactions between connectors are severely limited. What's more, since the connector always stores the objects in the same place, it's impossible to store them in any other way within the platform.

In URN mode, connectors no longer have these three responsibilities. They simply list the objects and metadata found in the data source system and send them to the platform in the form of a list of URNs with attributes and links. 

The DataGalaxy platform thus takes over the three previous responsibilities, which has several impacts:

  • It is no longer necessary to configure root objects on connections.
  • The platform always arranges objects according to the same hierarchy.
    A little more explanation on this point:
    In Standard mode, certain connector configurations can influence the arrangement of objects in the platform: for example, the Snowflake connector in “Import unitary” configuration creates the database at the root of the dictionary and the schemas as children at level 1, whereas in “Import several databases” configuration the databases are at level 1 and the schemas at level 2.
    In URN mode, whatever the connector configuration, the hierarchy is the same. For Snowflake, for example, the Snowflake account will be created at root, databases at level 1 and schemas at level 2. 
  • On import, the platform searches for existing objects using their URN only, regardless of their location in DataGalaxy. It is therefore possible to move objects within DataGalaxy without impacting future connector executions. So, regarding the previous point, if the hierarchy proposed automatically doesn't suit you, you have options to adjust it to your needs.
  • Connectors can retrieve objects from multiple technologies.

URN mode and object organization in the platform

Differences in terms of access rights between URN mode and Bulktree mode 

To perform an import in URN mode you must have Admin access on the targeted workspace, even when you simply wish to update an existing source.

The reason for this is that in URN mode the connector might have to create a new source, coming from a linked external technology for example, in addition to updating the targeted source. Admin access is required for your imports so that URN mode can be used to its full capacity.

Logic for organizing objects in the platform

When connectors send a list of URNs to the platform, the latter automatically arranges the objects according to a few rules:

  • If the object is recognized (an object with this URN already exists in the platform), then it is updated: its attributes are updated and, depending on the technology, the object can also be moved in DataGalaxy if the connector is able to detect that it has been moved in the data system (which is another new feature).
  • If the object is not recognized and has to be created, the platform will search for its nearest parent: 
    • If a parent is found in the platform, then the object will be attached to it (with any intermediate parents to be created).
    • Otherwise, the object (and any intermediate parents) will be created at the root of the module concerned.
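The rules above can be sketched as a small algorithm. This is a simplified model written for this article, not the platform's code: the platform is reduced to a dictionary mapping each known URN to its parent, parent URNs are derived by dropping the last segment, and the names ROOT, ancestors and import_object are hypothetical.

```python
ROOT = None  # marks the root of the module concerned


def ancestors(urn: str) -> list[str]:
    """Ancestor URNs, nearest parent first (the '@type' suffix is dropped)."""
    parts = urn.partition("@")[0].split(":")
    return [":".join(parts[:n]) for n in range(len(parts) - 1, 2, -1)]


def import_object(platform: dict, urn: str) -> str:
    """Apply the rules to one URN; `platform` maps each known URN to its parent."""
    if urn in platform:
        return "updated"                   # recognized: attributes would be refreshed
    chain = ancestors(urn)
    # Index of the nearest ancestor already known to the platform, if any.
    known = next((i for i, a in enumerate(chain) if a in platform), len(chain))
    missing = [urn] + chain[:known]        # the object plus intermediate parents
    anchor = chain[known] if known < len(chain) else ROOT
    # Create from the top down so each new object hangs under the previous one.
    for new in reversed(missing):
        platform[new] = anchor
        anchor = new
    return "created"


account = "urn:snowflake-1:xx12345.eu-central-1"
view = f"{account}:SNOW_DB:CUSTOMER360:V_CUSTOMER@view"
platform = {account: ROOT}                 # only the Snowflake account exists so far
print(import_object(platform, view))       # created, with database and schema in between
print(import_object(platform, view))       # updated on the second run
```

In this model, a parent that has been moved elsewhere in the platform is still found, because the lookup uses the URN and never the location.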

How to reorganize imported objects

One of the advantages of URN mode for connectors is the ability to move objects within the platform without impacting the execution of the connectors concerned. This gives you the flexibility to organize your objects. This can be particularly useful in the Dictionary module, where DataGalaxy user permissions are managed at the level of each root object.

Let's look at a few examples.

Dividing children into several parents

It is possible to distribute several children from the same root into several different roots. Simply create other root objects and move the children you wish to place there. If a new child is created, it will always be automatically created under the initial root. 

For example, if you run the Snowflake connector in “Multiple databases” configuration, the databases will be created in the Dictionary under a common root object that represents the Snowflake account and carries its URN.

If you need to distribute these databases across several root objects, for example to give different permissions to your users, you can create other root objects and move each Snowflake database under the root object that suits you best.
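The behavior described in this example can be modeled as follows. It is a simplified sketch under stated assumptions: the platform is reduced to a dictionary of locations, and the "Finance" root object and the HR_DB and SALES_DB databases are invented for the illustration.

```python
ACCOUNT = "urn:snowflake-1:xx12345.eu-central-1"
MODULE_ROOT = "<module root>"

# URN -> current parent; a parent is either a URN or a manually created root object.
locations = {
    ACCOUNT: MODULE_ROOT,
    f"{ACCOUNT}:SNOW_DB": ACCOUNT,
    f"{ACCOUNT}:HR_DB": ACCOUNT,
}

# A user creates a root object "Finance" by hand and moves SNOW_DB under it.
locations[f"{ACCOUNT}:SNOW_DB"] = "Finance"


def run_connector(locations: dict, database_urns: list) -> list:
    """Match databases by URN; create unknown ones under the account root."""
    created = []
    for urn in database_urns:
        if urn not in locations:       # matched by URN only, never by location
            locations[urn] = ACCOUNT   # new children land under the initial root
            created.append(urn)
    return created


created = run_connector(
    locations,
    [f"{ACCOUNT}:SNOW_DB", f"{ACCOUNT}:HR_DB", f"{ACCOUNT}:SALES_DB"],
)
print(created)                          # only the new SALES_DB database
print(locations[f"{ACCOUNT}:SNOW_DB"])  # Finance: the manual move is preserved
```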

Group children under a common parent

For this use case, the operation is the same, but the objects will move down one level in the object hierarchy.

In the Usage and Processing modules, the root objects created by the connectors will be App and DataFlow respectively. In the DataGalaxy metamodel, objects of these types can be nested within other objects of the same type, making it easy to create a specific object hierarchy. When an object cluster has been created by a connector with an object at the root of one of these modules, it is possible to move the entire object cluster elsewhere in the module without impacting the connector. This is also possible for the Dictionary module, but there are a few limitations to be aware of, see below.

For example, after launching the Databricks connector with the creation of Notebooks and Workflows, we obtain an object cluster in the Processing module, with the Databricks instance at the root. If we have several Databricks environments, they will all be at the root of the Processing module. 

If you wish to group these objects in a common root object, simply create this object (DataFlow type) and move each Databricks instance under this object. Connectors in URN mode will continue to update each object tree.

Move children at root level

Conversely, it is also possible to move to the root level objects that are created by default under a common root object. The same limitations apply to the Dictionary module, as described below. For other modules, simply move the child objects to the root of the module, then delete the empty root object once all the children have been moved up. Again, if a new object is imported and can't find its parent in the existing objects, the initial root will be recreated and the object will be stored underneath, so you'll need to manually move this new cluster to the root if this is the desired organization.

Let's take Power BI as an example: by default, the connector creates the Power BI platform at the root of the Usage module, and the Power BI workspaces as its children. Both objects are of type App. If you wish to represent the Power BI workspaces at the root of the Usage module, you can move them all to the root and then delete the initial root object, which has become empty. If a new Power BI workspace is created at a later date, the root will be recreated and you will have to repeat the operation for this new object.

The special case of the Dictionary module

In the Dictionary module, there is currently no root object type that allows objects to be easily nested within each other, as is the case with DataFlow objects in the Processing module and App objects in the Usage module. This is a known limitation of our metamodel, and many of you have asked us for greater flexibility in this area. We're working on it.

Until this evolves, it's still possible to organize objects and move them in the hierarchy, but this requires more manipulation. The main problem arises when you wish to change the hierarchical level at which objects are represented, as this requires you to change the object type, which the move operation does not permit. To do this, you'll need to recreate the first levels of the hierarchy you want, either manually or using CSV export/import, placing the URNs of the corresponding objects in the right place so that the connectors can link to them.

Let's take Google BigQuery as an example. Unlike most of the platforms we've discussed in our examples, BigQuery is a SaaS service that doesn't provide a relevant identifier for a root object, as is the case with a Databricks instance or a Snowflake account. The highest-level objects in the BigQuery hierarchy that do carry identifiers are Projects. At the time of writing, BigQuery Projects are therefore created by the connector in URN mode at the root of the Dictionary module.

If you wish to group the Dictionary's BigQuery Projects under a common root, here's how to proceed:

  • Create the new root object as desired, of type Source NoSql (like the other BigQuery sources) and Google BigQuery technology.
  • Under this root object, create the Projects, of type Directory. This can be done manually or by CSV export/import, editing the file to reflect the change in hierarchy and type. Make sure that each Project has its URN as imported by the connector.
  • Once the new hierarchy has been created, the old Projects objects at the root of the module can be deleted.
  • The connector can continue to update the content of each Project using its URN.
  • If a new Project is brought up by the connector, it will be necessary to repeat the operation for this Project.

We are aware that these limitations and manual operations are not ideal. That's why we're working on the evolution of the Dictionary metamodel, and that's also why URN mode remains optional for the time being, because even though it can bring many advantages, in some cases the solution needs to be improved. If you are affected by these limitations, don't hesitate to speak to your Account Manager, who will be able to raise the matter with the Product team. 

FAQ

Does URN mean full automated lineage?

It's a bit more complex. The URN is a convention for associating a unique address with an object from a data system. Thanks to this convention, the connectors can recognize objects across systems, so it definitely makes it easier to automate lineage. But URN mode doesn't bring lineage by itself, there is no magic: the connectors still have to parse the metadata from the data systems, recognize objects and create links between them to build lineage. Now that the connectors can leverage this tool, we'll continue to improve them and bring more and more automated lineage to DataGalaxy.

How to migrate from Standard mode to URN mode?

Depending on the connector, the migration may require some changes in the way the objects are organized in the platform. If you already have objects in DataGalaxy that were imported by connectors in Standard mode, our advice is first to carefully read the documentation page of the corresponding connector and then to try URN mode in a non-production workspace.

For which connectors is the URN mode available?

The availability of URN mode is indicated on the documentation page of every connector that supports it. At the time of writing, 6 connectors support this mode.

Is the URN mode compatible with CSV export/import?

The URN is an attribute like any other and can be imported to or exported from the platform in CSV. However, it's not possible to use the URN as the identifier to update objects in CSV: the identifier used in CSV mode is still path/typepath. That's why the CSV export mode of the Desktop connector is disabled in URN mode, because it would generate a file that you would not be able to import into the platform.

Are there any known limitations regarding the URN mode?

A few features of connectors are not supported in this mode yet. Here is the list of known limitations:

  • In Snowflake, tag synchronization is not implemented in URN mode yet. This will be implemented soon.

Is it normal for connectors in URN mode to take longer than in Standard mode?

Yes. As explained in this article, the additional responsibilities on the platform's side take time: checking whether an object with this URN already exists, placing the new objects by looking for their nearest parent, and so on. The duration of connector executions is therefore impacted.

How to enable the URN mode in the Desktop connector's CLI?

The import mode is part of the parameters stored in the .properties file of the connection. If you configure your profile in graphical mode and save it, the parameter that enables URN mode is set for you automatically. If you want to edit the .properties file manually, the parameter is data-structure, which must be changed from TREE (default mode) to URN to activate URN mode.
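If you maintain many connection files, the manual edit can be scripted. The sketch below works on the text of a .properties file and switches data-structure to URN. The function name enable_urn_mode and the other key in the sample are invented, and appending the key when it is absent is an assumption; check the result against a profile saved from the graphical mode.

```python
import re

# Matches a "data-structure" entry written with "=" or ":" as separator.
_PATTERN = re.compile(r"^([ \t]*data-structure[ \t]*[=:][ \t]*).*$", re.MULTILINE)


def enable_urn_mode(properties_text: str) -> str:
    """Return the .properties content with data-structure set to URN."""
    if _PATTERN.search(properties_text):
        return _PATTERN.sub(lambda m: m.group(1) + "URN", properties_text)
    # Assumption: when the key is absent, it can simply be appended.
    sep = "" if properties_text.endswith("\n") or not properties_text else "\n"
    return f"{properties_text}{sep}data-structure=URN\n"


sample = "name=my-connection\ndata-structure=TREE\n"  # hypothetical file content
print(enable_urn_mode(sample))
```

To apply it to a real file, read the file content, pass it to the function and write the result back, keeping a backup of the original.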
