
Running the connector using a Docker image

Note: to follow this documentation, you need some familiarity with Docker and with building images.

The Desktop connector can be run from the command line using a Docker image.

The configuration needed to run the image is the same as for running the Desktop connector, as detailed on the following page: Needed configuration. You will also need a Docker client or the container orchestrator of your choice.

The Desktop connector is written in Java, which is why we use a Docker base image that provides a Java runtime compatible with the Desktop connector prerequisites.

Four types of resources must be added on top of this base image:

  • The Desktop connector folder, extracted from the zip archive
  • The connector plugins matching your technologies
  • The connection configuration files (.properties files created by the Desktop connector in GUI mode in its connection/ directory when you save a connection profile)
  • The file containing your DataGalaxy API token (unless you prefer to pass the token value as an argument at runtime, which is the option used in this example).

You can download the Desktop connector and its plugins from your DataGalaxy platform, or programmatically by following this documentation. The programmatic approach is especially useful if you want to automate the building of your images on a regular basis to keep them up to date.

Building the connector-cli Docker image

As with any Docker image, you can either include the necessary files when building the image, so that it is self-contained, or provide them through mount points or volumes. It is up to you to choose what to embed in the image, depending on your context and constraints. In this documentation, we embed the connector and its plugins (in this example: Databricks) in the image, and use mount points for the configuration files.

Since the connector zip archive already embeds the open-source Eclipse Temurin JRE, we also base our connector-cli Docker image on the image provided by Temurin.

In this example, we build the Docker image in a Linux environment, from a new working directory created for this purpose.

Note: this document describes one working way to obtain a Docker image that runs the connector. You are free to adapt the scripts to your needs and constraints.

Environment variables

Let us start by defining some environment variables that will be useful for the next steps:

  • Our DataGalaxy API token
  • The password of our data system (here Databricks)
  • The URL of the API of our DataGalaxy environment (API guide)
  • The name of the DataGalaxy Workspace to which we want to push the metadata.
$ export DG_TOKEN=ey...
$ export DATABRICKS_PWD=xy...
$ export DG_API_URL=https://myinstance.api.datagalaxy.com/v2
$ export WS="MyWorkspace"
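Because every following command relies on these four variables, it can help to check that none of them is empty before going further. The sketch below is a hypothetical POSIX sh helper (require_vars is not part of the DataGalaxy tooling) that reports each missing variable by name:

```shell
# Hypothetical helper (not part of the DataGalaxy tooling): print the name of
# every variable passed as an argument that is unset or empty, and return a
# non-zero status if at least one of them is missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Portable indirect lookup: read the value of the variable called $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `require_vars DG_TOKEN DATABRICKS_PWD DG_API_URL WS` right after the export commands; it returns a non-zero status if any of them is missing.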

Preparing the necessary files

We will need the following files: 

  • The .properties files for configuring connections to our systems (here Databricks): 
    • In this example, this step is represented by the vi command. In practice, you will instead copy these files from the connection/ folder of a Desktop connector once you have saved connection profiles in GUI mode
  • The .jar runtime files of the Desktop connector in CLI mode, taken from the zip archive downloaded through the DataGalaxy connectors API
  • The .jar files of the plugins, downloaded through the same API.

The files to embed in the Docker image go into a connector-cli/ sub-directory created for this purpose, while the configuration files go into a separate connection/ sub-directory.

$ mkdir connection
$ vi connection/databricks-sql.properties
$
$ curl -H "Authorization: Bearer $DG_TOKEN" -O "$DG_API_URL/connectivity/packages/desktop/latest/64bits.zip"
$ unzip -d dg-connector 64bits.zip
$
$ mkdir -p connector-cli
$ cp dg-connector/datagalaxy-cli-connector.jar connector-cli/
$ cp -r dg-connector/lib connector-cli/
$ cp -r dg-connector/conf connector-cli/
$
$ mkdir connector-cli/plugins
$ curl -H "Authorization: Bearer $DG_TOKEN" -o connector-cli/plugins/databricks.jar "$DG_API_URL/connectivity/packages/databricks/latest/contents.jar"
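Without its --fail option, curl saves whatever the server returns, so an expired token or a wrong URL can leave an error message saved under the name 64bits.zip or databricks.jar. That only fails later, when unzipping or when the connector loads its plugins. Since ZIP archives and .jar files both start with the two bytes PK, a quick check can catch this early. is_zip and check_downloads are hypothetical helpers, sketched here in POSIX sh:

```shell
# Hypothetical helpers: ZIP archives, and therefore .jar files, start with the
# two bytes "PK". An error body saved by curl under a .zip or .jar name does not.
is_zip() {
  [ -f "$1" ] && [ "$(head -c 2 "$1")" = "PK" ]
}

# Report each downloaded file and return a non-zero status if any of them is
# not a valid archive.
check_downloads() {
  status=0
  for f in "$@"; do
    if is_zip "$f"; then
      echo "OK: $f"
    else
      echo "Not a ZIP/JAR archive (check the token and the URL): $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run it from the working directory, for example: `check_downloads 64bits.zip connector-cli/plugins/*.jar`.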

CLI mode tests

To make sure that the connector works properly in CLI mode before building our image, we can run a few tests from our connector-cli/ directory.

We can check that:

  • Our Java version is compatible with the connector prerequisites (this check is not needed for the Docker image, since we choose its base image to meet those requirements)
  • The --version option of the connector CLI returns the version of the connector we have downloaded
  • The plugins command of the connector CLI lists all the plugins we have downloaded
  • The connector CLI imports the metadata from our system (here Databricks) into our DataGalaxy platform as expected.
$ cd connector-cli
$ CWD=$(pwd)
$ java -version
$ java -classpath "$CWD/datagalaxy-cli-connector.jar:$CWD/lib/*:$CWD/plugins/*" \
   --add-opens=java.base/java.nio=ALL-UNNAMED  \
   com.datagalaxy.connector.desktop.cli.Main --version
$ java -classpath "$CWD/datagalaxy-cli-connector.jar:$CWD/lib/*:$CWD/plugins/*"  \
   --add-opens=java.base/java.nio=ALL-UNNAMED  \
   com.datagalaxy.connector.desktop.cli.Main plugins
$ java -classpath "$CWD/datagalaxy-cli-connector.jar:$CWD/lib/*:$CWD/plugins/*" \
   -Ddatagalaxy.configurationPath="$CWD/conf" \
   --add-opens=java.base/java.nio=ALL-UNNAMED  \
   com.datagalaxy.connector.desktop.cli.Main \
   import-api \
      --config ../connection/databricks-sql.properties \
      --server-url "$DG_API_URL" \
      --project-name "$WS" \
      --source-name "Databricks" \
      --create-source \
      --password "$DATABRICKS_PWD" \
      --token-value "$DG_TOKEN"
Note: the arguments you can pass to the connector CLI are described in this documentation.
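The java -classpath commands above repeat the same classpath and JVM options. If you run many tests, a small shell function can factor them out. The sketch below is hypothetical (build_classpath and dg_cli are not provided by DataGalaxy): it reuses the classpath, the --add-opens option and the datagalaxy.configurationPath property shown above, and assumes that CWD holds the absolute path of the connector-cli/ directory. It also assumes that passing datagalaxy.configurationPath is harmless for the --version and plugins commands, which the commands above run without it:

```shell
# Hypothetical helper: print the connector CLI classpath for a directory that
# contains datagalaxy-cli-connector.jar, lib/ and plugins/. The "*" entries are
# Java classpath wildcards: they stay literal here and are expanded by the JVM.
build_classpath() {
  printf '%s\n' "$1/datagalaxy-cli-connector.jar:$1/lib/*:$1/plugins/*"
}

# Hypothetical wrapper: run the connector CLI with the options used above and
# forward every argument to the Main class. Assumes CWD holds the absolute path
# of connector-cli/, and that datagalaxy.configurationPath is harmless for
# the --version and plugins commands.
dg_cli() {
  java -classpath "$(build_classpath "$CWD")" \
    -Ddatagalaxy.configurationPath="$CWD/conf" \
    --add-opens=java.base/java.nio=ALL-UNNAMED \
    com.datagalaxy.connector.desktop.cli.Main "$@"
}
```

For example, from the connector-cli/ directory, run `CWD=$(pwd)`, then `dg_cli --version`, `dg_cli plugins`, or `dg_cli import-api` followed by the import-api arguments shown above.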

Building the image

Now that we have checked that the connector CLI works as expected, we can embed all of this in a Docker image.

To make the image convenient to use, we create an entrypoint.sh file in the connector-cli/ directory. This file contains the command line that runs the connector CLI. Here is its content:

#!/bin/sh

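# exec replaces the shell with the JVM, so that Java runs as PID 1 and receives
# the stop signal sent by "docker stop".
exec java $JAVA_OPTS $JAVA_EXTRA_OPTS \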
  -classpath "datagalaxy-cli-connector.jar:lib/*:plugins/*" \
  com.datagalaxy.connector.desktop.cli.Main "$@"
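Two quoting choices in entrypoint.sh matter. $JAVA_OPTS and $JAVA_EXTRA_OPTS are left unquoted so that the shell splits them into one argument per Java option, while "$@" is quoted so that each argument given to the container reaches the connector CLI unchanged, even when it contains spaces. The illustration below shows the difference; count_args is a throwaway helper and "My Source" an example value:

```shell
# Illustration only: print how many arguments a command receives.
count_args() {
  echo "$#"
}

JAVA_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED -Dlog4j2.formatMsgNoLookups=true"

count_args $JAVA_OPTS    # unquoted: split into 2 separate JVM options, prints 2
count_args "$JAVA_OPTS"  # quoted: one single malformed option, prints 1

# Simulate the arguments given after the image name in "docker run".
set -- import-api --source-name "My Source"
count_args "$@"          # "$@" keeps "My Source" as one argument, prints 3
count_args $*            # unquoted $* would split it, prints 4
```

The flip side of leaving the variables unquoted is that an individual Java option passed through JAVA_OPTS or JAVA_EXTRA_OPTS cannot contain spaces.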

Then, we create the Dockerfile used to build the image. It is based on the Eclipse Temurin JRE 17 Alpine image, which provides a Java version that meets the connector's requirements. Here is the content of this file:

FROM eclipse-temurin:17-jre-alpine

RUN addgroup -g 1001 datagalaxy && adduser -D -G datagalaxy -u 1001 datagalaxy

COPY conf/* /etc/datagalaxy-connector/
COPY lib/* /opt/datagalaxy-connector/lib/
COPY plugins/* /opt/datagalaxy-connector/plugins/
COPY datagalaxy-cli-connector.jar /opt/datagalaxy-connector/
COPY entrypoint.sh /opt/datagalaxy-connector/
RUN chmod a+x /opt/datagalaxy-connector/entrypoint.sh

USER datagalaxy
WORKDIR /opt/datagalaxy-connector

ENV DIR=/opt/datagalaxy-connector
ENV JAVA_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true --illegal-access=warn -Dlogback.configurationFile=/etc/datagalaxy-connector/logback.xml -Ddatagalaxy.configurationPath=/etc/datagalaxy-connector -Dlog4j2.formatMsgNoLookups=true"
ENV JAVA_EXTRA_OPTS=""
ENTRYPOINT ["./entrypoint.sh"]
CMD ["--version"]

We now have all of the necessary elements to build the image:

$ ls
conf  datagalaxy-cli-connector.jar  Dockerfile  entrypoint.sh  lib  plugins
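Before building, you can check that the build context contains what the COPY instructions of the Dockerfile expect, including at least one plugin .jar. The sketch below is a hypothetical POSIX sh helper; its list of paths mirrors the ls output above:

```shell
# Hypothetical helper: check that a Docker build context contains every path the
# Dockerfile COPY instructions rely on, plus at least one plugin .jar.
check_build_context() {
  dir=${1:-.}
  status=0
  for path in Dockerfile entrypoint.sh datagalaxy-cli-connector.jar conf lib plugins; do
    if [ ! -e "$dir/$path" ]; then
      echo "Missing from build context: $path" >&2
      status=1
    fi
  done
  # Inside a function, "set --" only replaces the function's own arguments.
  # If no .jar matches, the pattern stays literal and the -e test fails.
  set -- "$dir"/plugins/*.jar
  if [ ! -e "$1" ]; then
    echo "No plugin .jar found in $dir/plugins/" >&2
    status=1
  fi
  return "$status"
}
```

For example, from the connector-cli/ directory: `check_build_context . && docker build -t connector-cli:latest .`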

We can launch the image build:

$ docker build -t connector-cli:latest .
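If you rebuild the image on a regular basis to keep the connector and its plugins up to date, giving each build a dated tag in addition to latest makes it easier to tell builds apart and to roll back. docker build accepts several -t options. The build_image function below is a hypothetical convenience, the YYYYMMDD date format is only a suggestion, and setting DOCKER=echo prints the command instead of running it:

```shell
# Hypothetical helper: build the image from the current directory with two tags,
# a dated one (for example connector-cli:YYYYMMDD) and "latest".
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead.
build_image() {
  name=${1:-connector-cli}
  tag=$(date +%Y%m%d)
  "${DOCKER:-docker}" build -t "$name:$tag" -t "$name:latest" .
}
```

For example, from the connector-cli/ directory: `build_image connector-cli`.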

Image tests

Just like we did for the connector CLI, we can check that the image works as intended.

We can check that:

  • Running the image without arguments returns the version of the embedded connector
  • Running the plugins command lists the embedded plugins
  • A full metadata import from our system (here Databricks) to DataGalaxy runs as expected.
$ cd ..
$ docker run --rm connector-cli:latest
$ docker run --rm connector-cli:latest plugins
$ docker run --rm \
   -v "$(pwd)/connection:/etc/datagalaxy-connector/connection" \
   connector-cli:latest \
   import-api \
      --config /etc/datagalaxy-connector/connection/databricks-sql.properties \
      --server-url "$DG_API_URL" \
      --project-name "$WS" \
      --source-name "Databricks" \
      --create-source \
      --password "$DATABRICKS_PWD" \
      --token-value "$DG_TOKEN"
Note: to pass the connection .properties files to the container, we use a mount point and then point the configuration (--config) to the path of the relevant file inside the container. You may need other mount points, for example for the output files if you use the import-csv command, or to pass your DataGalaxy API token file to the container. Adjust this model to your context, whether you prefer a single mount point, several of them, or other modifications.
Note: because entrypoint.sh forwards all its arguments ("$@") to the connector CLI, every argument you add after the image name reaches the connector CLI. The supported arguments are therefore the same, except that the paths of the .properties files must be paths inside the container, as explained above.
Note: you may have noticed that entrypoint.sh uses a JAVA_EXTRA_OPTS variable, which the Dockerfile leaves empty. This variable lets you pass additional Java options to the container, for example to set a custom memory size, as explained in the frequently asked questions of the connector CLI documentation. Use the -e option of the Docker client to set its value.
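To avoid retyping the mount point and environment options on every run, the docker run commands above can be wrapped in a shell function. The sketch below is hypothetical (dg_docker and the DOCKER override are not part of the image). It must be called from the directory that contains connection/, and it forwards JAVA_EXTRA_OPTS and every argument to the container. Setting DOCKER=echo prints the command instead of running it, which is handy to review it:

```shell
# Hypothetical wrapper: run connector-cli:latest with ./connection mounted where
# the container expects it, forward JAVA_EXTRA_OPTS (empty if unset) and pass
# every argument to the image entrypoint.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead.
dg_docker() {
  "${DOCKER:-docker}" run --rm \
    -v "$(pwd)/connection:/etc/datagalaxy-connector/connection" \
    -e JAVA_EXTRA_OPTS="${JAVA_EXTRA_OPTS:-}" \
    connector-cli:latest "$@"
}
```

For example, with a larger heap (-Xmx2g is only an illustrative value; size it to your own needs): `export JAVA_EXTRA_OPTS="-Xmx2g"`, then `dg_docker import-api --config /etc/datagalaxy-connector/connection/databricks-sql.properties` followed by the other import-api arguments shown above.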

Full script

$ export DG_TOKEN=ey...
$ export DATABRICKS_PWD=xy...
$ export DG_API_URL=https://myinstance.api.datagalaxy.com/v2
$ export WS="MyWorkspace"
$
$ mkdir connection
$ vi connection/databricks-sql.properties
$
$ curl -H "Authorization: Bearer $DG_TOKEN" -O "$DG_API_URL/connectivity/packages/desktop/latest/64bits.zip"
$ unzip -d dg-connector 64bits.zip
$
$ mkdir -p connector-cli
$ cp dg-connector/datagalaxy-cli-connector.jar connector-cli/
$ cp -r dg-connector/lib connector-cli/
$ cp -r dg-connector/conf connector-cli/
$
$ mkdir connector-cli/plugins
$ curl -H "Authorization: Bearer $DG_TOKEN" -o connector-cli/plugins/databricks.jar "$DG_API_URL/connectivity/packages/databricks/latest/contents.jar"
$
$ cd connector-cli
$ CWD=$(pwd)
$ java -version
$ java -classpath "$CWD/datagalaxy-cli-connector.jar:$CWD/lib/*:$CWD/plugins/*" \
   --add-opens=java.base/java.nio=ALL-UNNAMED  \
   com.datagalaxy.connector.desktop.cli.Main --version
$ java -classpath "$CWD/datagalaxy-cli-connector.jar:$CWD/lib/*:$CWD/plugins/*"  \
   --add-opens=java.base/java.nio=ALL-UNNAMED  \
   com.datagalaxy.connector.desktop.cli.Main plugins
$ java -classpath "$CWD/datagalaxy-cli-connector.jar:$CWD/lib/*:$CWD/plugins/*" \
   -Ddatagalaxy.configurationPath="$CWD/conf" \
   --add-opens=java.base/java.nio=ALL-UNNAMED  \
   com.datagalaxy.connector.desktop.cli.Main \
   import-api \
      --config ../connection/databricks-sql.properties \
      --server-url "$DG_API_URL" \
      --project-name "$WS" \
      --source-name "Databricks" \
      --create-source \
      --password "$DATABRICKS_PWD" \
      --token-value "$DG_TOKEN"
$ 
$ vi entrypoint.sh
$ vi Dockerfile
$ docker build -t connector-cli:latest .
$
$ cd ..
$ docker run --rm connector-cli:latest
$ docker run --rm connector-cli:latest plugins
$ docker run --rm \
   -v "$(pwd)/connection:/etc/datagalaxy-connector/connection" \
   connector-cli:latest \
   import-api \
      --config /etc/datagalaxy-connector/connection/databricks-sql.properties \
      --server-url "$DG_API_URL" \
      --project-name "$WS" \
      --source-name "Databricks" \
      --create-source \
      --password "$DATABRICKS_PWD" \
      --token-value "$DG_TOKEN"

File entrypoint.sh

#!/bin/sh

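# exec replaces the shell with the JVM, so that Java runs as PID 1 and receives
# the stop signal sent by "docker stop".
exec java $JAVA_OPTS $JAVA_EXTRA_OPTS \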
  -classpath "datagalaxy-cli-connector.jar:lib/*:plugins/*" \
  com.datagalaxy.connector.desktop.cli.Main "$@"

File Dockerfile

FROM eclipse-temurin:17-jre-alpine

RUN addgroup -g 1001 datagalaxy && adduser -D -G datagalaxy -u 1001 datagalaxy

COPY conf/* /etc/datagalaxy-connector/
COPY lib/* /opt/datagalaxy-connector/lib/
COPY plugins/* /opt/datagalaxy-connector/plugins/
COPY datagalaxy-cli-connector.jar /opt/datagalaxy-connector/
COPY entrypoint.sh /opt/datagalaxy-connector/
RUN chmod a+x /opt/datagalaxy-connector/entrypoint.sh

USER datagalaxy
WORKDIR /opt/datagalaxy-connector

ENV DIR=/opt/datagalaxy-connector
ENV JAVA_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true --illegal-access=warn -Dlogback.configurationFile=/etc/datagalaxy-connector/logback.xml -Ddatagalaxy.configurationPath=/etc/datagalaxy-connector -Dlog4j2.formatMsgNoLookups=true"
ENV JAVA_EXTRA_OPTS=""
ENTRYPOINT ["./entrypoint.sh"]
CMD ["--version"]

Frequently asked questions

Configuring a proxy

It is possible to configure a proxy directly with the Docker client, using the file `~/.docker/config.json`.

You can find out more in the official documentation: https://docs.docker.com/network/proxy/#configure-the-docker-client
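The Docker client proxy configuration passes the proxy to containers as environment variables such as HTTP_PROXY and HTTPS_PROXY. The JVM's built-in networking does not read these variables by default, so if the connector does not pick them up, an alternative is to pass the standard Java proxy system properties through JAVA_EXTRA_OPTS. This is a sketch under an assumption: it supposes that the connector's HTTP client honours these JVM-wide properties, which this documentation does not confirm. java_proxy_opts is a hypothetical helper and proxy.example.com:3128 a placeholder:

```shell
# Hypothetical helper: print the standard Java networking system properties for
# an HTTP and HTTPS proxy. Assumption: the connector's HTTP client honours these
# JVM-wide properties.
java_proxy_opts() {
  host=$1
  port=$2
  printf '%s\n' "-Dhttp.proxyHost=$host -Dhttp.proxyPort=$port -Dhttps.proxyHost=$host -Dhttps.proxyPort=$port"
}
```

For example: `docker run --rm -e JAVA_EXTRA_OPTS="$(java_proxy_opts proxy.example.com 3128)" connector-cli:latest import-api` followed by your import-api arguments.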



