Globus

Current VT Status

As of Fall 2023, Virginia Tech has an institutional subscription with Globus coordinated through ARC. In addition, ARC has established a Globus Data Transfer Node which provides access to /projects storage. Individuals are also able to create personal Globus accounts and use Globus Connect Personal software on ARC systems.

Make a /projects directory visible to Globus

The /projects directories can be made visible (“shared”) via Globus. The owner (usually the PI) of the /projects directory can enable sharing via ARC’s ColdFront allocation management site.

  1. The PI logs in to ColdFront, navigates to the corresponding “Project (free) (Storage)” allocation, checks the box for “Share via Globus”, and then clicks “update”. The change takes effect immediately.

  2. Any member of the associated group can then log in to Globus.org and access/manage files there. Under “File Manager”, search for “Virginia Tech” and select the “Virginia Tech ARC Globus Projects Directories” collection; the shared directory should then be visible.

Filesets with "Lots of Small Files" (LoSF) are the worst-case scenario for most file systems and transfer tools. For stability and performance, it is vital that such LoSF filesets be packaged into archives via tools such as `tar`. Attempting transfers of LoSF filesets via Globus is known to cause very poor performance and faults such as `ENDPOINT_TOO_BUSY`.
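As an illustration of the packaging step, here is a minimal sketch that bundles a directory of small files into a single archive before transfer. The scratch directory, the `many_small_files` name, and the 100 sample files are hypothetical stand-ins for your own data; only standard `tar` options are used.

```shell
#!/usr/bin/env bash
# Sketch: bundle a directory of many small files into one archive
# so that Globus only has to move a single large file.
set -euo pipefail

# Hypothetical demo data: 100 tiny files in a scratch directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/many_small_files"
for i in $(seq 1 100); do
    echo "sample $i" > "$workdir/many_small_files/file_$i.txt"
done

# Package the whole directory as one gzip-compressed tar archive.
# -C changes into $workdir first so paths inside the archive stay relative.
tar -czf "$workdir/many_small_files.tar.gz" -C "$workdir" many_small_files

# Count the .txt entries in the archive to confirm nothing was missed.
archived_count=$(tar -tzf "$workdir/many_small_files.tar.gz" | grep -c '\.txt$')
echo "archived files: $archived_count"
```

The resulting `.tar.gz` file is what would be transferred with Globus; it can be unpacked at the destination with `tar -xzf`.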

Globus Connect Personal (GCP)

GCP can be used to connect a device or storage location you own to the Globus network. For example, you can make your /home/<username> or /projects/<groupname> group-shared directory accessible to you when you log into the Globus.org web application. When you do this, it shows up in your “Collections”. Then you can browse, upload, download, and coordinate transfers among other collections you have in the Globus web application. This can be a very powerful and enabling way to manage data among multiple institutions. Detailed information on using GCP is available on Globus’s website.

Using GCP on ARC Systems

Here is an outline of the steps you’ll need to take to use GCP on an ARC cluster. These are derived from the more complete instructions provided by Globus.

Connecting GCP to Globus requires that you have a Globus account and that you can access the Globus web application, so the first step is to

  1. Log in to https://globus.org in a web browser.

On the Tinkercliffs cluster, a software module for GCP is provided.

module load tinkercliffs-rome/GlobusConnectPersonal

By loading the module, the program globusconnectpersonal is made available to you, but it still needs to be configured.

Configuring

  1. From the command line on an ARC system (e.g., a Tinkercliffs login node), load the module and then run the command globusconnectpersonal. If you have not already completed configuration, it will provide you with a URL and walk you through the next two steps.

  2. Authenticate the GCP client with the Globus web application by copying the provided URL into your browser. This will prompt you for some setup information and then provide an “auth code”.

  3. Copy the “auth code” from your browser and paste it into your command-line shell, which should have a prompt waiting for this code.

  4. (optional) Edit the file ~/.globusonline/lta/config-paths to configure which directories GCP should use and whether or not to present them as writable in the Globus system.

Note

Any text editor can be used to modify the config file. If you don’t already have a preferred command-line text editor, then nano may be a good choice.

Here is an example config-paths file. It is a header-less CSV (comma separated values) file. The three fields are

  1. the directory (i.e., “folder” or “path”) to connect

  2. 0 or 1 indicating whether “Globus sharing” is disabled or enabled, respectively. “0” is the only viable option while VT does not have an institutional subscription to Globus.

  3. 0 or 1 indicating whether the directory is “not writable” or “writable”, respectively, in the Globus interface.

Note

Writability in Globus also requires that the writing user actually has write permissions on the filesystem. Marking a directory as writable for GCP does not override the file/directory permissions on ARC systems.

~/,0,1
/projects/proj_name,0,0

Here, two directories (~ and /projects/proj_name) are being made available to GCP. The last field marks the home directory as writable (1) and /projects/proj_name as not writable (0).

Note

~ is a shortcut for /home/<username>
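The three-field layout described above can be checked mechanically before starting GCP. The following is a minimal sketch, not part of GCP itself: `check_config_paths` is a hypothetical helper name, and it assumes directory names contain no commas. It writes the sample entries to a temporary file rather than touching a real ~/.globusonline/lta/config-paths.

```shell
#!/usr/bin/env bash
# Sketch: verify that each non-empty line has exactly three comma-separated
# fields and that the second and third fields are 0 or 1.
# check_config_paths is a hypothetical helper, not a GCP command.
check_config_paths() {
    awk -F, '
        NF == 0 { next }   # skip completely empty lines
        NF != 3 || $2 !~ /^[01]$/ || $3 !~ /^[01]$/ {
            printf "line %d looks malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Write the sample entries from this page to a temporary file.
sample=$(mktemp)
printf '%s\n' '~/,0,1' '/projects/proj_name,0,0' > "$sample"

if check_config_paths "$sample"; then
    result="ok"
else
    result="malformed"
fi
echo "$result"
```

Pointing the function at your own file, for example `check_config_paths ~/.globusonline/lta/config-paths`, would report any line that does not match the layout.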

Installing GCP on Linux

Note

These are derived from the more complete instructions provided by Globus.

  1. Verify that you can log in to https://globus.org. If you do not already have a Globus account, you will need to create one.

  2. Download and extract the latest GCP, then run the setup. The ls command is needed to determine the version number you have downloaded which you must specify to cd to the correct directory:

# Download latest GCP
wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
# Extract the compressed tar file
tar xzf globusconnectpersonal-latest.tgz
# Determine the name and version of the extracted directory
ls -ld globusconnect*
# Change directory to the newly extracted one (fill in the version shown by ls)
cd globusconnectpersonal-__.__.__
# This will run the GCP setup if you have not already done so
./globusconnectpersonal

  3. Authenticate the GCP client with the Globus website. The last step above should have provided a URL for you to copy-paste into a web browser. Navigating to that URL will connect the GCP you have installed with the Globus web app.

  4. Complete the authentication. Review the details on the page loaded by that URL, configure as desired, and you will be provided with an “auth code” when complete. Copy that from your browser and paste it into the shell, which has prompted for it and is awaiting your input.

-----
Enter the auth code: ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ

== starting endpoint setup


Input a value for the Endpoint Name: tc2
registered new endpoint, id: 5874dee8-edcf-11ed-9bb3-c9bb788c490e
setup completed successfully

Will now start globusconnectpersonal in GUI mode
Graphical environment not detected

To launch Globus Connect Personal in CLI mode, use
  globusconnectpersonal -start

Or, if you want to force the use of the GUI, use
  globusconnectpersonal -gui

  5. Start the client to make your files available to you in the Globus web app.

globusconnectpersonal -start

  6. (optional) Edit the configuration file ~/.globusonline/lta/config-paths to add other directories and set permissions.
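
For the download-and-extract step above, the name of the extracted directory can also be read from the archive itself instead of being typed by hand after running ls. This is a minimal sketch: `archive_top_dir` is a hypothetical helper name, and it assumes every entry in the archive sits under a single top-level directory with no leading "./".

```shell
#!/usr/bin/env bash
# Sketch: print the top-level directory name stored in a .tgz archive.
# Assumes all entries share one top-level directory and that entry
# names are not prefixed with "./".
archive_top_dir() {
    # List the archive, take the first entry, keep the part before the first "/".
    tar -tzf "$1" | head -n 1 | cut -d/ -f1
}
```

With the downloaded file in the current directory, `cd "$(archive_top_dir globusconnectpersonal-latest.tgz)"` would then change into the extracted directory without the version number having to be filled in manually.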