Globus

Introduction

Globus is infrastructure designed for transferring large volumes of data. At Virginia Tech, ARC maintains an institutional Globus Standard Subscription, coordinated through the Globus Connect Server (GCS). ARC also operates a dedicated Globus Data Transfer Node that provides access to the /projects storage system and is subscribed to the High Assurance tier for safe transfer of Protected Health Information (PHI), Personally Identifiable Information (PII), and Controlled Unclassified Information (CUI). In addition, individuals who have their own Globus license may create personal Globus accounts and use Globus Connect Personal (GCP) on ARC systems or other platforms. Among other features, Globus provides fault tolerance for large data transfers.

Prerequisites

  1. Log in to Globus: Confirm that you can log in at https://globus.org. If you do not already have a Globus account, you will need to create one.

  2. Enable Globus for your ARC project: If you plan to use Virginia Tech ARC’s Globus license, you must enable Globus sharing for your /projects directory. The owner (usually the PI) of the /projects allocation can do this through ARC’s ColdFront management site, as follows:

    • Open your project storage allocation in ColdFront.

    • Check the box for “Share via Globus” and click Update.

    • The change takes effect immediately.

  3. File and directory permissions for your ARC project: Enabling Globus sharing for your /projects directory in ColdFront does not change who controls permissions; you still decide which files and directories within /projects are accessible through Globus. For example, you can keep a file from being copied out of your area with Globus by unsetting its read bit, and you can keep a file from being overwritten by an incoming Globus transfer by unsetting its write bit. (These permission bits are set and unset with the Unix chmod command.)
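The permission changes described in item 3 can be tried out safely first. The following is a minimal sketch that works on throwaway files in a temporary directory rather than in a real /projects area. The file names are hypothetical, and clearing the bits for every class (a-r, a-w) is an assumption; which bits matter in practice depends on the account and group through which Globus reaches the files.

```shell
#!/usr/bin/env bash
# Demonstration on scratch files; the names below are hypothetical.
demo_dir=$(mktemp -d)
touch "$demo_dir/private_notes.txt" "$demo_dir/final_results.csv"

# Start both files from a known mode (rw-r--r--).
chmod 644 "$demo_dir/private_notes.txt" "$demo_dir/final_results.csv"

# Clear the read bits so the file's contents cannot be read, and so not copied out.
chmod a-r "$demo_dir/private_notes.txt"

# Clear the write bits so the file cannot be overwritten by an incoming transfer.
chmod a-w "$demo_dir/final_results.csv"

# Inspect the resulting permission strings.
ls -l "$demo_dir"
```

Running chmod with a+r or u+w restores the bits that were cleared.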

When transferring data with Globus, at least one endpoint must be covered by an active Globus subscription (institutional or personal license).  
- If your institution already has a license, you can transfer directly to `/home` or `/projects` using Globus Connect Personal (GCP).  
- If you intend to use Virginia Tech ARC’s Globus license, the ARC endpoint must be `/projects`, and it must be enabled in ColdFront as described above.

Transferring to/from ARC using VT ARC’s Globus license (/projects)

If you plan to use VT ARC’s Globus license, transfers are only supported to the /projects directory with “Share via Globus” enabled (as described in the Prerequisites section). Once enabled, any member of the associated project group can:

  1. Log in to https://globus.org

  2. In the File Manager, search for “Virginia Tech” to locate the “Virginia Tech ARC Globus Projects Directories” (GCS) collection. Select this collection, and the shared directory will be visible.

    • All /projects directories with “Share via Globus” enabled will appear.

    • Access permissions remain restricted to project group members, the same as on ARC clusters.

  3. Transfer files between your /projects directory endpoint and another endpoint. The other endpoint can be any GCS or GCP endpoint and does not require a license, since you are using VT ARC’s Globus license. (If the File Manager does not show two endpoint panels side by side, go to the “Panels” area at the upper right and choose the middle of the three icons.)

Transferring to/from ARC using your own Globus license

If you plan to use your own Globus license, you can transfer data to both /home and /projects directories using Globus Connect Personal (GCP).

Globus Connect Personal (GCP)

GCP can be used to connect a device or storage location you own to the Globus network. For example, you can make your /home/<username> or /projects/<groupname> group-shared directory accessible to you when you log into the https://globus.org web application. When you do this, it shows up in your “Collections.” You can then browse, upload, download, and coordinate transfers among other collections in the Globus web application. Detailed information on using GCP is available on Globus’ website.

Using GCP on ARC Systems

Here is an outline of the steps you’ll need to take to use GCP on an ARC cluster. These are derived from the more complete instructions provided by Globus.

Connecting GCP to Globus requires that you have a Globus account and access to the Globus web application.

  1. Log in to https://globus.org in a web browser.

  2. On ARC systems, a software module for GCP is provided:

    module load GlobusConnectPersonal
    

    By loading the module, the program globusconnectpersonal is made available, but it still needs to be configured.

Configuring

  1. From the command line on an ARC system (e.g., a Tinkercliffs login node), load the module and then run the command globusconnectpersonal. If you have not already completed configuration, it will provide you with a URL and guide you through the next steps.

  2. Authenticate the GCP client with the Globus web application by copying the provided URL into your browser. This will prompt you for setup information and then provide an “auth code.”

  3. Copy the “auth code” from your browser and paste it into your command-line shell, which should be waiting for this input.

  4. (optional) Edit the file ~/.globusonline/lta/config-paths to configure which directories GCP should use and whether or not to present them as writable in the Globus system.

Note

Any text editor can be used to modify the config file. If you don’t already have a preferred command-line text editor, nano may be a good choice.

Here is an example config-paths file. It is a header-less CSV (comma-separated values) file.

~/,0,1
/projects/proj_name,0,0

The three fields are:

  1. the directory (path) to connect

  2. 0 or 1 indicating whether “Globus sharing” is disabled or enabled, respectively (0 is the only viable option while VT does not have an institutional subscription to Globus sharing)

  3. 0 or 1 indicating whether the directory is “not writable” or “writable,” respectively, in the Globus interface

Note

Writability in Globus also requires that the writing user actually have write permissions on the filesystem. Marking a directory as writable in GCP does not override ARC file/directory permissions.

In the example above, two directories (~ and /projects/proj_name) are being made available to GCP.

Note

~ is a shortcut for /home/<username>.
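To make the three-field format concrete, here is a minimal sketch that writes the example entries to a scratch copy of config-paths and prints what each line means. It deliberately uses a temporary file instead of ~/.globusonline/lta/config-paths. The variable name cfg and the function name describe_config_paths are hypothetical, and the parsing assumes that no path contains a comma.

```shell
#!/usr/bin/env bash
# Scratch copy; the real file lives at ~/.globusonline/lta/config-paths.
cfg=$(mktemp)

# Same entries as the example above, one per line: path,sharing,writable
printf '%s\n' '~/,0,1' '/projects/proj_name,0,0' > "$cfg"

# Read each line and describe it. Assumes no commas inside the path itself.
describe_config_paths() {
    while IFS=, read -r dir sharing writable; do
        [ -n "$dir" ] || continue   # skip blank lines
        if [ "$writable" = "1" ]; then access="writable"; else access="read-only"; fi
        if [ "$sharing" = "1" ]; then share="sharing on"; else share="sharing off"; fi
        printf '%s: %s, %s\n' "$dir" "$access" "$share"
    done < "$1"
}

describe_config_paths "$cfg"
# prints:
#   ~/: writable, sharing off
#   /projects/proj_name: read-only, sharing off
```

Pointing the function at a copy of your own config-paths file is a quick way to double-check which directories you have marked writable before starting GCP.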

Installing GCP on Linux (Non-ARC side such as a personal PC)

Note

These steps are derived from the more complete instructions provided by Globus.

  1. Verify that you can log in to https://globus.org. If you do not already have a Globus account, you will need to create one.

  2. Download and extract the latest GCP, then run the setup. The ls command is used to determine the version number you have downloaded so you can cd to the correct directory:

# Download latest GCP
wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
# Extract the compressed tar file
tar xzf globusconnectpersonal-latest.tgz
# Determine the name and version of the extracted directory
ls -ld globusconnect*
# Change directory to the newly extracted one (fill in the version number shown by ls)
cd globusconnectpersonal-__.__.__
# This will run the GCP setup if you have not already done so
./globusconnectpersonal

  3. Authenticate the GCP client with the Globus website. The previous step should have provided a URL to copy-paste into a web browser. Navigating to that URL will connect the installed GCP client with the Globus web app.

  4. Complete the authentication. Review the details on the page loaded by that URL, configure as desired, and you will be provided with an “auth code.” Copy that from your browser and paste it into the shell, which is awaiting your input.

-----
Enter the auth code: ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ

== starting endpoint setup

Input a value for the Endpoint Name: tc2
registered new endpoint, id: 5874dee8-edcf-11ed-9bb3-c9bb788c490e
setup completed successfully

Will now start globusconnectpersonal in GUI mode
Graphical environment not detected

To launch Globus Connect Personal in CLI mode, use
  globusconnectpersonal -start

Or, if you want to force the use of the GUI, use
  globusconnectpersonal -gui

  5. Start the client to make your files available in the Globus web app:

globusconnectpersonal -start

  6. (optional) Edit the configuration (the config-paths file described above) to add other directories and set permissions.
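The ls and cd steps used after extracting the archive can also be scripted. The sketch below reads the top-level directory name from the archive listing, so the version number does not have to be typed by hand. The function name gcp_top_dir is hypothetical, and the approach assumes that the archive unpacks into a single top-level directory whose entries are not prefixed with "./".

```shell
#!/usr/bin/env bash
# Print the top-level directory contained in a .tgz archive.
# Assumes every entry sits under one top-level directory (no "./" prefix).
gcp_top_dir() {
    # List the archive, keep only the first entry, and drop everything
    # from the first "/" onward.
    tar tzf "$1" | sed -n '1{s|/.*||;p;}'
}

# Example use after downloading and extracting the archive:
#   cd "$(gcp_top_dir globusconnectpersonal-latest.tgz)"
#   ./globusconnectpersonal
```

If the function prints something unexpected, fall back to the ls command shown in the steps above.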

Optimizing file transfer performance

Filesets with “Lots of Small Files” (LoSF) are the worst-case scenario for most file systems and transfer tools. For stability and performance, it is best to package such LoSF filesets into archives using tools such as tar. Attempting transfers of LoSF filesets via Globus can cause very poor performance and faults such as ENDPOINT_TOO_BUSY.
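As a rough illustration of packaging a small-file dataset before transfer, the sketch below builds a scratch directory of tiny files, bundles it into one compressed archive, and counts the files recorded in the archive. The directory and archive names are hypothetical; with a real dataset only the tar commands would be needed.

```shell
#!/usr/bin/env bash
# Build a throwaway dataset of 200 tiny files (a stand-in for a real LoSF fileset).
work=$(mktemp -d)
mkdir "$work/many_small_files"
for i in $(seq 1 200); do
    echo "sample $i" > "$work/many_small_files/part_$i.txt"
done

# Bundle the whole directory into a single compressed archive.
# -C keeps the paths inside the archive relative to $work.
tar czf "$work/many_small_files.tar.gz" -C "$work" many_small_files

# Sanity check: count the .txt files recorded in the archive.
archived=$(tar tzf "$work/many_small_files.tar.gz" | grep -c '\.txt$')
echo "archived $archived files into one archive"

# After the transfer, unpack on the destination with:
#   tar xzf many_small_files.tar.gz
```

Globus then has one large file to move instead of hundreds of small ones, which is the case it handles best.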