proxystore.connectors.globus
Globus transfer connector implementation.
GlobusEndpoint
GlobusEndpoint(
uuid: str,
endpoint_path: str,
local_path: str | None,
host_regex: str | Pattern[str],
)
Globus Collection endpoint configuration.
Defines the directory within the Globus Collection to be used for storage and transfer of files.
Tip
A Globus Collection may have a different mount point than what you would use when logged in to a system. The endpoint_path and local_path parameters are used as the mapping between the two. For example, if I created a directory bar/ within the foo project allocation on ALCF's Grand filesystem, the endpoint_path would be /foo/bar but the local_path would be /projects/foo/bar. Be sure to check that the two paths point to the same physical directory when instantiating this type.
Warning
The path should refer to a unique directory that ProxyStore can exclusively use. For example, do not use your $HOME directory and instead prefer a directory suitable for bulk data storage, such as a subdirectory of a project allocation (e.g., /projects/FOO/proxystore-globus-cache).
Parameters:
-
uuid
(str
) –UUID of the Globus Collection. This can be found by searching for the collection on app.globus.org.
-
endpoint_path
(str
) –Directory path within the Globus Collection to use for storing objects and transferring files. This path can be found via the File Manager on app.globus.org.
-
local_path
(str | None
) –The local path equivalent of endpoint_path. This may or may not be equal to endpoint_path, depending on the configuration of the Globus Collection. This is the path that you would ls when logged on to the system.
-
host_regex
(str | Pattern[str]
) –String or regular expression that matches the hostname where the Globus Collection exists. The host pattern is used by the
GlobusConnector
to determine what the "local" endpoint is when reading, writing, and transferring files.
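The mapping between endpoint_path and local_path described in the tip above can be sketched with the standard library alone. The helper `to_endpoint_path` below is hypothetical and not part of ProxyStore; it only illustrates how a file under local_path corresponds to a path inside the collection, using the ALCF paths from the tip. The file name `object-1` is a placeholder.

```python
from pathlib import PurePosixPath


def to_endpoint_path(
    local_file: str,
    local_path: str,
    endpoint_path: str,
) -> str:
    """Translate a file under local_path to its path inside the collection.

    Hypothetical helper for illustration only; not a ProxyStore function.
    """
    # relative_to() raises ValueError if local_file is not under local_path.
    relative = PurePosixPath(local_file).relative_to(local_path)
    return str(PurePosixPath(endpoint_path) / relative)


mapped = to_endpoint_path(
    '/projects/foo/bar/object-1',
    local_path='/projects/foo/bar',
    endpoint_path='/foo/bar',
)
print(mapped)
```

If the two directories do not refer to the same physical location, a translation like this yields a path that does not exist in the collection, which is why the tip recommends checking both paths.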
Source code in proxystore/connectors/globus.py
GlobusEndpoints
GlobusEndpoints(endpoints: Collection[GlobusEndpoint])
A collection of Globus endpoints.
Parameters:
-
endpoints
(Collection[GlobusEndpoint]
) –Iterable of
GlobusEndpoint
instances.
Raises:
-
ValueError
–If
endpoints
has length 0 or if multiple endpoints with the same UUID are provided.
from_dict
classmethod
Construct an endpoints collection from a dictionary.
Example:
```python
{
"endpoint-uuid-1": {
"host_regex": "host1-regex",
"endpoint_path": "/path/to/endpoint/dir",
"local_path": "/path/to/local/dir"
},
"endpoint-uuid-2": {
"host_regex": "host2-regex",
"endpoint_path": "/path/to/endpoint/dir",
"local_path": "/path/to/local/dir"
}
}
```
from_json
classmethod
from_json(json_file: str) -> GlobusEndpoints
Construct a GlobusEndpoints object from a JSON file.
The dict read from the JSON file will be passed to from_dict() and should match the format expected by from_dict().
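As a rough illustration of the file that from_json() expects, the sketch below writes a dictionary in the from_dict() format to a temporary JSON file and reads it back using only the standard library. It does not call ProxyStore. The UUIDs, regexes, and paths are the placeholders from the from_dict() example, and the check against `EXAMPLE_KEYS` assumes every entry carries the three keys shown there.

```python
import json
import os
import tempfile

# Same shape as the from_dict() example; all values are placeholders.
endpoints = {
    'endpoint-uuid-1': {
        'host_regex': 'host1-regex',
        'endpoint_path': '/path/to/endpoint/dir',
        'local_path': '/path/to/local/dir',
    },
    'endpoint-uuid-2': {
        'host_regex': 'host2-regex',
        'endpoint_path': '/path/to/endpoint/dir',
        'local_path': '/path/to/local/dir',
    },
}

# Assumption: each entry has exactly the keys shown in the example above.
EXAMPLE_KEYS = {'host_regex', 'endpoint_path', 'local_path'}

with tempfile.TemporaryDirectory() as tmp_dir:
    json_file = os.path.join(tmp_dir, 'endpoints.json')
    with open(json_file, 'w') as f:
        json.dump(endpoints, f, indent=4)

    # Reading the file back gives the dict that would go to from_dict().
    with open(json_file) as f:
        loaded = json.load(f)

# Collect any entries that are missing one of the example keys.
missing = {
    uuid: EXAMPLE_KEYS - set(config)
    for uuid, config in loaded.items()
    if EXAMPLE_KEYS - set(config)
}
print(missing)
```

In actual use, the path to such a file would be passed to GlobusEndpoints.from_json().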
dict
Convert the GlobusEndpoints to a dict.
Note that the GlobusEndpoints object can be reconstructed by passing the dict to from_dict().
get_by_host
get_by_host(host: str) -> GlobusEndpoint
Get endpoint by host.
Searches the endpoints for an endpoint whose host_regex matches host.
Parameters:
-
host
(str
) –Host to match.
Returns:
-
GlobusEndpoint
–Globus endpoint.
Raises:
-
ValueError
–If
host
does not match any of the endpoints.
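The lookup can be pictured with a small standalone sketch. It is not ProxyStore's implementation: the `host_patterns` mapping, the hostnames, and the `find_endpoint` function are hypothetical, and it assumes the whole hostname must match the pattern (`re.fullmatch`), which this page does not specify.

```python
import re

# Hypothetical mapping of collection UUID to host pattern.
host_patterns = {
    'endpoint-uuid-1': r'login\d+\.cluster-a\.example\.org',
    'endpoint-uuid-2': r'.*\.cluster-b\.example\.org',
}


def find_endpoint(host: str) -> str:
    """Return the UUID of the first endpoint whose pattern matches host."""
    for uuid, pattern in host_patterns.items():
        # Assumption: the entire hostname has to match the pattern.
        if re.fullmatch(pattern, host) is not None:
            return uuid
    # Mirrors the documented behavior of raising ValueError on no match.
    raise ValueError(f'Cannot find an endpoint matching host {host}')


print(find_endpoint('login3.cluster-a.example.org'))
```

Patterns that are too broad can match more than one system, so it helps to make each host_regex specific to the machine that hosts the collection.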
GlobusKey
Bases: NamedTuple
Key to object transferred with Globus.
Attributes:
-
filename
(str
) –Unique object filename.
-
task_id
(str | tuple[str, ...]
) –Globus transfer task IDs for the file.
__eq__
Match keys by filename only.
This is a hack around the fact that the task_id is not created until after the filename is, so there can be a state where the task_id is empty.
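A minimal sketch of filename-only equality on a NamedTuple follows. `KeySketch` is a hypothetical stand-in with the same two fields as GlobusKey; it is not the library's class, and the filenames and task IDs are made up.

```python
from typing import Any, NamedTuple, Tuple, Union


class KeySketch(NamedTuple):
    """Hypothetical stand-in for GlobusKey with the same two fields."""

    filename: str
    task_id: Union[str, Tuple[str, ...]]

    def __eq__(self, other: Any) -> bool:
        # Compare on filename only; task_id is ignored.
        if isinstance(other, KeySketch):
            return self.filename == other.filename
        return False

    def __ne__(self, other: Any) -> bool:
        # tuple defines its own __ne__, so override it to stay consistent.
        return not self == other


# A key whose transfer has not been submitted yet, and the same key later.
pending = KeySketch('object-1', '')
submitted = KeySketch('object-1', 'task-id-1')
print(pending == submitted)
```

With this definition, a key created before its transfer task exists still compares equal to the same key once the task ID has been filled in.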
GlobusConnector
GlobusConnector(
endpoints: (
GlobusEndpoints
| list[GlobusEndpoint]
| dict[str, dict[str, str]]
),
polling_interval: int = 1,
sync_level: (
int | Literal["exists", "size", "mtime", "checksum"]
) = "mtime",
timeout: int = 60,
clear: bool = True,
)
Globus transfer connector.
The GlobusConnector is similar to a FileConnector in that objects are saved to disk, but it also allows for the transfer of objects between remote file systems. Directories on separate file systems are kept in sync via Globus transfers. The GlobusConnector is useful when moving data between hosts that have a Globus Transfer endpoint but may have restrictions that prevent the use of other connectors (e.g., ports cannot be opened for using a RedisConnector).
Note
To use Globus for data transfer, Globus authentication needs to be performed with the proxystore-globus-auth CLI. If authentication is not performed before initializing a GlobusConnector, the program will prompt the user to perform authentication. This can result in unexpected program hangs while the constructor waits on the user to authenticate. Authentication only needs to be performed once per system.
Warning
The close() method will, by default, delete all of the provided directories that are kept in sync. Ensure that the provided directories are unique and only used by ProxyStore.
Parameters:
-
endpoints
(GlobusEndpoints | list[GlobusEndpoint] | dict[str, dict[str, str]]
) –Collection of directories across Globus Collection endpoints to keep in sync. If passed as a dict, the dictionary must match the format expected by GlobusEndpoints.from_dict(). Note that given n endpoints there will be n-1 Globus transfers per operation, so we suggest not using too many endpoints at the same time. That is, stored objects are transferred to all endpoints. If this behavior is not desired, use multiple connector instances, each with a different set of endpoints.
-
polling_interval
(int
, default:1
) –Interval in seconds to check if Globus Transfer tasks have finished.
-
sync_level
(int | Literal['exists', 'size', 'mtime', 'checksum']
, default:'mtime'
) –Globus Transfer sync level.
-
timeout
(int
, default:60
) –Timeout in seconds for waiting on Globus Transfer tasks.
-
clear
(bool
, default:True
) –Delete all directories specified in endpoints when close() is called to clean up files.
Raises:
-
GlobusAuthFileError
–If the Globus authentication file cannot be found.
-
ValueError
–If endpoints is of an incorrect type.
-
ValueError
–If fewer than two endpoints are provided.
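To make the n-1 transfers per operation concrete, here is a small sketch that lists the destinations for an object written at one endpoint. It assumes, as the endpoints description suggests, that an object stored locally is copied to every other endpoint. The `transfer_destinations` function and the UUIDs are hypothetical and do not come from ProxyStore.

```python
from __future__ import annotations


def transfer_destinations(
    endpoint_uuids: list[str],
    local_uuid: str,
) -> list[str]:
    """List the endpoints that must receive a copy of a local object.

    Hypothetical helper: with n endpoints it returns n-1 destinations.
    """
    if local_uuid not in endpoint_uuids:
        raise ValueError(f'{local_uuid} is not one of the endpoints')
    return [uuid for uuid in endpoint_uuids if uuid != local_uuid]


uuids = ['endpoint-uuid-1', 'endpoint-uuid-2', 'endpoint-uuid-3']
destinations = transfer_destinations(uuids, local_uuid='endpoint-uuid-1')
print(len(destinations), destinations)
```

Each additional endpoint adds one more destination per stored object, which is the reason for splitting unrelated endpoints across separate connector instances.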
close
close(clear: bool | None = None) -> None
Close the connector and clean up.
Warning
This will delete the directory at local_path
on each endpoint
by default.
Warning
This method should only be called at the end of the program when
the store will no longer be used, for example once all proxies
have been resolved. Calling close()
multiple times
can raise file not found errors.
Parameters:
-
clear
(bool | None
, default:None
) –Delete the user-provided directories on each endpoint. Overrides the default value of clear provided when the GlobusConnector was instantiated.
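The interaction between the constructor's clear value and the clear argument of close() can be sketched with a small stand-in class. `ClearSketch` is hypothetical and only mimics the documented override and deletion behavior on a local temporary directory; it performs no Globus operations and is not how the connector is implemented.

```python
import os
import shutil
import tempfile
from typing import Optional


class ClearSketch:
    """Hypothetical stand-in that mimics only the documented clear behavior."""

    def __init__(self, local_path: str, clear: bool = True) -> None:
        self.local_path = local_path
        self.clear = clear

    def close(self, clear: Optional[bool] = None) -> None:
        # An explicit argument overrides the value given at construction.
        clear = self.clear if clear is None else clear
        if clear:
            # Raises FileNotFoundError if the directory is already gone.
            shutil.rmtree(self.local_path)


# clear=False at close() time overrides the constructor default of True.
kept = ClearSketch(tempfile.mkdtemp(), clear=True)
kept.close(clear=False)
kept_exists = os.path.isdir(kept.local_path)

# With no argument, the constructor value applies and the directory is removed.
removed = ClearSketch(tempfile.mkdtemp(), clear=True)
removed.close()
removed_exists = os.path.isdir(removed.local_path)

# A second close() finds nothing to delete.
try:
    removed.close()
    second_close_failed = False
except FileNotFoundError:
    second_close_failed = True

shutil.rmtree(kept.local_path)  # Clean up the directory that was kept.
```

The failing second call in the sketch corresponds to the warning above about calling close() multiple times.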
config
Get the connector configuration.
The configuration contains all the information needed to reconstruct the connector object.
from_config
classmethod
from_config(config: dict[str, Any]) -> GlobusConnector
Create a new connector instance from a configuration.
Parameters:
evict
evict(key: GlobusKey) -> None
Evict the object associated with the key.
Parameters:
-
key
(GlobusKey
) –Key associated with object to evict.
exists
Check if an object associated with the key exists.
Note
If the corresponding Globus Transfer is still in progress, this method will wait to make sure the transfer is successful.
Parameters:
-
key
(GlobusKey
) –Key potentially associated with stored object.
Returns:
-
bool
–If an object associated with the key exists.
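The waiting described in the note, together with the polling_interval and timeout constructor parameters, suggests a polling loop along the following lines. This is a generic sketch, not the connector's code: `wait_until` and `fake_transfer_done` are hypothetical, and how the real connector reports a timeout is not stated on this page, so the sketch simply returns False.

```python
import time
from typing import Callable


def wait_until(
    is_done: Callable[[], bool],
    polling_interval: float = 1,
    timeout: float = 60,
) -> bool:
    """Poll is_done() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            # Assumption: report the timeout to the caller as False.
            return False
        time.sleep(polling_interval)


# Fake transfer status check that reports completion on the third call.
calls = {'count': 0}


def fake_transfer_done() -> bool:
    calls['count'] += 1
    return calls['count'] >= 3


finished = wait_until(fake_transfer_done, polling_interval=0.01, timeout=1)
timed_out = wait_until(lambda: False, polling_interval=0.01, timeout=0.05)
print(finished, timed_out)
```

A shorter polling interval notices completed transfers sooner at the cost of more status checks, while the timeout bounds how long a call can block.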
get
Get the serialized object associated with the key.
Parameters:
-
key
(GlobusKey
) –Key associated with the object to retrieve.
Returns:
-
bytes | None
–Serialized object or
None
if the object does not exist.
get_batch
Get a batch of serialized objects associated with the keys.
Parameters:
Returns:
-
list[bytes | None]
–List with the same order as keys containing the serialized objects, or None where the corresponding key does not have an associated object.
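The return contract (same order as keys, None for missing objects) can be illustrated with a plain dictionary standing in for the store. The names `store` and `get_batch_sketch` are hypothetical, and string keys are used in place of GlobusKey for brevity.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for the stored objects.
store: Dict[str, bytes] = {
    'object-1': b'alpha',
    'object-3': b'gamma',
}


def get_batch_sketch(keys: List[str]) -> List[Optional[bytes]]:
    """Return one result per key, in order, with None for missing keys."""
    return [store.get(key) for key in keys]


results = get_batch_sketch(['object-3', 'object-2', 'object-1'])
print(results)
```

Because the positions line up with the input, callers can pair keys and results with zip() and check each entry for None.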
put
Put a serialized object in the store.
Parameters:
-
obj
(bytes
) –Serialized object to put in the store.
Returns:
-
GlobusKey
–Key which can be used to retrieve the object.
put_batch
Put a batch of serialized objects in the store.
Parameters:
Returns:
-
list[GlobusKey]
–List of keys with the same order as
objs
which can be used to retrieve the objects.