proxystore.connectors.globus
Globus transfer connector implementation.
GlobusEndpoint
GlobusEndpoint(
uuid: str,
endpoint_path: str,
local_path: str | None,
host_regex: str | Pattern[str],
)
Globus endpoint representation.
Parameters:
-
uuid
(str
) –UUID of Globus endpoint.
-
endpoint_path
(str
) –Path within endpoint to directory to use for storing objects.
-
local_path
(str | None
) –Local path (as seen by the host filesystem) that corresponds to the directory specified by
endpoint_path
.
-
host_regex
(str | Pattern[str]
) –String that matches the host where the Globus endpoint exists, or a regex pattern that can be used to match the host. The host pattern is needed so that proxies can determine which endpoint is the local one when they are resolved.
Source code in proxystore/connectors/globus.py
GlobusEndpoints
GlobusEndpoints(endpoints: Collection[GlobusEndpoint])
A collection of Globus endpoints.
Parameters:
-
endpoints
(Collection[GlobusEndpoint]
) –Iterable of
GlobusEndpoint
instances.
Raises:
-
ValueError
–If
endpoints
has length 0 or if multiple endpoints with the same UUID are provided.
Source code in proxystore/connectors/globus.py
from_dict
classmethod
Construct an endpoints collection from a dictionary.
Example:
```python
{
"endpoint-uuid-1": {
"host_regex": "host1-regex",
"endpoint_path": "/path/to/endpoint/dir",
"local_path": "/path/to/local/dir"
},
"endpoint-uuid-2": {
"host_regex": "host2-regex",
"endpoint_path": "/path/to/endpoint/dir",
"local_path": "/path/to/local/dir"
}
}
```
Source code in proxystore/connectors/globus.py
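The mapping above is keyed by endpoint UUID, and each value holds the remaining `GlobusEndpoint` fields. The following is a minimal stand-alone sketch of how such a mapping could be unpacked. `EndpointRecord` and `records_from_dict` are hypothetical names used only for illustration and are not part of ProxyStore; the real `from_dict()` may validate its input differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointRecord:
    """Hypothetical stand-in for GlobusEndpoint (not the ProxyStore class)."""

    uuid: str
    endpoint_path: str
    local_path: str | None
    host_regex: str


def records_from_dict(spec: dict[str, dict[str, str]]) -> list[EndpointRecord]:
    """Unpack a UUID-keyed mapping into one record per endpoint."""
    if len(spec) == 0:
        # Mirrors the documented ValueError for an empty collection.
        raise ValueError('At least one endpoint is required.')
    return [EndpointRecord(uuid=uuid, **fields) for uuid, fields in spec.items()]


spec = {
    'endpoint-uuid-1': {
        'host_regex': 'host1-regex',
        'endpoint_path': '/path/to/endpoint/dir',
        'local_path': '/path/to/local/dir',
    },
    'endpoint-uuid-2': {
        'host_regex': 'host2-regex',
        'endpoint_path': '/path/to/endpoint/dir',
        'local_path': '/path/to/local/dir',
    },
}
records = records_from_dict(spec)
```

Because the UUID is the dictionary key, two endpoints with the same UUID cannot appear in this format.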
from_json
classmethod
from_json(json_file: str) -> GlobusEndpoints
Construct a GlobusEndpoints object from a JSON file.
The dict
read from the JSON file will be passed to
from_dict()
and should match the format expected by
from_dict()
.
Source code in proxystore/connectors/globus.py
dict
Convert the GlobusEndpoints to a dict.
Note that the
GlobusEndpoints
object can be reconstructed by passing the dict
to
from_dict()
.
Source code in proxystore/connectors/globus.py
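Since `from_json()` hands the parsed file contents to `from_dict()`, an endpoint specification can be stored on disk as plain JSON. The sketch below uses only the standard library to show that the documented dictionary format survives a JSON round trip. The file name `endpoints.json` is a hypothetical choice; with ProxyStore installed, the resulting path is what would be passed to `GlobusEndpoints.from_json()`.

```python
import json
import os
import tempfile

spec = {
    'endpoint-uuid-1': {
        'host_regex': 'host1-regex',
        'endpoint_path': '/path/to/endpoint/dir',
        'local_path': '/path/to/local/dir',
    },
    'endpoint-uuid-2': {
        'host_regex': 'host2-regex',
        'endpoint_path': '/path/to/endpoint/dir',
        'local_path': '/path/to/local/dir',
    },
}

with tempfile.TemporaryDirectory() as tmp_dir:
    json_file = os.path.join(tmp_dir, 'endpoints.json')  # hypothetical name
    # Write the specification in the format from_dict() expects.
    with open(json_file, 'w') as f:
        json.dump(spec, f, indent=2)
    # Reading it back yields an equal dictionary.
    with open(json_file) as f:
        loaded = json.load(f)
```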
get_by_host
get_by_host(host: str) -> GlobusEndpoint
Get endpoint by host.
Searches the endpoints for an endpoint whose host_regex
matches
host
.
Parameters:
-
host
(str
) –Host to match.
Returns:
-
GlobusEndpoint
–Globus endpoint.
Raises:
-
ValueError
–If
host
does not match any of the endpoints.
Source code in proxystore/connectors/globus.py
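The lookup described above can be pictured as a loop over the endpoints' host patterns. This is a sketch rather than the ProxyStore implementation: `find_endpoint_uuid` and the example hostnames are hypothetical, and requiring the pattern to match the whole hostname (`re.fullmatch`) is an assumption.

```python
import re
from typing import Dict


def find_endpoint_uuid(host: str, host_regexes: Dict[str, str]) -> str:
    """Return the UUID of the first endpoint whose pattern matches ``host``."""
    for uuid, pattern in host_regexes.items():
        # Assumption: the pattern must match the entire hostname.
        if re.fullmatch(pattern, host) is not None:
            return uuid
    # Mirrors the documented ValueError when nothing matches.
    raise ValueError(f'No endpoint matches host {host!r}.')


# Hypothetical patterns for two clusters.
host_regexes = {
    'endpoint-uuid-1': r'login\d+\.cluster-a\.example\.org',
    'endpoint-uuid-2': r'node-.*\.cluster-b\.example\.org',
}
matched = find_endpoint_uuid('login3.cluster-a.example.org', host_regexes)
```

Patterns that overlap would make the result depend on iteration order, so it is safest to keep each endpoint's pattern specific to its own hosts.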
GlobusKey
Bases: NamedTuple
Key to object transferred with Globus.
Attributes:
-
filename
(str
) –Unique object filename.
-
task_id
(str | tuple[str, ...]
) –Globus transfer task IDs for the file.
__eq__
Match keys by filename only.
This is a workaround for the fact that the task_id is not created until after the filename, so a key can exist in a state where its task_id is empty.
Source code in proxystore/connectors/globus.py
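Filename-only equality on a `NamedTuple` can be sketched as follows. `SketchKey` is a hypothetical class that imitates the behavior described above and is not the ProxyStore `GlobusKey`. Note that `__ne__` is overridden as well, because a tuple subclass otherwise inherits the tuple's field-by-field inequality.

```python
from typing import NamedTuple, Tuple, Union


class SketchKey(NamedTuple):
    """Hypothetical key that compares by filename only."""

    filename: str
    task_id: Union[str, Tuple[str, ...]]

    def __eq__(self, other: object) -> bool:
        if isinstance(other, SketchKey):
            # The task_id is deliberately ignored.
            return self.filename == other.filename
        return NotImplemented

    def __ne__(self, other: object) -> bool:
        result = self.__eq__(other)
        if result is NotImplemented:
            return NotImplemented
        return not result


# The same object before and after its transfer task ID is known.
pending = SketchKey(filename='obj-123', task_id='')
submitted = SketchKey(filename='obj-123', task_id='task-abc')
same_object = pending == submitted
different_object = pending == SketchKey(filename='obj-456', task_id='')
```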
GlobusConnector
GlobusConnector(
endpoints: (
GlobusEndpoints
| list[GlobusEndpoint]
| dict[str, dict[str, str]]
),
polling_interval: int = 1,
sync_level: (
int | Literal["exists", "size", "mtime", "checksum"]
) = "mtime",
timeout: int = 60,
clear: bool = True,
)
Globus transfer connector.
The GlobusConnector
is
similar to a FileConnector
in that objects are saved to disk but allows for the transfer of objects
between two remote file systems. The two directories on the separate file
systems are kept in sync via Globus transfers. The
GlobusConnector
is useful when moving data between hosts that have a Globus endpoint but
may have restrictions that prevent the use of other store backends
(e.g., ports cannot be opened for using a
RedisConnector
).
Note
To use Globus for data transfer, Globus authentication needs to be
performed with the proxystore-globus-auth
CLI. If
authentication is not performed before initializing a
GlobusConnector
,
the program will prompt the user to perform authentication. This can
result in unexpected program hangs while the constructor waits on the
user to authenticate. Authentication only needs to be performed once
per system.
Parameters:
-
endpoints
(GlobusEndpoints | list[GlobusEndpoint] | dict[str, dict[str, str]]
) –Globus endpoints to keep in sync. If passed as a
dict
, the dictionary must match the format expected by GlobusEndpoints.from_dict()
. Note that given n
endpoints there will be n-1
Globus transfers per operation, so we suggest not using too many endpoints at the same time.
-
polling_interval
(int
, default:1
) –Interval in seconds to check if Globus tasks have finished.
-
sync_level
(int | Literal['exists', 'size', 'mtime', 'checksum']
, default:'mtime'
) –Globus transfer sync level.
-
timeout
(int
, default:60
) –Timeout in seconds for waiting on Globus tasks.
-
clear
(bool
, default:True
) –Clear all objects on
close()
by deleting the local_path
of each endpoint.
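The note on `endpoints` says that n endpoints lead to n-1 transfers per operation. One way to picture this, as a sketch under the assumption that the local endpoint sends to every other endpoint, is shown below. `plan_transfers` and the UUID strings are hypothetical and not part of ProxyStore.

```python
from typing import List, Tuple


def plan_transfers(source: str, endpoints: List[str]) -> List[Tuple[str, str]]:
    """Pair the source endpoint with every other endpoint."""
    if source not in endpoints:
        raise ValueError(f'Unknown source endpoint {source!r}.')
    # One (source, destination) transfer for each remaining endpoint.
    return [(source, dest) for dest in endpoints if dest != source]


endpoints = ['uuid-a', 'uuid-b', 'uuid-c']
transfers = plan_transfers('uuid-a', endpoints)
```

With three endpoints the plan contains two transfers, and each additional endpoint adds one more per operation.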
Raises:
-
GlobusAuthFileError
–If the Globus authentication file cannot be found.
-
ValueError
–If
endpoints
is of an incorrect type.
-
ValueError
–If fewer than two endpoints are provided.
Source code in proxystore/connectors/globus.py
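The `polling_interval` and `timeout` parameters describe a wait loop: check a task's status, sleep, and give up after a deadline. The following is a generic sketch of that pattern using only the standard library. `wait_for_task` and its `is_done` callback are hypothetical, and returning `False` on timeout is an assumption made for the sketch; this page does not state how the connector reports a timeout.

```python
import time
from typing import Callable


def wait_for_task(
    is_done: Callable[[], bool],
    polling_interval: float = 1,
    timeout: float = 60,
) -> bool:
    """Poll ``is_done`` until it returns True or ``timeout`` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(polling_interval)


# Simulated task that reports completion on the third status check.
checks = iter([False, False, True])
finished = wait_for_task(lambda: next(checks), polling_interval=0.01, timeout=1)
```

A shorter polling interval notices completion sooner at the cost of more status checks.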
close
close(clear: bool | None = None) -> None
Close the connector and clean up.
Warning
This will delete the directory at local_path
on each endpoint
by default.
Warning
This method should only be called at the end of the program when the store will no longer be used, for example once all proxies have been resolved.
Parameters:
-
clear
(bool | None
, default:None
) –Remove the store directory. Overrides the default value of
clear
provided when the GlobusConnector
was instantiated.
Source code in proxystore/connectors/globus.py
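The interaction between the constructor's `clear` default and the `clear` argument of `close()` can be sketched as below. `SketchConnector` is a hypothetical class that models only this override logic on a temporary local directory; it performs no Globus operations and is not the ProxyStore implementation.

```python
from __future__ import annotations

import os
import shutil
import tempfile


class SketchConnector:
    """Hypothetical connector showing only the ``clear`` override logic."""

    def __init__(self, local_path: str, clear: bool = True) -> None:
        self.local_path = local_path
        self.clear = clear

    def close(self, clear: bool | None = None) -> None:
        # An explicit argument wins; None falls back to the constructor value.
        should_clear = self.clear if clear is None else clear
        if should_clear:
            shutil.rmtree(self.local_path, ignore_errors=True)


kept = SketchConnector(tempfile.mkdtemp(), clear=True)
kept.close(clear=False)  # override: the directory survives
kept_exists = os.path.isdir(kept.local_path)

removed = SketchConnector(tempfile.mkdtemp(), clear=True)
removed.close()  # constructor default applies: the directory is deleted
removed_exists = os.path.isdir(removed.local_path)

shutil.rmtree(kept.local_path, ignore_errors=True)  # tidy up after the sketch
```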
config
Get the connector configuration.
The configuration contains all the information needed to reconstruct the connector object.
Source code in proxystore/connectors/globus.py
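A configuration that "contains all the information needed to reconstruct the connector" implies a round trip between `config()` and `from_config()`. The sketch below illustrates that contract with a hypothetical `ConfigurableSketch` class whose parameters mirror the `GlobusConnector` signature. The keys of the real configuration dictionary are not documented on this page, so treating them as the constructor arguments is an assumption.

```python
from __future__ import annotations

from typing import Any


class ConfigurableSketch:
    """Hypothetical connector; only the config round trip is modelled."""

    def __init__(
        self,
        endpoints: dict[str, dict[str, str]],
        polling_interval: int = 1,
        sync_level: int | str = 'mtime',
        timeout: int = 60,
        clear: bool = True,
    ) -> None:
        self.endpoints = endpoints
        self.polling_interval = polling_interval
        self.sync_level = sync_level
        self.timeout = timeout
        self.clear = clear

    def config(self) -> dict[str, Any]:
        # Assumption: the configuration is exactly the constructor arguments.
        return {
            'endpoints': self.endpoints,
            'polling_interval': self.polling_interval,
            'sync_level': self.sync_level,
            'timeout': self.timeout,
            'clear': self.clear,
        }

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> ConfigurableSketch:
        return cls(**config)


original = ConfigurableSketch(
    endpoints={'endpoint-uuid-1': {'host_regex': 'host1-regex'}},
    timeout=120,
)
clone = ConfigurableSketch.from_config(original.config())
```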
from_config
classmethod
from_config(config: dict[str, Any]) -> GlobusConnector
Create a new connector instance from a configuration.
Parameters:
evict
evict(key: GlobusKey) -> None
Evict the object associated with the key.
Parameters:
-
key
(GlobusKey
) –Key associated with object to evict.
Source code in proxystore/connectors/globus.py
exists
Check if an object associated with the key exists.
Note
If the corresponding Globus transfer is still in progress, this method will wait for it to finish to confirm that the transfer succeeded.
Parameters:
-
key
(GlobusKey
) –Key potentially associated with stored object.
Returns:
-
bool
–If an object associated with the key exists.
Source code in proxystore/connectors/globus.py
get
Get the serialized object associated with the key.
Parameters:
-
key
(GlobusKey
) –Key associated with the object to retrieve.
Returns:
-
bytes | None
–Serialized object or
None
if the object does not exist.
Source code in proxystore/connectors/globus.py
get_batch
Get a batch of serialized objects associated with the keys.
Parameters:
Returns:
-
list[bytes | None]
–List with same order as
keys
with the serialized objects or None
if the corresponding key does not have an associated object.
Source code in proxystore/connectors/globus.py
put
Put a serialized object in the store.
Parameters:
-
obj
(bytes
) –Serialized object to put in the store.
Returns:
-
GlobusKey
–Key which can be used to retrieve the object.
Source code in proxystore/connectors/globus.py
put_batch
Put a batch of serialized objects in the store.
Parameters:
Returns:
-
list[GlobusKey]
–List of keys with the same order as
objs
which can be used to retrieve the objects.
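The ordering guarantees of the batch methods can be illustrated with an in-memory sketch: `put_batch` returns keys in the order of `objs`, and `get_batch` returns results in the order of `keys`, with `None` for keys that have no stored object. `InMemorySketch` and `SketchKey` are hypothetical and involve no files or Globus transfers.

```python
from __future__ import annotations

import uuid
from typing import NamedTuple, Sequence


class SketchKey(NamedTuple):
    """Hypothetical key holding only a unique filename."""

    filename: str


class InMemorySketch:
    """Hypothetical store that keeps serialized objects in a dictionary."""

    def __init__(self) -> None:
        self._data = {}  # maps filename -> serialized bytes

    def put_batch(self, objs: Sequence[bytes]) -> list[SketchKey]:
        keys = []
        for obj in objs:
            key = SketchKey(filename=str(uuid.uuid4()))
            self._data[key.filename] = obj
            keys.append(key)  # same position as the object in objs
        return keys

    def get_batch(self, keys: Sequence[SketchKey]) -> list[bytes | None]:
        # dict.get returns None for keys without a stored object.
        return [self._data.get(key.filename) for key in keys]


store = InMemorySketch()
keys = store.put_batch([b'first', b'second'])
missing = SketchKey(filename='never-stored')
results = store.get_batch([keys[1], missing, keys[0]])
```

Because positions are preserved, `zip(keys, objs)` pairs each key with the object it was created for.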