Advanced¶
Proxies¶
Proxies can be created easily using ProxyStore.
from proxystore.proxy import Proxy
def resolve_object(...):
# Function that produces the object of interest
return obj
p = Proxy(resolve_object)
resolve_object()
will be called when the proxy p
does its
just-in-time resolution, and then p
will behave exactly like obj
.
A factory for a Proxy
can be
any callable object (i.e., object which implements __call__
).
Proxies are powerful because they can intercept and redefine functionality of an object while emulating the rest of the objects behavior.
import numpy as np
from proxystore.proxy import Proxy
x = np.array([1, 2, 3])
class MyFactory():
def __init__(self, obj):
self.obj = obj
def __class__(self):
return self.obj
p = Proxy(MyFactory(x))
# A proxy is an instance of its wrapped object
assert isinstance(p, Proxy)
assert isinstance(p, np.ndarray)
# The proxy can do everything the numpy array can
assert np.array_equal(p, [1, 2, 3])
assert np.sum(p) == 6
y = x + p
assert np.array_equal(y, [2, 4, 6])
The ProxyStore Proxy
is built on the proxy
from lazy-object-proxy
which intercepts all calls to the object's magic functions
(__func_name__()
functions) and forwards the calls to the wrapped
object. If the wrapped object has not been resolved yet, the proxy calls the
factory that was passed to the proxy constructor to retrieve the object that
should be wrapped.
Generally, a proxy is only ever resolved once. However, when a proxy is serialized, only the factory is serialized, and when the proxy is deserialized again and used, the factory will be called again to resolve the object.
Utilities¶
Proxystore provides some useful utility functions for dealing with proxies.
from proxystore import proxy
p = proxy.Proxy(...)
# Check if a proxy has been resolved yet
proxy.is_resolved(p)
# Force a proxy to resolve itself
proxy.resolve(p)
# Extract the wrapped object from the proxy
x = proxy.extract(p)
assert not isinstance(x, proxy.Proxy)
Other Uses¶
Proxies can be used add functionality to existing objects. Two common examples are access control and partial resoluion.
Stores¶
Asynchronous Resolving¶
It is common in distributed computation for inputs to functions executed
remotely to not be needed immediately upon execution.
Proxies created by a Store
support
asynchronous resolution to overlap communication and computation.
from proxystore.store.utils import resolve_async
def complex_function(large_proxied_input):
resolve_async(large_proxied_input)
# More computation...
# First access to the proxy will not be as expensive because
# of the asynchronous resolution
compute_input(large_proxied_input)
Caching¶
Store
provides built in caching functionality.
Caches are local to the Python process but will speed up the resolution when
multiple proxies refer to the same object.
from proxystore.store.file import FileStore
# Cache size of 16 is the default
FileStore('mystore', store_dir='/tmp/proxystore', cache_size=16)
Transactional Guarantees¶
ProxyStore is designed around optimizing the communication of ephemeral data (e.g., inputs and outputs of functions) which is typically write-once, read-many. Thus, ProxyStore does not provides any guarantees about object versions if a user manually overwrites an object.
Serialization¶
All Store
operation uses ProxyStore's provided
serialization utilities (proxystore.serialize) by default. However,
all Store
methods that move data in or out of
the store can be provided custom serializers or deserializers of the form:
In some cases, data may already be serialized in which case an identity
function can be passed as the serializer/deserializer (e.g., lambda x: x
).
Implementing a custom serializer may be beneficial for complex structures
where pickle/cloudpickle (the default serializers used by ProxyStore) are
innefficient. E.g.,
import torch
import io
from proxystore.serialize import serialize
from proxystore.store.redis import RedisStore
def serialize_torch_model(obj: Any) -> bytes:
if isinstance(obj, torch.nn.Module):
buffer = io.BytesIO()
torch.save(model, buffer)
return buffer.read()
else:
# Fallback for unsupported types
return serialize(obj)
mymodel = torch.nn.Module()
store = RedisStore(...)
key = store.set(mymodel, serializer=serialize_torch_model)
See Issue #146 for further discussion.