ProxyStore
ProxyStore facilitates efficient data flow management in distributed Python applications, such as dynamic task-based workflows or serverless and edge applications.
The transparent object proxy, the core building block within ProxyStore, acts like a wide-area reference that can be cheaply communicated. Unlike traditional references, which are only valid within the virtual address space of a single process, the proxy references an object in remote storage and can be implicitly dereferenced in arbitrary processes, even on remote machines. The proxy is transparent in that it implicitly dereferences its target object when first used (referred to as just-in-time resolution) and afterwards forwards all operations on itself to the cached target object.
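The just-in-time resolution described above can be illustrated with a toy proxy in plain Python. This is a minimal sketch, not ProxyStore's implementation: `LazyProxy`, `fetch_from_remote`, `REMOTE_STORAGE`, and `FETCH_LOG` are hypothetical names, a dictionary stands in for remote storage, and only a handful of operations are forwarded to the target.

```python
from typing import Any, Callable

# Stand-in for remote storage (an assumption for illustration only).
REMOTE_STORAGE: dict[str, list[int]] = {'my-key': [10, 20, 30]}
# Records every key that was actually fetched, so laziness can be observed.
FETCH_LOG: list[str] = []


def fetch_from_remote(key: str) -> list[int]:
    """Pretend to fetch an object from remote storage, logging each fetch."""
    FETCH_LOG.append(key)
    return REMOTE_STORAGE[key]


class LazyProxy:
    """Toy proxy that resolves its target on first use and then caches it."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._target: Any = None
        self._resolved = False

    def _resolve(self) -> Any:
        # Just-in-time resolution: call the factory once, then reuse the result.
        if not self._resolved:
            self._target = self._factory()
            self._resolved = True
        return self._target

    def __getattr__(self, name: str) -> Any:
        # Invoked only when normal lookup fails, so it forwards to the target.
        if name.startswith('_'):
            raise AttributeError(name)
        return getattr(self._resolve(), name)

    # Special methods are looked up on the type, so they are forwarded explicitly.
    def __len__(self) -> int:
        return len(self._resolve())

    def __getitem__(self, index: Any) -> Any:
        return self._resolve()[index]


# Creating the proxy is cheap: nothing has been fetched yet.
proxy = LazyProxy(lambda: fetch_from_remote('my-key'))
```

Calling `len(proxy)` triggers the single fetch. Later operations such as `proxy[0]` or `proxy.count(20)` reuse the cached target, so `FETCH_LOG` never grows beyond one entry.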
This paradigm combines the best of pass-by-reference and pass-by-value semantics, improves performance and portability by reducing transfer overheads through intermediaries, and abstracts low-level communication methods, which reduces code complexity. A proxy contains within itself all the information and logic necessary to resolve the target object. Because a proxy is self-contained, its consumer need not be aware of the low-level communication mechanisms it uses; these are determined unilaterally by the producer of the proxy.
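As a rough illustration of that producer/consumer split, the sketch below passes a small serialized reference instead of the object itself. It is a simplified stand-in and not how ProxyStore is implemented: `CHANNELS`, `make_reference`, and `resolve` are hypothetical names, and in-memory dictionaries take the place of real mediated channels.

```python
import pickle
import uuid
from typing import Any

# Two in-memory dictionaries stand in for mediated channels (an assumption);
# a real deployment might use Redis, a shared file system, and so on.
CHANNELS: dict[str, dict[str, bytes]] = {'channel-a': {}, 'channel-b': {}}


def make_reference(obj: Any, channel: str) -> bytes:
    """Producer side: store the object and return a small serialized reference."""
    key = uuid.uuid4().hex
    CHANNELS[channel][key] = pickle.dumps(obj)
    # The reference records which channel to use and where the object lives.
    return pickle.dumps({'channel': channel, 'key': key})


def resolve(reference: bytes) -> Any:
    """Consumer side: follow the reference without choosing the channel."""
    info = pickle.loads(reference)
    return pickle.loads(CHANNELS[info['channel']][info['key']])


payload = list(range(100_000))
# The producer alone decides that 'channel-b' is used.
reference = make_reference(payload, 'channel-b')

print(f'payload:   {len(pickle.dumps(payload))} bytes')
print(f'reference: {len(reference)} bytes')
```

The consumer only ever calls `resolve(reference)`. Switching the producer to `'channel-a'` requires no change on the consumer side, and the message passed between them stays small regardless of the payload size.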
ProxyStore supports a diverse set of programming patterns built on the proxy paradigm:
- Task-based Workflows
- Function-as-a-Service/Serverless Applications
- Distributed Futures
- Bulk Data Streaming
- and more!
ProxyStore can leverage many popular mediated data transfer and storage systems: DAOS, Globus Transfer, Kafka, KeyDB, and Redis. Custom communication methods built on Mochi, UCX, WebRTC, and ZeroMQ are provided for high-performance and peer-to-peer applications.
Read more about ProxyStore's concepts here. Complete documentation for ProxyStore is available at docs.proxystore.dev.
Installation
The base ProxyStore package can be installed with pip.
Leveraging third-party libraries may require dependencies not installed by default but can be enabled via extras installation options (e.g., endpoints, kafka, or redis).
All additional dependencies can be installed with the all extra (pip install proxystore[all]). This also installs the proxystore-ex package, which contains extensions and experimental features.
The extensions package can also be installed with pip using proxystore[extensions] or proxystore-ex.
See the Installation guide for more information about the available extras installation options. See the Contributing guide to get started for local development.
Example
Using ProxyStore to store and transfer objects only requires a few lines of code.
from proxystore.connectors.redis import RedisConnector
from proxystore.proxy import Proxy
from proxystore.store import Store

data = MyDataType(...)

def my_function(x: MyDataType) -> ...:
    # x is transparently resolved when first used by the function.
    # Then the proxy, x, behaves as an instance of MyDataType
    # for the rest of its existence.
    assert isinstance(x, MyDataType)

with Store(
    'example',
    connector=RedisConnector('localhost', 6379),
    register=True,
) as store:
    # Store the object in Redis (or any other connector).
    # The returned Proxy acts like a reference to the object.
    proxy = store.proxy(data)
    assert isinstance(proxy, Proxy)

    # Invoking a function with proxy works without function changes.
    my_function(proxy)
Check out the Get Started guide to learn more!
Citation
If you use ProxyStore or any of this code in your work, please cite our ProxyStore (SC '23) and Proxy Patterns (arXiv preprint) papers.
@inproceedings{pauloski2023proxystore,
    author = {Pauloski, J. Gregory and Hayot-Sasson, Valerie and Ward, Logan and Hudson, Nathaniel and Sabino, Charlie and Baughman, Matt and Chard, Kyle and Foster, Ian},
    title = {{Accelerating Communications in Federated Applications with Transparent Object Proxies}},
    address = {New York, NY, USA},
    articleno = {59},
    booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
    doi = {10.1145/3581784.3607047},
    isbn = {9798400701092},
    location = {Denver, CO, USA},
    numpages = {15},
    publisher = {Association for Computing Machinery},
    series = {SC '23},
    url = {https://doi.org/10.1145/3581784.3607047},
    year = {2023}
}
@misc{pauloski2024proxystore,
    author = {J. Gregory Pauloski and Valerie Hayot-Sasson and Logan Ward and Alexander Brace and André Bauer and Kyle Chard and Ian Foster},
    title = {{Object Proxy Patterns for Accelerating Distributed Applications}},
    archiveprefix = {arXiv},
    eprint = {2407.01764},
    primaryclass = {cs.DC},
    url = {https://arxiv.org/abs/2407.01764},
    year = {2024}
}