
vgi-rpc¶
Transport-agnostic RPC framework built on Apache Arrow IPC serialization.
Built by 🚜 Query.Farm
Define RPC interfaces as Python Protocol classes. The framework derives Arrow schemas from type annotations and provides typed client proxies with automatic serialization/deserialization.
Key Features¶
- Protocol-based interfaces — define services as typed Python `Protocol` classes; proxies preserve the Protocol type for full IDE autocompletion
- Apache Arrow IPC wire format — zero-copy serialization for structured data using PyArrow
- Two method types — unary and streaming (producer and exchange patterns)
- Transport-agnostic — in-process pipes, subprocess, Unix domain sockets, shared memory, or HTTP — see Transports
- Automatic schema inference — Python type annotations map to Arrow types
- Pluggable authentication — `AuthContext` + middleware for HTTP auth (JWT, API key, etc.)
- Runtime introspection — opt-in `__describe__` RPC method for dynamic service discovery
- CLI tool — `vgi-rpc describe` and `vgi-rpc call` for ad-hoc service interaction
- Shared memory transport — zero-copy batch transfer between co-located processes — see Transports
- IPC validation — configurable batch validation levels for untrusted data
- Large batch support — transparent externalization to S3/GCS for oversized data
- Per-call I/O statistics — `CallStatistics` tracks batches, rows, and bytes for usage accounting (access log + OTel spans)
- Wire protocol debug logging — enable the `vgi_rpc.wire` logger at DEBUG for full wire-level visibility — see Logging
Two Method Types¶
vgi-rpc supports two RPC patterns. The method's return type in the Protocol determines which one is used:
Unary¶
A single request produces a single response — like a function call across a process boundary. The client sends parameters, the server returns a result.
Use unary for: lookups, computations, CRUD operations — anything that returns one value.
Streaming¶
A single request opens an ongoing session that produces multiple batches of data. The return type `Stream[S]` signals a streaming method, where `S` is a `StreamState` subclass that holds state between iterations.
There are two streaming patterns:
Producer — the server pushes data to the client (like a generator). The client iterates, and the server emits batches until it calls `out.finish()`:
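A minimal stand-in sketch of the producer control flow. The classes here are illustrative stand-ins, not the vgi-rpc API; only `out.finish()` mirrors the call named above:

```python
from dataclasses import dataclass, field

# Illustrative stand-ins only -- not the vgi-rpc API. They mimic the
# producer shape: per-stream state plus an output object the server
# uses to emit batches or end the stream.
class StreamState:
    """Base for state carried between stream iterations."""

@dataclass
class CountdownState(StreamState):
    remaining: int

@dataclass
class Output:
    batches: list = field(default_factory=list)
    finished: bool = False

    def emit(self, batch) -> None:
        self.batches.append(batch)

    def finish(self) -> None:
        self.finished = True

def countdown_step(state: CountdownState, out: Output) -> None:
    """One server iteration: emit a batch, or finish when exhausted."""
    if state.remaining <= 0:
        out.finish()
        return
    out.emit({"value": state.remaining})
    state.remaining -= 1

# Client side: keep iterating until the server calls out.finish().
out = Output()
state = CountdownState(remaining=3)
while not out.finished:
    countdown_step(state, out)

print(out.batches)  # [{'value': 3}, {'value': 2}, {'value': 1}]
```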
Exchange — lockstep bidirectional streaming. The client sends data, the server responds, one round at a time:
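A stand-in sketch of the lockstep round structure (again illustrative, not the actual vgi-rpc API): state persists on the server side between rounds, and each request gets exactly one response before the next is sent:

```python
from dataclasses import dataclass

# Illustrative stand-in, not the vgi-rpc API: one request batch in,
# one response batch out, with server-side state kept between rounds.
@dataclass
class RunningSumState:
    total: float = 0.0

def exchange_round(state: RunningSumState, request: dict) -> dict:
    """Server side of one round: consume a request, return a response."""
    state.total += request["value"]
    return {"running_total": state.total}

# Client side: strict request/response alternation, one round at a time.
state = RunningSumState()
responses = [exchange_round(state, {"value": v}) for v in [1.0, 2.0, 3.0]]
print(responses)  # running totals 1.0, 3.0, 6.0
```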
Use streaming for: pagination, progress reporting, incremental computation, or any workflow where data is produced or exchanged over time.
Installation¶
```
pip install vgi-rpc
```

Optional extras:

```
pip install vgi-rpc[http]     # HTTP transport (Falcon + httpx)
pip install vgi-rpc[s3]      # S3 storage backend
pip install vgi-rpc[gcs]      # Google Cloud Storage backend
pip install vgi-rpc[cli]      # CLI tool (typer + httpx)
pip install vgi-rpc[external] # External storage fetch (aiohttp + zstandard)
pip install vgi-rpc[otel]     # OpenTelemetry instrumentation
```
Requires Python 3.13+.
Quick Start¶
Define a service as a Protocol, implement it, and call methods through a typed proxy:
```python
from typing import Protocol

from vgi_rpc import serve_pipe


class Calculator(Protocol):
    """A simple calculator service."""

    def add(self, a: float, b: float) -> float:
        """Add two numbers."""
        ...


class CalculatorImpl:
    """Calculator implementation."""

    def add(self, a: float, b: float) -> float:
        """Add two numbers."""
        return a + b


with serve_pipe(Calculator, CalculatorImpl()) as proxy:
    print(proxy.add(a=2.0, b=3.0))  # 5.0
```
See the Examples page for streaming, HTTP transport, authentication, and more.
Limitations¶
vgi-rpc is designed for RPC with structured, tabular data. Some things it deliberately does not do:
- No full-duplex streaming — the exchange pattern is lockstep (one request, one response, repeat), not concurrent bidirectional like gRPC.
- No client streaming — the client cannot push a stream of batches to the server independently. Use exchange for bidirectional workflows.
- Columnar data model — all data crosses the wire as Arrow `RecordBatch` objects. Scalar values are wrapped in single-row batches. If your payloads are small heterogeneous messages, a row-oriented format (protobuf, JSON) may be more natural.
- No service mesh integration — no built-in load balancing, circuit breaking, or service discovery. The HTTP transport is a standard WSGI app, so you can put it behind any reverse proxy.
- No async server — the server is synchronous. Streaming methods run in a blocking loop. This keeps the implementation simple but limits concurrency to one request at a time per connection (HTTP transport handles concurrency at the WSGI layer).
Next Steps¶
- Browse the API Reference for detailed documentation
- Explore available Transports — pipes, subprocess, Unix sockets, shared memory, HTTP
- Check out the Examples for runnable scripts
- Review Benchmarks for transport performance comparisons
- Read Hosting for production deployment guidance
- Compare with gRPC and alternatives
- See the Contributing guide to get involved