
vgi-rpc¶
Transport-agnostic RPC framework built on Apache Arrow IPC serialization.
Built by 🚜 Query.Farm
Define RPC interfaces as Python Protocol classes. The framework derives Arrow schemas from type annotations and provides typed client proxies with automatic serialization/deserialization.
Key Features¶
- Protocol-based interfaces — define services as typed Python `Protocol` classes; proxies preserve the Protocol type for full IDE autocompletion
- Apache Arrow IPC wire format — zero-copy serialization for structured data using PyArrow
- Two method types — unary and streaming (producer and exchange patterns)
- Transport-agnostic — in-process pipes, subprocess, Unix domain sockets, shared memory, or HTTP — see Transports
- Automatic schema inference — Python type annotations map to Arrow types
- Pluggable authentication — `AuthContext` + middleware for HTTP auth (JWT, API key, etc.)
- Runtime introspection — opt-in `__describe__` RPC method for dynamic service discovery
- CLI tool — `vgi-rpc describe` and `vgi-rpc call` for ad-hoc service interaction
- Shared memory transport — zero-copy batch transfer between co-located processes — see Transports
- IPC validation — configurable batch validation levels for untrusted data
- Large batch support — transparent externalization to S3/GCS for oversized data
- Per-call I/O statistics — `CallStatistics` tracks batches, rows, and bytes for usage accounting (access log + OTel spans)
- Wire protocol debug logging — enable the `vgi_rpc.wire` logger at DEBUG for full wire-level visibility — see Logging
Two Method Types¶
vgi-rpc supports two RPC patterns. The method's return type in the Protocol determines which one is used:
Unary¶
A single request produces a single response — like a function call across a process boundary. The client sends parameters, the server returns a result.
Use unary for: lookups, computations, CRUD operations — anything that returns one value.
Streaming¶
A single request opens an ongoing session that produces multiple batches of data. The return type `Stream[S]` signals a streaming method, where `S` is a `StreamState` subclass that holds state between iterations.
There are two streaming patterns:
Producer — the server pushes data to the client (like a generator). The client iterates, and the server emits batches until it calls `out.finish()`:
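A minimal stand-in sketch of the producer control flow. The classes here are illustrative stand-ins, not the vgi-rpc API; only `out.finish()` mirrors the call named above:

```python
from dataclasses import dataclass, field

# Illustrative stand-ins only -- not the vgi-rpc API. They mimic the
# producer shape: per-stream state plus an output object the server
# uses to emit batches or end the stream.
class StreamState:
    """Base for state carried between stream iterations."""

@dataclass
class CountdownState(StreamState):
    remaining: int

@dataclass
class Output:
    batches: list = field(default_factory=list)
    finished: bool = False

    def emit(self, batch) -> None:
        self.batches.append(batch)

    def finish(self) -> None:
        self.finished = True

def countdown_step(state: CountdownState, out: Output) -> None:
    """One server iteration: emit a batch, or finish when exhausted."""
    if state.remaining <= 0:
        out.finish()
        return
    out.emit({"value": state.remaining})
    state.remaining -= 1

# Client side: keep iterating until the server calls out.finish().
out = Output()
state = CountdownState(remaining=3)
while not out.finished:
    countdown_step(state, out)

print(out.batches)  # [{'value': 3}, {'value': 2}, {'value': 1}]
```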
Exchange — lockstep bidirectional streaming. The client sends data, the server responds, one round at a time:
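A stand-in sketch of the lockstep round structure (again illustrative, not the actual vgi-rpc API): state persists on the server side between rounds, and each request gets exactly one response before the next is sent:

```python
from dataclasses import dataclass

# Illustrative stand-in, not the vgi-rpc API: one request batch in,
# one response batch out, with server-side state kept between rounds.
@dataclass
class RunningSumState:
    total: float = 0.0

def exchange_round(state: RunningSumState, request: dict) -> dict:
    """Server side of one round: consume a request, return a response."""
    state.total += request["value"]
    return {"running_total": state.total}

# Client side: strict request/response alternation, one round at a time.
state = RunningSumState()
responses = [exchange_round(state, {"value": v}) for v in [1.0, 2.0, 3.0]]
print(responses)  # running totals 1.0, 3.0, 6.0
```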
Use streaming for: pagination, progress reporting, incremental computation, or any workflow where data is produced or exchanged over time.
Installation¶
```
pip install vgi-rpc
```

Optional extras:

```
pip install vgi-rpc[http]     # HTTP transport (Falcon + httpx)
pip install vgi-rpc[s3]      # S3 storage backend
pip install vgi-rpc[gcs]      # Google Cloud Storage backend
pip install vgi-rpc[cli]      # CLI tool (typer + httpx)
pip install vgi-rpc[external] # External storage fetch (aiohttp + zstandard)
pip install vgi-rpc[otel]     # OpenTelemetry instrumentation
```
Requires Python 3.13+.
Quick Start¶
Define a service as a Protocol, implement it, and call methods through a typed proxy:
```python
from typing import Protocol

from vgi_rpc import serve_pipe


class Calculator(Protocol):
    """A simple calculator service."""

    def add(self, a: float, b: float) -> float:
        """Add two numbers."""
        ...


class CalculatorImpl:
    """Calculator implementation."""

    def add(self, a: float, b: float) -> float:
        """Add two numbers."""
        return a + b


with serve_pipe(Calculator, CalculatorImpl()) as proxy:
    print(proxy.add(a=2.0, b=3.0))  # 5.0
```
See the Examples page for streaming, HTTP transport, authentication, and more.
Limitations¶
vgi-rpc is designed for RPC with structured, tabular data. Some things it deliberately does not do:
- No full-duplex streaming — the exchange pattern is lockstep (one request, one response, repeat), not concurrent bidirectional like gRPC.
- No client streaming — the client cannot push a stream of batches to the server independently. Use exchange for bidirectional workflows.
- Columnar data model — all data crosses the wire as Arrow `RecordBatch` objects. Scalar values are wrapped in single-row batches. If your payloads are small heterogeneous messages, a row-oriented format (protobuf, JSON) may be more natural.
- No service mesh integration — no built-in load balancing, circuit breaking, or service discovery. The HTTP transport is a standard WSGI app, so you can put it behind any reverse proxy.
- No async server — the server is synchronous. Streaming methods run in a blocking loop. This keeps the implementation simple but limits concurrency to one request at a time per connection (HTTP transport handles concurrency at the WSGI layer).
Next Steps¶
- Browse the API Reference for detailed documentation
- Explore available Transports — pipes, subprocess, Unix sockets, shared memory, HTTP
- Check out the Examples for runnable scripts
- Review Benchmarks for transport performance comparisons
- Read Hosting for production deployment guidance
- Compare with gRPC and alternatives
- See the Contributing guide to get involved