End-to-end Upload
The upload path streams bytes from clients to durable storage while computing content hashes and building manifests.
Pipeline overview
Upload stages
- Session negotiation – a client opens a gRPC bidi stream or HTTP session. The server allocates a shard-backed session entity with spill buffers and MultiHasher state.
- Chunking – graviton-streams chunkers split the byte stream using either fixed-size or content-defined boundaries. Chunks feed incremental hashers from graviton-core.
- Storage layout – the runtime derives BinaryKey values for blocks, chunks, and manifests. Locator strategies map these keys to backend-specific paths.
- Persistence – the MutableObjectStore implementation uploads parts, respecting backend-specific minimum sizes (for example, 5 MiB for S3 multipart uploads). Range trackers record which spans succeeded.
- Manifest emission – once all blocks are written, a manifest describing order, sizes, and attributes is persisted. The blob key is returned to the client alongside metadata captured in BlobWriteResult.
- Replication – optional background jobs use ReplicaIndex and UnionFind utilities to drive additional copies or repairs.
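The chunking and persistence stages above can be illustrated with a small standalone sketch. It is not Graviton code: `UploadSketch`, `Chunk`, `chunkAndHash`, and `groupIntoParts` are hypothetical names, the chunker is fixed-size only, and SHA-256 from the JDK stands in for whatever hashers graviton-core provides. It shows how one pass over the input can feed a whole-stream hasher, hash each chunk, and then group chunks into parts that meet a backend minimum size.

```scala
import java.security.MessageDigest

object UploadSketch {
  // Hypothetical stand-in for a chunk record; not a Graviton type.
  final case class Chunk(offset: Long, bytes: Array[Byte], sha256Hex: String)

  private def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString

  /** Split input into fixed-size chunks while feeding one whole-stream hasher. */
  def chunkAndHash(input: Iterator[Array[Byte]], chunkSize: Int): (Vector[Chunk], String) = {
    require(chunkSize > 0, "chunkSize must be positive")
    val whole = MessageDigest.getInstance("SHA-256")
    val chunks = Vector.newBuilder[Chunk]
    var offset = 0L
    var pending = Array.emptyByteArray

    def emit(bytes: Array[Byte]): Unit = {
      val digest = MessageDigest.getInstance("SHA-256").digest(bytes)
      chunks += Chunk(offset, bytes, hex(digest))
      offset += bytes.length
    }

    input.foreach { buf =>
      whole.update(buf) // incremental hash over the full stream
      pending = pending ++ buf
      while (pending.length >= chunkSize) {
        val (head, tail) = pending.splitAt(chunkSize)
        emit(head)
        pending = tail
      }
    }
    if (pending.nonEmpty) emit(pending) // final short chunk
    (chunks.result(), hex(whole.digest()))
  }

  /** Group chunks into parts of at least minPartSize bytes; only the last part may be smaller. */
  def groupIntoParts(chunks: Vector[Chunk], minPartSize: Long): Vector[Vector[Chunk]] = {
    val parts = Vector.newBuilder[Vector[Chunk]]
    var current = Vector.empty[Chunk]
    var size = 0L
    chunks.foreach { c =>
      current = current :+ c
      size += c.bytes.length
      if (size >= minPartSize) {
        parts += current
        current = Vector.empty
        size = 0L
      }
    }
    if (current.nonEmpty) parts += current
    parts.result()
  }
}
```

With a 5 MiB minimum, `groupIntoParts(chunks, 5L * 1024 * 1024)` would yield parts that satisfy the S3 constraint mentioned above, since S3 allows the final part of a multipart upload to be smaller than the minimum.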
Transducer-based ingest (next generation)
The Transducer algebra enables a cleaner expression of the same pipeline:
scala
// Compose typed stages
val ingest = countBytes >>> hashBytes >>> rechunk(blockSize) >>> blockKeyDeriver
// Run the pipeline — get a typed summary
val (summary, blocks) = byteStream.run(ingest.toSink)
summary.totalBytes // Long
summary.digestHex // String
summary.blockCount // Long
summary.blocksKeyed // Long
Every field in the summary is accessed by name (not index), and the composition merges Record states automatically via StateMerge. See the Pipeline Explorer to experiment with different stage combinations interactively.
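To make the idea of composing stateful stages concrete, here is a toy model of sequential composition. It is a sketch under loud assumptions, not the Transducer algebra itself: `Stage`, `counting`, `rechunk`, and `pipeline` are hypothetical, and the composed state is a plain nested tuple rather than a named Record merged via StateMerge. That difference is the point of comparison: with tuples, callers must destructure by position, which is what name-based summary access avoids.

```scala
object TransducerSketch {
  /** Toy stage: consumes I, emits zero or more O, and carries a summary state S. */
  final case class Stage[I, O, S](
    init: S,
    step: (S, I) => (S, Vector[O]),
    flush: S => (S, Vector[O])
  ) {
    /** Sequential composition; the combined state is the pair of both stage states. */
    def >>>[O2, S2](next: Stage[O, O2, S2]): Stage[I, O2, (S, S2)] = {
      // Push a batch of intermediate items through the downstream stage.
      def feed(s2: S2, items: Vector[O]): (S2, Vector[O2]) =
        items.foldLeft((s2, Vector.empty[O2])) { case ((st, acc), o) =>
          val (st2, out) = next.step(st, o)
          (st2, acc ++ out)
        }
      Stage[I, O2, (S, S2)](
        (init, next.init),
        { case ((s1, s2), i) =>
          val (s1b, mid) = step(s1, i)
          val (s2b, out) = feed(s2, mid)
          ((s1b, s2b), out)
        },
        { case (s1, s2) =>
          // Flush upstream first, then drain and flush downstream.
          val (s1b, mid) = flush(s1)
          val (s2b, out) = feed(s2, mid)
          val (s2c, tail) = next.flush(s2b)
          ((s1b, s2c), out ++ tail)
        }
      )
    }

    /** Run over an in-memory input, returning the final state and all outputs. */
    def run(input: Iterable[I]): (S, Vector[O]) = {
      val (s, out) = input.foldLeft((init, Vector.empty[O])) { case ((st, acc), i) =>
        val (st2, o) = step(st, i)
        (st2, acc ++ o)
      }
      val (sf, tail) = flush(s)
      (sf, out ++ tail)
    }
  }

  /** Pass-through stage that accumulates a Long per element. */
  def counting(weight: Array[Byte] => Long): Stage[Array[Byte], Array[Byte], Long] =
    Stage[Array[Byte], Array[Byte], Long](
      0L,
      (n, b) => (n + weight(b), Vector(b)),
      n => (n, Vector.empty)
    )

  /** Re-slice incoming buffers into blocks of exactly `size` bytes (last may be shorter). */
  def rechunk(size: Int): Stage[Array[Byte], Array[Byte], Array[Byte]] =
    Stage[Array[Byte], Array[Byte], Array[Byte]](
      Array.emptyByteArray,
      (pending, in) => {
        val all = pending ++ in
        val full = all.length / size * size
        (all.drop(full), all.take(full).grouped(size).toVector)
      },
      pending => (Array.emptyByteArray, if (pending.isEmpty) Vector.empty else Vector(pending))
    )

  val countBytes = counting(_.length.toLong)
  val countBlocks = counting(_ => 1L)

  // State type is ((Long, Array[Byte]), Long): positional, unlike a named summary.
  val pipeline = countBytes >>> rechunk(4) >>> countBlocks
}
```

Running `TransducerSketch.pipeline.run(input)` returns the nested state together with the emitted blocks, so reading the byte total means destructuring as `val (((total, _), blockCount), blocks) = ...`. Each additional stage deepens the nesting, which suggests why a merged, name-addressed summary scales better as pipelines grow.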
See also
- Binary Streaming Guide — detailed walkthrough of blocks, manifests, and attributes
- Transducer Algebra — the composable pipeline engine
- Pipeline Explorer — interactive transducer visualization
- Chunking Strategies — FastCDC, fixed, and delimiter algorithms