Mutable Types in Wire

Dec 27 2024, Friday

Mutable Types in Wire

Block's Wire is a protocol buffer compiler and implementation that is Kotlin Multiplatform ready.
To understand the motivations around why Block (Square at the time) built Wire, their blog post announcing the release of Wire 1.0 is a great read.

One of the reasons Wire has been successful is because they paid a lot of attention to the ergonomics of generated code. Here is an excerpt from their blog post (emphasis mine).

Messages should be clean, developer-friendly data objects:

They should be highly readable

They should be deeply immutable

On Immutability

Immutability is an excellent trait to have for Messages. Here is a wonderful blog post by Roman Elizarov (was Kotlin's language lead) on designing immutable abstractions that we can afford.

Immutability's trade-offs

One of the main trade-offs Immutability makes is the following:

Potential Memory Overhead
Immutable types are by definition not-modifiable. Therefore, anytime you want to make a change; you end up having to re-create objects given the inability to modify existing ones. This adds to GC pressure. For performance sensitive code, this is not ideal.
Performance Impact for frequent Updates
If your application involves very frequent updates to large data structures, the overhead of creating new objects can quickly become noticeable.

Mutable Types

I recently contributed a change to Wire to allow marking a Message as mutable.
For extremely performance sensitive code, you too might find this option useful.

I highly recommend using these types with care. These types should always be behind an abstraction that hides the fact that the underlying type is mutable. This is because you lose all the nice properties of immutable Message types.

Learnings

I really enjoyed the discussion with Wire's maintainer's Jesse Wilson, Jake Wharton and Benoît Quenaudon. One interesting potential solution that we explored for the codegen was something Jesse called a decomposed encode(...).

Let's look at an example:

syntax = "proto2";

package squareup.wire.mutable;

message Header {
  optional uint64 id = 1;
  optional string name = 2;
}

message Payload {
  optional bytes content = 1;
}

message Packet {
  optional Header header = 1;
  optional Payload payload = 2;
}

The generated code for the message Packet looks something like (prior to my changes):

public class Packet(
  @field:WireField(
    tag = 1,
    adapter = "squareup.wire.mutable.Header#ADAPTER",
    declaredName = "header",
    schemaIndex = 0,
  )
  public val header_: Header? = null,
  @field:WireField(
    tag = 2,
    adapter = "squareup.wire.mutable.Payload#ADAPTER",
    schemaIndex = 1,
  )
  public val payload: Payload? = null,
  override val unknownFields: ByteString = ByteString.EMPTY,
) : Message<Packet, Nothing>(ADAPTER, unknownFields) {
  public companion object {
    @JvmField
    public val ADAPTER: ProtoAdapter<Packet> = object : ProtoAdapter<Packet>(
      FieldEncoding.LENGTH_DELIMITED, 
      Packet::class, 
      "type.googleapis.com/squareup.wire.mutable.Packet", 
      PROTO_2, 
      null, 
      "squareup/wire/mutable_types.proto"
    ) {
      // ...
      override fun encode(writer: ProtoWriter, `value`: Packet) {
        Header.ADAPTER.encodeWithTag(writer, 1, value.header_)
        Payload.ADAPTER.encodeWithTag(writer, 2, value.payload)
        writer.writeBytes(value.unknownFields)
      }
      override fun encode(writer: ReverseProtoWriter, `value`: Packet) {
        // ...
      }
      override fun decode(reader: ProtoReader): Packet {
        // ...
      }
      override fun redact(`value`: Packet): Packet = {
        // ...
      }
    }
    private const val serialVersionUID: Long = 0L
  }
}

In particular, pay special attention to the the implementation of encode(...) in Packet.ADAPTER.

If all we needed was a more efficient way of encoding a Packet without needing to create instances of the underlying type; then we could generate an overloaded encode(...) which had the following type signature:

public companion object {
  @JvmField
  public val ADAPTER: ProtoAdapter<Packet> = object : ProtoAdapter<Packet>(
    FieldEncoding.LENGTH_DELIMITED, 
    Packet::class, 
    "type.googleapis.com/squareup.wire.mutable.Packet", 
    PROTO_2, 
    null, 
    "squareup/wire/mutable_types.proto"
  ) {
    // ...

    /*
     * Notice that the `Packet` type got decomposed to its constituent
     * `Header` and `Payload` nested messages, which finally got
     * decomposed to their respective types.
     * 
     * We could essentially decompose every single `Message` type
     * until they only referred to the base types supported by Protocol Buffers.
     */
    override fun encode(
      writer: ProtoWriter,
      header_id: Long, // id from `Header
      header_name: String, // name from `Header`
      payload_content: ByteString, // content from `Payload`
    ) {
      // ...
    }
  }
}

In the above example, we essentially decomposed the Packet message to its constituent components Header and Payload which we then recursively decomposed to id, name and content respectively.

We would essentially decompose every single Message type until they only referred to the base types supported by Protocol Buffers. This would mean that the underlying type could still be immutable !.

The reason why we did not end up doing this was because, the numbers of parameters in the generated encode method would be excessively large for complex Message types. The API would also get confusing when you had overlapping types; For e.g.

message Packet {
  optional Header header_1 = 1;
  optional Header header_2 = 2;
  optional Payload payload = 3;
}

would generate an encode that would have

public companion object {
  @JvmField
  public val ADAPTER: ProtoAdapter<Packet> = object : ProtoAdapter<Packet>(
    FieldEncoding.LENGTH_DELIMITED, 
    Packet::class, 
    "type.googleapis.com/squareup.wire.mutable.Packet", 
    PROTO_2, 
    null, 
    "squareup/wire/mutable_types.proto"
  ) {
    // ...
    override fun encode(
      writer: ProtoWriter,
      header_1_id: Long, // id from `Header`
      header_1_name: String, // name from `Header`
      header_2_id: Long, // id from `Header
      header_2_name: String, // name from `Header`
      payload_content: ByteString, // content from `Payload`
    ) {
      // ...
    }
  }
}

The parameter names header_1_id, and header_2_id etc. start to get overwhelming and it makes it easy to make mistakes.

Instead, we picked:

// Code generated by Wire protocol buffer compiler, do not edit.
// Source: squareup.wire.mutable.Packet in squareup/wire/mutable_types.proto
public class MutablePacket(
  @field:WireField(
    tag = 1,
    adapter = "squareup.wire.mutable.MutableHeader#ADAPTER",
    declaredName = "header",
    schemaIndex = 0,
  )
  public var header_: MutableHeader? = null,
  @field:WireField(
    tag = 2,
    adapter = "squareup.wire.mutable.MutablePayload#ADAPTER",
    schemaIndex = 1,
  )
  public var payload: MutablePayload? = null,
  override var unknownFields: ByteString = ByteString.EMPTY,
) : Message<MutablePacket, Nothing>(ADAPTER, unknownFields) {
  // ...
  public companion object {
    @JvmField
    public val ADAPTER: ProtoAdapter<MutablePacket> = object : ProtoAdapter<MutablePacket>(
      FieldEncoding.LENGTH_DELIMITED, 
      MutablePacket::class, 
      "type.googleapis.com/squareup.wire.mutable.Packet", 
      PROTO_2, 
      null, 
      "squareup/wire/mutable_types.proto"
    ) {
      // ...
      override fun encode(writer: ProtoWriter, `value`: MutablePacket) {
        MutableHeader.ADAPTER.encodeWithTag(writer, 1, value.header_)
        MutablePayload.ADAPTER.encodeWithTag(writer, 2, value.payload)
        writer.writeBytes(value.unknownFields)
      }
      override fun decode(reader: ProtoReader): MutablePacket {
        var header_: MutableHeader? = null
        var payload: MutablePayload? = null
        val unknownFields = reader.forEachTag { tag ->
          when (tag) {
            1 -> header_ = MutableHeader.ADAPTER.decode(reader)
            2 -> payload = MutablePayload.ADAPTER.decode(reader)
            else -> reader.readUnknownField(tag)
          }
        }
        return MutablePacket(
          header_ = header_,
          payload = payload,
          unknownFields = unknownFields
        )
      }
      // ...
    }
    private const val serialVersionUID: Long = 0L
  }
}

We generate types prefixed with the word Mutable, and all the corresponding fields are declared as public var. This allows us to express our intent a lot more naturally with the caveat that the type is fully mutable and unsafe by default.

Epilogue

I really had a lot of fun implementing this feature. This was my first real foray into code generation and working with the Wire code base was a joy. Special thanks to Jesse Wilson, Jake Wharton and Benoît Quenaudon for their patience, help and support.