How does Wireshark annotate some packets with "TCP segment of a reassembled PDU"?

Briefly, Wireshark marks TCP packets with "TCP segment of a reassembled PDU" when they contain payload that is part of a longer application message or document that is completed in a later packet.

A fuller explanation requires a short deep-dive into the operation of TCP. Distributed applications communicate by exchanging messages and documents and, at the system level, typically do so using BSD-style sockets connected by TCP. The API that TCP sockets offer is agnostic to the structure of the application communication: it offers a generic model of communication as a simple continuous stream of bytes.
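
The byte-stream model can be seen in a minimal Python sketch. It uses socket.socketpair() as a stand-in for a real TCP connection (an assumption made to keep the example self-contained; the stream semantics are the same). The sender writes two separate messages, but the receiver sees only one undivided run of bytes.

```python
import socket

# A connected pair of stream sockets, standing in for a TCP connection.
a, b = socket.socketpair()

# The application writes two distinct messages...
a.sendall(b"first message")
a.sendall(b"second message")
a.close()

# ...but the receiver reads a plain byte stream until end-of-stream,
# with nothing to mark where one message ended and the next began.
received = b""
while chunk := b.recv(4096):
    received += chunk
b.close()

print(received)  # b'first messagesecond message'
```

Any message boundaries have to be encoded by the application protocol itself, for example with a length prefix or a delimiter.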

TCP needs to segment the data to encapsulate it in IP packets for transport across the network, but it does so in a manner completely decoupled from the application message and document boundaries. TCP will generally try to send application data as soon as it is written to the socket. However:

  • it may need to buffer data while waiting for acknowledgements from the receiver, resulting in several application messages being sent in one network packet;
  • the application may send a document larger than can fit into a single packet.

That is, the byte-stream model that the TCP socket API offers means that there can be both one-to-many and many-to-one relationships between messages and packets.
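
Both relationships can be illustrated with a deliberately simplified model of segmentation. The function name segment_stream is hypothetical, and the model assumes all the data is already buffered and is cut purely by a maximum segment size; a real TCP stack also takes timing, acknowledgements, and congestion control into account. The value 1460 is a typical maximum segment size on Ethernet, used here only as an example.

```python
def segment_stream(messages, mss):
    """Join messages into one byte stream and cut it into segments of
    at most `mss` bytes, ignoring where each message starts or ends."""
    stream = b"".join(messages)
    return [stream[i:i + mss] for i in range(0, len(stream), mss)]

# Several small messages can end up sharing a single segment...
small = segment_stream([b"PING\n", b"PONG\n", b"PING\n"], mss=1460)

# ...while one large document is spread over several segments.
large = segment_stream([b"x" * 4000], mss=1460)

print(len(small))               # 1
print([len(s) for s in large])  # [1460, 1460, 1080]
```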

Wireshark provides a view of network packet captures that operates at both levels: it primarily shows individual packets, but is also equipped with plugins, known as dissectors, that enable it to parse application messages exchanged in those packets. Although dissectors can access packets directly, they generally rely on the TCP reassembly that Wireshark implements: for each direction of a TCP connection, it takes the payload from all the packets, orders it by sequence number, and concatenates it to reconstruct the byte-stream. As each new segment of data is appended, the stream is offered to the relevant dissector, which looks for application messages and documents contained in it. If the original TCP stack split a message or document across more than one packet, the dissector will have to wait until Wireshark processes the last packet and hands it the complete application payload before it can complete the dissection and display the content.
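
The reassembly step can be sketched roughly as follows. This is an illustration of the idea, not Wireshark's actual implementation: the function name reassemble and the (sequence number, payload) tuples are assumptions, and sequence-number wraparound is ignored for simplicity.

```python
def reassemble(segments, initial_seq):
    """Rebuild one direction's byte stream from (seq, payload) pairs.

    Segments are ordered by sequence number. Reassembly stops at the
    first gap, because bytes after a missing segment cannot be placed
    in the stream yet.
    """
    stream = bytearray()
    next_seq = initial_seq
    for seq, payload in sorted(segments):
        if seq > next_seq:
            break                    # gap: an earlier segment is missing
        skip = next_seq - seq        # bytes already seen (retransmission)
        if skip < len(payload):
            stream += payload[skip:]
            next_seq = seq + len(payload)
    return bytes(stream)

# Segments captured out of order, including one retransmission.
captured = [
    (1008, b"ex.html "),
    (1000, b"GET /ind"),
    (1000, b"GET /ind"),
    (1016, b"HTTP/1.1\r\n"),
]
print(reassemble(captured, initial_seq=1000))
# b'GET /index.html HTTP/1.1\r\n'
```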

In particular, the dissector will not display anything in response to the initial packets that transport the as-yet incomplete payload. To flag this situation to the user, Wireshark marks each of those packets with "TCP segment of a reassembled PDU", where

  • "segment" is the TCP terminology for a chunk of payload, prepended with the matching TCP header. (In practice this is synonymous with "packet", although technically it is a distinct entity. For example, it is possible for a large TCP segment to get fragmented into multiple IP packets, although TCP tries hard to avoid this.)
  • "PDU" is an acronym for "Protocol Data Unit" - in this context, it means an application message or document as dissected by a Wireshark plugin.

Once the final packet arrives and the full payload is available for dissection, Wireshark will display the full dissection on that final packet, as well as show the raw payload bytes that arrived in all the constituent packets.
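
The labelling behaviour described above can be mimicked with a toy dissector. The protocol here is invented for the example (a 2-byte big-endian length followed by an ASCII body), the function name annotate is hypothetical, and the payloads are assumed to arrive in order and non-empty. It is a sketch of the principle, not of Wireshark's code.

```python
import struct

LABEL = "TCP segment of a reassembled PDU"

def annotate(payloads):
    """Return one display label per packet payload, in capture order."""
    buffer = b""
    labels = []
    for payload in payloads:
        buffer += payload
        completed = []
        # Extract every complete PDU now available in the buffer.
        while len(buffer) >= 2:
            (length,) = struct.unpack("!H", buffer[:2])
            if len(buffer) < 2 + length:
                break                # PDU continues in a later packet
            completed.append(buffer[2:2 + length].decode("ascii"))
            buffer = buffer[2 + length:]
        # Packets that complete nothing only get the placeholder label.
        labels.append(", ".join(completed) if completed else LABEL)
    return labels

# One 11-byte message, split across three packets.
message = struct.pack("!H", 11) + b"hello world"
packets = [message[:4], message[4:9], message[9:]]
for label in annotate(packets):
    print(label)
# TCP segment of a reassembled PDU
# TCP segment of a reassembled PDU
# hello world
```

Only the packet that completes the message carries the dissected content; the earlier ones are flagged as partial.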

So, in summary, Wireshark marks TCP packets with "TCP segment of a reassembled PDU" when they contain payload that is part of a longer application message or document that is completed in a later packet.