Objective: Eliminate everything that isn't necessary to get the data through without causing loss, and to maintain the simplest presentation possible with the bare minimum of variations to minimise the potential for incomplete or poorly-tested implementations. With the exception of the vorbiscomment frame, all data are in big-endian format. Packet 0, BOS, 14 bytes only 8 0x80 Stream Header Packet ID 56 "STFU000" Codec identifier (negotiable) - 32 [uint] Number of sample intervals per second 6 [uint] Number of significant bits per channel per sample minus one 1 [bool] Samples are in a floating-point format if 1, or int if 0 1 '0' padding only 8 [uint] Number of channels minus one The number of significant bits can be any value from 1 to 64. If you want more than that then there's something wrong with you. If you want fewer than that then there's also something wrong with you. The number of bits stored for each sample will always be extended to the next power of two. Zero padding can not be enforced and so it is not mandatory. Readers should be aware of this if it is likely to cause them numerical misfortune. Note that this padding policy implies that several samples can be encoded in a single byte at precisions less than or equal to four, and that a precision of 24 will be padded to four bytes. Live with it. The number of channels can be any value from one to 256. If you want more channels you'll have to open a new stream for them. A compulsory vorbiscomment packet follows, formatted identically to the Theora comment header. This may contain replaygain and channel mapping information as is, or would be, defined for Vorbis content. No other header packets may exist in a valid stream. The third frame is always the beginning of the raw data. There is no room for future expansion. None at all. Samples are interleaved in the conventional .WAV way... and they're big- fucking-endian. Don't forget that. Integer sample case: All sample types are signed, except in the single-bit case where sign is undefined. [Should this be extended to declare that all precisions <= 4 should be unsigned and the 0V level be undefined?] Floating-point sample case: Precisions from 33 to 64 are IEEE 754 double-precision floating point. Part of the mantissa may be zeroed to reflect the precision accurately, but it is not mandatory and such cases are not anticipated. Precisions from 17 to 32 are IEEE 754 single-precision floating point. Part of the mantissa may be zeroed to reflect the precision accurately, but it is not mandatory and such cases are not anticipated. Precisions from 9 to 16 are IEEE 754 style floating point, but with a 1:5:10 distribution. Part of the mantissa may be zeroed to reflect the precision accurately, but it is not mandatory and such cases are not anticipated. A precision of 8 specifies A-Law encoding. A precision of 7 specifies mu-Law encoding (discouraged). Precisions less than 7 are undefined and should not be used.