
7z Archive Format Specification

This section contains the comprehensive specification for the 7z archive format as implemented by zesven.

Purpose

This specification provides a precise, unambiguous definition of the 7z archive format suitable for:

  • Implementing compliant readers and writers
  • Validating archive correctness
  • Understanding format internals
  • Ensuring interoperability with 7-Zip and other implementations

Scope

In scope:

  • Binary format structure and layout
  • Data encoding schemes
  • Compression and filter method interfaces
  • Encryption scheme
  • File metadata representation
  • Error conditions and handling

Out of scope:

  • Compression algorithm internals (LZMA, LZMA2, etc.)
  • Cryptographic primitive implementations
  • Platform-specific extraction behaviors
  • User interface considerations

Document Conventions

Requirement Keywords

This specification uses requirement level keywords as defined in RFC 2119:

| Keyword | Meaning |
|---|---|
| MUST | Absolute requirement |
| MUST NOT | Absolute prohibition |
| SHOULD | Recommended but not required |
| SHOULD NOT | Not recommended but not prohibited |
| MAY | Optional |

Byte Order

All multi-byte integers are little-endian unless explicitly stated otherwise.
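As a minimal sketch of what the little-endian rule means for a reader, the helper below decodes a 32-bit value from a byte slice. The function name `read_u32_le` is an illustrative assumption and is not taken from zesven's API.

```rust
/// Reads a little-endian u32 from the start of `bytes`.
/// Returns `None` if fewer than four bytes are available.
/// (Hypothetical helper for illustration only.)
fn read_u32_le(bytes: &[u8]) -> Option<u32> {
    let b = bytes.get(..4)?;
    // The first byte in the stream is the least significant byte.
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

With this sketch, the byte sequence `78 56 34 12` decodes to `0x12345678`, and an input shorter than four bytes yields `None` rather than a panic.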

Binary Notation

  • Hexadecimal bytes: 0x7A or 7A
  • Byte sequences: 37 7A BC AF 27 1C
  • Binary literals: 0b10110100
  • Bit ranges: bits 0-3 (inclusive, 0 is LSB)
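The notation above can be related to code as follows. This is a sketch only: `EXAMPLE_BYTES` and `bit_range` are hypothetical names introduced here to show how a byte sequence and an inclusive bit range (bit 0 being the LSB) might be expressed in Rust.

```rust
/// The example byte sequence from the notation list, written as a Rust array.
const EXAMPLE_BYTES: [u8; 6] = [0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C];

/// Extracts bits `lo` through `hi` of `byte` (inclusive, bit 0 is the LSB).
/// Panics if the range does not lie within 0-7.
/// (Hypothetical helper for illustration only.)
fn bit_range(byte: u8, lo: u32, hi: u32) -> u8 {
    assert!(lo <= hi && hi <= 7, "bit range must lie within 0-7");
    let width = hi - lo + 1;
    // Build the mask in u16 so that a full 8-bit width does not overflow.
    let mask = ((1u16 << width) - 1) as u8;
    (byte >> lo) & mask
}
```

For the binary literal `0b10110100`, bits 0-3 are `0b0100` and bits 4-7 are `0b1011`.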

Range Notation

All numeric ranges in this specification are inclusive on both ends unless otherwise noted:

  • 0-255 means values from 0 to 255 inclusive (256 total values)
  • 0x00-0x04 means values from 0x00 to 0x04 inclusive (5 total values)

For loop bounds in pseudocode:

  • for i in 0..N means i takes values 0, 1, 2, ..., N-1 (exclusive upper bound, Python/Rust style)
  • for i in 0 to N - 1 means the same range explicitly
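The difference between inclusive prose ranges and exclusive loop bounds can be checked with a short sketch. The function names below are assumptions made for illustration.

```rust
/// Collects the values visited by `for i in 0..n` (exclusive upper bound).
fn loop_values(n: u32) -> Vec<u32> {
    (0..n).collect()
}

/// Counts the values in a prose range `lo-hi`, which is inclusive on both ends.
fn inclusive_count(lo: u32, hi: u32) -> u32 {
    (lo..=hi).count() as u32
}
```

Here `loop_values(4)` visits 0, 1, 2, 3, while `inclusive_count(0, 255)` is 256 and `inclusive_count(0x00, 0x04)` is 5, matching the examples above.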

Size Units

  • All sizes are in bytes unless otherwise specified
  • 1 KiB = 1024 bytes
  • 1 MiB = 1024 KiB
  • 1 GiB = 1024 MiB
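Expressed as constants, the units above could look like the following sketch. The constant names are illustrative assumptions; `u64` is chosen here so that sizes above 4 GiB remain representable.

```rust
/// Binary size units, in bytes. (Hypothetical constant names.)
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;
const GIB: u64 = 1024 * MIB;
```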

Grammar Notation

Structure definitions use a BNF-like notation:

Structure ::= Field1 Field2 [OptionalField] Field3*
Field1 ::= BYTE
Field2 ::= NUMBER
OptionalField ::= 0x01 Data
Field3 ::= UINT32

Where:

  • [...] denotes optional elements
  • * denotes zero or more repetitions
  • + denotes one or more repetitions
  • | denotes alternatives
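One way to read the example grammar is as a data type. The sketch below is not a parser and not part of the format: the struct and field names are hypothetical, `NUMBER` is assumed to decode to a `u64`, and the undefined `Data` production is represented as raw bytes.

```rust
/// Hypothetical in-memory shape for the example `Structure` production.
#[derive(Debug, PartialEq)]
struct Structure {
    /// Field1 ::= BYTE
    field1: u8,
    /// Field2 ::= NUMBER (assumed here to decode to a u64)
    field2: u64,
    /// [OptionalField]: `None` when the 0x01 marker and Data are absent
    optional_field: Option<Vec<u8>>,
    /// Field3*: zero or more UINT32 values
    field3: Vec<u32>,
}
```

An optional element maps naturally to `Option`, a `*` repetition to a possibly empty `Vec`, and a `+` repetition to a `Vec` that a reader would additionally check for being non-empty.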

Document Map

Foundation (00-04)

| Document | Description |
|---|---|
| Philosophy | Design principles, goals, invariants |
| Glossary | Canonical terminology definitions |
| Archive Structure | High-level archive layout |
| Signature Header | 32-byte start header format |
| Data Encoding | NUMBER, BitField, and type encodings |

Header Format (05-09)

| Document | Description |
|---|---|
| Header Structure | Main and encoded header organization |
| Pack Info | Compressed stream information |
| Unpack Info | Folder and coder definitions |
| Substreams Info | Per-file data within folders |
| Files Info | File metadata and properties |

Codecs and Filters (10-12)

| Document | Description |
|---|---|
| Compression Methods | Compression algorithm interfaces |
| Filters | BCJ, Delta, and other preprocessors |
| Encryption | AES-256-SHA256 encryption scheme |

Special Features (13-15)

| Document | Description |
|---|---|
| Solid Archives | Solid block compression |
| Multi-Volume | Split archive handling |
| SFX Archives | Self-extracting archives |

Metadata and Safety (16-18)

| Document | Description |
|---|---|
| Timestamps & Attributes | Time and attribute formats |
| Security | Safety constraints and limits |
| Error Conditions | Error handling requirements |

Reference Appendices

| Document | Description |
|---|---|
| A: Property IDs | Complete property ID table |
| B: Method IDs | Complete compression method ID table |
| C: CRC Algorithm | CRC-32 specification |
| D: Compatibility | Interoperability notes |

Reading Order

For implementers new to the 7z format:

  1. Start with Philosophy for context
  2. Read Glossary to understand terminology
  3. Continue with Archive Structure for the big picture
  4. Then proceed sequentially through the remaining documents

For quick reference:

Version History

| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2025-01 | Initial specification |

Acknowledgments

This specification is derived from:

  • Official 7-Zip source code by Igor Pavlov
  • py7zr documentation by Hiroshi Miura
  • Analysis of multiple open-source implementations

License

This specification is part of the zesven project and is licensed under MIT OR Apache-2.0.
