NTFS File System Internals
Zoran Mihail Iliev
Definition of a Sector
• Basic unit of storage and transfer to and from the device.
• Regardless of the type of storage device, ALL TRANSFERS MUST BE A MULTIPLE OF THE SECTOR SIZE!
Sector Size
• From 1956 until December 2009: 1 sector = 512 bytes (+ 50 bytes for ECC)
• December 2009 – Advanced Format
Advanced Format and 512e
• 512e is an emulation mechanism.
• Provided by the disk itself.
• Used by legacy hosts that cannot work with disks that have 4096-byte sectors.
• The disk controller translates a logical block number into the correct physical sector.
[Figure: the HOST addresses LOGICAL BLOCKS 0 to 7, which together occupy one 4096-byte PHYSICAL SECTOR.]
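The translation performed by a 512e disk can be sketched as follows. This is only an illustration, assuming 512-byte logical blocks and 4096-byte physical sectors; the function name `logical_to_physical` is hypothetical and does not correspond to any firmware or operating system API.

```python
LOGICAL_BLOCK_SIZE = 512      # block size the legacy host sees
PHYSICAL_SECTOR_SIZE = 4096   # Advanced Format physical sector size
BLOCKS_PER_SECTOR = PHYSICAL_SECTOR_SIZE // LOGICAL_BLOCK_SIZE  # 8

def logical_to_physical(lba):
    """Map a logical block number to (physical sector, byte offset inside it)."""
    physical_sector = lba // BLOCKS_PER_SECTOR
    offset_in_sector = (lba % BLOCKS_PER_SECTOR) * LOGICAL_BLOCK_SIZE
    return physical_sector, offset_in_sector

print(logical_to_physical(0))   # (0, 0)
print(logical_to_physical(7))   # (0, 3584)
print(logical_to_physical(8))   # (1, 0)
```

A write to a single logical block touches only one eighth of a physical sector, so the disk has to read, modify and rewrite the whole 4096-byte sector.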
NTFS
• To obtain the NTFS version of a volume, sector size…
- open a command prompt with Administrator access rights and execute the following command:
NTFS and SSD
• T10 Trim delete notification
• In Windows 7 and Windows Server 2008 R2, for storage devices that support T10 Trim, NTFS now sends a delete notification to the device when files are deleted.
• If a device supports T10 Trim as defined in the ATA protocol’s Data Set Management command, NTFS sends the notification when files are deleted and it is safe to erase the storage that backs those files.
NTFS and SSD
• NTFS sends delete notifications to devices that support “trim”
• File system operations: Format, Delete, Truncate, Compression
• OS internal processes: e.g., Snapshot, Volume Manager
• Three optimization opportunities for the device
• Enhancing device wear leveling by eliminating merge operation for all deleted data blocks
• Making early garbage collection possible for fast write
• Keeping as much of the device’s storage area unused as possible, leaving more room for device wear leveling.
NTFS and SSD
• Aligning the NTFS partition to the SSD geometry is important for SSD performance on Windows XP and on systems upgraded from Windows XP to Windows Vista or Windows 7
• The first Windows XP partition starts at sector #63, in the middle of an SSD page
• A misaligned partition can degrade the device’s performance down to 50% caused
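Why sector #63 falls in the middle of a page can be checked with a little arithmetic. The sketch below assumes 512-byte sectors and a 4096-byte SSD page (real page sizes vary by device); `is_aligned` is a hypothetical helper name, and 2048 is used only as an example of a starting sector that lies on a 1 MiB boundary.

```python
def is_aligned(start_sector, sector_size=512, page_size=4096):
    """Return True when the partition's starting byte offset is a multiple of page_size."""
    return (start_sector * sector_size) % page_size == 0

print(63 * 512)          # 32256 bytes, not a multiple of 4096
print(is_aligned(63))    # False
print(is_aligned(2048))  # True (2048 * 512 = 1048576 bytes)
```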
NTFS
Support for file system metadata defragmenting
• Prior to Windows 7 and Windows Server 2008 R2, certain file system metadata associated with user data files (for example, reparse point or Encrypting File System (EFS) data) could not be defragmented.
• Enhancements to the defragment engine enable certain file system metadata to be defragmented.
• This change helps improve the performance of files with many reparse points and resident files.
• It can also help enable Volume Shrink to reclaim more space than was previously possible.
NTFS Check Disk
Chkdsk performance improvements
• In Windows Server 2008 R2, enhancements to the command-line tool
Chkdsk increase the availability of volumes by reducing the amount of
time it takes to perform a Chkdsk run.
• Chkdsk scales with the amount of available RAM in the system. Running Chkdsk on a server running Windows Server 2008 R2 is
NTFS Registry settings
Setting name Location
Previous default value
Definition of a Partition
• A partition is a collection of contiguous sectors on a disk.
• A partition table or other disk management structure stores a partition's starting sector, size, and some other attributes.
Partitioning Schemes
• MBR
• GPT
MBR
Disk
Signature
MBR Disk Signature and Windows Registry
Disk Signature (1B8h, 4 bytes)
MBR Disk Signature and Windows Registry
Disk Signature (1B8h, 4 bytes)
The last 8 bytes of the value stored in the registry key give the byte offset of the start of the partition itself (divide the byte offset by the sector size to obtain the sector address).
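The two lookups described above can be sketched in a few lines. This is an illustration, not a definitive parser: it assumes the registry value is 12 bytes long (the 4-byte disk signature followed by the 8-byte partition byte offset, as values under HKLM\SYSTEM\MountedDevices are commonly laid out for MBR disks), that both numbers are little-endian, and that sectors are 512 bytes. The function names are hypothetical.

```python
import struct

def mbr_disk_signature(sector0):
    """Read the 4-byte disk signature at offset 1B8h of the MBR sector."""
    return struct.unpack_from("<I", sector0, 0x1B8)[0]

def split_registry_value(value, sector_size=512):
    """Split a 12-byte value into (disk signature, starting sector of the partition)."""
    signature, byte_offset = struct.unpack("<IQ", value)
    return signature, byte_offset // sector_size

# Hypothetical sample data: signature 12345678h, partition starting at byte 7E00h.
sample_value = bytes.fromhex("78563412" "007e000000000000")
signature, start_sector = split_registry_value(sample_value)
print(hex(signature), start_sector)   # 0x12345678 63
```

A value whose signature matches `mbr_disk_signature()` of a disk's first sector refers to a partition on that disk.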
MBR Disk Signature and Windows Registry
Information about the various multifunction adapters (ISA, PNP, ACPI, etc.) and the devices on them that are detected by ntdetect.com, is stored in:
HKLM\HARDWARE\DESCRIPTION\System\MultifunctionAdapter\#.
End Of Sector Signature
GPT Disk
• Designed by Intel
• Volumes up to 18 exabytes in size
• Up to 128 partitions per disk
GUID PARTITION TABLE
• GUID PARTITION TABLE • Global
Protective MBR in GPT Disks
Disk Signature
GPT Partition Table Header
Signature (00h, 8)
Used to identify all EFI-compatible GPT headers. The value must always be 45 46 49 20 50 41 52 54.
Reserved (14h, 4)
Must be 0.
Backup LBA (20h, 8)
The LBA address of the backup GPT header. This value is always equal to the last LBA on the disk.
First Usable LBA (28h, 8)
The first usable LBA that can be contained in a GUID partition entry. In other words, the first partition begins at this LBA. In the 64-bit versions of Windows Server 2003+, Vista, 7 this number is always LBA 34.
Last Usable LBA (30h, 8)
The last usable LBA that can be contained in a GUID partition entry.
Disk GUID (38h, 16)
A unique number that identifies the partition table header and the disk itself.
Partition Entry LBA (48h, 8)
The starting LBA of the GUID partition entry array. This number is always LBA 2.
Number of Partition Entries (50h, 4)
The maximum number of partition entries that can be contained in the GUID partition entry array. In the 64-bit versions of Windows Server 2003+, Vista, 7 this number is equal to 128.
Size of Partition Entry (54h, 4)
The size, in bytes, of each partition entry in the GUID partition entry array. Each partition entry is 128 bytes.
Partition Entry Array CRC32 (58h, 4)
Used to verify the integrity of the GUID partition entry array. The 32-bit CRC algorithm is used to perform this calculation.
Reserved (5Ch, to the end of the sector)
Must be 0.
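As a rough illustration, the header fields listed above can be decoded with Python's `struct` module. The sketch assumes little-endian fields at the offsets used by the UEFI layout (20h backup LBA, 28h first usable LBA, 30h last usable LBA, 38h disk GUID, 48h entry array LBA, 50h/54h/58h entry count, entry size and array CRC32) and assumes that `zlib.crc32` computes the same CRC32 as the one stored in the header. The function and key names are hypothetical.

```python
import struct
import uuid
import zlib

GPT_SIGNATURE = bytes.fromhex("4546492050415254")  # ASCII "EFI PART"

def parse_gpt_header(sector):
    """Decode selected GPT header fields from one raw sector (normally LBA 1)."""
    if bytes(sector[0x00:0x08]) != GPT_SIGNATURE:
        raise ValueError("signature is not 45 46 49 20 50 41 52 54")
    backup_lba, first_usable, last_usable = struct.unpack_from("<QQQ", sector, 0x20)
    entries_lba, = struct.unpack_from("<Q", sector, 0x48)
    count, entry_size, array_crc = struct.unpack_from("<III", sector, 0x50)
    return {
        "backup_lba": backup_lba,
        "first_usable_lba": first_usable,
        "last_usable_lba": last_usable,
        "disk_guid": uuid.UUID(bytes_le=bytes(sector[0x38:0x48])),
        "partition_entries_lba": entries_lba,
        "number_of_entries": count,
        "entry_size": entry_size,
        "entry_array_crc32": array_crc,
    }

def entry_array_is_intact(header, entry_array):
    """Compare the stored CRC32 with one computed over the raw partition entry array."""
    expected_length = header["number_of_entries"] * header["entry_size"]
    computed = zlib.crc32(entry_array) & 0xFFFFFFFF
    return len(entry_array) == expected_length and computed == header["entry_array_crc32"]
```

With 128 entries of 128 bytes each, the entry array occupies 16384 bytes, that is 32 sectors of 512 bytes starting at LBA 2, which is why the first usable LBA is 34.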
Partition Type GUIDs
(not unique for one OS)
Partition Type GUID Value
Unused entry {00000000-0000-0000-0000-000000000000}
EFI System partition {28732ac11ff8d211ba4b00a0c93ec93b}
MBR partition scheme {41ee4d02e733d3119d690008c781f39f}
BIOS Boot partition {4861682149646f6e744e656564454649}
Data partition (Windows/Linux) {a2a0d0ebe5b9334487c068b6b72699c7}
ZFS (Mac OS X) or
Partition Type Windows GUIDs
Partition Type GUID Value
Unused entry {00000000-0000-0000-0000-000000000000}
Microsoft Reserved partition {16e3c9e35c0bb84d817df92df00215ae}
Primary partition on a basic disk {a2a0d0ebe5b9334487c068b6b72699c7}
LDM Metadata partition on a dynamic disk {aac808588f7ee04285d2e1e90434cfb3}
LDM Data partition on a dynamic disk {a0609baf3114624fbc683311714a69ad}
Windows Recovery Environment (Windows) {a4bb94ded106404da16abfd50179d6ac}
IBM General Parallel File System (GPFS) partition (Windows) {90fcaf377def964e91c32d7ae055b174}
http://technet.microsoft.com
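The GUID values in these tables are shown as the raw 16 bytes found on disk, where the first three groups are stored little-endian. A small sketch, using Python's standard `uuid` module, converts them to the familiar hyphenated form; `guid_from_disk_bytes` is a hypothetical helper name.

```python
import uuid

def guid_from_disk_bytes(hex_bytes):
    """Convert 16 on-disk bytes (given as hex) into the canonical GUID string."""
    return str(uuid.UUID(bytes_le=bytes.fromhex(hex_bytes))).upper()

print(guid_from_disk_bytes("a2a0d0ebe5b9334487c068b6b72699c7"))
# EBD0A0A2-B9E5-4433-87C0-68B6B72699C7
print(guid_from_disk_bytes("28732ac11ff8d211ba4b00a0c93ec93b"))
# C12A7328-F81F-11D2-BA4B-00A0C93EC93B
```

The first result is the basic data partition type that the attribute table later in this deck refers to as {EBD0A0A2-B9E5-4433-87C0-68B6B72699C7}.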
Partition Type MAC GUIDs
Partition Type GUID Value
Hierarchical File System Plus (HFS+)
Partition Type Linux GUIDs
Partition Type GUID Value
Data partition (Linux) {a2a0d0ebe5b9334487c068b6b72699c7}
RAID partition (Linux) {0f889da1fc053b4da006743f0f84911e}
Swap partition (Linux) {6dfd5706aba4c44384e50933c84b4f4f}
Logical Volume Manager (LVM) partition (Linux) {79d3d6e607f5c244a23c238f2a3df928}
Reserved (Linux) {3933a68d0700c060c436083ac8230908}
http://technet.microsoft.com
Partition Type GUIDs
The Microsoft Reserved Partition (MSR) reserves space on each disk drive for subsequent use by operating system software.
GPT disks do not allow hidden sectors.
Software components that formerly used hidden sectors now allocate portions of the MSR for component-specific partitions.
GUID Partition Entry Attributes Used by the 64-Bit Editions of Windows
Bits Description
Bit 0 Specifies that this partition is required for the platform to function. All original equipment manufacturer (OEM) partitions must have this bit set to protect the OEM partition from being overwritten by the disk tools supplied with Windows Server 2003+, Vista, 7.
Bit 60 Marks the partition as read-only. Used only for primary basic partitions of type {EBD0A0A2-B9E5-4433-87C0-68B6B72699C7}.
Bit 62 Marks the partition as hidden. Used only for primary basic partitions of type {EBD0A0A2-B9E5-4433-87C0-68B6B72699C7}.
Bit 63 Prevents the system from assigning a default drive letter to the partition. Used only for primary basic partitions of type {EBD0A0A2-B9E5-4433-87C0-68B6B72699C7}.
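The attribute bits in the table can be tested with simple masks. The sketch below treats the attributes as one 64-bit integer; the short labels and the function name are hypothetical.

```python
ATTRIBUTE_BITS = {
    0: "required for the platform",
    60: "read-only",
    62: "hidden",
    63: "no default drive letter",
}

def decode_partition_attributes(attributes):
    """Return the labels of all table bits that are set in the 64-bit attribute value."""
    return [label for bit, label in ATTRIBUTE_BITS.items() if attributes & (1 << bit)]

print(decode_partition_attributes(0x8000000000000001))
# ['required for the platform', 'no default drive letter']
```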
Boot Sectors on GPT Disks
• Boot sectors on GPT disks are similar to boot sectors on MBR disks, except that EFI ignores all x86 code in the boot sector.
NTFS
NTFS BOOT SECTOR
(00h, 3) Jump instruction
(24h, 48) Extended BPB
(54h, 426) Bootstrap Code
(1FEh, 2) End of sector marker
BPB and Extended BPB Fields on NTFS Volumes
(0Bh, 2) Bytes Per Sector
(0Dh, 1) Sectors Per Cluster
(0Eh, 2) Reserved Sectors
(28h, 8) Total Sectors
(38h, 8) Logical cluster number of $MFTMirr
(41h, 3) Not used by NTFS
(45h, 3) Not used by NTFS
(48h, 8) Volume Serial Number. The 32-bit serial number is also stored, in decimal format, under HKLM\Software\Microsoft\Windows NT\CurrentVersion\EMDMgmt.
(50h, 4) Not used by NTFS.
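The BPB fields can be read straight out of the boot sector. The following is a minimal sketch, assuming little-endian fields and the commonly documented NTFS layout; in particular the $MFT cluster number at 30h and the serial number at 48h are assumptions that are not spelled out on these slides, as is the choice of the low 32 bits of the serial number. All function and key names are hypothetical.

```python
import struct

def parse_ntfs_boot_sector(sector):
    """Decode a few BPB and extended BPB fields from an NTFS boot sector."""
    bytes_per_sector, sectors_per_cluster, reserved = struct.unpack_from("<HBH", sector, 0x0B)
    total_sectors, mft_lcn, mftmirr_lcn = struct.unpack_from("<QQQ", sector, 0x28)
    serial, = struct.unpack_from("<Q", sector, 0x48)
    cluster_size = bytes_per_sector * sectors_per_cluster
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "reserved_sectors": reserved,
        "total_sectors": total_sectors,
        "cluster_size": cluster_size,
        "mft_byte_offset": mft_lcn * cluster_size,          # assumed field at 30h
        "mftmirr_byte_offset": mftmirr_lcn * cluster_size,  # field at 38h
        "serial_hex": format(serial, "016X"),
        "serial_low32_decimal": str(serial & 0xFFFFFFFF),   # assumed: low 32 bits
    }
```

The sketch ignores the special encoding that very large cluster sizes use in the Sectors Per Cluster byte.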
BitLocker Volume Boot Sector
MFT Zone
• To prevent the MFT becoming fragmented, Windows maintains a buffer around it.
• No new files will be created in this buffer region until the other disk space is used up.
• The buffer size is configurable.
• Each time the rest of the disk becomes full, the buffer size is halved.
MFT
NTFS Master File Table
• XP - NTFS assigns about 12% of the disk to the MFT Zone.
• Windows Vista and Windows Server 2008 use a default size of 200MB for the initial MFT zone reservation.
• As the MFT outgrows the default zone due to more files being added to the volume, the MFT will create another 200MB zone to grow into.
NTFS Master File Table
• Windows 7 Service Pack 1 through Windows 10 use a default size of 256KB for the initial MFT zone reservation, or 1024KB on disks with 4096 bytes per sector.
• As the MFT outgrows the default zone due to more files being added to the volume, the MFT will create another 256KB zone to grow into.
• This change to fixed amount versus a percentage was done to deal
NTFS Master File Table
• The MFT is a set of FILE records.
• Each file of the volume is completely described by one or more of these FILE Records.
• A FILE Record is built up from a header, several variable length attributes and an end marker (simply 0xFFFFFFFF).
MFT File Record Header
(00h, 4) – Magic number "FILE"
(04h, 2) – Offset to the update sequence
(06h, 2) – Size in words of the Update Sequence Number & Array; for example, 00 03 = 3 words = 6 bytes
(10h, 2) – Sequence Number (use / deletion count)
This value is incremented each time that a file record segment is freed; it is 0 if the segment is not used. The SequenceNumber field of a file reference must match the contents of this field; if they do not match, the file reference is incorrect and probably obsolete.
(12h, 2) – Hard link count
(14h, 2) – Offset to the first attribute, which is also the size of the MFT File Record Header; for example, 00 38h = 56
(16h, 2) – Flags
(1Ch, 4) – Allocated size of the FILE record
(28h, 2) – Next Attribute Id (“attributes counter”)
Every Attribute in every FILE Record has an Attribute Id. This Id is unique within the FILE Record and is used to maintain data integrity. The Next Attribute Id is the Attribute Id that will be assigned to the next Attribute added to this MFT Record.
- Incremented each time it is used.
- Every time the MFT Record is reused this Id is set to zero.
- The first instance number is always 0.
(2Ch, 4) – Number of this MFT Record
(XXh, 2) – Update Sequence Number
16-bit sequence number that is incremented when the File Record entry is allocated.
(XXh+2, YYh x 2 – 2) – Update Sequence Array
XXh is the offset stored at (04h, 2); YYh is the size in words stored at (06h, 2).
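The header fields above map directly onto fixed offsets, so they can be read with `struct`. The sketch below assumes little-endian values and the "FILE" magic number at offset 00h; the hard link count at 12h and the meaning of the two low flag bits (01h record in use, 02h directory) come from commonly documented NTFS layouts rather than from these slides. Function and key names are hypothetical.

```python
import struct

def parse_file_record_header(record):
    """Decode the FILE record header fields listed above."""
    if bytes(record[0:4]) != b"FILE":
        raise ValueError("record does not start with the FILE magic number")
    usa_offset, usa_words = struct.unpack_from("<HH", record, 0x04)
    sequence, links, first_attribute, flags = struct.unpack_from("<HHHH", record, 0x10)
    allocated_size, = struct.unpack_from("<I", record, 0x1C)
    next_attribute_id, = struct.unpack_from("<H", record, 0x28)
    record_number, = struct.unpack_from("<I", record, 0x2C)
    update_sequence_number, = struct.unpack_from("<H", record, usa_offset)
    return {
        "usa_offset": usa_offset,
        "usa_size_words": usa_words,
        "sequence_number": sequence,
        "hard_link_count": links,              # assumed meaning of 12h
        "first_attribute_offset": first_attribute,
        "in_use": bool(flags & 0x01),          # assumed flag meaning
        "is_directory": bool(flags & 0x02),    # assumed flag meaning
        "allocated_size": allocated_size,
        "next_attribute_id": next_attribute_id,
        "record_number": record_number,
        "update_sequence_number": update_sequence_number,
    }
```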
MFT File Record Header
• Because a single sector could fail, it's important for NTFS to be able to detect errors in a cluster.
• For this purpose the sectors have Fixups, which are kept in an Update Sequence Array.
• Many important Metadata Records use fixups to protect data integrity : FILE Records in the $MFT
MFT File Record Header
• The header of each of these records contains an Update Sequence Number and a buffer.
• The last two bytes of each sector of the record are copied into the buffer and the Update Sequence Number is written in their place.
• When the record is read, the Update Sequence Number is read from the header and compared against the last two bytes of each sector.
MFT File Record Header
• If the comparison succeeds, the bytes in the buffer are copied back to their original places.
MFT File Record Header
Writing
Before writing a fixup-protected record:
• Add one to the Update Sequence Number (0x0000 must be skipped).
• For each sector, copy the last two bytes into the Update Sequence Array.
• Write the new Update Sequence Number to the end of each sector.
MFT File Record Header
Reading
When reading a fixup-protected record:
• Read the record from disk.
• Check the magic number is correct.
• Read the Update Sequence Number.
• Compare it against the last two bytes of every sector.
• Copy the contents of the Update Sequence Array to the correct places.
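The writing and reading steps can be sketched as a pair of functions. This is a simplified illustration, not the NTFS driver's code: it assumes a FILE record, a 512-byte protection stride, the update sequence offset at 04h and its size in words at 06h, and little-endian values. The names `apply_fixups` and `remove_fixups` are hypothetical.

```python
import struct

SECTOR = 512  # assumed stride protected by each fixup entry

def apply_fixups(record):
    """Writing: bump the Update Sequence Number and stamp it at the end of every sector."""
    buf = bytearray(record)
    usa_offset, usa_words = struct.unpack_from("<HH", buf, 0x04)
    usn = (struct.unpack_from("<H", buf, usa_offset)[0] + 1) & 0xFFFF
    if usn == 0:                      # 0x0000 must be skipped
        usn = 1
    struct.pack_into("<H", buf, usa_offset, usn)
    for i in range(usa_words - 1):    # one array entry per sector
        end = (i + 1) * SECTOR - 2
        slot = usa_offset + 2 + 2 * i
        buf[slot:slot + 2] = buf[end:end + 2]   # save the original bytes
        struct.pack_into("<H", buf, end, usn)   # write the number in their place
    return bytes(buf)

def remove_fixups(record):
    """Reading: verify every sector end, then restore the saved bytes."""
    buf = bytearray(record)
    if bytes(buf[0:4]) != b"FILE":
        raise ValueError("bad magic number")
    usa_offset, usa_words = struct.unpack_from("<HH", buf, 0x04)
    usn = bytes(buf[usa_offset:usa_offset + 2])
    for i in range(usa_words - 1):
        end = (i + 1) * SECTOR - 2
        if bytes(buf[end:end + 2]) != usn:
            raise ValueError("sector %d does not carry the Update Sequence Number" % i)
        slot = usa_offset + 2 + 2 * i
        buf[end:end + 2] = buf[slot:slot + 2]
    return bytes(buf)
```

If one sector of a multi-sector record was not written, its last two bytes still hold an older number, so `remove_fixups` raises instead of returning a half-updated record.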
MFT File Record Attribute
• What the users think of when they use the term “file” or “folder” in NTFS world is a set of various attributes which describe the file or directory.
• These attributes are stored in the MFT FILE record and each one can be clearly identified through the ID code of the attribute type.
MFT File Record Attribute
• Attributes in NTFS are differentiated as resident or non-resident. Resident means the attributes reside within the FILE record itself.
• In contrast, non-resident attributes store their data outside the MFT,
MFT File Record Attribute
• Resident attributes store their contents or data within the MFT
• Non-resident attributes store their contents or data outside the MFT but they reference their contents or data from within the respective MFT File Record Attribute.
MFT File Record Attribute
• The FLAG that indicates whether the attribute is resident or non-resident is located at offset 08h from the beginning of the attribute and is 1 byte in size.
• 0x00 resident, 0x01 non-resident.
ID Code
FLAG
NTFS $ files
• $AttrDef
• It contains a list of attributes supported on the volume and their corresponding details including residency, minimum and maximum size…
• Located at MFT entry 4.
Attribute Header
(04h, 4) – Length of the Attribute (including header)
(08h, 1) – Flag (0x01 non-resident, 0x00 resident)
(0Ah, 2) – Offset to the name (0, if not named)
(0Ch, 2) – Flags
(0Eh, 2) – Attribute ID
Attribute Header – Resident
(14h, 2) – Offset to the attribute contents (from the start of the header)
(17h, 1) – Fill up (0x00)
Attribute Header – Non Resident
(18h, 8) – Last VCN (Number of used clusters - 1)
(22h, 2) – Size of a compression unit (uncompressed attributes: 0x0000)
(28h, 8) – Allocated size of the attribute in BYTES: the sum of the allocated clusters, so this value is always a multiple of the cluster size.
(38h, 8) – Initialized size of the data stream in BYTES.
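A rough sketch of how the attribute headers can be walked inside a FILE record follows. It assumes little-endian fields and the commonly documented layout, in which the flags sit at 0Ch and the Attribute ID at 0Eh, the resident content length is at 10h, and the non-resident part holds the starting VCN at 10h, the data run offset at 20h and the real size at 30h; those extra offsets are assumptions beyond what the slides list. Function and key names are hypothetical.

```python
import struct

END_MARKER = 0xFFFFFFFF  # end marker that follows the last attribute

def parse_attribute_header(buf, offset=0):
    """Decode the common header and the resident or non-resident part of one attribute."""
    attr_type, length = struct.unpack_from("<II", buf, offset)
    non_resident, name_length, name_offset, flags, attr_id = struct.unpack_from(
        "<BBHHH", buf, offset + 0x08)
    info = {"type": attr_type, "length": length, "non_resident": bool(non_resident),
            "name_offset": name_offset, "flags": flags, "attribute_id": attr_id}
    if non_resident:
        first_vcn, last_vcn, runs_offset, compression_unit = struct.unpack_from(
            "<QQHH", buf, offset + 0x10)
        allocated, real, initialized = struct.unpack_from("<QQQ", buf, offset + 0x28)
        info.update(last_vcn=last_vcn, data_runs_offset=runs_offset,
                    compression_unit=compression_unit, allocated_size=allocated,
                    real_size=real, initialized_size=initialized)
    else:
        content_length, content_offset = struct.unpack_from("<IH", buf, offset + 0x10)
        info.update(content_length=content_length, content_offset=content_offset)
    return info

def iter_attributes(record, first_attribute_offset):
    """Yield every attribute header until the 0xFFFFFFFF end marker is reached."""
    offset = first_attribute_offset
    while offset + 4 <= len(record):
        if struct.unpack_from("<I", record, offset)[0] == END_MARKER:
            break
        attribute = parse_attribute_header(record, offset)
        if attribute["length"] == 0:   # guard against a corrupt record
            break
        yield attribute
        offset += attribute["length"]
```

The `first_attribute_offset` argument is the value stored at offset 14h of the FILE record header.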
Attribute Body – 0X10
$STANDARD_INFORMATION
• This attribute is present in every base file record and must be resident.
(00h, 8) – Date and time of the file creation.
• When a file is created from scratch, all of the time stamps in this attribute are set to the current time. These are the time stamps the user sees in the properties of the file in Windows.
• When a file is copied, the last accessed time on the original file is updated and the new file has updated last access time and creation time.
• The MFT modified and file modified are set to the original values, which
(20h, 4) – FLAGS
(30h, 4) – Owner ID. Owner ID of the user owning the file.
(34h, 4) – Security ID.
(38h, 8) – Quota Charged. If zero, then quotas are disabled.
(40h, 8) – Update Sequence Number (USN).
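The time stamps in this attribute can be turned into readable dates. The sketch below relies on two facts not stated on the slides: NTFS time stamps are 64-bit counts of 100-nanosecond intervals since 1601-01-01 UTC (Windows FILETIME), and the four time stamps sit at 00h, 08h, 10h and 18h in the order creation, file modified, MFT modified, last accessed. The function and key names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime):
    """Convert a count of 100 ns intervals since 1601-01-01 UTC to a datetime."""
    return EPOCH_1601 + timedelta(microseconds=filetime // 10)

def parse_standard_information(body):
    """Decode the time stamps and flags at the start of a $STANDARD_INFORMATION body."""
    created, modified, mft_modified, accessed = struct.unpack_from("<QQQQ", body, 0x00)
    flags, = struct.unpack_from("<I", body, 0x20)
    return {
        "created": filetime_to_datetime(created),
        "file_modified": filetime_to_datetime(modified),
        "mft_modified": filetime_to_datetime(mft_modified),
        "accessed": filetime_to_datetime(accessed),
        "flags": flags,
    }

print(filetime_to_datetime(116444736000000000))  # 1970-01-01 00:00:00+00:00
```

Python's `datetime` only keeps microseconds, so the last decimal digit of the 100 ns count is dropped by this conversion.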
Attribute Body – 0X20 $Attribute List
Standard Attribute Header.
• A list of attributes that make up the file and the file reference of the MFT file record in which each attribute is located.
• When there are lots of attributes and space in the MFT record is short, all those attributes that can be made non-resident are moved out of the MFT.
• If there is still not enough room, then an $ATTRIBUTE_LIST attribute is needed.
(04h, 2) – Record length.
(10h, 8) – Base File
(18h, 2) – Attribute ID.
Attribute Type=80h
Attribute Body – 0X30 $File_Name
• It stores the name of the file, both short and long.
• This attribute is present in every base file record and is always resident.
• It has a minimum size of 68 bytes and a maximum of 578 bytes.
(00h, 8) – File Reference to the base record of the parent directory.
MFT#=05 is the Parent Directory and its
(08h, 8) – Date and time of the file creation.
(10h, 8) – Date and time of the last file modification.
(18h, 8) – Date and time of the last FILE record change.
(20h, 8) – Date and time of the last file access.
(28h, 8) – Allocated file size in bytes. Many files have this size set to zero.
(30h, 8) – Real file size in bytes.
(38h, 4) – FLAGS
Flag Description
0x0001 Read-Only
0x0002 Hidden
0x0004 System
0x0020 Archive
0x0040 Device
0x0080 Normal
0x0100 Temporary
0x0200 Sparse File
0x0400 Reparse Point
0x0800 Compressed
0x1000 Offline
(3Ch, 4) – Used by extended attributes and “reparse”.
If the file has EAs
(40h, 1) – Length of the filename in Unicode characters.
(42h, 2 x LFU) – Filename in Unicode.
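A $File_Name body can be decoded with the offsets listed above. The sketch assumes little-endian fields, UTF-16LE file names, and that the 8-byte file reference consists of a 48-bit MFT record number followed by the 16-bit sequence number (commonly documented, not stated on these slides). The function and key names are hypothetical.

```python
import struct

FILE_FLAGS = {
    0x0001: "Read-Only", 0x0002: "Hidden", 0x0004: "System", 0x0020: "Archive",
    0x0040: "Device", 0x0080: "Normal", 0x0100: "Temporary", 0x0200: "Sparse File",
    0x0400: "Reparse Point", 0x0800: "Compressed", 0x1000: "Offline",
}

def parse_file_name(body):
    """Decode the parent reference, sizes, flags and name from a $File_Name body."""
    parent_reference, = struct.unpack_from("<Q", body, 0x00)
    allocated_size, real_size = struct.unpack_from("<QQ", body, 0x28)
    flags, = struct.unpack_from("<I", body, 0x38)
    name_length = body[0x40]                       # length in Unicode characters
    name = bytes(body[0x42:0x42 + 2 * name_length]).decode("utf-16-le")
    return {
        "parent_record": parent_reference & 0xFFFFFFFFFFFF,  # assumed 48-bit record number
        "parent_sequence": parent_reference >> 48,           # assumed 16-bit sequence number
        "allocated_size": allocated_size,
        "real_size": real_size,
        "flags": [label for bit, label in FILE_FLAGS.items() if flags & bit],
        "name": name,
    }
```

The fixed part of the body is 42h = 66 bytes, so a one-character name gives the 68-byte minimum quoted above.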
Attribute Body – 0X30 $File_Name
• When a hard-linked file is deleted, its filename is removed from the MFT Record. When the last link is removed, the file is really deleted.
Attribute Body – 0X40
$Object_ID
• This attribute was introduced in Windows 2000.
• Every MFT Record is assigned a unique GUID. (A record may have a Birth Volume Id, a Birth Object Id and a Domain Id, all of which are GUIDs.)
• As defined in $AttrDef, this attribute has no minimum size but a maximum of 256 bytes.
Attribute Body – 0X40
$Object_ID
• This is an attribute that holds an ID. This ID is used by the Distributed Link Tracking Service.
• An example of how it is used would be found in shortcuts. (Links to files on removable media are not maintained.)
Attribute Body – 0X40 $Object_ID
(00h, 16) – Unique Id assigned to file.
A record may have a Birth Volume Id, a Birth Object Id and a Domain Id, all of which are GUIDs and all 16 bytes in size.
Attribute Body – 0X80
$Data
• This Attribute contains the file's data.
• Usually, a directory has no Data Attribute, and the Data Attribute of a file has no name.
• A file's size is the size of its unnamed Data Stream.
• As defined in $AttrDef, this attribute has no minimum or maximum size.
Attribute Body – 0X80
$Data
• NTFS uses a structure called a data run to store the ATTRIBUTE'S CONTENT WHEN THE ATTRIBUTE IS NON-RESIDENT.
• A data run is simply a group of contiguous clusters.
• A file can have more than one data run, and the runs can be fragmented or non-fragmented. Data runs can become very numerous depending on the level of fragmentation of the file.
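How a list of data runs is read can be sketched as follows. The encoding used here is not described on the slides and is taken from commonly documented NTFS behaviour: each run starts with a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, the offset is a signed value relative to the start of the previous run, an offset size of 0 marks a sparse run, and a 00 byte ends the list. The function name is hypothetical.

```python
def decode_data_runs(run_list):
    """Return a list of (starting LCN or None for sparse, length in clusters)."""
    runs = []
    position = 0
    current_lcn = 0
    while position < len(run_list) and run_list[position] != 0x00:
        header = run_list[position]
        length_size = header & 0x0F    # low nibble: bytes used by the run length
        offset_size = header >> 4      # high nibble: bytes used by the run offset
        position += 1
        length = int.from_bytes(run_list[position:position + length_size], "little")
        position += length_size
        if offset_size == 0:
            runs.append((None, length))            # sparse run, no clusters on disk
        else:
            delta = int.from_bytes(run_list[position:position + offset_size],
                                   "little", signed=True)
            current_lcn += delta                   # offsets are relative to the previous run
            runs.append((current_lcn, length))
        position += offset_size
    return runs

# One unfragmented run: 18h clusters starting at LCN 5634h.
print(decode_data_runs(bytes.fromhex("2118345600")))   # [(22068, 24)]
```

A fragmented file simply has several such entries in a row, one per fragment, which is why heavily fragmented files carry long run lists.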