-----------------------------------------------------------------------------------------------------------------
Technical Description
-----------------------------------------------------------------------------------------------------------------
At MBR offset 0x1C3 we find the CHS address of the last absolute sector in the partition. For partitions which
begin or end beyond the 1024th cylinder, the three CHS bytes should always be filled with FE FF FF. Smaller
partitions should carry the correctly calculated bytes and NOT FE FF FF. But the Gigabyte GA-P35-DS3
Intel ICH9 AHCI Controller always has problems with the real values, at least if we also change the partition's
start offset, for example to align it for an SSD, instead of using the standard start offset of 63 sectors.
In addition, the Windows Server 2003 setup has a bug in computing the CHS address of the last absolute
sector in the partition. The value seems to be calculated from the number of sectors in the partition at MBR
offset 0x1CA. But the setup does not use a drive geometry of 255 heads / 63 sectors. Instead Microsoft uses
240 heads / 63 sectors as geometry. If you use diskpar to create a partition, however, the CHS and LBA values
are based on 255 heads and 63 sectors. From the point of view of Windows 2003 setup the partition's LBA size and
its CHS values therefore do not match, and it rewrites the CHS value at offset 0x1C3 to fit the 240 heads /
63 sectors geometry. This bug is confirmed by MS, and they have patches for it at the following URLs:
http://support.microsoft.com/kb/931761/en-us
http://support.microsoft.com/kb/931760/en-us
I tried these two fixes and found that the CHS is now based on 255 heads and 63 sectors. But there is
still one problem left. If we cross the 1024 cylinder boundary the CHS is NOT changed to FE FF FF. Instead
the cylinder is set correctly to 1023, but the head and sector are still calculated from the partition's
size in sectors, which is simply wrong in my opinion. So if the setup CD did it right we would only run
into the problem for partitions smaller than 8.4 GB, because the maximum size addressable via CHS is
8,422,686,720 bytes (1024 cylinders * 255 heads * 63 sectors * 512 bytes). For bigger partitions there should
be no problem. We see that XP SP3 does it right, so there is indeed still a bug left.
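
The arithmetic above can be checked with a short Python sketch. It is only an illustration: the function names are
made up for this example, a sector size of 512 bytes is assumed, and the sample partition (start at LBA 2048, 3 GiB
long) is an arbitrary choice, not one of the disks from my tests.

```python
SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


def chs_capacity_bytes(cylinders=1024, heads=255, sectors=63):
    """Largest size that can be addressed with a CHS geometry."""
    return cylinders * heads * sectors * SECTOR_SIZE


def lba_to_chs(lba, heads, sectors):
    """Translate an absolute sector number into (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, heads * sectors)
    head, sector0 = divmod(rest, sectors)
    return cylinder, head, sector0 + 1  # CHS sectors are counted from 1


def chs_to_mbr_bytes(cylinder, head, sector):
    """Pack a CHS address into the 3 bytes used in an MBR partition entry."""
    if cylinder > 1023:
        # beyond the 1024th cylinder: the fixed filler value
        return bytes([0xFE, 0xFF, 0xFF])
    # byte 0 = head, byte 1 = sector (bits 0-5) + cylinder bits 8-9 (bits 6-7),
    # byte 2 = cylinder bits 0-7
    return bytes([head, sector | ((cylinder >> 8) << 6), cylinder & 0xFF])


if __name__ == "__main__":
    last_lba = 2048 + 6291456 - 1  # hypothetical 3 GiB partition
    for heads in (255, 240):
        chs = lba_to_chs(last_lba, heads, 63)
        print(heads, "heads:", chs, chs_to_mbr_bytes(*chs).hex(" "))
    print("CHS limit:", chs_capacity_bytes(), "bytes")
```

The same last sector gets different CHS bytes under 255 heads and under 240 heads, which is the mismatch between
diskpar and the unpatched 2003 setup described above.
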
But Microsoft alone cannot be blamed for the problems. After applying the patch from the MS knowledge base, the
CHS is correct for partition sizes smaller than 8.4 GB. So I tested the complete 2003 installation on a
3 GB partition and found that the same problem shows up again: we only see the message "Error loading
operating system" after the 1st restart of 2003 setup. I double-checked the CHS values and they are
completely correct and based on a 255 heads / 63 sectors drive geometry. If we change these CHS values
back to FE FF FF, which is wrong according to the specs, the AHCI Controller will boot the HDD. The problem
is that the Windows XP / 2003 Server MBR calls the BIOS function int13h ah=08h GET CURRENT DRIVE PARAMETERS
to get the disk's geometry. But in AHCI mode this function call returns CHS head and sector values based on
offset 0x1C3 of the MBR. As it seems, the AHCI Controller takes the CHS value of the last sector in the
partition, calculates the disk geometry from it and returns that geometry for the int13h ah=08h call. The MBR
then works with this geometry, which results in a wrong offset for the NTFS volume boot record and causes
errors like "Error loading operating system". During tests I even saw that the disk geometries returned in
AHCI and IDE mode for the same HDD are different.
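
To make the effect more concrete, here is a small Python model of that geometry derivation. It only mirrors the two
instructions visible in the disassembly further down (inc al / and ah, 3Fh at seg000:369B and seg000:369D). The
function names are made up for illustration, and how the BIOS finally maps a CHS request back to a sector on the
disk is NOT known from this analysis, so take it as a sketch and not as the real firmware logic.

```python
def geometry_from_end_chs(head_byte, sector_byte):
    """Heads and sectors per track as the firmware seems to derive them
    from the 'CHS last sector' bytes of a partition entry."""
    heads = (head_byte + 1) & 0xFF   # inc al, with 8-bit wrap-around
    sectors = sector_byte & 0x3F     # and ah, 3Fh: drop the cylinder bits
    return heads, sectors


def lba_to_chs(lba, heads, sectors):
    """Translate an absolute sector number into (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, heads * sectors)
    head, sector0 = divmod(rest, sectors)
    return cylinder, head, sector0 + 1


if __name__ == "__main__":
    start_lba = 2048  # hypothetical SSD-aligned partition start
    for head_byte, sector_byte in ((0xFE, 0xFF), (0xBF, 0x78)):
        heads, sectors = geometry_from_end_chs(head_byte, sector_byte)
        print(f"end bytes {head_byte:02X} {sector_byte:02X} ->",
              heads, "heads /", sectors, "sectors,",
              "start CHS", lba_to_chs(start_lba, heads, sectors))
```

With FE FF the model yields 255 heads / 63 sectors, which would fit the observation that FE FF FF boots. End bytes
of a partition that does not end on a cylinder boundary (BF 78 in this made-up case) yield a different geometry,
and the same start LBA is then translated to a different CHS address.
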
Another strange thing is that the AHCI Controller device detection can hang because of these 3 CHS bytes
in the MBR. To be honest, I used some uncommon values in the Simulation of Error Scenario 1, but if you
partition the disk several times and the drive geometry gets changed by diskpar, diskpart and the
Windows Setup CD, it is possible that you end up with a controller hang, because the CHS values get
more corrupted each time. Not to mention what happens if you edit the partition's LBA size manually in
the MBR without changing the CHS last sector in partition. I found out that the CHS values can cause an
AHCI Controller hang through an overflow at a division inside the firmware: the quotient does not fit
into 16 bits. Read on for the firmware analysis of AHCI.bin of the F14 BIOS.
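
A quick way to see whether the CHS bytes at 0x1C3 still fit the LBA values of the same entry is a small checker
like the following Python sketch. The function names are made up for this example, the 255 heads / 63 sectors
geometry is an assumption you may have to change, and the sector has to come from your own dump of sector 0.

```python
import struct

ENTRY_OFFSET = 0x1BE  # first partition table entry
ENTRY_SIZE = 16


def parse_entry(mbr, index=0):
    """Return the fields of partition table entry 'index' (0-3)."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xAA":
        raise ValueError("not a valid MBR sector")
    off = ENTRY_OFFSET + index * ENTRY_SIZE
    # status, CHS start (3), type, CHS end (3), LBA start, number of sectors
    status, chs_start, ptype, chs_end, lba_start, num_sectors = \
        struct.unpack_from("<B3sB3sII", mbr, off)
    return {"status": status, "chs_start": chs_start, "type": ptype,
            "chs_end": chs_end, "lba_start": lba_start,
            "num_sectors": num_sectors}


def expected_end_chs(lba_start, num_sectors, heads=255, sectors=63):
    """CHS bytes the last sector should have under the given geometry."""
    last = lba_start + num_sectors - 1
    cylinder, rest = divmod(last, heads * sectors)
    head, sector0 = divmod(rest, sectors)
    if cylinder > 1023:
        return bytes([0xFE, 0xFF, 0xFF])
    return bytes([head, (sector0 + 1) | ((cylinder >> 8) << 6),
                  cylinder & 0xFF])


def end_chs_matches(mbr, index=0, heads=255, sectors=63):
    """True if the stored CHS end bytes fit the LBA start and size."""
    entry = parse_entry(mbr, index)
    return entry["chs_end"] == expected_end_chs(
        entry["lba_start"], entry["num_sectors"], heads, sectors)
```

Usage would be something like end_chs_matches(open("mbr.bin", "rb").read(512)), where mbr.bin is a hypothetical
file holding a copy of sector 0.
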
-----------------------------------------------------------------------------------------------------------------
Firmware Analysis of AHCI.bin
-----------------------------------------------------------------------------------------------------------------
To analyze the firmware and find the bug I did the following steps:
- download the newest Gigabyte GA-P35-DS3 BIOS, labeled P35DS3.F14
- download cbrom182
- attention: the original BIOS file gets changed by the following actions, so you should keep a copy of it
- to display all ROMs combined inside the BIOS file, type:
cbrom182 P35DS3.F14 /d
- we see the following list; the entry we are interested in is number 8:
No. Item-Name Original-Size Compressed-Size Original-File-Name
================================================================================
0. System BIOS 20000h(128.00K) 151ABh(84.42K) p35ds3.BIN
1. XGROUP CODE 0EBC0h(58.94K) 0A4F1h(41.24K) awardext.rom
2. ACPI table 04D12h(19.27K) 01909h(6.26K) ACPITBL.BIN
3. EPA LOGO 0168Ch(5.64K) 0030Dh(0.76K) AwardBmp.bmp
4. GROUP ROM[18] 02E70h(11.61K) 02003h(8.00K) ggroup.bin
5. GROUP ROM[20] 00E20h(3.53K) 00B33h(2.80K) ffgroup.bin
6. YGROUP ROM 0C100h(48.25K) 06708h(25.76K) awardeyt.rom
7. GROUP ROM[ 0] 08360h(32.84K) 02D9Eh(11.40K) _EN_CODE.BIN
8. PCI ROM[A] 04000h(16.00K) 02B46h(10.82K) AHCI.BIN
9. PCI ROM[B] 07A00h(30.50K) 04479h(17.12K) JMB59.BIN
10. MINIT 0CBC0h(50.94K) 0CBF4h(50.99K) MEMINIT.BIN
11. PCI ROM[C] 0C800h(50.00K) 079FDh(30.50K) rtegrom.lom
12. LOGO BitMap 4B30Ch(300.76K) 05CE3h(23.22K) ds3.bmp
13. LOGO1 ROM 00B64h(2.85K) 00520h(1.28K) dbios.bmp
14. GV3 022ADh(8.67K) 00BD6h(2.96K) PPMINIT.ROM
15. OEM0 CODE 028ABh(10.17K) 01E1Bh(7.53K) SBF.BIN
16. OEM2 CODE 01000h(4.00K) 00092h(0.14K) AFSC_HDR.ROM
- extract AHCI.BIN, which is number 8 in the list above, with the following command:
cbrom182 P35DS3.F14 /PCI extract
cbrom182 V1.82 [04/11/07] (C)Phoenix Technologies 2001-2007
PCI ROM - - - A : AHCI.BIN
PCI ROM - - - B : JMB59.BIN
PCI ROM - - - C : rtegrom.lom
Enter a choice:a
Enter an extract file Name :(AHCI.BIN)
[PCI-A] ROM is extracted to AHCI.BIN
- now we should have AHCI.BIN and the file FILE_BUF.BIN
- FILE_BUF.BIN can be deleted, it is only the compressed AHCI.BIN
- disassemble AHCI.BIN in IDA as 16 bit code
- go to address seg000:3670
- we see the following code; as an example I used the values 01 08 00, which cause the adapter hang:
seg000:3670 push es
seg000:3671 les bx, cs:dword_3B95 ; cs:dword_3B95 points to the buffer for the Master Boot Record
seg000:3676 mov cx, 1 ; ch = low eight bits of cylinder number = 0
seg000:3676 ; cl = sector number 1-63 (bits 0-5) = 1, high two bits of cylinder (bits 6-7, hard disk only)
seg000:3679 mov dh, 0 ; dh = head number = 0
seg000:367B mov dl, gs:[bp+0] ; dl = drive number (bit 7 set for hard disk)
seg000:367F mov ax, 201h ; int13 read 1 sector at offset 0, Master Boot Record
seg000:3682 call int13_sub_18BC
seg000:3685 jb short loc_36DE ; cf set on error, jump on error
seg000:3687 cmp word ptr es:[bx+1FEh], 0AA55h ; es:bx data buffer, check MBR end signature 0x55AA
seg000:368E jnz short loc_36DE ; jump if no signature was found
seg000:3690 add bx, 1BEh ; go to 1st partition table start offset 1BEh in MBR
seg000:3694 mov cx, 4 ; do this for all 4 partition table entries
seg000:3697
seg000:3697 loc_3697: ; CODE XREF: seg000:36ADj
seg000:3697 mov ax, es:[bx+5] ; CHS address of last sector in partition, al = head = 1, ah = sector = 8
seg000:369B inc al ; increment head count, head = 2 after this
seg000:369D and ah, 3Fh ; get only bits 0 to 5 for the sector, 6 and 7 are the upper cylinder bits
seg000:36A0 push ax ; save value 0802h, head = 2, sector = 8
seg000:36A1 mul ah ; ax = al * ah = heads * sectors = 10h
seg000:36A3 mov bp, ax ; save result 10h in bp
seg000:36A5 or ax, ax ; check if ax == 0, this means we have sectors or heads equal to zero in the MBR
seg000:36A7 pop ax ; restore ax from stack, after this ax = 0802h, head = 2, sector = 8
seg000:36A8 jnz short loc_36B1 ; we jump here, because heads * sectors != 0 (flags are still from "or ax, ax", pop does not change them)
seg000:36AA add bx, 10h ; go to next partition table entry
seg000:36AD loop loc_3697
seg000:36AF jmp short loc_36DE
seg000:36B1 ; ---------------------------------------------------------------------------
seg000:36B1
seg000:36B1 loc_36B1: ; CODE XREF: seg000:36A8j
seg000:36B1 push ax ; save value 0802h, head = 2, sector = 8
seg000:36B2 mov cx, gs:[si] ; gs:[si] = cylinders from geometry of disk, 1024 cylinders
seg000:36B5 mov ah, gs:[si+3] ; gs:[si+3] = sectors from geometry of disk, 63 sectors
seg000:36B9 mov al, gs:[si+2] ; gs:[si+2] = heads from geometry of disk, 255 heads
seg000:36BD mul ah ; ax = al * ah = 255 heads * 63 sectors = 16065 (3EC1h)
seg000:36BF mul cx ; dx:ax = ax * cx = 3EC1h * 400h (16065 * 1024) = FB0400h (16450560)
seg000:36C1 div bp ; dx:ax / bp = FB0400h / 10h = FB040h, ax = quotient, dx = remainder, quotient does not fit into 16-bit ax -> overflow
- if we enter the code in masm and run it with the values head = 1 and sector = 8, we see that the div at address
seg000:36C1 overflows, because the quotient FB040h does not fit into ax; this results in a divide error exception
inside the firmware
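
The calculation from seg000:3697 to seg000:36C1 can also be replayed in Python. This is a sketch of the listing
above and nothing more: the function names are made up, the disk geometry of 1024 cylinders / 255 heads / 63
sectors is the one from the comments in the listing, and what the firmware does with the quotient afterwards is
not covered here.

```python
def div16(dx_ax, divisor):
    """Emulate the 16-bit x86 DIV: 32-bit dividend by 16-bit divisor.
    The CPU raises a divide error if the quotient needs more than 16 bits."""
    if divisor == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    quotient, remainder = divmod(dx_ax, divisor)
    if quotient > 0xFFFF:
        raise OverflowError(f"divide error: quotient {quotient:X}h "
                            "does not fit into ax")
    return quotient, remainder


def firmware_division(end_head, end_sector,
                      cylinders=1024, heads=255, sectors=63):
    """Replay the division for one partition entry's CHS end bytes."""
    al = (end_head + 1) & 0xFF       # inc al
    ah = end_sector & 0x3F           # and ah, 3Fh
    bp = al * ah                     # mul ah / mov bp, ax
    if bp == 0:
        return None                  # firmware goes on to the next entry
    ax = (heads * sectors) & 0xFFFF  # mul ah: 8 bit * 8 bit -> ax
    dx_ax = ax * cylinders           # mul cx: 16 bit * 16 bit -> dx:ax
    return div16(dx_ax, bp)          # div bp
```

With the end bytes FE FF the call firmware_division(0xFE, 0xFF) divides by 255 * 63 and the quotient fits into
16 bits. With the values 01 08 from the listing the divisor is only 10h and the function raises OverflowError,
which corresponds to the exception in the firmware.
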