[This nearly-ancient text (along with others from Undocumented DOS and Undocumented Windows) is being presented as a case study in some methodologies of software reverse engineering, applied to mass-market software. Note that this chapter appeared in the 2nd edition of the book, not in the 1st edition.]
The previous chapter showed that it is possible to discover a lot about a program without resorting to what is often called reverse engineering. Simply by examining a program's outward behavior, a utility such as INTRSPY shows, for example, that Windows uses the undocumented DOS Get SysVars function, and that Microsoft's QuickC makes the weird SetPSP(0) and SetPSP(-1) calls that are discussed in chapter 4.
But such external examination of a program's behavior can take us only so far. INTRSPY can't tell us why Windows calls Get SysVars (that is, which fields it uses in the SysVars data structure), nor can INTRSPY tell us why QuickC passes the illegal values 0 and -1 to the DOS Set PSP function. To figure out why a program behaves in a certain way, you need to actually get inside the program. This requires disassembly.
Disassembly is particularly important to understanding what goes on inside MS-DOS itself. What does DOS actually do when a program calls the Get SysVars function, for example? How does DOS carry out an INT 21h AH=4Bh EXEC request? How do DOS 5.0 and 6.0 interact with Windows? To answer questions like these, there's no substitute for looking at the DOS code. Though Microsoft does produce a DOS OEM Adaptation Kit (OAK) that we discuss later in this chapter, source code to MS-DOS is not widely available. For those of us without the DOS source code, understanding DOS requires disassembling it.
The goal of this chapter is to acquire an understanding of DOS internals, that is, to get an intuitive feel for what goes on when a program makes an INT 21h DOS call. Chapter 2 briefly presented a disassembly of two DOS functions, INT 21h AH=0Eh (Set Default Drive; see listings 2-7 and 2-8) and INT 21h AH=19h (Get Default Drive). But how did we find the code for these functions in the first place? A key purpose of this chapter is to present a close look at the key part of MS-DOS, the INT 21h handler, with its function dispatch table, which contains pointers to the code that handles each individual INT 21h function. Armed with this table, you can readily consult the code for any particular DOS function whose implementation interests you. You can apply the same technique to other pieces of code, such as DR DOS or the INT 21h hook in Novell NetWare's NETX.COM (see chapter 4).
The resident DOS code is found in two files, IO.SYS and MSDOS.SYS—sometimes named IBMBIO.COM and IBMDOS.COM. DOS 6.0 and higher also has DBLSPACE.BIN, which Microsoft usually considers a third member of the DOS kernel. While there are various ways to examine the code in these files on disk, this chapter instead examines the INT 21h handler in memory, using Microsoft's own DEBUG, a primitive though handy tool that comes with MS-DOS.
Part of the reason for using DEBUG, rather than a more sophisticated debugger or disassembly tool, is to underline the point that Microsoft itself provides the means for reverse engineering DOS. Since programmers frequently have questions about the legalities of disassembly, this chapter also briefly discusses the law surrounding reverse engineering and trade secrets.
Of course, there is more to DOS than just IO.SYS and MSDOS.SYS. We also look briefly at the disassembly of external programs such as COMMAND.COM, MSCDEX.EXE, and PRINT.COM, which is probably the most heavily disassembled DOS utility and the one on which many TSR writers figured out their craft.
Whether or not you disassemble DOS depends of course on what interests you. The examination of the INT 21h dispatch code in this chapter may provide all you ever wanted to know about how DOS functions internally. On the other hand, if you absolutely, positively must know exactly what is going on inside MS-DOS and you have the money to pay for this information, you may want to license Microsoft's DOS OEM Adaptation Kit, which includes assembly language and C source code for many parts of DOS, as well as .OBJ files with full symbolic information for those parts where direct source code is not provided. We take a quick look at the OAK contents later on.
MS-DOS is a bit like pornography. Everyone knows what it is when they see it, but almost no one can define it.
First of all, MS-DOS is not the C> prompt. While that infamous user interface seems practically synonymous with MS-DOS, it is not actually a necessary part of DOS. The C> prompt is provided by COMMAND.COM, which (as chapter 10 explains in more detail) anyone can easily replace. As indicated by the shell= statement in CONFIG.SYS, COMMAND.COM is just a shell around the DOS kernel. Other shells, such as 4DOS or the MKS Korn shell, are widely available. Get rid of COMMAND.COM, and you still have MS-DOS.
From a programmer's perspective, MS-DOS seems like a collection of INT 21h functions. But this isn't quite accurate either. While the INT 21h functions are the most important service provided by DOS, DOS and INT 21h are not synonymous. Several application wrappers in chapter 2 (listings 2-20 and 2-21) already showed how easy it is for a normal program to fiddle with INT 21h calls before or after DOS itself gets them. That a piece of code handles INT 21h doesn't necessarily make it part of DOS.
So if DOS ain't necessarily the C> prompt or the INT 21h interface, what then is it? And where is it?
The "what" part is difficult to answer, except to note that DOS is in many ways what textbooks on operating systems call a microkernel. DOS provides a small bare minimum of services, on top of which other, more sophisticated, services can be built. Think of DOS as a software motherboard, into which the user is free to plug in various extensions. These extensions come not only from Microsoft but also from key third-party vendors such as Novell, Quarterdeck, Qualitas, Symantec, Central Point, and Phar Lap. DOS is the arena in which all these companies' products must both compete and work together.
Well, that was vague enough!
Mercifully, the "where" part at least is easy to answer. MS-DOS consists of two files, IO.SYS and MSDOS.SYS. In both IBM PC-DOS and Novell's DR DOS, these files are called IBMBIO.COM and IBMDOS.COM. Despite the .SYS file names, these are not device drivers, but binary images. In MS-DOS 6.0, there is a third file, DBLSPACE.BIN, which Microsoft generally considers a full-fledged third member of the DOS kernel—the SYS and FORMAT /S commands in DOS 6.0 copy DBLSPACE.BIN over to a floppy, along with IO.SYS and MSDOS.SYS. Take these two or three files, and you've got DOS. Of course, you'll also need a shell such as COMMAND.COM in order to get much work done.
Among other things, MSDOS.SYS contains the DOS dispatch function, which is DOS's handler for INT 21h calls. There are other DOS functions, such as INT 25h, 26h, and 2Fh, that MSDOS.SYS and IO.SYS handle as well.
IO.SYS consists of two parts, a loader (MSLOAD.COM) and BIOS support code (MSBIO.BIN); Microsoft creates IO.SYS by concatenating these two files:
copy /b msload.com+msbio.bin io.sys
IO.SYS is not "the BIOS," as books on DOS programming frequently claim, but merely the DOS interface to the BIOS. IO.SYS contains the standard device drivers such as CON, AUX, LPT1, and COM1 (see chapter 7). These device drivers are implemented using BIOS calls. For example, the CON driver built into IO.SYS (more precisely, MSBIO.BIN) makes INT 10h and INT 16h calls to the ROM BIOS video and keyboard routines.
The MSLOAD.COM portion of IO.SYS contains a famous set of routines called SYSINIT, which is responsible for the bootstrap loading of DOS.
We won't discuss SYSINIT here, as it has already been covered elsewhere (see "How MS-DOS Is Loaded" in chapter 2 of Ray Duncan's Advanced MS-DOS Programming, and "The Components of MS-DOS" in Duncan's MS-DOS Encyclopedia). And practically every other book on DOS programming seems to repeat this same basic material on SYSINIT. Presumably this is not just because the bootstrap loading of DOS is an interesting subject, but also because Microsoft already documents SYSINIT in the DOS OAK. Geoff Chappell provides a far more original and useful description of DOS startup in his DOS Internals, chapters 1 ("The System Configuration"), 2 ("The System Footprint"), and 3 ("The Startup Sequence"). For example, Chappell is the first author to make the connection between SYSINIT and the List of Lists structure (whose actual name in the DOS source code is SysInitVars).
So the DOS boot sequence is fairly well known. What hasn't been provided before, amazingly, is any description of what DOS looks like once it is up and running. This primarily requires a description of DOS's INT 21h handler and the INT 21h dispatch table. In other words, what code runs when you make an INT 21h call to DOS? Scores of DOS programming books of course describe what this or that DOS function call does, but few describe how any of these function calls work; and none to our knowledge (aside from a brief discussion of DOS stack switching in Microsoft's MS-DOS Encyclopedia, pp. 353-355) describes the DOS function call mechanism itself. This seems far more important than providing yet another standard description of how DOS boots up or how SYSINIT moves segments around in memory.
One of our tech reviewers writes that "parts of the boot sequence are NOT well known! In DOS 6.0 and up, there's the mechanism that IO.SYS uses to load DBLSPACE.BIN. And in DOS 7.0 (Chicago), if CONFIG.SYS contains the setting DOS=ENHANCED, there is code in IO.SYS that loads DOS386.EXE, which is a big executable similar to WIN386.EXE."
The choice between describing SYSINIT or describing the INT 21h handler is an important one, because the portion of DOS which one is interested in looking at largely determines how one goes about disassembling DOS.
To look at DOS initialization, you either have to acquire the DOS OAK (which provides assembly language source code to IO.SYS, including the SYSINIT modules), or you have to disassemble the actual IO.SYS and MSDOS.SYS files on disk. These files are hidden system files, which however can be easily unhidden:
C:\UNDOC\CHAP6>attrib -h -s \*.sys
IO.SYS is about 32K, and MSDOS.SYS is about 37K. Once unhidden, these two files can be disassembled, even with the u (unassemble) command in the primitive DEBUG utility that comes with DOS. After running ATTRIB to unhide MSDOS.SYS or IO.SYS, type DIR to find the file's size. DEBUG loads the file at address 100h, so add 100h to the file size (converted to hexadecimal) to yield the disassembly end-range. For example, if MSDOS.SYS is 37,506 (9282h) bytes:
C:\UNDOC2\CHAP6>type msdos.scr
u 0100 9382
q
C:\UNDOC2\CHAP6>debug \msdos.sys < msdos.scr > msdos.lst
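The end-range arithmetic is easy to fumble by hand, so here is a minimal sketch of it in portable C. The helper names `debug_end_range` and `print_u_command` are ours, invented for illustration; they are not part of DEBUG or of the book's listings. All the sketch assumes is what the text says: DEBUG loads the file at offset 100h.

```c
#include <stdio.h>

/* DEBUG loads a file at offset 100h, so the end of the unassemble
   range is the file size plus 100h. (Hypothetical helper name.) */
unsigned long debug_end_range(unsigned long file_size)
{
    return file_size + 0x100UL;
}

/* Print the one-line "u" command that goes into the DEBUG script. */
void print_u_command(unsigned long file_size)
{
    printf("u 0100 %04lX\n", debug_end_range(file_size));
}
```

For the 37,506-byte MSDOS.SYS used in the example, `print_u_command(37506UL)` prints the same `u 0100 9382` line that appears in MSDOS.SCR.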
The resulting MSDOS.LST is about one megabyte in size; if you use a disassembler such as Sourcer, the file is about 800K. In some ways, the output from such a straightforward disassembly of MSDOS.SYS looks quite useful. For example, you can quite plainly see DOS's INT 21h handler inspecting the caller's function number in AH. This is the DOS code called whenever a program generates an INT 21h:
6A76:040B FA CLI
6A76:040C 80FC6C CMP AH,6C ; is function > 6Ch?
6A76:040F 77D2 JA 03E3 ; yes: error
6A76:0411 80FC33 CMP AH,33
6A76:0414 7218 JB 042E
6A76:0416 74A2 JZ 03BA
6A76:0418 80FC64 CMP AH,64
; ... etc. ...
Likewise the MSDOS.SYS INT 2Fh handler is also visible. IO.SYS has its own INT 2Fh handler, and in the last line of the code fragment below, you can see the INT 2Fh handler in MSDOS.SYS jump to the one in IO.SYS, using a hard-wired address:
1C53:07B9 FB STI
1C53:07BA 80FC11 CMP AH,11
1C53:07BD 750A JNZ 07C9
;;; Go to 07BFh if an INT 2Fh call belonging to an external
;;; program such as a redirector, SHARE, or NLSFUNC, ends up
;;; in MSDOS.SYS. This means the external program isn't loaded.
1C53:07BF 0AC0 OR AL,AL ; is AL=0?
; ... error handling ...
1C53:07C9 80FC10 CMP AH,10 ; INT 2Fh AH=10h? (SHARE)
1C53:07CC 74F1 JZ 07BF ; got here, so SHARE not loaded
1C53:07CE 80FC14 CMP AH,14 ; INT 2Fh AH=14h? (NLSFUNC)
1C53:07D1 74EC JZ 07BF ; got here, so NLSFUNC not loaded
1C53:07D3 80FC12 CMP AH,12 ; INT 2Fh AH=12h?
1C53:07D6 7503 JNZ 07DB
1C53:07D8 E99701 JMP 0972 ; handle DOS internal functions
1C53:07DB 80FC16 CMP AH,16 ; INT 2Fh AH=16h? (Windows)
1C53:07DE 740D JZ 07ED ; might be Windows broadcast
1C53:07E0 80FC46 CMP AH,46 ; INT 2Fh AH=46h?
1C53:07E3 7503 JNZ 07E8
1C53:07E5 E93E01 JMP 0926
1C53:07E8 EA05007000 JMP 0070:0005 ; see if IO.SYS wants it
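Read as a whole, the compare-and-jump sequence above is a small routing decision on the function number in AH. As a rough sketch in portable C (the enum names are our own labels, not Microsoft's, and the sketch models only which branch is taken, not what the code at each target does):

```c
/* Where the MSDOS.SYS INT 2Fh handler sends a call, judging only
   from the CMP/Jcc sequence in the DEBUG listing. The names are
   illustrative labels, not taken from the DOS source. */
typedef enum {
    NOT_INSTALLED,    /* AH=10h, 11h, 14h: falls into 07BFh            */
    DOS_INTERNAL,     /* AH=12h: JMP 0972                              */
    WINDOWS_CHECK,    /* AH=16h: JZ 07ED                               */
    FUNC_46,          /* AH=46h: JMP 0926                              */
    CHAIN_TO_IO_SYS   /* anything else: JMP 0070:0005                  */
} Route;

Route route_int2f(unsigned char ah)
{
    switch (ah) {
    case 0x10:                      /* SHARE                           */
    case 0x11:                      /* first CMP in the listing        */
    case 0x14:                      /* NLSFUNC                         */
        return NOT_INSTALLED;       /* external program isn't loaded   */
    case 0x12: return DOS_INTERNAL;
    case 0x16: return WINDOWS_CHECK;
    case 0x46: return FUNC_46;
    default:   return CHAIN_TO_IO_SYS;
    }
}
```

The default case is the interesting one: any multiplex call that MSDOS.SYS does not recognize is handed on to the INT 2Fh handler in IO.SYS.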
But while at first this looks useful, after a few minutes it becomes clear that the quality of the unassembly is unfortunately quite poor. Much better versions of these INT 21h and INT 2Fh handlers are shown later in figures 6-7 and 6-13. For example, the most important part of the INT 21h handler uses the function number in AH as an index into a dispatch table:
;;; previously moved AH func number into BX
6A76:04FE 8B9FA73E MOV BX,[BX+3EA7]
6A76:0502 36871EEA05 XCHG BX,SS:[05EA]
6A76:0507 368E1EEC05 MOV DS,SS:[05EC]
6A76:050C 36FF16EA05 CALL SS:[05EA]
Unfortunately, if you now go and look at 3EA7h, presumably the address of the all-important INT 21h function dispatch table, there turns out instead to be perfectly valid-looking code at that address, and not a table at all. Likewise, 05ECh and 05EAh are, in this context, totally bogus. This isn't a problem with DEBUG, however. A straight disassembly on disk of MSDOS.SYS or IO.SYS, even with a more sophisticated disassembler such as Sourcer, doesn't produce much better results.
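Whatever the correct table address turns out to be, the pattern in the fragment is a classic jump table: the function number selects a slot in a table of code addresses, and the handler calls through that slot. A minimal sketch in portable C, assuming nothing about the real table's contents: the two handlers are stand-ins named after the Set and Get Default Drive functions from chapter 2, and an array of C function pointers stands in for DOS's table of 16-bit offsets.

```c
#include <stddef.h>

#define MAX_FUNC 0x6C   /* highest function number the handler accepts */

typedef int (*Handler)(void);

/* Stand-in handlers; the real ones are routines inside DOS.
   Each just returns its own function number so calls are visible. */
static int set_default_drive(void) { return 0x0E; }
static int get_default_drive(void) { return 0x19; }

/* Sparse table for the sketch: slots not listed here are NULL. */
static Handler dispatch_table[MAX_FUNC + 1] = {
    [0x0E] = set_default_drive,
    [0x19] = get_default_drive
};

/* Returns -1 for out-of-range or unpopulated function numbers. */
int dispatch(unsigned char ah)
{
    if (ah > MAX_FUNC || dispatch_table[ah] == NULL)
        return -1;
    return dispatch_table[ah]();    /* the indirect CALL */
}
```

The range check against 6Ch mirrors the CMP AH,6C at the top of the handler; without it, an out-of-range function number would index past the end of the table.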
The problem is that the SYSINIT process (as described in the MS-DOS Encyclopedia) moves segments around in memory and relies heavily on segment arithmetic. Address cross-references often won't match up properly in a static disassembly of DOS on disk. To get a good disassembly of the core DOS interrupt handlers, it is much easier to disassemble DOS in memory, after the DOS initialization segment movement (which might include the DOS=HIGH movement of the DOS kernel to the high memory area, or HMA) is complete.
The only problem with disassembling DOS out of memory, rather than in the system files on disk, is that this misses the SYSINIT code, which is discarded from memory when the initialization is complete. However, as noted earlier, SYSINIT and the DOS bootstrap process have already been adequately covered elsewhere.
Again, a tech reviewer writes, "NO! You're forgetting all the 'preload' stuff that IO.SYS does starting in DOS 6.0. Also, taking apart IO.SYS really isn't that difficult. To link up data with the code that uses it, you just need to subtract some fixed amount, which is easy to figure out once you have one code/data pair. Just look at the code in IO.SYS that preloads DBLSPACE.BIN." Hmm, it seems we ought to take a look at this...
It turns out that static disassembly of IO.SYS is actually pretty easy, even though at first glance the results produced by a disassembler such as Sourcer look inadequate. It's true that references to data don't match up with the actual locations of the data in the file, but once you match up just one piece of data in the file with code that references it, you can figure out everything else.
For example, a Sourcer disassembly of IO.SYS from MS-DOS 6.0 contains the following data item:
54BF:8138 5C 44 42 4C 53 50 41 43 db '\DBLSPACE.BIN'
54BF:813E 45 2E 42 49 4E 00
This is followed shortly by code that, based on the surrounding context (the code calls the INT 21h AX=4B03h Load Overlay function), is probably loading DBLSPACE.BIN. However, the code does not reference offset 8138h. Instead, it references CS:3B62h:
54BF:8153 0E push cs
54BF:8154 1F pop ds
54BF:8155 BE 3B62 mov si,3B62h
If you subtract 3B62h from 8138h, you get 45D6h. If the code at 54BF:8155 really is referencing the '\DBLSPACE.BIN' string at offset 8138h, then 45D6h is the amount which you must add to other data references in this version of IO.SYS in order to locate the data itself. To confirm that this amount is accurate, just look for another data reference, and see if adding the amount to it yields a likely-looking address. For example, a little further on in the file, IO.SYS produces an error message:
54BF:81E9 0E push cs
54BF:81EA 1F pop ds
54BF:81EB BA 5823 mov dx,5823h
54BF:81EE B4 09 mov ah,9
54BF:81F0 CD 21 int 21h ; DOS Services ah=function 09h
; display char string at ds:dx
From the helpful comment supplied by Sourcer on how INT 21h AH=9 works, it is clear that 5823h must be the offset within CS of a string. Adding 45D6h to 5823h yields 9DF9h and there, indeed, is the error message:
54BF:9DF9 57 72 6F 6E 67 20 db 'Wrong DBLSPACE.BIN version', 0Dh
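The fix-up is nothing more than one subtraction followed by repeated additions. A small sketch in portable C, using only the offsets quoted above for this particular MS-DOS 6.0 IO.SYS; the function and macro names are ours, and the constant would differ for any other version of the file.

```c
/* Offset of the '\DBLSPACE.BIN' string in the disassembly, and the
   offset the code actually loads into SI. Both come from the text. */
#define STRING_IN_FILE  0x8138u
#define STRING_IN_CODE  0x3B62u

/* The fixed amount to add to a data reference in the code in order
   to find the data in the on-disk disassembly. */
unsigned reloc_delta(void)
{
    return STRING_IN_FILE - STRING_IN_CODE;
}

/* Map a data reference seen in the code to its place in the listing. */
unsigned ref_to_file(unsigned code_ref)
{
    return code_ref + reloc_delta();
}
```

Here `reloc_delta()` is 45D6h, and `ref_to_file(0x5823u)` gives 9DF9h, which is where the "Wrong DBLSPACE.BIN version" message sits.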
Thus, we really can pick apart IO.SYS on disk. This lets us examine the DOS boot process, in particular the recent additions such as the preloading of DBLSPACE.BIN in DOS 6 and the apparent ability to preload DOS386.EXE in DOS 7. "Preloading" means that IO.SYS looks for and loads these external programs before processing any DEVICE= statements in CONFIG.SYS. Chapter 1 discussed how Stacker 3.1 uses this interface to get itself preloaded under DOS 6. By examining IO.SYS, you can see how the interface works.
For example, after calling INT 21h AX=4B03h to load DBLSPACE.BIN, IO.SYS looks for a function pointer at offset 14h in DBLSPACE.BIN:
54BF:819F E8 FBD6 call LOAD_OVERLAY ; subr. does 21/4B03
; ...
54BF:81C6 2E: C7 06 0387 0014 mov word ptr cs:[387h],14h ; get func ptr from
54BF:81CD 2E: 8C 06 0389 mov word ptr cs:[389h],es ; offset 14h
; ... ; in DBLSPACE.BIN
IO.SYS saves away the function pointer provided by DBLSPACE.BIN, and then calls it:
54BF:81DA 0E push cs ; IO.SYS passes DBLSPACE.BIN
54BF:81DB 07 pop es ; a pointer to a buffer:
54BF:81DC BB 036A mov bx,36Ah ; 36Ah+45D6h=4940h (see below)
54BF:81DF B8 0006 mov ax,6 ; DOS version
54BF:81E2 2E: FF 1E 0387 call dword ptr cs:[387h] ; call DBLSPACE.BIN
; ... ; function ptr
54BF:8228 BB 0004 mov bx,4 ; subfunction 4
54BF:822B 2E: FF 1E 0387 call dword ptr cs:[387h]
; ...
54BF:4940 18 00 db 18h, 00h ; a communications buffer
IO.SYS also checks for a 2E2Ch signature at offset 12h in DBLSPACE.BIN. A hex dump of DBLSPACE.BIN reveals the presence of this signature (the bytes 2C 2E at offset 12h, immediately before the function entry at offset 14h):
C:\UNDOC2\CHAP6>dump \dos\dblspace.bin -bytes 32
0000 | FF FF FF FF 42 48 41 08 8B 08 01 44 42 4C 53 50 | ....BHA....DBLSP
0010 | 41 43 2C 2E E9 B2 59 00 00 EA 41 08 00 00 EA 8B | AC,...Y...A.....
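To make the byte order explicit: the bytes 2C 2E at hexadecimal offset 12h in the dump read as the little-endian word 2E2Ch, and the byte at 14h, where the function pointer lands, is E9h. The following sketch in portable C checks the dumped header the same way; the helper names are ours, and this is only an illustration of the comparison, not a reconstruction of the actual IO.SYS code.

```c
#include <stddef.h>

/* Read a little-endian 16-bit word from a byte buffer. */
unsigned read_le16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* First 32 bytes of DBLSPACE.BIN, copied from the hex dump. */
static const unsigned char header[32] = {
    0xFF, 0xFF, 0xFF, 0xFF, 0x42, 0x48, 0x41, 0x08,
    0x8B, 0x08, 0x01, 0x44, 0x42, 0x4C, 0x53, 0x50,
    0x41, 0x43, 0x2C, 0x2E, 0xE9, 0xB2, 0x59, 0x00,
    0x00, 0xEA, 0x41, 0x08, 0x00, 0x00, 0xEA, 0x8B
};

/* Returns 1 if the 2E2Ch signature sits at offset 12h of the image. */
int has_signature(const unsigned char *image, size_t len)
{
    return len >= 0x14 && read_le16(image + 0x12) == 0x2E2Cu;
}
```

Calling `has_signature(header, sizeof header)` on the dumped bytes returns 1.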
Further discussion of this interface, and its possible role in the ongoing battle between Microsoft and Stac Electronics, appears in chapter 1. Here, the point is simply that all existing descriptions of the DOS boot process will need to be rewritten to take account of new additions to DOS such as DBLSPACE.BIN (and, in DOS 7, DOS386.EXE).
In any case, one topic that hasn't been covered at all is the INT 21h dispatch code, which is executed every time a program makes a DOS call (except when another program that hooks INT 21h has completely intercepted the call, without chaining). As we'll see, there are many important aspects to the INT 21h dispatch code, including stack switching, use of the current PSP, incrementing and decrementing the InDOS flag, handling of critical sections, Ctrl-Break, and critical errors, checking the machine's A20 line when DOS=HIGH, and special casing for Windows Enhanced mode.
Studying DOS internals requires finding the code in DOS that handles software interrupts such as INT 21h and INT 2Fh. As we just saw, trying to do this with IO.SYS and MSDOS.SYS on disk can produce inadequate results. In memory, however, it seems like it should be trivial to find DOS's INT 21h and INT 2Fh handlers. As every PC programmer knows, there is a documented DOS function, INT 21h AH=35h, which returns (in ES:BX) a far pointer to the code that handles the interrupt given in AL.
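For readers who have not looked underneath that call: in real mode the vectors live in the low-memory interrupt vector table, four bytes per interrupt, with the offset word stored before the segment word. The sketch below, in portable C, decodes one entry from a copy of that table rather than from live memory, so it can be compiled anywhere. The `FarPtr` type and function names are ours; `linear` does the same segment-times-16-plus-offset arithmetic as the MK_LIN macro in INTVECT.C.

```c
typedef struct {
    unsigned short seg;
    unsigned short off;
} FarPtr;

/* Decode vector `intno` from a copy of the real-mode interrupt vector
   table: 4 bytes per entry, offset word first, then segment word,
   both little-endian. `ivt` must hold at least 1024 bytes. */
FarPtr read_vector(const unsigned char *ivt, unsigned intno)
{
    const unsigned char *e = ivt + 4u * (intno & 0xFFu);
    FarPtr fp;
    fp.off = (unsigned short)(e[0] | (e[1] << 8));
    fp.seg = (unsigned short)(e[2] | (e[3] << 8));
    return fp;
}

/* Segment:offset to 20-bit linear address: segment * 16 + offset. */
unsigned long linear(FarPtr fp)
{
    return ((unsigned long)fp.seg << 4) + fp.off;
}
```

A table entry holding the bytes 62 07 70 00 decodes to 0070:0762, which corresponds to linear address 00E62h.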
Finding the current handlers for INT 21h and INT 2Fh is thus a simple matter of calling INT 21h AX=3521h and AX=352Fh and looking at the returned far pointer, or vector, as it is called. This can be wrapped up in a simple program to print out interrupt vectors. Add a little extra smarts, such as trying to figure out the owner of each interrupt vector and disassembling some frequently encountered instructions at the beginning of the interrupt handler, and the result is INTVECT.C, shown in listing 6-1; listing 6-2 shows MAP.C, which attempts to figure out owners.
/*
INTVECT.C
bcc intvect.c map.c
*/
#include <stdlib.h>
#include <stdio.h>
#include <dos.h>
typedef unsigned char BYTE;
typedef unsigned short WORD;
typedef unsigned long DWORD;
#define MK_LIN(fp) ((((DWORD) FP_SEG(fp)) << 4) + FP_OFF(fp))
extern char *find_owner(DWORD lin_addr); // in map.c
#define ARPL 0x63
#define IRET 0xCF
#define JMPF 0xEA
#define JMP8 0xEB
#define JMP16 0xE9
BYTE far *get_vect(int intno) // call INT 21h AH=35h
{
_asm push es
_asm mov al, byte ptr intno
_asm mov ah, 35h
_asm int 21h
_asm mov dx, es
_asm mov ax, bx
_asm pop es
// return value in DX:AX
}
void print_vect(int intno)
{
char *s;
BYTE far *fp = get_vect(intno);
printf("INT %02Xh %Fp ", intno, fp);
if (fp == 0)
{
printf("unused\n");
return;
}
s = find_owner(MK_LIN(fp));
printf("%-08s ", s? s: " ");
switch (*fp) // see if first instruction of interrupt handler
{ // is anything really obvious
case ARPL: printf("arpl -- Windows V86 breakpoint"); break;
case IRET: printf("iret -- NOP function"); break;
case JMP8: printf("jmp %Fp", // short jump: displacement is a signed byte
((BYTE far *) fp) + (signed char) fp[1] + 2); break;
case JMP16: printf("jmp %Fp",
((BYTE far *) fp) + *((WORD far *) &fp[1]) + 3); break;
case JMPF: printf("jmp %Fp",
*((void far * far *) &fp[1])); break;
}
printf("\n");
}
main(int argc, char *argv[])
{
char *end;
int intno, i;
if (argc < 2)
for (intno=0; intno< 256; intno++)
print_vect(intno);
else for (i=1; i< argc; i++)
print_vect(strtoul(argv[i], &end, 16));
return 0;
}
For example:
C:\UNDOC2\CHAP6>intvect 21 28 29 2f
INT 21h C0B6:0942
INT 28h 18D4:0615 PRINT
INT 29h 0070:0762 IO
INT 2Fh 1A82:000D NLSFUNC
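The least obvious part of INTVECT is the arithmetic it uses to follow a JMP at the start of a handler. Here is a hedged sketch of that arithmetic in portable C, working on a buffer of instruction bytes rather than on far pointers. The `FarPtr` type and `jump_target` name are ours. The sketch sign-extends the 8-bit displacement of a short jump, which a backward jump requires.

```c
typedef struct {
    unsigned short seg;
    unsigned short off;
} FarPtr;

/* Given the first bytes of a handler located at `at`, compute the
   target of a leading JMP. Returns 1 and fills *target for the three
   JMP opcodes INTVECT.C checks; returns 0 for anything else. */
int jump_target(const unsigned char *code, FarPtr at, FarPtr *target)
{
    int d;
    switch (code[0]) {
    case 0xEB:                      /* JMP short: 2-byte instruction  */
        d = code[1];
        if (d >= 0x80)
            d -= 0x100;             /* sign-extend the displacement   */
        target->seg = at.seg;
        target->off = (unsigned short)(at.off + 2 + d);
        return 1;
    case 0xE9:                      /* JMP near: 3-byte instruction   */
        d = code[1] | (code[2] << 8);
        target->seg = at.seg;
        target->off = (unsigned short)(at.off + 3 + d); /* wraps at 64K */
        return 1;
    case 0xEA:                      /* JMP far: offset, then segment  */
        target->off = (unsigned short)(code[1] | (code[2] << 8));
        target->seg = (unsigned short)(code[3] | (code[4] << 8));
        return 1;
    default:
        return 0;
    }
}
```

Fed the bytes E9 97 01 at 1C53:07D8 from the INT 2Fh listing earlier in this chapter, the sketch yields 1C53:0972, and EA 05 00 70 00 yields 0070:0005, matching what DEBUG printed.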
If you run INTVECT without command line parameters, it dumps out the vectors for all 256 interrupts. This is useful, for example, in determining which interrupts Windows Enhanced mode takes over; you can run INTVECT > TMP.TMP, start Windows, run INTVECT > TMP.2 from inside a DOS box, and then use diff or a similar utility to compare the files TMP.TMP and TMP.2. The difference between these two files reveals the interrupts that Windows Enhanced mode hooks using the low memory interrupt vector table (it also hooks some interrupts using the protected mode interrupt descriptor table). Where < points to the pre-Windows DOS output from INTVECT, and > points to the output under Windows, part of the output from diff might look like this (the complete output also shows changes to INT 0, 3, 8, 10h, 15h, 1Ch, 22h, 23h, 24h, 67h, and 68h):
[diff listing not preserved in this copy]
INT 28h is the DOS idle interrupt, and the Virtual DMA Services (VDS) use INT 4Bh. As you can see, INTVECT examines the first byte of an interrupt handler looking for code such as the ARPL instruction, which Windows Enhanced mode uses as a V86 breakpoint, to force a transition from user (Ring 3) code to VMM (Ring 0) code. The seeming location of the Windows V86 breakpoints inside DBLSSYS$ (DoubleSpace) is misleading; this has to do with the way Windows implements V86 breakpoints (see Chappell, DOS Internals, chapter 2).
To build INTVECT, INTVECT.C should be linked with MAP.C (listing 6-2). MAP.C attempts to provide the owner's name for each interrupt vector, using code that is explained in detail in chapter 7 (see UDMEM.C, listing 7-XX). MAP.C will be reused with another program later in this chapter, INTCHAIN.C (listing 6-5). MAP can also be compiled with -DTESTING to produce a standalone program. For example, running MAP on one machine happened to produce the following output, which shows that this machine is running DoubleSpace, MSCDEX, SMARTDRV (loaded high), DOSKEY (also loaded high), and XMS and EMM servers:
C:\UNDOC2\CHAP6>map
00000700 000009A0 IO
000009A0 00001E80 DOS
00001E80 00002010 D:
00002010 00005780 MS$MOUSE
00005780 00007EA0 MSCD001
00007EA0 00012FA0 DBLSSYS$
00012FA0 000131F0 SETVERXX
000131F0 00013670 XMSXXXX0
00013670 00014950 EMMXXXX0
00014950 000188A0 MSCDEX
000189A0 0002A7E0 MAP
000CAA30 000CBBA0 COMMAND
000CBBD0 000D2C60 SMARTDRV
000CDDA2 000CDDB4 M:
000CDDB4 000DE470 J:
000DE470 000DF4A0 DOSKEY
00100000 0010FFEE HMA
/*
MAP.C
bcc intvect.c map.c
bcc intchain.c map.c
bcc -DTESTING map.c
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <dos.h>
typedef unsigned char BYTE;
typedef unsigned short WORD;
typedef unsigned long DWORD;
typedef void far *FP;
#ifndef MK_FP
#define MK_FP(s,o) ((((DWORD) s) << 16) + (o))
#endif
#pragma pack(1)
typedef struct {
DWORD start, end;
char name[9];
} BLOCK;
static BLOCK *map;
static int num_block = 0;
int cmp_func(const void *b1, const void *b2)
{
if (((BLOCK *) b1)->start < ((BLOCK *) b2)->start) return -1;
else if (((BLOCK *) b1)->start > ((BLOCK *) b2)->start) return 1;
else return 0;
}
typedef struct {
BYTE type; /* 'M'=in chain; 'Z'=at end */
WORD owner; /* PSP of the owner */
WORD size; /* in 16-byte paragraphs */
BYTE unused[3];
BYTE name[8]; /* in DOS 4+ */
} MCB;
#define IS_PSP(mcb) (FP_SEG(mcb) + 1 == (mcb)->owner)
WORD get_first_mcb(void)
{
_asm mov ah, 52h
_asm int 21h
_asm mov ax, es:[bx-2]
// retval in AX
}
typedef struct DEV {
struct DEV far *next;
WORD attr, strategy, intr;
union {
BYTE name[8], blk_cnt;
} u;
} DEV;
#define IS_CHAR_DEV(dev) ((dev)->attr & (1 << 15))
DEV far *get_nul_dev(void)
{
_asm mov ah, 52h
_asm int 21h
_asm mov dx, es
_asm lea ax, [bx+22h]
// retval in DX:AX
}
int get_num_block_dev(DEV far *dev)
{
// can't rely on # block devices at SysVars[20h]?
// walk once through dev chain just to count # blk devs
int num_blk = 0;
do {
if (! IS_CHAR_DEV(dev))
num_blk += dev->u.blk_cnt;
dev = dev->next;
} while(FP_OFF(dev->next) != (WORD) -1);
return num_blk;
}
WORD get_umb_link(void)
{
_asm mov ax, 5802h
_asm int 21h
_asm xor ah, ah
// return value in AX
}
WORD set_umb_link(WORD flag)
{
_asm mov ax, 5803h
_asm mov bx, flag
_asm int 21h
_asm jc error
_asm xor ax, ax
error:;
// return 0 or error code in AX
}
WORD get_dos_ds(void)
{
_asm push ds
_asm mov ax, 1203h
_asm int 2fh
_asm mov ax, ds
_asm pop ds
// retval in AX
}
/* find IO.SYS segment with built-in drivers */
WORD get_io_seg(DEV far *dev)
{
WORD io_seg = 0;
do {
if (IS_CHAR_DEV(dev))
if (_fstrncmp(dev->u.name, "CON ", 8) == 0)
io_seg = FP_SEG(dev); // we'll take the last one
dev = dev->next;
} while(FP_OFF(dev->next) != (WORD) -1);
return io_seg;
}
static int did_init = 0;
void do_init(void)
{
MCB far *mcb;
DEV far *dev = get_nul_dev();
WORD dos_ds, io_seg, mcb_seg, next_seg, save_link;
BLOCK *block;
int blk, i;
map = (BLOCK *) calloc(100, sizeof(BLOCK));
block = map;
io_seg = get_io_seg(dev);
// widen to DWORD before shifting; a 16-bit shift loses the top bits
block->start = ((DWORD) io_seg) << 4; block->end = (DWORD) -1;
strcpy(block->name, "IO"); block++;
dos_ds = get_dos_ds();
block->start = ((DWORD) dos_ds) << 4; block->end = (DWORD) -1;
strcpy(block->name, "DOS"); block++;
// should really check if there IS an HMA!
block->start = 0x100000L; block->end = 0x10FFEEL;
strcpy(block->name, "HMA"); block++;
num_block = 3;
/* walk MCB chain, looking for PSPs, interrupt owners */
if (_osmajor >= 4)
{
mcb_seg = get_first_mcb();
mcb = (MCB far *) MK_FP(mcb_seg, 0);
if (_osmajor >= 5) // be lazy; see ch. 7 for DOS < 5
{
save_link = get_umb_link();
set_umb_link(1); // access UMBs too
}
for (;;)
{
next_seg = mcb_seg + mcb->size + 1;
if (IS_PSP(mcb))
{
block->start = ((DWORD) mcb_seg) << 4;
block->end = ((DWORD) next_seg) << 4;
_fstrncpy(block->name, mcb->name, 8);
block->name[8] = '\0';
block++; num_block++;
}
mcb_seg = next_seg;
if (mcb->type == 'M')
mcb = (MCB far *) MK_FP(next_seg, 0);
else
break;
}
}
/* walk device chain looking for non-builtin drivers */
blk = get_num_block_dev(dev);
do {
MCB far *dev_mcb;
if ((FP_SEG(dev) != dos_ds) && (FP_SEG(dev) != io_seg))
{
block->start = (((DWORD) FP_SEG(dev)) << 4) + FP_OFF(dev);
dev_mcb = (MCB far *) MK_FP(FP_SEG(dev)-1,0);
if (dev_mcb->owner == 8)
{
dev = dev->next;
continue;
}
if (dev_mcb->type == 'M')
block->end = block->start + ((DWORD) dev_mcb->size << 4);
else
block->end = (DWORD) -1;
if (IS_CHAR_DEV(dev))
{
_fstrncpy(block->name, dev->u.name, 8);
block->name[8] = '\0';
}
else
{
blk -= dev->u.blk_cnt; // block drivers in reverse order
block->name[0] = blk + 'A';
block->name[1] = ':';
block->name[2] = '\0';
}
block++; num_block++;
}
dev = dev->next;
} while(FP_OFF(dev->next) != (WORD) -1);
if (_osmajor >= 5)
set_umb_link(save_link);
qsort(map, num_block, sizeof(BLOCK), cmp_func);
for (i=0, block=map; i< num_block-1; i++, block++)
if (block->end == (DWORD) -1)
block->end = map[i+1].start;
if (block->end == (DWORD) -1) // last one
block->end = 0xFFFFFL;
did_init = 1;
}
char *find_owner(DWORD lin_addr)
{
BLOCK *block;
int i;
if (! did_init) do_init();
for (i=0, block=map; i < num_block; i++, block++)
if ((lin_addr >= block->start) &&
(lin_addr <= block->end))
return block->name;
/* still here */
return (char *) 0;
}
#ifdef TESTING
main()
{
BLOCK *block;
int i;
do_init();
for (i=0, block=map; i < num_block; i++, block++)
printf("%08lX %08lX %s\n",
block->start, block->end, block->name);
}
#endif
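Because MAP.C depends on far pointers and live DOS structures, it can be hard to see the shape of its MCB walk. The following sketch, in portable C, performs the same walk over a plain byte array standing in for conventional memory. It relies only on the MCB layout declared in the listing (type byte, owner word, size word) and on the rule that the next MCB is at the current segment plus size plus one. The function name and the corruption checks are ours, not part of MAP.C.

```c
#define PARA 16u   /* bytes per paragraph */

/* Read a little-endian word from simulated memory. */
static unsigned rd16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Walk an MCB chain inside `mem`, a copy of memory `paras` paragraphs
   long, starting at segment `first`. Counts blocks whose owner is the
   segment right after the MCB, that is, blocks that hold a PSP (the
   IS_PSP test). Returns -1 if the chain looks corrupt. */
int count_psp_blocks(const unsigned char *mem, unsigned long paras,
                     unsigned first)
{
    unsigned long seg = first;
    int count = 0;

    for (;;) {
        const unsigned char *mcb;
        if (seg >= paras)
            return -1;                      /* ran off the end        */
        mcb = mem + seg * PARA;
        if (mcb[0] != 'M' && mcb[0] != 'Z')
            return -1;                      /* not an MCB             */
        if (rd16(mcb + 1) == seg + 1)
            count++;                        /* owner == own segment+1 */
        if (mcb[0] == 'Z')
            return count;                   /* last block in chain    */
        seg += rd16(mcb + 3) + 1UL;         /* size + 1 paragraphs on */
    }
}
```

Unlike the listing, which trusts the chain, the sketch stops when it meets a type byte other than 'M' or 'Z'; that is a design choice for the simulation and says nothing about how DOS itself reacts to a damaged chain.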
With the exception of unused interrupt vectors and those (such as INT 1Eh) that point to data rather than code, you can take addresses displayed by INTVECT and unassemble them to see how a given interrupt is handled. As an example, Figure 6-1 shows INT 29h, which is the undocumented Fast Console Output function, located by default in the CON driver provided by IO.SYS.
C:\UNDOC2\CHAP6>intvect 29
INT 29h 0070:0762 IO
C:\UNDOC2\CHAP6>debug
-u 70:762
0070:0762 50 PUSH AX
0070:0763 56 PUSH SI
0070:0764 57 PUSH DI
0070:0765 55 PUSH BP
0070:0766 53 PUSH BX
0070:0767 B40E MOV AH,0E
0070:0769 BB0700 MOV BX,0007
0070:076C CD10 INT 10
0070:076E 5B POP BX
0070:076F 5D POP BP
0070:0770 5F POP DI
0070:0771 5E POP SI
0070:0772 58 POP AX
0070:0773 CF IRET
That is very straightforward. INT 29h here is just a wrapper around INT 10h AH=0Eh, which is the ROM BIOS function to write a character in teletype mode.
Of course, things are never quite that simple. For example, if you install ANSI.SYS, which is a replacement CON driver, INT 29h points somewhere else:
C:\UNDOC2\CHAP6>intvect 29
INT 29h 0070:0762
C:\UNDOC2\CHAP6>\undoc2\chap7\devlod \dos\ansi.sys
C:\UNDOC2\CHAP6>intvect 29
INT 29h 6EB3:0510 DEVLOD
Because we loaded ANSI.SYS using DEVLOD, the INTVECT program shows DEVLOD as the owner of the interrupt vector; the owner, of course, is actually the new CON driver in ANSI.SYS. Now the code at 6EB3:0510 is no longer just a wrapper around an INT 10h call. Instead, it directly manipulates video memory at segment B800h and contains special handling for ANSI escape control codes. Showing the code here would take us too far afield, even for a chapter such as this that rambles more-or-less aimlessly through the DOS code. The point anyway is merely that the INTVECT program, simple as it is, can help us point DEBUG at useful segment:offset addresses to unassemble.
But there's a major problem here. Recall that we are interested in looking at the DOS INT 21h and INT 2Fh handlers. INTVECT can of course print out the addresses of the INT 21h and INT 2Fh handlers:
C:\UNDOC2\CHAP6>intvect 21 2f
INT 21h 0F93:32B6 MSCDEX
INT 2Fh 1305:0285 DOSKEY
However, as INTVECT indicates, these interrupt vectors point, not to DOS, but to DOS add-ins such as MSCDEX and DOSKEY. In fact, it is practically guaranteed that, except on the lamest, freshly booted, stripped-down system with no AUTOEXEC.BAT or CONFIG.SYS file, INT 21h, INT 2Fh, and many other DOS interrupt vectors won't point into DOS. The INT 21h and INT 2Fh vectors are pointing at one of the plug-in subsystems rather than at the DOS motherboard.
Of course, if you're interested in examining MSCDEX's INT 21h handler or DOSKEY's INT 2Fh handler, the INTVECT results are very useful. They provide all the information needed by a debugger such as DEBUG or SYMDEB (a handy debugger that Microsoft once included with the Windows SDK). For example, by using DEBUG or SYMDEB to unassemble the 1305:0285 address displayed by INTVECT for INT 2Fh, we can see that DOSKEY watches for the Windows and task-switcher initialization broadcasts (INT 2Fh AX=1605h and AX=4B05h). DOSKEY clearly uses the same piece of code (here, at offset 0299h) to handle both calls. We can also see confirmation that, as documented in Microsoft's MS-DOS Programmer's Reference, DOSKEY responds to INT 2Fh AH=48h calls:
C:\UNDOC2\CHAP6>intvect 2f
INT 2Fh 1305:0285 DOSKEY
C:\UNDOC2\CHAP6>debug
-u 1305:0285
1305:0285 3D0516 CMP AX,1605
1305:0288 740F JZ 0299
1305:028A 3D054B CMP AX,4B05
1305:028D 740A JZ 0299
1305:028F 80FC48 CMP AH,48
1305:0292 741B JZ 02AF
1305:0294 2EFF2E5F02 JMP FAR CS:[025F]
; ...
But if, for example, we want to see MSCDEX's INT 2Fh handler rather than DOSKEY's, and if DOSKEY is loaded after MSCDEX, INTVECT is of no use. (Note, however, that unlike MSDOS.SYS and IO.SYS, programs such as MSCDEX.EXE and DOSKEY.EXE are easy to disassemble on disk with a program such as Sourcer from V Communications.)
More important, INTVECT doesn't help us get the address of what we might call The One True INT 21h Handler inside MSDOS.SYS. Nor does it help with finding the original INT 2Fh handlers inside MSDOS.SYS and IO.SYS.
Why? Because interrupts are handled in a kind of last-in, first-out (LIFO) stack. The point was made at the beginning of this chapter that the DOS philosophy is to provide the bare minimum operating system services, along with facilities for extending DOS. As discussed in greater detail in chapter 9 on TSRs, one of the keys to extending DOS is INT 21h AH=25h, the DOS Set Interrupt Vector function. Along with the Get Interrupt Vector function (AH=35h), the Set Vector function allows the creation of what are called interrupt chains, which are essentially linked lists (or LIFO stacks) of code. An interrupt chain consists of two or more pieces of code that handle the same interrupt. The following code fragment, adapted from the FUNC0E32 and DOSVER programs in listings 2-20 and 2-21, illustrates this:
void (interrupt far *prev)(); // ptr to previous handler in chain
prev = _dos_getvect(0x21); // call 21/35 -- get previous
_dos_setvect(0x21, my_int21_handler); // call 21/25 -- set new
// ...
void interrupt far my_int21_handler(REG_PARAMS r)
{
// look at AH to see if we're interested
// ...
_chain_intr(prev); // pass interrupt down to previous owner in chain
}
The _chain_intr() function does a far JMP to the previous interrupt handler in the chain, without returning. It is important to note that sometimes interrupt handlers CALL, rather than JMP to, the previous handler. This allows a handler to post-process the interrupt after the previous handler has done its work, rather than pre-processing the interrupt beforehand, which is what happens in the more typical JMP style of interrupt chaining. Sometimes the JMP-style code is called a front-end handler, and the CALL-style code is called a back-end handler.
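The difference between the two styles is easier to see in a toy model. The following sketch replaces real interrupt vectors with ordinary C function pointers; every name in it (regs, hook(), front_end(), back_end(), dos_tail(), demo_chain()) is invented for illustration, and a plain C call stands in for the far JMP, so the front-end handler simply does nothing after the call:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned char ah; unsigned short bx; } regs; /* tiny register file */
typedef void (*handler)(regs *r);

static handler vector21;        /* stand-in for the INT 21h vector */
static handler prev_of_front;   /* saved by the front-end hook     */
static handler prev_of_back;    /* saved by the back-end hook      */
static int front_saw;           /* AH as seen on the way down      */
static int back_saw_result;     /* BX as seen on the way back      */

/* Stand-in for DOS itself, the tail of the chain. */
static void dos_tail(regs *r) { if (r->ah == 0x62) r->bx = 0x1408; }

/* Front-end: look at the request, then hand it on and do nothing more. */
static void front_end(regs *r) { front_saw = r->ah; prev_of_front(r); }

/* Back-end: let the earlier handlers run first, then examine the result. */
static void back_end(regs *r) { prev_of_back(r); back_saw_result = r->bx; }

/* Mimics the 21/35 plus 21/25 pair: install a new head, return the old one. */
static handler hook(handler new_head)
{
    handler old = vector21;
    vector21 = new_head;
    return old;
}

unsigned short demo_chain(void)
{
    regs r = { 0x62, 0 };

    vector21 = dos_tail;
    prev_of_back  = hook(back_end);
    prev_of_front = hook(front_end);  /* installed last, so it runs first */
    assert(vector21 != NULL);
    vector21(&r);                     /* "INT 21h" enters at the head */
    return r.bx;
}
```

Calling demo_chain() walks front_end, back_end, and dos_tail in that order; only back_end gets a look at the value that the tail placed in BX.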
It is especially important that INT 21h AH=25h and 35h allow even INT 21h itself to be hooked. This is a source of tremendous flexibility in DOS, but it also makes it difficult for us to find The One True INT 21h Handler. Calling INT 21h AX=3521h returns the head of the INT 21h linked list, that is, the address of the most recently installed INT 21h handler. This might conceivably be the genuine DOS INT 21h handler, but more likely it belongs to MSCDEX, NETX, or perhaps even something as dumb as the FUNC0E32 or DOSVER programs from chapter 2. INT 21h AH=35h simply returns the head of an interrupt chain. Finding the original INT 21h or INT 2Fh handler belonging to DOS usually requires finding the chain's tail. (Usually rather than always, because there might be back-end handlers.)
How can we find the actual INT 21h and INT 2Fh handlers provided by DOS itself, when all we have is the address of the head of the INT 21h or INT 2Fh interrupt chain? There is unfortunately no function that returns the tail of an interrupt chain. And while there is an undocumented DOS function (INT 2Fh AX=1203h) to return the DOS data segment, there is no equivalent function that returns the DOS code segment (which, remember, may well be in the HMA).
One solution would of course be to boot on an absolutely bare-bones system and hope that INT 21h and INT 2Fh point to the original MS-DOS handlers, thereby bypassing the whole problem of how to follow interrupt chains. Or you could write a device driver to keep track of interrupts, and install it very early in DOS initialization. But this is ridiculous! Clearly there must be some way to follow the interrupt chain, as the processor does this many times a second.
Unfortunately, there is no standard mechanism for interrupt chaining. IBM and Microsoft at one point put forward a specification for this purpose (David Thielen described it in detail in Microsoft Systems Journal, July 1991, pp. 24-25), but unfortunately no one seems to use it. Ralf Brown has proposed an INT 2Dh protocol (described in the Interrupt List on disk) to combat the extremely long interrupt chains that currently plague INT 2Fh, but again you can't rely on programs to do the right thing and use this protocol.
It turns out that Microsoft provides, with every copy of DOS, an almost perfect solution to the problem of finding the actual DOS INT 21h and INT 2Fh handlers. The solution is none other than DEBUG.
Like most debuggers, DEBUG has an a command to assemble instructions on the fly, and a t command for tracing into (as opposed to stepping over) instructions. Even better, unlike some otherwise more sophisticated debuggers, the t command in DEBUG can trace into an INT instruction. For the purposes of trace, in other words, DEBUG does not treat INT as an atomic operation:
C:\UNDOC2\CHAP6>intvect 21
INT 21h 0F93:32B6 MSCDEX
C:\UNDOC2\CHAP6>debug
-a
19B5:0100 mov ah, 62
19B5:0102 int 21
19B5:0104 ret
19B5:0105
-t
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=19B5 ES=19B5 SS=19B5 CS=19B5 IP=0102 NV UP EI PL NZ NA PO NC
19B5:0102 CD21 INT 21
-t
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=19B5 ES=19B5 SS=19B5 CS=0F93 IP=32B6 NV UP DI PL NZ NA PO NC
0F93:32B6 80FC60 CMP AH,60
Notice that pressing t at the INT 21h instruction took us into the first line of the handler at 0F93:32B6, rather than over it to the RET instruction at 19B5:0104. This is exactly what one might expect from pressing t rather than p (proceed). Yet because of the way the single-step interrupt works on Intel processors (see INTCHAIN.C at listing 6-5 later in this chapter), most debuggers don't behave this way; it's useful that every copy of DOS comes with one that does.
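The register dumps above are consistent with what the processor does on a software interrupt: it pushes FLAGS, clears the trap and interrupt-enable flags, pushes CS and the return IP, and loads CS:IP from the vector table. Because the trap flag is cleared, a debugger relying purely on hardware single-stepping gets no trap at the handler's first instruction. One way around this is for the debugger to emulate the INT itself. The sketch below models that idea in portable C; it is an assumption about how such emulation could look, not a description of DEBUG's internals, and the cpu_state type and emulate_int() name are invented here:

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_TF 0x0100u   /* trap (single-step) flag */
#define FLAG_IF 0x0200u   /* interrupt-enable flag   */

typedef struct {
    uint16_t cs, ip, ss, sp, flags;
    uint8_t *mem;          /* hypothetical 1 MB real-mode memory image */
} cpu_state;

/* 20-bit physical address, wrapping at 1 MB as on an 8088. */
static uint32_t linear(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

static void push16(cpu_state *c, uint16_t v)
{
    c->sp = (uint16_t)(c->sp - 2);
    c->mem[linear(c->ss, c->sp)] = (uint8_t)(v & 0xFF);
    c->mem[linear(c->ss, (uint16_t)(c->sp + 1))] = (uint8_t)(v >> 8);
}

/* Emulate a two-byte "INT n" instruction located at CS:IP. */
void emulate_int(cpu_state *c, uint8_t n)
{
    const uint8_t *vec;

    assert(c != 0 && c->mem != 0);
    vec = c->mem + (uint32_t)n * 4u;            /* entry in the vector table */
    push16(c, c->flags);                        /* FLAGS goes on first       */
    c->flags &= (uint16_t)~(FLAG_TF | FLAG_IF); /* EI becomes DI, TF cleared */
    push16(c, c->cs);
    push16(c, (uint16_t)(c->ip + 2));           /* return past the INT       */
    c->ip = (uint16_t)(vec[0] | (vec[1] << 8));
    c->cs = (uint16_t)(vec[2] | (vec[3] << 8));
}
```

Starting from CS:IP=19B5:0102 with SP=FFEE, the model ends at 0F93:32B6 with SP=FFE8 and interrupts disabled, which matches the second register dump above.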
We can use this facility in order to follow the INT 21h or INT 2Fh chain down into the bowels of DOS itself. (Yuck!) All we must do is keep tracing (either by continuously pressing t or by telling DEBUG with a command such as t 16 to trace a certain number of instructions) until the segment:offset returns to DEBUG and our RET instruction (which, in the example above, is located at 19B5:0104). This will surely locate the actual DOS INT 21h or INT 2Fh handler.
However, the astute reader may wish to interject right now, before we go any further, that using DEBUG to trace into INT 21h "won't work" because DEBUG itself uses DOS, and DOS, as we all know, is not reentrant. This is absolutely true; a debugger that does not use DOS, such as Nu-Mega's Soft-ICE, is better suited than DEBUG to tracing through DOS.
However, there are a handful of DOS functions that are reentrant, at least for the purposes of tracing with DEBUG. By examining the DOS code for INT 21h, we will soon see precisely what this reentrancy or lack thereof means. In the meantime, simply take it on faith that the DOS INT 21h functions shown below in table 6-1 are (with an important caveat that we'll get to) reentrant, and thus traceable using DEBUG, SYMDEB, or any other debugger that uses DOS. With the exception of the undocumented INT 21h AH=64h, note that these are among the INT 21h functions that Microsoft (MS-DOS Programmer's Reference, chapter 7) lists as callable from a critical error handler.
It is desirable for MS-DOS to single out the Get and Set PSP functions for special treatment, because this means that interrupt handlers can freely call these process-manipulation functions (see chapter 9 on TSRs). But it is not at all obvious why functions 33h and 64h merit this special attention. It would seem that other functions, such as AH=25h and AH=35h to set and get interrupt vectors, might be more useful. On the other hand, including function 33h here means that interrupt handlers can freely get and set the DOS BREAK= flag.
Let us now use DEBUG to trace into a call to one of these functions, INT 21h AH=62h (Get PSP), and see exactly what occurs when this function is called under DOS 6.0, in a configuration with a few standard DOS TSRs such as MSCDEX and DOSKEY. The documentation states that function 62h takes no parameters other than the number 62h in AH, and that the function returns the current PSP in BX. You can probably guess that the DOS implementation for this function is rather simple, doing little more than loading BX from the CURR_PSP location in the DOS data segment. This location corresponds to offset 10h in the Swappable Data Area (SDA; see INT 21h AX=5D06h in the appendix). However, as you'll see, the processor executes a lot of code before DOS eventually gets to the point of carrying out the otherwise simple Get PSP operation.
As noted earlier, the key facility DEBUG provides here is that (unlike SYMDEB, for example) it traces into the INT instruction. In figure 6-2, comments have been added to the following DEBUG output, using ;;; to make them stand out:
C:\UNDOC2\CHAP6>debug
-a
19B5:0100 mov ah, 62
19B5:0102 int 21
19B5:0104 ret
19B5:0105
-t
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=19B5 ES=19B5 SS=19B5 CS=19B5 IP=0102 NV UP EI PL NZ NA PO NC
19B5:0102 CD21 INT 21
-t
;;; We have to keep tracing until the segment:offset comes back to
;;; our own code, the RET instruction at 19B5:0104.
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=19B5 ES=19B5 SS=19B5 CS=0F93 IP=32B6 NV UP DI PL NZ NA PO NC
0F93:32B6 80FC60 CMP AH,60
-t
;;; Running MEM /D showed that above is MSCDEX. This is consistent
;;; with output from INTVECT program. Apparently MSCDEX is interested
;;; in the undocumented DOS INT 21h AH=60h (Truename) function. Note that
;;; we were running MSCDEX /S (for network sharing); usually MSCDEX doesn't
;;; care about the INT 21h AH=60h call.
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=19B5 ES=19B5 SS=19B5 CS=0F93 IP=32B9 NV UP DI PL NZ NA PO NC
0F93:32B9 7405 JZ 32C0
-t
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=19B5 ES=19B5 SS=19B5 CS=0F93 IP=32BB NV UP DI PL NZ NA PO NC
0F93:32BB 2E CS:
0F93:32BC FF2EB232 JMP FAR [32B2] CS:32B2=15FA
-t
;;; MSCDEX decided it's not interested in our call to 21/62, so it chains
;;; to the previous handler, whose address it earlier retrieved (by
;;; calling 21/35) and saved away (apparently in CS:32B2) before installing
;;; (with 21/25) its own INT 21h handler.
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=19B5 ES=19B5 SS=19B5 CS=07F9 IP=15FA NV UP DI PL NZ NA PO NC
07F9:15FA 80FC3F CMP AH,3F
;;; We're now in the previous INT 21h handler. MEM /D shows that
;;; 07F9:15FA is SMARTDRV. Here, it's (reasonably enough) interested in
;;; whether we've called INT 21h AH=3Fh to read from a file (SMARTDRV
;;; wants to see if the data we want from the file is actually already
;;; in its cache). But we called 21/62 not 21/3F so...
Well, you get the idea. Running DEBUG this way is a bit tedious, and saving its output to a file is difficult. As an improvement, you can drive DEBUG with input scripts, such as 2162.SCR in listing 6-3, and redirect its output to a file. (For a lengthy discussion of DEBUG scripts, see PC Magazine DOS Power Tools, 2nd edition, chapter 9.) Furthermore, rather than repeatedly hitting t to trace (single step) the next instruction, you can give the trace command a numeric parameter (for example, t 16 or t 32) to trace a series of instructions.
C:\UNDOC2\CHAP6>type 2162.scr
a
mov ah, 62
int 21
ret
; blank line below is crucial to leave assembly mode!

t 100
q
The only problem is in guessing how many instructions to trace; if you ask DEBUG to trace too far, it starts executing garbage. You only want to trace until you return to the RET instruction you assembled, or at least not much past it. The best bet is to try t 16, examine DEBUG's output to see if the traced instructions come back, then try t 32, examine the output again, and so on. In any case, t 100 happened to work here; a larger number would be needed on machines where more installed TSRs hook INT 21h.
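The stopping rule described here (keep going until CS:IP is back at our own RET) is simple enough to state as code. The following sketch assumes the trace has already been reduced to a list of CS:IP pairs, for example by post-processing DEBUG's redirected output; the trace_rec type and both function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t cs, ip; } trace_rec;

/* Return the index of the first record after index 0 whose CS:IP equals
   the given return address, or -1 if the trace never comes back. */
long find_return(const trace_rec *t, size_t n, uint16_t ret_cs, uint16_t ret_ip)
{
    size_t i;

    assert(t != NULL);
    for (i = 1; i < n; i++)
        if (t[i].cs == ret_cs && t[i].ip == ret_ip)
            return (long)i;
    return -1;
}

/* Count how many times CS changes before index `end`: a rough tally of
   the hops from one piece of code to the next along the chain. */
size_t count_segment_hops(const trace_rec *t, size_t end)
{
    size_t i, hops = 0;

    assert(t != NULL);
    for (i = 1; i < end; i++)
        if (t[i].cs != t[i - 1].cs)
            hops++;
    return hops;
}
```

A result of -1 from find_return() means the trace count was too small; any records after the returned index are the "garbage" that is best ignored.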
Figure 6-3 shows a complete trace into an INT 21h AH=62h call, from the time we issued the INT 21h until DOS returns to us with the current PSP in BX. Normally all that you see (or want to see!) of an INT 21h call is your input and its output. But figure 6-3 views the DOS call "through the looking glass," as it were. Instead of looking down at DOS, you'll be inside DOS looking up at the INT 21h call. This can be slightly disorienting at first, but in the end you'll have a much better understanding of what DOS is all about.
C:\UNDOC2\CHAP6>debug < 2162.scr > 2162.out
C:\UNDOC2\CHAP6>type 2162.out
-a
19B5:0100 mov ah, 62
19B5:0102 int 21
19B5:0104 ret
19B5:0105
-t 106
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=19B5 ES=19B5 SS=19B5 CS=19B5 IP=0102 NV UP EI PL NZ NA PO NC
19B5:0102 CD21 INT 21
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=19B5 ES=19B5 SS=19B5 CS=0F93 IP=32B6 NV UP DI PL NZ NA PO NC
0F93:32B6 80FC60 CMP AH,60
;;; As before (figure 6-2), we're in MSCDEX /S now.
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=19B5 ES=19B5 SS=19B5 CS=0F93 IP=32B9 NV UP DI PL NZ NA PO NC
0F93:32B9 7405 JZ 32C0
;;; The AX=xxxx BX=xxxx etc. dump that DEBUG shows each time usually
;;; isn't important here, so from now on we'll omit it (and blank lines)
;;; except when the register dump is useful.
0F93:32BB 2E CS:
0F93:32BC FF2EB232 JMP FAR [32B2] CS:32B2=15FA
07F9:15FA 80FC3F CMP AH,3F
;;; As before, we're in SMARTDRV now.
07F9:15FD 7414 JZ 1613
07F9:15FF 80FC0D CMP AH,0D
07F9:1602 7426 JZ 162A
07F9:1604 3D1325 CMP AX,2513
07F9:1607 7451 JZ 165A
07F9:1609 80FC68 CMP AH,68
07F9:160C 7442 JZ 1650
;;; Above provides a catalog of the DOS INT 21h function calls that
;;; SMARTDRV cares about: 3Fh (read file), 0Dh (disk reset), 2513h
;;; (set INT 13h vector), 68h (commit file). All this makes sense.
;;; For example, SMARTDRV uses 21/0D as a signal to flush the cache.
;;; For some calls such as 21/0D, SMARTDRV doesn't JMP to the previous
;;; handler; instead, it does a far CALL and examines the 21/0D on
;;; the way back.
07F9:160E 2E CS:
07F9:160F FF2E1423 JMP FAR [2314] CS:2314=0800
;;; We called 21/62, SMARTDRV doesn't care, so SMARTDRV chains to
;;; previous handler, C801:0800, which SMARTDRV earlier got from
;;; calling 21/35 before installing its own 21 handler with 21/25, and
;;; which is stored in CS:2314.
C801:0800 9C PUSHF
;;; Was running with DOS=UMB, so some INT 21h handlers are running
;;; in upper memory. Don't know who the owner of this is!
C801:0801 FB STI
C801:0802 3D0258 CMP AX,5802
C801:0805 7413 JZ 081A
C801:0807 3D0358 CMP AX,5803
C801:080A 7431 JZ 083D
C801:080C 80FC31 CMP AH,31
C801:080F 7503 JNZ 0814
C801:0814 9D POPF
;;; We can see that this handler cares about calls to INT 21h functions
;;; 5802h (Get UMB Link), 5803h (Set UMB Link), 31h (TSR). Wonder why.
;;; Anyway, we called 21/62, the handler isn't interested in that, so it
;;; chains to the previous handler.
C801:0815 2E CS:
C801:0816 FF2ECE01 JMP FAR [01CE] CS:01CE=0023
0255:0023 EA8E052ECC JMP CC2E:058E
;;; DEV shows that seg 0255h is a block-mode device driver for
;;; D: through I: -- it is a low-memory stub for DoubleSpace, located in
;;; high memory. Stacker uses the same area; both have signatures at
;;; 0255:0000. DEV also shows that CC2E:058E is DBLSSYS$ (DoubleSpace).
CC2E:058E 9C PUSHF
CC2E:058F FB STI
CC2E:0590 FC CLD
CC2E:0591 1E PUSH DS
CC2E:0592 0E PUSH CS
CC2E:0593 1F POP DS
CC2E:0594 C606C20700 MOV BYTE PTR [07C2],00 DS:07C2=00
CC2E:0599 53 PUSH BX
CC2E:059A 8ADC MOV BL,AH
CC2E:059C 80FB6C CMP BL,6C
CC2E:059F 7759 JA 05FA
CC2E:05A1 32FF XOR BH,BH
CC2E:05A3 8A9F1305 MOV BL,[BX+0513] DS:0575=00
CC2E:05A7 FFA78005 JMP [BX+0580] DS:0580=05FA
;;; DoubleSpace is sufficiently tied into DOS that it uses a jump table to
;;; store a handler for every DOS function. The table at CC2E:0513 holds
;;; byte offsets into code at CC2E:0580. Most DOS functions (including
;;; our 21/62 call) are just passed on. Examining the table with the FTAB
;;; program from later in this chapter shows that DoubleSpace cares
;;; about the following INT 21h functions: 00, 0A, 0D, 10, 13, 17, 25, 31,
;;; 36, 39, 3A, 3E, 41, 43, 4B, 4C, 56, 57, 5D, 68. We know this from
;;; running "ftab cc2e:0513 6d DSI21 1 | grep -v 00". For example, it hooks
;;; 21/25 because (like SMARTDRV) it wants to know whenever someone sets the
;;; INT 13h (BIOS Disk) vector.
CC2E:05FA 5B POP BX
CC2E:05FB 1F POP DS
CC2E:05FC 9D POPF
CC2E:05FD 2E CS:
CC2E:05FE FF2E0005 JMP FAR [0500] CS:0500=109E
;;; Trivial handling for our 21/62 call. Just pass it on to previous
;;; handler for INT 21h...
0116:109E 90 NOP
;;; MEM /D shows that 0116h is MS-DOS. Finally!
0116:109F 90 NOP
0116:10A0 E8CC00 CALL 116F
;;; Hmm, DOS is calling some subroutine (which we've traced into):
0116:116F 9C PUSHF
0116:1170 1E PUSH DS
0116:1171 06 PUSH ES
0116:1172 51 PUSH CX
0116:1173 56 PUSH SI
0116:1174 57 PUSH DI
;;; We need to see the registers for the next few instructions.
;;; Note what happens to DS and ES
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFDA BP=0000 SI=0000 DI=0000
DS=19B5 ES=19B5 SS=19B5 CS=0116 IP=1175 NV UP DI NG NZ AC PE CY
0116:1175 2E CS:
0116:1176 C5366711 LDS SI,[1167] CS:1167=0080
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFDA BP=0000 SI=0080 DI=0000
DS=0000 ES=19B5 SS=19B5 CS=0116 IP=117A NV UP DI NG NZ AC PE CY
0116:117A 2E CS:
0116:117B C43E6B11 LES DI,[116B] CS:116B=0090
AX=6200 BX=0000 CX=0000 DX=0000 SP=FFDA BP=0000 SI=0080 DI=0090
DS=0000 ES=FFFF SS=19B5 CS=0116 IP=117F NV UP DI NG NZ AC PE CY
0116:117F B90400 MOV CX,0004
0116:1182 FC CLD
0116:1183 F3 REPZ
0116:1184 A7 CMPSW
0116:1185 7407 JZ 118E
;;; DOS has just compared 8 bytes (4 words) at DS:SI (0000:0080) and
;;; ES:DI (FFFF:0090). If they are identical, DOS jumps somewhere.
;;; What is this?! This particular run of DEBUG was conducted with
;;; DOS=HIGH. DOS is in the HMA, which is only reachable when the
;;; machine's A20 address line is enabled. DOS is comparing 0000:0080
;;; and FFFF:0090 because, if the 8 bytes at these two addresses are
;;; identical, it assumes that memory addresses are wrapping around, and
;;; therefore that A20 is off. DOS can't call routines in the HMA if A20
;;; is off. Thus, even when DOS=HIGH there must be a low-memory stub; the
;;; code at 0116:109E is that stub, which ensures that A20 is enabled before
;;; calling DOS in the HMA. Here, A20 was already on (0000:0080 and
;;; FFFF:0090 were different), but if A20 had been off, we would
;;; have jumped to the subroutine at 0116:118E, whose job
;;; is to enable A20 (by calling XMS function 5, Local Enable A20).
;;; If that function call succeeds, DOS will jump back here, just as if
;;; A20 had been enabled all along. If that function call fails, we're
;;; in big trouble: DOS uses INT 10h AH=0Eh to display "A20 Hardware
;;; Error", and goes into a dynamic halt. We'll come back to this
;;; code later. Right now, A20 is enabled so...
0116:1187 5F POP DI
0116:1188 5E POP SI
0116:1189 59 POP CX
0116:118A 07 POP ES
0116:118B 1F POP DS
0116:118C 9D POPF
0116:118D C3 RET
0116:10A3 2E CS:
0116:10A4 FF2E6A10 JMP FAR [106A] CS:106A=40F8
;;; The low-memory stub for DOS knows it can jump to DOS in the HMA, and
;;; here we go:
FDC8:40F8 FA CLI
;;; We are now in The One True INT 21h Handler. That this is at
;;; FDC8:40F8 in this particular configuration is the one piece of
;;; information we're after here, because now we can go and unassemble
;;; (rather than trace) at that address. Static unassembly is
;;; generally easier than dynamic tracing. But let's see the thing
;;; through, to learn exactly how 21/62 is handled...
FDC8:40F9 80FC6C CMP AH,6C
FDC8:40FC 77D2 JA 40D0
;;; Any INT 21h function > 6Ch is an error. ("In DOS 7.0,
;;; the upper limit is 72h," writes one tech reviewer.)
FDC8:40FE 80FC33 CMP AH,33
FDC8:4101 7218 JB 411B
;;; Any INT 21h function < 33h will be handled at FDC8:411B.
FDC8:4103 74A2 JZ 40A7
;;; 21/33 is special: it is handled at FDC8:40A7 (in this configuration)
FDC8:4105 80FC64 CMP AH,64
FDC8:4108 7711 JA 411B
;;; Any INT 21h function > 64h will also be handled at FDC8:411B;
;;; seems like 411B is the handler for "normal" DOS calls.
FDC8:410A 74B5 JZ 40C1
;;; 21/64 is another special function, handled here at FDC8:40C1
FDC8:410C 80FC51 CMP AH,51
FDC8:410F 74A4 JZ 40B5
FDC8:4111 80FC62 CMP AH,62
FDC8:4114 749F JZ 40B5
;;; Finally! DOS sees our 21/62 call, and will handle it by jumping to
;;; FDC8:40B5. Notice that the same code also handles calls to 21/51, which
;;; makes sense, since the two functions are documented as being identical.
FDC8:40B5 1E PUSH DS
FDC8:40B6 2E CS:
FDC8:40B7 8E1EE73D MOV DS,[3DE7] CS:3DE7=0116
;;; DOS DS (0116h) is stored in a variable kept at CS:3DE7. This is
;;; the segment where things like SysVars and SDA live. This value is
;;; also returned from 2F/1203 (see appendix).
FDC8:40BB 8B1E3003 MOV BX,[0330] DS:0330=1408
;;; Believe it or not, the line above is actually the Get PSP function!
;;; We know that DOS keeps the current PSP at SDA+10h. In this
;;; configuration, 21/5D06 (Get SDA) returns 0116:0320. The Get PSP
;;; function just moves the WORD at 0116:0330 into BX. In other words,
;;; 21/62 (and 21/51) just return the WORD from SDA+10h. Duh.
FDC8:40BF 1F POP DS
FDC8:40C0 CF IRET
;;; DOS IRETs back to our code running in DEBUG
19B5:0104 C3 RET
;;; This is the RET statement in our DEBUG script.
19B5:0000 CD20 INT 20
0116:1094 90 NOP
;;; Our script has already returned to DEBUG, which did an INT 20h return
;;; to DOS. At this point, we start tracing all sorts of things we don't
;;; care about. If we trace too far, we start to make DEBUG execute
;;; garbage, which can hang the machine.
-q
The most noticeable feature of the INT 21h trace in figure 6-3 is the way that DOS extensions such as SMARTDRV and MSCDEX become indistinguishable from DOS itself. If any non-Microsoft DOS extensions such as Novell NetWare or Stacker had been running, they too would have appeared in the INT 21h chain, looking not a bit different from any of the Microsoft-provided software in the chain. The walk through the INT 21h chain in figure 6-3 thus presents an excellent illustration of what DOS really is.
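One detail of the trace that is worth modeling is the two-level jump table seen in the DoubleSpace handler: a byte table indexed by the function number in AH supplies an offset into a second, much shorter table of handler addresses, so that the many uninteresting functions all share a single pass-through entry. The sketch below mirrors that shape in portable C. The handler names and return values are invented, and because C function pointers are wider than the 16-bit near pointers in the original, the byte offsets are scaled by sizeof(func_handler):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FUNC 0x6C

typedef int (*func_handler)(void);

static int pass_through(void) { return 0; }  /* chain to the previous handler */
static int special(void)      { return 1; }  /* hypothetical interesting case */

/* Second level: a short table of distinct handlers. */
static const func_handler targets[] = { pass_through, special };

/* First level: one byte per INT 21h function, holding a byte offset into
   `targets`. Zero-filled, so every function defaults to pass_through. */
static uint8_t index_table[MAX_FUNC + 1];

int dispatch(uint8_t ah)
{
    uint8_t off;

    if (ah > MAX_FUNC)
        return pass_through();               /* out of range: not ours */
    off = index_table[ah];
    assert(off % sizeof(func_handler) == 0);
    assert(off / sizeof(func_handler) < sizeof targets / sizeof targets[0]);
    return targets[off / sizeof(func_handler)]();
}
```

Marking a function as interesting takes one byte: for example, index_table[0x0D] = sizeof(func_handler) routes function 0Dh to special() while 62h still passes straight through.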
As you can see, under normal circumstances with a few TSRs loaded, you have to wade through a lot just to get to the single line of code that actually performs the DOS Get PSP function. It should now be clear why INT 21h is called an interrupt "chain." As you'll see later, the INT 2Fh chain is typically much longer than the INT 21h chain. Given the overhead of INT 21h on a typical machine, programmers might even consider writing their own Get PSP calls to bypass this long interrupt chain. Seeing how DOS implements Get PSP (when it eventually gets there!), you can also see how to implement your own:
// uses get_sda() from GETSDA.C (listing 3-4a)
WORD my_get_psp(void)
{
static WORD far *psp_ptr = (WORD far *) 0;
if (! psp_ptr) // one-time init
psp_ptr = (WORD far *) (get_sda() + 0x10);
return *psp_ptr;
}
Of course, this would cut out any TSRs or drivers that might actually need to see and respond to DOS Get PSP calls.
Having already seen the code that handles the Get PSP function (INT 21h AH=51h and 62h), we might as well also examine the code for Set PSP, though we can guess what it's going to look like (we'll see later in figure 6-7 where the 40A9h address comes from):
-u fdc8:40a9
FDC8:40A9 1E PUSH DS ; save caller's DS
FDC8:40AA 2E CS:
FDC8:40AB 8E1EE73D MOV DS,[3DE7] ; switch to DOS DS
FDC8:40AF 891E3003 MOV [0330],BX ; put caller's BX into CURR_PSP
FDC8:40B3 1F POP DS ; restore caller's DS
FDC8:40B4 CF IRET ; done!
In other words, the Get and Set PSP functions just manipulate this word at offset 330h in the DOS data segment (offset 10h in the SDA). This provides a small taste of how DOS internally uses such externally-visible structures as SysVars and the SDA. Thus:
void my_set_psp(WORD psp)
{
static WORD far *psp_ptr = (WORD far *) 0;
if (! psp_ptr) // one-time init
psp_ptr = (WORD far *) (get_sda() + 0x10);
*psp_ptr = psp;
}
A glance towards the end of the DEBUG output in figure 6-3 shows that MS-DOS special-cases a handful of functions: 33h, 51h, 62h, 64h, and (not shown in figure 6-3) 50h. These functions correspond to the reentrant DOS functions listed in table 6-1 above. While we're still not quite in a position to understand what makes these functions different from all other DOS functions, we do at any rate now have a bunch of addresses that we can unassemble. Recall that this was our goal in tracing through DOS.
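The sequence of comparisons at the top of the handler can be restated as a small C function. This is only a paraphrase of the tests visible in figure 6-3, using the addresses from this particular configuration; the route names are invented, and the position of the test for 50h (which the trace did not reach) and the final fall-through to the normal path are assumptions:

```c
#include <assert.h>

typedef enum {
    ROUTE_ERROR,     /* AH above 6Ch                          */
    ROUTE_NORMAL,    /* the general path at 411Bh             */
    ROUTE_FUNC33,    /* 40A7h                                 */
    ROUTE_FUNC64,    /* 40C1h                                 */
    ROUTE_GET_PSP,   /* 40B5h, shared by 51h and 62h          */
    ROUTE_SET_PSP    /* 40A9h; its test is not in figure 6-3  */
} route;

route classify_int21(unsigned ah)
{
    assert(ah <= 0xFF);
    if (ah > 0x6C)  return ROUTE_ERROR;    /* CMP AH,6C / JA        */
    if (ah < 0x33)  return ROUTE_NORMAL;   /* CMP AH,33 / JB        */
    if (ah == 0x33) return ROUTE_FUNC33;   /*             JZ        */
    if (ah > 0x64)  return ROUTE_NORMAL;   /* CMP AH,64 / JA        */
    if (ah == 0x64) return ROUTE_FUNC64;   /*             JZ        */
    if (ah == 0x51 || ah == 0x62)
        return ROUTE_GET_PSP;              /* CMP AH,51 and AH,62   */
    if (ah == 0x50)
        return ROUTE_SET_PSP;              /* assumed position      */
    return ROUTE_NORMAL;                   /* assumed fall-through  */
}
```

Written this way, it is plain that functions below 33h and above 64h pay for only two or four comparisons before reaching the normal path.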
For example, INT 21h AH=33h is an omnibus function with a number of subfunctions relating to Ctrl-Break, the Boot Drive, and the DOS Version. Setting BREAK=ON, for instance, ends up calling INT 21h AX=3301h with DL=1. In this configuration, code at FDC8:40A7 handles this function:
FDC8:40FE 80FC33 CMP AH,33
FDC8:4101 7218 JB 411B
FDC8:4103 74A2 JZ 40A7
We can now unassemble (rather than trace) at this address, using DEBUG or any other DOS debugger. Comments have been added to the output in figure 6-5, which has also been cleaned up slightly.
C:\UNDOC2\CHAP6>debug
-u fdc8:40a7
FDC8:40A7 EBA9 JMP 4052
-u fdc8:4052
FDC8:4052 3C06 CMP AL,06 ; functions 3300h through 3306h
FDC8:4054 7603 JBE 4059
FDC8:4056 B0FF MOV AL,FF ; error: subfunction number too high
FDC8:4058 CF IRET
FDC8:4059 1E PUSH DS ; save caller's DS
FDC8:405A 2E CS:
FDC8:405B 8E1EE73D MOV DS,[3DE7] ; switch to DOS's DS; hmm, not truly
FDC8:405F 50 PUSH AX ; reentrant after all!
FDC8:4060 56 PUSH SI
FDC8:4061 BE3703 MOV SI,0337 ; offset of break flag: SDA+17h
FDC8:4064 32E4 XOR AH,AH ; see if subfunct 0
FDC8:4066 0BC0 OR AX,AX
FDC8:4068 7504 JNZ 406E
FDC8:406A 8A14 MOV DL,[SI] ; 21/3300 -- get break flag
FDC8:406C EB35 JMP 40A3
FDC8:406E 48 DEC AX ; see if subfunct 1
FDC8:406F 7507 JNZ 4078
FDC8:4071 80E201 AND DL,01
FDC8:4074 8814 MOV [SI],DL ; 21/3301 -- set break flag
FDC8:4076 EB2B JMP 40A3
FDC8:4078 48 DEC AX ; see if subfunct 2
FDC8:4079 7507 JNZ 4082
FDC8:407B 80E201 AND DL,01
FDC8:407E 8614 XCHG DL,[SI] ; 21/3302 (UNDOC) -- get/set brk flg
FDC8:4080 EB21 JMP 40A3 ; as single atomic operation: XCHG
FDC8:4082 3D0300 CMP AX,0003 ; see if subfnc 5 (already subtracted 2)
FDC8:4085 7506 JNZ 408D
FDC8:4087 8A166900 MOV DL,[0069] ; 21/3305 -- get startup drive
FDC8:408B EB16 JMP 40A3
FDC8:408D 3D0400 CMP AX,0004 ; see if subfnc 6 (already subtracted 2)
FDC8:4090 7511 JNZ 40A3
FDC8:4092 BB0600 MOV BX,0006 ; 21/3306 -- MS-DOS version 6.0
FDC8:4095 B200 MOV DL,00
FDC8:4097 32F6 XOR DH,DH
FDC8:4099 803E111200 CMP BYTE PTR [1211],00 ; is DOS=HIGH?
FDC8:409E 7403 JZ 40A3
FDC8:40A0 80CE10 OR DH,10 ; DOSINHMA flag
FDC8:40A3 5E POP SI ; done: restore caller's regs
FDC8:40A4 58 POP AX
FDC8:40A5 1F POP DS
FDC8:40A6 CF IRET ; return to caller
In addition to showing how DOS happens to handle function 33h, the code in figure 6-5 also provides many snippets of information that can be used to understand the disassembly listing of other parts of MS-DOS. For example, Microsoft documents INT 21h AX=3306h as returning the DOSINHMA flag in DH. The end of figure 6-5 shows DOS using the byte at DOS_DS:[1211h] to set DH. Therefore, DOS_DS:[1211h] must be the DOS=HIGH indicator. This is not important by itself, but you can use this factoid to help you understand other parts of the code: anywhere you see DOS_DS:[1211h], you now know that this is the DOSINHMA flag.
Similarly, functions 3300h and 3301h are known to get and set the Ctrl-C flag; figure 6-5 shows these functions manipulating the byte at offset 0337h in the DOS data segment; this byte must then be the Ctrl-C (or break) flag. (Later on, in figure 6-7, we'll see how DOS uses this flag.) Finally, Microsoft documents INT 21h AX=3305h as returning the startup drive in DL, and the code in figure 6-5 clearly shows DOS setting DL from DOS_DS:[0069h]. Therefore, anywhere else in the code where you see DOS_DS:[0069h], you can now translate this to STARTUP_DRIVE. Q.E.D.
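Once the three data locations have names, the whole of figure 6-5 can be paraphrased in a few lines of C. The sketch below is a model of the disassembly, not Microsoft's source: the dos_data and regs33 types and the field names are invented, and only the registers that the code actually touches are represented:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for bytes in the DOS data segment (offsets from figure 6-5). */
typedef struct {
    uint8_t break_flag;     /* DOS_DS:[0337h]                        */
    uint8_t startup_drive;  /* DOS_DS:[0069h]                        */
    uint8_t dos_in_hma;     /* DOS_DS:[1211h], nonzero when DOS=HIGH */
} dos_data;

typedef struct { uint8_t al, dl, dh; uint16_t bx; } regs33;

void int21_func33(dos_data *d, regs33 *r)
{
    uint8_t old;

    assert(d != 0 && r != 0);
    if (r->al > 6) {            /* subfunction number too high */
        r->al = 0xFF;
        return;
    }
    switch (r->al) {
    case 0:                     /* 3300h: get break flag */
        r->dl = d->break_flag;
        break;
    case 1:                     /* 3301h: set break flag */
        r->dl &= 1;
        d->break_flag = r->dl;
        break;
    case 2:                     /* 3302h: exchange, as XCHG does */
        old = d->break_flag;
        d->break_flag = r->dl & 1;
        r->dl = old;
        break;
    case 5:                     /* 3305h: get startup drive */
        r->dl = d->startup_drive;
        break;
    case 6:                     /* 3306h: version 6.0, revision 0, flags */
        r->bx = 0x0006;
        r->dl = 0;
        r->dh = d->dos_in_hma ? 0x10 : 0x00;
        break;
    default:                    /* 3 and 4: registers left untouched */
        break;
    }
}
```

Note that subfunctions 3 and 4 pass the range check but match none of the comparisons, so in this model, as in the disassembly, they return with the caller's registers unchanged.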
Another interesting location to examine is the function that DOS's low-memory stub calls when DOS=HIGH, but the A20 line is disabled. The processor's A20 address line accesses memory above one megabyte. PCs based on 286 and higher processors disable A20 in order to emulate address wraparound on 8088 PCs. If DOS=HIGH but A20 is off, DOS must enable A20 before it can reach its code in the HMA above one megabyte. But if DOS's code is located above one megabyte, how can it check A20 in the first place? With a function that it keeps in low memory when DOS=HIGH. Earlier (figure 6-3) you saw this was located at 0116:118E; figure 6-6 shows what this function actually does.
-u 116:118e
0116:118E 53 PUSH BX
0116:118F 50 PUSH AX
0116:1190 8CD0 MOV AX,SS
0116:1192 2E CS:
0116:1193 A38610 MOV [1086],AX
0116:1196 2E CS:
0116:1197 89268810 MOV [1088],SP ; save caller's stack
0116:119B 8CC8 MOV AX,CS ; switch to a DOS stack; hmm, not
0116:119D 8ED0 MOV SS,AX ; reentrant at all if A20 off!
0116:119F BCA007 MOV SP,07A0 ; SDA+480h=end of Crit Err Stack
0116:11A2 B405 MOV AH,05 ; XMS func 5 = Local Enable A20
0116:11A4 2E CS:
0116:11A5 FF1E6311 CALL FAR [1163] ; XMS address from 2F/4310
0116:11A9 0BC0 OR AX,AX
0116:11AB 740F JZ 11BC ; failed: can't turn A20 on!!
;;; okay:
0116:11AD 2E CS:
0116:11AE A18610 MOV AX,[1086]
0116:11B1 8ED0 MOV SS,AX
0116:11B3 2E CS:
0116:11B4 8B268810 MOV SP,[1088] ; switch back to caller's stack
0116:11B8 58 POP AX
0116:11B9 5B POP BX
0116:11BA EBCB JMP 1187 ; jump back into normal code (fig. 6-3)
; as if A20 had been enabled all along.
;;; fail:
0116:11BC B40F MOV AH,0F ; come here if couldn't enable A20
0116:11BE CD10 INT 10 ; get video mode
0116:11C0 3C07 CMP AL,07
0116:11C2 7406 JZ 11CA
0116:11C4 32E4 XOR AH,AH
0116:11C6 B002 MOV AL,02 ; set normal text mode
0116:11C8 CD10 INT 10
0116:11CA B405 MOV AH,05
0116:11CC 32C0 XOR AL,AL ; set display page 0
0116:11CE CD10 INT 10
0116:11D0 BEB812 MOV SI,12B8 ; 12B8 -> "\nA20 Hardware Error\n$"
0116:11D3 0E PUSH CS
0116:11D4 1F POP DS
0116:11D5 FC CLD
0116:11D6 AC LODSB
0116:11D7 3C24 CMP AL,24 ; look for '$'
0116:11D9 7409 JZ 11E4
0116:11DB B40E MOV AH,0E ; write in TTY mode (use BIOS
0116:11DD BB0700 MOV BX,0007 ; since can't make DOS calls
0116:11E0 CD10 INT 10 ; here!)
0116:11E2 EBF2 JMP 11D6
0116:11E4 FB STI
0116:11E5 EBFD JMP 11E4 ; tight little loop (INTs on)
-d 116:12b8
0116:12B0 -0D 0A 41 32 30 20 48 61 ..A20 Ha
0116:12C0 72 64 77 61 72 65 20 45-72 72 6F 72 0D 0A 24 36 rdware Error..$6
Notice, by the way, that DOS leaves the A20 line on. This reduces the overhead of keeping the DOS code in the HMA: DOS probably doesn't have to call the low-memory stub in figure 6-6 very often.
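The wraparound that A20 controls is easy to model. The following is a minimal portable C sketch, not DOS code; the function name phys_addr is our own invention. It computes the physical address that a real-mode segment:offset pair reaches with A20 enabled and with A20 disabled, which shows why code in the HMA cannot be reached while A20 is off.

```c
#include <assert.h>
#include <stdint.h>

/* Physical address reached by a real-mode seg:ofs pair.
   With A20 disabled, address bit 20 is forced to zero, so anything
   at or above one megabyte wraps around to low memory, as on an 8088. */
uint32_t phys_addr(uint16_t seg, uint16_t ofs, int a20_enabled)
{
    uint32_t linear = ((uint32_t)seg << 4) + ofs;

    /* FFFF:FFFF is the highest real-mode address: 10FFEFh. */
    assert(linear <= 0x10FFEFUL);

    return a20_enabled ? linear : (linear & 0xFFFFFUL);
}
```

For example, phys_addr(0xFDC8, 0x40F8, 1) yields 101D78h, just above one megabyte, whereas with A20 off the same pointer lands at 01D78h in low memory, where something entirely different lives.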
That all calls to DOS in the HMA are guarded with this low-memory stub brings up an interesting question: what about data in the HMA? MS-DOS doesn't put internal data structures such as the Current Directory Structure (CDS) and System File Tables (SFT) up in the HMA, because this would break too many third-party applications that peek and poke these ostensibly internal structures and that wouldn't know to ensure that A20 is enabled. However, DOS does keep its BUFFERS in the HMA. If a program such as BUFFERS.C in chapter 8 (see listing 8-8) accesses the DOS sector buffers ("or if some future version of DOS has FILESHIGH or LASTDRIVEHIGH statements that use HMA," adds one tech reviewer), the program would need to check and possibly reenable A20, just as DOS does in figure 6-6. But since, from what we've just seen, any trivial DOS call will ensure that A20 is turned on, perhaps a program that accesses data in the HMA merely needs to preface that access with a trivial DOS call: DOS will take care of checking the A20 state and, if necessary, calling XMS function 5 to enable A20. But any TSR could turn it off! How frequently should programs that access the HMA check the A20 state? How much of a problem is this? Are the extra few kbytes gained by putting data in the HMA worth this kind of uncertainty? ("Ouch! This makes my head hurt," says one of the tech reviewers.)
Of all the addresses we found through tracing the INT 21h call, the most important is that of DOS's INT 21h handler, seen above in figure 6-3 at FDC8:40F8. This is really the piece of information we wanted all along. To see exactly what happens during an INT 21h call, we can now disassemble at this address. By tracing an INT 21h AH=62h, we saw only those snippets that happen to get executed when calling the Get PSP function; we can now look at the entire function. Here it is (figure 6-7), the DOS INT 21h handler (this time we've used SYMDEB and added some labels as well as comments). In Microsoft's source code, this all-important function, located in MSDISP.ASM, is called COMMAND.
-u fdc8:40f8
FDC8:40F8 FA CLI ; disable interrupts
FDC8:40F9 80FC6C CMP AH,6C
FDC8:40FC 77D2 JA 40D0 ; invalid function number
; step 1
FDC8:40FE 80FC33 CMP AH,33
FDC8:4101 7218 JB 411B ; normal DOS function
FDC8:4103 74A2 JZ 40A7 ; do 21/33 (fig. 6-5)
FDC8:4105 80FC64 CMP AH,64
FDC8:4108 7711 JA 411B ; normal DOS function
FDC8:410A 74B5 JZ 40C1 ; do 21/64
FDC8:410C 80FC51 CMP AH,51
FDC8:410F 74A4 JZ 40B5 ; do Get PSP
FDC8:4111 80FC62 CMP AH,62
FDC8:4114 749F JZ 40B5 ; do Get PSP (51==62)
FDC8:4116 80FC50 CMP AH,50
FDC8:4119 748E JZ 40A9 ; do Set PSP (fig. 6-4)
normal_DOS:
; step 2
; caller's flags, CS, and IP of course already pushed on the stack by INT
FDC8:411B 06 PUSH ES ; 10h ; Save regs on caller's stack.
FDC8:411C 1E PUSH DS ; 0Eh ; The order is important, as
FDC8:411D 55 PUSH BP ; 0Ch ; later on different INT 21h
FDC8:411E 57 PUSH DI ; 0Ah ; functions will access the
FDC8:411F 56 PUSH SI ; 08h ; caller's original registers
FDC8:4120 52 PUSH DX ; 06h ; by treating this stack frame
FDC8:4121 51 PUSH CX ; 04h ; as a structure. See 2f/1218.
FDC8:4122 53 PUSH BX ; 02h ; For example, caller's BX
FDC8:4123 50 PUSH AX ; 00h ; is at offset 2, ES at 10h.
; step 3
FDC8:4124 8CD8 MOV AX,DS
FDC8:4126 2E8E1EE73D MOV DS,CS:[3DE7] ; get DOS DS
FDC8:412B A3EC05 MOV [05EC],AX ; save caller's DS
FDC8:412E 891EEA05 MOV [05EA],BX ; save caller's BX
FDC8:4132 A18405 MOV AX,[0584] ; SDA+264h = ptr to stack frame
FDC8:4135 A3F205 MOV [05F2],AX ; containing user registers
FDC8:4138 A18605 MOV AX,[0586] ; on INT 21h
FDC8:413B A3F005 MOV [05F0],AX
; step 4
FDC8:413E 33C0 XOR AX,AX ; set AX=0
FDC8:4140 A27205 MOV [0572],AL
FDC8:4143 F606301001 TEST Byte Ptr [1030],01 ; Is Win3 Enh running?
FDC8:4148 7503 JNZ 414D
; following line only if Windows 3 Enhanced mode not running!
FDC8:414A A33E03 MOV [033E],AX ; set machine ID to zero
; step 5
FDC8:414D FE062103 INC Byte Ptr [0321] ; increment InDOS flag
; step 6
FDC8:4151 89268405 MOV [0584],SP ; SDA+264h
FDC8:4155 8C168605 MOV [0586],SS ; save current stack ptr
FDC8:4159 A13003 MOV AX,[0330] ; get current PSP
FDC8:415C A33C03 MOV [033C],AX ; SDA+1Ch = SHARE, NET PSP
FDC8:415F 8ED8 MOV DS,AX ; point DS at caller's PSP
FDC8:4161 58 POP AX
FDC8:4162 50 PUSH AX ; get back caller's AX
FDC8:4163 89262E00 MOV [002E],SP ; save current stack ptr
FDC8:4167 8C163000 MOV [0030],SS ; in caller's PSP
FDC8:416B 2E8E16E73D MOV SS,CS:[3DE7]
; INT 21h AX=5D00h (Server Function Call) jumps to here
; switch stack to 07A0h-SDA =
FDC8:4170 BCA007 MOV SP,07A0 ; SDA+480h=end of Crit Err Stk
; step 7
FDC8:4173 FB STI ; reenable interrupts
FDC8:4174 8CD3 MOV BX,SS
FDC8:4176 8EDB MOV DS,BX ; point DS at DOS_DS
FDC8:4178 93 XCHG AX,BX ; caller's AX into BX
FDC8:4179 33C0 XOR AX,AX
FDC8:417B 36A2F605 MOV SS:[05F6],AL ; extended open off?
FDC8:417F 36812611060008 AND Word Ptr SS:[0611],0800
FDC8:4186 36A25703 MOV SS:[0357],AL ; set different vars to 0
FDC8:418A 36A24C03 MOV SS:[034C],AL
FDC8:418E 36A24A03 MOV SS:[034A],AL
FDC8:4192 40 INC AX
FDC8:4193 36A25803 MOV SS:[0358],AL ; okay to do INT 28h
; step 8
FDC8:4197 93 XCHG AX,BX ; get back caller's AX
FDC8:4198 8ADC MOV BL,AH ; DOS func num into BL
FDC8:419A D1E3 SHL BX,1 ; make DOS func number into word ofs
; step 9
FDC8:419C FC CLD
FDC8:419D 0AE4 OR AH,AH
FDC8:419F 7417 JZ 41B8 ; AH=0 (terminate process)
FDC8:41A1 80FC59 CMP AH,59
; if 21/59 (get critical error), bypass code that turns off critical error!
FDC8:41A4 7444 JZ 41EA ; AH=59h (get extended error)
FDC8:41A6 80FC0C CMP AH,0C
FDC8:41A9 770D JA 41B8 ; AH > 0Ch
INT21_01_THRU_0C:
; step 10
FDC8:41AB 36803E200300 CMP Byte Ptr SS:[0320],00 ; critical error set?
FDC8:41B1 7537 JNZ 41EA ; if so, stay with crit error stack
FDC8:41B3 BCA00A MOV SP,0AA0 ; SDA+780h=end of Char I/O Stack
FDC8:41B6 EB32 JMP 41EA
INT21_00:
INT21_ABOVE_0C: ;;; except (normally) 33h, 50h, 51h, 59h, 62h, 64h
; step 11
FDC8:41B8 36A33A03 MOV SS:[033A],AX
FDC8:41BC 36C606230301 MOV Byte Ptr SS:[0323],01 ; crit err locus
FDC8:41C2 36C606200300 MOV Byte Ptr SS:[0320],00 ; turn off crit error
FDC8:41C8 36C6062203FF MOV Byte Ptr SS:[0322],FF ; crit err drive#
; Windows Enhanced mode patches next four lines into a far call!
FDC8:41CE 50 PUSH AX
FDC8:41CF B482 MOV AH,82
FDC8:41D1 CD2A INT 2A ; End crit section
FDC8:41D3 58 POP AX
FDC8:41D4 36C606580300 MOV Byte Ptr SS:[0358],00 ; no INT 28h
FDC8:41DA BC2009 MOV SP,0920 ; SDA+600h = end of Disk Stack
FDC8:41DD 36F6063703FF TEST Byte Ptr SS:[0337],FF ; SDA+17h=break flag
FDC8:41E3 7405 JZ 41EA
FDC8:41E5 50 PUSH AX ; BREAK=ON, so
FDC8:41E6 E8964E CALL 907F ; check ctrl-break
FDC8:41E9 58 POP AX
; step 12
;;; next four lines are the key; call through dispatch table
;;; BX holds caller's INT 21h function number SHL 1 (word offset)
FDC8:41EA 2E8B9F9E3E MOV BX,CS:[BX+3E9E] ; get func handler addr
FDC8:41EF 36871EEA05 XCHG BX,SS:[05EA] ; move func ptr into var
FDC8:41F4 368E1EEC05 MOV DS,SS:[05EC] ; switch to caller's saved DS
FDC8:41F9 36FF16EA05 CALL SS:[05EA] ; call func handler addr!
;;; we've just called the DOS function for the specific DOS function in AH
; step 13
;;; now into cleanup preparatory to returning to caller
FDC8:41FE 3680268600FB AND Byte Ptr SS:[0086],FB
FDC8:4204 FA CLI
FDC8:4205 2E8E1EE73D MOV DS,CS:[3DE7] ; switch back to DOS DS
FDC8:420A 803E850000 CMP Byte Ptr [0085],00
FDC8:420F 7527 JNZ 4238
FDC8:4211 FE0E2103 DEC Byte Ptr [0321] ; decrement InDOS
FDC8:4215 8E168605 MOV SS,[0586] ; switch back to caller's
FDC8:4219 8B268405 MOV SP,[0584] ; stack
FDC8:421D 8BEC MOV BP,SP
FDC8:421F 884600 MOV [BP+00],AL
FDC8:4222 A1F205 MOV AX,[05F2]
FDC8:4225 A38405 MOV [0584],AX ; caller's SP
FDC8:4228 A1F005 MOV AX,[05F0]
FDC8:422B A38605 MOV [0586],AX ; caller's SS
FDC8:422E 58 POP AX ; put back caller's
FDC8:422F 5B POP BX ; registers, including
FDC8:4230 59 POP CX ; any changes the DOS
FDC8:4231 5A POP DX ; function made to them
FDC8:4232 5E POP SI
FDC8:4233 5F POP DI
FDC8:4234 5D POP BP
FDC8:4235 1F POP DS
FDC8:4236 07 POP ES
FDC8:4237 CF IRET
The dispatch function in figure 6-7 is the heart of DOS. It is executed every time a program issues an INT 21h call. The dispatch function is the DOS equivalent of the function syscall() in UNIX, which has been examined in books such as Bach's Design of the UNIX Operating System (pp. 165-168) and Andleigh's UNIX System Architecture (pp. 21-23). The discussions of syscall() in these and other UNIX books provide useful background for understanding the INT 21h dispatch code. However, in UNIX there is a clear separation between applications and the operating system. The discussions of syscall() emphasize the transition from user mode to kernel mode. As you can see, there is nothing like this in DOS, though DOS extenders such as Windows do maintain a separation between the application running in protected mode and DOS running in real mode. Actually, there is one important separation. DOS usually switches from the application's stack to one of its own. This important aspect of DOS will be discussed in detail below.
Near the top of the function (commented "step 1"), you see how DOS picks off a handful of special functions (33h, 64h, 51h, 62h, and 50h). These of course are none other than what we've been calling the reentrant DOS functions. Here, reentrancy simply means that, while the above code is executing (after it has passed the initial CLI, and before it has executed the closing IRET), it could be interrupted by an interrupt handler, and the interrupt handler could call one of these five functions. These five functions are reentrant simply in the sense that DOS handles them before switching stacks and incrementing the InDOS flag. Thus, an interrupt handler can call these functions, even if the InDOS or critical error flag is set.
In a larger sense, of course, these functions aren't really reentrant, given the way that, for example, the Set PSP function writes to a global variable (see figure 6-4). MS-DOS's extensive reliance on global variables makes it completely non-reentrant. Furthermore, if DOS=HIGH and the A20 line is off, DOS, as figure 6-6 showed, has to switch stacks. But in any case, it should now be clear why we picked INT 21h AH=62h to trace with DEBUG and not, say, INT 21h AH=52h; DOS handles the latter function only after switching stacks.
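The chain of compares in step 1 can be restated as a small decision function. This is only a sketch in portable C that mirrors the comparisons in figure 6-7 for this particular DOS configuration; the enum and function names are hypothetical and do not come from the DOS source.

```c
#include <assert.h>

enum dispatch_path {
    PATH_INVALID,    /* AH > 6Ch: rejected before anything else happens    */
    PATH_ON_CALLER,  /* 33h, 50h, 51h, 62h, 64h: run on the caller's stack */
    PATH_NORMAL      /* everything else: full dispatch (steps 2 to 13)     */
};

/* Mirrors the CMP/JB/JZ/JA sequence at FDC8:40F9 through FDC8:4119. */
enum dispatch_path classify_int21(unsigned ah)
{
    assert(ah <= 0xFF);                     /* AH is an 8-bit register */

    if (ah > 0x6C)  return PATH_INVALID;    /* CMP AH,6C / JA */
    if (ah < 0x33)  return PATH_NORMAL;     /* CMP AH,33 / JB */
    if (ah == 0x33) return PATH_ON_CALLER;  /*             JZ */
    if (ah > 0x64)  return PATH_NORMAL;     /* CMP AH,64 / JA */
    if (ah == 0x64) return PATH_ON_CALLER;  /*             JZ */
    if (ah == 0x51 || ah == 0x62 || ah == 0x50)
        return PATH_ON_CALLER;              /* Get PSP, Get PSP, Set PSP */
    return PATH_NORMAL;                     /* falls into normal_DOS */
}
```

Calling classify_int21(0x62) returns PATH_ON_CALLER, while classify_int21(0x52) returns PATH_NORMAL, which is the difference between a function that is easy to trace with DEBUG and one that is not.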
Next (step 2), the INT 21h dispatch code pushes the caller's registers on to the caller's stack. The caller is of course simply whatever program issued the INT 21h call. This can be slightly disorienting because, of course, we're used to thinking about INT 21h from the caller's perspective and now we're looking at it from DOS's point of view. These pushed registers form a structure that many DOS functions use later on. Undocumented INT 2Fh AX=1218h (Get Caller's Registers; see appendix) returns a pointer to this structure.
At step 3 in figure 6-7, DOS saves away the caller's DS and BX again, and switches from the caller's DS to its own DS. DOS keeps DS in a variable accessible through DOS's CS. It is also available by calling INT 2Fh AX=1203h (see get_dos_ds() in listing 6-2). Note that, even though DOS=HIGH and the DOS code is in the HMA, the data segment is still in low memory. This is necessary because many existing DOS programs rely on the ability to reach DOS internal data structures and wouldn't know to check the status of the A20 line. Microsoft has to introduce improvements such as DOS=HIGH without breaking existing applications.
The next interesting thing the code does (step 4) is check a variable at 1030h to see whether Windows 3.x Enhanced mode (or Windows/386 2.x) is running. Since most of us think of Windows as something that runs "on top of" DOS, it is a bit disconcerting at first to learn that DOS 5.0 and higher knows about Windows. As discussed in chapter 1, however, this part at least of the intricate DOS/Windows connection is implemented using documented functionality. In its INT 2Fh handler, MSDOS.SYS monitors the AX=1605h Windows initialization and AX=1606h exit broadcasts; the code for AX=1605h sets the variable at 1030h (actually, just the byte at 1031h), and the code for AX=1606h clears it. This variable thus serves as a kind of InWindows flag. It's important to underline that this is for Enhanced mode only; DOS doesn't care one way or the other about Standard mode.
If Windows Enhanced mode is not running, then DOS zeroes out a variable at 033Eh (SDA+1Eh), used by DOS as the machine ID. If Windows Enhanced mode is running, the DOSMGR VxD (as explained in chapter 1) has smacked a virtual machine ID in here. DOS uses this VM ID to manage SFTs.
Next (step 5), the code increments the InDOS flag, which is simply a variable at 0321h (SDA+1) in the DOS data segment. The until-recently-undocumented function INT 21h AH=34h (Get InDOS Flag Address) returns a pointer to this variable.
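The "SDA+n" annotations in figure 6-7 all follow from one subtraction. The sketch below assumes that the Swappable Data Area begins at offset 0320h in the DOS data segment, which is inferred from the listing's own comments (InDOS at 0321h is SDA+1) and holds only for this configuration; the names SDA_BASE and sda_offset are ours.

```c
#include <assert.h>
#include <stdint.h>

#define SDA_BASE 0x0320u   /* assumed: start of the SDA in this DOS data segment */

/* Convert an offset in the DOS data segment to an offset within the SDA. */
uint16_t sda_offset(uint16_t dos_ds_ofs)
{
    assert(dos_ds_ofs >= SDA_BASE);   /* below the SDA: not a swappable variable */
    return (uint16_t)(dos_ds_ofs - SDA_BASE);
}
```

With this, sda_offset(0x0584) gives 264h (the saved pointer to the caller's register frame) and sda_offset(0x07A0) gives 480h (the end of the Critical Error stack), matching the comments in the listing.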
The InDOS flag has been set, so we're now "in DOS"! Of course, we were in DOS before, but the significance of this spot is that DOS is about to switch stacks. Switching stacks requires a guard or semaphore, namely the InDOS flag. Notice, however, that while DOS increments the InDOS flag, it does not check it before proceeding. Thus, InDOS is not a true semaphore. If the processor is interrupted in the middle of this code (or, rather, a little further on when DOS reenables interrupts with an STI instruction), the code can be reentered.
In other words, DOS does nothing to enforce its requirement that only one caller at a time execute inside the INT 21h code. Obeying the InDOS flag is merely a convention. But it is vital that programs do observe this convention, because making an INT 21h call when InDOS is set will almost always cause problems. For one thing, DOS relies on many global variables. If, for example, DOS were working with a particular hard-disk cluster to service an INT 21h file I/O call, and an interrupt handler that ignored the InDOS flag made a file I/O call to DOS before DOS had finished with the first one, DOS would mistakenly use the second caller's cluster to satisfy (not!) the first caller's request. Global variables do not work like a last-in/first-out stack. It is vital that interrupt handlers check InDOS before issuing INT 21h calls. (So why did it take Microsoft so long to document InDOS and the INT 21h AH=34h function that returns a pointer to it?)
Ignoring InDOS can cause another problem. Because the code at step 5 in figure 6-7 increments InDOS, reentering DOS means that InDOS will take on a value of two or greater. This is bad, because the internal DOS function that checks for Ctrl-C only does so when CMP Byte Ptr IN_DOS, 01. Thus, if InDOS is 2 or greater, DOS won't check Ctrl-C, even if BREAK=ON.
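The behavior of InDOS as a plain counter, rather than a semaphore, can be modeled in a few lines. This is a simplified sketch in portable C, not DOS code: the structure and function names are hypothetical, and the Ctrl-C test models only the BREAK=ON path through step 11.

```c
#include <assert.h>

struct dos_model {
    unsigned char in_dos;      /* models the InDOS byte at SDA+1       */
    unsigned char break_flag;  /* models the Ctrl-C Check flag, SDA+17h */
};

/* Step 5: InDOS is incremented unconditionally; nothing tests it first. */
void dos_enter(struct dos_model *d)
{
    d->in_dos++;
}

/* Step 13: InDOS is decremented on the way back to the caller. */
void dos_leave(struct dos_model *d)
{
    assert(d->in_dos > 0);     /* leaving without entering is a model bug */
    d->in_dos--;
}

/* On the step 11 path, Ctrl-C is looked at only when BREAK=ON and
   InDOS is exactly 1, that is, when DOS has not been reentered. */
int ctrl_c_checked(const struct dos_model *d)
{
    return d->break_flag != 0 && d->in_dos == 1;
}
```

After one dos_enter(), ctrl_c_checked() is true when BREAK=ON; after a second, nested dos_enter() it becomes false, and stays false until the inner call has left.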
There is a method by which DOS can be safely reentered: if the entire DOS state (including all three DOS stacks) is saved and restored by each caller, and if each such caller observes the DOS critical sections by hooking INT 2Ah. The SDA TSR technique put forward in chapter 9 is an approximation to this method, though only an approximation because the SDA does not include the entire DOS state.
Returning to step 6 in figure 6-7, you can see the beginnings of the stack switching code. How does DOS switch away from the user's stack to one of its own? First, it saves away the caller's current SS:SP. Next, DOS gets the current PSP (at 0330h, or SDA+10h) and uses it to save the caller's SS:SP at offset 2Eh in the caller's PSP. Finally, it sets SS:SP to a DOS stack. Depending on the DOS function number, it may switch again to a different DOS stack; see below.
What is the purpose of stack switching? Why not just use the caller's stack? Wouldn't that make DOS much more reentrant? Yes, it would. As it is, making an INT 21h call already uses 18h bytes on the caller's stack (see table 6-2). If the caller could be relied upon to provide a large enough stack, DOS could even be multithreaded. Unfortunately, DOS has to accommodate programs with unknown stack sizes. This complicates DOS tremendously and helps make it non-reentrant.
At the very end of step 6, where DOS points SP at the Critical Error stack, is a location (called Redisp in the source code) to which undocumented INT 21h AX=5D00h (Server Function Call) jumps. This function is a backdoor into the INT 21h dispatcher. If a network-aware program hooks this call, it can be used by one machine to do remote INT 21h calls on another machine (or perhaps to another Windows virtual machine).
Skipping over a bunch of the code in step 7, which zeroes out several variables in the DOS data segment, we come to step 8, where the code takes the caller's AH (with the crucial DOS function number) and turns it into a word offset in BX. This will be important later on.
Next (step 9), DOS examines the function number in AH. If AH=59h (Get Extended Error) is being called, DOS proceeds directly to step 12, where the code for function 59h will be called. It stays on the Critical Error Stack, bypassing more stack-switching code in steps 10 and 11, and bypassing code that would obliterate information pertaining to any pending Critical Error.
If one of the CP/M-based character I/O functions (INT 21h AH=1 through AH=0Ch) is being called, DOS (step 10) points SP at 0AA0h, which is the top of the character I/O stack, located in the Swappable Data Area (see appendix). However, if there is a pending critical error, DOS stays with the Critical Error stack that was set initially. This is not surprising, since Microsoft documents these functions (MS-DOS Programmer's Reference) as callable from a critical error handler. Notice that DOS does not turn off critical error information for functions 1 through 0Ch. As you can see, much of the core DOS code accommodates critical errors.
Finally, if the DOS function number is 0 (Terminate Program), or anything greater than 0Ch, but not 59h, and not one of the special functions that were picked off earlier in step 1 and which DOS already processed on the caller's stack, DOS (step 11) switches to the disk stack. Thus, there are three DOS stacks:
Character I/O, used for functions 1 through 0Ch when no critical error is pending.
Disk, used for function 0 and for functions greater than 0Ch (other than 59h and the special functions picked off in step 1).
Critical Error (or auxiliary), used for function 59h and for functions 1 through 0Ch when a critical error is pending, and used temporarily for any DOS function call if DOS=HIGH but A20 is off.
For the majority of functions running on the disk stack, the code (step 11) carries out a number of tasks: turning off critical error, calling undocumented INT 2Ah AH=82h to end any critical sections (see below), and checking the Ctrl-C Check flag at SDA+17h. In figure 6-5, you saw the code for INT 21h AX=3301h that sets this flag when a user types BREAK=ON. Now you can see where DOS actually uses this flag. If BREAK=ON (that is, if the flag at SDA+17h is non-zero), DOS calls a subroutine (here located at 907Fh) to check Ctrl-C for the functions that come through here. Otherwise, DOS (elsewhere) only checks Ctrl-C for functions 1 through 0Ch. As noted earlier, the DOS internal function to check Ctrl-C will only do so if IN_DOS == 1.
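The stack selection carried out in steps 9 through 11 can be condensed into one function. The following portable C sketch uses the stack-top offsets from figure 6-7, which are specific to this configuration; the macro and function names are our own. It deliberately ignores the functions picked off in step 1, which normally never reach this code.

```c
#include <assert.h>

#define CRIT_ERR_STACK_TOP 0x07A0u  /* SDA+480h, set for every call in step 6 */
#define CHAR_IO_STACK_TOP  0x0AA0u  /* SDA+780h */
#define DISK_STACK_TOP     0x0920u  /* SDA+600h */

/* Returns the value SP holds when the function handler is finally called. */
unsigned choose_dos_stack(unsigned ah, int crit_err_pending)
{
    assert(ah <= 0x6C);               /* larger values were rejected in step 1 */

    if (ah == 0x00)                   /* OR AH,AH / JZ: terminate, step 11 */
        return DISK_STACK_TOP;
    if (ah == 0x59)                   /* Get Extended Error: straight to step 12 */
        return CRIT_ERR_STACK_TOP;
    if (ah > 0x0C)                    /* CMP AH,0C / JA: step 11 */
        return DISK_STACK_TOP;

    /* Functions 1 through 0Ch (step 10). */
    return crit_err_pending ? CRIT_ERR_STACK_TOP : CHAR_IO_STACK_TOP;
}
```

So a Read File call (AH=3Fh) always runs on the disk stack, while Display String (AH=09h) runs on the character I/O stack unless a critical error is pending, in which case it stays on the Critical Error stack.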
What is this call to INT 2Ah AH=82h? Normally, the INT 2Ah handler in DOS does an immediate IRET, performing no operation. However, other programs can take over INT 2Ah and/or patch DOS. Windows Enhanced mode, in particular, uses INT 2Ah critical sections because it runs preemptively multitasked DOS boxes on top of a single copy of MS-DOS. Because the InDOS flag is instanced per VM (that is, each DOS box gets its own copy), it cannot be used to control access to DOS by different VMs. Nor would you want the InDOS flag to do that, as different VMs could be in different parts of DOS at the same time. What different parts? Different critical sections can be set and cleared with INT 2Ah AH=80h and 81h (see appendix). DOS's call to INT 2Ah AH=82h is a signal that a multitasking extension to DOS, such as Windows or networking software, can restart any task (VM) that was suspended because it was waiting for a critical section. For additional information on critical sections, see chapter 1 and chapter 9 (see CRITSECT.C in listing 9-XX). Also see Microsoft's MS-DOS 6 Technical Reference (p. 41), which briefly discusses critical sections in the context of the MRCI specification.
As discussed later in this chapter, the DOSMGR VxD in Windows Enhanced mode patches this INT 2Ah AH=82h in the INT 21h dispatch, turning it into a far call into Windows. When Windows exits, of course it (hopefully) puts back the original code.
With all this talk of critical errors, Ctrl-Break, and critical sections—which do dominate the DOS dispatch code—it is important not to lose sight of the main goal, which is that a program wants to call a DOS function! As is typical of software, DOS accomplishes this main goal in only a few lines, while rarer situations such as critical errors and so on occupy the bulk of the code.
Having switched to an appropriate stack, saved the caller's registers, and so on, step 12 in figure 6-7 is the simplest and the most important. Recall that step 8 moved the function number in AH into BX and multiplied by two. DOS will now use this value as an offset into an array of function pointers, one for each DOS function. Here, the table is at CS:3E9E, so that for example the address for DOS function 0 is at CS:3E9E, function 1's address is at CS:3EA0, function 2's address is at CS:3EA2, and so on. Since this array holds two-byte words, not far pointers, you can't use it to hook individual DOS functions. All handlers must be located in a single segment (here, FDC8h). We come back to this array of function pointers momentarily; it is very important to us. In any case, having retrieved the address of the handler for the DOS function being called, DOS calls the handler. Ta da! The function the user wanted has now been called.
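The index arithmetic of steps 8 and 12 is worth spelling out. The sketch below, in portable C, computes where in DOS's code segment the table slot for a given function number lives. The table base 3E9Eh is taken from figure 6-7 and applies only to this configuration; the names DISPATCH_BASE and dispatch_slot are ours.

```c
#include <assert.h>
#include <stdint.h>

#define DISPATCH_BASE 0x3E9Eu   /* offset of the table in this DOS code segment */

/* Offset of the two-byte table slot holding the near address of the
   handler for INT 21h function AH: base + (AH SHL 1). */
uint16_t dispatch_slot(uint8_t ah)
{
    assert(ah <= 0x6C);         /* the table has 6Dh entries, 0 through 6Ch */
    return (uint16_t)(DISPATCH_BASE + ((unsigned)ah << 1));
}
```

For function 62h this gives 3E9Eh + C4h = 3F62h; the word stored at CS:3F62 is then the near address that step 12 calls.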
In step 13, having invoked the appropriate handler for the DOS function, DOS decrements the InDOS flag, switches back to the caller's stack, and pops back the caller's registers from the register image created on the stack back in step 2. As you'll see in a moment, the handler for the specific DOS function probably modifies the register image. Finally, DOS returns to the caller with an IRET. Since IRET pops the flags off the stack, the specific DOS functions have to set or clear the carry flag by modifying its image that the processor pushed on the stack as part of the initial INT (see the comment to step 2).
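Because IRET reloads the flags from the stack, a handler that wants to report an error must change the saved copy of the flags, not the live flags register. The following portable C sketch models that idea on a plain array of words. The offset 16h follows from step 2 (nine pushed registers occupy 12h bytes, followed by the IP, CS, and flags that the INT instruction pushed); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_FLAGS_OFS 0x16u    /* byte offset of the flags image in the frame */
#define CARRY_FLAG      0x0001u  /* bit 0 of the 8086 flags word */

/* frame points at the 12 words that begin with the caller's saved AX.
   Sets or clears the carry bit in the saved flags image only. */
void set_carry_image(uint16_t *frame, int error)
{
    assert(frame != 0);

    if (error)
        frame[FRAME_FLAGS_OFS / 2] |= CARRY_FLAG;
    else
        frame[FRAME_FLAGS_OFS / 2] &= (uint16_t)~CARRY_FLAG;
}
```

When the dispatcher later executes its IRET, the modified flags word is what the caller sees, so a subsequent JC in the application tests the handler's verdict.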
Seeing the DOS dispatch code in figure 6-7, it should now be clear why a DEBUG trace through an INT 21h AH=62h call works, but tracing for example through a call to INT 21h AH=52h wouldn't. A call to AH=52h would involve switching stacks, mucking with global variables in the DOS data segment, and so on. DEBUG itself uses DOS, so you would end up instead tracing through one of the DOS calls that DEBUG would be making to display our information. A complete mess! One alternative of course is to use a debugger that bypasses DOS, such as Soft-ICE (or SERMON from the first edition of Undocumented DOS which, however, did not support tracing through INT instructions).
However, we really don't need to trace through INT 21h any more. We now have the address of COMMAND (The One True INT 21h Handler) and the address of the function pointer array (called Dispatch in the DOS source code) and can thus unassemble at leisure, rather than trace under pressure, so to speak.
To find the code that handles each specific DOS function, you need do nothing more than dump out the Dispatch table, which you can see from step 12 in figure 6-7 is located at FDC8:3E9E. This table of two-byte words is conveniently dumped with SYMDEB's dw command:
-dw fdc8:3e9e
FDC8:3E9E A1F6 54E0 54E9 559F 55BC 55C2 541C 544B
FDC8:3EAE 51BA 5214 5220 55D6 55E0 4DA1 4C78 5CCC
FDC8:3EBE 5688 5DDF 5E73 5625 5DCB 5DD0 5DB1 56F9
FDC8:3ECE 440D 4C73 4C68 4D2D 4D2F 440D 440D 4D71
FDC8:3EDE 440D 5DD5 5DDA 5639 560D 4C9A 4EB6 5DC6
FDC8:3EEE 5DC1 4D22 4839 4856 4876 4887 4A46 4C54
FDC8:3EFE 4A1C A19A 4D73 4052 4D59 4C8A 4C2B 4CC9
FDC8:3F0E 4A4D 60E1 6029 6065 AFE6 AF0F A72A A839
; ... etc. ...
You can double-check that all is in order by looking for a known function. Let's see what the table shows for function 62h (although we know it usually gets picked off in step 1 of figure 6-7, only coming through this table in the unlikely event of an INT 21h AX=5D00h indirect function call of AH=62h):
-dw fdc8:(3e9e+62*2)
FDC8:3F62 40B5 ......
-u fdc8:40b5
FDC8:40B5 1E PUSH DS
FDC8:40B6 2E8E1EE73D MOV DS,CS:[3DE7]
FDC8:40BB 8B1E3003 MOV BX,[0330]
FDC8:40BF 1F POP DS
FDC8:40C0 CF IRET
That's it! So you now have the DOS dispatch table and can examine at will the code for any DOS function you're interested in.
However, examination of this table and others like it is made easier with a short C program, FTAB.C, shown in listing 6-4. FTAB can display tables of bytes (1), words (2), or dwords (4).
/* FTAB.C */
/* Built with a 16-bit DOS compiler: uses far pointers and MK_FP,
   and assumes int and WORD are both 16 bits wide. */
#include <stdlib.h>
#include <stdio.h>
#include <dos.h>

typedef unsigned char BYTE;
typedef unsigned short WORD;
typedef unsigned long DWORD;

void fail(const char *s) { puts(s); exit(1); }

int main(int argc, char *argv[])
{
    char *prefix;
    void far *tab;
    BYTE far *btab;
    WORD far *wtab;
    DWORD far *dtab;
    WORD seg, ofs;
    int num_func, size, i;

    /* seg:ofs and num_func are required; prefix and size are optional */
    if (argc < 3)
        fail("usage: ftab <seg:ofs> <num_func | ?> [prefix] [size]");

    sscanf(argv[1], "%04X:%04X", &seg, &ofs);
    tab = (void far *) MK_FP(seg, ofs);

    if (argv[2][0] == '?')
    {
        num_func = *((BYTE far *) tab);   /* first BYTE is #func */
        tab = ((BYTE far *) tab + 1);     /* then array of WORDs */
    }
    else
        sscanf(argv[2], "%04X", &num_func);

    prefix = (argc > 3) ? argv[3] : "func";
    size = (argc > 4) ? atoi(argv[4]) : 2;   /* default to WORD table */

    switch (size)
    {
        case 1:   /* table of bytes */
            for (i=0, btab=(BYTE far *)tab; i < num_func; i++, btab++)
                printf("%02X\t%s_%02X\n",
                    *btab, prefix, i);
            break;
        case 2:   /* table of near pointers, all in segment seg */
            for (i=0, wtab=(WORD far *)tab; i < num_func; i++, wtab++)
                printf("%04X:%04X\t%s_%02X\n",
                    seg, *wtab, prefix, i);
            break;
        case 4:   /* table of far pointers */
            for (i=0, dtab=(DWORD far *)tab; i < num_func; i++, dtab++)
                printf("%Fp\t%s_%02X\n",
                    *dtab, prefix, i);
            break;
        default:
            fail("size only 1 (byte), 2 (word), 4 (dword)");
    }
    return 0;
}
To generate a list of the 6Dh (0 through 6Ch) different DOS INT 21h function handlers ("72h, not 6Ch, is the highest function number in the DOS 7.0 component of Chicago," says one tech reviewer), run FTAB on the table at FDC8:3E9E. Figure 6-8 shows sample output from FTAB.
C:\UNDOC2\CHAP6>ftab fdc8:3e9e 6D int21 2
FDC8:A1F6 int21_00
FDC8:54E0 int21_01
FDC8:54E9 int21_02
FDC8:559F int21_03
; ...
FDC8:4C9A int21_25
; ...
FDC8:4D59 int21_34
FDC8:4C8A int21_35
; ...
FDC8:AF0F int21_3D
FDC8:A72A int21_3E
FDC8:A839 int21_3F
FDC8:A89F int21_40
FDC8:B038 int21_41
FDC8:A8A4 int21_42
; ...
FDC8:40A9 int21_50
FDC8:40B5 int21_51
FDC8:4D65 int21_52
FDC8:4DD6 int21_53
FDC8:4A41 int21_54
FDC8:4EA5 int21_55
FDC8:B05E int21_56
FDC8:A90C int21_57
FDC8:A448 int21_58
FDC8:4CDD int21_59
FDC8:B0E9 int21_5A
FDC8:B0D1 int21_5B
FDC8:B2D8 int21_5C
FDC8:A531 int21_5D
FDC8:AA49 int21_5E
FDC8:A9AA int21_5F
FDC8:AEA8 int21_60
FDC8:440D int21_61
FDC8:40B5 int21_62
; ... etc. ...
Confirming that this table is correct, you can see that int21_51 and int21_62 are located at the same address (FDC8:40B5), as they should be.
To check that the FTAB output in figure 6-8 is really correct, examine another function that should be simple, INT 21h AH=52h, which returns a far pointer in ES:BX to SysVars. According to the FTAB output, the code to handle function 52h should be at FDC8:4D65, so you can use SYMDEB or DEBUG to unassemble at that address. Figure 6-9 shows the results.
-u fdc8:4d65
FDC8:4D65 E81AF5 CALL 4282
FDC8:4D68 C744022600 MOV Word Ptr [SI+02],0026
FDC8:4D6D 8C5410 MOV [SI+10],SS
FDC8:4D70 C3 RET
In fact, calling Get SysVars in this particular configuration does return 0116:0026, so the hardwired 0026h above does look correct. But what is going on here?! How come we don't see SS:0026 being moved into ES:BX? What are [SI+02] and [SI+10h]?
To answer these questions, let's examine the subroutine being called at 4282h:
-u fdc8:4282
FDC8:4282 2E8E1EE73D MOV DS,CS:[3DE7]
FDC8:4287 C5368405 LDS SI,[0584]
FDC8:428B C3 RET
CS:3DE7h is just our old friend, the DOS data segment, whose value DOS keeps in its code segment. (DOS stores the value of DS in its code segment because, when an INT occurs, DS isn't known, but CS is.) So this subroutine is setting itself up to use DOS's DS, just as the code did back in figures 6-3, 6-4, and 6-7 for Get PSP, Set PSP, and the INT 21h dispatch.
The subroutine then loads DS:SI with something at DOS:584h. In step 6 of the INT 21h dispatch code in figure 6-7, you saw DOS set the dword at DOS:584h to the caller's SS:SP. In other words, DOS:584h contains a pointer to the caller's stack, with all the registers that were pushed upon it during step 2 of figure 6-7 (and earlier, as part of the actual INT instruction). Sure enough, the comments to step 3 point out that DOS:584h in this configuration happens to be SDA+264h, which the appendix identifies as "a pointer to the stack frame containing the user registers on entry to the INT 21h call."
So the subroutine at FDC8:4282 loads DS:SI with a pointer to the caller's pushed register structure. Given the order in which step 2 pushes registers, it won't surprise you to learn that the client register structure has the format shown below in table 6-2.
00h AX
02h BX
04h CX
06h DX
08h SI
0Ah DI
0Ch BP
0Eh DS
10h ES
12h IP
14h CS
16h flags
In figure 6-9, the code for function 52h at FDC8:4D65 moves 26h into [SI+2] and DOS's SS (DS) into [SI+10h]. DS:SI points at the caller's register structure, where offset 2 is BX and offset 10h is ES. Thus, the code is actually setting an image of the caller's ES:BX to DOS_DS:0026. The register image gets popped into the actual CPU registers in the series of POPs at the end (step 13) of the INT 21h dispatch in figure 6-7. So, this is how INT 21h function 52h returns SysVars in ES:BX. (If you want to see how DOS creates SysVars in the first place, you need to disassemble the DOS initialization code on disk.)
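The register frame of table 6-2 maps naturally onto a C structure, and the handler for function 52h then becomes two assignments. This is a sketch in portable C for illustration only: the structure and function names are our own, and the value 0026h is the SysVars offset observed in this particular configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Caller's register frame, laid out as in table 6-2. */
struct caller_regs {
    uint16_t ax, bx, cx, dx, si, di, bp, ds, es;  /* pushed in step 2   */
    uint16_t ip, cs, flags;                       /* pushed by the INT  */
};

/* Model of the function 52h handler: it patches the saved ES:BX image
   rather than loading the live CPU registers. */
void model_int21_52(struct caller_regs *frame, uint16_t dos_ds)
{
    /* The handler's [SI+02] and [SI+10] must land on BX and ES. */
    assert(offsetof(struct caller_regs, bx) == 0x02);
    assert(offsetof(struct caller_regs, es) == 0x10);

    frame->bx = 0x0026;   /* MOV Word Ptr [SI+02],0026 */
    frame->es = dos_ds;   /* MOV [SI+10],SS            */
}
```

The whole structure is 18h bytes long, which is the amount of caller stack that any INT 21h call consumes before DOS switches to a stack of its own.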
The reader may have noted from the appendix that there is an internal DOS function, INT 2Fh AX=1218h, to get the caller's register structure; it returns a pointer to the structure in DS:SI. This sounds a lot like the subfunction you viewed above at FDC8:4282. In fact, they are one and the same function. DOS calls this subroutine through a near function pointer rather than through an INT 2Fh. You'll see in a few moments that, in a table of INT 2Fh AH=12h subfunctions, 4282h duly appears as the handler for subfunction 18h.
Next, let's look at a more interesting function. From figure 6-8, the code for INT 21h AH=3Fh (Read File) is supposed to be located at FDC8:A839. The code for this function is too extensive to examine much of it here, so let's just look at the first two lines:
-u FDC8:A839
FDC8:A839 BEFD71 MOV SI,71FD ; offset of internal Read func
FDC8:A83C E82DFE CALL A66C ; see below
; ...
You know that function 3Fh expects a file handle in BX; you know furthermore that file handles are associated with the current PSP. Examining the subroutine called at A66Ch shows how DOS uses the passed-in file handle:
-u FDC8:A66C
FDC8:A66C 2E8E06E73D MOV ES,CS:[3DE7] ; get DOS DS
FDC8:A671 268E063003 MOV ES,ES:[0330] ; ES <- current PSP
FDC8:A676 263B1E3200 CMP BX,ES:[0032] ; PSP[32h] = # max open files
FDC8:A67B 7204 JB A681
FDC8:A67D B006 MOV AL,06 ; 6 = invalid handle error
FDC8:A67F F9 STC ; set carry flag
FDC8:A680 C3 RET
FDC8:A681 26C43E3400 LES DI,ES:[0034] ; PSP[34h] -> file handle tbl
FDC8:A686 03FB ADD DI,BX ; use file handle as offset
FDC8:A688 C3 RET
In other words, this subroutine uses the current PSP to convert the passed-in file handle into a pointer to the caller's Job File Table (see chapter 8). Dereferencing this pointer yields an index into the System File Table. From the SFT entry, the DOS Read function can determine what type of file the caller wants to read from. With a network file, for example, the Read function must pass the call down to a redirector (see chapter 8), whereas with a normal file, a device driver must handle the call. Of course, a Read call may never get here in the first place, having already been picked off by a disk cache such as SMARTDRV. That, after all, is the whole point of a disk cache.
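The handle-to-SFT lookup just described can be sketched in portable C. This is an illustration only: PspView and handle_to_sft_index are hypothetical names, and the struct models just the two PSP fields the subroutine reads (the count at PSP[32h] and the table pointer at PSP[34h]).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ERR_INVALID_HANDLE 6

/* Simplified view of the PSP: only the two fields used at A66C. */
typedef struct {
    uint16_t max_files;    /* PSP[32h]: number of JFT entries   */
    const uint8_t *jft;    /* PSP[34h]: pointer to the JFT      */
} PspView;

/* Returns the SFT index stored in the handle's JFT slot, or
   -ERR_INVALID_HANDLE if the handle is out of range (the real
   code returns AL=6 with the carry flag set). */
int handle_to_sft_index(const PspView *psp, uint16_t handle)
{
    assert(psp != NULL && psp->jft != NULL);
    if (handle >= psp->max_files)       /* CMP BX,ES:[0032] / JB */
        return -ERR_INVALID_HANDLE;
    return psp->jft[handle];            /* LES DI / ADD DI,BX    */
}
```

Note that the real subroutine stops one step earlier, returning a pointer to the JFT slot in ES:DI; the sketch dereferences it to show what the caller does next.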
The subroutine at FDC8:A66C is none other than the handler for the internal DOS function INT 2Fh AX=1220h (Get Job File Table Entry; see appendix). You saw earlier that many DOS functions use a near call to the code for INT 2Fh AX=1218h to get a pointer to the client register structure. And the "MOV DS,CS:[3DE7]" code you've seen so many times sounds a lot like what INT 2Fh AX=1203h (Get DOS Data Segment) must do. We keep on running into these INT 2Fh AH=12h subfunctions; it's time to take a closer look.
To examine the code for INT 2Fh AH=12h, we're going to unassemble the DOS INT 2Fh handler, just as we did for INT 21h. Recall that we used DEBUG to trace through a simple INT 21h call so we could find the DOS INT 21h handler. We could do the same thing again for a simple call such as INT 2Fh AX=1200h (DOS internal services installation check; see appendix). But is there any way to automate what DEBUG did? Can you perhaps trace through interrupts and locate an entire interrupt chain without DEBUG's help?
Yes, but you have to understand a little of how the DEBUG trace command works. The first edition of Undocumented DOS had an entire chapter by Tim Paterson on debugging, with extensive source code examples on the accompanying disk (\UNDOC\CHAP7\*.ASM). This is an excellent place to turn for a general understanding of how DEBUG works.
The trace command in debuggers such as DEBUG and SYMDEB uses the single-step feature built into all Intel 80x86 microprocessors. When the processor's trace flag (TF) is enabled, the processor issues an INT 1 for every instruction it executes. A debugger can install an INT 1 (Single Step) handler and get the effect of having a breakpoint on every instruction.
However, a single step handler contains code too, and leaving TF enabled on entry to the single step handler would produce an endless loop. For this reason, the processor temporarily disables the trace flag when it issues an interrupt, and reenables tracing when the interrupt handler returns. In fact, the processor disables single step for all interrupts.
This is why most debuggers won't trace into an INT. To trace through an INT, a debugger must do something like set a breakpoint at the first instruction of the interrupt handler, and then re-enable single-step after the breakpoint is hit (see Crawford and Gelsinger, Programming the 80386). This is what DEBUG does. Unfortunately, the MON family of debuggers included with the first edition of Undocumented DOS happened not to trace through INT instructions.
We can incorporate this knowledge into a program that single-steps through an interrupt handler. INTCHAIN.C, shown in listing 6-5, installs an INT 1 single-step handler, turns on the trace flag, calls an interrupt function specified on the program's command line, and turns off the trace flag. Because INTCHAIN.C uses a far CALL rather than an INT, the processor calls the single-step handler for each instruction in the other interrupt handler; the handler saves away CS:IP whenever CS changes, as a likely indication that the interrupt function is chaining to the previous handler. When the interrupt function returns and INTCHAIN has turned off the trace flag, INTCHAIN prints out the interrupt chain as saved by the single-step handler.
For example, consider the point made earlier in figure 6-3 that SMARTDRV does back-end handling of the DOS Disk Reset function (INT 21h AH=0Dh). This is plainly visible in an INTCHAIN trace of a call to this function, shown in figure 6-10a.
C:\UNDOC2\CHAP6>intchain 21/0d00
1387 instructions
Skipped over 4 INT
0B94:32B6 MSCDEX
07FA:15FA SMARTDRV
C801:0829
0255:0023 J:
CC2C:058E DBLSSYS$
0116:109E DOS
FDC8:40F8 HMA
0070:06F5 IO
FDC8:8653 HMA
0070:0700 IO
FFFF:0043 HMA
CC2C:0623 DBLSSYS$
07FA:1631 SMARTDRV
Notice that, after being processed by MSDOS.SYS, IO.SYS, and DBLSSYS$, the call winds up back in SMARTDRV.
For a direct comparison with the DEBUG trace in figure 6-3, Figure 6-10b presents sample INTCHAIN output when tracing an INT 21h AH=62h call.
C:\UNDOC2\CHAP6>intchain 21/6200
77 instructions
0F93:32B6 MSCDEX
07F9:15FA SMARTDRV
C801:0800
0255:0023 D:
CC2E:058E DBLSSYS$
0116:109E DOS
FDC8:40F8 HMA
Sure enough, this matches the interrupt chain we so laboriously traced back in figure 6-3. INTCHAIN uses MAP.C from listing 6-2 to try to match up CS:IP addresses with the names of resident TSRs and drivers. The addresses displayed by INTCHAIN can be passed to SYMDEB or DEBUG for unassembly (this is the whole point of the program).
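To see why addresses such as FDC8:40F8 and FFFF:0043 are labeled HMA, it helps to convert them to linear addresses the way the MK_LIN macro in listing 6-5 does. The sketch below is portable C rather than 16-bit DOS code; how MAP.C actually classifies an address is not shown in this section, so the in_hma test is only one plausible way to do it, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode segment:offset to linear address (same idea as MK_LIN). */
uint32_t linear_addr(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* With A20 enabled, linear addresses 100000h through 10FFEFh are the
   High Memory Area: the part of FFFF:0010-FFFF:FFFF above 1 MB. */
int in_hma(uint16_t seg, uint16_t off)
{
    uint32_t lin = linear_addr(seg, off);
    return lin >= 0x100000UL && lin <= 0x10FFEFUL;
}

/* Spot checks against addresses shown in figure 6-10. */
void hma_examples(void)
{
    assert(linear_addr(0xFDC8, 0x40F8) == 0x101D78UL);
    assert(in_hma(0xFDC8, 0x40F8));     /* labeled HMA           */
    assert(in_hma(0xFFFF, 0x0043));     /* 100033h: labeled HMA  */
    assert(!in_hma(0x0070, 0x06F5));    /* 00DF5h: IO, low memory */
}
```

The segment FDC8h looks like an ordinary upper-memory segment, but once the offset is added the address crosses the 1 MB line, which is only reachable when A20 is on.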
INTCHAIN can also trace through an XMS function or an arbitrary segment:offset pointer. Actually the program has little to do with interrupt chains as such. Rather than generate an actual INT instruction and then have to mess with setting a breakpoint, the program just turns an INT XXh into a far call (and PUSHF) to the handler for INT XXh. Thus, INTCHAIN won't trace any INT generated inside the handler (such as the INT 2Ah call made by the INT 21h dispatch in figure 6-7); this is generally what you want anyway.
/*
INTCHAIN.C
Andrew Schulman, May 1993
Copyright (C) 1993 Andrew Schulman. All rights reserved.
bcc intchain.c map.c
Uses single-step to trace through interrupt chains
usage: intchain intno/ax/bx/cx/dx
example: intchain 21/6200
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <dos.h>
typedef unsigned char BYTE;
typedef unsigned short WORD;
typedef unsigned long DWORD;
#ifdef __cplusplus
typedef void interrupt (far *INTRFUNC)(...);
#else
typedef void (interrupt far *INTRFUNC)(void);
#endif
typedef void (far *FARFUNC)(void);
#ifndef MK_FP
#define MK_FP(s,o) ((((DWORD) s) << 16) + (o))
#endif
#define MK_LIN(fp) ((((DWORD) FP_SEG(fp)) << 4) + FP_OFF(fp))
#pragma pack(1)
typedef struct {
#ifdef __TURBOC__
WORD bp,di,si,ds,es,dx,cx,bx,ax;
#else
WORD es,ds,di,si,bp,sp,bx,dx,cx,ax; /* same as PUSHA */
#endif
WORD ip,cs,flags;
} REG_PARAMS;
#define INT_INSTR 0xCD
#define TRACE_FLAG 0x100
extern char *find_owner(DWORD lin_addr); // in map.c
void fail(const char *s) { puts(s); exit(1); }
#define MAX_ADDR 512
static WORD volatile instr = 0, int_instr = 0;
static WORD prev_seg = 0, my_seg = 0;
static void far * *addr;
static int num_addr = 0;
void interrupt far single_step(REG_PARAMS r) // INT 1 handler
{
WORD seg;
BYTE far *fp;
if ((seg = r.cs) == my_seg) // ignore my own code
return;
fp = (BYTE far *) MK_FP(r.cs, r.ip);
if (fp[0] == INT_INSTR) // count INTs
int_instr++;
instr++;
if (seg != prev_seg) // if segment changed,
{ // assume we've chained
if (num_addr < MAX_ADDR)
addr[num_addr++] = (void far *) fp;
prev_seg = seg;
}
}
#define GET_FLAGS(reg) _asm { pushf } ; _asm { pop reg }
#define SET_FLAGS(reg) _asm { push reg } ; _asm { popf }
void set_flag(unsigned mask)
{
GET_FLAGS(ax);
_asm or ax, word ptr mask
SET_FLAGS(ax);
}
void clear_flag(unsigned mask)
{
GET_FLAGS(ax);
_asm mov bx, word ptr mask
_asm not bx
_asm and ax, bx
SET_FLAGS(ax);
}
FARFUNC get_xms(void)
{
_asm mov ax, 4300h
_asm int 2fh
_asm cmp al, 80h
_asm je present
absent:
fail("XMS not present!");
present:
_asm mov ax, 4310h
_asm int 2fh
_asm mov ax, bx
_asm mov dx, es
// retval in DX:AX
}
main(int argc, char *argv[])
{
static int intrfunc = 0; /* make sure not in a register */
INTRFUNC old_sstep;
FARFUNC func = (FARFUNC) 0;
FARFUNC xms_func = (FARFUNC) 0;
void far *fp;
char *s;
WORD intno, _ax, _bx, _cx, _dx;
int a20off = 0;
int i;
puts("INTCHAIN 1.0 -- Walks interrupt chains");
puts("From \"Undocumented DOS\", 2nd edition (Addison-Wesley, 1993)");
puts("Copyright (C) 1993 Andrew Schulman. All rights reserved.\n");
if (argc < 2)
fail("usage: intchain [-a20off] <intno|xms|seg:ofs>/ax/bx/cx/dx");
if (strcmp(strupr(argv[1]), "-A20OFF") == 0)
{
xms_func = get_xms();
a20off++;
argv++;
}
// Figure out what code they want to generate:
// an XMS call
if (strncmp(strupr(argv[1]), "XMS", 3) == 0)
{
func = get_xms();
sscanf(argv[1], "XMS/%04X/%04X/%04X/%04X",
&_ax, &_bx, &_cx, &_dx);
printf("Tracing XMS at %Fp\n", func);
}
// ... or a far (segment:offset) CALL
else if (strchr(argv[1], ':'))
{
WORD seg, ofs;
sscanf(argv[1], "%04X:%04X/%04X/%04X/%04X/%04X",
&seg, &ofs, &_ax, &_bx, &_cx, &_dx);
func = (FARFUNC) MK_FP(seg, ofs);
printf("Tracing function at %Fp\n", func);
}
// ... or an INT XXh
else
{
sscanf(argv[1], "%02X/%04X/%04X/%04X/%04X",
&intno, &_ax, &_bx, &_cx, &_dx);
/* single-step doesn't go through INT, so turn the INT into
a PUSHF and far CALL */
if (! (func = (FARFUNC) _dos_getvect(intno)))
fail("INT unused");
intrfunc++; // so do PUSHF when call func
printf("Tracing INT %02X AX=%04X\n", intno, _ax);
}
if (! (addr = (void far **) calloc(MAX_ADDR, sizeof(void far *))))
fail("insufficient memory");
fp = (void far *) main;
my_seg = FP_SEG(fp);
old_sstep = _dos_getvect(1);
_dos_setvect(1, (INTRFUNC) single_step);
if (a20off)
{
_asm mov ah, 6
(*xms_func)(); // local disable A20 line
}
set_flag(TRACE_FLAG);
/* call the code */
_asm mov ax, _ax
_asm mov bx, _bx
_asm mov cx, _cx
_asm mov dx, _dx
if (intrfunc)
_asm pushf
(*func)();
clear_flag(TRACE_FLAG);
_dos_setvect(1, old_sstep);
printf("%u instructions\n", instr);
if (int_instr)
printf("Skipped over %u INT\n", int_instr);
printf("\n");
for (i=0; i< num_addr; i++)
{
s = find_owner(MK_LIN(addr[i]));
printf("%Fp\t%s\n", addr[i], s? s: " ");
}
if (num_addr == MAX_ADDR)
fail("Overflow: very long INT chain!");
return 0;
}
You can now use INTCHAIN to trace through a call to INT 2Fh AX=1200h, without using DEBUG. Figure 6-11 shows sample results. Note that the configuration was somewhat different from the one used to produce the INTCHAIN output for INT 21h AH=62h in figure 6-10b.
C:\UNDOC2\CHAP6>intchain 2f/1200
174 instructions
Skipped over 1 INT
1248:0007 NLSFUNC
109A:0980 PRINT
0F16:0943 SHARE
DB18:0285 DOSKEY
0B94:308D MSCDEX
07FA:1368 SMARTDRV
0726:019F COMMAND
0725:0135 COMMAND
C801:0696
0725:01BD COMMAND
FFFF:DFD8 HMA
C801:061E
0255:002D J:
CC2C:25ED DBLSSYS$
0255:0028 J:
CC2C:0116 DBLSSYS$
0116:10C6 DOS
FDC8:44BD HMA
Towards the goal of disassembling DOS, the essential piece of information here is the very last line, as this gives the address (FDC8:44BD) of MSDOS.SYS's INT 2Fh handler. We will come back to this in a few moments.
The most noticeable feature of figure 6-11 is the very long interrupt chain. NLSFUNC, PRINT, SHARE, DOSKEY, MSCDEX, SMARTDRV, COMMAND.COM, and DoubleSpace all take a crack at processing the call. Processing even what is (as you'll see) an absolutely trivial INT 2Fh AX=1200h call requires that every TSR and device driver camped out on INT 2Fh inspect the call to see if it interests them. INT 2Fh chains can be extremely long; they are particularly bad when any interrupt handlers written in C (such as the wrappers from chapter 2) are involved. As noted earlier, Ralf Brown has suggested an alternate INT 2Dh protocol in an attempt to shorten the long chains of handlers waiting around for INT 2Fh calls to appear.
Naturally, you can pass any of the addresses displayed by INTCHAIN to a debugger such as DEBUG or SYMDEB. For example, take the 0B94:308D handler for INT 2Fh, which INTCHAIN shows belongs to the Microsoft CD-ROM Extensions:
-u b94:308d
0B94:308D 9C PUSHF
0B94:308E 80FC11 CMP AH,11
0B94:3091 7503 JNZ 3096
0B94:3093 EB6B JMP 3100
0B94:3095 90 NOP
0B94:3096 80FC15 CMP AH,15
0B94:3099 7503 JNZ 309E
0B94:309B EB09 JMP 30A6
0B94:309D 90 NOP
0B94:309E 80FC05 CMP AH,05
; ...
You can see MSCDEX checking for calls to INT 2Fh AH=11h. This makes perfect sense, as INT 2Fh AH=11h is the network redirector protocol, and MSCDEX is a network redirector (see chapter 8). MSCDEX next looks for calls to INT 2Fh AH=15h, which again makes sense since this is the documented MSCDEX API (see Ray Duncan, MS-DOS Extensions). What about INT 2Fh AH=05h? As explained in the appendix, this is an undocumented interface that allows resident programs (network redirectors in particular) to expand critical error numbers into strings. External DOS programs such as COMMAND.COM issue INT 2Fh AH=05h calls; network redirectors such as MSCDEX handle the calls and provide the caller with strings to display (such as "CDR101: Not ready reading drive D" when you try to DIR a recording of Handel's Messiah).
How about INT 2Fh under Windows? Figure 6-12 shows INTCHAIN output for the same configuration as figure 6-11, except that INTCHAIN is running in a DOS box under Windows Enhanced mode:
Tracing INT 2F AX=1200
175 instructions
14D4:02A7 win
12E4:0D68 WINICE
1248:0007 NLSFUNC
109A:0980 PRINT
0F16:0943 SHARE
DB18:0285 DOSKEY
0B94:308D MSCDEX
07FA:1368 SMARTDRV
0726:019F COMMAND
0725:0135 COMMAND
C801:0696
1580:0045 win386
0725:01BF COMMAND
FFFF:DFD8 HMA
C801:061E
0255:002D J:
CC2C:25ED DBLSSYS$
0255:0028 J:
CC2C:0116 DBLSSYS$
0116:10C6 DOS
FDC8:44BD HMA
Not only have WIN.COM and WINICE (the Soft-ICE/Windows debugger) added themselves to the front of the INT 2Fh chain, but notice that WIN386 has insinuated itself into the middle of the chain. This, however, isn't the half of it. Windows Enhanced mode executes large amounts of code to service interrupts from DOS boxes that never show up in INTCHAIN, at least in its present form. Many instructions, such as STI and CLI, cause a jump into the Windows Virtual Machine Manager, running in 32-bit protected mode. This jump is invisible to a real mode DOS program like INTCHAIN. In particular, Windows Enhanced mode hooks INT 2Fh using the protected mode interrupt descriptor table (IDT). A more sophisticated version of INTCHAIN would need to be written to deal with Windows Enhanced mode. The same goes for INTVECT (listing 6-1), which does however at least recognize the ARPL instruction that Windows uses as a V86 breakpoint.
However, you already have the information you want, which is the last line in figures 6-11 and 6-12. (In the next to last line, you see the low-memory stub for INT 2Fh when DOS=HIGH.) In this configuration, the MSDOS.SYS INT 2Fh handler is located at FDC8:44BD; this code is shown with comments in figure 6-13.
-u fdc8:44bd
FDC8:44BD FB STI
FDC8:44BE 80FC11 CMP AH,11 ; 2F/11 network redirector call?
FDC8:44C1 750A JNZ 44CD ; no
;; Unsupported functions come here. Some external program like SHARE,
;; NLSFUNC, or a redirector is supposed to handle these. If we got here,
;; the external program must not be loaded, so it's an error -- except
;; if the caller is doing a 2F/??/00 install check, in which case DOS
;; will just return AX unchanged to indicate the software isn't installed.
FDC8:44C3 0AC0 OR AL,AL ; 2F/??/00 install check?
FDC8:44C5 7403 JZ 44CA ; yes: unsupported func; AX unchanged
FDC8:44C7 E8DCFF CALL 44A6 ; no -- set carry flag for error
FDC8:44CA CA0200 RETF 0002 ; sort-of IRET without changing flags
FDC8:44CD 80FC10 CMP AH,10 ; 2F/10 SHARE call?
FDC8:44D0 74F1 JZ 44C3 ; yes: error
FDC8:44D2 80FC14 CMP AH,14 ; 2F/14 NLSFUNC call?
FDC8:44D5 74EC JZ 44C3 ; yes: error
FDC8:44D7 80FC12 CMP AH,12 ; 2F/12 DOS internal function?
FDC8:44DA 7503 JNZ 44DF ; no: keep checking
FDC8:44DC E91102 JMP 46F0 ; yes: goto fig. 6-15a
FDC8:44DF 80FC16 CMP AH,16 ; 2F/16 Windows call or broadcast?
FDC8:44E2 740D JZ 44F1 ; yes: DOS communicate with Windows
FDC8:44E4 80FC46 CMP AH,46 ; 2F/46: misc. DOS/Windows func?
FDC8:44E7 7503 JNZ 44EC ; no: jump to IO.SYS INT 2Fh handler
FDC8:44E9 E9B801 JMP 46A4 ; yes: goto 2F/46 handler
FDC8:44EC EA05007000 JMP 0070:0005 ; pass to IO.SYS (fig. 6-14)
At the very end of figure 6-13, you can see a hardwired jump to 0070:0005. Here, MSDOS.SYS has decided that it doesn't handle a particular INT 2Fh call, so it passes it down to IO.SYS, which has its own INT 2Fh handler. Geoff Chappell discusses these two DOS INT 2Fh handlers at greater length in his DOS Internals, but since we're here, we might as well steal a brief glance at the IO.SYS INT 2Fh handler, which is shown in figure 6-14. Note that when DOS=HIGH, IO.SYS can assume that A20 is already on because the only path into IO.SYS's INT 2Fh handler is through the one in MSDOS.SYS, which already took care of enabling A20 in its low memory stub (located at 0116:10C6 in figures 6-11 and 6-12).
-u 70:5
0070:0005 EA93087000 JMP 0070:0893
-u 70:893
0070:0893 2EFF2EE606 JMP FAR CS:[06E6]
-dd 70:6e6 6e6
0070:06E6 FFFF:1302
-u ffff:1302
FFFF:1302 80FC13 CMP AH,13 ; 2F/13 (set INT 13h handler) call?
FFFF:1305 7413 JZ 131A ; yes: do it
FFFF:1307 80FC08 CMP AH,08 ; 2F/08 DRIVER.SYS call?
FFFF:130A 743B JZ 1347 ; yes: do it
FFFF:130C 80FC16 CMP AH,16 ; 2F/16 Windows call?
FFFF:130F 7479 JZ 138A ; yes: IO.SYS also handles these!
FFFF:1311 80FC4A CMP AH,4A ; 2F/4A (misc. undoc func) call?
FFFF:1314 7503 JNZ 1319 ; no: return unchanged
FFFF:1316 E9A700 JMP 13C0 ; yes: do it
FFFF:1319 CF IRET
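Taken together, the AH tests in figures 6-13 and 6-14 amount to two small routing functions. The following C summary is only a reading aid for those two listings in this particular configuration; the enum and function names are hypothetical, and nothing here is DOS source code.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    ROUTE_UNSUPPORTED,   /* 2F/10, 2F/11, 2F/14: left to external programs */
    ROUTE_DOS_INTERNAL,  /* 2F/12: DOS internal functions                  */
    ROUTE_WINDOWS,       /* 2F/16: Windows call or broadcast               */
    ROUTE_MISC_46,       /* 2F/46: misc. DOS/Windows function              */
    ROUTE_TO_IO_SYS      /* anything else: JMP 0070:0005                   */
} MsdosRoute;

/* Where the MSDOS.SYS handler in figure 6-13 sends a given AH. */
MsdosRoute msdos_int2f_route(uint8_t ah)
{
    switch (ah) {
    case 0x10: case 0x11: case 0x14: return ROUTE_UNSUPPORTED;
    case 0x12:                       return ROUTE_DOS_INTERNAL;
    case 0x16:                       return ROUTE_WINDOWS;
    case 0x46:                       return ROUTE_MISC_46;
    default:                         return ROUTE_TO_IO_SYS;
    }
}

/* Unsupported functions return with carry set, unless AL=0
   (an install check), in which case AX comes back unchanged. */
int unsupported_sets_carry(uint8_t al)
{
    return al != 0;
}

/* Nonzero if the IO.SYS handler in figure 6-14 acts on AH;
   otherwise it simply executes an IRET. */
int io_sys_handles(uint8_t ah)
{
    return ah == 0x13 || ah == 0x08 || ah == 0x16 || ah == 0x4A;
}

/* A few spot checks drawn from the two figures. */
void route_examples(void)
{
    assert(msdos_int2f_route(0x12) == ROUTE_DOS_INTERNAL);
    assert(msdos_int2f_route(0x13) == ROUTE_TO_IO_SYS);
    assert(io_sys_handles(0x13));
    assert(!unsupported_sets_carry(0x00));
}
```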
There are many interesting side roads we could explore here, including the Set INT 13h Handler (INT 2Fh AH=13h) function, and the several different AH=16h subfunctions that MSDOS.SYS and IO.SYS use to communicate with Windows. Sadly, however, we have to drive by if we are to have any chance of making it to our goal of disassembling DOS. As noted already, to do this, we must find where DOS handles the INT 2Fh AH=12h internal functions.
In figure 6-13, it is clear that FDC8:46F0 is the handler for these functions. As usual, we can pass this address to DEBUG or SYMDEB for unassembly; figure 6-15 shows the results.
-u fdc8:46f0
FDC8:46F0 2EFF36783F PUSH CS:[3F78] ; word at FDC8:3F78 = 44CAh
FDC8:46F5 2EFF367A3F PUSH CS:[3F7A] ; word at FDC8:3F7A = 3F7Ch
FDC8:46FA 50 PUSH AX ; push function/subfunction
FDC8:46FB 55 PUSH BP
FDC8:46FC 8BEC MOV BP,SP
FDC8:46FE 8B460E MOV AX,[BP+0E] ; put possible stack arg into AX
FDC8:4701 5D POP BP
FDC8:4702 E84509 CALL 504A ; call subroutine (fig. 6-15b)
FDC8:4705 E9BFFD JMP 44C7
Hmm, not very promising looking. What is this subroutine at 504A?
-u fdc8:504a
FDC8:504A 55 PUSH BP
FDC8:504B 8BEC MOV BP,SP
FDC8:504D 53 PUSH BX
FDC8:504E 8B5E06 MOV BX,[BP+06] ; address of subfunc table
FDC8:5051 2E8A1F MOV BL,CS:[BX] ; number of valid sbfuncts
FDC8:5054 385E04 CMP [BP+04],BL ; caller's subfunction number
FDC8:5057 7317 JNB 5070 ; if too high, error
FDC8:5059 8A5E04 MOV BL,[BP+04] ; get subfunction
FDC8:505C 32FF XOR BH,BH
FDC8:505E D1E3 SHL BX,1 ; turn into word offset
FDC8:5060 43 INC BX ; skip past # subfunctions
FDC8:5061 035E06 ADD BX,[BP+06] ; add in address of table
FDC8:5064 2E8B1F MOV BX,CS:[BX] ; pull out func ptr
FDC8:5067 895E06 MOV [BP+06],BX ; push on stack, RET to it
FDC8:506A 5B POP BX
FDC8:506B 5D POP BP
FDC8:506C 83C404 ADD SP,+04
FDC8:506F C3 RET ; call subfunc via RET
FDC8:5070 5B POP BX ; invalid sbfunc come here
FDC8:5071 5D POP BP
FDC8:5072 C20600 RET 0006
Despite the heading, the code in figure 6-15b is not specifically related to INT 2Fh AH=12h; other functions that have subfunctions use this same subroutine. For example, the handler for INT 21h AH=5Dh calls this same subroutine. The top of figure 6-15a shows that calling this subroutine involves pushing several values on the stack, including AX, which holds the function and subfunction that the caller wants (for example, 1200h) and the address of a table of function pointers. This table's first byte holds the number of valid subfunctions; the rest of the table is an array of near function pointers to the appropriate handlers for each subfunction.
The subroutine in figure 6-15b takes the caller's subfunction number (for example, the 00h in 1200h) and compares it against the first byte of the table to see if it is within range. If it is, the code doubles the subfunction number to turn it into a word offset and adds it onto the address of the table; the value is incremented by 1 to skip past the table's first byte. The subroutine then pulls the function pointer out of the table, pushes the function pointer on the stack, and "returns" to it.
This is somewhat difficult to follow, but for our purposes, the key piece of information is simply the location of the table, as this holds pointers to every INT 2Fh AH=12h subfunction. At the top of figure 6-15a, there is a comment indicating that, in this configuration, the table is at FDC8:3F7C. The first byte of this table is the number of subfunctions. This is followed immediately by an array of 30h words, holding function pointers to the various INT 2Fh AH=12h subfunctions:
-db fdc8:3f7c 3f7c
FDC8:3F70 30 0
-dw fdc8:3f7d
FDC8:3F7D 470E 6E2E 4CBE 4708 9066 54EB 9342 98EA
FDC8:3F8D 6F2F 9A9F B38F 6B6A 6B53 48CE 5030 98E3
FDC8:3F9D 5030 4FF9 5011 9011 9927 9A76 A6A3 AB12
FDC8:3FAD 4282 AABB AECD 4978 4A12 496C 4FD7 A9FC
; ...
Let's see if this is really the INT 2Fh AH=12h dispatch table. Earlier, it was noted that the subroutine at 4282h that DOS is so fond of calling is actually the code for INT 2Fh AX=1218h (Get Caller's Registers). Using SYMDEB to dump the table entry #18h confirms that this is correct:
-dw fdc8:3f7d+(18*2)
FDC8:3FAD 4282 ....
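The table lookup performed by the subroutine at 504A can be sketched in portable C. The layout (one count byte followed by little-endian words) is taken from the dumps above; the function name lookup_subfunc and the convention of returning 0 for an out-of-range subfunction are assumptions made for this illustration. The real code instead returns to its caller, which goes on to set the carry flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Look up a near handler offset in a table laid out as
   [count byte][count little-endian words]. Because of the leading
   count byte, the words sit at odd offsets from the table start.
   Returns 0 if the subfunction number is out of range. */
uint16_t lookup_subfunc(const uint8_t *table, uint8_t subfunc)
{
    size_t i;
    assert(table != NULL);
    if (subfunc >= table[0])            /* CMP [BP+04],BL / JNB 5070 */
        return 0;
    i = (size_t)subfunc * 2 + 1;        /* SHL BX,1 / INC BX         */
    return (uint16_t)(table[i] | (table[i + 1] << 8));
}
```

Given a copy of the 61h bytes starting at FDC8:3F7C, lookup_subfunc(table, 0x18) would read the word at offset 31h from the start of the table, which is address 3FADh, the entry holding 4282h in the dump above.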
The FTAB program from listing 6-4 can produce a nicer display of this same table. In fact, FTAB has an option to display tables such as this that keep the number of subfunctions as their first byte. The two commands shown in figure 6-16 are thus equivalent. So that you have a handy 2F/12 crib sheet to refer to, the entire table is shown, along with comments indicating the purpose of each subfunction.
C:\UNDOC2\CHAP6>ftab fdc8:3f7d 30 int2f12
C:\UNDOC2\CHAP6>ftab fdc8:3f7c ? int2f12
FDC8:470E int2f12_00 ; install check
FDC8:6E2E int2f12_01 ; close current file
FDC8:4CBE int2f12_02 ; get interrupt addr
FDC8:4708 int2f12_03 ; get dos data seg
FDC8:9066 int2f12_04 ; normalize path separator
FDC8:54EB int2f12_05 ; output char
FDC8:9342 int2f12_06 ; invoke crit err
FDC8:98EA int2f12_07 ; make disk buff most recently used
FDC8:6F2F int2f12_08 ; decrement sft ref count
FDC8:9A9F int2f12_09 ; flush and free disk buff
FDC8:B38F int2f12_0A ; perform crit err interrupt
FDC8:6B6A int2f12_0B ; signal share violation
FDC8:6B53 int2f12_0C ; set fcb file's owner
FDC8:48CE int2f12_0D ; get date and time
FDC8:5030 int2f12_0E ; mark all disk buffer unreferenced
FDC8:98E3 int2f12_0F ; make buffer most recently used
FDC8:5030 int2f12_10 ; find unreferenced disk buffer
FDC8:4FF9 int2f12_11 ; normalize asciiz filename
FDC8:5011 int2f12_12 ; strlen
FDC8:9011 int2f12_13 ; toupper
FDC8:9927 int2f12_14 ; _fstrcmp
FDC8:9A76 int2f12_15 ; flush buffer
FDC8:A6A3 int2f12_16 ; get address of SFT entry
FDC8:AB12 int2f12_17 ; set working drive
FDC8:4282 int2f12_18 ; get caller's registers
FDC8:AABB int2f12_19 ; set drive
FDC8:AECD int2f12_1A ; get file's drive
FDC8:4978 int2f12_1B ; set year, length of February
FDC8:4A12 int2f12_1C ; checksum memory
FDC8:496C int2f12_1D ; sum memory
FDC8:4FD7 int2f12_1E ; compare filenames
FDC8:A9FC int2f12_1F ; build CDS
FDC8:A66C int2f12_20 ; get JFT entry
FDC8:AEA8 int2f12_21 ; truename
FDC8:4434 int2f12_22 ; set extended err info
FDC8:8147 int2f12_23 ; check if char dev
FDC8:5030 int2f12_24 ; delay
FDC8:501F int2f12_25 ; strlen
FDC8:50D4 int2f12_26 ; open file
FDC8:A72A int2f12_27 ; close file
FDC8:50DA int2f12_28 ; move file pointer (lseek)
FDC8:A839 int2f12_29 ; read file
FDC8:5094 int2f12_2A ; set fastopen entry point
FDC8:5117 int2f12_2B ; ioctl
FDC8:5106 int2f12_2C ; get dev chain
FDC8:5134 int2f12_2D ; get extended err code
FDC8:5139 int2f12_2E ; get/set error table addresses
FDC8:440D int2f12_2F ; nop
The whole reason for looking at INT 2Fh AH=12h was that we expected that many of the near function calls that DOS makes internally would show up here. Indeed, you can now see clearly that the CALL 4282h that has continually popped up in these explorations is actually INT 2Fh AX=1218h. Similarly, as promised earlier, CALL A66C is actually INT 2Fh AX=1220h (Get JFT Entry). DOS internally makes extensive use of the functions in figure 6-16, but as already noted, it does so using a near CALL rather than an INT. DOS provides the INT form mostly for use by redirectors (see chapter 8). So, having this table of obscure INT 2Fh AH=12h functions definitely makes it much easier to understand the code for the INT 21h functions in which you are presumably interested.
Recall that, in figures 6-11 and 6-12, the process of locating this table started by having the INTCHAIN program call INT 2Fh AX=1200h. This function, the DOS internal services install check, does nothing more than return with AL=FFh to indicate that the services are present. The table indicates that FDC8:470E is the handler for this function. Let's unassemble at this address to check that the table makes sense:
-u fdc8:470e
FDC8:470E B0FF MOV AL,FF
FDC8:4710 C3 RET
How about INT 2Fh AX=1203h, which is supposed to return with the DOS data segment in DS?
-u fdc8:4708
FDC8:4708 2E8E1EE73D MOV DS,CS:[3DE7]
FDC8:470D C3 RET
The table seems to be accurate, so let's look at a more interesting function. According to the appendix, INT 2Fh AX=1217h sets DOS's working drive; the caller must push a zero-based drive number on the stack before calling the function. According to figure 6-16, this function is located at FDC8:AB12. Figure 6-17 shows a commented SYMDEB unassembly of this code.
-u fdc8:ab12
;;; SS points at DOS DS
;;; Here, SysVars is at DOS:0026. So DOS:0047 is SysVars+21h
FDC8:AB12 363A064700 CMP AL,SS:[0047] ; SysVars+21h = LASTDRIVE
FDC8:AB17 7202 JB AB1B ; is drive < LASTDRIVE?
FDC8:AB19 F9 STC ; no: set carry flag, fail
FDC8:AB1A C3 RET
FDC8:AB1B 53 PUSH BX ; yes
FDC8:AB1C 50 PUSH AX
FDC8:AB1D 36C5363C00 LDS SI,SS:[003C] ; SysVars+16h = CDS ptr
FDC8:AB22 B358 MOV BL,58 ; 58h = size of CDS entry
FDC8:AB24 F6E3 MUL BL
FDC8:AB26 03F0 ADD SI,AX ; DS:SI = ptr to drive's CDS
;;; Here, SDA at DOS:0320, so DOS:05A2 is SDA+282h
FDC8:AB28 368936A205 MOV SS:[05A2],SI ; move drive's CDS ptr into
FDC8:AB2D 368C1EA405 MOV SS:[05A4],DS ; DOS SDA+282h
FDC8:AB32 58 POP AX
FDC8:AB33 5B POP BX
FDC8:AB34 F8 CLC
FDC8:AB35 C3 RET
But if this function is called with the drive number on the stack, you may wonder how the code starts off with the drive number in AL. Looking back at figure 6-15a, note that the generic INT 2Fh AH=12h handler took a word off the stack (BP+0Eh, located after the caller's CS:IP and flags) and moved it into AX. In the case of those functions that don't expect a parameter on the stack, AX holds ignorable garbage. Thus, when this chapter says in various places that DOS makes an INT 2Fh AX=12xxh call, this is just a shorthand way of saying that DOS issues a near call to the code for INT 2Fh AX=12xxh, and that any parameter which, in the INT 2Fh version, would appear on the stack (see the appendix) actually appears in AX.
Everything else in this function involves fairly straightforward manipulation of DOS internal structures. The function checks the drive number against the internal value of LASTDRIVE in SysVars. If the drive number is valid, the function uses it as an index into the CDS array, a pointer to which is also contained in SysVars. The function then moves a pointer to the CDS entry into a DOS global variable. Changing this variable is basically what it means to set DOS's working drive.
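The arithmetic in figure 6-17 is easy to restate in C. The sketch below assumes the 58h-byte CDS entry size shown in the listing and works on offsets only (the segment half of the far pointer is simply carried along in the real code); the function name cds_entry_offset is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CDS_ENTRY_SIZE 0x58   /* size of one CDS entry, per figure 6-17 */

/* Computes the offset of a drive's CDS entry. Returns 0 on success,
   or -1 if the zero-based drive number is not below LASTDRIVE
   (the real code returns with the carry flag set). */
int cds_entry_offset(uint8_t drive, uint8_t lastdrive,
                     uint16_t cds_base_ofs, uint16_t *entry_ofs)
{
    assert(entry_ofs != NULL);
    if (drive >= lastdrive)                       /* CMP AL,LASTDRIVE / JB */
        return -1;
    /* MUL BL / ADD SI,AX: 16-bit arithmetic, so the sum wraps */
    *entry_ofs = (uint16_t)(cds_base_ofs + drive * CDS_ENTRY_SIZE);
    return 0;
}
```

For drive C: (drive number 2), the entry lies B0h bytes past the start of the CDS array; storing the resulting far pointer at SDA+282h is the step that actually sets the working drive.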
It is useful to see how DOS internally uses the LASTDRIVE variable in SysVars, the CDS, and other undocumented DOS features. Discussions of undocumented DOS are often (as in the first edition of the book) disconnected from any consideration of DOS internals. But the CDS, SFT, List of Lists, and other structures are not provided for our entertainment, like the hidden "gang screens" that software hobbyists and enthusiasts seem to enjoy finding. In fact, the CDS and so on are not so much undocumented DOS features, as internal DOS features that happen to be externally accessible through an undocumented interface. That undocumented DOS is often discussed without the surrounding context of DOS internals tends to obscure the real purpose of these structures. It is important to realize that the "true" form is the internal one, not the undocumented one.
For example, even though this chapter has often referred to the location of variables such as CURR_PSP as SDA+10h, or BREAK_FLAG as SDA+17h, within DOS there really is no such thing as the Swappable Data Area. The SDA is merely an externally-visible interface that Microsoft added at a rather late date on top of the DOS data segment (see "Origins of the SDA" in chapter 8). Likewise, the INT 2Fh AH=12h functions are just an undocumented external interface provided on top of some internal DOS functions, for the convenience mostly of network redirectors. The internal near-call form of these functions is the true one.
What have we accomplished here? Basically, by locating the INT 2Fh AH=12h dispatch table, we have now acquired names for 30h different internal DOS functions. Our earlier uncovering of the INT 21h dispatch table gave us names for 6Dh different locations in DOS. Rather than keep picking at disassembly of individual functions here and there, we can now turn around and do a full-blown disassembly of this entire code segment.
Everything we've looked at in DOS is in the same code segment, which in this particular configuration happens to be FDC8h. Of course there are other parts of DOS, but this segment seems like a good place to start. How can you disassemble the entire code segment at once, but still keep track of where the individual functions are located? For example, in a monster disassembly of segment FDC8h, you would like to know where the Set PSP function is handled, where Exec is handled, and so on.
You can use DEBUG or SYMDEB to produce a disassembly of this DOS code segment, and use the FTAB program to produce labels indicating the location of key functions within the segment. To merge the FTAB output with the disassembly, and, while we're about it, clean up and improve the disassembly in various ways, we will use a program named NICEDBG, written in AWK, a C-like pattern-matching language that is excellent for text processing tasks like this.
To unassemble the main DOS code segment, you need to know where to tell DEBUG to start and stop unassembly. You can make a preliminary stab at finding the proper unassembly range by taking the FTAB outputs for the INT 21h dispatch table (figure 6-8) and the INT 2Fh AH=12h dispatch table (figure 6-16), combining them, and sorting them by address:
C:\UNDOC2\CHAP6>type tmp.bat
@echo off
ftab fdc8:3e9e 6d INT21 > int212f.tmp
ftab fdc8:3f7d 30 INT2F_12 >> int212f.tmp
sort < int212f.tmp > int212f.log
C:\UNDOC2\CHAP6>tmp
C:\UNDOC2\CHAP6>type int212f.log
FDC8:4052 INT21_33
FDC8:40A9 INT21_50
FDC8:40B5 INT21_51
FDC8:40B5 INT21_62
FDC8:40C1 INT21_64
FDC8:4282 INT2F_12_18
FDC8:440D INT21_18
; ... etc. ...
FDC8:B0E9 INT21_5A
FDC8:B183 INT21_6C
FDC8:B2D8 INT21_5C
FDC8:B38F INT2F_12_0A
From the first and last lines of INT212F.LOG, it is clear that you want DEBUG or SYMDEB to unassemble starting at FDC8:4052 and ending somewhere a bit after FDC8:B38F. B500h is probably a good place to stop. You will probably need to adjust the unassembly range later, and rerun DEBUG, but this is fine for now. You can put the unassembly command into a tiny script file, feed it to the debugger, and redirect the debugger's output to a file:
C:\UNDOC2\CHAP6>type int212f.scr
u fdc8:4052 b500
q
C:\UNDOC2\CHAP6>debug < int212f.scr > int212f.out
Using SYMDEB rather than DEBUG produces nicer results. SYMDEB puts segment overrides in their proper place, rather than on a separate line like DEBUG. But you must use the SYMDEB /X command line to suppress SYMDEB's [more] prompt, which you wouldn't see if you redirected output to a file:
C:\UNDOC2\CHAP6>symdeb /x < int212f.scr > int212f.out
This takes a minute or so to run. The INT212F.OUT file will be about 870k bytes—much smaller if you use SYMDEB—and won't yet look very interesting. For example, there aren't yet any labels indicating where each DOS function starts. One of the things NICEDBG can do is merge the INT212F.OUT file produced by DEBUG or SYMDEB with the INT212F.LOG file that you produced using FTAB.
Actually, there's one interesting thing you can do with the raw unassembly output from DEBUG or SYMDEB. Run the DEBUG unassembly script once under MS-DOS; then start Windows Enhanced mode and rerun the DEBUG script from inside a DOS box, redirecting DEBUG's output to a different file. This sequence gives you an easy way to examine the patches that Windows applies to MS-DOS. Just compare the two files, using diff or a similar utility. Any differences in this DOS code segment are the result of Windows patches.
C:\UNDOC2\CHAP6>debug < int212f.scr > int212f.out
C:\UNDOC2\CHAP6>win
;;; from inside DOS box:
C:\UNDOC2\CHAP6>debug < int212f.scr > int212f.win
C:\UNDOC2\CHAP6>diff int212f.out int212f.win > int212f.dif
The list of Windows patches in INT212F.DIF is incomplete, because it shows only one DOS code segment. Still, it does provide some idea of what is going on:
;; original MS-DOS code in INT 21h dispatch (see figure 6-7 above)
< FDC8:41CE 50 PUSH AX
< FDC8:41CF B482 MOV AH,82
< FDC8:41D1 CD2A INT 2A
< FDC8:41D3 58 POP AX
;; patched by Windows; 15AD belongs to WIN386.EXE
> FDC8:41CE 9A0A00AD15 CALL 15AD:000A
> FDC8:41D3 90 NOP
;; original DOS code in a frequently called internal Begin Crit 01 function
< FDC8:514B B80180 MOV AX,8001
< FDC8:514E CD2A INT 2A
;; patched by Windows
> FDC8:514B 9A4300AD15 CALL 15AD:0043
;; original DOS code in a frequently called internal End Crit 01 function
< FDC8:516B B80181 MOV AX,8101
< FDC8:516E CD2A INT 2A
;; patched by Windows
> FDC8:516B 9A7900AD15 CALL 15AD:0079
; ... etc. ...
The DOSMGR VxD built into WIN386.EXE applies these patches. When Windows exits, DOSMGR of course backs its changes out, restoring the original DOS code. As you can see, these patches have to do with DOS critical sections; DOSMGR wants DOS to call into the Windows VMM Begin_Critical_Section and End_Critical_Section functions. It's important to note that DOSMGR scans for the INT 2Ah instructions to patch, rather than using hardwired addresses. Thus, these patches should at least theoretically also work with a different vendor's DOS.
The same before and after technique can be used to find DOS patches applied by other programs, such as MSCDEX. Programs that patch DOS can only be safely unloaded by a MARK/RELEASE type of program that knows enough about these patches to back them out.
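The before-and-after comparison can also be keyed on addresses rather than on text lines, which makes it easier to see which instruction a patch replaced. The following Python sketch assumes two unassembly listings in the DEBUG format shown above; the function names are hypothetical, and nothing here is specific to Windows or to any particular patching program.

```python
def parse_listing(lines):
    """Map 'seg:ofs' -> (opcode bytes, instruction text) for DEBUG 'u' output."""
    table = {}
    for line in lines:
        fields = line.split(None, 2)
        if len(fields) >= 2 and ":" in fields[0]:
            text = " ".join(fields[2].split()) if len(fields) > 2 else ""
            table[fields[0]] = (fields[1], text)
    return table


def find_patches(before_lines, after_lines):
    """Return (address, before, after) for every address whose entry differs.

    An entry of None means the address does not start an instruction in
    that listing, for example because a longer patched instruction covers it.
    """
    before = parse_listing(before_lines)
    after = parse_listing(after_lines)
    changed = []
    for addr in sorted(set(before) | set(after)):
        if before.get(addr) != after.get(addr):
            changed.append((addr, before.get(addr), after.get(addr)))
    return changed


if __name__ == "__main__":
    dos = ["FDC8:514B B80180 MOV AX,8001", "FDC8:514E CD2A INT 2A"]
    win = ["FDC8:514B 9A4300AD15 CALL 15AD:0043"]
    for addr, old, new in find_patches(dos, win):
        print(addr, old, "->", new)
```

Sorting the addresses as strings works here only because every line shares one segment and DEBUG prints four uppercase hex digits for the offset.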
To run NICEDBG, feed it output from DEBUG or SYMDEB. Optionally, you can supply a symbol-table file of code name/address pairs such as FTAB produces. You can also supply NICEDBG with an optional file of data name/address pairs (see below). For example:
debug < int212f.scr > int212f.out
ftab fdc8:3e9e 6d INT21 > int212f.log
ftab fdc8:3f7d 30 INT2F_12 >> int212f.log
nicedbg int212f.out int212f.log int212f.dat > int212f.lst
NICEDBG can make many improvements to the output from DEBUG or SYMDEB. The program makes several passes over the DEBUG file, replacing calls and jumps to meaningless-looking addresses such as 4282h with calls and jumps to meaningful labels supplied by the user, such as INT2F_12_18. The program also creates semi-useful labels for any other addresses that are the target of calls, loops, or jumps. If the target address itself contains a RET or JMP, NICEDBG changes the label to reflect this. The program also generates a list of cross-references to each location.
For example, a sample of output from DEBUG looks like this:
FDC8:5126 9C PUSHF
FDC8:5127 36 SS:
FDC8:5128 803E0C0D00 CMP BYTE PTR [0D0C],00
FDC8:512D 740F JZ 513E
FDC8:512F EB01 JMP 5132
FDC8:5131 CF IRET
FDC8:5132 0E PUSH CS
FDC8:5133 E8FBFF CALL 5131
FDC8:5136 50 PUSH AX
FDC8:5137 B80180 MOV AX,8001
FDC8:513A CD2A INT 2A
FDC8:513C 58 POP AX
FDC8:513D C3 RET
FDC8:513E EB01 JMP 5141
FDC8:5140 CF IRET
FDC8:5141 0E PUSH CS
FDC8:5142 E8FBFF CALL 5140
FDC8:5145 C3 RET
This is not very promising looking. But NICEDBG can transform this raw disassembly listing into something much more readable and useful:
; xref: FDC8:4304 FDC8:438B FDC8:4D7A
func_5126:
FDC8:5126 9C PUSHF
FDC8:5127 36803E0C0D00 CMP BYTE PTR SS:[0D0C],00
FDC8:512D 740F JZ jmp_513E -> loc_5141
FDC8:512F EB01 JMP loc_5132
; xref: FDC8:5133
ret_5131:
FDC8:5131 CF IRET
; xref: FDC8:512F
loc_5132:
FDC8:5132 0E PUSH CS
FDC8:5133 E8FBFF CALL ret_5131
FDC8:5136 50 PUSH AX
FDC8:5137 B80180 MOV AX,8001
FDC8:513A CD2A INT 2A
FDC8:513C 58 POP AX
FDC8:513D C3 RET
; xref: FDC8:512D
jmp_513E:
FDC8:513E EB01 JMP loc_5141
; xref: FDC8:5142
ret_5140:
FDC8:5140 CF IRET
; xref: jmp_513E
loc_5141:
FDC8:5141 0E PUSH CS
FDC8:5142 E8FBFF CALL ret_5140
FDC8:5145 C3 RET
Here are some of the changes that NICEDBG made at various offsets in the code:
NICEDBG uses loc_ to specify targets of jumps, func_ to specify targets of CALLs, loop_ to specify targets of LOOPs, ret_ to specify code that immediately returns via either RET or IRET, jmp_ to specify code that does an unconditional JMP. If the user supplies a symbol-table file of name/address pairs such as generated by FTAB, NICEDBG will use this as a source of labels.
NICEDBG.AWK (listing 6-6) is the source code for this postprocessor for output from DEBUG or SYMDEB.
Since the reader is likely to be unfamiliar with AWK, a brief explanation of listing 6-6 is probably called for. AWK reads in each line of text in one or more files and splits the line into fields. You can change the delimiters that AWK uses to decide where fields start and end, but it defaults to using white space, which is exactly what we need here. The fields are available to the program as $1, $2, and so on, up to $NF (NF is a built-in AWK variable that holds the number of fields); $0 is the original line. For example, the line "FDC8:440D INT21_1D" is $0, "FDC8:440D" is $1, and "INT21_1D" is $2 (and $NF).
Note too that AWK handles regular expressions (as also found in utilities such as grep); for example the regular expression "/[CDES]S\:/" matches "CS:", "DS:", "ES:", or "SS:", and "/\[.*\]/" matches anything within square brackets. AWK also has associative arrays (just built-in hash tables, really) that can be indexed with strings (for example, array["string"]) as well as numbers. The presence of an item in an associative array can be tested with the in operator; for example, if ("string" in array).
The standard reference is The AWK Programming Language by Alfred Aho, Brian Kernighan, and Peter Weinberger (from the first letters of whose last names the language got its name). The high-level pattern-matching and array features of AWK make it possible to implement NICEDBG in about 200 lines of code.
NICEDBG.EXE on the accompanying disk was produced with the excellent AWK compiler from Thompson Automation. You can run the program without having AWK or understanding anything about it; but to modify the program, you would of course need Thompson AWK or another AWK interpreter or compiler. The popular MKS Toolkit comes with AWK, and many BBSs carry MAWK, a freely available, fast AWK interpreter by Mike Brennan.
NICEDBG processes each line in the DEBUG file. For example, consider the following line from a DEBUG listing:
FDC8:512D 740F JZ 513E
AWK breaks this line into fields, delimited by white space. The fields are:
$1  FDC8:512D  Address of the instruction
$2  740F       Instruction opcode bytes
$3  JZ         Instruction operator
$4  513E       Instruction operand
Of course, not every instruction looks quite like this. For example:
$1         $2          $3     $4    $5   $6
FDC8:5126  9C          PUSHF
FDC8:5127  36          SS:
FDC8:5128  803E0C0D00  CMP    BYTE  PTR  [0D0C],00
In any case, NICEDBG.AWK can rely on $1 as the address of the instruction, and $3 as either the instruction operator or (when using DEBUG rather than SYMDEB) something like a segment override.
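For readers who do not have an AWK handy, the same field splitting is easy to imitate. The Python sketch below is only an analogy to what AWK does automatically; the function name and the dictionary keys are invented for this illustration.

```python
import re

# CS:, DS:, ES:, or SS: standing alone, as DEBUG prints a segment override
SEG_OVERRIDE = re.compile(r"^[CDES]S:$")


def split_debug_line(line):
    """Split one DEBUG unassembly line into whitespace-delimited fields.

    Roughly: addr is AWK's $1, bytes is $2, op is $3, operands is $4..$NF.
    """
    f = line.split()
    return {
        "addr": f[0],
        "bytes": f[1],
        "op": f[2],
        "operands": f[3:],
        "is_override": bool(SEG_OVERRIDE.match(f[2])),
    }


if __name__ == "__main__":
    for text in ("FDC8:512D 740F JZ 513E",
                 "FDC8:5127 36 SS:",
                 "FDC8:5128 803E0C0D00 CMP BYTE PTR [0D0C],00"):
        print(split_debug_line(text))
```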
Before processing the DEBUG file, NICEDBG reads in the optional symbol-table and data files. NICEDBG uses INT212F.LOG (or any similarly formatted file) to build a table of names (called ftab) corresponding to segment:offset locations; the program runs through each line in INT212F.OUT, or any unassembly listing produced by DEBUG or SYMDEB, to see if the line's segment:offset address is in the table.
NICEDBG makes three passes over the DEBUG file:
Pass 1 : NICEDBG looks for any calls, jumps, or loops in the code, and adds the target of the call, jump, or loop to ftab, which it will later use to generate labels. Simplifying considerably, the AWK code looks like this:
if ($3 ~ /CALL/) ftab[$4] = "func_" $4;
if ($3 ~ /LOOP/) ftab[$4] = "loop_" $4;
if ($3 ~ /J.*/) ftab[$4] = "loc_" $4;
In pass 1, NICEDBG also constructs the jmptab, for resolving JMPs to JMPs:
if ($3 ~ /JMP/) jmptab[$1] = $4; # jmptab[SOURCE] = TARGET
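The following Python sketch restates pass 1 for readers more comfortable outside AWK. It is an approximation under stated assumptions, not a port of NICEDBG: the names pass1, ftab, and jmptab simply mirror the text, and the target test is stricter than the AWK original, since it accepts only four-digit hex offsets and therefore also skips register targets such as CALL DI.

```python
import re

CALL_OR_JUMP = re.compile(r"^(CALL|LOOP\w*|J\w+)$")
HEX_OFFSET = re.compile(r"^[0-9A-F]{4}$")


def pass1(lines, ftab=None):
    """Collect call/jump/loop targets (ftab) and JMP source->target pairs (jmptab)."""
    ftab = dict(ftab or {})               # names already supplied by the user win
    jmptab = {}
    for line in lines:
        f = line.split()
        if len(f) < 4 or ":" not in f[0] or not CALL_OR_JUMP.match(f[2]):
            continue
        op, target = f[2], f[3]
        if not HEX_OFFSET.match(target):
            continue                      # FAR, seg:ofs, [xxxx], registers
        seg, off = f[0].split(":")
        if op == "JMP":
            jmptab[off] = target          # for resolving JMP to JMP later
        key = seg + ":" + target
        if key not in ftab:
            if op == "CALL":
                prefix = "func_"
            elif op.startswith("LOOP"):
                prefix = "loop_"
            else:
                prefix = "loc_"
            ftab[key] = prefix + target
    return ftab, jmptab


if __name__ == "__main__":
    sample = ["FDC8:512D 740F JZ 513E", "FDC8:512F EB01 JMP 5132",
              "FDC8:5133 E8FBFF CALL 5131"]
    print(pass1(sample))
```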
Pass 2 : The second time through the DEBUG file, NICEDBG builds its xref table, and also improves some of the labels generated in pass 1. A label such as jmp_XXXX or ret_XXXX, indicating that location XXXX does an unconditional JMP or (I)RET, is generally more useful than a label such as loc_XXXX, indicating that XXXX is the target of a jump. Thus, if pass 1 assigned a location a name, and if this location does a JMP or a (I)RET, NICEDBG changes ftab to reflect this:
if (($3 ~ /I*RET/) && ($1 in ftab)) ftab[$1] = "ret_" $1;
if (($3 ~ /JMP/) && ($1 in ftab)) ftab[$1] = "jmp_" $1;
Also in pass 2, NICEDBG looks for code that may be "not reached," that is, not accessible from any other location in the listing (of course, the code might be called from some other place that happened not to be in the disassembly range). If the previous line of code did an unconditional JMP or (I)RET, and if there are no labels at the current address (i.e., ftab[$1] is empty, indicating that $1 is not the target of a jump, call, or loop), NICEDBG adds $1 to a not_reached array:
if ((did_jmpret == 1) && (! ($1 in ftab))) not_reached[$1]++;
did_jmpret = 0;
if ($3 ~ /I*RET|JMP/) did_jmpret = 1;
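Pass 2's relabeling and not-reached detection can be sketched the same way. Again this is a hypothetical Python restatement with invented names, taking as input a table of labels keyed by seg:ofs such as pass 1 would produce; the cross-reference table is left out to keep the sketch short.

```python
import re

RET_OR_JMP = re.compile(r"^(RET|RETF|IRET|JMP)$")


def pass2(lines, ftab):
    """Rename labels on RET/IRET/JMP instructions and collect unreached addresses."""
    ftab = dict(ftab)
    not_reached = set()
    prev_ended_flow = False               # did the previous line JMP or return?
    for line in lines:
        f = line.split()
        if len(f) < 3 or ":" not in f[0]:
            continue
        addr, op = f[0], f[2]
        if prev_ended_flow and addr not in ftab:
            not_reached.add(addr)         # nothing falls into or jumps to this line
        prev_ended_flow = bool(RET_OR_JMP.match(op))
        if prev_ended_flow and addr in ftab:
            off = addr.split(":")[1]
            ftab[addr] = ("jmp_" if op == "JMP" else "ret_") + off
    return ftab, not_reached


if __name__ == "__main__":
    labels = {"FDC8:5131": "func_5131", "FDC8:5132": "loc_5132"}
    sample = ["FDC8:512F EB01 JMP 5132", "FDC8:5131 CF IRET", "FDC8:5132 0E PUSH CS"]
    print(pass2(sample, labels))
```

Like the AWK original, this sketch overwrites a user-supplied name if the labeled instruction happens to be a JMP or a return.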
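Pass 3 : In its final pass over the DEBUG listing, NICEDBG prints out the new, improved listing, with labels, cross-references, and eventual jump destinations in place. During this pass it also replaces bracketed addresses such as [0321] with names taken from the optional data file. A data file such as INT212F.DAT holds offset/name pairs like these: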
0069 STARTUP_DRV
0330 CURR_PSP
0337 BRK_FLAG
3DE7 DOS_DS
1030 IN_WIN3E
033E MACHINE_ID
0321 IN_DOS
0584 USER_SP
0586 USER_SS
0320 CRIT_ERR
1211 DOS_HIGH
But note that NICEDBG's replacements of, for example, [0330] with CURR_PSP are very simple-minded: the program merely does a blind global search and replace. Thus, you should be conservative about what you put in a NICEDBG .DAT file.
If DEBUG rather than SYMDEB was used to produce NICEDBG's input, NICEDBG saves away any segment override on the current line ($3 ~ /[CDES]S\:/) and uses the AWK sub() substitution function to smack it into its proper place on the next line.
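The override fix-up is simple enough to show in isolation. This Python sketch is an illustration with a made-up function name, assuming DEBUG-style input in which the override prefix occupies a line of its own directly before the instruction it modifies:

```python
import re

SEG_OVERRIDE = re.compile(r"^[CDES]S:$")


def merge_overrides(lines):
    """Fold a stand-alone segment override line into the following instruction."""
    out = []
    pending = None                        # (addr, prefix byte, override) or None
    for line in lines:
        f = line.split()
        if not f:
            out.append("")
            continue
        if len(f) == 3 and SEG_OVERRIDE.match(f[2]):
            pending = f                   # hold it for the next line
            continue
        if pending:
            addr, prefix_byte, override = pending
            pending = None
            f[0] = addr                   # instruction starts at the prefix
            f[1] = prefix_byte + f[1]     # prepend the prefix opcode byte
            rest = " ".join(f[2:]).replace("[", override + "[", 1)
            out.append(f"{f[0]} {f[1]} {rest}")
        else:
            out.append(" ".join(f))
    return out


if __name__ == "__main__":
    print(merge_overrides(["FDC8:5127 36 SS:",
                           "FDC8:5128 803E0C0D00 CMP BYTE PTR [0D0C],00"]))
```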
# NICEDBG.AWK -- Produces nicer output from DEBUG input and symbol table
# Copyright (c) 1993 Andrew Schulman. All rights reserved.
# usage: nicedbg dbgfile [symtab] [datfile] > lstfile
# example: nicedbg int212f.out int212f.log > int212f.lst

# get offset from seg:ofs
function get_off(addr) { split(addr, so, ":"); return so[2]; }

function mk_fp(ofs) { return seg ":" ofs; }   # make seg:ofs farptr

function get_ftab_name(addr) {                # get name from table
    if (addr !~ SEG_OFS)
        addr = mk_fp(addr);                   # table indexed by seg:ofs
    if (! (addr in ftab))
        return addr;                          # not there -- return unchanged
    split(ftab[addr], label, ",");
    return label[1];                          # just return first name if > 1
}

function resolve_jmp_jmp(src) {               # JMP to JMP to ...
    if (! (src in jmptab))
        return;
    if (done[src])
        return done[src];
    # if get here, haven't seen this one yet
    target = target2 = jmptab[src];
    while (target in jmptab) {
        target2 = jmptab[target];
        if (target2 == target)                # endless loop
            break;
        if (target2 == src)                   # cycle
            break;
        if (target2 in done) {                # we've seen this part already
            target2 = done[target2];
            break;
        }
        target = target2;
    }
    done[src] = target2;
    return target2;
}

function hex(x) { return 0 + ("0x" x); }      # relies on Thompson AWK
BEGIN {
    print "NICEDBG -- Makes nicer output from DEBUG input and symbol table";
    print "From \"Undocumented DOS\", 2nd edition (Addison-Wesley, 1993)";
    print "Copyright (C) 1993 Andrew Schulman. All rights reserved.\n";
    if (ARGC < 2) {
        print "usage: nicedbg dbgfile [symtab] [datfile] > lstfile" ;
        print "example: nicedbg int212f.out int212f.log > int212f.lst" ;
        did_anything = 0;
        exit;
    }
    else did_anything = 1;

    # commonly-used regular expressions
    SQ_BRACK = /\[.*\]/;                  # anything within square brackets
    SEG_OFS = /\:/;                       # has a : in it
    SEG_OVERRIDE = /[CDES]S\:/;           # CS: or DS: or ES: or SS:
    CALL_OR_JUMP = /CALL|LOOP|J.*/;       # CALL, LOOP, JMP, J*

    # read in optional symbol-table file
    # lines in symtab file look like: xxxx:yyyy name
    if (ARGC > 2) {
        while (getline < ARGV[2])         # for each line in symbol table
            ftab[$1] = ftab[$1] $2 ",";   # put name into table for seg:ofs
        close(ARGV[2]);
    }

    # read in optional data file
    # lines in data file look like: xxxx name
    # example: 0321 IN_DOS
    if (ARGC > 3) {
        while (getline < ARGV[3])
            data[$1] = $2;
        close(ARGV[3]);
    }

    ARGC = 2;                             # finished with sym, dat file
    dbgfile = ARGV[1];                    # switch over to DEBUG file

    # debug file looks like: xxxx:yyyy XXXXXX op operands
    # example: FDC8:4052 3C06 CMP AL,06 ; comments
    while (getline < dbgfile) {           # make pass 1 through debug file
        if ($1 ~ SEG_OFS) {
            split($1, so, ":");
            if (! seg) {
                seg = so[1];              # get segment for later use
                start = hex(so[2]);
            }
            else
                stop = so[2];             # take last one
        }
        if ($3 ~ CALL_OR_JUMP) {
            if ($4 ~ /\:|\[.*\]|FAR/)     # don't do [xxxx] or xxxx:yyyy etc.
                continue;
            # should also ignore e.g. CALL DI
            if ($3 ~ /JMP/)
                jmptab[get_off($1)] = $4; # jmptab for resolving JMP JMP
            if (! (mk_fp($4) in ftab))    # put call/jmp target into table
                ftab[mk_fp($4)] = (($3 ~ /CALL/) ? "func_" :
                                   ($3 ~ /LOOP/) ? "loop_" : "loc_") $4;
        }
    }
    close(dbgfile);
    stop = hex(stop);

    # pass 2: build cross-ref table, improve some label names, etc.
    while (getline < dbgfile) {
        if ((did_jmpret == 1) && (! ($1 in ftab)))
            not_reached[$1]++;            # prev line did JMP/RET, but no label, so
        did_jmpret = 0;                   # "not reached"; may be data or dead code
        if ($3 ~ /I*RET|JMP/) {
            did_jmpret = 1;
            if ($1 in ftab)               # if target is a ret/jmp, change label name
                ftab[$1] = (($3 ~ /JMP/) ? "jmp_" : "ret_") get_off($1);
            # oops, this will also replace labels supplied in sym file!
        }
        # below *not* "else if" -- JMP handled both places
        # build xref table and outside-range table
        if (($3 ~ CALL_OR_JUMP) && ($4 !~ SQ_BRACK) && ($5 !~ SQ_BRACK)) {
            if ($4 ~ /FAR/)
                outside[$5]++;
            else if ($4 ~ SEG_OFS)
                outside[$4]++;
            else {
                off = hex($4);
                if ((off < start) || (off > stop))
                    outside[off]++;
            }
            if ($4 !~ /\:|FAR/)           # don't do [xxxx] or xxxx:yyyy
                xref[mk_fp($4)] = xref[mk_fp($4)] get_ftab_name($1) " ";
        }
    }
    close(dbgfile);
}
{ # pass 3: for each line in dbg file
    while (! ($1 ~ SEG_OFS)) {            # ignore any lines without xxxx:yyyy
        print; getline;
        if (! $0) exit;
    }
    jmpline = "";

    # indicate if this is possible unreached (dead) code; show
    # cross-reference (xref) table; show all labels for this address
    if ($1 in not_reached) {              # possible dead code
        print ""
        print ";;; not reached?";
    }
    else if ($1 in ftab) {                # if segment:offset in table
        print ""
        if (xref[$1])
            print "; xref: " xref[$1]     # show xref
        nf = split(ftab[$1], label, ",");
        for (i=1; i<=nf; i++)
            if (label[i])                 # show all labels for this addr
                printf("%24s%s:\n", " ", label[i]);
        ftab_found[$1] = 1;
    }

    # if a CALL, LOOP, or some kind of JMP, show eventual destination
    # of any JMP JMP, and possibly replace number address with string name
    if ($3 ~ CALL_OR_JUMP) {
        if ($4 !~ /FAR/) {
            if ($4 in jmptab)
                jmpline = " -> " get_ftab_name(resolve_jmp_jmp($4));
            $4 = get_ftab_name($4);       # replace number with name
        }
    }

    # cheap replacement of [xxxx] with names from data file
    if (match($0, SQ_BRACK))              # match sets RSTART, RLENGTH
        if ((addr = substr($0, RSTART+1, RLENGTH-2)) in data)
            sub(SQ_BRACK, data[addr], $0); # sub() does substitution

    # get rid of DEBUG segment override ugliness
    if ($3 ~ SEG_OVERRIDE) {
        ovride_addr = $1;                 # save to use on next line
        byte = $2;
        override = $3;
    }
    else if (ovride_addr) {
        $1 = ovride_addr; ovride_addr = "";
        $2 = byte $2;
        sub(/\[/, override "[", $0);      # plug in override:
    }

    # print out (possibly altered) line
    if (! ovride_addr) {
        printf("%s\t%-15s\t", $1, $2);
        for (i=3; i<=NF; i++)
            printf("%s ", $i);
        if (jmpline)
            printf("%s", jmpline);
        printf("\n");
    }
}
# print list of CALL, JMP, etc. references outside disasm range
END {
    if (did_anything) {
        printf("\n;; outside range %s:%04X-%04X:\n", seg, start, stop);
        for (x in outside)
            printf(";; " ((x ~ SEG_OFS) ? "%s" : "%04X") "\n", x);
        # should suppress following if within a not-reached block?
        printf("\n;; possible unresolved labels:\n");
        for (x in ftab)
            if (! (x in ftab_found))
                printf(";; %s\n", ftab[x]);
    }
}
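The JMP-to-JMP resolution that resolve_jmp_jmp() performs in listing 6-6 can be restated compactly. The Python sketch below is a simplified illustration, not a translation: it follows a chain of unconditional JMPs to its final destination and uses a set of visited offsets to stop on cycles, where the AWK code instead caches earlier results in its done array. The function name is hypothetical.

```python
def resolve_jmp_chain(src, jmptab):
    """Follow JMP -> JMP -> ... from src and return the last target reached.

    jmptab maps the offset of each unconditional JMP to its target offset.
    Returns None if src is not itself a JMP.
    """
    if src not in jmptab:
        return None
    seen = {src}
    target = jmptab[src]
    while target in jmptab and target not in seen:
        seen.add(target)                  # guard against cycles
        target = jmptab[target]
    return target


if __name__ == "__main__":
    # a two-step chain: the JMP at 513E leads to a JMP at 5141 (made-up data)
    print(resolve_jmp_chain("513E", {"513E": "5141", "5141": "43ED"}))
```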
With output from DEBUG in INT212F.OUT, a symbol table produced by FTAB in INT212F.LOG, and the optional data file INT212F.DAT, you can produce a nice looking disassembly of the main MSDOS.SYS code segment, INT212F.LST, with:
nicedbg int212f.out int212f.log int212f.dat > int212f.lst
We will examine this INT212F.LST file in more detail momentarily, but the following excerpt provides some idea of what NICEDBG produces:
INT2F_12_18:
FDC8:4282 2E8E1EE73D MOV DS,CS:DOS_DS
FDC8:4287 C5368405 LDS SI,USER_SP
FDC8:428B C3 RET
; ...
INT21_34:
FDC8:4D59 E826F5 CALL INT2F_12_18
FDC8:4D5C C744022103 MOV WORD PTR [SI+02],IN_DOS
FDC8:4D61 8C5410 MOV [SI+10],SS
FDC8:4D64 C3 RET
This is quite usable. You can see that INT 21h AH=34h (Get InDOS Flag Address) calls the code for INT 2Fh AX=1218h (Get Caller's Registers) and then moves DOS_DS:0321 into the caller's ES:BX registers. This is just as you would expect.
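A label such as INT2F_12_18 in this listing can be double-checked against the opcode bytes printed beside it. In 16-bit code a near CALL (E8) or near JMP (E9) carries a 16-bit displacement, and short jumps, conditional jumps, and LOOP instructions carry an 8-bit one; in both cases the displacement is signed and relative to the address of the next instruction. The Python sketch below works through that arithmetic; the function name is made up for this illustration.

```python
def near_target(addr, opcode_hex):
    """Compute the target offset of a relative CALL/JMP/Jcc/LOOP in 16-bit code."""
    raw = bytes.fromhex(opcode_hex)
    next_ip = addr + len(raw)             # displacement is relative to next instruction
    first = raw[0]
    if first in (0xE8, 0xE9) and len(raw) == 3:            # CALL/JMP rel16
        disp = int.from_bytes(raw[1:3], "little", signed=True)
    elif len(raw) == 2 and (first == 0xEB                   # JMP short
                            or 0x70 <= first <= 0x7F        # conditional jumps
                            or 0xE0 <= first <= 0xE3):      # LOOPxx, JCXZ
        disp = int.from_bytes(raw[1:2], "little", signed=True)
    else:
        raise ValueError("not a relative call or jump: " + opcode_hex)
    return (next_ip + disp) & 0xFFFF      # offsets wrap within the segment


if __name__ == "__main__":
    # FDC8:4D59 E826F5 CALL INT2F_12_18
    print(f"{near_target(0x4D59, 'E826F5'):04X}")
```

For the CALL at FDC8:4D59, the next instruction is at 4D5Ch and the displacement is F526h, so the target is 4D5Ch + F526h = 4282h after wrapping to 16 bits, which is where FTAB placed INT2F_12_18.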
You could make this even more readable by going into INT212F.LOG and taking the only partially useful names, such as INT21_34 and INT2F_12_18 produced by FTAB, and replacing them with more evocative names, such as GET_INDOS_34 and GET_STACKPTR_1218. But this is left as an exercise for the reader (who may in any case know all the DOS function numbers by heart and not require such a crutch). The point is simply that you can manually change or add to INT212F.LOG as you discover new functions. For example, you can add the following two functions that you already know about from running INTCHAIN:
FDC8:40F8 INT21_DISPATCH
FDC8:44BD INT2F_DISPATCH
Please note that INT212F.LST is not included on the accompanying disk, as redistributing a large piece of MS-DOS would obviously violate Microsoft's copyright! However, it should be easy for readers to produce their own personal copies, given the instructions in this chapter. Let us quickly summarize the steps involved in producing INT212F.LST:
The last point needs an explanation. Because code and data are intermixed within DOS, DEBUG and SYMDEB are likely to encounter data that they will misinterpret as code. This invalid code can throw off the unassembly of valid code further on in memory. The result is that INT212F.LST may contain, for example, several CALLs to func_9024 but, instead of showing code at offset 9024h, the listing shows some bogus-looking instruction at offset 9023h. NICEDBG will list such possibly unresolved labels at the end of the listing; you can use this to split the DEBUG or SYMDEB u command into two or more parts. For example, let's say that there are valid-looking calls to func_9024, but no func_9024 itself. If the original DEBUG script contained the following command:
u fdc8:4052 b500
you can split this in two, making DEBUG restart unassembly at offset 9024h:
u fdc8:4052 9024
u fdc8:9024 b500
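When several labels are unresolved, the splitting can be mechanized. The following Python sketch, with a hypothetical function name, takes the original range plus a list of offsets at which unassembly should restart and returns the corresponding u commands. It assumes every restart offset really is the start of an instruction, which is something only inspection of the listing can confirm.

```python
def split_unassembly(segment, start, stop, restart_offsets):
    """Split one 'u seg:start stop' command at each offset inside the range."""
    cuts = sorted(o for o in set(restart_offsets) if start < o < stop)
    bounds = [start] + cuts + [stop]
    return [f"u {segment}:{lo:04x} {hi:04x}"
            for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    for cmd in split_unassembly("fdc8", 0x4052, 0xB500, [0x9024]):
        print(cmd)
```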
At this point, of course, you may find the idea of postprocessing DEBUG output a little ridiculous. You may want to switch to a genuine disassembler such as V Communications' Sourcer.
Remember that we've disassembled just one MSDOS.SYS code segment. You can apply the same techniques to other parts of MS-DOS (the outside range list produced by NICEDBG is helpful here), to DR DOS, or to NetWare's NETX code.
Let's look at a small portion of the MS-DOS 6.0 disassembly produced by DEBUG with a little help from FTAB, INTCHAIN, and NICEDBG. Figure 6-18 below shows the code for a few simple DOS functions.
INT21_34:
FDC8:4D59 E826F5 CALL INT2F_12_18
FDC8:4D5C C744022103 MOV Word Ptr [SI+02],0321
FDC8:4D61 8C5410 MOV [SI+10],SS
FDC8:4D64 C3 RET
INT21_52:
FDC8:4D65 E81AF5 CALL INT2F_12_18
FDC8:4D68 C744022600 MOV Word Ptr [SI+02],0026
FDC8:4D6D 8C5410 MOV [SI+10],SS
FDC8:4D70 C3 RET
INT21_1F:
FDC8:4D71 B200 MOV DL,00
INT21_32:
FDC8:4D73 16 PUSH SS
FDC8:4D74 1F POP DS
FDC8:4D75 8AC2 MOV AL,DL
FDC8:4D77 E8415D CALL INT2F_12_19
FDC8:4D7A 7222 JB loc_4D9E
FDC8:4D7C C43EA205 LES DI,[05A2]
FDC8:4D80 26F6454480 TEST Byte Ptr ES:[DI+44],80
FDC8:4D85 7517 JNZ loc_4D9E
FDC8:4D87 E8B003 CALL func_513A
FDC8:4D8A E83749 CALL func_96C4
FDC8:4D8D E8CA03 CALL func_515A
FDC8:4D90 720C JB loc_4D9E
FDC8:4D92 E8EDF4 CALL INT2F_12_18
FDC8:4D95 896C02 MOV [SI+02],BP
FDC8:4D98 8C440E MOV [SI+0E],ES
FDC8:4D9B 32C0 XOR AL,AL
FDC8:4D9D C3 RET
; xref: FDC8:4D7A FDC8:4D85 FDC8:4D90
loc_4D9E:
FDC8:4D9E B0FF MOV AL,FF
FDC8:4DA0 C3 RET
INT21_0D:
FDC8:4DA1 B0FF MOV AL,FF
FDC8:4DA3 16 PUSH SS
FDC8:4DA4 1F POP DS
FDC8:4DA5 E89203 CALL func_513A
FDC8:4DA8 830E110604 OR Word Ptr [0611],+04
FDC8:4DAD E8844C CALL func_9A34
FDC8:4DB0 83261106FB AND Word Ptr [0611],-05
FDC8:4DB5 C706B50D0000 MOV Word Ptr [0DB5],0000
FDC8:4DBB BBFFFF MOV BX,FFFF
FDC8:4DBE 891E2000 MOV [0020],BX
FDC8:4DC2 891E1E00 MOV [001E],BX
FDC8:4DC6 E89103 CALL func_515A
FDC8:4DC9 B8FFFF MOV AX,FFFF
FDC8:4DCC 50 PUSH AX
FDC8:4DCD B82011 MOV AX,1120
FDC8:4DD0 CD2F INT 2F
FDC8:4DD2 58 POP AX
FDC8:4DD3 C3 RET
First off, notice our old friends INT 21h AH=34h and 52h. Except for the clarity of the code displayed in figure 6-18, these hold no surprises for us. The functions are nearly identical. They both get the caller's register structure, and return different values into the caller's BX. Perhaps NICEDBG could be improved to recognize the caller's register structure and, where appropriate (which would be the difficult part) replace expressions such as [SI+02] and [SI+10] with something like CALLER_BX and CALLER_ES. That's for version 2.0!
More interesting is the code that appears next in figure 6-18 for INT 21h functions 1Fh and 32h. These Disk Parameter Block functions have been around for a while, but Microsoft only documented them starting in DOS 5.0. Note that the code for function 1Fh simply sets DL=0 and falls into the code for function 32h. This makes sense, since function 1Fh is Get Default DPB, and function 32h is Get DPB. Get DPB takes a drive number in DL and returns the DPB in DS:BX.
Where does the DPB come from? The Get DPB code calls several subfunctions not shown here, but armed with the NICEDBG output, you can examine the code for each of these subfunctions fairly easily. In essence, INT 21h AH=1Fh and AH=32h call the internal Set Drive function (INT 2Fh AX=1219h), which in turn calls the INT 2Fh AX=1217h function that we examined in figure 6-17. As noted there, this function sets the working Current Directory Structure field at DOS:05A2h (SDA+282h). Note that this is not the same as changing drives; it merely sets up a working area in the DOS data segment. When INT 2Fh AX=1219h has returned, Get DPB pulls the CDS pointer out of the working CDS field where the INT 2Fh function just put it. It then calls a subroutine that gets the DPB pointer from offset 45h in the CDS. Having examined the different subroutines that Get DPB calls, we can decorate the code with comments:
INT21_1F:
FDC8:4D71 B200 MOV DL,00 ; 0 = default drive
; fall through!
INT21_32:
FDC8:4D73 16 PUSH SS
FDC8:4D74 1F POP DS ; get DOS DS
FDC8:4D75 8AC2 MOV AL,DL
FDC8:4D77 E8415D CALL INT2F_12_19 ; Set Drive, like 2f/1217
FDC8:4D7A 7222 JB loc_4D9E
FDC8:4D7C C43EA205 LES DI,[05A2] ; SDA+282h = curr CDS ptr
FDC8:4D80 26F6454480 TEST Byte Ptr ES:[DI+44],80 ; CDS[43-44h] = flags
FDC8:4D85 7517 JNZ loc_4D9E ; if net/redir drive, fail
FDC8:4D87 E8B003 CALL func_513A ; enter crit #1 (2A/8001)
FDC8:4D8A E83749 CALL func_96C4 ; ES:BP get DPB from CDS[45h]
FDC8:4D8D E8CA03 CALL func_515A ; exit crit #1 (2A/8101)
FDC8:4D90 720C JB loc_4D9E ; fail?
FDC8:4D92 E8EDF4 CALL INT2F_12_18 ; get caller's regs
FDC8:4D95 896C02 MOV [SI+02],BP ; caller's BX
FDC8:4D98 8C440E MOV [SI+0E],ES ; caller's DS
FDC8:4D9B 32C0 XOR AL,AL ; al = 0 for success
FDC8:4D9D C3 RET
The final function to examine back in figure 6-18 is INT 21h AH=0Dh (Disk Reset). The function does its real work inside the call to func_9A34 (not shown), which loops over all buffers, calling the internal Flush Buffer function (INT 2Fh AX=1215h). But note in figure 6-18 that Disk Reset also calls INT 2Fh AX=1120h, which is the network redirector Flush All Disk Buffers function. This provides a good illustration of how the network redirector works as a series of hooks in DOS. At various key moments, DOS issues an INT 2Fh AH=11h call; any installed redirector can pick up the call and do what it needs (see chapter 8).
One of the things that probably isn't clear from the DOS code shown in this chapter, but which becomes clear from examining the INT212F.LST file, is that hooks play an important role in DOS. In addition to the INT 2Fh AH=11h redirector interface, DOS also checks the SHARE hooks. These, however, are implemented in a totally different manner from the redirector (see SHARHOOK.C at listing 8-XX in chapter 8). Of course, many DOS functions get passed down to installable device drivers; the DOS code calls these drivers using the Strategy and Interrupt pointers in the device driver header (see chapter 7).
Remember also that external programs probably hook many of these DOS calls. You saw earlier, for example, that SMARTDRV and DBLSPACE hook the Disk Reset call. Thus, it is a little misleading to view the INT 21h AH=0Dh handler in MSDOS.SYS in isolation. When examining the code for a DOS function, it is important to remember that DOS isn't just the code in MSDOS.SYS and IO.SYS, but it is the sum total of the interactions of this code with all the DOS extensions you are likely to find on a user's machine. This not only means understanding the role of programs such as Windows, SMARTDRV, MSCDEX, DOSKEY, and DBLSPACE, but also understanding where non-Microsoft programs such as Stacker, NetWare, and 386MAX come in. A good example of this, as we saw in chapter 4, is the way that the trivially-simple Set PSP function suddenly takes on new meaning and complexity when Novell NetWare is running.
As a more extensive, but still relatively self-contained, example, let's examine the DOS Move File Pointer function (INT 21h AH=42h), frequently known after its C/Unix equivalent as lseek. We had occasion to examine the DOS code for this function while working on chapter 8 of this book. An early draft of the network-redirector specification in chapter 8, in discussing the redirector INT 2Fh AX=1121h Seek From End function, asserted that "DOS never calls this function." Since this was based merely on empirical evidence (we had never seen 2F/1121 called), it made sense to examine the DOS code to verify that DOS did not contain a call to INT 2Fh AX=1121h.
To our surprise, the DOS code for lseek did contain a call to this INT 2Fh function. It turns out that DOS only calls the redirector's Seek From End function under a special set of circumstances having to do with network FCBs and various SHARE modes. Frankly, we still don't quite understand this. In any case, the rest of the code for INT 21h AH=42h is fairly straightforward, yet long enough to be a little more interesting than the feeble little examples we've seen so far. In addition, there is some interesting Windows-related code in DOS that we'll encounter along the way.
Before we examine the disassembly listing for INT 21h AH=42h, recall that the function has the following specification:
Move File Pointer
Input:
AH = 42h
AL = method (0 = from beginning; 1 = from current pos; 2 = from end)
BX = file handle
CX:DX = hi:lo offset from beginning, current, or end
INT 21h
Output success:
Carry clear
DX:AX = new hi:lo position
Output failure:
Carry set
AX = error value (1 = invalid function; 6 = invalid handle)
Microsoft's DOS programmer's reference further notes that "A program should never attempt to move the file pointer to a position before the start of the file. Although this action does not generate an error during the move, it does generate an error on a subsequent read or write operation. A program can move the file position beyond the end of the file. On a subsequent write operation, MS-DOS writes data to the given position in the file, filling the gap between the previous end of the file and the given position with undefined data. This is a common way to reserve file space without writing to the file."
This tends to suggest that almost any CX:DX parameters to lseek are valid. Indeed, as we're about to see, the code does little more than move the CX:DX parameter into the file's SFT entry. The hard part is getting the SFT entry. To make sense of the code listing, you'll need to know the following offsets in the SFT (for further information, see the appendix under INT 21h AH=52h):
02h WORD open mode
05h WORD device info word
11h DWORD file size
15h DWORD current file position
2Fh WORD machine number (Windows VM ID)
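To make the offsets concrete, here is a small Python sketch that pulls these five fields out of a raw SFT entry held in a byte buffer. It is illustrative only: the field names are informal, the buffer is assumed to have been copied out of DOS memory by some other means, and only the offsets and sizes listed above are taken from the text.

```python
import struct

# field name -> (offset within the SFT entry, little-endian struct format)
SFT_FIELDS = {
    "open_mode": (0x02, "<H"),            # WORD
    "dev_info":  (0x05, "<H"),            # WORD
    "file_size": (0x11, "<I"),            # DWORD
    "file_pos":  (0x15, "<I"),            # DWORD
    "machine":   (0x2F, "<H"),            # WORD, Windows VM ID
}


def read_sft_fields(entry):
    """Decode the fields used by the lseek code from one raw SFT entry."""
    return {name: struct.unpack_from(fmt, entry, off)[0]
            for name, (off, fmt) in SFT_FIELDS.items()}


if __name__ == "__main__":
    raw = bytearray(0x40)                 # zero-filled stand-in for a real entry
    struct.pack_into("<I", raw, 0x11, 1000)
    print(read_sft_fields(raw))
```

The instructions in the listing that follows, such as ADD DX,ES:[DI+15] and ADC CX,ES:[DI+17], address the low and high words of these DWORD fields separately.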
Figure 6-20 shows the DOS code for INT 21h AH=42h (Move File Pointer). Many explanatory comments were added by hand to the code generated by NICEDBG.
; xref: FDC8:50D5 FDC8:9D52 FDC8:9DC1 FDC8:9E9C
INT21_42:
FDC8:A845 E8E100 CALL func_A929 ; TURNS BX HANDLE INTO
; ES:DI SFT (see fig. 6-21)
; xref: FDC8:A8B4
loc_A848:
FDC8:A848 7302 JNB loc_A84C
FDC8:A84A EB9E JMP jmp_A7EA -> loc_43ED ; couldn't: fail!
; xref: loc_A848
loc_A84C: ; ES:DI=valid SFT entry
FDC8:A84C 3C02 CMP AL,02 ; which move method?
FDC8:A84E 760A JBE loc_A85A
FDC8:A850 36C606230301 MOV Byte Ptr SS:[0323],01 ; SDA+3=error locus
FDC8:A856 B001 MOV AL,01 ; 1=invalid function
; note many jmp jmp in DOS code:
; A858 -> A7EA -> A7D8 -> A7D4 -> A716 -> A6FB -> 43ED
; usually to use short jmp, but is it still worth it?
; but can it ever be changed??
; xref: jmp_A8AB
jmp_A858:
FDC8:A858 EB90 JMP jmp_A7EA -> loc_43ED ; fail!
; xref: FDC8:A84E
loc_A85A:
FDC8:A85A 3C01 CMP AL,01
FDC8:A85C 720A JB loc_A868 ; below = 0
FDC8:A85E 771B JA loc_A87B ; above = 2
; handling seek method #1: from current pos
FDC8:A860 26035515 ADD DX,ES:[DI+15] ; SFT->file_pos
FDC8:A864 26134D17 ADC CX,ES:[DI+17]
; fall through to method #0
; xref: FDC8:A85C FDC8:A88A
loc_A868: ; #0: from beginning
FDC8:A868 8BC1 MOV AX,CX
FDC8:A86A 92 XCHG AX,DX ; DX:AX <- CX:DX
; xref: FDC8:A8A9
loc_A86B:
FDC8:A86B 26894515 MOV ES:[DI+15],AX ; update SFT->file_pos
FDC8:A86F 26895517 MOV ES:[DI+17],DX
FDC8:A873 E8FF99 CALL INT2F_12_18 ; get caller's regs
FDC8:A876 895406 MOV [SI+06],DX ; move into caller's DX
;;; later on, loc_43FD does MOV [SI], AX
;;; see table 6-2 for caller reg struct
; xref: jmp_A8EF
jmp_A879:
FDC8:A879 EBA7 JMP jmp_A822 -> loc_43E4 ; success!
; xref: FDC8:A85E
loc_A87B: ; #2: from end
FDC8:A87B 26F6450680 TEST Byte Ptr ES:[DI+06],80 ; dev info: NETWORK
FDC8:A880 750A JNZ loc_A88C
; xref: FDC8:A891 FDC8:A8A2
loc_A882:
FDC8:A882 26035511 ADD DX,ES:[DI+11] ; SFT->file_size
FDC8:A886 26134D13 ADC CX,ES:[DI+13] ; CX:DX += file_size
FDC8:A88A EBDC JMP loc_A868 ; go to method #0
; xref: FDC8:A880
loc_A88C: ; this is a network drive!
;;; This is seek method #2 (from end of file), and network bit is set
;;; in SFT. DOS may call a network redirector's 2F/1121 Seek From End
;;; handler, but only if some strange conditions are met: it can't
;;; be an FCB open, and certain SHARE bits must be set.
FDC8:A88C 26F6450380 TEST Byte Ptr ES:[DI+03],80 ; open mode: FCB!
FDC8:A891 75EF JNZ loc_A882 ; an FCB open
;;; this is not an FCB open ;;;
FDC8:A893 268B4502 MOV AX,ES:[DI+02] ; open mode
FDC8:A897 25F000 AND AX,00F0
FDC8:A89A 3D4000 CMP AX,0040 ; OPEN_SHARE_DENYNONE
FDC8:A89D 7405 JZ DO_2F_1121 ; redir seek from end
FDC8:A89F 3D3000 CMP AX,0030 ; OPEN_SHARE_DENYREAD
FDC8:A8A2 75DE JNZ loc_A882 ; no: update caller's regs
; xref: FDC8:A89D
DO_2F_1121:
FDC8:A8A4 B82111 MOV AX,1121 ; Call network redirector's
FDC8:A8A7 CD2F INT 2F ; Seek from End function
FDC8:A8A9 73C0 JNB loc_A86B ; update caller's DX:AX from SFT
; xref: jmp_A8F9
jmp_A8AB:
FDC8:A8AB EBAB JMP jmp_A858 -> loc_43ED ; fail!
sft = get_sft(handle) // see below
if (seek from begin) then set sft->file_pos = new_pos
if (seek from end) then (signed) new_pos += file_size; goto seek from begin
if (seek from current) then new_pos += sft->file_pos; goto seek from begin
set caller's new_pos (DX:AX) = sft->file_pos
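To make the arithmetic in this pseudocode concrete, here is a minimal Python sketch of the three seek methods. It is only a model of the listing, not DOS code: the SftEntry class and the lseek function name are hypothetical, the network-redirector path (INT 2Fh AX=1121h) is left out entirely, and the 32-bit mask stands in for the wraparound of the ADD/ADC pair.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF  # CX:DX and the SFT fields are 32-bit quantities


@dataclass
class SftEntry:
    """Hypothetical stand-in for the two SFT fields the handler touches."""
    file_pos: int = 0    # dword at SFT+15h
    file_size: int = 0   # dword at SFT+11h


def lseek(sft, method, offset):
    """Return (carry, value), mirroring the DOS carry-flag convention.

    carry=False: value is the new file position (what DX:AX would hold).
    carry=True:  value is a DOS error code.
    """
    if method not in (0, 1, 2):
        return True, 1                        # AL=01, "invalid function"
    offset &= MASK32                          # treat CX:DX as unsigned 32-bit
    if method == 1:                           # from current position
        offset = (offset + sft.file_pos) & MASK32
    elif method == 2:                         # from end of file
        offset = (offset + sft.file_size) & MASK32
    sft.file_pos = offset                     # method 0: store as-is
    return False, sft.file_pos
```

Note that a "negative" offset such as -1 with method 2 works only because of the 32-bit wraparound; the handler itself never does a signed comparison.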
We haven't yet explained the very first line of the INT 21h AH=42h handler, however, where DOS calls a subroutine, here called func_A929, to turn the caller's BX file handle into an SFT entry in ES:DI. This subroutine is shown in figure 6-21 below. The code for func_A929 turns out to be very interesting, because it shows some of MS-DOS's interaction with Windows. As indicated in the xref generated by NICEDBG, this same subroutine is also called by other parts of DOS, including the code for functions 3Eh and 68h:
; xref: INT21_3E INT21_68 FDC8:A7E5 INT21_42 FDC8:A8B1 FDC8:A907
func_A929:
; func_A62A turns BX handle
FDC8:A929 E8FEFC CALL func_A62A ; into ES:DI SFT (fig. 6-22)
FDC8:A92C 721C JB ret_A94A ; percolate error up
; valid handle, but it could be for another DOS box!
FDC8:A92E 50 PUSH AX
FDC8:A92F 36F606301001 TEST Byte Ptr SS:IN_WIN3E,01
FDC8:A935 7404 JZ loc_A93B
FDC8:A937 33C0 XOR AX,AX
FDC8:A939 EB08 JMP loc_A943
; xref: FDC8:A935
loc_A93B: ; Windows running
FDC8:A93B 36A13E03 MOV AX,SS:MACHINE_ID
FDC8:A93F 263B452F CMP AX,ES:[DI+2F] ; SFT->share_machine
; xref: FDC8:A939
loc_A943: ; okay
FDC8:A943 58 POP AX
FDC8:A944 7501 JNZ loc_A947
FDC8:A946 C3 RET
; xref: FDC8:A944
loc_A947: ; failure
FDC8:A947 B006 MOV AL,06 ; "invalid handle"
FDC8:A949 F9 STC
; xref: FDC8:A92C
ret_A94A:
FDC8:A94A C3 RET
This code deals with the fact that, under Windows Enhanced mode, it is possible to have multiple processes in different DOS boxes that happen to have the same PSP ID (though note that SYSTEM.INI has a UniqueDOSPSP= setting). Normally, the current PSP and a file handle are sufficient to specify an open file. Under Windows Enhanced mode, the current virtual machine (VM) ID is also needed to specify an open file.
In this subroutine, DOS (a) checks if Windows Enhanced mode is running (see chapter 1 to see how DOS initially sets the IN_WIN3E flag); (b) gets the current VM ID (see chapter 1 to see how the DOSMGR VxD patches DOS's MACHINE_ID word with the current VM ID); and (c) compares the current VM ID against the machine ID field at offset 2Fh in the SFT. If the SFT's machine ID doesn't match the current VM, lseek fails with error code 6, as if the handle in BX were invalid. It wasn't invalid per se, but it belonged to another process that happened to have the same PSP in another DOS box.
We still haven't seen, though, how DOS turns a file handle in BX into an SFT entry in ES:DI. This is accomplished by func_A62A in figure 6-22 below, which turns the BX handle (which is really an index into the current PSP's Job File Table) into a JFT pointer (equivalent to INT 2Fh AX=1220h), then turns the JFT pointer into an SFT index, and then turns the SFT index into an SFT entry (equivalent to INT 2Fh AX=1216h). The disassembly below starts off with DOS's INT 2Fh AX=1220h handler; func_A62A appears in the middle of the listing.
; xref: FDC8:4F01 func_A62A loc_A671 loc_A6EA loc_A7DD FDC8:A90F FDC8:A924
INT2F_12_20:
FDC8:A60D 2E8E06D73D MOV ES,CS:DOS_DS ; get DOS_DS
FDC8:A612 268E063003 MOV ES,ES:CURR_PSP ; use current PSP
FDC8:A617 263B1E3200 CMP BX,ES:[0032] ; # files in JFT
FDC8:A61C 7204 JB loc_A622
FDC8:A61E B006 MOV AL,06 ; invalid handle
; xref: FDC8:A637
loc_A620: ; fail
FDC8:A620 F9 STC
FDC8:A621 C3 RET
; xref: FDC8:A61C
loc_A622: ; file handle < # files
FDC8:A622 26C43E3400 LES DI,ES:[0034] ; JFT ptr in PSP
FDC8:A627 03FB ADD DI,BX ; add on BX handle
; xref: FDC8:A62D
ret_A629:
FDC8:A629 C3 RET ; return ptr -> SFT ndx
;;; code to turn handle in BX into SFT entry in ES:DI ;;;
; xref: FDC8:4EDC INT21_4400_01 INT21_4402_03 FDC8:61DD INT21_440A \
; FDC8:757B func_A929 FDC8:B27B
func_A62A:
FDC8:A62A E8E0FF CALL INT2F_12_20 ; turn BX handle->ES:DI JFT
FDC8:A62D 72FA JB ret_A629
FDC8:A62F 26803DFF CMP Byte Ptr ES:[DI],FF ; unused!
FDC8:A633 7504 JNZ loc_A639
FDC8:A635 B006 MOV AL,06 ; invalid handle
FDC8:A637 EBE7 JMP loc_A620 ; fail
; xref: FDC8:A633
loc_A639:
FDC8:A639 53 PUSH BX
FDC8:A63A 268A1D MOV BL,ES:[DI] ; JFT entry -> SFT index
FDC8:A63D 32FF XOR BH,BH
FDC8:A63F E80200 CALL INT2F_12_16 ; SFT index -> SFT ES:DI
FDC8:A642 5B POP BX
FDC8:A643 C3 RET
; xref: FDC8:6DF1 FDC8:A516 FDC8:A63F FDC8:A686
INT2F_12_16: ; SFT ndx -> ES:DI SFT
FDC8:A644 2E8E06D73D MOV ES,CS:DOS_DS ; get DOS DS
FDC8:A649 26C43E2A00 LES DI,ES:[002A] ; SysVars+4 -> first SFT
; xref: FDC8:A65E
loc_A64E: ; walk SFT chain
FDC8:A64E 263B5D04 CMP BX,ES:[DI+04] ; SFT # files
FDC8:A652 720E JB loc_A662 ; in this table!
FDC8:A654 262B5D04 SUB BX,ES:[DI+04] ; subtract #files this SFT
FDC8:A658 26C43D LES DI,ES:[DI] ; follow linked list
FDC8:A65B 83FFFF CMP DI,-01 ; end of SFTs?
FDC8:A65E 75EE JNZ loc_A64E ; loop to next SFT
FDC8:A660 F9 STC ; invalid SFT index
FDC8:A661 C3 RET ; fail!
; xref: FDC8:A652
loc_A662: ; in this SFT
FDC8:A662 50 PUSH AX
FDC8:A663 B83B00 MOV AX,003B ; size of each SFT entry
FDC8:A666 F6E3 MUL BL
FDC8:A668 03F8 ADD DI,AX ; offset of this entry
FDC8:A66A 58 POP AX
FDC8:A66B 83C706 ADD DI,+06 ; skip past SFT header
FDC8:A66E C3 RET
The basic sequence here is: BX handle -> JFT entry (2F/1220) -> SFT ndx -> SFT entry (2F/1216).
Recall that the file handle in BX is really an index into the current PSP's JFT. Thus, the code for INT 2Fh AX=1220h gets the current PSP from the familiar global DOS variable, and checks PSP:0032 (which holds the maximum number of file handles available to this PSP). If the handle in BX is less than the file-handle maximum (i.e., the JFT size), then this code gets a far pointer to the JFT from PSP:0034 and adds BX onto the JFT pointer, yielding a far pointer in ES:DI to the file's JFT entry.
Each JFT entry is a single byte that holds an index into the SFT, or FFh to indicate an unused entry. The code above ensures that the caller hasn't passed in a file handle whose corresponding JFT entry is unused.
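The two checks described so far (handle within the JFT, and JFT slot in use) can be sketched in a few lines of Python. This is a model under stated assumptions, not DOS code: the function name handle_to_sft_index is hypothetical, a Python bytes object stands in for the table that PSP:0034 points to, and its length stands in for the count at PSP:0032.

```python
UNUSED_ENTRY = 0xFF          # JFT byte value for an unused slot
ERROR_INVALID_HANDLE = 6     # the AL=06 value loaded in the listing


def handle_to_sft_index(jft, handle):
    """Return (carry, value), mirroring the DOS carry-flag convention.

    carry=False: value is the SFT index stored in the JFT slot.
    carry=True:  value is the DOS error code.
    """
    if not 0 <= handle < len(jft):
        return True, ERROR_INVALID_HANDLE    # handle >= number of JFT slots
    sft_index = jft[handle]
    if sft_index == UNUSED_ENTRY:
        return True, ERROR_INVALID_HANDLE    # slot exists but is not open
    return False, sft_index
```

Both failures yield the same error code, so a caller cannot distinguish an out-of-range handle from a closed one.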
If DOS has a valid SFT index, it passes it to a function (equivalent to INT 2Fh AX=1216h), which returns a pointer to the corresponding SFT entry. From the listing above, we can see how this code works: DOS gets a pointer to the SFT from SysVars+4, and walks the SFT chain, comparing the SFT index against the number of files in each SFT until it finds the right one. DOS then multiplies the remaining SFT index by 3Bh (the size of an SFT entry) and adds it onto the start of this SFT, to form an SFT entry.
That's it. We've now examined the DOS code for lseek in its entirety. We've seen how the specification for INT 21h AH=42h is actually implemented in working code, how DOS gets from a file handle in BX to an SFT entry in ES:DI, and how it can use this SFT to get and set the current file position and size, and also to check the Windows VM ID. But remember that this is DOS, so it is possible and even likely that some important third-party extensions such as NetWare hook the lseek function. Our disassembly of the DOS kernel of course neglects to deal with whatever changes these might make to the behavior of lseek.
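The chain walk is easy to lose in the segment-register noise, so here is a hedged Python sketch of the same logic. The SftBlock class and the sft_entry_offset function are hypothetical names, and a single integer "base" replaces the real segment:offset far pointer; only the constants 3Bh and 6 come from the listing.

```python
from dataclasses import dataclass
from typing import Optional

SFT_ENTRY_SIZE = 0x3B    # bytes per entry, the multiplier in the listing
SFT_HEADER_SIZE = 6      # dword link to next block + word entry count


@dataclass
class SftBlock:
    base: int                            # simplified flat address of the block
    count: int                           # word at block+4: entries in this block
    next: Optional["SftBlock"] = None    # None models the FFFFh end marker


def sft_entry_offset(first, index):
    """Return the address of SFT entry `index`, or None if it is out of range."""
    block = first
    while block is not None:
        if index < block.count:          # entry lives in this block
            return block.base + SFT_HEADER_SIZE + index * SFT_ENTRY_SIZE
        index -= block.count             # skip this block's entries
        block = block.next               # follow the linked list
    return None                          # the listing sets carry here
```

The index is global across the whole chain, which is why each block's count is subtracted before moving on.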
We have only presented a fairly random selection of extremely simple DOS functions, viewed in isolation from key third-party DOS extensions. Just to properly discuss this simple DEBUG disassembly of 30 kbytes of DOS code would require an entire book. In fact, properly explaining each function, examining its interactions with resident software such as SmartDrv, Windows, and NetWare could easily be the subject of several books. For further in-depth discussions of this code, see Chappell's "DOS Internals" and Mike Podanoffsky's "DOS: The Source" (this forthcoming book is described in more detail below).
As noted earlier, NICEDBG places an "outside range" list at the end of a disassembly listing. This list indicates locations that are called or jumped to in the listing, but which don't themselves appear in the listing. This list provides additional addresses for unassembly by DEBUG or SYMDEB.
For example, the disassembly of the MSDOS.SYS code segment includes the function INT2F_DISPATCH. As you know from the earlier investigation in figure 6-13, the INT 2Fh handler in MSDOS.SYS jumps to the handler in IO.SYS. Here is how this shows up in the INT212F.LST file produced by NICEDBG:
; xref: FDC8:44DA FDC8:462F FDC8:4687 FDC8:46E0
jmp_44DF:
FDC8:44DF EA05007000 JMP 0070:0005
; ...
;; outside-range FDC8:4045-B800:
;; 0070:0005
; ...
You can use this one address, 0070:0005, as the starting point for a disassembly of the IO.SYS code:
C:\UNDOC2\CHAP6>symdeb
-u 0070:0005 0005
0070:0005 EA93087000 JMP 0070:0893
-u 0070:0893 0893
0070:0893 2EFF2EE606 JMP FAR CS:[06E6]
-dd 0070:06e6 06e6
0070:06E6 FFFF:1302
-u ffff:1302
FFFF:1302 80FC13 CMP AH,13
FFFF:1305 7413 JZ 131A
FFFF:1307 80FC08 CMP AH,08
FFFF:130A 743B JZ 1347
FFFF:130C 80FC16 CMP AH,16
FFFF:130F 7479 JZ 138A
FFFF:1311 80FC4A CMP AH,4A ;'J'
FFFF:1314 7503 JNZ 1319
FFFF:1316 E9A700 JMP 13C0
FFFF:1319 CF IRET
-q
C:\UNDOC2\CHAP6>type io.scr
u ffff:1302 1319
q
C:\UNDOC2\CHAP6>symdeb /x < io.scr > io.out
C:\UNDOC2\CHAP6>nicedbg io.out > io.lst
C:\UNDOC2\CHAP6>type io.lst
; ....
;; outside range FFFF:1302-1319:
;; 131A
;; 1347
;; 138A
;; 13C0
Now, of course, we expand the unassembly range for SYMDEB, based on the addresses in the outside range list. Also, we can start to create a file with symbolic names.
We continue in this way until no unresolved references remain. As noted earlier, sometimes DEBUG and SYMDEB get thrown off track because of data residing in the middle of a code segment. Based on the NICEDBG "unresolved label" list, you may need to split a single u command in a DEBUG script into two or more separate u commands.
Of course, the techniques shown here for disassembly in memory of MSDOS.SYS and IO.SYS also work for any other resident software. In figure 6-11, for example, we saw SMARTDRV, MSCDEX, DOSKEY, SHARE, PRINT, COMMAND.COM, and so on, all camped out on the INT 2Fh chain. You can submit any of the addresses displayed by INTCHAIN to DEBUG or SYMDEB for disassembly, and process the resulting output with NICEDBG.
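The idea behind an "outside range" list can be sketched in a few lines of Python. This is not how NICEDBG itself is implemented (its source is not shown here); it is a simplified illustration that assumes DEBUG/SYMDEB-style lines of the form "SEG:OFF bytes MNEMONIC operands", treats any mnemonic starting with J, plus CALL and LOOP, as a branch, and ignores indirect targets such as JMP FAR CS:[06E6]. The function name outside_range is hypothetical.

```python
import re

# "FFFF:1305 7413 JZ 131A" -> segment, offset, mnemonic, operand text
LINE = re.compile(r"^([0-9A-F]{4}):([0-9A-F]{4})\s+[0-9A-F]+\s+(\S+)\s*(.*)$")
# A direct target: "131A" (near) or "0070:0005" (far)
TARGET = re.compile(r"^(?:([0-9A-F]{4}):)?([0-9A-F]{4})$")


def outside_range(listing):
    """Return sorted (segment, offset) branch targets absent from the listing."""
    seen = set()
    targets = set()
    for raw in listing.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue                       # prompts, comments, blank lines
        seg, off, mnemonic, operand = m.groups()
        seen.add((seg, off))
        if mnemonic.startswith("J") or mnemonic in ("CALL", "LOOP"):
            operand = operand.split(";")[0].strip()   # drop ";'J'" comments
            t = TARGET.match(operand)
            if t:
                targets.add((t.group(1) or seg, t.group(2)))
    return sorted(targets - seen)
```

Each address returned becomes a starting point for another u command; repeating until the function returns an empty list is the "continue in this way" loop described above.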
However, it is much easier to disassemble separate programs such as SMARTDRV, MSCDEX, COMMAND, and PRINT on disk rather than in memory, because these programs don't involve the segment-moving contortions of the DOS kernel. PRINT in particular is probably the most-disassembled piece of DOS, as this was how many TSR writers learned their craft. You can use a disassembler such as Sourcer to examine these programs.
Given the ability to reverse engineer DOS, an almost infinite amount of information on DOS programming is readily available, at your fingertips: to answer some question about DOS, look at the code running on your machine. But one obvious problem with this approach is that what is true in one configuration may not be true in another. Applications patch DOS; DOS changes (though not much, in truth) from one version to the next. Describing software based on its source code (whether supplied or disassembled) can either be the only accurate way to find out what the software really does, or it can be dangerous, relying on features that may change. There are no certainties here. Your best bet is to examine the source code, but to realize how it may change, either because of future versions, or because of unforeseen interactions with other software.
[This was written in 1993, long before the author became an attorney, and in any case should not be construed as legal advice.]
Among many programmers there seem to be some doubts about the legality of what we've been doing in this chapter. Programmers frequently think that disassembling Microsoft's code is illegal, and even that it is somehow a full-blown criminal (rather than civil) offense, punishable by a stiff prison sentence! We had better look into this now.
The following discussion of the legalities of disassembly was not written by an attorney, and should not in any way be viewed as legal advice. However, I have benefited enormously from discussions with Gene K. Landy, a partner at the law firm of Shapiro, Israel & Weiner, P.C. in Boston. Any errors and misconceptions of course remain mine.
Landy is the author of a superb book/disk package, The Software Developer's and Marketer's Legal Companion, published by Addison-Wesley (1993), which includes several extremely useful discussions of reverse engineering. Chapter 1 discusses reverse engineering in the context of copyright, including the important Sega v. Accolade case. Chapter 2 discusses software trade secrets and confidentiality agreements. Chapter 11 covers shrink-wrap licenses and warranties, and the standard shrink-wrap license limitation on reverse engineering, noting the important case of Vault v. Quaid. This is a fine book that every software developer will want to have in these troubled, legally complex times.
Why the typical programmer's idea that you can wind up behind bars just for having seen the CLI instruction at the beginning of the INT 21h dispatch code? Quite simply, because the standard license agreement that comes with all Microsoft products states, as plain as day:
OTHER RESTRICTIONS.... You may not reverse engineer, decompile, or disassemble the software.
The very top of the license agreement states that, "This is a legal agreement between you (either an individual or an entity) and Microsoft Corporation. By opening the sealed software packet(s) you are agreeing to be bound by the terms of this Agreement."
Well, that settles it, doesn't it? If you use any Microsoft software, you have entered into a binding legal agreement not to disassemble it, even if disassembly were otherwise a legitimate activity, right?
No. Attorneys have long questioned whether shrink-wrap licenses are binding, because of the mechanism they use. The few court cases that have decided issues of shrink-wrap licenses have spread further doubt about their effectiveness. As Landy explains in his chapter on shrink-wrap licenses,
The central concept of a shrink wrap license is its system of acceptance or rejection: If you accept the contract, you tear open the envelope; if you reject it, you return the package for a refund. But does this "tear open" concept work? Does the law really allow the licensor to force the user to this choice? ...
A fundamental idea in contract law, from its eighteenth century roots to the present, is the bargain -- what lawyers sometimes call a "meeting of the minds." In a classic contract, the terms are bargained out, then the sale takes place as agreed. While the sale of goods in all states (except Louisiana) is now governed by a state statute, the Uniform Commercial Code, the same concept has carried over. A contract and its terms are agreed before or at time of the sale. The problem with the Shrink Wrap License is that the retail software sale is over and done with before the customer is presented with the one-sided terms of the Shrink Wrap license. After the sale is already made, it is too late to try to impose adverse terms.
Similarly, Raymond T. Nimmer's excellent textbook, The Law of Computer Technology, notes that "The attempt to alter the expectations of the common purchaser by virtue of a printed form included within the product package is unlikely to be successful."
How about the specific shrink-wrap license limitation against disassembly and reverse engineering? A number of important cases have held that shrink-wrap or tear-me-open license agreements cannot be used to outlaw reverse engineering. Both Landy's book and Nimmer's discuss the important case of Vault v. Quaid (1987-1988). The state of Louisiana had enacted special legislation to validate various aspects of shrink-wrap licenses, including the restriction on reverse engineering. Vault (a California corporation) took Quaid (a Canadian corporation) to court in Louisiana to try to take advantage of this exceptional law. Unfortunately for Vault, but fortunately for those who think that disassembly is an important consumer right, the court ruled that the Louisiana statute was preempted by federal law.
So Microsoft's shrink-wrap license limitation against disassembly probably isn't worth the paper it's printed on.
How about the law of "trade secrets"? To begin with, reverse engineering is actually one of the few legitimate ways to discover a trade secret. The Uniform Trade Secrets Act (UTSA), adopted in the mid-1980s by almost all states, says explicitly that discovery through reverse engineering is a proper means of gaining access to non-patented trade secrets. Choosing one of the many books on intellectual property more or less at random, we find (Roger E. Schechter, Unfair Trade Practices and Intellectual Property, pp. 135-136, italics added):
REVERSE ENGINEERING IS NOT IMPROPER MEANS
Many products are manufactured pursuant to plans or with technologies that are trade secrets and then sold to the public at large. In some cases the method of manufacture of these items may be discovered by careful study of the object. Typical methods of discovery include taking the product apart or performing experiments on it. This process of analysis is usually called "reverse engineering." Numerous cases hold that reverse engineering is not an improper means of learning a trade secret. Risk of discovery by reverse engineering is a risk that a firm takes when it chooses to rely on trade secret protection for a valuable commercial asset. Note that if a firm secures patent protection for a new device or manufacturing process it is protected against "reverse engineering." This is one of the most important differences between patent and trade secret protection.
Given that MS-DOS is not patented (the two patent numbers, 4,955,066 and 5,109,433, in the front of all Microsoft's manuals are for data compression, as used for example in Microsoft's help compilers), it then all seems to be quite straightforward: as far as trade secret law is concerned, reverse engineering is okay. The rationale here is that trade secret law is basically about the loyalty of employees or others who receive important business information in confidence. You violate trade secret law by committing, inducing or exploiting violations of trust. One does not violate anyone's trust by disassembling a product purchased on the open market.
So far, the shrink-wrap license statement against disassembling seems ineffective, and trade secrets law says disassembly is okay. What about the fact that MS-DOS is copyrighted? Does copyright law permit us to study how DOS works internally, and then build products based on this new-found knowledge? For example, does it violate Microsoft's copyright to figure out how IO.SYS preloads DBLSPACE.BIN in MS-DOS 6, and then write a replacement for DBLSPACE.BIN that supports the same interface?
Disassembly is sometimes regarded as a form of copying (translation from one medium to another, or one language to another), and therefore as possible copyright infringement. However, disassembly for the purposes of achieving compatibility is generally regarded as "fair use." An important decision by the Court of Appeals for the Ninth Circuit in Sega v. Accolade (August 1992), overturning a lower court's ruling, held that Accolade's use of knowledge reverse-engineered from the Sega Genesis system did not violate Sega's copyright and constituted fair use. According to the court (as quoted in UNIX Review, May 1993),
We conclude that where disassembly is the only way to gain access to the ideas and functional elements embodied in a copyrighted computer program and where there is a legitimate reason for seeking such access, disassembly is a fair use of the copyrighted work, as a matter of law.
The importance of Sega v. Accolade was underlined in a comment in Microprocessor Report (December 9, 1992): "For the industry, many can breathe a deep sigh of relief. No longer are we unwitting copyright violators because we need to understand the parameters to an undocumented `Int 21' call."
Naturally, not all members of the industry breathed a sigh of relief upon hearing the appeals court's ruling. In particular, a group calling itself the Business Equipment Manufacturers, which includes IBM, Intel, and Microsoft, is seeking stronger protection against reverse engineering. Arguing for greater protection for reverse engineering is the so-called American Committee for Interoperable Systems, which includes Sun Microsystems, Amdahl, and Chips & Technologies (see "Reverse Engineering Reversals," Upside, May 1993).
If disassembly for the purposes of achieving compatibility is okay (and this, by the way, is also true in Europe under article 6 of the EC's directive on software protection), then how about this book's quotations from disassembly listings? Have we violated Microsoft's copyright by reprinting several chunks of code from MS-DOS and Windows in this book?
Again, no. For purposes of copyright, computer programs are considered to be "literary works." While it is a bogus notion that a compiled program without its source code merits being called a literary work, if the phrase "literary work" means anything at all in the context of computer software, it must include the possibility for literary criticism. Our inclusion of brief excerpts from disassembly listings is essentially a form of scholarly quotation, which is one of the oldest forms of fair use (see William S. Strong, The Copyright Book, 4th edition, chapter 8).
Remember too that throughout this chapter we have relied on DEBUG, a tool which Microsoft provides with every copy of MS-DOS. Microsoft has made no effort to secure MS-DOS against disassembly, especially given DEBUG's ability to trace into an INT 21h or INT 2Fh call.
Is there any alternative to disassembly? One alternative is of course to rely entirely on the vendor's documentation and not consider whether this documentation is an accurate reflection of the actual software. But as the reader has probably figured out by now, relying on vendor documentation has as many risks as does relying on undocumented behavior that has been discerned through disassembly.
Depending on what you are interested in, there may be another, better alternative to disassembly: source code.
For example, programmers' questions sometimes aren't really about how the operating system behaves in a certain circumstance, but about what their compiler's run-time library (RTL) does. There is a persistent confusion among many programmers about the difference between a FILE* in C and a DOS file handle. Programmers often call the DOS Set Handle Count function (INT 21h AH=67h) and then wonder why the C fopen() function still fails. Confusion such as this can be cleared up by a careful study of the RTL source code. Both Microsoft C and Borland C++ come with RTL source code.
Sometimes, rather than having specific questions about MS-DOS, programmers are just curious about how operating systems work in general. In this case, the best approach is probably to study one of the several excellent books available on the design and implementation of UNIX. Some of these, such as Bach's Design of the UNIX Operating System and Andleigh's UNIX System Architecture, present detailed pseudocode for UNIX. Others, such as Tanenbaum's wonderful Operating Systems: Design and Implementation (MINIX) and Comer's Operating System Design: The XINU Approach, come with complete source code for UNIX workalikes. Despite the numerous differences between DOS and UNIX, these books should be required reading for anyone planning to delve into DOS internals. DOS's handling of memory, processes, files, devices, and so on, can often best be understood by contrasting it with the design and implementation of a well-understood system such as UNIX.
For a more specifically DOS-like approach to operating system design and implementation, another alternative to disassembly of MS-DOS is to examine the source code that is available for several DOS workalikes. Embedded DOS from General Software (Redmond WA) has Steve Jones's superb documentation on DOS internals (for an excellent discussion of making a fully-reentrant DOS, see Steve's article "DOS Meets Real-Time" in the February 1992 Embedded Systems Programming). General Software's Utility SDK and Device Driver SDK come with complete source code in C for versions of utilities such as CHKDSK, FORMAT, FDISK, DISKCOPY. ROM DOS 5 from Datalight (Arlington WA) is also available with source code.
Last, but not least, Mike Podanoffsky (mikep@world.std.com) has written RxDOS, an inexpensive DOS available with fully commented, assembly language source code. Podanoffsky is currently writing a full-length book on RxDOS, tentatively titled DOS: The Source, that will be available in 1994. While obviously not identical to the MS-DOS source, this source code may be more than adequate for your needs. For example, figure 6-23 below shows the implementation of INT 21h functions 50h, 51h, and 52h from RXDOS.ASM:
;''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''';
; 50h Set PSP Address ;
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -;
; bx contains PSP address to use ;
;...............................................................;
_SetPSPAddress:
mov word ptr [ _RxDOS_CurrentPSP ], bx ; Seg Pointer to current PSP
ret
;''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''';
; 51h Get PSP Address ;
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -;
; bx contains PSP address to use ;
;...............................................................;
_GetPSPAddress:
mov bx, word ptr [ _RxDOS_CurrentPSP ] ; Seg Pointer of current PSP
RetCallersStackFrame es, si
mov word ptr es:[ _BX ][ si ], bx
ret
;''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''';
; 52h Get Dos Data Table Pointer ;
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -;
; es:bx returns pointer to dos device parameter block ;
; --- DOS Undocumented Feature -------------------------------- ;
;...............................................................;
_GetDosDataTablePtr:
RetCallersStackFrame es, si
mov word ptr es:[ _ExtraSegment ][ si ], ds
mov word ptr es:[ _BX ][ si ], offset _RxDOS_pDPB
clc
ret
There are no big surprises here (really, how else could Get and Set PSP be implemented, anyway?), but we can see that this accurately reflects MS-DOS, and that having this code earlier in the chapter might have saved us a lot of trouble.
More interesting, figure 6-24 below shows the RxDOS implementation of lseek, the MS-DOS implementation of which we saw earlier, in figure 6-20. The RxDOS code provides a useful guide to the MS-DOS disassembly.
;''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''';
; 42h Lseek (Move) File Pointer ;
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -;
; al move method ;
; bx handle ;
; cx:dx distance to move pointer ;
;...............................................................;
_MoveFilePointer:
Entry
def _method, ax
def _handle, bx
ddef _moveDistance, cx, dx
ddef _newPosition
mov ax, bx ; handle
call MapAppToSysHandles ; map to internal handle info
call FindSFTbyHandle ; get corresponding SFT (es: di )
jc _moveFilePointer_36 ; if could not find -->
getdarg cx, dx, _moveDistance
mov ax, word ptr [ _method ][ bp ]
Goto SEEK_BEG, _moveFilePointer_beg
Goto SEEK_CUR, _moveFilePointer_cur
Goto SEEK_END, _moveFilePointer_end
SetError -1, _moveFilePointer_36
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; seek from end
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
_moveFilePointer_end:
add dx, word ptr es:[ sftFileSize. _low ][ di ]
adc cx, word ptr es:[ sftFileSize. _high ][ di ]
jmp short _moveFilePointer_beg
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; seek from current position
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
_moveFilePointer_cur:
add dx, word ptr es:[ sftFilePosition. _low ][ di ]
adc cx, word ptr es:[ sftFilePosition. _high ][ di ]
; jmp short _moveFilePointer_beg
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; seek from beginning
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
_moveFilePointer_beg:
mov word ptr es:[ sftFilePosition. _low ][ di ], dx
mov word ptr es:[ sftFilePosition. _high ][ di ], cx
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Return
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
_moveFilePointer_36:
RetCallersStackFrame ds, bx
mov word ptr [ _AX ][ bx ], dx
mov word ptr [ _DX ][ bx ], cx
Return
If you want a disassembly of genuine MS-DOS, but don't want to DIY (do it yourself), and for some reason would be happy with a disassembly of DOS 1.1 or 2.1, Information Modes (Denton TX) sells inexpensive disassembly listings of these early versions of DOS. Imodes used the information gleaned from its long-ago disassembly project as part of its well-known product, The $25 Network ("Skeptical? We make believers! Over 15,000 sold"). For example, figure 6-25 below shows Imodes' rendition of the Get and Set PSP functions from D1.ASM, a disassembly dated April 1987 (it is an interesting reflection on the state of knowledge about DOS internals at the time that function 52h is labelled "get device driver list").
;........................... Set current PSP ......................... Fn 50
L10D6:
MOV CS:L0191,BX ;current PSP seg
RET_NEAR
;........................... Get current PSP ......................... Fn 51
L10DC:
CALL L0C1A ;ds:si--> user's stack
PUSH CS:L0191 ;
POP [SI+2] ;return in bx
RET_NEAR
Figure 6-26 below shows the Imodes interpretation of the lseek function from DOS 2.1, which you can compare against the MS-DOS 6.0 disassembly in figure 6-20 and the RxDOS implementation in figure 6-24.
;........................... Lseek (handle) .......................... Fn 42
;bx = handle
;cx_dx = hi_low dword offset
;al = seek mode, 0 - from file start
; 1 - from current position
; 2 - from file end
;return: cy=0, dx_ax = new position (from start)
; - or -
;return: cy=1, ax = 1 - invalid function (mode)
; 6 - invalid handle
L3BD5:
CMP AL,3 ;is method in range 0..2 ?
JC L3BDD ;no: yes-->
MOV AL,1 ;err = invalid function
L3BDB:
JMP SHORT L3BD3 ;dos error return
L3BDD:
PUSH SS
POP DS
CALL L38FB ;with bx=handle, get handle defn.
PUSH ES
POP DS
JC L3BD1 ;if handle bad--> ret, invalid handle
TEST BYTE PTR [DI+1Bh],80h ;is char device?
JZ L3BF2 ;yes: no-->
XOR AX,AX ;record = 0 always
XOR DX,DX
JMP SHORT L3C08 ;--> set random record fields
L3BF2:
DEC AL ;is method 0, from file start ?
JL L3C05 ;no: yes-->
DEC AL ;is method 1, from current position ?
JL L3C18 ;no: yes-->
;. . . . . . . . . . . . . . method 2, from end of file
XCHG DX,AX ;ax = LSWord
XCHG DX,CX ;dx = MSWord
ADD AX,[DI+13h] ;add fcb's file size
ADC DX,[DI+15h] ;
JMP SHORT L3C08 ;--> set fields
;. . . . . . . . . . . . . . method 0, from start of file
L3C05:
XCHG DX,AX ;ax = LSWord
XCHG DX,CX ;dx = MSWord
As with the PSP functions, this disassembly of lseek in DOS 2.1 bears many similarities to the disassembly of lseek in DOS 6.0. On the other hand, the DOS 2.1 version of course does not do Windows, and doesn't contain any network-redirector code.
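The control flow of figure 6-26 can be summarized in a few lines of a high-level language. The Python sketch below is a model built from the listing and its header comments, not a translation of the full routine: the `OpenFile` class, the dictionary used as a handle table, and the `(carry, value)` return convention are assumptions for the example. The listing is cut off before the method 1 path, so that branch follows the header comment ("from current position") rather than actual code.

```python
from dataclasses import dataclass

ERR_INVALID_FUNCTION = 1   # seek method out of range
ERR_INVALID_HANDLE = 6     # handle lookup failed


@dataclass
class OpenFile:
    """Hypothetical per-handle record; field names are illustrative."""
    size: int = 0
    position: int = 0
    is_char_device: bool = False


def lseek(handles, handle, offset, mode):
    """Model of Fn 42h. Returns (carry, value) like the CY flag and DX:AX/AX."""
    if mode >= 3:                       # CMP AL,3 / JC : method must be 0..2
        return True, ERR_INVALID_FUNCTION
    entry = handles.get(handle)         # CALL L38FB : look up the handle
    if entry is None:
        return True, ERR_INVALID_HANDLE
    if entry.is_char_device:            # TEST BYTE PTR [DI+1Bh],80h
        new_pos = 0                     # character devices always report 0
    elif mode == 0:
        new_pos = offset                # from start of file
    elif mode == 1:
        new_pos = entry.position + offset   # from current position
    else:
        new_pos = entry.size + offset   # ADD/ADC with the stored file size
    new_pos &= 0xFFFFFFFF               # DX:AX is a 32-bit quantity
    entry.position = new_pos
    return False, new_pos


if __name__ == "__main__":
    table = {5: OpenFile(size=100)}
    print(lseek(table, 5, 10, 0))
    print(lseek(table, 5, 0, 2))
```

The mask on the final position mirrors the 16-bit ADD/ADC pair, which wraps silently; passing 0FFFFFFFFh with method 2 therefore acts as an offset of -1 from the end of the file.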
But perhaps you care deeply and desperately about getting the genuine article: commented source code from Microsoft for MS-DOS 5.0 and higher. Microsoft does not publicize the product much, but it will sell you an OEM Adaptation Kit (OAK) once you sign a license agreement. Microsoft's OAK comes on an oddly-formatted tape cartridge, but a version on normal PC diskettes is available from Annabooks (San Diego CA).
The contents of the OAK are Microsoft confidential, so unfortunately we cannot reproduce any of them here, but we can give you some idea of what the kit includes:
\DOS50OAK
    \BIOS
        msbio1.asm
        msbio2.asm
        sysinit1.asm
        sysinit2.asm
        ...
    \BOOT
        msboot.asm
        ...
    \CMD
        \COMMAND
        \FORMAT
        \MODE
        ...
    \DEV
        \ANSI
        \HIMEM
        ...
    \DOS
        fat.obj
        getset.obj
        handle.obj
        msdisp.obj
        ...
    \H
        cds.h
        dpb.h
        sysvar.h
        ...
    \INC
        arena.inc
        bpb.inc
        mult.inc
        pdb.inc
        sysvar.inc
        win386.inc
        wpatch.inc
        ...
As you can see from this very partial directory tree, Microsoft supplies some components of the OAK as .ASM source code and others only as .OBJ files. The idea, of course, is that the OEM will change parts of IO.SYS but not MSDOS.SYS, so IO.SYS comes with source, but MSDOS.SYS comes only with .OBJ files. Having .OBJ files is almost as good as having source code, though, since .OBJ files contain names for functions and variables. An .OBJ disassembler such as WDISASM (included with Watcom C) can basically regenerate the source code, missing only comments (which are probably out-of-date and misleading anyway).
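To see why an .OBJ file gives away so much, it helps to look at how names are stored. The Python sketch below assumes the files use the standard Intel/Microsoft OMF record layout (a type byte, a 16-bit length, the record body, and a checksum byte) and pulls the symbol names out of PUBDEF records (types 90h and 91h). It is a minimal illustration, not a substitute for a real tool such as WDISASM; the function names and the `DEMO_PROC` symbol are invented for the example, and nothing here is taken from the OAK.

```python
import struct

PUBDEF16, PUBDEF32 = 0x90, 0x91   # OMF public-names records


def make_record(rectype, body):
    """Build one OMF record (used here only to create synthetic test data)."""
    head = struct.pack("<BH", rectype, len(body) + 1)
    checksum = (-sum(head + body)) & 0xFF    # all record bytes sum to zero
    return head + body + bytes([checksum])


def read_index(buf, pos):
    """OMF index field: one byte, or two when the first has its high bit set."""
    first = buf[pos]
    if first & 0x80:
        return ((first & 0x7F) << 8) | buf[pos + 1], pos + 2
    return first, pos + 1


def iter_records(data):
    """Yield (type, body) for each record, with the checksum byte dropped."""
    pos = 0
    while pos + 3 <= len(data):
        rectype, length = struct.unpack_from("<BH", data, pos)
        yield rectype, data[pos + 3:pos + 3 + length - 1]
        pos += 3 + length


def public_names(data):
    """Return (name, segment_index, offset) for every public symbol."""
    found = []
    for rectype, body in iter_records(data):
        if rectype not in (PUBDEF16, PUBDEF32):
            continue
        offset_size = 2 if rectype == PUBDEF16 else 4
        _group, pos = read_index(body, 0)
        segment, pos = read_index(body, pos)
        if segment == 0:
            pos += 2                          # absolute symbol: skip base frame
        while pos < len(body):
            length = body[pos]
            name = body[pos + 1:pos + 1 + length].decode("ascii", "replace")
            pos += 1 + length
            offset = int.from_bytes(body[pos:pos + offset_size], "little")
            pos += offset_size
            _type, pos = read_index(body, pos)
            found.append((name, segment, offset))
    return found


if __name__ == "__main__":
    # Synthetic PUBDEF: group 0, segment 1, one symbol at offset 10h, type 0.
    body = bytes([0, 1]) + b"\x09DEMO_PROC" + struct.pack("<H", 0x10) + b"\x00"
    print(public_names(make_record(PUBDEF16, body)))
```

A disassembler that starts from these public names, plus the external and segment names held in other record types, already has labels for most routines and variables, which is what makes the output read almost like the original source.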
Examination of the OAK contents mostly confirms what has already been known for many years as a result of reverse engineering. However, it is sometimes interesting to know the actual names for undocumented functions as they appear in Microsoft's source code. For example, the undocumented structure generally called the List of Lists is called SysInitVars in the DOS source because the structure is actually intended for use by SYSINIT. INT 21h AH=52h, which returns a pointer to this structure, and which is generally called Get List of Lists or Get SysVars, is called GET_IN_VARS in the DOS source. It turns out that there is little correspondence between the documented names for INT 21h functions and their actual names in the DOS source. For example, AH=1Bh is Get Default Drive Data and AH=1Ch is Get Drive Data in the MS-DOS Programmer's Reference, but in the code they are called SLEAZEFUNC and SLEAZEFUNCDL.
Looking over the OAK contents, it seems a shame that source code for MS-DOS and Windows isn't more widely available. Just as the old IBM PC and IBM AT technical references (for example, IBM, Technical Reference—Personal Computer AT, 1985) greatly promoted the development of innovative new software and hardware by publishing complete assembly-language listings of the system ROM BIOS, Microsoft could promote greater understanding of DOS and Windows by making the source code for these fundamental technologies available. This isn't as ridiculous as it may sound. Consider that just a few years ago, compiler run-time library source code was kept proprietary too. Now almost all compilers come with RTL source.
Microsoft did at one point make some attempt at opening up DOS to closer inspection. The original MS-DOS (Versions 1.0-3.2) Technical Reference Encyclopedia (1986), one of the few books ever to be recalled by its publisher, tried to describe not only each DOS function's inputs and outputs, but also its internal operation. Each function was accompanied by a flowchart titled "How It Works." While an excellent idea, the execution was flawed. Some functions (such as INT 21h AH=48h Allocate Memory) were described in great detail, with the flowchart running for many pages, while others were described in only the vaguest terms, such as "call internal function". The Microsoft encyclopedia carried the following warning:
Note: These flowcharts were written for MS-DOS Version 3.2. This in no way means that all future or past versions of MS-DOS will behave in the same manner. You should take care not to write programs that make use of the specific structure of the function routine, because this could result in lack of compatibility with other versions of DOS. Microsoft guarantees only that if you input the values in the registers in the specified way, you will get back the specified values. How the function actually accomplishes a task is subject to change.
In addition to the generally vague and misleading flowcharts for each DOS function, the Microsoft encyclopedia also carried an extremely detailed flowchart for COMMAND.COM. It is not clear whether the book was recalled because of the embarrassing errors it contained or because of any information on DOS internals that it inadvertently provided.
Conclusion: what have we learned here? DOS is the foundation upon which everything else (including Windows 3.x) rests. You don't want to think about DOS internals every time you use a high-level C or C++ call to read or write a file. But without at least some basic understanding, you have only a vague notion of how your code works, and how it interacts with other software. Solution: "know and forget."
[Many thanks to Samuel Okei from Texas Tech Univ. for his skilled conversion and reformatting of what was a complex 25-year-old file with obscure and obsolete typesetting codes, into HTML.]