Strobe's Co-processor Product Series, Disk Caching, and RAMdisks
Users of Strobe Data's Osprey and Hawk Co-processors are often astounded to discover that the simple use of PC disk caching and/or defining a RAMdisk yields enormous performance dividends in their DEC* or DG minicomputer applications. Here's why! -- with some technical tips!
32 Megabytes of Parity DRAM for $225!
The extraordinary performance improvements in legacy real-time and process control environments achieved by Strobe's Hawk and Osprey Co-processors are, in many cases, attributable to the use of PC host managed disk caching and RAMdisks as natural adjuncts of the co-processor environment.
Several of Strobe's competitors in the marketplace have PDP11 compatible processor products which are at least as fast as the Osprey/DX. In one case a competitor claims a substantially faster processor product. However, in comparative overall system performance benchmarks run by our customers, the Osprey/DX often outperforms even the fastest of these -- sometimes substantially.
In one documented instance an Osprey/DX, with only TWICE the basic CPU performance of a PDP11/93, executed a customer's real-time RSX application benchmark over FOUR times faster than the PDP11/93 it was replacing. Overall, this customer saw performance improvements of 1.728 over the PDP11/93, and 2.007 over the PDP11/73, as averaged over six separate real-time benchmark tests. Amazingly, most of these throughput tests included overhead associated with the actual mechanical movement of the controlled machine!
In the case of legacy commercial applications moved to the Strobe Co-processor, not only are extraordinary performance gains effortlessly achieved by using host PC-managed RAMdisks and disk caching, but the door is opened to RAID boxes and more reliable, faster backup systems.
These commercial multi-user database applications are particularly disk intensive, and for them performance improvements in the range of 4x to 10x have been reported. For one customer, a twelve hour overnight batch process was reduced to less than three hours, and in another case peak time on-line database inquiries were reduced from more than two minutes to less than 10 seconds.
But even in instances where the application is NOT obviously disk intensive, our customers are surprised to see throughput gains substantially greater than would be expected from the raw processor performance increase. These customers quite correctly believe that their particular application does not make use of disk overlays and is not disk intensive. Nevertheless, hidden disk accesses contribute substantial overhead to their performance, and these accesses can only be understood in the historical context of the operating system environment which supports the application. It is this overhead which in large part disappears with disk caching and RAMdisk.
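A rough way to see how hidden disk waits can dominate is to split run time into a CPU share and a disk-wait share. The sketch below is illustrative arithmetic only: the function name and the 30/70 split are hypothetical figures chosen for the example, not measurements from any Strobe customer.

```python
def overall_speedup(cpu_fraction, cpu_speedup, disk_speedup):
    """Overall speedup when CPU time and disk-wait time shrink by different factors.

    cpu_fraction is the share of the original run time spent executing
    instructions; the remainder is assumed to be time spent waiting on disk.
    """
    disk_fraction = 1.0 - cpu_fraction
    new_time = cpu_fraction / cpu_speedup + disk_fraction / disk_speedup
    return 1.0 / new_time


# Hypothetical workload: 30% CPU time, 70% hidden disk waits.
# Doubling the CPU speed alone helps only a little ...
cpu_only = overall_speedup(0.30, cpu_speedup=2.0, disk_speedup=1.0)
# ... while also cutting disk waits tenfold (RAMdisk or cache) changes the picture.
cpu_and_disk = overall_speedup(0.30, cpu_speedup=2.0, disk_speedup=10.0)
print(round(cpu_only, 2), round(cpu_and_disk, 2))
```

Under these assumed numbers, a processor that is only twice as fast can still deliver an overall gain of more than four times once the disk waits are largely removed.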
I have taken the time to present some background on this subject because more and more of late I have noticed that new or potential co-processor customers, while very much aware of their own applications code in this regard, are rarely if ever aware of the historical context which has so much hidden effect on their systems' performance characteristics.
I cannot say at this time that Strobe has ever encountered a disk-based co-processor environment which did not benefit in some manner from disk caching or the use of RAMdisks, or both.
Why is this the case?
Historical background
To fully understand the answer to that question we must go back to the early seventies, all the way back to the beginnings of the minicomputer era. Data General, Hewlett-Packard, and Digital Equipment Corporation manufactured minicomputers using very expensive ferrite "core" memory planes. These 16 bit machines had a base memory addressing capability of only 32 K/words (64 K/bytes), forcing the disk operating systems, languages, applications designers and programmers to make do with what are, by today's standards, meager memory resources.
In the early spring of 1972, I delivered and installed a multi-user time-sharing Basic system at the Naval Arctic Research Laboratory in Barrow, Alaska to support a government funded scientific research project. The hardware consisted of a Data General Nova minicomputer with 32 K/words of core memory, dual 2.5 M/byte Diablo moving head disk drives, a multi-channel serial line interface, and more ASR33 teletypes than I like to remember having had to maintain.
Even today, almost 25 years later, I can vividly remember having to carefully "tune" the disk operating system and the Basic Language Interpreter, and then re-tune them again and again, in order to reach an acceptable balance between overall system performance and memory space left available for user programs written in Basic. In the end, I think I was able to achieve 8-10 K/bytes of program space!
In those days, the disk operating system generation ("sysgen") process allowed the system manager to choose which of many modules not integral to the base operation of the disk operating system were to be "sysgenned" as disk overlays, i.e., disk resident in a contiguous file, to be loaded into machine memory on an "as needed" basis.
Data General's early implementation of the Basic Language, operating under the RDOS disk operating system, used this disk overlay technique extensively. The system manager's "tuning" process required "sysgenning" the Basic Interpreter to find the best balance between available program memory and system performance.
This memory crunch was eased somewhat when both Data General and Digital Equipment Corporation developed memory mapping hardware as a way of expanding the memory addressing capability. While adding core memory was still quite an expensive proposition, this development did make the system manager's sysgen job a little easier. Provided the funds for additional memory were available, the system manager could now strike an easier balance between system performance and available user memory space.
Under memory mapping the operating system now had a complete 32 K/word memory segment allocated for itself with additional memory allocated to the applications as user program space. Of course by this time operating system disk overlays had grown to 250 K/bytes and more, and almost all of the applications languages, compilers (Fortran, Algol) and interpreters (Basic, Cobol), required substantial disk overlays for their operations.
Fortran programmers now had the luxury of disk overlays in compiled run-time applications, and of course many users took advantage of the capability. Even Basic Language programmers could "chain" from one program segment to another, loading each program segment from disk into memory only when it was needed.
It was during this era that head-per-track disk drives came into their own in the market. These drives, with access delays a function only of rotational latency, often yielded the best price/performance ratios for the disk operating system and language implementations prevalent at the time.
By the time reliable, less expensive minicomputer semiconductor memory became available, the underlying designs had been set for most 16 bit disk operating systems. By the late eighties, the disk operating system "seeds" planted in the early seventies had grown, with few "mutations" within the "kernel", into the robust and reliable operating systems of today -- RDOS, RSX, RSTS, RT-11, UNIX, AOS, etc.
Let's consider Digital's RSX disk operating system, initially authored by none other than Microsoft's Dave Cutler, of Windows NT fame, also the author of Digital's VMS. RSX may well be the most widely used and well known of the minicomputer disk operating systems of the eighties, so it is a useful model for our purposes.
Keep in mind that the base addressing capability of the 16 bit machines was only 64 K/bytes. Address space "context" switching using hardware memory mapping involved substantial overhead. In order to reduce the number of times one incurred the context switching burden, operating system designers constrained each individual program or application within one single 64 K/byte memory "ground".
Even when inexpensive memory became readily available for these 16 bit machines, it was still advisable from a performance standpoint to allocate a 64 K/byte "ground" for each "level" -- e.g., 64K for the operating system and a 64K memory ground for each user application. Depending on the total amount of memory available, the system manager could choose whether user applications were "locked" into a given "ground", or whether multiple user applications were to be served on a time slice basis by swapping user applications between memory and disk.
Often two separate memory grounds were "flip-flopped", by executing a user program in one ground while in the opposite ground the "previous" user program was being swapped to the disk in preparation for the "next" user program to be loaded into that ground from the disk.
Under Data General's RDOS, this "ground swapping" as a way of supporting multi-user environments was further enhanced by a contiguous "swapping file" containing each user ground "image" in a specific location. I believe even today Microsoft Windows and Windows NT create and use disk resident swapping files in much the same way and for the very same reasons.
Both Digital's RSX and Data General's RDOS can load disk operating system overlays into "extra" memory space, defining "virtual" disk overlays. These virtual overlays are "re-mapped", relocating the contents into the disk operating system ground on an as-needed basis.
Digital's RSX can also define and manage unused "extra" main memory as disk caching. It also has the ability to "shadow" a disk drive by maintaining the contents of the primary operating system disk drive on another completely separate drive.
Consequences
With this historical perspective in front of us, we can see where the customer benchmark timing improvements quoted at the beginning of this article came from. The system included an Osprey/DX in a Pentium 133 system with 64 M/bytes of parity DRAM. 50 M/bytes of this memory were allocated as an RSX MSCP RAMdisk drive with initial startup contents automatically copied, prior to booting RSX, from an RSX MSCP 50 M/byte DOS "container" file. This same RSX "container" file, once RSX is up and running on the Osprey, becomes the RSX "shadow" disk drive, in order to preserve a non-volatile disk environment should power fail and the RAMdisk contents be lost. The configuration reduced the physical disk accesses to less than half (shadow disk accesses limited to write only), replacing disk overhead with memory access/move times.
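The read/write asymmetry of that configuration can be sketched in a few lines. This is a minimal illustration, not Strobe's or RSX's implementation: in the system described, RSX itself performs the shadowing, whereas the sketch collapses RAMdisk and shadow into one hypothetical class (`ShadowedRamDisk`) with an assumed 512-byte sector, and a `bytearray` stands in for the DOS container file.

```python
class ShadowedRamDisk:
    """RAMdisk whose writes are mirrored to a non-volatile 'shadow' store."""

    SECTOR_SIZE = 512  # assumed sector size, for illustration only

    def __init__(self, container):
        self.container = container        # stands in for the DOS container file
        self.ram = bytearray(container)   # startup: copy the whole image into memory
        self.physical_writes = 0

    def read_sector(self, n):
        # Reads are served from memory and never touch the physical disk.
        start = n * self.SECTOR_SIZE
        return bytes(self.ram[start:start + self.SECTOR_SIZE])

    def write_sector(self, n, data):
        if len(data) != self.SECTOR_SIZE:
            raise ValueError("data must be exactly one sector long")
        start = n * self.SECTOR_SIZE
        self.ram[start:start + self.SECTOR_SIZE] = data
        # Only writes reach the shadow, so it survives a power failure.
        self.container[start:start + self.SECTOR_SIZE] = data
        self.physical_writes += 1


disk = ShadowedRamDisk(bytearray(4 * ShadowedRamDisk.SECTOR_SIZE))
disk.write_sector(2, b"\x01" * ShadowedRamDisk.SECTOR_SIZE)
for _ in range(100):
    disk.read_sector(2)        # no physical disk activity
print(disk.physical_writes)    # number of accesses that reached the shadow
```

However many reads the application issues, only the writes cost a physical disk access, which is where the reduction in disk traffic comes from.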
Another of Strobe's customers uses the Hawk Co-processor in a 64 user airline ticket booking and reservation system. 24 M/bytes of the PC/host's 32 M/bytes is set aside as RAMdisk into which is loaded the user's disk operating system, its overlays, the application language and its overlays, the application programs, and several key index files consisting of indices into the massive database files in three 500 M/byte DOS disk "container" files. The remaining 7 M/bytes of memory is managed by the PC host as an LRU disk sector cache against the customer database in those container files.
This customer replaced a Data General MV15000 processor with more than twice the CPU performance of the Hawk. Yet the Hawk/PC achieved much greater performance than had the MV15000.
That this customer's application is very disk intensive was obvious to everyone. But if your application is based on any one of the many disk operating systems which grew from the early era of 16-bit minicomputers, it too may be disk intensive, even though it may not seem so on the surface!
The Advantages of Host-based Caching
Many of Strobe's customers have applications on operating systems already supporting disk caching and/or virtual system or application overlays. These customers have discovered for themselves that it is best to eliminate these now unduly burdensome tasks on the Osprey or Hawk processor and let the PC host take over the job when moving to Strobe co-processor environments.
A 133 MHz Pentium host will be almost 25 times faster at managing a disk cache than even the fastest Osprey, the Osprey/DX, under RSX. Furthermore, the amount of host memory allocated to disk caching (say 2 to 8 M/bytes) could clearly be more generous than in the case of an RSX operating system environment.
PC host disk caching and RAMdisks in the Strobe co-processor environments can yield even greater improvements than the use of "virtual" disk overlays managed by the native code executing on the co-processor. Most minicomputer operating systems that we have encountered move blocks of words from virtual memory space to operating system space or application memory space, rather than re-map the virtual overlay into operating system space. Obviously rather heavy use of the memory map instructions must occur in order for a block move to be completed.
In the Strobe environment the user's native code executing on the co-processor simply initiates a disk sector read in order to load a disk resident overlay. With a host implemented RAMdisk the host is able to instantaneously transfer the requested overlay into the appropriate co-processor memory location.
The fact that "legacy" multi-tasking schedulers have undergone many years of "tuning" will also subtly influence performance. Once the co-processor instruction execution engine is freed from the overhead of managing a disk cache, moving or re-mapping cached disk blocks or virtual disk overlays, these schedulers are sophisticated enough to allocate the freed up processor time to the performance of other queued tasks! A double win!
And Now a Word About Disk Caching
Strobe has been shipping ISA bus co-processor products for almost ten years now. In that time our customers and we have tried most of the types and brands of disk caching software on the market.
Back in 1987 we favored a product called SpeedCache. SpeedCache had a memory limit of 2 M/bytes and finally fell into disfavor because the manufacturer did not move quickly to an XMS memory implementation.
Today there are several good software caching products, our favorite being the one in Norton Utilities. The only product which we do NOT recommend is Smartdrive by Microsoft. I doubt that there are really any serious problems with this product itself, but in Strobe's co-processor environment it seems to have a detrimental effect, in some cases a severe one. Strobe implemented its own form of multi-tasking kernel in the MS-DOS co-processor environment, and we suspect that this kernel may be interfering with Smartdrive, causing it to purge its cache after each disk call. If that is not the problem, then perhaps our method of "clustering", i.e., building a lookup table of disk addresses for our container file image, is somehow interfering with Smartdrive.
In any case, in the Strobe co-processor environment, Smartdrive incurs the time overhead of managing the disk cache but then purges the cache, never using the disk cache contents at all, and for that reason we recommend against it.
Strobe's experience indicates that most of the disk caching programs seem to break down when more than about 2 to 4 M/bytes of memory is allocated for use. We have found that very little gain can be had from allocating much more than 4 M/bytes of memory. In one case this proved to be because the base cacheable disk sector "set" was not expanded when more memory was allocated. Instead, each "cell" was expanded to include more contiguous disk sectors -- 2 or 4 or even 8, as additional memory was allocated.
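The arithmetic behind that observation is simple. The sketch below assumes a 512-byte sector and a cache that doubles its cell size each time its memory doubles; the function name and the sample sizes are illustrative and are not taken from any particular caching product.

```python
SECTOR_BYTES = 512   # assumed sector size
MB = 1024 * 1024


def cell_count(cache_bytes, sectors_per_cell):
    """Number of independently cacheable cells for a given cache size."""
    return cache_bytes // (SECTOR_BYTES * sectors_per_cell)


# Hypothetical behavior: every doubling of memory also doubles the cell size.
layouts = [(2, 1), (4, 2), (8, 4), (16, 8)]   # (M/bytes, sectors per cell)
counts = [cell_count(mbytes * MB, spc) for mbytes, spc in layouts]
for (mbytes, spc), cells in zip(layouts, counts):
    print(f"{mbytes:2d} M/bytes, {spc} sector(s) per cell: {cells} cells")
```

If the number of cells never grows, the cache can track no more distinct disk locations than before. The extra memory pays off only when the neighboring sectors in each cell are actually wanted.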
RAMdisks seem to yield additional performance when included with disk caching in our environment. In part, this is because most disk caching programs seem to use some form of LRU algorithm, the least recently used disk sectors being purged from the cache in favor of the most recent disk sector requests. In point of fact some of the disk caching programs seem to detect when a disk backup or massive file copy is occurring and "get out of the way" to avoid operating the LRU disk cache algorithm to the detriment of the backup or copy task. This is one reason why Strobe's programmers prefer Norton Utilities' N-cache algorithm as one with greater universal utility.
In the case of Strobe's co-processor series, if the customer knows of disk resident files which will be often requested or required, such as disk operating system overlays, then it is wise to make these files resident in a RAMdisk, locking them into memory, rather than rely on a disk cache which might often (perhaps always, depending on the specific application) purge them in favor of later disk sector requests.
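The effect can be shown with a toy model. The sketch below is a generic LRU cache, not the algorithm of any product named above; the cache capacity, the eight "overlay" sectors, and the workload (overlay reads interleaved with sweeps of fresh data sectors) are all assumptions made for the illustration.

```python
from collections import OrderedDict


class LruSectorCache:
    """Minimal LRU cache keyed by sector number (contents omitted)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.sectors = OrderedDict()

    def read(self, sector):
        """Return True on a cache hit, False when the disk must be read."""
        if sector in self.sectors:
            self.sectors.move_to_end(sector)   # mark as most recently used
            return True
        self.sectors[sector] = True
        if len(self.sectors) > self.capacity:
            self.sectors.popitem(last=False)   # purge the least recently used
        return False


OVERLAY_SECTORS = range(8)   # hypothetical operating system overlay


def overlay_disk_reads(use_ramdisk, rounds=10):
    """Count physical disk reads needed to fetch the overlay sectors."""
    cache = LruSectorCache(capacity=16)
    disk_reads = 0
    next_data = 1000
    for _ in range(rounds):
        if not use_ramdisk:   # with a RAMdisk the overlay never touches the disk
            for sector in OVERLAY_SECTORS:
                if not cache.read(sector):
                    disk_reads += 1
        # A sweep of 32 fresh data sectors flushes a 16-sector LRU cache.
        for sector in range(next_data, next_data + 32):
            cache.read(sector)
        next_data += 32
    return disk_reads


print(overlay_disk_reads(use_ramdisk=False), overlay_disk_reads(use_ramdisk=True))
```

In this model every data sweep pushes the overlay out of the cache, so the overlay is re-read from disk on each round, while the RAMdisk copy is never purged.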
-- Willard West
President
Strobe Data Inc
*Compaq and/or Digital Equipment Corporation claim 'DEC' as a mark. Strobe Data is a separate company from Compaq and Digital Equipment Corporation.