Sunday, February 21, 2010

Novell NetWare

NetWare is a network operating system developed by Novell, Inc. It initially used cooperative multitasking to run various services on a personal computer, and the network protocols were based on the archetypal Xerox Network Systems stack.
NetWare has been superseded by Open Enterprise Server (OES). The latest version of NetWare is v6.5 Support Pack 8, which is identical to OES 2 SP1 running the NetWare kernel.
History

NetWare evolved from a very simple concept: file sharing instead of disk sharing. In 1983, when the first versions of NetWare were designed, all other competing products were based on the concept of providing shared direct disk access. Novell's alternative approach was validated by IBM in 1984, which helped promote the product.
With Novell NetWare, disk space was shared in the form of NetWare volumes, comparable to DOS volumes. Clients running MS-DOS would run a special terminate and stay resident (TSR) program that allowed them to map a local drive letter to a NetWare volume. Clients had to log in to a server in order to be allowed to map volumes, and access could be restricted according to the login name. Similarly, they could connect to shared printers on the dedicated server, and print as if the printer was connected locally.
At the end of the 1990s, with Internet connectivity booming, the Internet's TCP/IP protocol became dominant on LANs. Novell had introduced limited TCP/IP support in NetWare v3.x (circa 1992) and v4.x (circa 1995), consisting mainly of FTP services and UNIX-style LPR/LPD printing (available in NetWare v3.x), and a Novell-developed webserver (in NetWare v4.x). Native TCP/IP support for the client file and print services normally associated with NetWare was introduced in NetWare v5.0 (released in 1998).
During the early-to-mid 1980s Microsoft introduced their own LAN system in LAN Manager based on the competing NBF protocol. Early attempts to muscle in on NetWare were not successful, but this changed with the inclusion of improved networking support in Windows for Workgroups, and then the hugely successful Windows NT and Windows 95. NT, in particular, offered services similar to those offered by NetWare, but on a system that could also be used on a desktop, and connected directly to other Windows desktops where NBF was now almost universal.
The rise of NetWare
The popular use and growth of Novell NetWare began in 1985 with the simultaneous release of NetWare 286 2.0a and the Intel 80286 16-bit processor. The 80286 CPU featured a new 16-bit protected mode that provided access to up to 16 MB of RAM as well as new mechanisms to aid multi-tasking. Prior to the 80286, servers were based on the Intel 8086/8088 8/16-bit processors, which were limited to an address space of 1 MB with not more than 640 KB of directly addressable RAM.
The combination of a higher 16 MB RAM limit, 80286 processor feature utilization, and 256 MB NetWare volume size limit allowed reliable, cost-effective server-based local area networks to be built for the first time. The 16 MB RAM limit was especially important, since it made enough RAM available for disk caching to significantly improve performance. This became the key to Novell's performance while also allowing larger networks to be built.
Another significant difference of NetWare 286 was that it was hardware-independent, unlike competing server systems from 3Com. Novell servers could be assembled from any brand of system with an Intel 80286 or higher CPU, any MFM, RLL, ESDI, or SCSI hard drive, and any 8- or 16-bit network adapter for which NetWare drivers were available.
Novell also designed a compact and simple DOS client software program that allowed DOS stations to connect to a server and access the shared server hard drive. While the NetWare server file system introduced a new, proprietary file system design, it looked like a standard DOS volume to the workstation, ensuring compatibility with all existing DOS programs.
Early years
NetWare grew out of consulting work by SuperSet Software, a group founded by friends Drew Major, Dale Neibaur, Kyle Powell, and later Mark Hurst. This work was based on their classwork at Brigham Young University in Provo, Utah, starting in October 1981.
In 1983, Raymond Noorda engaged the SuperSet team. The team was originally assigned to create a CP/M disk sharing system to help network the CP/M hardware that Novell was selling at the time. The team was privately convinced that CP/M was a doomed platform and instead came up with a successful file sharing system for the newly introduced IBM-compatible PC. They also wrote an application called Snipes, a text-mode game, and used it to test the new network and demonstrate its capabilities. Snipes was the first network application ever written for a commercial personal computer, and it is recognized as one of the precursors of many popular multiplayer games such as Doom and Quake.
This network operating system (NOS) was later called Novell NetWare. NetWare was based on the NetWare Core Protocol (NCP), which is a packet-based protocol that enables a client to send requests to and receive replies from a NetWare server. Initially NCP was directly tied to the IPX/SPX protocol, and NetWare communicated natively using only IPX/SPX.
The first product to bear the NetWare name was released in 1983. It was called NetWare 68 (a.k.a. S-Net); it ran on the Motorola 68000 processor on a proprietary Novell-built file server and used a star network topology. This was soon joined by NetWare 86 V4.x, which was written for the Intel 8086. This was replaced in 1985 with Advanced NetWare 86 version 1.0a, which allowed more than one server on the same network. In 1986, after the Intel 80286 processor became available, Novell released Advanced NetWare 286 V1.0a and subsequently V2.0B (which used IPX routing to allow up to four network cards in a server). In 1989, with the Intel 80386 available, Novell released NetWare 386. Later Novell consolidated the numbering of its NetWare releases, with NetWare 386 becoming NetWare 3.x.
NetWare 286 2.x
NetWare version 2 was notoriously difficult to configure, since the operating system was provided as a set of compiled object modules that required configuration and linking. Compounding this inconvenience, the process was designed to run from multiple diskettes, which was slow and unreliable. Any change to the operating system required a re-linking of the kernel and a reboot of the system, requiring at least 20 diskette swaps. An additional complication in early versions was that the installation included COMPSURF, a proprietary low-level format program for MFM hard drives, which ran automatically before the software could be loaded.
NetWare was administered using text-based utilities such as SYSCON. The file system used by NetWare 2 was NetWare File System 286, or NWFS 286, supporting volumes of up to 256 MB. NetWare 286 recognized 80286 protected mode, extending NetWare's support of RAM from 1 MB to the full 16 MB addressable by the 80286. A minimum of 2 MB was required to start up the operating system; any additional RAM was used for FAT, DET and file caching. Since 16-bit protected mode was implemented in the i80286 and every subsequent Intel x86 processor, NetWare 286 version 2.x would run on any 80286 or later compatible processor.
NetWare 2 implemented a number of features inspired by mainframe and minicomputer systems that were not available in other operating systems of the day. The System Fault Tolerance (SFT) features included standard read-after-write verification (SFT-I) with on-the-fly bad block re-mapping (at the time, disks did not have that feature built in) and software RAID1 (disk mirroring, SFT-II). The Transaction Tracking System (TTS) optionally protected files against incomplete updates. For single files, this required only a file attribute to be set. Transactions over multiple files and controlled roll-backs were possible by programming to the TTS API.
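The idea behind TTS can be sketched in a few lines of modern code: journal the old contents of every file in the transaction, apply the updates, and restore the journal if anything fails partway through. The following Python sketch is purely an illustration of the concept; it is not the actual TTS API, and all names in it are made up.

    import os

    def transactional_update(updates):
        # updates: dict mapping filename -> new bytes, applied all-or-nothing
        # (hypothetical sketch of the TTS idea, not the real API).
        journal = {}
        # Phase 1: journal original contents (None means the file did not exist).
        for name in updates:
            if os.path.exists(name):
                with open(name, "rb") as f:
                    journal[name] = f.read()
            else:
                journal[name] = None
        try:
            # Phase 2: apply every update.
            for name, data in updates.items():
                with open(name, "wb") as f:
                    f.write(data)
        except Exception:
            # Roll back: restore every file from the journal.
            for name, old in journal.items():
                if old is None:
                    if os.path.exists(name):
                        os.remove(name)
                else:
                    with open(name, "wb") as f:
                        f.write(old)
            raise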
NetWare 286 2.x supported two modes of operation: dedicated and non-dedicated. In dedicated mode, the server used DOS only as a boot loader to execute the operating system file net$os.exe. All memory was allocated to NetWare; no DOS ran on the server. For non-dedicated operation, DOS 3.3 or higher would remain in memory, and the processor would time-slice between the DOS and NetWare programs, allowing the server computer to be used simultaneously as a network file server and as a user workstation. All extended memory (RAM above 1 MB) was allocated to NetWare, so DOS was limited to only 640 KB; an expanded memory manager would not work because NetWare 286 had control of 80286 protected mode and the upper RAM, both of which were required for DOS to use expanded memory. Time slicing was accomplished using the keyboard interrupt. This feature required strict compliance with the IBM PC design model; otherwise performance was affected. Non-dedicated NetWare was popular on small networks, although it was more susceptible to lockups due to DOS program problems. In some implementations, users would experience significant network slowdown when someone was using the console as a workstation. NetWare 386 3.x and later supported only dedicated operation.
Server licensing on early versions of NetWare 286 was accomplished by using a key card. The key card was designed for an 8-bit ISA bus, and had a serial number encoded on a ROM chip. The serial number had to match the serial number of the NetWare software running on the server. To broaden the hardware base, particularly to machines using the IBM MCA bus, later versions of NetWare 2.x did not require the key card; serialized license floppy disks were used in place of the key cards.
NetWare 3.x
Starting with NetWare 3.x, support for 32-bit protected mode was added, eliminating the 16 MB memory limit of NetWare 286. This allowed larger hard drives to be supported, since NetWare 3.x cached (copied) the entire file allocation table (FAT) and directory entry table (DET) into memory for improved performance.
By accident or design, the initial releases of the client TSR programs modified the high 16 bits of the 32-bit 80386 registers, making them unusable by any other program until this was fixed. The problem was noticed by Phil Katz, who added a switch to his PKZIP suite of programs to enable 32-bit register use only when the NetWare TSRs were not present.
NetWare version 3 eased development and administration by modularization. Each piece of functionality was provided by a software module called a NetWare Loadable Module (NLM), loaded either at startup or when needed. It was then possible to add functionality such as anti-virus software, backup software, database and web servers, long name support (standard filenames were limited to eight characters plus a three-letter extension, matching MS-DOS), or Macintosh-style files.
NetWare continued to be administered using console-based utilities. The file system introduced by NetWare 3.x and used by default until NetWare 5.x was NetWare File System 386, or NWFS 386, which significantly extended volume capacity (1 TB, 4 GB files) and could handle up to 16 volume segments spanning multiple physical disk drives. Volume segments could be added while the server was in use and the volume was mounted, allowing a server to be expanded without interruption.
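Segmented volumes can be pictured as simple concatenation: the volume's logical block space is the segments laid end to end, so growing a mounted volume amounts to appending another segment. A minimal Python sketch of the mapping, with made-up segment sizes:

    # Sketch of a segmented volume: the logical block space is the
    # concatenation of the segments, which may live on different disks.
    # Segment sizes here are made up for illustration.
    segments = [("disk0", 1000), ("disk1", 2500), ("disk2", 500)]  # (disk, blocks)

    def locate(logical_block):
        # Map a volume-relative block number to (disk, block within segment).
        for disk, size in segments:
            if logical_block < size:
                return disk, logical_block
            logical_block -= size
        raise ValueError("block beyond end of volume")

    print(locate(1200))               # -> ('disk1', 200)
    segments.append(("disk3", 2000))  # growing the mounted volume: add a segment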
Initially, NetWare used Bindery services for authentication. This was a stand-alone database system where all user access and security data resided individually on each server. When an infrastructure contained more than one server, users had to log in to each of them individually, and each server had to be configured with the list of all allowed users.
"NetWare Name Services" was a product that allowed user data to be extended across multiple servers, and the Windows "Domain" concept is functionally equivalent to NetWare v3.x Bindery services with NetWare Name Services added on (e.g. a 2-dimensional database, with a flat namespace and a static schema).
For a while, Novell also marketed an OEM version of NetWare 3, called Portable NetWare, together with OEMs such as Hewlett-Packard, DEC and Data General, who ported Novell source code to run on top of their Unix operating systems. Portable NetWare did not sell well.
While NetWare 3.x was current, Novell introduced its first high-availability clustering system, NetWare SFT-III, which allowed a logical server to be completely mirrored to a separate physical machine. SFT-III was implemented as a shared-nothing cluster: the OS was logically split into an interrupt-driven I/O engine and an event-driven OS core. The I/O engines serialized their interrupts (disk, network, etc.) into a combined event stream that was fed to two identical copies of the system engine through a fast (typically 100 Mbit/s) inter-server link. Because of its non-preemptive nature, the OS core, stripped of non-deterministic I/O, behaved deterministically, like a large finite state machine.
The outputs of the two system engines were compared to ensure proper operation, and the two copies were fed back to the I/O engines. Using the existing SFT-II software RAID functionality present in the core, disks could be mirrored between the two machines without special hardware. The two machines could be separated as far as the server-to-server link would permit. In case of a server or disk failure, the surviving server could take over client sessions transparently after a short pause, since it had full state information and did not, for example, have to re-mount the volumes - a process at which NetWare was notoriously slow. SFT-III was the first NetWare version able to make use of SMP hardware - the I/O engine could optionally be run on its own CPU. The modern incarnation of NetWare's clustering, Novell Cluster Services (introduced in NetWare v5.0), is very different from SFT-III. NetWare SFT-III, ahead of its time in several ways, was a mixed success.
NetWare 386 3.x was designed to run all applications on the server at the same level of processor memory protection, known as "ring 0". While this provided the best possible performance, it sacrificed reliability. The result was that crashes (known as abends, short for abnormal ends) were possible and would halt the system. Starting with NetWare 5.x, software modules (NetWare Loadable Modules, or NLMs) could be assigned to run in different processor protection rings, ensuring that a software error would not crash the system.
Strategic mistakes
Novell's strategy with NetWare 286 2.x and 3.x was very successful; before the arrival of Windows NT Server, Novell claimed 90% of the market for PC-based servers.
While the design of NetWare 3.x and later involved a DOS partition to load NetWare server files, this became a liability as new users preferred the Windows graphical interface to learning the DOS commands necessary to build and control a NetWare server. Novell could have eliminated this technical liability by retaining the design of NetWare 286, which installed the server files into a Novell partition and allowed the server to boot from that partition without a bootable DOS partition. Novell finally added support for this in a Support Pack for NetWare 6.5.
As Novell used IPX/SPX instead of TCP/IP, they were poorly positioned to take advantage of the Internet in 1995. This resulted in Novell servers being bypassed for routing and Internet access, in favor of hardware routers, Unix-based operating systems such as FreeBSD, and SOCKS and HTTP Proxy Servers on Windows and other operating systems.
NetWare 4.1x and NetWare for Small Business: Novell begins to recover
Novell priced NetWare 4.10 similarly to NetWare 3.12, allowing customers who had resisted NDS (typically small businesses) to try it at no additional cost.
In 1996 Novell released NetWare 4.11, which included many enhancements that made the operating system easier to install, easier to operate, faster, and more stable. It also included the first full 32-bit client for Microsoft Windows-based workstations, SMP support, and the NetWare Administrator (NWADMIN or NWADMN32), a GUI-based administration tool for NetWare. Previous administration tools had used the character-based C-Worthy interface, seen in tools such as SYSCON and PCONSOLE with their blue text-based backgrounds. Some of these tools survive to this day, for instance MONITOR.NLM.
Novell packaged NetWare 4.11 with its Web server, TCP/IP support and Netscape browser into a bundle dubbed IntranetWare (also written as intraNetWare). A version designed for networks of 25 or fewer users was named IntranetWare for Small Business and contained a limited version of NDS and tried to simplify NDS administration. The intranetWare name was dropped in NetWare 5.
During this time Novell also began to leverage its directory service, NDS, by tying their other products into the directory. Their e-mail system, GroupWise, was integrated with NDS, and Novell released many other directory-enabled products such as ZENworks and BorderManager.
NetWare still required IPX/SPX, since NCP relied on it, but Novell began to acknowledge the demand for TCP/IP with NetWare 4.11 by including tools and utilities that made it easier to create intranets and link networks to the Internet. Novell bundled tools, such as the IPX/IP gateway, to ease the connection between IPX workstations and IP networks. It also began integrating Internet technologies and support through features such as a natively hosted web server.
NetWare 5.x
With the release of NetWare 5 in October 1998, Novell finally acknowledged the prominence of the Internet by switching its primary NCP interface from the IPX/SPX network protocol to TCP/IP. IPX/SPX was still supported, but the emphasis shifted to TCP/IP. Novell also added a GUI to NetWare. Other new features were:
• Novell Storage Services (NSS), a new file system to replace the traditional NetWare File System - which was still supported
• Java virtual machine for NetWare
• Novell Distributed Print Services (NDPS)
• ConsoleOne, a new Java-based GUI administration console
• directory-enabled Public key infrastructure services (PKIS)
• directory-enabled DNS and DHCP servers
• support for Storage Area Networks (SANs)
• Novell Cluster Services (NCS)
• Oracle 8i with a 5-user license
The Cluster Services were a major advance over SFT-III, as NCS did not require specialized hardware or identical server configurations.
NetWare 5 was released during a time when NetWare market share dropped precipitously; many companies and organizations were replacing their NetWare servers with servers running Microsoft's Windows NT operating system. Novell also released their last upgrade to the NetWare 4 operating system, NetWare 4.2.
NetWare 5.1 was released in January 2000, a little over a year after its predecessor. It introduced a number of useful tools, such as:
• IBM WebSphere Application Server
• NetWare Management Portal (later renamed Novell Remote Manager), web-based management of the operating system
• FTP, NNTP and streaming media servers
• NetWare Web Search Server
• WebDAV support
NetWare 6.0
NetWare 6 was released in October 2001. This version had a simplified licensing scheme based on users rather than servers, allowing unlimited connections per user.
NetWare 6.5
NetWare 6.5 was released in August 2003. Some of the new features in this version were:
• more open-source products such as PHP, MySQL and OpenSSH
• a port of the Bash shell and many traditional Unix utilities such as wget, grep, awk and sed to provide additional capabilities for scripting
• iSCSI support (both target and initiator)
• Virtual Office - an "out of the box" web portal for end users providing access to e-mail, personal file storage, company address book, etc.
• Domain controller functionality
• Universal password
• DirXML Starter Pack - synchronization of user accounts with another eDirectory tree, a Windows NT domain or Active Directory.
• exteNd Application Server - a J2EE 1.3-compatible application server
• support for customized printer driver profiles and printer usage auditing
• NX bit support
• support for USB storage devices
• support for encrypted volumes
Open Enterprise Server
1.0
In 2003, Novell announced the successor product to NetWare: Open Enterprise Server (OES). First released in March 2005, OES completes the separation of the services traditionally associated with NetWare (e.g. Directory Services, file-and-print) from the platform underlying the delivery of those services. OES is essentially a set of applications (eDirectory, NetWare Core Protocol services, iPrint, etc.) that can run atop either a Linux or a NetWare kernel platform. Clustered OES implementations can even migrate services from Linux to NetWare and back again, making Novell one of the very few vendors to offer a multi-platform clustering solution.
Following Novell's acquisitions of Ximian and SuSE, a German Linux distributor, it is widely observed that Novell is moving away from NetWare and shifting its focus towards Linux. Much recent marketing seems to be focused on getting faithful NetWare users to move to the Linux platform in future releases. The clearest indication of this direction is Novell's controversial decision to release Open Enterprise Server in Linux form only. Novell later softened this decision and stated that NetWare's 90 million users would be supported until at least 2015. Some of Novell's more devoted NetWare supporters have taken it upon themselves to petition Novell to keep NetWare in development.
2.0
OES 2 was released on October 8, 2007. It includes NetWare 6.5 SP7, which supports running as a paravirtualized guest inside the Xen hypervisor, and a new Linux-based version using SLES 10.
New features include:
• 64-bit support
• virtualization
• Dynamic Storage Technology, which provides shadow volumes
• Domain Services for Windows (provided in OES 2 Service Pack 1)
Current NetWare situation
While Novell NetWare is still used by some organizations, its ongoing decline in popularity began in the mid-1990s, when NetWare was the de facto standard for file and print software for the Intel x86 server platform. Modern (2009) NetWare and OES installations are used by larger organizations that may need the added flexibility they provide.
Microsoft successfully shifted market share away from NetWare products toward its own in the late 1990s. Microsoft's more aggressive marketing was aimed directly at management through major magazines; Novell's marketing for NetWare appeared in IT specialist magazines with distribution limited to select IT personnel.
Novell did not adapt its pricing structure accordingly, and NetWare sales suffered at the hands of corporate decision makers whose valuation was based on initial licensing fees. As a result, organizations that still use NetWare, eDirectory, and Novell software often have a hybrid infrastructure of NetWare, Linux, and Windows servers.
NetWare Lite / Personal NetWare
In 1991 Novell introduced a radically different and cheaper product, NetWare Lite, in answer to Artisoft's similar LANtastic. Both were peer-to-peer systems: no dedicated server was required; instead, all PCs on the network could share their resources.
The product line became Personal NetWare in 1993.
Performance
NetWare dominated the network operating system (NOS) market from the mid-1980s through the mid-to-late 1990s due to its extremely high performance relative to other NOS technologies. Most benchmarks during this period demonstrated a 5:1 to 10:1 performance advantage over products from Microsoft, Banyan, and others. One noteworthy benchmark pitted NetWare 3.x running NFS services over TCP/IP (not NetWare's native IPX protocol) against a dedicated Auspex NFS server and a SCO Unix server running NFS service. NetWare NFS outperformed both 'native' NFS systems and claimed a 2:1 performance advantage over SCO Unix NFS on the same hardware.
There were several reasons for NetWare's performance.
File service instead of disk service
At the time NetWare was first developed, nearly all LAN storage was based on the disk server model. This meant that if a client computer wanted to read a particular block from a particular file it would have to issue the following requests across the relatively slow LAN:
1. Read first block of directory
2. Continue reading subsequent directory blocks until the directory block containing the information on the desired file was found (this could take many directory block reads)
3. Read through multiple file entry blocks until the block containing the location of the desired data block was found (again, potentially many reads)
4. Read the desired data block
NetWare, since it was based on a file service model, interacted with the client at the file API level:
1. Send file open request (if this hadn't already been done)
2. Send a request for the desired data from the file
All of the work of searching the directory to figure out where the desired data was physically located on the disk was performed at high speed locally on the server. By the mid-1980s, most NOS products had shifted from the disk service to the file service model. Today, the disk service model is making a comeback in the form of storage area networks (SANs).
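The contrast can be made concrete with a toy file-service server: the client issues one OPEN and one READ, and all directory searching happens on the server side. The request names and message formats in this Python sketch are illustrative only, not NetWare's actual protocol:

    # Toy file-service server: directory lookup is server-local, so the
    # client needs only one OPEN and one READ round trip.
    FILES = {"REPORT.TXT": b"quarterly numbers..."}   # server-side file store
    open_files = {}

    def handle_request(op, arg):
        if op == "OPEN":                    # one round trip to open
            open_files[arg] = FILES[arg]    # path search happens here, locally
            return arg                      # a trivial "file handle"
        if op == "READ":                    # one round trip per read
            handle, offset, length = arg
            return open_files[handle][offset:offset + length]

    h = handle_request("OPEN", "REPORT.TXT")
    print(handle_request("READ", (h, 0, 9)))  # -> b'quarterly'

Under the disk-server model, every step of that path search would itself have been a network round trip.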
Aggressive caching
From the start, NetWare was designed to be used on servers with copious amounts of RAM. The entire file allocation table (FAT) was read into RAM when a volume was mounted, thereby requiring a minimum amount of RAM proportional to online disk space; adding a disk to a server would often require a RAM upgrade as well. Unlike most competing network operating systems prior to Windows NT, NetWare automatically used all otherwise unused RAM for caching active files, employing delayed write-backs to facilitate re-ordering of disk requests (elevator seeks). An unexpected shutdown could therefore corrupt data, making an uninterruptible power supply practically a mandatory part of a server installation.
The default dirty cache delay time was fixed at 2.2 seconds in NetWare 286 versions 2.x. Starting with NetWare 386 3.x, the dirty disk cache delay time and dirty directory cache delay time settings controlled the amount of time the server would cache changed ("dirty") data before saving (flushing) the data to a hard drive. The default setting of 3.3 seconds could be decreased to 0.5 seconds but not reduced to zero, while the maximum delay was 10 seconds. The option to increase the cache delay to 10 seconds provided a significant performance boost. Windows 2000 Server and Windows Server 2003 do not allow adjustment of the cache delay time; instead, they use an algorithm that adjusts the cache delay.
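The dirty-cache delay amounts to a timestamped write-back cache: writes complete in RAM immediately, and a background sweep flushes any block that has been dirty longer than the configured delay. A simplified Python sketch, using the 3.3-second NetWare 3.x default described above:

    import time

    DIRTY_DELAY = 3.3   # seconds; the NetWare 3.x default described above
    cache = {}          # block number -> (data, time the block became dirty)

    def write_block(block, data):
        # Writes complete in RAM immediately; the disk write is deferred.
        cache[block] = (data, time.time())

    def flush_due_blocks(disk):
        # Periodic sweep: flush blocks that have been dirty past the delay.
        now = time.time()
        for block, (data, dirtied) in list(cache.items()):
            if now - dirtied >= DIRTY_DELAY:
                disk[block] = data      # the actual (slow) disk write
                del cache[block]

Deferring writes this way is what lets the server re-order them into elevator seeks, at the cost of losing up to a delay's worth of data on sudden power loss - hence the practical need for a UPS noted above.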

Efficiency of NetWare Core Protocol (NCP)
Most network protocols in use at the time NetWare was developed didn't trust the network to deliver messages. A typical client file read would work something like this:
1. Client sends read request to server
2. Server acknowledges request
3. Client acknowledges acknowledgement
4. Server sends requested data to client
5. Client acknowledges data
6. Server acknowledges acknowledgement
In contrast, NCP was based on the idea that networks worked perfectly most of the time, so the reply to a request served as the acknowledgement. Here is an example of a client read request using this model:
1. Client sends read request to server
2. Server sends requested data to client
All requests contained a sequence number, so if the client didn't receive a response within an appropriate amount of time, it would re-send the request with the same sequence number. If the server had already processed the request, it would resend the cached response; if it had not yet had time to process the request, it would only send a "positive acknowledgement". The bottom line of this 'trust the network' approach was a two-thirds reduction in network transactions and the associated latency.
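In other words, the server needs to keep only the last sequence number and reply per connection; a retransmitted request is answered from that one-deep reply cache rather than re-executed. A minimal Python sketch of the idea (not the real NCP wire format, and it omits the in-progress "positive acknowledgement" case):

    # Minimal sketch of NCP-style request handling: the reply doubles as
    # the acknowledgement, and a one-deep reply cache makes retransmitted
    # requests safe to answer without re-executing them.
    last_seq, last_reply = None, None

    def handle(seq, request, execute):
        global last_seq, last_reply
        if seq == last_seq:
            return last_reply           # duplicate: resend the cached reply
        last_seq = seq
        last_reply = execute(request)   # new request: execute and remember
        return last_reply

    # The client resends with the SAME sequence number until a reply arrives.
    print(handle(1, "read block 7", lambda r: "data for " + r))
    print(handle(1, "read block 7", lambda r: "data for " + r))  # cached reply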
Non-preemptive OS designed for network services
One of the raging debates of the 1990s was whether it was more appropriate for network file service to be performed by a software layer running on top of a general-purpose operating system, or by a special-purpose operating system. NetWare was a special-purpose operating system, not a timesharing OS. It was written from the ground up as a platform for client-server processing services. Initially it focused on file and print services, but later demonstrated its flexibility by running database, email, web and other services as well. It also performed efficiently as a router, supporting IPX, TCP/IP, and AppleTalk, though it never offered the flexibility of a 'hardware' router.
In 4.x and earlier versions, NetWare did not support preemption, virtual memory, graphical user interfaces, etc. Processes and services running under the NetWare OS were expected to be cooperative, that is, to process a request and return control to the OS in a timely fashion. On the downside, this trust in application processes to manage themselves could lead to a misbehaving application bringing down the server.
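Cooperative scheduling of this kind can be sketched with Python generators: each service does one unit of work and voluntarily yields control back to the scheduler loop. A service that loops without yielding stalls everything, which is exactly the failure mode described above:

    # Toy cooperative scheduler: each service yields control voluntarily.
    def file_service():
        while True:
            # ... handle one file request ...
            yield                        # return control to the OS loop

    def print_service():
        while True:
            # ... handle one print job ...
            yield

    # Round-robin over the services; nothing can preempt a task that
    # fails to yield, so one misbehaving module would stall the whole server.
    services = [file_service(), print_service()]
    for _ in range(10):
        for s in services:
            next(s)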
By comparison, general-purpose operating systems such as Unix or Microsoft Windows were based on an interactive, time-sharing model where competing programs would consume all available resources if not held in check by the operating system. Such environments operated by preemption, memory virtualization, etc., generating significant overhead because there were never enough resources to do everything every application desired. These systems improved over time as network services shed their "application" stigma and moved deeper into the kernel of the "general-purpose" OS, but they never equaled the efficiency of NetWare.
Probably the single greatest reason for Novell's success during the 1980s and 1990s was the efficiency of NetWare compared to general-purpose operating systems. However, as microprocessors increased in power, efficiency became less and less of an issue. With the introduction of the Pentium processor, NetWare's performance advantage began to be outweighed by the complexity of managing and developing applications for the NetWare environment.
NetWare is a network operating system developed by Novell, Inc. It initially used cooperative multitasking to run various services on a personal computer, and the network protocols were based on the archetypal Xerox Network Systems stack.
NetWare has been superseded by Open Enterprise Server (OES). The latest version of NetWare is v6.5 Support Pack 8, which is identical to OES 2 SP1, NetWare Kernel.
History

NetWare evolved from a very simple concept: file sharing instead of disk sharing. In 1983 when the first versions of NetWare were designed, all other competing products were based on the concept of providing shared direct disk access. Novell's alternative approach was validated by IBM in 1984 and helped promote their product.
With Novell NetWare, disk space was shared in the form of NetWare volumes, comparable to DOS volumes. Clients running MS-DOS would run a special terminate and stay resident (TSR) program that allowed them to map a local drive letter to a NetWare volume. Clients had to log in to a server in order to be allowed to map volumes, and access could be restricted according to the login name. Similarly, they could connect to shared printers on the dedicated server, and print as if the printer was connected locally.
At the end of the 1990s, with Internet connectivity booming, the Internet's TCP/IP protocol became dominant on LANs. Novell had introduced limited TCP/IP support in NetWare v3.x (circa 1992) and v4.x (circa 1995), consisting mainly of FTP services and UNIX-style LPR/LPD printing (available in NetWare v3.x), and a Novell-developed webserver (in NetWare v4.x). Native TCP/IP support for the client file and print services normally associated with NetWare was introduced in NetWare v5.0 (released in 1998).
During the early-to-mid 1980s Microsoft introduced their own LAN system in LAN Manager based on the competing NBF protocol. Early attempts to muscle in on NetWare were not successful, but this changed with the inclusion of improved networking support in Windows for Workgroups, and then the hugely successful Windows NT and Windows 95. NT, in particular, offered services similar to those offered by NetWare, but on a system that could also be used on a desktop, and connected directly to other Windows desktops where NBF was now almost universal.
The rise of NetWare
The popular use and growth of Novell NetWare began in 1985 with the simultaneous release of NetWare 286 2.0a and the Intel 80286 16-bit processor. The 80286 CPU featured a new 16-bit protected mode that provided access to up to 16 MB RAM as well as new mechanisms to aid multi-tasking. Prior to the 80286 CPU servers were based on the Intel 8086/8088 8/16-bit processors, which were limited to an address space of 1MB with not more than 640 KB of directly addressable RAM.
The combination of a higher 16 MB RAM limit, 80286 processor feature utilization, and 256 MB NetWare volume size limit allowed reliable, cost-effective server-based local area networks to be built for the first time. The 16 MB RAM limit was especially important, since it made enough RAM available for disk caching to significantly improve performance. This became the key to Novell's performance while also allowing larger networks to be built.
Another significant difference of NetWare 286 was that it was hardware-independent, unlike competing server systems from 3Com. Novell servers could be assembled using any brand system with an Intel 80286 or higher CPU, any MFM, RLL, ESDI, or SCSI hard drive and any 8- or 16-bit network adapter for which Netware drivers were available.
Novell also designed a compact and simple DOS client software program that allowed DOS stations to connect to a server and access the shared server hard drive. While the NetWare server file system introduced a new, proprietary file system design, it looked like a standard DOS volume to the workstation, ensuring compatibility with all existing DOS programs.
Early years
NetWare was based on the consulting work by SuperSet Software, a group founded by the friends Drew Major, Dale Neibaur, Kyle Powell and later Mark Hurst. This work was based on their classwork at Brigham Young University in Provo, Utah, starting in October 1981.
In 1983, Raymond Noorda engaged the work by the SuperSet team. The team was originally assigned to create a CP/M disk sharing system to help network the CP/M hardware that Novell was selling at the time. The team was privately convinced that CP/M was a doomed platform and instead came up with a successful file sharing system for the newly introduced IBM-compatible PC. They also wrote an application called Snipes, a text-mode game and used it to test the new network and demonstrate its capabilities. Snipes was the first network application ever written for a commercial personal computer, and it is recognized as one of the precursors of many popular multiplayer games such as Doom and Quake.
This network operating system (NOS) was later called Novell NetWare. NetWare was based on the NetWare Core Protocol (NCP), which is a packet-based protocol that enables a client to send requests to and receive replies from a NetWare server. Initially NCP was directly tied to the IPX/SPX protocol, and NetWare communicated natively using only IPX/SPX.
The first product to bear the NetWare name was released in 1983. It was called Netware 68 (aka S-Net); it ran on the Motorola 68000 processor on a proprietary Novell-built file server and used a star network topology. This was soon joined by NetWare 86 V4.x, which was written for the Intel 8086. This was replaced in 1985 with Advanced NetWare 86 version 1.0a which allowed more than one server on the same network. In 1986, after the Intel 80286 processor became available, Novell released Advanced NetWare 286 V1.0a and subsequently V2.0B (that used IPX routing to allow up to 4 network cards in a server). In 1989, with the Intel 80386 available, Novell released NetWare 386. Later Novell consolidated the numbering of their NetWare releases, with NetWare 386 becoming NetWare 3.x.
NetWare 286 2.x
NetWare version 2 was notoriously difficult to configure, since the operating system was provided as a set of compiled object modules that required configuration and linking. Compounding this inconvenience was that the process was designed to run from multiple diskettes, which was slow and unreliable. Any change to the operating system required a re-linking of the kernel and a reboot of the system, requiring at least 20 diskette swaps. An additional complication in early versions was that the installation contained a proprietary low-level format program for MFM hard drives, which was run automatically before the software could be loaded, called COMPSURF.
NetWare was administered using text-based utilities such as SYSCON. The file system used by NetWare 2 was NetWare File System 286, or NWFS 286, supporting volumes of up to 256 MB. NetWare 286 recognized 80286 protected mode, extending NetWare's support of RAM from 1 MB to the full 16 MB addressable by the 80286. A minimum of 2 MB was required to start up the operating system; any additional RAM was used for FAT, DET and file caching. Since 16-bit protected mode was implemented the i80286 and every subsequent Intel x86 processor, NetWare 286 version 2.x would run on any 80286 or later compatible processor.
NetWare 2 implemented a number of features inspired by mainframe and minicomputer systems that were not available in other operating systems of the day. The System Fault Tolerance (SFT) features included standard read-after-write verification (SFT-I) with on-the-fly bad block re-mapping (at the time, disks did not have that feature built in) and software RAID1 (disk mirroring, SFT-II). The Transaction Tracking System (TTS) optionally protected files against incomplete updates. For single files, this required only a file attribute to be set. Transactions over multiple files and controlled roll-backs were possible by programming to the TTS API.
NetWare 286 2.x supported two modes of operation: dedicated and non-dedicated. In dedicated mode, the server used DOS only as a boot loader to execute the operating system file net$os.exe. All memory was allocated to NetWare; no DOS ran on the server. For non-dedicated operation, DOS 3.3 or higher would remain in memory, and the processor would time-slice between the DOS and NetWare programs, allowing the server computer to be used simultaneously as network file server and as a user workstation. All extended memory (RAM above 1 MB) was allocated to NetWare, so DOS was limited to only 640kB; an expanded memory manager would not work because NetWare 286 had control of 80286 protected mode and the upper RAM, both of which were required for DOS to use expanded memory. Time slicing was accomplished using the keyboard interrupt. This feature required strict compliance with the IBM PC design model, otherwise performance was affected. Non-dedicated NetWare was popular on small networks, although it was more susceptible to lockups due to DOS program problems. In some implementations, users would experience significant network slowdown when someone was using the console as a workstation. NetWare 386 3.x and later supported only dedicated operation.
Server licensing on early versions of NetWare 286 was accomplished by using a key card. The key card was designed for an 8-bit ISA bus, and had a serial number encoded on a ROM chip. The serial number had to match the serial number of the NetWare software running on the server. To broaden the hardware base, particularly to machines using the IBM MCA bus, later versions of NetWare 2.x did not require the key card; serialised license floppy disks were used in place of the key cards.
NetWare 3.x
Starting with NetWare 3.x, support for 32-bit protected mode was added, eliminating the 16 mb memory limit of NetWare 286. This allowed larger hard drives to be supported, since NetWare 3.x cached (copied) the entire file allocation table (FAT) and directory entry table (DET) into memory for improved performance.
By accident or design, the initial releases of the client TSR programs modified the high 16 bits of the 32-bit 80386 registers, making them unusable by any other program until this was fixed. The problem was noticed by Phil Katz who added a switch to his PKZIP suite of programs to enable 32-bit register use only when the Netware TSRs were not present.
NetWare version 3 eased development and administration by modularization. Each functionality was controlled by a software module called a NetWare Loadable Module (NLM) loaded either at startup or when it was needed. It was then possible to add functionality such as anti-virus software, backup software, database and web servers, long name support (standard filenames were limited to 8 characters plus a three letter extension, matching MS-DOS) or Macintosh style files.
NetWare continued to be administered using console-based utilities. The file system introduced by NetWare 3.x and used by default until NetWare 5.x was NetWare File System 386, or NWFS 386, which significantly extended volume capacity (1 TB, 4 GB files) and could handle up to 16 volume segments spanning multiple physical disk drives. Volume segments could be added while the server was in use and the volume was mounted, allowing a server to be expanded without interruption.
Initially, NetWare used Bindery services for authentication. This was a stand-alone database system where all user access and security data resided individually on each server. When an infrastructure contained more than one server, users had to log-in to each of them individually, and each server had to be configured with the list of all allowed users.
"NetWare Name Services" was a product that allowed user data to be extended across multiple servers, and the Windows "Domain" concept is functionally equivalent to NetWare v3.x Bindery services with NetWare Name Services added on (e.g. a 2-dimensional database, with a flat namespace and a static schema).
For a while, Novell also marketed an OEM version of NetWare 3, called Portable NetWare, together with OEMs such as Hewlett-Packard, DEC and Data General, who ported Novell source code to run on top of their Unix operating systems. Portable NetWare did not sell well.
While Netware 3.x was current, Novell introduced its first high-availability clustering system, named NetWare SFT-III, which allowed a logical server to be completely mirrored to a separate physical machine. Implemented as a shared-nothing cluster, under SFT-III the OS was logically split into an interrupt-driven I/O engine and the event-driven OS core. The I/O engines serialized their interrupts (disk, network etc.) into a combined event stream that was fed to two identical copies of the system engine through a fast (typically 100 Mbit/s) inter-server link. Because of its non-preemptive nature, the OS core, stripped of non-deterministic I/O, behaves deterministically, like a large finite state machine.
The outputs of the two system engines were compared to ensure proper operation, and two copies fed back to the I/O engines. Using the existing SFT-II software RAID functionality present in the core, disks could be mirrored between the two machines without special hardware. The two machines could be separated as far as the server-to-server link would permit. In case of a server or disk failure, the surviving server could take over client sessions transparently after a short pause since it had full state information and did not, for example, have to re-mount the volumes - a process at which NetWare was notoriously slow. SFT-III was the first NetWare version able to make use of SMP hardware - the I/O engine could optionally be run on its own CPU. The modern incarnation of NetWare's clustering, Novell Cluster Services (introduced in NetWare v5.0), is very different from SFT-III. NetWare SFT-III, ahead of its time in several ways, was a mixed success.
NetWare 386 3.x was designed to run all applications on the server at the same level of processor memory protection, known as "ring 0". While this provided the best possible performance, it sacrificed reliability. The result was that crashing (known as abends, short for abnormal ends) were possible and would result in stopping the system. Starting with NetWare 5.x, software modules (NetWare Loadable Modules or NLM's) could be assigned to run in different processor protection rings, ensuring that a software error would not crash the system
Strategic mistakes
Novell's strategy with NetWare 286 2.x and 3.x was very successful; before the arrival of Windows NT Server, Novell claimed 90% of the market for PC based servers.
While the design of NetWare 3.x and later involved a DOS partition to load NetWare server files, this feature became a liability as new users preferred the Windows graphical interface to learning DOS commands necessary to build and control a NetWare server. Novell could have eliminated this technical liability by retaining the design of NetWare 286, which installed the server file into a Novell partition and allowed the server to boot from the Novell partition without creating a bootable DOS partition. Novell finally added support for this in a Support Pack for NetWare 6.5.
As Novell used IPX/SPX instead of TCP/IP, they were poorly positioned to take advantage of the Internet in 1995. This resulted in Novell servers being bypassed for routing and Internet access, in favor of hardware routers, Unix-based operating systems such as FreeBSD, and SOCKS and HTTP Proxy Servers on Windows and other operating systems.
NetWare 4.1x and NetWare for Small Business: Novell begins to recover
Novell priced NetWare 4.10 similarly to NetWare 3.12, allowing customers who resisted NDS (typically small businesses) to try it at no cost.
Later Novell released NetWare version 4.11 in 1996 which included many enhancements that made the operating system easier to install, easier to operate, faster, and more stable. It also included the first full 32-bit client for Microsoft Windows-based workstations, SMP support and the NetWare Administrator (NWADMIN or NWADMN32), a GUI-based administration tool for NetWare. Previous administration tools used the Cworthy interface, the character-based GUI tools such as SYSCON and PCONSOLE with blue text-based background. Some of these tools survive to this day, for instance MONITOR.NLM.
Novell packaged NetWare 4.11 with its Web server, TCP/IP support and Netscape browser into a bundle dubbed IntranetWare (also written as intraNetWare). A version designed for networks of 25 or fewer users was named IntranetWare for Small Business and contained a limited version of NDS and tried to simplify NDS administration. The intranetWare name was dropped in NetWare 5.
During this time Novell also began to leverage its directory service, NDS, by tying their other products into the directory. Their e-mail system, GroupWise, was integrated with NDS, and Novell released many other directory-enabled products such as ZENworks and BorderManager.
NetWare still required IPX/SPX as NCP used it, but Novell started to acknowledge the demand for TCP/IP with NetWare 4.11 by including tools and utilities that made it easier to create intranets and link networks to the Internet. Novell bundled tools, such as the IPX/IP gateway, to ease the connection between IPX workstations and IP networks. It also began integrating Internet technologies and support through features such as a natively hosted web server.
NetWare 5.x
With the release of NetWare 5 in October 1998, Novell finally acknowledged the prominence of the Internet by switching its primary NCP interface from the IPX/SPX network protocol to TCP/IP. IPX/SPX was still supported, but the emphasis shifted to TCP/IP. Novell also added a GUI to NetWare. Other new features were:
• Novell Storage Services (NSS), a new file system to replace the traditional NetWare File System - which was still supported
• Java virtual machine for NetWare
• Novell Distributed Print Services (NDPS)
• ConsoleOne, a new Java-based GUI administration console
• directory-enabled Public key infrastructure services (PKIS)
• directory-enabled DNS and DHCP servers
• support for Storage Area Networks (SANs)
• Novell Cluster Services (NCS)
• Oracle 8i with a 5-user license
The Cluster Services were a major advance over SFT-III, as NCS does not require specialized hardware or identical server configurations.
NetWare 5 was released during a time when NetWare market share dropped precipitously; many companies and organizations were replacing their NetWare servers with servers running Microsoft's Windows NT operating system. Novell also released their last upgrade to the NetWare 4 operating system, NetWare 4.2.
NetWare 5.1 was released in January 2000, shortly after its predecessor. It introduced a number of useful tools, such as:
• IBM WebSphere Application Server
• NetWare Management Portal (later renamed Novell Remote Manager), web-based management of the operating system
• FTP, NNTP and streaming media servers
• NetWare Web Search Server
• WebDAV support
NetWare 6.0
NetWare 6 was released in October 2001. This version has a simplified licensing scheme based on users, not servers. This allows unlimited connections per user.
NetWare 6.5
NetWare 6.5 was released in August 2003. Some of the new features in this version were:
• more open-source products such as PHP, MySQL and OpenSSH
• a port of the Bash shell and a lot of traditional Unix utilities such as wget, grep, awk and sed to provide additional capabilities for scripting
• iSCSI support (both target and initiator)
• Virtual Office - an "out of the box" web portal for end users providing access to e-mail, personal file storage, company address book, etc.
• Domain controller functionality
• Universal password
• DirXML Starter Pack - synchronization of user accounts with another eDirectory tree, a Windows NT domain or Active Directory.
• exteNd Application Server - a J2EE 1.3-compatible application server
• support for customized printer driver profiles and printer usage auditing
• NX bit support
• support for USB storage devices
• support for encrypted volumes
Open Enterprise Server
1.0
In 2003, Novell announced the successor product to NetWare: Open Enterprise Server (OES). First released in March 2005, OES completes the separation of the services traditionally associated with NetWare (e.g. Directory Services, file-and-print) from the platform underlying the delivery of those services. OES is essentially a set of applications (eDirectory, NetWare Core Protocol services, iPrint, etc.) that can run atop either a Linux or a NetWare kernel platform. Clustered OES implementations can even migrate services from Linux to NetWare and back again, making Novell one of the very few vendors to offer a multi-platform clustering solution.
Consequent to Novell's acquisitions of Ximian and SuSE, a German Linux distributor, it is widely observed that Novell is moving away from NetWare and shifting its focus towards Linux. Much recent marketing seems to be focussed on getting faithful NetWare users to move to the Linux platform in future releases.The clearest indication of this direction is Novell's controversial decision to release Open Enterprise Server in Linux form only. Novell later watered down this decision and stated that NetWare's 90 million users would be supported until at least 2015. Some of Novell's more perverse NetWare supporters have taken it upon themselves to petition Novell to keep NetWare in development.
[edit] 2.0
OES 2 was released on October 8, 2007. It includes NetWare 6.5 SP7, which supports running as a paravirtualized guest inside the Xen hypervisor and new Linux based version using SLES10.
New features include
• 64bit support
• Virtualization
• Dynamic Storage Technology, which provide Shadow Volumes
• Domain services for Windows (provided in OES 2 service pack 1)
Current NetWare situation
While Novell NetWare is still used by some organizations, its ongoing decline in popularity began in the mid-1990s, when NetWare was the de facto standard for file and print software for the Intel x86 server platform. Modern (2009) NetWare and OES installations are used by larger organizations that may need the added flexibility they provide.
Microsoft successfully shifted market share away from NetWare products toward their own in the late-1990s. Microsoft's more aggressive marketing was aimed directly to management through major magazines; Novell NetWare's was through IT specialist magazines with distribution limited to select IT personnel.
Novell did not adapt their pricing structure accordingly and NetWare sales suffered at the hands of those corporate decision makers whose valuation was based on initial licensing fees. As a result organizations that still use NetWare, eDirectory, and Novell software often have a hybrid infrastructure of NetWare, Linux, and Windows servers.
Netware Lite / Personal Netware
In 1991 Novell introduced a radically different and cheaper product - Netware Lite in answer to Artisoft's similar LANtastic. Both were peer to peer systems, where no specialist server was required, but instead all PCs on the network could share their resources.
The product line became Personal Netware in 1993.
Performance
NetWare dominated the network operating system (NOS) market from the mid-80s through the mid- to late-90s due to its extremely high performance relative to other NOS technologies. Most benchmarks during this period demonstrated a 5:1 to 10:1 performance advantage over products from Microsoft, Banyan, and others. One noteworthy benchmark NetWare 3.x running NFS services over TCP/IP (not NetWare's native IPX protocol) to a dedicated Auspex NFS server and a SCO Unix server running NFS service. NetWare NFS outperformed both 'native' NFS systems and claimed a 2:1 performance advantage over SCO Unix NFS on the same hardware.
There were several reasons for NetWare's performance.
File service instead of disk service
At the time NetWare was first developed, nearly all LAN storage was based on the disk server model. This meant that if a client computer wanted to read a particular block from a particular file it would have to issue the following requests across the relatively slow LAN:
1. Read first block of directory
2. Continue reading subsequent directory blocks until the directory block containing the information on the desired file was found, could be many directory blocks
3. Read through multiple file entry blocks until the block containing the location of the desired file block was found, could be many directory blocks
4. Read the desired data block
NetWare, since it was based on a file service model, interacted with the client at the file API level:
1. Send file open request (if this hadn't already been done)
2. Send a request for the desired data from the file
All of the work of searching the directory to figure out where the desired data was physically located on the disk was performed at high speed locally on the server. By the mid-1980s, most NOS products had shifted from the disk service to the file service model. Today, the disk service model is making a comeback, see SAN.
Aggressive caching
From the start, NetWare was designed to be used on servers with copious amounts of RAM. The entire file allocation table (FAT) was read into RAM when a volume was mounted, thereby requiring a minimum amount of RAM proportional to online disk space; adding a disk to a server would often require a RAM upgrade as well. Unlike most competing network operating systems prior to Windows NT, NetWare automatically used all otherwise unused RAM for caching active files, employing delayed write-backs to facilitate re-ordering of disk requests (elevator seeks). An unexpected shutdown could therefore corrupt data, making an uninterruptible power supply practically a mandatory part of a server installation.
The default dirty cache delay time was fixed at 2.2 seconds in NetWare 286 versions 2.x. Starting with NetWare 386 3.x, the dirty disk cache delay time and dirty directory cache delay time settings controlled the amount of time the server would cache changed ("dirty") data before saving (flushing) the data to a hard drive. The default setting of 3.3 seconds could be decreased to 0.5 seconds but not reduced to zero, while the maximum delay was 10 seconds. The option to increase the cache delay to 10 seconds provided a significant performance boost. Windows 2000 and 2003 server do not allow adjustment to the cache delay time. Instead, they use an algorithm that adjusts cache delay.

Efficiency of NetWare Core Protocol (NCP)
Most network protocols in use at the time NetWare was developed didn't trust the network to deliver messages. A typical client file read would work something like this:
1. Client sends read request to server
2. Server acknowledges request
3. Client acknowledges acknowledgement
4. Server sends requested data to client
5. Client acknowledges data
6. Server acknowledges acknowledgement
In contrast, NCP was based on the idea that networks worked perfectly most of the time, so the reply to a request served as the acknowledgement. Here is an example of a client read request using this model:
1. Client sends read request to server
2. Server sends requested data to client
All requests contained a sequence number, so if the client didn't receive a response within an appropriate amount of time it would re-send the request with the same sequence number. If the server had already processed the request, it would resend the cached response; if it had not yet had time to process the request, it would send only a 'positive acknowledgement'. The bottom line of this 'trust the network' approach was a two-thirds reduction in network transactions and the associated latency.
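
A toy illustration in C may help. This is only a sketch of the idea, not Novell's actual NCP implementation or API; the function and variable names are invented. The server remembers the last sequence number it handled and replays its cached reply when it sees a duplicate:

/* Hypothetical sketch of an NCP-style exchange: the reply doubles as the
   acknowledgement, and a repeated sequence number replays a cached reply. */
#include <stdio.h>

static int  last_seq = -1;     /* server-side state            */
static char cached[64];        /* server's cached reply        */

/* Server: process a new request once; replay the cache on a duplicate. */
static const char *server_handle(int seq, const char *req)
{
    if (seq != last_seq) {     /* new request: do the work     */
        snprintf(cached, sizeof cached, "data-for(%s)", req);
        last_seq = seq;
    }
    return cached;             /* duplicate: cached reply only */
}

int main(void)
{
    /* The client sends seq 7, "loses" the reply, and retries with seq 7. */
    const char *r1 = server_handle(7, "read block 12");
    const char *r2 = server_handle(7, "read block 12");  /* retransmission */
    printf("first reply: %s\nreplayed   : %s\n", r1, r2);
    return 0;                  /* identical replies, no double processing  */
}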
Non-preemptive OS designed for network services
One of the raging debates of the 1990s was whether network file service was better performed by a software layer running on top of a general-purpose operating system, or by a special-purpose operating system. NetWare was a special-purpose operating system, not a timesharing OS. It was written from the ground up as a platform for client-server processing services. Initially it focused on file and print services, but it later demonstrated its flexibility by running database, email, web and other services as well. It also performed efficiently as a router, supporting IPX, TCP/IP, and AppleTalk, though it never offered the flexibility of a 'hardware' router.
In 4.x and earlier versions, NetWare did not support preemption, virtual memory, graphical user interfaces, and so on. Processes and services running under the NetWare OS were expected to be cooperative, that is, to process a request and return control to the OS in a timely fashion. On the downside, this trust in application processes to manage themselves could lead to a misbehaving application bringing down the server.
By comparison, general-purpose operating systems such as Unix or Microsoft Windows were based on an interactive, time-sharing model where competing programs would consume all available resources if not held in check by the operating system. Such environments operated by preemption, memory virtualization, and so on, generating significant overhead because there were never enough resources to do everything every application desired. These systems improved over time as network services shed their "application" stigma and moved deeper into the kernel of the general-purpose OS, but they never equaled the efficiency of NetWare.
Probably the single greatest reason for Novell's success during the 1980s and 1990s was the efficiency of NetWare compared to general-purpose operating systems. However, as microprocessors increased in power, efficiency became less and less of an issue. With the introduction of the Pentium processor, NetWare's performance advantage began to be outweighed by the complexity of managing and developing applications for the NetWare environment.

Sunday, February 14, 2010



Process Scheduling [52]



summary
• only one process at a time is running on the CPU
• process gives up CPU:
• if it starts waiting for an event
• otherwise: other processes need fair access
• OS schedules which ready process to run next
• time slice or quantum for each process
• scheduling algorithms:
• different goals
• affect performance


Scheduling: Definitions [53]


long-term scheduler
• job scheduler
• which process on disk should be given memory?
• result: new process in ready queue
• important in batch systems
• many processes in memory IMPLIES high degree of multiprogramming


short-term scheduler
• CPU scheduler
• which process in ready queue should be given CPU
• result: new process on CPU


Scheduling: Definitions [54]


CPU-bound
• most of its time doing computation - little I/O


I/O-bound
• most of its time doing I/O - little computation


multilevel scheduling
• classified into different groups
• foreground (interactive) vs.
• background (batch)
• each group has its own ready queue


Performance: Definitions [55]


utilization
• percentage of time that the CPU is busy.
• if not busy, ready queue must be empty
• CPU actually executes NULL process
• goal: keep the CPU busy


throughput
• if busy, then work is being done
• number of processes completed per second


turnaround
• total time to complete a process
• includes waiting in the ready queue
• executing on the CPU
• waiting for I/O
• goal: fast turnaround


Performance: Definitions [56]


response
• time waiting in the ready queue and
• executing on CPU until some output produced
• average is across all output events
• goal: fast response time


waiting
• sum of periods spent waiting in ready queue
• average is across all visits to ready queue
• goal: short waiting time


important
• scheduler has a direct effect on waiting time
• decides which process in queue gets to run next
• remaining processes must then wait longer
• OS cannot control code, amount of I/O, etc.


Performance: Summary [57]
• UTILIZATION: CPU %busy
• THROUGHPUT: jobs/sec
• WAITING: sec/job
• RESPONSE: sec/job (usually in time-share systems)
• TURNAROUND: sec/job (usually in batch systems)





CPU Burst [58]


CPU burst
• cycle of CPU burst, I/O wait, CPU burst, ...
• program and data determine length of burst
• scheduler may interrupt a burst
• but does not affect the full length



int n, a, b, x = 0, i, j;
scanf("%d %d %d", &n, &a, &b);       /* I/O wait  */
for (i = 1; i <= n; i++)             /* CPU burst */
    x = x + a * b;
printf("%d\n", x);                   /* I/O wait  */
for (i = 1; i <= n; i++)             /* CPU burst (longer) */
    for (j = 1; j <= n; j++)
        x = x + a * b;
printf("%d\n", x);                   /* I/O wait  */


Scheduling: FCFS [59]
• First-Come, First-Served is simplest scheduling algorithm
• ready queue is a FIFO queue: First-In, First-Out
• longest waiting process at the front (head) of queue
• new ready processes join the rear (tail)
• nonpreemptive: executes until voluntarily gives up CPU
• finished or waits for some event
• problem:
• CPU-bound process may require a long CPU burst
• other processes, with very short CPU bursts, wait in queue
• reduces CPU and I/O device utilization
• it would be better if the shorter processes went first


Scheduling: FCFS [60]

• assume processes arrive in this order: P1, P2, P3
• nonpreemptive scheduling
• average waiting time: (0+24+27)/3=17 ms
PID Burst
P1 24
P2 3
P3 3
Gantt chart


+------------------------+----+----+
| P1                     | P2 | P3 |
+------------------------+----+----+
0                       24   27   30

• assume processes arrive in this order: P2, P3, P1
• average waiting time: (6+0+3)/3=3 ms

PID Burst
P2 3
P3 3
P1 24
Gantt chart


+----+----+------------------------+
| P2 | P3 | P1                     |
+----+----+------------------------+
0    3    6                       30

• in general, FCFS average waiting time is not minimal
• in general, it is better to process shortest jobs first (a sketch of the FCFS arithmetic follows)
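
A minimal sketch in C makes the waiting-time arithmetic concrete (assuming, as above, that all three processes arrive at time 0 in the order given):

#include <stdio.h>

/* FCFS: each process waits for the sum of the bursts ahead of it. */
int main(void)
{
    int burst[] = {24, 3, 3};   /* P1, P2, P3 in arrival order */
    int n = 3, wait = 0, total = 0;

    for (int i = 0; i < n; i++) {
        printf("P%d waits %d ms\n", i + 1, wait);
        total += wait;
        wait += burst[i];       /* everyone behind waits this burst too */
    }
    printf("average waiting time: %.2f ms\n", (double)total / n);
    return 0;                   /* prints 0, 24, 27 -> average 17.00 ms */
}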


Scheduling: Round Robin (RR) [61]
• similar to FCFS, but preemption to switch between processes
• time quantum (time slice) is a small unit of time (10 to 100 ms)
• process is executed on the CPU for at most one time quantum
• implemented by using the ready queue as a circular queue
• head process gets the CPU
• uses less than a time quantum IMPLIES gives up the CPU voluntarily
• uses full time quantum IMPLIES timer will cause an interrupt
• context switch will be executed
• process will be put at the tail of queue


Scheduling: RR [62]

• assume processes arrive in this order: P1, P2, P3
• preemptive scheduling
• time quantum: 4 ms
• P1 uses a full time quantum; P2, P3 use only a part of a quantum
• P1 waits 0+6=6; P2 waits 4; P3 waits 7
• average waiting time: (6+4+7)/3=5.67 ms

PID Burst
P1 24
P2 3
P3 3
Gantt chart


+----+----+----+----+----+----+----+----+
| P1 | P2 | P3 | P1 | P1 | P1 | P1 | P1 |
+----+----+----+----+----+----+----+----+
0    4    7   10   14   18   22   26   30

• very large time quantum IMPLIES RR = FCFS
• very small time quantum IMPLIES context switch is too much overhead
• quantum approximately CPU burst IMPLIES better turnaround
• rule of thumb: 80% of CPU bursts should finish within 1 quantum (a simulation sketch follows)
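
The same numbers fall out of a small round-robin simulation in C, a sketch assuming all three processes arrive at time 0 and a 4 ms quantum:

#include <stdio.h>

#define QUANTUM 4

int main(void)
{
    int burst[]     = {24, 3, 3};          /* P1, P2, P3           */
    int remaining[] = {24, 3, 3};
    int finish[3], n = 3, left = 3, t = 0, total = 0;

    while (left > 0) {                     /* circular ready queue */
        for (int i = 0; i < n; i++) {
            if (remaining[i] == 0) continue;
            int run = remaining[i] < QUANTUM ? remaining[i] : QUANTUM;
            t += run;                      /* head process runs    */
            remaining[i] -= run;
            if (remaining[i] == 0) { finish[i] = t; left--; }
        }
    }
    for (int i = 0; i < n; i++) {          /* wait = finish - burst (arrival 0) */
        printf("P%d waits %d ms\n", i + 1, finish[i] - burst[i]);
        total += finish[i] - burst[i];
    }
    printf("average waiting time: %.2f ms\n", (double)total / n);
    return 0;                              /* 6, 4, 7 -> 5.67 ms   */
}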


Scheduling: Shortest-Job-First (SJF) [63]
• assume the next burst time of each process is known
• SJF selects process which has the shortest burst time
• optimal algorithm because it has the shortest average waiting time
• impossible to know in advance
• OS knows the past burst times - make a prediction using an average
• nonpreemptive
• or preemptive:
• shortest-remaining-time-first
• interrupts running process if a new process enters the queue
• new process must have shorter burst than remaining time


Scheduling: SJF [64]

• assume all processes arrive at the same time: P1, P2, P3, P4
• nonpreemptive scheduling
• average waiting time: (3+16+9+0)/4=7 ms

PID Burst
P1 6
P2 8
P3 7
P4 3
Gantt chart


+----+--------+--------+---------+
| P4 | P1     | P3     | P2      |
+----+--------+--------+---------+
0    3        9       16        24

• SJF is optimal: shortest average waiting time
• but burst times are not known in advance
• next predicted burst time: exponential (weighted) average of past burst times

• next_predict = α × last_observed + (1 − α) × last_predict, where 0 ≤ α ≤ 1
• α = 0 IMPLIES next_predict = initialized value (usually 0): history is never updated
• α = 1 IMPLIES next_predict = last_observed: only the most recent burst counts


SJF: Weighted Average Burst [65]

α = 1/2: recent and past history weighted the same

time       0    1    2    3    4    5    6    7
Burst (t)  6    4    6    4   13   13   13
Guess (τ)  10   8    6    6    5    9   11   12
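
The prediction row can be reproduced with a few lines of C; this sketch takes α = 1/2 and the initial guess of 10 from the table above:

#include <stdio.h>

int main(void)
{
    double alpha = 0.5, predict = 10.0;    /* initial guess */
    int burst[] = {6, 4, 6, 4, 13, 13, 13};

    for (int i = 0; i < 7; i++) {
        printf("guess %d: %.0f\n", i, predict);
        /* exponential average of observed bursts */
        predict = alpha * burst[i] + (1.0 - alpha) * predict;
    }
    printf("guess 7: %.0f\n", predict);    /* 12 */
    return 0;
}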



Scheduling: SJF [66]

• assume processes arrive at 1 ms intervals: P1, P2, P3, P4
• preemptive scheduling: shortest-remaining-time-first
• P1 waits 0+(10-1)=9; P2 waits 1-1=0
• P3 waits 17-2=15; P4 waits 5-3=2
• average waiting time: (9+0+15+2)/4=6.5 ms

PID Burst Arrival
P1 8 0
P2 4 1
P3 9 2
P4 5 3
Gantt chart


+--+------+------+--------+----------+
|P1| P2   | P4   | P1     | P3       |
+--+------+------+--------+----------+
0  1      5     10       17         26

• nonpreemptive SJF: 7.75 ms


Scheduling: Priority (PRIO) [67]
• assume a priority is associated with each process
• select highest priority process from the ready queue
• let τ be the (predicted) next CPU burst of a process
• SJF is a special case of priority scheduling
• assume: high numbers IMPLY high priority
• then priority is 1/τ
• assume: low numbers IMPLY high priority
• then priority is τ
• equal-priority processes are scheduled in FCFS order
• PRIO can be preemptive or nonpreemptive
• priorities can be defined internally
• memory requirements, number of open files, burst times
• priorities can be defined externally
• user, department, company


Scheduling: PRIO [68]

• assume all processes arrive at the same time: P1, P2, P3, P4, P5
• nonpreemptive scheduling
• high priority: low number
• some OS use a high number!!! See VOS.
• average waiting time is: (6+0+16+18+1)/5=8.2 ms

PID Burst Priority
P1 10 3
P2 1 1
P3 2 3
P4 1 4
P5 5 2
Gantt chart


+--+------+----------+----+--+
|P2| P5   | P1       | P3 |P4|
+--+------+----------+----+--+
0  1      6         16   18 19

• indefinite blocking (starvation): low priority process never runs
• aging: low priorities increase with waiting time, so the process will eventually run (sketched below)
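
A sketch in C of priority selection with aging (the structure and field names are illustrative; low number = high priority, as in the example above):

#include <stdio.h>

struct proc { int pid; int priority; };    /* low number = high priority */

/* Pick the highest-priority process and age the rest, so that
   low-priority processes eventually run (no starvation). */
static int select_next(struct proc p[], int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (p[i].priority < p[best].priority) best = i;
    for (int i = 0; i < n; i++)            /* aging: move toward priority 1 */
        if (i != best && p[i].priority > 1) p[i].priority--;
    return best;
}

int main(void)
{
    struct proc q[] = {{1, 3}, {2, 1}, {3, 3}, {4, 4}, {5, 2}};
    printf("run P%d next\n", q[select_next(q, 5)].pid);  /* P2: priority 1 */
    return 0;
}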


VOS Scheduling: PRIO, FCFS, SJF [69]

for (i = 1; i <= 10; i++) {          /* 10 CPU BURSTS */
    for (j = 1; j <= HOWLONG; j++)   /* 1 CPU BURST */
        pm_busywait();               /* PID1: long  PID2: medium  PID3: short */
    pm_yield();                      /* GO BACK TO READY QUEUE */
}
             PRIO              FCFS              SJF
PID  Burst   priority=fixed    priority=equal    priority=1/burst
1    long    2                 1                 low
2    medium  3 (high)          1                 medium
3    short   1 (low)           1                 high


• schedulers favor different PIDs
• SUMMARY shows CPU burst (running) time for each PID
• SUMMARY shows waiting time for each PID in ready queue
• Gantt chart shows how long each PID is on the CPU
• schedulers have different performance


VOS Scheduling: PRIO [70]

===================================== SUMMARY =================================
FREE SUSPENDED READY RUNNING WAITING RECEIVING SLEEPING WRITING READ
PID time cnt time cnt time cnt time cnt time cnt time cnt time cnt time cnt ...
--- ---- --- ---- --- ---- --- ---- --- ---- --- ---- --- ---- --- ---- ---
0 0 1 0 0 72 2 17 3 0 0 0 0 0 0 0 0
1 29 2 1 1 25 11 34 11 0 0 0 0 0 0 0 0
2 64 2 0 1 1 11 24 11 0 0 0 0 0 0 0 0
3 16 2 1 1 58 11 14 11 0 0 0 0 0 0 0 0
4 89 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 89 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 89 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 89 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 89 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 89 2 0 1 0 1 0 1 0 0 0 0 0 0 0 0
---- --- ---- --- ---- --- ---- --- ---- --- ---- --- ---- --- ---- ---
TOT 643 14 2 4 156 36 89 37 0 0 0 0 0 0 0 0

Utilization: 80.9 %Busy
Throughput : 2.0 Jobs/Min
Wait Time : 28.0 Sec/Job
Burst Time : 24.0 Sec/Job


VOS Scheduling: PRIO [71]

Scheduling Algorithm: PRIO
>>> SUMMARY (READY) <<< >>>>>>>>>>> SUMMARY (RUNNING) <<<<<<<<<<
PID TOT Wait Time TOT Burst Time / Cnt = Single Burst
=== ============= ============== === ============
1 25 34 11 3.1
2 1 24 11 2.2
3 58 14 11 1.3
Sum Wait Time: 84 /3 Jobs Sum Burst Time: 72 /3 Jobs
Avg Wait Time: 28 Sec/Job Avg Burst Time: 24 Sec/Job
Longest Wait: 58(PID: 3) Longest Single Burst: 3.1(PID: 1)
Shortest Wait: 1(PID: 2) Shortest Single Burst: 1.3(PID: 3)
• other algorithms will have different average wait time


VOS Scheduling: PRIO [72]

Gantt chart of CPU Usage (Last Scheduler: PRIO)

------------------+-----------+---+-----+---+---+-----+---+--
PID |0 |2 |2 |2 |2 |2 |2 |2
------------------+-----------+---+-----+---+---+-----+---+--
time 9 15 17 20 22 24 27 29


--+-----+---+---+-+-----+-------+-----+-----+-------+-----+--
PID |2 |2 |2 |2|1 |1 |1 |1 |1 |1 |1
--+-----+---+---+-+-----+-------+-----+-----+-------+-----+--
time 31 34 36 3839 42 46 49 52 56 59


------+-----+-----+-------+-+---+-+---+-+-+---+-+---+-+------
PID |1 |1 |1 |1|3 |3|3 |3|3|3 |3|3 |3|3
------+-----+-----+-------+-+---+-+---+-+-+---+-+---+-+------
time 63 66 69 7374 7677 798081 8384 8687
• other algorithms will favor different PIDs


Mechanism (how) vs. Policy (what) [73]


mechanism
• how to do something
• implementation or function with parameters
• used in many ways (by policies)
• OS may be micro kernel - only basic mechanisms
• policies are decided at the user level


policy
• what or when to do something
• set of rules
• use mechanisms by setting parameters
• important choices in the design of the OS
• mechanisms should be separate from policies


Mechanism vs. Policy: Examples [74]


Timer(x sec)
• Policy 1: if LOW_PRIORITY Timer(0.1) else Timer(1.0)
• Policy 2: if LOW_PRIORITY Timer(0.1) else Timer(0.2)


Schedule(job)
• Policy 1: Schedule(I/O Job A); Schedule(CPU Job B)
• Policy 2: Schedule(CPU Job A); Schedule(I/O Job B)


Preempt(job)
• Policy 1: if A.running greater than 0.1 sec then Preempt(Job A)
• Policy 2: if A.running greater than 0.2 sec then Preempt(Job A)


Remove_From_Ready_Q(job)
• Policy 1: Remove_Ready_Q(oldest job): FCFS
• Policy 2: Remove_Ready_Q(highest priority job): PRIO
• Policy 3: Remove_Ready_Q(shortest job): SJF
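
The separation can be shown directly in C: the mechanism is a single remove-from-ready-queue function, and the policy is just a comparison rule passed in as a parameter. A sketch with illustrative names, not taken from any particular kernel:

#include <stdio.h>

struct proc { int pid, arrival, priority, burst; };

/* Policies: rules for deciding which of two ready processes is "better". */
static int fcfs(const struct proc *a, const struct proc *b)
    { return a->arrival < b->arrival; }
static int prio(const struct proc *a, const struct proc *b)
    { return a->priority > b->priority; }   /* high number = high priority */
static int sjf(const struct proc *a, const struct proc *b)
    { return a->burst < b->burst; }

/* Mechanism: remove the best process from the ready queue, where
   "best" is whatever the policy parameter says. */
static int remove_from_ready_q(struct proc q[], int n,
                               int (*better)(const struct proc *,
                                             const struct proc *))
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (better(&q[i], &q[best])) best = i;
    return best;
}

int main(void)
{
    struct proc q[] = {{1, 0, 2, 24}, {2, 1, 3, 3}, {3, 2, 1, 3}};
    printf("FCFS picks P%d\n", q[remove_from_ready_q(q, 3, fcfs)].pid);
    printf("PRIO picks P%d\n", q[remove_from_ready_q(q, 3, prio)].pid);
    printf("SJF  picks P%d\n", q[remove_from_ready_q(q, 3, sjf)].pid);
    return 0;
}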


Memory Management

The memory management subsystem is one of the most important parts of the operating system. Since the early days of computing, there has been a need for more memory than exists physically in a system. Strategies have been developed to overcome this limitation and the most successful of these is virtual memory. Virtual memory makes the system appear to have more memory than it actually has by sharing it between competing processes as they need it.

Virtual memory does more than just make your computer's memory go further. The memory management subsystem provides:


Large Address Spaces
The operating system makes the system appear as if it has a larger amount of memory than it actually has. The virtual memory can be many times larger than the physical memory in the system.

Protection
Each process in the system has its own virtual address space. These virtual address spaces are completely separate from each other and so a process running one application cannot affect another. Also, the hardware virtual memory mechanisms allow areas of memory to be protected against writing. This protects code and data from being overwritten by rogue applications.

Memory Mapping
Memory mapping is used to map image and data files into a process's address space. In memory mapping, the contents of a file are linked directly into the virtual address space of a process.

Fair Physical Memory Allocation
The memory management subsystem allows each running process in the system a fair share of the physical memory of the system.

Shared Virtual Memory
Although virtual memory allows processes to have separate (virtual) address spaces, there are times when you need processes to share memory. For example there could be several processes in the system running the bash command shell. Rather than have several copies of bash, one in each process's virtual address space, it is better to have only one copy in physical memory and all of the processes running bash share it. Dynamic libraries are another common example of executing code shared between several processes.
Shared memory can also be used as an Inter Process Communication (IPC) mechanism, with two or more processes exchanging information via memory common to all of them. Linux supports the Unix System V shared memory IPC.

An Abstract Model of Virtual Memory




Before considering the methods that Linux uses to support virtual memory it is useful to consider an abstract model that is not cluttered by too much detail.

As the processor executes a program it reads an instruction from memory and decodes it. In decoding the instruction it may need to fetch or store the contents of a location in memory. The processor then executes the instruction and moves on to the next instruction in the program. In this way the processor is always accessing memory, either to fetch instructions or to fetch and store data.

In a virtual memory system all of these addresses are virtual addresses and not physical addresses. These virtual addresses are converted into physical addresses by the processor based on information held in a set of tables maintained by the operating system.

To make this translation easier, virtual and physical memory are divided into handy sized chunks called pages. These pages are all the same size; they need not be, but if they were not, the system would be very hard to administer. Linux on Alpha AXP systems uses 8 Kbyte pages and on Intel x86 systems it uses 4 Kbyte pages. Each of these pages is given a unique number: the page frame number (PFN).

In this paged model, a virtual address is composed of two parts: an offset and a virtual page frame number. If the page size is 4 Kbytes, bits 11:0 of the virtual address contain the offset and bits 12 and above are the virtual page frame number. Each time the processor encounters a virtual address it must extract the offset and the virtual page frame number. The processor must translate the virtual page frame number into a physical one and then access the location at the correct offset into that physical page. To do this the processor uses page tables.

Figure 3.1 shows the virtual address spaces of two processes, process X and process Y, each with its own page tables. These page tables map each process's virtual pages into physical pages in memory. This shows that process X's virtual page frame number 0 is mapped into memory in physical page frame number 1 and that process Y's virtual page frame number 1 is mapped into physical page frame number 4. Each entry in the theoretical page table contains the following information:


Valid flag. This indicates if this page table entry is valid,
The physical page frame number that this entry is describing,
Access control information. This describes how the page may be used. Can it be written to? Does it contain executable code?
The page table is accessed using the virtual page frame number as an offset. Virtual page frame 5 would be the 6th element of the table (0 is the first element).

To translate a virtual address into a physical one, the processor must first work out the virtual address's page frame number and the offset within that virtual page. By making the page size a power of 2 this can be done easily by masking and shifting. Looking again at Figure 3.1 and assuming a page size of 0x2000 bytes (which is decimal 8192) and an address of 0x2194 in process Y's virtual address space, the processor would translate that address into offset 0x194 into virtual page frame number 1.

The processor uses the virtual page frame number as an index into the process's page table to retrieve its page table entry. If the page table entry at that offset is valid, the processor takes the physical page frame number from this entry. If the entry is invalid, the process has accessed a non-existent area of its virtual memory. In this case, the processor cannot resolve the address and must pass control to the operating system so that it can fix things up.

Just how the processor notifies the operating system that the correct process has attempted to access a virtual address for which there is no valid translation is specific to the processor. However the processor delivers it, this is known as a page fault and the operating system is notified of the faulting virtual address and the reason for the page fault.

Assuming that this is a valid page table entry, the processor takes that physical page frame number and multiplies it by the page size to get the address of the base of the page in physical memory. Finally, the processor adds in the offset to the instruction or data that it needs.

Using the above example again, process Y's virtual page frame number 1 is mapped to physical page frame number 4 which starts at 0x8000 (4 x 0x2000). Adding in the 0x194 byte offset gives us a final physical address of 0x8194.
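
Because the page size is a power of two, the whole translation amounts to masking, shifting, and one table lookup. A sketch in C using the 0x2000-byte pages of the example; the page table contents are taken from process Y's mapping above:

#include <stdio.h>

#define PAGE_SIZE  0x2000u                  /* 8 Kbytes        */
#define PAGE_SHIFT 13                       /* log2(PAGE_SIZE) */

int main(void)
{
    unsigned page_table[8] = {0};
    page_table[1] = 4;                      /* VPFN 1 -> PFN 4 */

    unsigned vaddr  = 0x2194;
    unsigned vpfn   = vaddr >> PAGE_SHIFT;          /* = 1     */
    unsigned offset = vaddr & (PAGE_SIZE - 1);      /* = 0x194 */
    unsigned paddr  = (page_table[vpfn] << PAGE_SHIFT) | offset;

    printf("vpfn=%u offset=0x%x paddr=0x%x\n", vpfn, offset, paddr);
    return 0;   /* prints vpfn=1 offset=0x194 paddr=0x8194     */
}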

By mapping virtual to physical addresses this way, the virtual memory can be mapped into the system's physical pages in any order. For example, in Figure 3.1 process X's virtual page frame number 0 is mapped to physical page frame number 1 whereas virtual page frame number 7 is mapped to physical page frame number 0, even though it is higher in virtual memory than virtual page frame number 0. This demonstrates an interesting byproduct of virtual memory: the pages of virtual memory do not have to be present in physical memory in any particular order.


3.1.1 Demand Paging
As there is much less physical memory than virtual memory the operating system must be careful that it does not use the physical memory inefficiently. One way to save physical memory is to only load virtual pages that are currently being used by the executing program. For example, a database program may be run to query a database. In this case not all of the database needs to be loaded into memory, just those data records that are being examined. If the database query is a search query then it does not make sense to load the code from the database program that deals with adding new records. This technique of only loading virtual pages into memory as they are accessed is known as demand paging.

When a process attempts to access a virtual address that is not currently in memory the processor cannot find a page table entry for the virtual page referenced. For example, in Figure 3.1 there is no entry in process X's page table for virtual page frame number 2 and so if process X attempts to read from an address within virtual page frame number 2 the processor cannot translate the address into a physical one. At this point the processor notifies the operating system that a page fault has occurred.

If the faulting virtual address is invalid this means that the process has attempted to access a virtual address that it should not have. Maybe the application has gone wrong in some way, for example writing to random addresses in memory. In this case the operating system will terminate it, protecting the other processes in the system from this rogue process.

If the faulting virtual address was valid but the page that it refers to is not currently in memory, the operating system must bring the appropriate page into memory from the image on disk. Disk access takes a long time, relatively speaking, and so the process must wait quite a while until the page has been fetched. If there are other processes that could run then the operating system will select one of them to run. The fetched page is written into a free physical page frame and an entry for the virtual page frame number is added to the process's page table. The process is then restarted at the machine instruction where the memory fault occurred. This time the virtual memory access is made, the processor can make the virtual to physical address translation and so the process continues to run.

Linux uses demand paging to load executable images into a process's virtual memory. Whenever a command is executed, the file containing it is opened and its contents are mapped into the process's virtual memory. This is done by modifying the data structures describing this process's memory map and is known as memory mapping. However, only the first part of the image is actually brought into physical memory. The rest of the image is left on disk. As the image executes, it generates page faults and Linux uses the process's memory map in order to determine which parts of the image to bring into memory for execution.


3.1.2 Swapping
If a process needs to bring a virtual page into physical memory and there are no free physical pages available, the operating system must make room for this page by discarding another page from physical memory.

If the page to be discarded from physical memory came from an image or data file and has not been written to then the page does not need to be saved. Instead it can be discarded and if the process needs that page again it can be brought back into memory from the image or data file.

However, if the page has been modified, the operating system must preserve the contents of that page so that it can be accessed at a later time. This type of page is known as a dirty page and when it is removed from memory it is saved in a special sort of file called the swap file. Accesses to the swap file are very slow relative to the speed of the processor and physical memory, and the operating system must juggle the need to write pages to disk with the need to retain them in memory to be used again.

If the algorithm used to decide which pages to discard or swap (the swap algorithm) is not efficient, then a condition known as thrashing occurs. In this case, pages are constantly being written to disk and then read back, and the operating system is too busy to allow much real work to be performed. If, for example, physical page frame number 1 in Figure 3.1 is being regularly accessed then it is not a good candidate for swapping to hard disk. The set of pages that a process is currently using is called the working set. An efficient swap scheme would make sure that all processes have their working set in physical memory.

Linux uses a Least Recently Used (LRU) page aging technique to fairly choose pages which might be removed from the system. This scheme involves every page in the system having an age which changes as the page is accessed. The more that a page is accessed, the younger it is; the less that it is accessed the older and more stale it becomes. Old pages are good candidates for swapping.


3.1.3 Shared Virtual Memory
Virtual memory makes it easy for several processes to share memory. All memory accesses are made via page tables, and each process has its own separate page table. For two processes sharing a physical page of memory, its physical page frame number must appear in a page table entry in both of their page tables.

Figure 3.1 shows two processes that each share physical page frame number 4. For process X this is virtual page frame number 4 whereas for process Y this is virtual page frame number 6. This illustrates an interesting point about sharing pages: the shared physical page does not have to exist at the same place in virtual memory for any or all of the processes sharing it.


3.1.4 Physical and Virtual Addressing Modes
It does not make much sense for the operating system itself to run in virtual memory. This would be a nightmare situation where the operating system must maintain page tables for itself. Most multi-purpose processors support the notion of a physical address mode as well as a virtual address mode. Physical addressing mode requires no page tables and the processor does not attempt to perform any address translations in this mode. The Linux kernel is linked to run in physical address space.
The Alpha AXP processor does not have a special physical addressing mode. Instead, it divides up the memory space into several areas and designates two of them as physically mapped addresses. This kernel address space is known as KSEG address space and it encompasses all addresses upwards from 0xfffffc0000000000. In order to execute from code linked in KSEG (by definition, kernel code) or access data there, the code must be executing in kernel mode. The Linux kernel on Alpha is linked to execute from address 0xfffffc0000310000.


3.1.5 Access Control
The page table entries also contain access control information. As the processor is already using the page table entry to map a process's virtual address to a physical one, it can easily use the access control information to check that the process is not accessing memory in a way that it should not.

There are many reasons why you would want to restrict access to areas of memory. Some memory, such as that containing executable code, is naturally read only; the operating system should not allow a process to write data over its executable code. By contrast, pages containing data can be written to, but attempts to execute that memory as instructions should fail. Most processors have at least two modes of execution: kernel and user. You would not want kernel code to be executed by a user, or kernel data structures to be accessible except when the processor is running in kernel mode.






Figure 3.2: Alpha AXP Page Table Entry

The access control information is held in the PTE and is processor specific; figure 3.2 shows the PTE for Alpha AXP. The bit fields have the following meanings:


V
Valid, if set this PTE is valid,
FOE
"Fault on Execute". Whenever an attempt to execute instructions in this page occurs, the processor reports a page fault and passes control to the operating system,
FOW
"Fault on Write", as above but page fault on an attempt to write to this page,
FOR
"Fault on Read", as above but page fault on an attempt to read from this page,
ASM
Address Space Match. This is used when the operating system wishes to clear only some of the entries from the Translation Buffer,
KRE
Code running in kernel mode can read this page,
URE
Code running in user mode can read this page,
GH
Granularity hint used when mapping an entire block with a single Translation Buffer entry rather than many,
KWE
Code running in kernel mode can write to this page,
UWE
Code running in user mode can write to this page,
page frame number
For PTEs with the V bit set, this field contains the physical Page Frame Number (page frame number) for this PTE. For invalid PTEs, if this field is not zero, it contains information about where the page is in the swap file.
The following two bits are defined and used by Linux:


_PAGE_DIRTY
if set, the page needs to be written out to the swap file,
_PAGE_ACCESSED
Used by Linux to mark a page as having been accessed.
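
As a sketch of how such flags are tested in practice, the fragment below uses invented bit positions (the real Alpha AXP layout is in the processor manual; treat these masks as illustrative only):

#include <stdio.h>

typedef unsigned long pte_t;

#define PTE_V         (1ul << 0)     /* valid (illustrative positions) */
#define PTE_FOW       (1ul << 2)     /* fault on write                 */
#define PTE_KWE       (1ul << 8)     /* kernel write enable            */
#define PTE_PFN_SHIFT 32             /* PFN kept in the high bits      */

/* May the kernel write to the page this PTE describes? */
static int pte_kernel_writable(pte_t pte)
{
    return (pte & PTE_V) && (pte & PTE_KWE) && !(pte & PTE_FOW);
}

/* Extract the physical page frame number from a valid PTE. */
static unsigned long pte_pfn(pte_t pte)
{
    return pte >> PTE_PFN_SHIFT;
}

int main(void)
{
    pte_t pte = PTE_V | PTE_KWE | (4ul << PTE_PFN_SHIFT);
    printf("writable=%d pfn=%lu\n", pte_kernel_writable(pte), pte_pfn(pte));
    return 0;
}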

3.2 Caches
If you were to implement a system using the above theoretical model then it would work, but not particularly efficiently. Both operating system and processor designers try hard to extract more performance from the system. Apart from making the processors, memory and so on faster the best approach is to maintain caches of useful information and data that make some operations faster. Linux uses a number of memory management related caches:

Buffer Cache
The buffer cache contains data buffers that are used by the block device drivers.

These buffers are of fixed sizes (for example 512 bytes) and contain blocks of information that have either been read from a block device or are being written to it. A block device is one that can only be accessed by reading and writing fixed sized blocks of data. All hard disks are block devices.

The buffer cache is indexed via the device identifier and the desired block number and is used to quickly find a block of data. Block devices are only ever accessed via the buffer cache. If data can be found in the buffer cache then it does not need to be read from the physical block device, for example a hard disk, and access to it is much faster.


Page Cache
This is used to speed up access to images and data on disk.

It is used to cache the logical contents of a file a page at a time and is accessed via the file and offset within the file. As pages are read into memory from disk, they are cached in the page cache.


Swap Cache
Only modified (or dirty) pages are saved in the swap file.

So long as these pages are not modified after they have been written to the swap file then the next time the page is swapped out there is no need to write it to the swap file as the page is already in the swap file. Instead the page can simply be discarded. In a heavily swapping system this saves many unnecessary and costly disk operations.


Hardware Caches
One commonly implemented hardware cache is in the processor: a cache of Page Table Entries. In this case, the processor does not always read the page table directly but instead caches translations for pages as it needs them. These are the Translation Look-aside Buffers (TLBs) and they contain cached copies of the page table entries from one or more processes in the system.

When the reference to the virtual address is made, the processor will attempt to find a matching TLB entry. If it finds one, it can directly translate the virtual address into a physical one and perform the correct operation on the data. If the processor cannot find a matching TLB entry then it must get the operating system to help. It does this by signalling the operating system that a TLB miss has occurred. A system specific mechanism is used to deliver that exception to the operating system code that can fix things up. The operating system generates a new TLB entry for the address mapping. When the exception has been cleared, the processor will make another attempt to translate the virtual address. This time it will work because there is now a valid entry in the TLB for that address.

The drawback of using caches, hardware or otherwise, is that Linux must use more time and space maintaining these caches, and if the caches become corrupted the system will crash.


3.3 Linux Page Tables





Figure 3.3: Three Level Page Tables

Linux assumes that there are three levels of page tables. Each Page Table accessed contains the page frame number of the next level of Page Table. Figure 3.3 shows how a virtual address can be broken into a number of fields; each field providing an offset into a particular Page Table. To translate a virtual address into a physical one, the processor must take the contents of each level field, convert it into an offset into the physical page containing the Page Table and read the page frame number of the next level of Page Table. This is repeated three times until the page frame number of the physical page containing the virtual address is found. Now the final field in the virtual address, the byte offset, is used to find the data inside the page.
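
A sketch of that walk in C. The layout is simplified (tiny tables, invented field widths), but the shape is exactly what the text describes: each level's entry yields the page frame number of the next level's table:

#include <stdio.h>

#define PAGE_SHIFT 13                     /* 8 Kbyte pages            */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define LEVEL_BITS 2                      /* illustrative field width */
#define ENTRIES    (1u << LEVEL_BITS)

/* Simulated physical memory: each "page" here holds one small table of
   page frame numbers (real entries also carry flags).                 */
static unsigned mem[8][ENTRIES];

static unsigned translate(unsigned root_pfn, unsigned vaddr)
{
    unsigned pfn = root_pfn;
    for (int level = 0; level < 3; level++) {
        unsigned shift = PAGE_SHIFT + LEVEL_BITS * (2 - level);
        unsigned index = (vaddr >> shift) & (ENTRIES - 1);
        pfn = mem[pfn][index];            /* PFN of next table or page */
    }
    return (pfn << PAGE_SHIFT) | (vaddr & (PAGE_SIZE - 1));
}

int main(void)
{
    /* Build one path: root (PFN 0) -> PFN 1 -> PFN 2 -> data page PFN 5. */
    mem[0][0] = 1; mem[1][0] = 2; mem[2][0] = 5;
    printf("paddr = 0x%x\n", translate(0, 0x194));   /* 0xa194 */
    return 0;
}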

Each platform that Linux runs on must provide translation macros that allow the kernel to traverse the page tables for a particular process. This way, the kernel does not need to know the format of the page table entries or how they are arranged.

This is so successful that Linux uses the same page table manipulation code for the Alpha processor, which has three levels of page tables, and for Intel x86 processors, which have two levels of page tables.


3.4 Page Allocation and Deallocation
There are many demands on the physical pages in the system. For example, when an image is loaded into memory the operating system needs to allocate pages. These will be freed when the image has finished executing and is unloaded. Another use for physical pages is to hold kernel specific data structures such as the page tables themselves. The mechanisms and data structures used for page allocation and deallocation are perhaps the most critical in maintaining the efficiency of the virtual memory subsystem.
All of the physical pages in the system are described by the mem_map data structure, which is a list of mem_map_t structures and is initialized at boot time. Each mem_map_t describes a single physical page in the system. Important fields (so far as memory management is concerned) are:


count
This is a count of the number of users of this page. The count is greater than one when the page is shared between many processes,
age
This field describes the age of the page and is used to decide if the page is a good candidate for discarding or swapping,
map_nr
This is the physical page frame number that this mem_map_t describes.
The free_area vector is used by the page allocation code to find and free pages. The whole buffer management scheme is supported by this mechanism and so far as the code is concerned, the size of the page and physical paging mechanisms used by the processor are irrelevant.

Each element of free_area contains information about blocks of pages. The first element in the array describes single pages, the next blocks of 2 pages, the next blocks of 4 pages and so on upwards in powers of two. The list element is used as a queue head and has pointers to the page data structures in the mem_map array. Free blocks of pages are queued here. map is a pointer to a bitmap which keeps track of allocated groups of pages of this size. Bit N of the bitmap is set if the Nth block of pages is free.

Figure 3.4 shows the free_area structure. Element 0 has one free page (page frame number 0) and element 2 has 2 free blocks of 4 pages, the first starting at page frame number 4 and the second at page frame number 56.


3.4.1 Page Allocation
Linux uses the Buddy algorithm to allocate and deallocate blocks of pages effectively. The page allocation code attempts to allocate a block of one or more physical pages. Pages are allocated in blocks which are powers of 2 in size: a block of 1 page, 2 pages, 4 pages and so on. So long as there are enough free pages in the system to grant the request (nr_free_pages > min_free_pages), the allocation code will search the free_area for a block of pages of the size requested. Each element of the free_area has a map of the allocated and free blocks of pages for that size of block. For example, element 2 of the array has a memory map that describes free and allocated blocks, each 4 pages long.

The allocation algorithm first searches for blocks of pages of the size requested. It follows the chain of free pages that is queued on the list element of the free_area data structure. If no blocks of pages of the requested size are free, blocks of the next size (which is twice that of the size requested) are looked for. This process continues until all of the free_area has been searched or until a block of pages has been found. If the block of pages found is larger than that requested it must be broken down until there is a block of the right size. Because the blocks are each a power of 2 pages big then this breaking down process is easy as you simply break the blocks in half. The free blocks are queued on the appropriate queue and the allocated block of pages is returned to the caller.






Figure 3.4: The free_area data structure

For example, in Figure 3.4 if a block of 2 pages was requested, the first block of 4 pages (starting at page frame number 4) would be broken into two 2 page blocks. The first, starting at page frame number 4 would be returned to the caller as the allocated pages and the second block, starting at page frame number 6 would be queued as a free block of 2 pages onto element 1 of the free_area array.
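
That walk-through can be traced in a few lines of C. A sketch with illustrative names (plain arrays stand in for the kernel's lists and bitmaps; order k means a block of 2^k pages):

#include <stdio.h>

#define MAX_ORDER 4                 /* blocks of 1..16 pages            */
#define MAX_FREE  16

/* free_area[k] holds start PFNs of free blocks of 2^k pages.           */
static unsigned free_area[MAX_ORDER + 1][MAX_FREE];
static int      free_count[MAX_ORDER + 1];

static void add_free(int order, unsigned pfn)
{
    free_area[order][free_count[order]++] = pfn;
}

/* Allocate 2^order pages: take the smallest free block that fits,
   then split it in half until it is the right size.                    */
static int alloc_pages(int order, unsigned *pfn)
{
    int k = order;
    while (k <= MAX_ORDER && free_count[k] == 0)
        k++;                        /* try the next (bigger) size       */
    if (k > MAX_ORDER) return -1;   /* nothing big enough is free       */

    unsigned block = free_area[k][--free_count[k]];
    while (k > order) {             /* split: queue the upper half      */
        k--;
        add_free(k, block + (1u << k));
    }
    *pfn = block;
    return 0;
}

int main(void)
{
    unsigned pfn;
    add_free(2, 4);                 /* a free 4-page block at PFN 4     */
    if (alloc_pages(1, &pfn) == 0)  /* request a 2-page block           */
        printf("allocated 2 pages at PFN %u\n", pfn);   /* PFN 4        */
    return 0;                       /* PFNs 6-7 stay free on order 1    */
}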


3.4.2 Page Deallocation
Allocating blocks of pages tends to fragment memory, with larger blocks of free pages being broken down into smaller ones. The page deallocation code recombines pages into larger blocks of free pages whenever it can. In fact the page block size is important as it allows for easy combination of blocks into larger blocks.

Whenever a block of pages is freed, the adjacent or buddy block of the same size is checked to see if it is free. If it is, then it is combined with the newly freed block of pages to form a new free block of pages for the next size block of pages. Each time two blocks of pages are recombined into a bigger block of free pages the page deallocation code attempts to recombine that block into a yet larger one. In this way the blocks of free pages are as large as memory usage will allow.

For example, in Figure 3.4, if page frame number 1 were to be freed, then that would be combined with the already free page frame number 0 and queued onto element 1 of the free_area as a free block of size 2 pages.
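
Coalescing is cheap because the buddy of a block differs from it in exactly one bit of the page frame number. A sketch continuing the illustrative free_area structures from the allocation example above:

/* Remove a block from free_area[order] if present; returns 1 on success. */
static int find_and_remove_free(int order, unsigned pfn)
{
    for (int i = 0; i < free_count[order]; i++) {
        if (free_area[order][i] == pfn) {
            free_area[order][i] = free_area[order][--free_count[order]];
            return 1;
        }
    }
    return 0;
}

/* On free: while the buddy is also free, merge and move one order up. */
static void free_pages(unsigned pfn, int order)
{
    while (order < MAX_ORDER &&
           find_and_remove_free(order, pfn ^ (1u << order))) {
        pfn &= ~(1u << order);      /* merged block starts at lower PFN */
        order++;
    }
    add_free(order, pfn);
}

/* Freeing page frame 1 merges with free page frame 0 into a 2-page
   block queued on element 1, exactly as in the example above.          */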


3.5 Memory Mapping
When an image is executed, the contents of the executable image must be brought into the process's virtual address space. The same is also true of any shared libraries that the executable image has been linked to use. The executable file is not actually brought into physical memory; instead it is merely linked into the process's virtual memory. Then, as the parts of the program are referenced by the running application, the image is brought into memory from the executable image. This linking of an image into a process's virtual address space is known as memory mapping.






Figure 3.5: Areas of Virtual Memory

Every process's virtual memory is represented by an mm_struct data structure. This contains information about the image that it is currently executing (for example bash) and also has pointers to a number of vm_area_struct data structures. Each vm_area_struct data structure describes the start and end of the area of virtual memory, the process's access rights to that memory and a set of operations for that memory. These operations are a set of routines that Linux must use when manipulating this area of virtual memory. For example, one of the virtual memory operations performs the correct actions when the process has attempted to access this virtual memory but finds (via a page fault) that the memory is not actually in physical memory. This operation is the nopage operation. The nopage operation is used when Linux demand pages the pages of an executable image into memory.

When an executable image is mapped into a process's virtual address space, a set of vm_area_struct data structures is generated. Each vm_area_struct data structure represents a part of the executable image: the executable code, initialized data (variables), uninitialized data and so on. Linux supports a number of standard virtual memory operations and as the vm_area_struct data structures are created, the correct set of virtual memory operations is associated with them.


3.6 Demand Paging
Once an executable image has been memory mapped into a process's virtual memory it can start to execute. As only the very start of the image is physically pulled into memory, it will soon access an area of virtual memory that is not yet in physical memory. When a process accesses a virtual address that does not have a valid page table entry, the processor will report a page fault to Linux.

The page fault describes the virtual address where the page fault occurred and the type of memory access that caused it.

Linux must find the vm_area_struct that represents the area of memory that the page fault occurred in. As searching through the vm_area_struct data structures is critical to the efficient handling of page faults, these are linked together in an AVL (Adelson-Velskii and Landis) tree structure. If there is no vm_area_struct data structure for this faulting virtual address, this process has accessed an illegal virtual address. Linux will signal the process, sending a SIGSEGV signal, and if the process does not have a handler for that signal it will be terminated.

Linux next checks the type of page fault that occurred against the types of accesses allowed for this area of virtual memory. If the process is accessing the memory in an illegal way, say writing to an area that it is only allowed to read from, it is also signalled with a memory error.

Now that Linux has determined that the page fault is legal, it must deal with it.

Linux must differentiate between pages that are in the swap file and those that are part of an executable image on a disk somewhere. It does this by using the page table entry for this faulting virtual address.

If the page's page table entry is invalid but not empty, the page fault is for a page currently being held in the swap file. For Alpha AXP page table entries, these are entries which do not have their valid bit set but which have a non-zero value in their PFN field. In this case the PFN field holds information about where in the swap (and which swap file) the page is being held. How pages in the swap file are handled is described later in this chapter.

Not all vm_area_struct data structures have a set of virtual memory operations and even those that do may not have a nopage operation. This is because by default Linux will fix up the access by allocating a new physical page and creating a valid page table entry for it. If there is a nopage operation for this area of virtual memory, Linux will use it.

The generic Linux nopage operation is used for memory mapped executable images and it uses the page cache to bring the required image page into physical memory.

However the required page is brought into physical memory, the process's page tables are updated. It may be necessary for hardware specific actions to update those entries, particularly if the processor uses translation look-aside buffers. Now that the page fault has been handled it can be dismissed, and the process is restarted at the instruction that made the faulting virtual memory access.


3.7 The Linux Page Cache





Figure 3.6: The Linux Page Cache

The role of the Linux page cache is to speed up access to files on disk. Memory mapped files are read a page at a time and these pages are stored in the page cache. Figure 3.6 shows that the page cache consists of the page_hash_table, a vector of pointers to mem_map_t data structures.

Each file in Linux is identified by a VFS inode data structure (described in the filesystem chapter) and each VFS inode is unique and fully describes one and only one file. The index into the page cache is derived from the file's VFS inode and the offset into the file.

Whenever a page is read from a memory mapped file, for example when it needs to be brought back into memory during demand paging, the page is read through the page cache. If the page is present in the cache, a pointer to the mem_map_t data structure representing it is returned to the page fault handling code. Otherwise the page must be brought into memory from the file system that holds the image. Linux allocates a physical page and reads the page from the file on disk.

If it is possible, Linux will initiate a read of the next page in the file. This single page read ahead means that if the process is accessing the pages in the file serially, the next page will be waiting in memory for the process.

Over time the page cache grows as images are read and executed. Pages will be removed from the cache as they are no longer needed, say as an image is no longer being used by any process. As Linux uses memory it can start to run low on physical pages. In this case Linux will reduce the size of the page cache.


3.8 Swapping Out and Discarding Pages
When physical memory becomes scarce the Linux memory management subsystem must attempt to free physical pages. This task falls to the kernel swap daemon (kswapd).

The kernel swap daemon is a special type of process, a kernel thread. Kernel threads are processes that have no virtual memory; instead they run in kernel mode in the physical address space. The kernel swap daemon is slightly misnamed in that it does more than merely swap pages out to the system's swap files. Its role is to make sure that there are enough free pages in the system to keep the memory management system operating efficiently.

The Kernel swap daemon (kswapd) is started by the kernel init process at startup time and sits waiting for the kernel swap timer to periodically expire.

Every time the timer expires, the swap daemon looks to see if the number of free pages in the system is getting too low. It uses two variables, free_pages_high and free_pages_low to decide if it should free some pages. So long as the number of free pages in the system remains above free_pages_high, the kernel swap daemon does nothing; it sleeps again until its timer next expires. For the purposes of this check the kernel swap daemon takes into account the number of pages currently being written out to the swap file. It keeps a count of these in nr_async_pages; this is incremented each time a page is queued waiting to be written out to the swap file and decremented when the write to the swap device has completed. free_pages_low and free_pages_high are set at system startup time and are related to the number of physical pages in the system. If the number of free pages in the system has fallen below free_pages_high or worse still free_pages_low, the kernel swap daemon will try three ways to reduce the number of physical pages being used by the system:


Reducing the size of the buffer and page caches,
Swapping out System V shared memory pages,
Swapping out and discarding pages.
If the number of free pages in the system has fallen below free_pages_low, the kernel swap daemon will try to free 6 pages before it next runs. Otherwise it will try to free 3 pages. Each of the above methods is tried in turn until enough pages have been freed. The kernel swap daemon remembers which method it was using the last time it attempted to free physical pages. Each time it runs, it will start trying to free pages using this last successful method.

After it has freed sufficient pages, the swap daemon sleeps again until its timer expires. If the reason that the kernel swap daemon freed pages was that the number of free pages in the system had fallen below free_pages_low, it only sleeps for half its usual time. Once the number of free pages is more than free_pages_low, the kernel swap daemon goes back to sleeping longer between checks.


3.8.1 Reducing the Size of the Page and Buffer Caches
The pages held in the page and buffer caches are good candidates for being freed into the free_area vector. The Page Cache, which contains pages of memory mapped files, may contain unnecessary pages that are filling up the system's memory. Likewise the Buffer Cache, which contains buffers read from or being written to physical devices, may also contain unneeded buffers. When the physical pages in the system start to run out, discarding pages from these caches is relatively easy as it requires no writing to physical devices (unlike swapping pages out of memory). Discarding these pages does not have too many harmful side effects other than making access to physical devices and memory mapped files slower. However, if the discarding of pages from these caches is done fairly, all processes will suffer equally.

Every time the kernel swap daemon tries to shrink these caches, it examines a block of pages in the mem_map page vector to see if any can be discarded from physical memory. The size of the block of pages examined is higher if the kernel swap daemon is intensively swapping; that is, if the number of free pages in the system has fallen dangerously low. The blocks of pages are examined in a cyclical manner; a different block of pages is examined each time an attempt is made to shrink the memory map. This is known as the clock algorithm as, rather like the minute hand of a clock, the whole mem_map page vector is examined a few pages at a time.

Each page being examined is checked to see if it is cached in either the page cache or the buffer cache. You should note that shared pages are not considered for discarding at this time and that a page cannot be in both caches at the same time. If the page is not in either cache then the next page in the mem_map page vector is examined.

Pages are cached in the buffer cache (or rather the buffers within the pages are cached) to make buffer allocation and deallocation more efficient. The memory map shrinking code tries to free the buffers that are contained within the page being examined.

If all the buffers are freed, then the pages that contain them are also freed. If the examined page is in the Linux page cache, it is removed from the page cache and freed.

When enough pages have been freed on this attempt then the kernel swap daemon will wait until the next time it is periodically woken. As none of the freed pages were part of any process's virtual memory (they were cached pages), then no page tables need updating. If there were not enough cached pages discarded then the swap daemon will try to swap out some shared pages.


3.8.2 Swapping Out System V Shared Memory Pages
System V shared memory is an inter-process communication mechanism which allows two or more processes to share virtual memory in order to pass information amongst themselves. How processes share memory in this way is described in more detail in the IPC chapter. For now it is enough to say that each area of System V shared memory is described by a shmid_ds data structure. This contains a pointer to a list of vm_area_struct data structures, one for each process sharing this area of virtual memory. The vm_area_struct data structures describe where in each process's virtual memory this area of System V shared memory goes. Each vm_area_struct data structure for this System V shared memory is linked together using the vm_next_shared and vm_prev_shared pointers. Each shmid_ds data structure also contains a list of page table entries, each of which describes the physical page that a shared virtual page maps to.
The kernel swap daemon also uses a clock algorithm when swapping out System V shared memory pages. Each time it runs, it remembers which page of which shared virtual memory area it last swapped out. It does this by keeping two indices: the first is an index into the set of shmid_ds data structures, the second into the list of page table entries for this area of System V shared memory. This makes sure that it fairly victimizes the areas of System V shared memory.

As the physical page frame number for a given virtual page of System V shared memory is contained in the page tables of all of the processes sharing this area of virtual memory, the kernel swap daemon must modify all of these page tables to show that the page is no longer in memory but is now held in the swap file. For each shared page it is swapping out, the kernel swap daemon finds the page table entry in each of the sharing processes' page tables (by following a pointer from each vm_area_struct data structure). If a process's page table entry for this page of System V shared memory is valid, it converts it into an invalid but swapped out page table entry and reduces this (shared) page's count of users by one. The format of a swapped out System V shared page table entry contains an index into the set of shmid_ds data structures and an index into the page table entries for this area of System V shared memory.

If the page's count is zero after the page tables of the sharing processes have all been modified, the shared page can be written out to the swap file. The page table entry in the list pointed at by the shmid_ds data structure for this area of System V shared memory is replaced by a swapped out page table entry. A swapped out page table entry is invalid but contains an index into the set of open swap files and the offset in that file where the swapped out page can be found. This information will be used when the page has to be brought back into physical memory.


3.8.3 Swapping Out and Discarding Pages
The swap daemon looks at each process in the system in turn to see if it is a good candidate for swapping.

Good candidates are processes that can be swapped (some cannot) and that have one or more pages which can be swapped or discarded from memory. Pages are swapped out of physical memory into the system's swap files only if the data in them cannot be retrieved another way.

A lot of the contents of an executable image come from the image's file and can easily be re-read from that file. For example, the executable instructions of an image will never be modified by the image and so will never be written to the swap file. These pages can simply be discarded; when they are again referenced by the process, they will be brought back into memory from the executable image.

Once the process to swap has been located, the swap daemon examines all of its virtual memory regions, looking for areas which are not shared or locked.

Linux does not swap out all of the swappable pages of the process that it has selected; instead it removes only a small number of pages.

Pages cannot be swapped or discarded if they are locked in memory.

The Linux swap algorithm uses page aging. Each page has a counter (held in the mem_map_t data structure) that gives the kernel swap daemon some idea of whether or not a page is worth swapping. Pages age when they are unused and rejuvenate when accessed; the swap daemon only swaps out old pages. The default action when a page is first allocated is to give it an initial age of 3. Each time it is touched, its age is increased by 3, to a maximum of 20. Every time the kernel swap daemon runs it ages pages, decrementing their age by 1. These default actions can be changed and for this reason they (and other swap related information) are stored in the swap_control data structure.
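The aging policy is simple enough to sketch directly. This uses the default values given above; the structure name is a stand-in, not the kernel's:

    /* Page aging with the default parameters described in the text. */
    #define PAGE_INITIAL_AGE 3
    #define PAGE_ADVANCE     3
    #define PAGE_MAX_AGE     20

    struct page_sketch { int age; };      /* stands in for mem_map_t */

    void page_alloc_init(struct page_sketch *p) { p->age = PAGE_INITIAL_AGE; }

    void page_touch(struct page_sketch *p)   /* page was referenced */
    {
        p->age += PAGE_ADVANCE;
        if (p->age > PAGE_MAX_AGE)
            p->age = PAGE_MAX_AGE;
    }

    void page_age(struct page_sketch *p)     /* one swap daemon pass */
    {
        if (p->age > 0)
            p->age--;
    }

    int page_is_old(const struct page_sketch *p) { return p->age == 0; }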

If the page is old (age = 0), the swap daemon will process it further. Dirty pages may need to be written to the swap file before they can be freed; Linux uses an architecture specific bit in the PTE to describe pages this way (see Figure 3.2). However, not all dirty pages are necessarily written to the swap file. Every virtual memory region of a process may have its own swap operation (pointed at by the vm_ops pointer in the vm_area_struct) and, if it exists, that method is used. Otherwise, the swap daemon will allocate a page in the swap file and write the page out to that device.

The page's page table entry is replaced by one which is marked as invalid but which contains information about where the page is in the swap file: an offset into the swap file where the page is held and an indication of which swap file is being used. Whatever swap method is used, the original physical page is made free by putting it back into the free_area. Clean (that is, not dirty) pages can be discarded immediately and put back into the free_area for re-use.
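Putting the last two paragraphs together, the per-page decision looks roughly like this. The helper functions are hypothetical stand-ins; only the control flow is meant to be accurate:

    struct page_sketch;                   /* as in the aging sketch */

    struct vm_ops_sketch {                /* per-region swap method */
        void (*swapout)(struct page_sketch *page);
    };

    struct region_sketch {
        struct vm_ops_sketch *vm_ops;
    };

    /* Hypothetical helpers standing in for the real mechanisms. */
    int  pte_is_dirty(unsigned long pte);
    void write_swap_page(struct page_sketch *page, unsigned long *pte);
    void free_page_to_free_area(struct page_sketch *page);

    enum outcome { DISCARDED, SWAPPED };

    enum outcome swap_out_page(struct region_sketch *region,
                               struct page_sketch *page, unsigned long *pte)
    {
        if (pte_is_dirty(*pte)) {
            if (region->vm_ops && region->vm_ops->swapout)
                region->vm_ops->swapout(page);  /* region's own method */
            else
                write_swap_page(page, pte);     /* allocate a swap slot
                                                   and write the page */
            return SWAPPED;
        }
        free_page_to_free_area(page);           /* clean: just discard */
        return DISCARDED;
    }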

If enough of the swappable process's pages have been swapped out or discarded, the swap daemon will again sleep. The next time it wakes it will consider the next process in the system. In this way, the swap daemon nibbles away at each process's physical pages until the system is again in balance. This is much fairer than swapping out whole processes.


3.9 The Swap Cache
When swapping pages out to the swap files, Linux avoids writing pages if it does not have to. There are times when a page is both in a swap file and in physical memory. This happens when a page that was swapped out of memory was then brought back into memory when it was again accessed by a process. So long as the page in memory is not written to, the copy in the swap file remains valid.

Linux uses the swap cache to track these pages. The swap cache is a list of page table entries, one per physical page in the system. Each entry is a page table entry for a swapped out page and describes which swap file the page is being held in, together with its location in that file. If a swap cache entry is non-zero, it represents a page held in a swap file that has not been modified since it was read. If the page is subsequently modified (by being written to), its entry is removed from the swap cache.

When Linux needs to swap a physical page out to a swap file it consults the swap cache and, if there is a valid entry for this page, it does not need to write the page out to the swap file. This is because the page in memory has not been modified since it was last read from the swap file.
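A minimal sketch of that check, assuming the swap cache is a simple array with one slot per physical page and zero meaning "not cached":

    #define NR_PAGES 1024                 /* arbitrary for the example */

    static unsigned long swap_cache[NR_PAGES];  /* swapped out PTEs */

    /* Non-zero means the page already has a valid, unmodified copy in
     * a swap file, so writing it out again can be skipped. */
    unsigned long in_swap_cache(unsigned long page_frame)
    {
        return swap_cache[page_frame];
    }

    /* Called when the page is written to: the swap file copy is now
     * stale, so forget it. */
    void delete_from_swap_cache(unsigned long page_frame)
    {
        swap_cache[page_frame] = 0;
    }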

The entries in the swap cache are page table entries for swapped out pages. They are marked as invalid but contain information which allows Linux to find the right swap file and the right page within that swap file.


3.10 Swapping Pages In
The dirty pages saved in the swap files may be needed again, for example when an application writes to an area of virtual memory whose contents are held in a swapped out physical page. Accessing a page of virtual memory that is not held in physical memory causes a page fault to occur. The page fault is the processor signalling the operating system that it cannot translate a virtual address into a physical one. In this case this is because the page table entry describing this page of virtual memory was marked as invalid when the page was swapped out. The processor cannot handle the virtual to physical address translation and so hands control back to the operating system describing as it does so the virtual address that faulted and the reason for the fault. The format of this information and how the processor passes control to the operating system is processor specific.
The processor specific page fault handling code must locate the vm_area_struct data structure that describes the area of virtual memory that contains the faulting virtual address. It does this by searching the vm_area_struct data structures for this process until it finds the one containing the faulting virtual address. This is very time critical code and a process's vm_area_struct data structures are arranged so as to make this search take as little time as possible.
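In its simplest form, that search is a walk along the process's ordered list of regions. A sketch, with a minimal stand-in structure (in practice the kernel arranges these structures so the common case is faster than a plain linear scan):

    struct region {                       /* minimal stand-in for
                                             vm_area_struct */
        unsigned long  vm_start, vm_end;
        struct region *vm_next;           /* next region, sorted by
                                             ascending address */
    };

    /* Return the region covering addr, or NULL if the faulting address
     * is not in any valid area of virtual memory. */
    struct region *find_region(struct region *list, unsigned long addr)
    {
        struct region *r;

        for (r = list; r != NULL; r = r->vm_next)
            if (addr >= r->vm_start && addr < r->vm_end)
                return r;
        return NULL;
    }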

Having carried out the appropriate processor specific actions and found that the faulting virtual address is for a valid area of virtual memory, the page fault processing becomes generic and applicable to all processors that Linux runs on.

The generic page fault handling code looks for the page table entry for the faulting virtual address. If the page table entry it finds is for a swapped out page, Linux must swap the page back into physical memory. The format of the page table entry for a swapped out page is processor specific, but all processors mark these pages as invalid and put the information necessary to locate the page within the swap file into the page table entry. Linux needs this information in order to bring the page back into physical memory.

At this point, Linux knows the faulting virtual address and has a page table entry containing information about where this page has been swapped to. The vm_area_struct data structure may contain a pointer to a routine which will swap any page of the area of virtual memory that it describes back into physical memory: its swapin operation. If there is a swapin operation for this area of virtual memory then Linux will use it. This is, in fact, how swapped out System V shared memory pages are handled; they require special handling because the format of a swapped out System V shared page is a little different from that of an ordinary swapped out page. If there is no swapin operation, Linux assumes that this is an ordinary page that does not need special handling.

Linux allocates a free physical page and reads the swapped out page back from the swap file. Information about where in the swap file the page is held (and which swap file) is taken from the invalid page table entry.

If the access that caused the page fault was not a write access then the page is left in the swap cache and its page table entry is not marked as writable. If the page is subsequently written to, another page fault will occur and, at that point, the page is marked as dirty and its entry is removed from the swap cache. If the page is not written to and it needs to be swapped out again, Linux can avoid the write of the page to its swap file because the page is already in the swap file.

If the access that caused the page to be brought in from the swap file was a write operation, this page is removed from the swap cache and its page table entry is marked as both dirty and writable.
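The read/write distinction of the last three paragraphs can be summarized in one sketch. The helper functions and PTE flag values are hypothetical stand-ins for the real mechanisms:

    /* Hypothetical helpers. */
    unsigned long get_free_page(void);
    void read_swap_page(unsigned long swap_pte, unsigned long page);
    void add_to_swap_cache(unsigned long page, unsigned long swap_pte);
    void delete_from_swap_cache(unsigned long page);
    unsigned long mk_pte(unsigned long page, unsigned long flags);

    #define PTE_VALID 0x1UL
    #define PTE_DIRTY 0x2UL
    #define PTE_WRITE 0x4UL

    void swap_in(unsigned long *pte, int write_access)
    {
        unsigned long page = get_free_page();  /* a free physical frame */

        /* Which swap file and the offset within it come straight from
         * the invalid page table entry. */
        read_swap_page(*pte, page);

        if (write_access) {
            /* The swap file copy is about to go stale: drop it from the
             * swap cache and map the page dirty and writable. */
            delete_from_swap_cache(page);
            *pte = mk_pte(page, PTE_VALID | PTE_DIRTY | PTE_WRITE);
        } else {
            /* Read access: keep the swap file copy. If the page is
             * evicted again without being written, no write is needed. */
            add_to_swap_cache(page, *pte);
            *pte = mk_pte(page, PTE_VALID);    /* valid, not writable */
        }
    }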


