UNIX Application Migration Guide

Chapter 4: Assessment and Analysis

Larry Twork, Larry Mead, Bill Howison, JD Hicks, Lew Brodnax, Jim McMicking, Raju Sakthivel, David Holder, Jon Collins, Bill Loeffler
Microsoft Corporation

October 2002

Applies to:
    Microsoft® Windows®
    UNIX applications

Summary: This chapter discusses application assessment, the process in which the application or applications to be migrated are analyzed from both a business and a technical perspective to form a migration strategy targeting either Win32 or Interix.

Contents

Introduction
Gathering Data
Evaluating Migration Objectives
Evaluating the Application
Defining the Migration Strategy

Introduction

The process of migrating UNIX software to Microsoft® Windows® 2000 is sometimes perceived to be a long, expensive undertaking because of incompatibilities between the two operating systems. However, application migration from UNIX to Windows is feasible, and this chapter will help guide you through the process.

To migrate an application from UNIX to Windows, you must consider the platform differences, the differences between the application's implementation on each platform, and the features of the deployment and management tools available for each operating environment. Therefore, any application migration from UNIX to Windows requires a thorough assessment of the application to be migrated, followed by an analysis to determine the most appropriate migration approach.

This chapter discusses the assessment and analysis aspects of the migration process in detail, including:

  • Gathering data
  • Evaluating the migration objectives
  • Evaluating the application
  • Defining the migration strategy

In addition, this chapter examines the specific migration strategies for various application architectures, including workstation-based applications and server-based applications.

Planning

The results of the assessment and analysis phase are used to develop the various migration plans—for example, development, testing, training, deployment and support. These results also act as input to risk assessment activities, which are a mainstay of both planning and testing. The assessment and analysis phase therefore requires planning, which includes:

  • Scoping the assessment, the analysis and the migration
  • Identifying data sources
  • Arranging interviews and workshops

The assessment and analysis phase follows the law of diminishing returns; that is, the value that it provides diminishes relative to effort. It is therefore important to size these activities according to the size of the migration.

Assumptions

This chapter focuses on software that is written in the C, C++ and Fortran compiled languages, in addition to scripting languages such as Perl and various UNIX shells (for example, Bourne, Korn, C and Bash). This chapter assumes that the application's UNIX-based source code is available in either an archive or source control system, such as the following:

  • Concurrent Version System (CVS)
  • Revision Control System (RCS)
  • Program Version Control System (PVCS)

Some migrations will result in the subsequent removal of the UNIX environment. However, many environments will need Windows to integrate (and coexist) with UNIX, and to run the same migrated application software that is currently running on the UNIX platform or platforms. The need for coexistence is an important factor in software migrations. A migration philosophy that takes into account cross-platform software portability allows Windows to be accepted in organizations that require the application to remain on the existing UNIX platform or platforms.

Time to market is just as important for internal applications as it is for independent software vendor (ISV) software. Applications often represent competitive advantages to their owners (for example, augmentations to computer-aided engineering analysis or product data management packages). A software migration philosophy must accommodate this urgency for the migration project to be successful based on time and resource constraints.

Gathering Data

As part of the plan for assessing an application for migration from UNIX to Windows, you must determine the application context, structure, development environment and usage model.

It is also important to understand the target development environment for the application migration. Examples include migration of both deployment and development environments to Windows and Microsoft Win32®; continued development on UNIX with Windows as a cross-platform migration; and porting to a Portable Operating System Interface for UNIX (POSIX) environment on Windows, such as Microsoft Interix.

You need to address these considerations, as well as the actual application source code migration and possibly the migration of associated test programs and/or scripts. The point at this stage is not to conduct an analysis of the different elements involved in the migration, but rather to understand them at a high level and therefore make the preliminary decisions on how the migration should proceed.

Application Context

The application context consists of all the elements of information technology upon which the application depends. The migration must keep existing dependencies or in some way replace or remove them.

The application may have had a long history—it may not have even begun as a UNIX application. Determining information about the application's history (for example, whether it has already been ported to various UNIX environments) may provide useful insights into how the application should be migrated to Windows.

UNIX platform support

There is a wide variety of UNIX platforms, each of which has a different bearing on how migration can proceed.

Standardization efforts have existed throughout the history of UNIX, ever since its first incarnation at Bell Laboratories in the 1970s. Its code base was subsequently split when an early version of UNIX was adopted and enhanced at the University of California, Berkeley, to yield the Berkeley Software Distribution (BSD). Today, UNIX is a trademark administered by the Open Group, and refers to an operating system that conforms to the X/Open specification XPG4.2. This specification defines the name of, interfaces to and behaviors of all UNIX operating system functions.

XPG4.2 aligns with UNIX95—properly called the Single UNIX Specification (SUS), also known as SPEC1170, the project name that led to the specification—and is largely a superset of the earlier series of specifications. Approximately 80 percent of the SUS is made up of accredited International Organization for Standardization (ISO), POSIX and American National Standards Institute (ANSI) C standards.

The current X/Open specification, UNIX98, aligns with SUS V2 (XPG5). This specification added POSIX.1b-1993 real-time and POSIX.1c-1996 threads and some additional thread interfaces, along with the traditional UNIX shared object interfaces (such as dlopen and dlsym). Only a few systems currently conform to this specification.

Many UNIX-based systems are available, such as Sun's Solaris, Hewlett-Packard's HP-UX, FreeBSD, Linux and the Microsoft Interix subsystem. Determining which standards an application supports gives you a better chance of understanding the application's portability to either Win32 or Interix. For example, if an application runs on a version of UNIX that supports the same standards as Interix, it is likely that a direct port using Interix would be more cost-effective than rewriting some of the code for the Win32 platform.

One of the first things you need to do in your assessment is determine which UNIX standards the application is based on. You can determine this either by knowing what the application claims to support (for example, through its original documentation) or by using analysis tools.

Standards organizations often provide certification brands alongside the standards. These are used as a rubber stamp by software vendors in much the same way as the "Windows Powered" or "Designed for Microsoft Windows" brands. You should therefore be aware of both the standards and the brands associated with them.

The application may support any of the following standards:

  • ANSI C: X3.159-1989, ISO/IEC 9899-1990
  • BSD 4.2-4.4 (Berkeley UNIX)
  • GNU
  • Institute of Electrical and Electronics Engineers (IEEE) 1003.1-1990 part 1, also known as ISO/IEC 9945-1:1990 (POSIX.1)
  • IEEE Std 1003.1b-1993, also known as ISO/IEC 9945-1:1993, POSIX.1b (formerly POSIX.4)
  • IEEE 1003.2-1993 part 2, also known as ISO/IEC 9945-2:1993 (POSIX.2)
  • System V Interface Definition (SVID), SVID2, SVID3
  • System V4 UNIX
  • Version 7, the ancestral UNIX from Bell Laboratories
  • XPG2, XPG3
  • XPG4 (superset of POSIX—XPG Base branding)
  • XPG4v2 (UNIX95 branding)
  • XPG5 (UNIX98 branding)

UNIX standards and Interix

The main purpose of Interix is to allow the direct port of an application to Windows, so Interix often supports a superset of standards or behaviors. Interix mainly uses POSIX.1/.2/ISO C. When an interface isn't described in those specifications, Interix looks to the XPG4 specification and then to various UNIX implementations, depending on the interface (for example, Solaris, BSDI or Red Hat Linux).

The original core standards—POSIX.1-1990, ISO C (1990) and POSIX.2/2a-1992—continue to be important. For example, there have historically been two ways to determine the pseudo-terminal name to use, one BSD-based and the other SVID-based; Interix supports both. Likewise, all of the signal interfaces are supported through the POSIX.1 reliable signal model. Interix also supports many interfaces that are not part of the SUS or a POSIX standard, but are nonetheless in constant use—for example, Open Network Computing (ONC) remote procedure call (RPC) and X11 Release 5 (R5).

If the application supports XPG4, there is a high probability that the application can be ported to Interix. It is more important, however, to determine the following:

  • Does the application have significant dependence on UNIX application programming interfaces (APIs)?
  • Does Interix support the great majority of the application's required APIs?

Infrastructure dependencies

The infrastructure upon which the application depends includes client and server hardware, network elements and peripheral devices such as external tape drives and printers.

The migration of the application can be to a completely new infrastructure configured to support Windows. If this is the case, there may be migration issues in the application software. For example, if you are migrating from one 32-bit computer architecture to another, you may face byte ordering issues. Some architectures number bytes in a binary word from left to right, which is referred to as big-endian. Other architectures number the bytes in a binary word from right to left, which is referred to as little-endian. A notable computer architecture that uses big-endian byte ordering is Sun's SPARC. The Intel architecture uses little-endian byte ordering, as does the Compaq Alpha processor.

Using the big-endian and little-endian methods, the number 0x12345678 would be stored as shown in Table 1.

Table 1. Big-endian and little-endian byte ordering example

Method          Byte 0   Byte 1   Byte 2   Byte 3
Big-endian      12       34       56       78
Little-endian   78       56       34       12

The following code snippet illustrates how the use of the big-endian and little-endian methods can affect the compatibility of applications.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(void)
{
  int buf;
  int in;
  ssize_t nread;

  in = open("file.in", O_RDONLY);        /* open the input file read-only */
  if (in == -1) {
    perror("open");
    exit(1);
  }

  nread = read(in, &buf, sizeof(buf));   /* read the first integer-sized word */
  if (nread != (ssize_t) sizeof(buf)) {
    perror("read");
    exit(1);
  }

  printf("First Integer in file.in = %x\n", buf);
  exit(0);
}

In the preceding code, if the first integer word stored in the file.in file on a big-endian computer was the hexadecimal number 0x12345678, the resulting output on that computer would be as follows:

% ./test
First Integer in file.in = 12345678
%

If the file.in file were read by the same program running on a little-endian computer, the resulting output would be as follows:

% ./test
First Integer in file.in = 78563412
%

Because of the difference in output, you would need to rewrite the program so that it can read integers from a file based on the endian method that the computer uses.
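
One common remedy, sketched below, is to define the file format in a canonical byte order and convert each word as it is read. This sketch assumes the file format is defined as big-endian (network byte order), so the standard ntohl() routine returns the same value on either kind of computer:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <arpa/inet.h>   /* ntohl(): network (big-endian) to host byte order */

int main(void)
{
  unsigned int buf;
  int in = open("file.in", O_RDONLY);

  if (in != -1 && read(in, &buf, sizeof(buf)) == (ssize_t) sizeof(buf))
    /* Because the file format is defined as big-endian, ntohl() yields
       0x12345678 on both big-endian and little-endian computers. */
    printf("First Integer in file.in = %x\n", ntohl(buf));
  return 0;
}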

The other computer hardware dependencies to check for are low-level hardware API calls or calls to specific devices, such as those shown in Table 2.

Table 2. Hardware API calls

Hardware dependency        Check for these calls or devices
Memory allocation          alloca()
Cache management           cacheflush() (MIPS processor-specific)
Port input                 inb(), inw(), inl(), insb(), insw(), insl()
Port output                outb(), outw(), outl(), outsb(), outsw(), outsl()
Paused input/output (I/O)  outb_p(), outw_p(), outl_p(), inb_p(), inw_p(), inl_p()
Input device management    joystick, keyboard, mouse
Display management         video graphics adapter (VGA)

The presence of such API calls in the application code requires that you rewrite the code elements to minimize the hardware dependencies (through the inclusion of alternate, low-level routines) or that you replace the API calls. You must then decide whether the code is to be portable between the two platforms, whether alternate code bases are to exist, or whether the code for the original platform is to be made obsolete.

Applications can have dependencies on peripheral devices as well as computer hardware. Peripherals include not only printers and storage devices, but also firewalls (for example, to set up a virtual private network tunnel to another application) and domain-specific communications devices (for example, to connect to teller computers or telemetry equipment).

For the migration to be successful, you must know what devices are in use by the original application. These devices may have their own dependencies; therefore, it is appropriate to create an inventory of all of the devices concerned, and include the following information for each device:

  • Device name, type and purpose
  • Software versions that the device runs
  • Mechanisms that the application uses to access the device
  • Dependencies on other devices and software packages
  • Whether the migrated application will use the device

The information that such an inventory provides will help you identify potential issues—for example, whether Windows can support each device or whether you should seek an alternative.

Third-party libraries

Most applications use externally sourced libraries of code or application software extensively, thus enabling common functionality to be reused rather than redeveloped. In particular, many package suppliers (including database suppliers) provide code libraries to enable API-level access to their own products.

Use of a library implies access to functionality that the target platform must support. There are a number of ways to access a third-party library:

  • A version of the library may exist for the target platform.
  • The library can be accessed from the original platform by means of RPC (this may be the case for databases).
  • Code may need to be developed to replace or render unnecessary the library access.
  • Code can be rewritten to support alternative services provided by the target platform.

You can determine whether the UNIX application is using third-party libraries by using the UNIX grep command on the Makefile for the application:

% grep -e "-l" -e "/lib" Makefile

Using the grep command in this way yields output that may look like the following example:

/apps/oracle/lib/libclient.a
/apps/oracle/lib/libcommon.a
/apps/oracle/lib/libgeneric.a
/apps/oracle/lib/libsqlnet.a
/apps/oracle/lib/libc3v6.a
/apps/oracle/lib/libcore3.a
/usr/lib/X11R6
/usr/lib/Motif1.2_R6
/xrt/lib/libxrtfield.a
-lXm
-lXt
-lX11

Note   The output in the preceding example was generated from a real application's Makefile. It has been edited to remove application-specific references.

Some applications make use of multiple Makefiles, in which case the UNIX find command can be used in conjunction with grep to traverse the source hierarchy.

For example:

% find . -name "[Mm]akefile" | xargs grep -e "-l" -e "lib.*\.\(a\|so\)"

Analyzing the preceding output leads to the conclusion that the application is using the following third-party libraries:

  • Oracle Call Interface (OCI) libraries
  • X11 Release 6 (R6)
  • Motif 1.2 for X11R6
  • X11 toolkit library
  • X11 miscellaneous utilities library
  • Xrt widget library for Motif

If you have access to the binary libraries or executable files for the application, you can also run the ldd utility (or a similar utility) to obtain an inventory of the application's shared library dependencies (that is, libraries with a .so extension instead of the .a extension for static library archives).

For example, running ldd against the test program shown in the earlier code snippet that illustrated the big-endian and little-endian methods—in this case on a Red Hat Linux 7.1 system—yields the following output:

$ ldd test
    libc.so.6 => /lib/i686/libc.so.6 (0x40021000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

The preceding output shows that the simple test program depends on the standard C shared library major version 6 (libc.so.6).

Some libraries can be cross-platform libraries that provide a platform-independent API to functionality such as database access, graphical user interface (GUI) or networking. Such libraries encapsulate the different APIs of different operating systems, providing an application with a common set of APIs for all operating systems. Example cross-platform libraries include:

  • Rogue Wave's SourcePro C++
  • Trolltech's Qt product
  • INT's Carnac and GeoToolkit GUI toolkits

As with hardware dependencies, it is useful to produce an inventory of library dependencies as you find them. The inventory can document the following:

  • The name, type and context of the library
  • The purpose that the dependency serves
  • The impact that the dependency has on the migration
  • A recommendation for how to treat the dependency

Interoperability with other applications

In addition to libraries that support the operation of software, there are other applications and software packages that are either required by the application or depend on the application themselves. These applications and software packages can include database management systems, messaging and groupware applications, content management and Web server software, accounting software, resource management software and sales automation software.

Migrating one application can have a number of potential impacts on such external software, including:

  • How data will be shared at a low level (through files and the passing of information) and at the user interface level (through the clipboard and the drag-and-drop feature)
  • How the applications will intercommunicate
  • How users will access the applications; that is, through a single interface or portal or through multiple interfaces
  • How security (authentication and access controls) will be managed between applications

A first step toward resolving these issues is to gain an understanding of what other programs exist and the dependencies that exist between them. Once again, the starting point is an inventory, to document each package and how it can affect the migration.

External software can have its own dependencies on libraries or hardware, which may further influence the migration. For example, a dependent software package already running on a Windows operating system may require an older version of Windows than the one intended for the migration. This can result in a decision to upgrade the external software package to a more recent version so that the two applications can reside on the same server; alternatively, two servers may be run in parallel, each running a different version of Windows. Each option offers trade-offs that are best explored through a cost-benefit analysis, which is discussed later in this chapter.

Application Structure

The next step toward the goal of migrating an application from UNIX to Windows is to gain a high-level view of the application to determine the main building blocks of the application and the application's scope (that is, the sum of the building blocks). This high-level view also helps establish the boundaries of the application.

These building blocks will enable the migration itself to be scoped, providing definitions of what parts of the application need to be migrated. After the objectives for the migration have been decided, each building block can be analyzed in more detail to determine exactly how it should be migrated.

An appropriate way to view the building blocks of an application is in terms of its layers, as described in the sections that follow.

Application types

Most commonly used applications are either workstation-based or server-based. Workstation-based applications run at the UNIX workstation (desktop) computer and access data that resides on network file shares or database servers. Workstation-based applications have the following architectural characteristics:

  • They can be single-process (monolithic) applications or multiple-process applications.
  • They use character-based user interfaces or GUI-based (for example, X Windows or OpenGL) user interfaces.
  • They access a file server (through the network file system [NFS]) or a database server for data resources.
  • They access a compute server for compute-intensive services (for example, finite element models for structural analysis).

The second application type—server-based applications—runs on a server and is accessed from a UNIX workstation (desktop) client through telnet and remote shells (character-based applications), an X Windows server (GUI-based applications), interprocess communication (IPC) (client-server applications) and other means. Server-based applications can be described as one of the following UNIX software architectures:

  • Compute servers that use a messaging mechanism, such as RPC or sockets, for IPC (between the server and the client computers) for the purpose of method calls and to obtain result sets from those method calls. Compute servers also typically use the Message Passing Interface (MPI) for load balancing.
  • Database servers that provide interfaces (such as Open Database Connectivity [ODBC], OLE DB and OCI) to clients for access to database tables, views, stored procedures and triggers.
  • Web servers that contain JavaServer Pages (JSP) and Common Gateway Interface (CGI) access programs for generating dynamic Hypertext Markup Language (HTML) content from servers such as database servers and file servers.

Later sections in this chapter discuss application migration methodology for UNIX workstation and server application architectures, together with the migration of those architectures to a Windows workstation and/or server environment.

User interfaces

The type of user interface has a clear bearing on the application migration. The following sections discuss the three most common types of user interfaces for a UNIX application: character based, graphical and browser based.

Character-based interfaces

The traditional UNIX user interface incorporates a character-oriented, scrolling terminal. UNIX was created before GUIs were available. A user types commands in the UNIX user interface, and UNIX responds with ASCII character output. UNIX was developed as a multiple-user, timesharing operating system, which assumed that users would sit at so-called "dumb" terminals and type commands on the keyboard.

Applications often use both cursor movement and the graphics capabilities found in modern terminals or emulators. To support this, UNIX has a database of terminal capabilities known as termcap. Some newer implementations of this use binary data and are known as terminfo. These libraries allow application writers to query the database for specific cursor movement commands, so applications can operate with a variety of hardware without embedding specific terminal commands into the application.

Another application development package specifically designed to alleviate the problem of terminal dependence is the curses library originally written at the University of California, Berkeley. It is a set of functions for manipulating terminal input and output (mostly output). These functions perform such actions as clearing the screen, moving the cursor to a specific row and column and writing a character or string to the screen. There are also input functions to retrieve user input in various modes, such as reading the input one character at a time or as a character string. Curses and similar libraries enable companies to create highly interactive, character-based applications, such as text editors.
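
As a brief illustration of the style of code to look for, the following minimal curses program (a sketch; it is linked against the system's curses library, for example with cc hello.c -lcurses) clears the screen, positions the cursor and writes a string:

#include <curses.h>

int main(void)
{
  initscr();                     /* enter curses mode and initialize the screen */
  clear();                       /* clear the screen */
  move(5, 10);                   /* move the cursor to row 5, column 10 */
  addstr("Hello from curses");   /* write a string at the cursor position */
  refresh();                     /* flush the changes to the terminal */
  getch();                       /* wait for a single keystroke */
  endwin();                      /* restore the terminal to its normal state */
  return 0;
}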

Character-based interfaces can be ported directly into an Interix environment, or they can be rewritten to support the Windows user interface standards. You should choose the latter option when user interface standardization is a main reason for the migration.

Graphical interfaces

Graphical interfaces in UNIX are typically based on the X Windows standard. X Windows (typically called X) is a combination of several elements: the X Protocol, X Display Server, X Clients, low-level APIs called Xlib (libX11.a) and higher-level libraries. The X Windows standard was developed at MIT to create a platform-independent, network-based, graphical user environment. X Windows separates the server part that draws the graphical interface (X Display) from the client—the application program that uses X Windows. The server and client can run on separate computers, so the application can run on a powerful compute server, while the X Display server runs at a workstation, listening for network connections at a specific port and acting on the commands sent by the X Clients.

Because X Windows is a set of toolkits and libraries, it has no single look and feel of its own, as Windows or the Mac OS does. Motif is the windowing system, library and user interface style built on the X Windows system. Motif handles windows and a set of user interface controls known as widgets. Widgets cover the whole range of user interface elements, including buttons, scroll bars and menus.

Even X Windows with Motif is sometimes not enough to meet an application's requirement for a user interface rich in widget content. Additional widget libraries have therefore been developed. One example is the xrt widget library, which makes use of the Motif libraries. Other libraries that you may encounter include Trolltech's Qt product and INT's Carnac and GeoToolkit GUI toolkits.

Graphics-intensive applications may support additional user interface standards, such as OpenGL. OpenGL has become a widely used and supported two-dimensional and three-dimensional graphics API. OpenGL fosters innovation and speeds application development by incorporating a broad set of rendering, texture mapping, special effects and other powerful visualization functions. Developers can take advantage of the capabilities of OpenGL across all popular desktop and workstation platforms, ensuring wide application deployment.

OpenGL runs on every major operating system, including Mac OS, OS/2, UNIX, Microsoft Windows 95, Microsoft Windows 98, Windows 2000, Microsoft Windows NT®, Linux, OpenStep and BeOS; it also works with every major windowing system, including Win32, Mac OS, Presentation Manager and X Windows. OpenGL includes a full complement of graphics functions. The OpenGL standard has language bindings for C, C++, Fortran, Ada and Java, so applications that use OpenGL functions are typically portable across a wide array of platforms.

Because Interix supports X Windows and an array of platforms support OpenGL, you may be able to move graphical UNIX applications to Interix with little or no modification. However, older applications may use an older version of the windowing system, or may use older widget libraries that have no direct mapping in Interix. Also, as with character-based interfaces, you may prefer the option of rewriting the interface to provide a standard look and feel across Windows-based applications.

Browser-based interfaces

The advent of the Web has resulted in a number of applications relying on Web browsers for their user interfaces. Such interfaces can usually be ported with a minimum of effort because of the platform independence of nearly all the standards used by Web browsers and Web servers.

Because Web-based interfaces require little or no migration effort, this guide does not cover them.

Application code

The application code provides a first step toward analyzing the application structure. Most UNIX code is written in either C or C++. However, Fortran is also a language to consider. Fortran code is common in compute-intensive scientific and engineering applications. A Fortran application can be a stand-alone application, a routine called from other languages, or it may call routines in other languages. Fortran is commonly used in loosely coupled distributed grid computing. Distributed grid computing typically requires integration with dynamic scheduling and high-performance message passing, such as the MPI.

A Fortran application requires additional planning. The Fortran considerations for using either the Interix POSIX subsystem or a UNIX emulator are the same as for C and C++ migrations. Note, however, that the GNU Fortran 77 compiler, f77, is currently the only Fortran compiler available for Interix.

Low-level services

Any UNIX application that primarily uses C and C++ run-time calls may compile with little modification against the Win32 API. If, however, the application makes system calls for low-level services such as process creation, shared memory, semaphores, signals and message queues, or makes calls to a third-party library that is not available on Win32, these calls will not compile or link with Win32. You should consider a rewrite when only a few of these calls are made; however, a port to the Interix subsystem is desirable for applications that make large numbers of UNIX system calls, especially when those calls are made throughout the code and are not limited to easily identifiable modules.
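
For example, the following minimal sketch compiles unchanged in a POSIX environment such as Interix, but its process-creation calls have no direct Win32 equivalents and would have to be restructured (for example, around the Win32 CreateProcess call) in a rewrite:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
  pid_t pid = fork();                 /* UNIX process creation */

  if (pid == -1) {
    perror("fork");
    exit(1);
  }
  if (pid == 0) {                     /* child: replace its image with ls */
    execlp("ls", "ls", "-l", (char *) NULL);
    _exit(127);                       /* reached only if exec fails */
  }
  waitpid(pid, NULL, 0);              /* parent: wait for the child to finish */
  return 0;
}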

At this stage, it is sufficient to understand the presence and use of low-level services in the application, so that you can give those services a thorough analysis at the evaluation stage. Use the information in Appendix x and a text editor or text search tool (such as grep) to perform an initial analysis of the code for UNIX API dependency. You can then identify areas of the code that are likely to contain UNIX-specific services—for example, device-specific or process management code.

Installation, configuration and execution scripts

UNIX applications usually have a complement of scripts that are used in the configuration, build, installation, setup and deployment environments to support the application. These scripts are written in a variety of languages. Common scripting languages include Perl and Shell (Bourne, Korn, C and Bash) scripts. The scripts take advantage of a number of UNIX utility features, along with features of the application itself.

The functions performed by UNIX scripts and utilities include configuration, setting up the user's desktop environment, scheduling (cron), monitoring (for example, Simple Network Management Protocol [SNMP]), parsing or searching (grep, sed, awk) and administration (for example, distribution of updated application binaries). Migrating an application from UNIX to Windows requires a combination of scripting languages to create similar script syntax and utilities to perform these functions.

To choose the correct combination of scripting languages and utilities, address the following questions:

  • Are UNIX shell scripts used for configuration of the application environment? If so, further determine the following:
    • Which shell scripts are used for configuration (C, Korn, Bourne, Bash)?
    • Are any core UNIX utilities used for tasks such as installation and setup? (Core utilities are those found in the /bin directory.)
    • Which file systems do the shell scripts access, and do those file systems use or expect a UNIX-style, single file system root? For further information on UNIX file systems, see Chapter 8.
  • Is any current use made of cross-platform scripting, such as Perl, NuTCRACKER or Interix?
  • How is an application started and what parameters are needed? Must the superuser start the application?
  • Is a daemon used? In UNIX, you can start an application in the background and log off, and the application continues to run. This can't be done with Windows 2000 unless the application is a service.
  • How is the application closed? In UNIX, it is common for an application to receive a message telling it to close; is the application then responsible for closing any other applications?
  • Does this application interface with any other applications, and if so, which ones?
  • Which shells does this application require or rely on?
  • Do the migrated applications require standard UNIX utilities (such as cd or ls) at run time?

Migrated applications often require specific UNIX-style components in their deployment environments. Because of this requirement, the UNIX emulated environment (that is, NuTCRACKER or Cygwin) or UNIX native environment (that is, Interix) must provide a full set of UNIX utilities and shell environments, such as Korn and C. Also because of this requirement, the scripts must be migrated (rewritten or ported) to the Windows 2000 environment.

Development Environment

The development environment not only provides valuable input to the migration strategy, it also must be migrated itself. You need to understand the development environment for the application so that you can provide a basis for the application's continued development, support and maintenance. The sections that follow discuss the tools that the development environment may include.

Modeling tools

Modeling tools include such facilities as graphical tools in support of a given methodology, such as Structured Systems Analysis and Design Methodology (SSADM), Rational Unified Process (RUP) or Unified Modeling Language (UML). Although such tools are normally independent of the applications, facilities such as code generators and code synchronizers enable models to be kept up to date with the code.

The migration of modeling tools forms an important part of your application migration from UNIX to Windows. You therefore must include the migration of these tools in your planning process by determining the answers to the following questions:

  • Is a specific methodology being adhered to?
  • Are graphical modeling tools being used?
  • Is there an ongoing requirement to keep models up to date with the code?

Many modeling tools are available across different platforms, including Windows. Therefore, the migration of these tools should not significantly impact the migration of the application. Because of this, and because there is such a wide range of tools on the market, this guide does not cover the migration of modeling tools in detail.

Build tools

As part of your application data-gathering efforts, you must know which build tools and configuration scripts are used in the application's build environment.

The most common build tool for UNIX applications is the Make utility, which works with the Makefile configuration file to automatically determine which pieces of a large program need to be recompiled, and then issues the commands (for example, gcc) to recompile them. After a source file changes, entering make at a shell prompt causes all the necessary recompilations to be performed. The Make utility uses the last-modification times of the files to decide which of the files must be updated. For each of those files, the Make utility issues the commands recorded in the Makefile.

There are two main implementations of Make: BSD Make and GNU Make, which is referred to as gmake. Make can be used with any programming language whose compiler can be run with a shell command. In fact, Make is not limited to programs; it can be used to describe any task where some files must be updated automatically from others whenever the others change.

To use Make, you must create (or generate—for example, through a Configure script, as described later in this section) the Makefile, which describes the relationships among files in the program and states the commands for updating each file. For a program, the executable file is typically updated from object files, which are in turn made by compiling the source files.
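
For illustration, a minimal Makefile might look like the following sketch (the file and target names are hypothetical, and each command line must begin with a tab):

# Hypothetical Makefile: app is relinked whenever either object file is
# newer than it, and each object is recompiled when its sources change.
CC     = gcc
CFLAGS = -O

app: main.o util.o
	$(CC) $(CFLAGS) -o app main.o util.o

main.o: main.c util.h
	$(CC) $(CFLAGS) -c main.c

util.o: util.c util.h
	$(CC) $(CFLAGS) -c util.c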

The Make utility carries out commands in the Makefile to update one or more target names, where a name is typically a program. If no -f option is present, Make looks for a file named GNUmakefile (gmake only), makefile or Makefile, in that order. The first name checked, GNUmakefile, is not normally used; it applies only when a Makefile is specific to gmake and will not be understood by other versions of Make. The Make utility updates a target if the target depends on prerequisite files that have been modified since the target was last modified, or if the target does not exist.

Imake is a C preprocessor interface to the Make utility. It is used to generate Makefiles from a template, a set of GNU C-Compatible Compiler Preprocessor (cpp) macro functions and a per-directory input file called an Imakefile. This interface allows computer dependencies (such as compiler options, alternate command names and special make rules) to be kept separate from the descriptions of the various items to be built.

The xmkmf command is the usual way to create a Makefile from an Imakefile shipped with the application software. When invoked with no arguments in a directory that contains an Imakefile, the Imake program is run with arguments appropriate for the system (configured into xmkmf when X Windows was first installed) and generates a Makefile.

When invoked with the -a option, xmkmf builds the Makefile in the current directory and then automatically executes make Makefiles (in case there are subdirectories), make includes and make depend. This is the usual way to configure software that is outside of the X Windows build tree.

You can use programs such as autoconf to generate the configuration scripts automatically, or you can use the Make utility's CFLAGS variable to convey the configuration information. Build and timestamp information is captured in a Makefile for input to the UNIX Make program.

Many application software packages come with a Configure script of some kind. The Configure script attempts to deduce features of the operating system and set a series of compile-time macros, which will turn on and off code appropriate to the operating system.

Configure is a script toolkit that attempts to determine what kind of system it is running on and then correctly build a Makefile, which in turn is used by make to build the application. The Configure script uses a number of tests to check for the presence or absence of a specified feature, but many of those tests involve small programs that are then run through the C preprocessor or compiled. The feature tests are generally based on the output of the C preprocessor or driver rather than on execution of the test program. Configure works well when automating and running the same build process repeatedly; however, you should allow time for such builds to be configured. (Refer to Chapter 7 for more information.)
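
As a hedged illustration of such a feature test, a Configure script might compile and link a throwaway program like the following, recording the result in a macro (the HAVE_STRLCPY name here is illustrative):

/* conftest.c: probe for the strlcpy() interface, which exists on
   BSD-derived systems but not on all platforms. */
#include <string.h>

int main(void)
{
  char buf[8];
  strlcpy(buf, "probe", sizeof(buf));
  return 0;
}

If cc conftest.c compiles and links cleanly, the script defines a macro such as HAVE_STRLCPY so that the application code can select the appropriate implementation at compile time.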

Note that this chapter describes only the most common Configure scripts—those generated through the GNU autoconf program (version 2.53). Every package has different needs, and this section does not cover all of the Configure scripts that can be used on the many packages available.

Compilers

UNIX compilers process input files through multiple stages: preprocessing, compilation, assembly and linking. Suffixes for source file names identify the source language, but the name used for each compiler governs the default assumptions. For example, gcc assumes that preprocessed (.i) files are written in C and therefore assumes C style linking; g++ assumes that preprocessed (.i) files are written in C++ and therefore assumes C++ style linking.

A large number of options can be provided to control the processing. For example, options can switch debugging on and off, refer to external libraries and specify computer-dependent behavior.

A compiler can be used to gather some information about the application's ANSI C compliance. For example, as part of the Quick Port migration described later in this chapter, the build Makefile can be modified to include gcc language options that control the dialect of C that the compiler accepts. The -ansi option supports all ANSI C programs and turns off certain features of GNU C that are incompatible with ANSI C. The -pedantic option issues all the warnings demanded by ANSI C and rejects all programs that use forbidden extensions. Valid ANSI C programs should compile properly with or without the -pedantic option (though a rare few will require -ansi). However, without the -pedantic option, certain GNU extensions and traditional C features are supported in addition to the ANSI features.
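
For example, the following short program uses a GNU statement expression, an extension that plain gcc accepts silently but that gcc -ansi -pedantic flags as forbidden. Compiling a code base with these options gives a quick measure of how far it strays from ANSI C:

#include <stdio.h>

int main(void)
{
  /* A GNU statement expression: a braced group used as an expression.
     This is not valid ANSI C, so -pedantic reports it. */
  int x = ({ int t = 2; t * 3; });

  printf("x = %d\n", x);
  return 0;
}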

Integrated development environments

Integrated development environments (IDEs) typically provide all the development tools needed for programming, including compilers, linkers and project/configuration files that generate complete applications, create new classes and integrate those classes into the current project. IDEs also include file management for sources, headers, documentation and other material to be included in the project, as well as WYSIWYG (what you see is what you get) creation of user interfaces through built-in dialog box and resource editors. Debugging the application, and integrating any other program needed for development by adding it to a Tools menu, are also typical capabilities.

Microsoft Visual Studio® is an IDE that includes all of the functions and capabilities just described, including a complete set of development tools for building reusable applications in Microsoft Visual Basic®, Microsoft Visual C++®, Microsoft Visual J++® and Microsoft Visual FoxPro®.

The Visual Studio development system also includes Microsoft Visual InterDev™, which provides easy integration with the Internet and a full Web page authoring environment; Microsoft Visual SourceSafe®, which provides source control; MSDN® Library, which provides development information, API references, code samples, references and other information; and Installer, which helps developers create highly reliable, self-repairing applications.

Visual Studio and the individual tools and languages that it contains are the foundation for building Windows-based components and applications, creating scripts, developing Web sites and applications and managing source code.

Visual Studio allows the developer to:

  • Write less code by providing programming wizards, drag-and-drop editing and reuse of program components from any of the Visual Studio languages.
  • Write code more quickly by minimizing errors with syntax and programming assistance within the editor.
  • Integrate dynamic HTML, script and components into Web solutions.
  • Manage Web sites from testing to production by means of integrated site management tools.
  • Create and debug Active Server Pages (ASP).
  • Use design-time controls to visually assemble data-driven Web applications.

Visual Studio also includes the Windows 2000 Developer's Readiness Kit, which contains developer training and technical resources.

Testing, debugging and code optimization tools

After the build process has created the application's executables, the next step should be a test process. The test process can be partially or wholly automated through tools or scripts. In most cases, you need to duplicate the test process—along with its associated tools or scripts—on Windows.

Test process

When creating a test process for the application, consider the following:

  • Does the application have documented test procedures?
  • Does the application have its own test programs or test scripts?
  • If the application has test programs or scripts, how are they implemented? For example, are they implemented through C programs or scripts, Perl or Korn shell scripts?
  • Is the application instrumented with compiler preprocessor symbols?

It is a good idea to test the current UNIX versions of the application so that you can validate the process and verify that any testing programs and scripts are up to date. This may help you identify a latent bug before you find it on Windows.

Tools and/or scripts

If the application comes with its own test programs or scripts, consider migrating them along with the application. The test programs and scripts can and should go through the same portability analysis as the application programs.

Debuggers

What has been used for debugging the application on the UNIX platforms? For example, the GNU debugger, gdb, is available on almost every platform.

Packaging and archiving tools

It is important to understand the different archiving, packaging and compression facilities that are typically used for UNIX applications, because it is likely that the code will be delivered in one of these forms and will then have to be imported into the Windows environment.

The following are the archiving and packaging facilities typically used for UNIX applications:

  • tar. The tape archive program, which uses a variety of formats and, despite its name, need not involve tape at all. It is still one of the most popular formats.
  • cpio. Copies files into or out of a cpio or tar archive. It was intended to replace tar. The cpio archives can also be unpacked by pax.
  • pax. The POSIX.2 standard archiver. It reads, writes and lists the members of an archive file (including tar-format archives), and also copies directory hierarchies.
  • rpm. The Red Hat Package Manager.
  • ar. Creates and maintains groups of files combined into an archive. It is not typically used for source archives, but is almost always used for object file archives (static object libraries).

The following are the compression formats typically used for UNIX applications:

  • compress. Creates a compressed file by using adaptive Lempel-Ziv coding to reduce the size of files (typically 70 percent smaller than the original file).
  • zip/gzip. These algorithms combine a version of Lempel-Ziv coding (LZ77) with a version of Huffman coding in what is often called string compression.
  • pack. Compresses files by using Huffman coding.
  • uncompress, zcat. Extracts compressed files.
  • gunzip. Decompresses files created through compress, zip, gzip or pack. The detection of the input format is automatic.
  • unpack, pcat. Restores files compressed by pack.

Table 3 summarizes common suffixes for archived and compressed file names.

Table 3. Archived/compressed file suffixes

Suffix   Format         Description
.a       ar             Created by and extracted with ar
.cpio    cpio           Created by and extracted with cpio
.gz      gzip           Created by gzip and extracted with gunzip
.tar     tar            Created by and extracted with tar
.Z       compress       Compressed by compress; uncompressed with uncompress, zcat or gunzip
.z       pack or gzip   Compressed by pack and extracted with pcat, or compressed by gzip and extracted with gunzip
.zip     zip            Compressed by zip and extracted with unzip, or compressed by pkzip and extracted with pkunzip

Source code management tools

Source code management tools group a number of facilities, such as source code control, build management and code archiving. A number of source control systems are used in UNIX and cross-platform environments, including Revision Control System (RCS), Source Code Control System (SCCS), Concurrent Version System (CVS) and Program Version Control System (PVCS) Dimensions.

When reviewing the application to be migrated, consider the following:

  • Is the code currently stored in a code management system?
  • Does the code management system operate in both the UNIX and Windows environments?

The native Windows source control system is Visual SourceSafe, which supports the Source Code Control Interface (SCCI) API.

Application Usage

It is important to determine how and when the application is used, who uses it and how often, so that you can mitigate or deal with usage-related issues as part of the migration. The best way to determine usage is by speaking with the users and operators; you can use a combination of interviews and workshops for this. It is also helpful to gather additional materials, such as user documentation and training guides.

User-facing functionality

When analyzing application features and facilities, you should determine which are used most frequently (and are thus critical to the migration) and which are used less frequently or are not used at all.

Administrator functionality and operational management

Your analysis of the application's administrative or operational facilities and usage should include an examination of data management, backup/restore, security management, application configuration, startup/shutdown, failure operation and contingency planning/disaster recovery.

Service level definitions

You can define application service levels in terms of availability, scalability, performance, downtime constraints, planned outages and similar factors. To complete your analysis, you should determine the application's scalability and performance requirements.

Evaluating Migration Objectives

The output from data gathering should be an inventory of all elements affected by the migration, together with their interdependencies and any issues that you have uncovered. In the light of these issues, you should revisit the objectives of the migration to determine their viability and testability.

Business Objectives

You must determine how the application will meet business needs. Business needs vary, but can include the following examples:

  • The appearance and functionality of the application's UNIX user interface must be retained after the migration.
  • A UNIX-style development environment must be retained after the migration.
  • The application must run on multiple platforms, so coexistence requirements must be examined.
  • Common code must be used across multiple platforms.
  • The application's development is static.
  • The application will be replaced, so minimal changes are required.
  • The development environment will be migrated to Visual Studio.

Technical Objectives

You must establish criteria to ensure that the migration is successful. Consider the following when determining your criteria:

  • Can the integration of Windows-based applications be accomplished through Interix?
  • Is there significant dependence on UNIX APIs?
  • Is there good Interix support for required APIs?
  • Is there reliance on third-party libraries?
  • Are the third-party libraries available with Interix?
  • Are the third-party libraries available with Win32?
  • Does the application conform to ANSI C/C++?

Migration Objectives

You must establish criteria to judge that the resulting application delivers value. These criteria can include the following:

  • What is the ultimate intended use of the software? For example, is one of the objectives to improve usability and worker productivity by integrating the application with other Windows-based personal and workgroup productivity applications, such as Microsoft Word or Microsoft Excel?
  • Will the application behave as a native application in each unique target environment, particularly with regard to software installation, administration, distribution and user interface?
  • Is the migration time critical?
  • Is the required user interface based on the Windows GUI?

Evaluating the Application

Now that data gathering is complete, you can begin to thoroughly analyze the application. You should examine each building block of the application to determine any issues that may arise in porting the code to the new platform.

The application blocks include the following:

  • User interface
  • File and device I/O
  • File and process security
  • Processes and threads
  • Interprocess communication
  • Signals

The results of this examination will provide you with insight into any migration issues that may exist, and therefore into the most appropriate approach to migrating the application. If you find a substantial number of issues, you should consider a rewrite of the application. More likely, however, you will need to choose between porting the application to the Interix environment and making it work under Win32.

In the analysis, you should look for use of UNIX-specific code. UNIX applications can contain millions of lines of custom code. The effort to rewrite an application generally increases as the amount of code increases, and using porting tools becomes more viable as code sizes increase. The major issues are normally with code that the application uses to communicate with the UNIX operating system through system calls. Solaris, HP-UX, Advanced Interactive Executive (AIX), Linux, FreeBSD and other UNIX brands all have some unique architectural features, APIs, commands and utilities.

UNIX-specific code will use either UNIX standard conventions (for example, the file hierarchy) or function calls that are specific to the source UNIX environment. You should log each occurrence of a UNIX-specific code element, because it will influence the decision on how to migrate the application.

In addition, you should consider whether the application code has been written in a hardware-independent manner. Examine the word size of the basic data types (for example, 64-bit versus 32-bit pointers), byte ordering (big-endian versus little-endian) and data alignment in structures. To facilitate portability, all hardware dependencies must either be isolated and conditionally compiled and linked for the target environment build process, or be rewritten to use hardware-independent constructs. UNIX applications that are designed around modular and portable coding methodologies have taken these issues into consideration.
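
A minimal sketch of this isolation technique follows; the sleep_seconds() wrapper name is illustrative rather than a standard interface:

/* portable_sleep.h: isolate one platform-specific call behind a macro. */
#if defined(_WIN32)
#include <windows.h>
#define sleep_seconds(s) Sleep((s) * 1000)   /* Win32 Sleep() takes milliseconds */
#else
#include <unistd.h>
#define sleep_seconds(s) sleep(s)            /* POSIX sleep() takes seconds */
#endif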

You should determine whether the application contains custom device drivers. For example, custom device drivers are very common in process control applications. These device drivers are not portable and must generally be rewritten for the Windows 2000 platform.

User Interface

In your evaluation of the application, you should review the user interface to determine how it is built (that is, what libraries it uses) and what standards (if any) it uses.

X Windows/Motif

You can determine whether the UNIX application is using X Windows, Motif or xrt libraries by looking at the make program's Makefile and the output of the application's build. For example, you can use grep and ldd, as described earlier in this chapter.

X Windows libraries include:

  • X11 toolkit library (libXt.a)
  • X11 input extension library (libXi.a)
  • Athena widget library (libXaw.a)
  • X11 extensions library (libXext.a)
  • X Windows Display Manager (XDM) control protocol library (libXdmcp.a)
  • Xauthority routines library (libXau.a)
  • Miscellaneous utilities library (libXmu.a)

When a UNIX application makes calls to these libraries, it is linked to one or more of the following: X11, Xau, Xaw, Xi, Xmu, Xt and Xtst. These libraries contain several hundred API calls and do not map easily to the Win32 user interface API. To successfully port a user interface that uses this API, you must ensure that these libraries are available on the Windows platform (for example, by using Interix's X11R5 library).
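
Code that depends on these libraries is easy to recognize. The following minimal Xlib client, given as a sketch (linked with -lX11), shows the style of calls to look for:

#include <stdio.h>
#include <X11/Xlib.h>

int main(void)
{
  Display *dpy;
  Window win;
  XEvent ev;

  dpy = XOpenDisplay(NULL);            /* connect to the X Display server */
  if (dpy == NULL) {
    fprintf(stderr, "cannot open display\n");
    return 1;
  }

  win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy), 0, 0,
                            200, 100, 1,
                            BlackPixel(dpy, DefaultScreen(dpy)),
                            WhitePixel(dpy, DefaultScreen(dpy)));
  XSelectInput(dpy, win, ExposureMask);
  XMapWindow(dpy, win);                /* ask the server to display the window */
  XNextEvent(dpy, &ev);                /* wait for the first (Expose) event */
  XCloseDisplay(dpy);
  return 0;
}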

When a UNIX application makes calls to the Motif API, it is linked to either or both of the following libraries: Xm and Mrm. These libraries also contain several hundred API calls that do not map easily to the Win32 GUI API. To port a user interface that makes use of this API, you must ensure that the Motif libraries are available on the Windows platform. Note that the Motif libraries are built on top of the X11R5 or R6 libraries and therefore require those as well. Additionally, the Motif Window Manager, mwm, is required to perform window management functions.

OpenGL

OpenGL is an API that allows an application to manipulate three-dimensional graphics on the screen, and it is available on both UNIX and Windows. On UNIX, OpenGL is often mixed with X Windows and Motif code to display buttons, menus and dialog boxes. When this is the case, the X Windows and Motif code guides the migration choice between an application port and a rewrite.

Character-mode interfaces

A character-mode user interface application writes to the console one line at a time (for example, by using the C printf() library call), and requests data input through prompts. Such code can easily be migrated to Win32, usually by simply recompiling.

When a UNIX application makes calls to the curses API, it is linked to the curses or ncurses library. These libraries contain several hundred API calls that do not map easily to the Win32 user interface API. To successfully port a user interface that makes use of these APIs, you must ensure that the libraries are available on the Win32 platform. The curses library relies on the terminfo technology that is available on UNIX but not on standard Windows 2000. Using the Interix curses or ncurses library makes a port of this user interface relatively easy.

File and Device I/O

The UNIX approach to file access differs from that of Windows. UNIX has a single file hierarchy based on the root file system (indicated by a slash mark), whereas Windows uses a separate letter for each file system (for example, A and B for floppy disks, C through Z for hard disks). UNIX determines the location of the root file system only at boot time; the other file systems are added to the directory tree by mounting them, either manually (for example, with the command mount /dev/fd0 /mnt/floppy) or by means of entries in the table /etc/fstab (or possibly /etc/vfstab or /etc/mnttab).

At the lowest level of the UNIX file system, files are referred to by numbers—that is, inodes. Inodes are indices into the inode table, a part of the file system reserved for describing files (somewhat similar to the role of the file allocation table [FAT] in the file system included with the Microsoft MS-DOS® operating system). The directory system enables a file to be referred to by a name. The relationship between a file name and an inode is called a link.

There are two types of links: hard links and symbolic links. A hard link (sometimes referred to as a traditional link) links a file name to an inode and thereby enables a single file to have multiple names (that is, links). A symbolic link is a file whose contents are the name of another file. In a hard link, each of the file names has the same relationship to the inode; in a symbolic link, the symbolic link name refers to the true name and directory location of the file. If you delete one of several names that hard links give the same file (including the original name), the other names continue to work; but if you delete the file that a symbolic link references, the symbolic link points to nothing.
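
For illustration, the following is a minimal C sketch (the file names are hypothetical) of how an application creates and inspects links; calls like these are what you should log during the assessment:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char target[1024];
        ssize_t len;

        /* Hard link: "backup.dat" becomes a second name for the same inode. */
        if (link("data.dat", "backup.dat") == -1)
            perror("link");

        /* Symbolic link: "current.dat" is a file whose contents name "data.dat". */
        if (symlink("data.dat", "current.dat") == -1)
            perror("symlink");

        /* readlink() returns the contents of the symbolic link, not the file data. */
        len = readlink("current.dat", target, sizeof(target) - 1);
        if (len != -1) {
            target[len] = '\0';
            printf("current.dat -> %s\n", target);
        }
        return 0;
    }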

The network file system (NFS) is a method of sharing file systems across networks. NFS has some similarities to, but is quite different from, the server message block (SMB) file system used on MS-DOS and Windows. With NFS, a UNIX computer can mount file systems connected to a different computer on the network—for example, mount hostname:/exporteddir1 /mnt. From the perspective of the local computer, /mnt is now just another portion of the single file hierarchy.

Devices are also treated as files from a UNIX application perspective.

The following are some questions that you need to ask to obtain information about the application from the perspective of file and device I/O:

  • Is there a reliance on absolute path names?
  • Are hard links and/or symbolic links used? Example calls to look for include readlink(), which reads the contents of a symbolic link, and symlink(), which creates a symbolic link to a file.
  • Are there any NFS file system dependencies?
  • Are file and device function calls used and required, especially the use of calls that are neither ANSI C/C++ nor POSIX compliant? The call to look for is chsize(), which changes the end of file on an open file.
  • Is non-blocking file I/O (that is, asynchronous I/O) used and required?
  • Are there any file-locking and/or record-locking requirements?
  • Are memory mapped files being used? Example calls to look for are mmap(), which maps a file into memory, and munmap(), which removes mappings for files in memory.
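
Regarding the last question, the following is a minimal C sketch (the input file name is hypothetical) of the memory-mapped file access that these calls indicate:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        struct stat st;
        char *p;
        int fd;

        fd = open("data.dat", O_RDONLY);        /* hypothetical input file */
        if (fd == -1 || fstat(fd, &st) == -1)
            return 1;

        /* Map the whole file read-only into the process's address space. */
        p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        fwrite(p, 1, st.st_size, stdout);       /* the file is now ordinary memory */

        munmap(p, st.st_size);                  /* remove the mapping */
        close(fd);
        return 0;
    }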

Interprocess Communication

As discussed in Chapter 2, UNIX introduced a philosophy of computing with features such as pipes, which provide the ability to link the output of one program to the input of another. Pipes are just one example of the ability to transfer data between processes. Various UNIX system implementations offer many other forms of interprocess communication, as explained in the following subsections.

Process pipes

Process pipes are found in all versions of UNIX and are also supported by Interix. They transfer data in one direction only. In general, the output of one process is piped (attached) to the input of another. Process pipes require ancestry between the processes (for example, a parent/child relationship).

Function calls to look for in the application are:

  • pipe(). Creates a pipe.
  • popen(), pclose(). Processes I/O by using pipes.
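
The following minimal C sketch shows the typical pattern these calls form: a pipe shared between a parent and the child it forks (the ancestry requirement noted above):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd[2];
        pid_t pid;
        char buf[64];

        pipe(fd);                       /* fd[0] = read end, fd[1] = write end */
        pid = fork();
        if (pid == 0) {                 /* child: reads what the parent writes */
            ssize_t n;
            close(fd[1]);
            n = read(fd[0], buf, sizeof(buf) - 1);
            if (n > 0) {
                buf[n] = '\0';
                printf("child read: %s\n", buf);
            }
            _exit(0);
        }
        close(fd[0]);                   /* parent: writes into the pipe */
        write(fd[1], "hello", 5);
        close(fd[1]);
        wait(NULL);                     /* reap the child */
        return 0;
    }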

Named pipes

Named pipes, also called FIFO (first in, first out) pipes, are process pipes with file names that allow unrelated (no ancestry) processes to communicate with each other. Named pipes transfer data in one direction only. For example, one process opens the FIFO for reading, whereas another process opens the FIFO for writing. In effect, a named pipe can be used just like a file.

Function calls to look for in the application are:

  • mkfifo(). Makes a FIFO special file (a named pipe).
  • mknod(). Creates a regular file, special file or directory (historical call for creating a named pipe).
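
A minimal C sketch of the writer side of a named pipe follows (the FIFO path is hypothetical); any unrelated process can open the same path for reading, just as it would open a file:

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        int fd;

        /* Create the named pipe in the file system. */
        mkfifo("/tmp/app.fifo", 0666);

        /* open() blocks until another, unrelated process opens
           the FIFO for reading (for example, cat /tmp/app.fifo). */
        fd = open("/tmp/app.fifo", O_WRONLY);
        if (fd == -1)
            return 1;
        write(fd, "hello\n", 6);
        close(fd);
        return 0;
    }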

System V IPC

System V IPC comprises three facilities: message queues, shared memory and semaphores. Processes can add messages to and remove messages from a queue, attach to and access shared memory, and create semaphores and read or set semaphore values.

Message queues

Message queues are like named pipes, but with two differences. First, whereas named pipes transmit an undifferentiated byte stream in one direction, message queues transmit discrete records. Second, these records can have different priorities, so with message queues it is necessary to determine the sequence in which the records are received. (POSIX also has message queues, but the APIs are not exactly the same as in the System V implementation.)

Function calls to look for in the application are:

  • msgctl(). Controls message operations.
  • msgget(). Gets a message queue identifier.
  • msgrcv(). Reads a message from a message queue.
  • msgsnd(). Sends a message to a message queue.
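
The following minimal C sketch (the key value and record layout are hypothetical) shows these calls in combination; note that the record type field can serve as a priority:

    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    struct app_msg {               /* application-defined record layout */
        long mtype;                /* message type/priority; must be > 0 */
        char mtext[64];
    };

    int main(void)
    {
        struct app_msg msg;
        /* Get (or create) a queue identified by a well-known key. */
        int qid = msgget((key_t)0x1234, IPC_CREAT | 0666);

        msg.mtype = 1;
        strcpy(msg.mtext, "hello");
        msgsnd(qid, &msg, sizeof(msg.mtext), 0);    /* enqueue a record */

        /* A type argument of 0 retrieves the first record of any type. */
        msgrcv(qid, &msg, sizeof(msg.mtext), 0, 0);

        msgctl(qid, IPC_RMID, NULL);                /* remove the queue */
        return 0;
    }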

Shared memory

Shared memory allows two unrelated processes to access the same logical memory. It is a special range of addresses that is created for one process and added to that process's address space. Other processes attach to the same shared memory segment, and the segment also becomes part of their address space. A message is sent by writing data into a buffer within this shared segment, where it is immediately visible to every attached process.

Function calls to look for in the application are:

  • shmat(). Attaches a shared memory segment.
  • shmctl(). Controls shared memory operations.
  • shmdt(). Detaches a shared memory segment.
  • shmget(). Allocates a shared memory segment.
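
A minimal C sketch (the key and segment size are hypothetical) of one process's side of a shared memory exchange:

    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        char *mem;
        /* Create (or find) a 4 KB segment identified by a well-known key. */
        int shmid = shmget((key_t)0x1234, 4096, IPC_CREAT | 0666);

        /* Attach the segment; it becomes part of this process's address space. */
        mem = shmat(shmid, NULL, 0);

        /* Any other process attached with the same key sees this write. */
        strcpy(mem, "hello from shared memory");
        printf("%s\n", mem);

        shmdt(mem);                         /* detach the segment */
        shmctl(shmid, IPC_RMID, NULL);      /* mark the segment for removal */
        return 0;
    }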

Semaphores

When two processes share a memory segment, one process cannot determine when the other is doing something. For example, two processes could be modifying the same data at the same time. To deal with such situations, Dijkstra introduced the concept of the semaphore. A semaphore is a special variable that takes only non-negative integer values and upon which only two operations are allowed: wait and signal.

A small counter is maintained in the semaphore. When a process performs a wait operation and the counter is greater than zero, the counter is decremented and the process has access; when the counter is zero, the process is blocked and its execution is suspended until another process signals the semaphore.

Function calls to look for in the application are:

  • semctl(). Performs a control operation on a semaphore.
  • semget(). Gets or creates a set of semaphores.
  • semop(). Operates on a set of semaphores.
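
A minimal C sketch (the key is hypothetical) of the wait and signal operations expressed through semop():

    #include <sys/ipc.h>
    #include <sys/sem.h>

    union semun { int val; };      /* defined by the caller on most systems */

    int main(void)
    {
        struct sembuf op;
        union semun arg;
        /* Create a set containing one semaphore, identified by a well-known key. */
        int semid = semget((key_t)0x1234, 1, IPC_CREAT | 0666);

        arg.val = 1;
        semctl(semid, 0, SETVAL, arg);      /* initial value: resource free */

        /* Wait (P operation): decrement, blocking while the value is zero. */
        op.sem_num = 0; op.sem_op = -1; op.sem_flg = 0;
        semop(semid, &op, 1);

        /* ... critical section protected by the semaphore ... */

        /* Signal (V operation): increment, releasing one waiting process. */
        op.sem_op = 1;
        semop(semid, &op, 1);

        semctl(semid, 0, IPC_RMID);         /* remove the semaphore set */
        return 0;
    }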

Sockets

Beginning with BSD 4.2, sockets were introduced as part of the Transmission Control Protocol/Internet Protocol (TCP/IP) networking implementation. The sockets interface is an extension of the pipes concept. Sockets are therefore used in much the same way as pipes, but they can be used across a network of computers over the TCP/IP transport protocol. In BSD systems, the other IPC communications facilities are based on sockets.

Function calls to look for in the application are:

  • accept(). Accepts a connection on a socket.
  • bind(). Binds a name to a socket.
  • bindresvport(). Binds a socket to a privileged IP port.
  • connect(). Initiates a connection on a socket.
  • getsockname(). Gets a socket name.
  • getsockopt(), setsockopt(). Gets and sets options on sockets.
  • listen(). Listens for connections on a socket.
  • recv(), recvfrom(). Receives a message from a socket.
  • recvmsg(). Receives a message from a socket.
  • send(), sendto(). Sends a message to a socket.
  • sendmsg(). Sends a message to a socket.
  • socket(). Creates an endpoint for communication.
  • socketpair(). Creates a pair of connected sockets.
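
As a point of reference, the following minimal C sketch (the address and port are hypothetical) shows the client side of a TCP conversation; note how send() and recv() parallel pipe-style writes and reads:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_in addr;
        char buf[128];

        /* Create a TCP endpoint and connect to a server. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(7000);
        addr.sin_addr.s_addr = inet_addr("127.0.0.1");
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
            return 1;

        /* The same read/write style as a pipe, but across the network. */
        send(fd, "hello\n", 6, 0);
        recv(fd, buf, sizeof(buf), 0);
        close(fd);
        return 0;
    }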

Streams

Beginning with System V4, streams were introduced as a generalized I/O concept. Streams can be used for both local and remote communications, just like sockets. A stream is a full-duplex data path within the kernel between a user process and drivers. The primary components are a stream head, a driver and zero or more modules between the stream head and driver. A stream is analogous to a pipe, except that data flow and processing are bidirectional.

The stream head is the end of the stream that provides the interface between the stream and a user process. The principal functions of the stream head are processing stream-related system calls and passing data between a user process and the stream.

A module contains processing routines for input and output data. It exists in the middle of a stream, between the stream's head and a driver. A module is the streams counterpart to commands in a shell pipeline, except that a module contains a pair of functions that allow independent bidirectional data flow and processing.

Function calls to look for in the application are:

  • getmsg(), getpmsg(). Retrieves the next message in a stream.
  • putmsg(), putpmsg(). Sends a message in a stream.

Header files to look for in source files are:

  • stream.h
  • stropts.h
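
A minimal C sketch (the device path is hypothetical) of a putmsg()/getmsg() exchange over a stream:

    #include <fcntl.h>
    #include <string.h>
    #include <stropts.h>

    int main(void)
    {
        struct strbuf ctl, dat;
        char cbuf[64], dbuf[512];
        int flags = 0;

        /* Open a STREAMS device. */
        int fd = open("/dev/echo", O_RDWR);
        if (fd == -1)
            return 1;

        strcpy(dbuf, "hello");
        dat.buf = dbuf; dat.len = 5;  dat.maxlen = sizeof(dbuf);
        ctl.buf = cbuf; ctl.len = 0;  ctl.maxlen = sizeof(cbuf);

        putmsg(fd, NULL, &dat, 0);          /* send a data message down the stream */
        getmsg(fd, &ctl, &dat, &flags);     /* read the next message from the stream head */
        return 0;
    }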

Stream pipes and named stream pipes

Stream pipes are similar to process pipes, but they can transfer data in both directions. They can also be given names, in which case they are known as named stream pipes. Stream pipes are typically used in System V for passing file descriptors between processes.

Header files to look for in source files are:

  • stream.h
  • stropts.h

Note   Stream pipes and named stream pipes are unrelated to streams.

Processes and Threads

The Single UNIX Specification (both UNIX95/XPG4v2 and UNIX98/XPG5) defines a process as: "an address space with one or more threads executing within that address space, and the required system resources for those threads."

UNIX is designed to be a multiple-processing, multiple-user system. At any point in time, many applications and processes are running. UNIX makes it easy to create processes, and many of the features of the operating system and shells result in the common practice of running many programs at once. A UNIX application can start a new program and process, replace its own process image and duplicate its process image. When UNIX duplicates its process image, the new process becomes a child of the creating process. This process hierarchy is often important, and there are system calls for manipulating child processes.

UNIX process-handling functions do not map directly to the Windows environment; therefore, you must identify such function calls. The following is a partial list of these calls:

  • system(). Passes a command to the shell, which starts a new program, thereby creating a new process.
  • execl(), execle(), execlp(), execv(), execve(), execvp(). Replaces the current process image with a new process image by executing a file.
  • fork(). Creates a new process. The new process (child process) is an exact copy of the calling process (parent process), with some exceptions—for example, the child process has a unique process ID.
  • vfork(). Creates a new process, just as fork() does, but doesn't fully copy the address space of the parent process.
  • popen(). Opens a process by creating a pipe, using fork() to create another process, and invoking the shell.
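
The following minimal C sketch shows the canonical pattern that these calls form: duplicate the process image, overlay the child with a new program, and wait for it to finish:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();             /* duplicate the current process image */
        if (pid == 0) {
            /* Child: replace its image with a new program. */
            execlp("ls", "ls", "-l", (char *)NULL);
            _exit(127);                 /* reached only if execlp() fails */
        }
        if (pid > 0) {
            int status;
            waitpid(pid, &status, 0);   /* parent: wait for the child to terminate */
            printf("child %d exited\n", (int)pid);
        }
        return 0;
    }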

UNIX also makes it convenient to create a group of cooperating processes, especially when the processes access terminals. Such groups are known as process groups; the functions that manage them include the following:

  • getpgrp(). Returns the process group ID of the current process.
  • setpgid(). Adds a process to a process group.
  • setsid(). Creates a new process group and sets the calling process as group leader.
  • tcgetpgrp(). Returns the ID number of the terminal's foreground process group.
  • tcsetpgrp(). Sets the foreground process group ID.
  • killpg(). Sends a signal to a process group. (For more information about signals, see the "Signals" section later in this chapter).

When a program is creating processes, it also needs a variety of function calls to manage the processes. The following are some examples of function calls:

  • getpid(). Returns the process ID of the calling process.
  • getppid(). Returns the process ID of the parent of the calling process.
  • getpriority(), setpriority(). Gets or sets a process's nice value (the nice value lowers a process's scheduling priority so that it does not take precedence over other processes).
  • getrlimit(), setrlimit(). Sets or retrieves resource limits.
  • getrusage(). Gets information about the use of resources.
  • sleep(). Suspends process execution for intervals of seconds.
  • wait(), waitpid(). Waits for process termination.

A thread is a sequence of control within a process. All processes have at least one thread of execution. When the process creates a new thread, the thread gets its own stack for the maintenance of all of its function parameters and local variables required for thread execution. However, the thread shares global variables, file descriptors and other process characteristics with the other threads in the process.

The idea of threads has been in existence for some time for various UNIX systems, but until the IEEE POSIX committee created the POSIX.1c-1996 thread extensions, each UNIX vendor's implementation was unique. Threads are now much more standardized and are available on most UNIX platforms. The POSIX threads are known as pthreads; some of the function calls are as follows:

  • pthread_create(). Creates a new thread.
  • pthread_exit(). Terminates the calling thread.
  • pthread_join(). Waits for termination of another thread.
  • pthread_detach(). Puts a running thread in the detached state.
  • pthread_attr_init(). Initializes the thread attribute object attr and fills it with default values for the attributes.
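
A minimal C sketch of the pthread calls listed above (compiled and linked with the platform's pthread library):

    #include <pthread.h>
    #include <stdio.h>

    /* Each thread runs this function on its own stack but shares globals. */
    static void *worker(void *arg)
    {
        printf("worker %d running\n", *(int *)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        int id = 1;

        pthread_create(&tid, NULL, worker, &id);   /* create a new thread */
        pthread_join(tid, NULL);                   /* wait for it to finish */
        return 0;
    }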

The following are some questions that you need to ask to obtain information about the application from the perspective of its process and thread structure:

  • Does it make extensive use of process creation and management?
  • Does it make use of process groups?
  • Does it need to run under the context of a different user or group at times during its execution?
  • Does it make extensive use of multiple threads (that is, is it multithreaded)?

Process management

Functions in this category provide support for the scheduling and priority management of processes.

Note   These functions are not supported by Interix.

  • getexecname(). Returns the path name of the executable.
  • p_online(). Returns or changes processor operational status.
  • priocntl(). Displays or sets scheduling parameters of specified process(es).
  • priocntlset(). Provides generalized process scheduler control.
  • processor_info(). Determines the type and status of a processor.
  • pset_assign(). Manages sets of processors.
  • pset_create(). Manages sets of processors.
  • pset_destroy(). Manages sets of processors.
  • pset_info(). Gets information about a processor set.
  • setpriority(). Sets the process scheduling priority.

File and Process Security

The security models of UNIX and Windows 2000 applications are different. Most UNIX and Windows applications have not been written to support a common security model, such as Kerberos.

UNIX systems also have the concept of setuid (set-user-identifier-on-execution) and setgid (set-group-identifier-on-execution) bits among a file's permission bits. By using the chmod utility, the setuid and/or setgid bit can be set and reset on files. All UNIX systems maintain at least two user/group IDs: the effective user/group ID and the real user/group ID. Some systems also support a saved set user/group ID.

These IDs are not only used for user access to files. Processes are also allocated certain permissions. UNIX systems manipulate user/group IDs in the following ways:

  • If the setuid or setgid bit is not set, UNIX runs the program as the specific user or group that executed the program; therefore, both the effective user/group ID and the real user/group ID are set to the user or group that executed the program.
  • If the setuid or setgid bit is set, UNIX runs the program under the user/group ID of the file's owner, not the user/group ID of the user who is executing the file. The effective user/group ID is set to the owner of the program (file), and the real user/group ID is set to the user or group that executed the program (file). If there is a saved set user/group ID, it is also set to the owner of the program (file).

There are also a number of function calls that allow a program to manipulate the effective, real and saved set user/group IDs:

  • getgid(). Returns the real group ID of the calling process.
  • getegid(). Returns the effective group ID of the calling process.
  • setregid(). Sets real and effective group IDs.
  • setreuid(). Sets real and effective user IDs.
  • setuid(), seteuid(), setgid(), setegid(). Sets real and effective user/group IDs.
  • setuser(). Changes effective and real user/group IDs of a process.

Sometimes a trusted application may require access to files and/or services to which the user should not have unlimited access. For example, the UNIX application ps requires access to /dev/proc, to which only the superuser (root) has access. UNIX solves this problem by allowing the application to run as a specific user or group. That is, if you execute an application that has the setuid bit set in the file permissions, this bit instructs the kernel to run the application with the identity and privileges of the owner of the file. Similarly, if the setgid bit is set, the application runs as if it had been executed by a member of the group to which the file belongs.
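
The following minimal C sketch shows how a setuid program typically inspects and drops its elevated identity; the real and effective IDs diverge exactly as described above:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* In a setuid-root program, the effective UID is the file owner (root)
           and the real UID is the invoking user. */
        printf("real uid: %d, effective uid: %d\n", (int)getuid(), (int)geteuid());

        /* Drop the elevated privilege when it is no longer needed. */
        if (seteuid(getuid()) == -1)
            perror("seteuid");

        printf("effective uid now: %d\n", (int)geteuid());
        return 0;
    }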

Signals

A signal is an event generated in the UNIX operating system (for example, by the shell or by terminal device handlers) in response to a condition. Signals tell a process that an exceptional condition has occurred or that something needs attention or action. For example, signals happen when the computer hardware detects conditions such as floating point overflows and memory segment violations. Signals are also used for interprocess communication. Common IPC signals are for ending a process or notifying a parent process that a child process has stopped executing.

It is important to understand the UNIX application's signal implementation to determine what issues you will encounter during the migration to Windows. The following list provides a brief overview of the signal models of each of the four UNIX systems, and therefore, what to look for in the application:

  • UNIX Seventh Edition had unreliable signals (that is, there was no guarantee of the signal's success) and provided only one signal function: signal().

  • BSD 4.2 introduced reliable signals (that is, signal status could be checked and acted upon), supported by the following functions:

    • killpg(). Sends a signal to a process group.
    • sigsetmask(). Replaces the set of blocked signals with a new set specified in a mask.
    • sigblock(). Adds (as a logical OR) the signals specified in a mask to the set of signals currently being blocked from delivery.
    • siggetmask(). Returns the current set of masked signals.
    • sigvec(). Sets the disposition of a signal.
    • sigaltstack(). Gets or sets alternate signal stack content.
  • System V introduced another implementation of reliable signals, supported by the following functions:

    • sigset(). Modifies the handling of the specified signal.
    • sighold(). Blocks delivery of the specified signal by setting the corresponding bit in the process signal mask.
    • sigrelse(). Allows delivery of the specified signal by resetting the corresponding bit in the process signal mask.
    • sigignore(). Sets the disposition of the specified signal to SIG_IGN (ignore signal).
    • sigpause(). Allows delivery of the specified signal by resetting the corresponding bit in the process signal mask and then suspends the process until any signal is delivered.
    • sigaltstack(). Gets or sets alternate signal stack content (also XPG4 conformant).
  • POSIX.1 introduced a third implementation of reliable signals, supported by the following functions:

    • sigaction(). Changes the action taken by a process on receipt of a specific signal, and also specifies a mask of signals to be blocked during the processing of a signal.
    • sigpending(). Gets a pending signal mask.
    • sigprocmask(). Manipulates the current process signal mask.
    • sigsuspend(). Temporarily sets the process signal mask and then waits for a signal.
    • sigemptyset(). Initializes the given signal set to empty, with all signals excluded from the set.
    • sigfillset(). Initializes the set to full, including all signals.
    • sigaddset(). Adds the specified signal to the set.
    • sigdelset(). Deletes the specified signal from the set.
    • sigismember(). Tests whether the specified signal is a member of the set.

    Note   Signals on a POSIX.1 system are neither BSD nor SVID. POSIX defined a new signal mechanism based on the sigaction() API.

In addition, there is an ANSI C signal implementation that uses the signal() function. This API is built on top of the POSIX.1 sigaction() model. Furthermore, the ANSI raise() function sends a signal to the current process.

The POSIX.1 committee introduced its new signal semantics because of problems with traditional signal implementations found on BSD and System V systems. When the System V3 signal() function catches a signal, the action associated with the signal is reset to the default. In BSD 4.3, the action is not reset. In the ANSI C standard, the signal() function either resets the default or does an implementation-defined blocking of the signal. The POSIX sigaction() call does not reset the default if the handler returns normally.

There may be an opportunity during the application's migration to Windows to convert the signal code to use the POSIX.1 signal calls.
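
As an illustration of that opportunity, the following is a minimal C sketch of the POSIX.1 model: installing a handler with sigaction() rather than signal():

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_signal = 0;

    static void handler(int signo)
    {
        got_signal = signo;             /* record the signal; stay async-safe */
    }

    int main(void)
    {
        struct sigaction sa;

        sa.sa_handler = handler;
        sigemptyset(&sa.sa_mask);       /* block no extra signals in the handler */
        sa.sa_flags = 0;                /* disposition is not reset on delivery */
        sigaction(SIGINT, &sa, NULL);   /* install the handler for SIGINT */

        pause();                        /* wait until a signal arrives */
        printf("caught signal %d\n", (int)got_signal);
        return 0;
    }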

Defining the Migration Strategy

Migrating a UNIX application to Windows 2000 may be as simple as recompiling, or it may require significant redesign and recoding. The amount of work required depends on how portable the application source code is to begin with, whether compatibility tools (such as Interix) are being used, and the intended use of the application on the target platform.

Now that you have set your objectives and evaluated the application, it is time to define a strategy for the migration. Chapter 3 gave an overview of possible migration methods, including:

  • Quick port to Interix
  • Full port
  • Rewrite
  • Coexistence
  • Some combination of the preceding methods

You must now decide the most appropriate course of action for your own application. This is documented in the migration strategy, along with:

  • Testing/certification requirements
  • An infrastructure conceptual design that covers the target platform, including all of the migrated application's infrastructure requirements, such as:
    • UNIX and Windows resource sharing (for example, file sharing)
    • Security and user account synchronization (for example, integration and interoperability of the Active Directory® directory service and Network Information System [NIS])
  • System and application management and support (for example, software updates and remote support)

Application Classification

A starting point in defining a migration strategy is to classify your application into one of the following four categories:

  • ANSI-based
  • ANSI with cross-platform libraries
  • "Portable" UNIX
  • "Native" UNIX

Table 4 provides a template to help determine an application's category. Based on the issues raised during the data gathering and analysis activities, it may be very clear how an application fits into one of the defined categories. In other cases, it may not be so clear which category the application fits into. In these cases, it is worth examining the application on a subsystem-by-subsystem basis, determining the best strategy for a given element of the application. The result should be a migration strategy defined for the application as a whole, which you can then verify against the issues raised during data gathering and analysis.

Table 4. Application category

Category | ANSI C, C++, Fortran | ANSI with cross-platform libraries | "Portable" UNIX | "Native" UNIX
Level of use/dependency on UNIX API libraries | Low (insignificant) | Low (insignificant) | High (significant) | High (significant)
Use of cross-platform libraries | No | Yes | Yes or No | Yes or No
Use of third-party libraries | Yes or No | Yes or No | Yes or No | Yes or No
Use of third-party compilers | Yes or No | Yes or No | Yes (if supported on Interix) or No | Yes or No
High level of compatibility with existing code required | Yes | Yes | Yes | No
Standards the application supports | ANSI C, C++, Fortran | ANSI | Some combination of: BSD, POSIX, System V, UNIX95, UNIX98, XPG | Some combination of: BSD, POSIX, System V, UNIX95, UNIX98, XPG
Migration approach | Recompile to Windows | Recompile to Windows | Port to Interix | Rewrite to Windows
Possible issues/concerns | Performance of recompiled code | Third-party library vendor support on Windows; cross-platform library vendor support on Windows (for example, Rogue Wave) | Windows Services for UNIX features are missing, or the customer is concerned about Microsoft support; third-party library vendor support on Interix; performance of ported code | Performance of recompiled code; customer doesn't have time or resources for redesign and rewrite of the application; third-party library vendor support on Windows

The following sections discuss the four application categories (ANSI-based, ANSI with cross-platform libraries, "portable" UNIX and "native" UNIX) in greater detail. After you summarize the characteristics of the application and identify the application's category, you can further analyze potential migration approaches and strategies.

ANSI-based

A pure ANSI-based code base should be portable to any platform that supports the ANSI standard, including Win32. However, the use of third-party libraries may make a direct port more difficult, or even unsuitable.

  • Case 1. No use of third-party libraries, or third-party libraries supported on Win32:

    Port to Win32 and Visual Studio.

  • Case 2. Third-party libraries supported on Interix:

    Port to Interix with third-party library.

  • Case 3. Third-party libraries not supported on Win32 or Interix:

    Contact library provider for support on Win32 or Interix.

If a Fortran code migration is required during an application migration from UNIX to Windows, Fortran needs to be an integral part of the design. Considerations such as performance, library interoperability and feature set play key roles in the overall success of the project.

ANSI with cross-platform libraries

Migrating a UNIX application that is dependent on a cross-platform library to Windows can be as simple as acquiring, installing and configuring the Windows version of the library (for example, Rogue Wave SourcePro C++ or Trolltech's Qt product for Windows), recompiling, and then testing.

For example, Rogue Wave SourcePro C++ provides object-oriented C++ components designed for portability. Rogue Wave C++ interfaces are stable and, most important in the context of portability, consistent across platforms. In almost all cases, this means that a simple recompilation of the code is all that you need to do to port from UNIX to Windows. This assumes that the application is based entirely on Rogue Wave SourcePro C++, platform-independent language features and any other libraries that provide the same platform portability as Rogue Wave.

However, this kind of migration can still require significant redesign and recoding.

  • Case 1. Cross-platform libraries supported on Win32:

    Port to Win32, Visual Studio, with cross-platform library.

  • Case 2. Cross-platform libraries supported on Interix:

    Port to Interix with cross-platform library.

  • Case 3. Cross-platform libraries not supported on Win32 or Interix:

    Contact library provider for support on Win32 or Interix.

"Portable" UNIX

"Portable" UNIX is required when the customer needs or wants a high level of compatibility with existing UNIX software.

  • Case 1. Static applications (late in lifecycle and stable code, little or no change planned):

    If no third-party/cross-platform libraries are required, port to Interix.

  • Case 2. Evolving applications (early in lifecycle and still changing and being enhanced):

    Partition the application into static and dynamic components, and then port static portions to Interix and rewrite dynamic portions to Win32 and Visual Studio.

"Native" UNIX

Native UNIX is required when the customer needs or wants to take advantage of Windows or Microsoft .NET.

  • Case 1. Static applications (late in lifecycle and stable code, little or no change planned):

    If no third-party/cross-platform libraries are required, rewrite the application to Win32 (assuming resources, time and desire).

  • Case 2. Evolving applications (early in lifecycle and still changing and being enhanced):

    Rewrite the application in Win32 and Visual Studio.

Migration Approaches by Application Type

During the analysis activities, you identified the type of your application; that is, whether it is designed to run on a workstation, a compute server, a database server or a Web server. Each application type has certain attributes that make it more appropriate to either rewrite or port the application, as described in the following sections.

Workstation

You must consider several factors when deciding whether to rewrite or port workstation applications. The number of processes involved, the interactions of those processes and the type of user interface (for example, character-based or GUI-based) are critical to the decision to rewrite or port.

A small, single-process application with a character-based user interface will likely contain few system calls, and therefore should be considered for a port to Win32. Multiple-process applications contain many system calls for process management and interprocess communication. These APIs are quite different on Win32 from those on UNIX and will present challenges to a rewrite team. A port (to Interix) is recommended in this case.

OpenGL code, if stand-alone or not mixed with system calls or X Windows/Motif code, can be ported to the OpenGL environment on Windows, with minor changes. If there is a mixture, the migration strategy will depend on the mix.

Compute server

Message-based server application architectures allow you to choose a migration strategy for each component individually. This chapter assumes that the application is based on a TCP/IP-based RPC mechanism (for example, ONC, Distributed Computing Environment [DCE] or proprietary), sockets or third-party messaging APIs. Sockets, ONC RPC, DCE RPC and some third-party messaging layers are available in the Win32 subsystem, whereas proprietary APIs would need to be ported.

Different (and incompatible) versions of RPC exist for both UNIX and Windows 2000. On UNIX, it is common to have both ONC RPC and DCE RPC. Although these RPCs perform similar functions, their implementations and network marshaling specifications are different. ONC RPC uses External Data Representation (XDR) for network marshaling, whereas DCE RPC uses Network Data Representation (NDR). The native RPC implementation on Windows 2000 is Microsoft RPC, which also uses NDR.

The availability of these messaging layer APIs in the Win32 subsystem makes it possible to migrate UNIX applications that use messaging mechanisms for interprocess communication to Windows.

Many of the native UNIX utilities, such as NFS and NIS, are implemented through Sun Microsystems's ONC RPC. Applications written for UNIX often either use these utilities or implement their own ONC RPC calls. If this type of application is migrated to Windows 2000, a development and client runtime will be needed to support ONC RPC. Tools such as Interix and NuTCRACKER provide support for ONC RPC. Additionally, shareware tools are available to support ONC RPC on Windows 2000.

A client/server architecture allows the choice of a migration strategy for each component. Provided that the networking APIs are available, you can evaluate each component individually as a separate program to decide whether to rewrite or port. Note that in some cases, the client component might already exist on the Windows platform.

Database server

The initial migration decision is about the database. In other words, you must determine whether to run the application's current database (for example, Oracle) on Windows (that is, migration of the existing database from UNIX to Windows) or migrate the database to Microsoft SQL Server™.

In some cases, the best migration decision is a conversion to SQL Server. However, the investment in development and tools in the current UNIX database sometimes makes a conversion to SQL Server cost prohibitive. This can be particularly true in an environment with significant UNIX/Oracle resources. Oracle offers a range of features available on both Windows 2000 and UNIX. You may choose to use these features with the current Oracle database, even if your overall goal is a migration to SQL Server. By separating the platform migration from the database migration, you gain greater flexibility in the migration of your database applications.

Web server

Web applications from a UNIX Web server are normally one of two types: Common Gateway Interface (CGI) or Java Server Page (JSP).

The Common Gateway Interface is a standard for connecting external applications with information servers, such as Hypertext Transfer Protocol (HTTP) or Web servers. An HTML document that the Web server retrieves is static, meaning that it exists in a constant state. A CGI program, by contrast, is executed in real time so that it can generate dynamic information. For example, a CGI program on the Web server might execute to transmit information to a database engine, retrieve result sets and display the results to the client browser.

The migration strategy for CGI programs or scripts depends on the language in which the program was written. A CGI program can be written in any language that can access environment variables and the standard input and output streams, and that can be executed on the system. Appropriate languages include C/C++, Fortran, Perl, Tcl, UNIX shells and Visual Basic. Because of the powerful text-handling capability of the Visual Basic programming language, many Web developers write CGI programs in Visual Basic.

For example, CGI2ASP is a framework for porting programs written in Visual Basic, using Windows CGI (an O'Reilly & Associates product), to a component-based ASP application with virtually no change to the existing Visual Basic-based code.

Java Server Page technology also allows for the development of dynamic Web pages. JSP technology uses Extensible Markup Language (XML)–like tags and scriptlets written in the Java programming language to encapsulate the logic that generates the content for the page. The application logic can reside in server-based resources (for example, the JavaBean component architecture) that the page accesses by using these tags and scriptlets. Any and all formatting (HTML or XML) tags are passed directly back to the response page. JSP scriptlets written in the Java programming language and residing in JavaBean components would be rewritten in Microsoft Visual Basic Scripting Edition or Microsoft JScript® and contained in ASP Component Object Model (COM) components.

Migration Considerations

There are some additional considerations that you should apply to your migration approach. These considerations apply to the migration as a whole, and also to the specific type of migration that is planned.

General considerations

Key issues and considerations that you must take into account when migrating a UNIX application to Windows 2000 include the following:

  • Will a common code base that is deployable across UNIX and Windows platforms be built, maintained and evolved?
  • To what degree will Windows 2000 features be added to improve the installation, performance and/or usability of the migrated application?
  • To what extent will the UNIX and Windows versions of the software need to interoperate across platforms?
  • Does the application or certain components of the application require temporary or long-term ongoing deployment across both UNIX and Windows platforms? Is the plan to deploy Windows 2000 workstations while retaining existing UNIX platforms, or are the UNIX platforms being replaced with Windows 2000?
  • As new features and/or fixes to defects are applied, can they be applied only once to one common code base that can be deployed everywhere needed, or are there separate code bases that must be maintained and evolved independently for each target platform?

Quick code port to Interix

Table 5 provides some information about the standards that Windows Services for UNIX/Interix supports:

Table 5. Standards supported in Windows Services for UNIX and Interix

POSIX standard Description Windows Services for UNIX/Interix Support
POSIX.1-1990 System interfaces and headers Yes
POSIX.2-1992 Shell and utilities Yes
POSIX.1b-1993 Real-time extensions No
POSIX.1c-1996 Threads extensions Future release
POSIX.2a-1992 Interactive shell and utilities Utilities, not all options supported
XPG4 XPG Base "Branding" No
XPG4v2 Superset of XPG4 (UNIX95 "Branding") No
XPG5 Single UNIX Std v2 (UNIX98 "Branding") No

Windows Services for UNIX/Interix often supports a superset of behaviors. For example, there are two ways to determine the pseudo-terminal name to use, one BSD-based and the other SVID; Interix supports both. In summary, Interix supports POSIX.1, POSIX.2, ISO C and XPG4. However, if an application to be migrated claims to require POSIX.1b-1993, POSIX.1c-1996, UNIX98 and/or XPG4v2, porting to Interix might be difficult or impossible, depending on the degree of support required for any or all of those standards.

Note   UNIX daemons can be implemented as services in Windows for an Interix port, a Win32 port or a Win32 rewrite.

Windows does not support POSIX-conformant thread APIs natively (in either Win32 or Windows Services for UNIX/Interix). You must purchase or obtain a third-party POSIX thread wrapper over the native Windows threads implementation (for example, Gradient for Win32, or the Florida State implementation for Windows Services for UNIX/Interix).

Windows with Win32 does not support UNIX System V IPC, including message queues, shared memory and semaphores. However, Windows with Windows Services for UNIX/Interix does.

Rewriting the application

The reasons for undertaking a rewrite of an application usually hinge on whether the benefits of the rewrite are more valuable than the effort required to perform the rewrite. To accurately make this judgment, the task must be properly scoped.

Some of the reasons for choosing to rewrite an application to the Win32 API include the following:

  • New versions of the application are planned that require the use of Win32 features.
  • Dependencies with third-party applications are now available in native Windows (Win32) implementations.
  • Migration to a full Visual Studio development environment is wanted and/or required.
  • Scripting solutions (for example, Windows Services for UNIX or Interix) are available on Windows.

If the application migration assessment determines that a rewrite of the UNIX application to Win32 and Visual Studio is the preferred approach, also consider the following:

  • Is the application defined in layers where the business code is separate from operating-system-dependent features such as GUIs and I/O?

  • Will operating system differences be confined to separate modules, or will techniques like compiler directives (#ifdefs) be used to isolate differences?

  • Will both a UNIX version and Windows version of the application be maintained over a period of time? If so, on which platform will most of the development occur?

  • Have proper versions of any third-party tools (for example, ONC RPC) been identified?

  • Have you addressed exception handling (Win32 as compared to UNIX C/C++ signals)?

  • Have you addressed event handling in the GUI?

    The X Windows (Xlib) GUI can be converted to the Windows GUI. Most of the event-handling functions of Xlib (the X Windows client library) can be mapped to the Win32 message-handling functions. Some of these equivalencies are shown in Table 6.

    Table 6. Xlib and Win32 event handling equivalencies

    Xlib event handling functions Win32 event handling functions
    XNextEvent() GetMessage(hwnd=NULL)
    XPeekEvent() PeekMessage(PM_NOREMOVE)
    XIfEvent(…) No equivalent
    XWindowEvent(…) GetMessage(hwnd=w)
    XSendEvent() SendMessage(), PostMessage()

    Note   The X Windows event filtering mechanism is more comprehensive than that in Win32.

You can create a Windows look and feel by rewriting the GUI with Visual C++, Microsoft Foundation Classes (MFC) or Visual Basic, but this effort may not be cost-effective. A migration strategy can provide a Windows look to existing X Windows and Motif applications if you use Interix, NuTCRACKER or Cygwin.

In addition, you should address the following:

  • Process orientation (UNIX) conversion to thread orientation (Windows). A threaded architecture allows an application to make optimal use of Windows 2000 and the underlying hardware, but will be an extensive undertaking if the application was not threaded on UNIX.

    Note   Interix does not support threads (for example, POSIX pthreads), so this is also potentially an issue in a migration situation.

  • I/O implementation and performance. For example, consider selection of the appropriate Win32 API functions instead of the C/C++ run-time library.

  • Memory Management. For example, consider selection of the appropriate Win32 API functions instead of the C/C++ run-time library.

  • Libraries. Consider using dynamic-link libraries (DLLs) instead of UNIX (dynamic and static) shared libraries.

  • Template Library. Consider the C++ Standard Template Library (STL) and the role of the Active Template Library (ATL).

  • UNIX daemon conversion to Windows service. Typically, a UNIX daemon is best implemented as a Windows service.

  • Application administration. Determine if you want to take advantage of Windows administration and add new functionality.

  • Installation. Consider implementing Windows-style installation.

  • Configuration information. Determine how you will address use of the Windows registry.

  • Icons and Localization. Consider adding Windows resources for icons and localization.

  • Event logging. Decide whether you want to use the Windows 2000 Event Log.

  • Integration with other Windows-based applications. For example, use of Microsoft ActiveX® and COM+ services.

In addition, you should have a backup plan for the rewrite in case the rewrite effort becomes a roadblock in the migration.

Note   Because Microsoft provides only the GNU Fortran 77 compiler for Interix, most Fortran applications need to use a Fortran compiler available for Win32. The migration to Win32 will either be a code port to Win32 or a rewrite for Win32, depending on how much platform-specific code exists in the Fortran source.

Full port

Some of the reasons for choosing to directly port an application to Windows 2000 include the following:

  • The total cost of rewriting the application is greater than the projected return on investment (ROI) savings.
  • The time to deliver a rewrite of the application is schedule prohibitive, but a port can be accomplished within the time constraints of the overall migration project.
  • The risks of an application rewrite are greater than the expected business value (for example, the rewrite cannot guarantee the preservation of business logic; the interface will change substantially, which will result in end-user retraining costs; there is inadequate documentation to facilitate the rewrite; and/or the business logic and user interface are comingled).
  • There is a large amount of source code.
  • Support (and development) for both Windows and UNIX platform versions of the application must continue for the foreseeable future, and support for a single source base is wanted.
  • The application frequently uses UNIX APIs (as compared with frequently using C Library calls).
  • Dependencies on third-party applications (for example, for communications) that are now available in native Win32 implementations can be satisfied by a simple Win32 application that communicates between the Win32 subsystem applications and the ported Interix subsystem applications.
  • The same look and feel of the application on both the Windows and UNIX platforms is required or wanted.

Coexistence

You may require that both the Windows and UNIX versions of the application coexist for a number of reasons; for example, to mitigate risks, or to support independent user bases. In situations where you expect development for both Windows and UNIX versions of the application to continue in the future, consider the following:

  • The application should be defined in layers where the business code is separate from operating-system-dependent features like hardware dependencies, GUIs and operating system services.
  • Any features dependent on POSIX compliance should be located in separate modules, or techniques like compiler directives (#ifdefs; see the sketch after this list) should be used, to isolate differences between the Interix/Windows and UNIX versions of the application.
  • The correct versions of third-party tools, such as ONC RPC or X Windows emulators, must be available and supported in both the Interix/Windows and UNIX platforms.
  • A common source control system must be defined, designed and implemented. The requirements for this system normally include multiple-user source check-out, server-based source merge facility (to resolve change conflicts), group-level source synchronization and function-level linking.
  • If integration is required with other Windows-based applications, either Win32 APIs will be necessary within the application or possibly a COM component can be implemented to provide a binary interface between the ported application and the Windows-based applications. When Win32 API integration is a requirement, NuTCRACKER may be the solution because it provides the capability to have a mixture of both UNIX and Win32 APIs within the same application on Windows 2000.
  • Tools like Windows Services for UNIX/Interix and NuTCRACKER provide mappings from UNIX-style security to Windows 2000–style security (for example, User Name Mapping Server from Windows Services for UNIX). However, if Windows 2000–style security is wanted, the application must be built on the Win32 subsystem regardless of migration strategy.
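
The following minimal C sketch (the macro test and paths are hypothetical) illustrates the compiler-directive technique referred to in this list:

    #include <stdio.h>

    /* Isolate the platform difference behind one function; only this
       module changes between the UNIX and Windows builds. _WIN32 is
       predefined by Win32 compilers but not in UNIX or Interix builds. */
    const char *app_config_path(void)
    {
    #ifdef _WIN32
        return "C:\\ProgramData\\myapp\\config";   /* Windows convention */
    #else
        return "/etc/myapp/config";                /* UNIX convention */
    #endif
    }

    int main(void)
    {
        printf("configuration: %s\n", app_config_path());
        return 0;
    }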

Strategy combinations

The final migration strategy is unlikely to specify a single approach to migrating the whole application. More likely, you must give each element of the application a migration strategy of its own, or you must group and associate several elements with a migration strategy. For each element of the application, you must apply the migration process described in this chapter.

In addition, the migration strategy does not need to take place in a single step, but can include short-term and longer-term considerations. For example, you might decide to port and then rewrite the application in its entirety, or you might decide to port the application and then rewrite specific elements after additional resources become available.

Scripts Migration Approach

After you summarize the characteristics of the application's scripting environment, you can analyze potential migration approaches. The potential migration approaches are discussed in the sections that follow.

Rewriting scripts

If the application will be rewritten for Windows, rewriting the scripts to a Windows-style script (for example, Windows Scripting Host [WSH]) is the most appropriate solution. WSH scripts provide logical constructs and operating system integration similar to those provided by UNIX shell scripts. The following list provides additional reasons for considering a rewrite of UNIX scripts to WSH:

  • A limited number of UNIX scripts exist.
  • The application takes advantage of other Windows-based applications (such as Microsoft Exchange and SQL Server) that offer WSH interfaces.

Using ActiveState Perl (Win32) or Perl v5.6 (Interix)

If the UNIX scripts are predominantly written in Perl, consider using ActiveState Perl on Windows. Windows Services for UNIX includes both ActiveState Perl and Perl v5.6 among its features.

Perl scripts will require some conversion—mainly, the workarounds required by the differences between the Win32 environment and the open systems environment (for example, file path syntax). This option is usually more applicable when the script must interact with other Win32 programs.

Porting scripts using Windows Services for UNIX/Interix

UNIX shell scripts can be run natively by means of the Windows Services for UNIX/Interix subsystem. This option is particularly useful for implementing remote maintenance scripts, or in conjunction with applications migrated to Windows 2000 through Interix.

Interix is the best choice when you require a scripting environment and utilities that behave exactly as defined in key open systems standards and specifications—such as POSIX and The Single UNIX Specification. For example, you should choose Interix if you have a very detailed and complex Korn or C shell script that makes use of a large number of standard UNIX utilities (for example, POSIX 2 utilities).

Cost-Benefit Analysis

A cost-benefit analysis is an assessment of the benefits of a given option, balanced against the costs of implementing the option. There is little point in conducting an expensive migration when there is little benefit to be gained; also, many migration options are prohibitively expensive, even if the benefits are tangible. The costs of the migration can be considered in terms of the size of the application to be migrated, and therefore the effort required for the migration.

Cost-benefit analysis is discussed in more detail in Chapter 5, Planning the Migration.

Choosing the Migration Strategy

This guide provides three decision matrices that can help you use the data that you have collected and analyzed to determine which migration strategy is most appropriate for your application.

The decision matrices are as follows:

  • Business objective data analysis

    Use this matrix to determine the best migration strategy based solely on your business objectives. Technical realities addressed in the next matrix may require you to modify this decision.

  • Technical requirements data analysis

    Using the result from the business objective data analysis matrix, you should now further refine your decision by using the technical data that you have collected. This matrix allows you to take into consideration the major technical issues that impinge on your migration.

    It is possible for there to be a conflict between the business requirements and the technical requirements. If this is the case, you will need to do further analysis to determine the best course of action. Typically, there will be a cost associated in overcoming the conflict, and a cost-benefit analysis will be necessary.

  • Final analysis

    The previous two matrices may have been unable to give you a definitive strategy. If this is the case, the final analysis matrix will help you further clarify your strategy.

After you have worked through these matrices, you should know which strategy will be most effective for your application and business environment. Now that you have a migration strategy, the next step is to plan your migration.

Migration planning is covered in Chapter 5.
