Peter Mills - Abstracts of Publications


A Design Methodology for Data-Parallel Applications

L. Nyland, J. Prins, A. Goldberg, and P. Mills.

Dept. of Computer Science, Univ. of North Carolina at Chapel Hill
Kestrel Institute, Palo Alto, CA
Orielle LLC, Raleigh, NC

April 2000

ABSTRACT

A methodology for the design and development of data parallel applications and components is presented. Data-parallelism is a well-understood form of parallel computation, yet developing simple applications can involve substantial efforts to express the problem in low-level notations. We describe a process of software development for data-parallel applications starting from high-level specifications, generating repeated refinements of designs to match different architectural models and performance constraints, enabling a development activity with cost-benefit analysis. Primary issues are algorithm choice, correctness, and efficiency, followed by data decomposition, load balancing, and message-passing coordination. Development of a data-parallel multitarget tracking application is used as a case study, showing the progression from high to low-level refinements. We conclude by describing tool support for the process.

[Up] Back to publication list


A Refinement Methodology for Developing Data-Parallel Applications

L. Nyland, J. Prins, A. Goldberg, P.H Mills, J. Reif and R. Wagner.

Dept. of Computer Science, Duke University, Durham, NC
Dept. of Computer Science, Univ. of North Carolina at Chapel Hill
Kestrel Institute, Palo Alto, CA

February 1996

ABSTRACT

Data-parallelism is a relatively well-understood form of parallel computation, yet developing simple applications can involve substantial efforts to express the problem in low-level data-parallel notations. We describe a process of software development for data-parallel applications starting from high-level specifications, generating repeated refinements of designs to match different architectural models and performance constraints, supporting a development activity with cost-benefit analysis. Primary issues are algorithm choice, correctness and efficiency, followed by data decomposition, load balancing and message-passing coordination. Development of a data-parallel multitarget tracking application is used as a case study, showing the progression from high to low-level refinements. We conclude by describing tool support for the process.

[Up] Back to publication list


Models and Resource Metrics for Parallel and Distributed Computation

Zhiyong Li, Peter H. Mills, John H. Reif.

Dept. of Computer Science, Duke University, Durham, NC

January 1996

ABSTRACT

This paper presents a framework of using resource metrics to characterize the various models of parallel computation. Our framework reflects the approach of recent models to abstract architectural details into several generic parameters, which we call resource metrics. We examine the different resource metrics chosen by different parallel models, categorizing the models into four classes: the basic synchronous models, and extensions of the basic models which more accurately reflect practical machines by incorporating notions of asynchrony, communication cost and memory hierarchy. We then present a new parallel computation model, the LogP-HMM model, as an illustration of design principles based on the framework of resource metrics. The LogP-HMM model extends an existing parameterized network model (LogP) with a sequential hierarchical memory model (HMM) characterizing each processor. The result captures both network communication costs and the effects of multileveled memory such as local cache and I/O. More generally, the LogP-HMM is representative of a class of models formed by combining a network model with any of several existing hierarchical memory models. Along these lines we introduce a variant of the LogP-HMM model, the LogP-UMH, which combines the LogP with the Universal Memory Hierarchy (UMH) model. We examine the potential utility of both our models in the design of several near optimal FFT and sorting algorithms. We also examine the potential of the LogP-UMH to more accurately reflect parallel machines by matching the model to the CM-5 and IBM SP2.

[Up] Back to publication list


Specification and Development of Parallel Algorithms with the Proteus System

Allen Goldberg, Peter H. Mills, Lars S. Nyland, Jan F. Prins, John H. Reif, and James Riely.

Dept. of Computer Science, Duke University, Durham, NC
Dept. of Computer Science, Univ. of North Carolina at Chapel Hill
Kestrel Institute, Palo Alto, CA

May 1994

ABSTRACT

The Proteus language is a wide-spectrum parallel programming notation that supports the expression of both high-level architecture-independent specifications and lower-level architecture-specific implementations. A methodology based on successive refinement and interactive experimentation supports the development of parallel algorithms from specification to various efficient architecture-dependent implementations. The Proteus system combines the language and tools supporting this methodology. This paper presents a brief overview of the Proteus system and describes its use in the exploration and development of several non-trivial algorithms, including the fast multipole algorithm for N-body computations.

[Up] Back to publication list


Software Issues in High-Performance Computing and a Framework for the Development of HPC Applications

Peter H. Mills, Lars S. Nyland, Jan F. Prins, John H. Reif.

Dept. of Computer Science, Duke University, Durham, NC
Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

May 1994

ABSTRACT

We identify the following key problems faced by HPC software: (1) the large gap between HPC design and implementation models in application development, (2) achieving high performance for a single application on different HPC platforms, and (3) accommodating constant changes in both problem specification and target architecture as computational methods and architectures evolve.

To attack these problems, we suggest an application development methodology in which high-level architecture-independent specifications are elaborated, through an iterative refinement process which introduces architectural detail, into a form which can be translated to efficient low-level architecture-specific programming notations. A tree-structured development process permits multiple architectures to be targeted with implementation strategies appropriate to each architecture, and also provides a systematic means to accommodate changes in specification and target architecture.

We describe the Proteus system, an application development system based on a wide-spectrum programming notation coupled with a notion of program refinement. This system supports the above development methodology via: (1) the construction of the specification and the successive designs in a uniform notation, which can be interpreted to provide early feedback on functionality and performance, (2) migration of the design towards specific architectures using formal methods of program refinement, (3) techniques for performance assessment in which the computational model varies with the level of refinement, and (4) the automatic translation of suitably refined programs to low-level parallel virtual machine codes for efficient execution.

[Up] Back to publication list


The Proteus System for the Development of Parallel Applications

Allen Goldberg, Jan Prins, John Reif, Rickard Faith, Zyiyong Li, Peter Mills, Lars Nyland, Dan Palmer, John Riely, and Stephen Westfold.

Dept. of Computer Science, Duke University, Durham, NC
Dept. of Computer Science, Univ. of North Carolina at Chapel Hill
Kestrel Institute, Palo Alto, CA

February 1996

ABSTRACT

In recent years technological advances have made the construction of large-scale parallel computers economically attractive. These machines have the potential to provide fast solutions to computationally demanding problems that arise in computational science, real-time control, computer simulation, large database manipulation, and other areas. However, applications that exploit this performance potential have been slow to appear; such applications have proved exceptionally difficult to develop and, once written, too often fail to deliver the expected speed-up.

This state of affairs may be explained by the proliferation of parallel architectures and the simultaneous lack of effective high-level architecture-independent programming languages. Parallel applications are currently developed using low-level parallel programming notations that reflect specific features of the target architecture (e.g., shared vs. distributed memory, SIMD vs. MIMD, exposed vs. hidden interconnection network). These notations have significant drawbacks: they lack portability across architectures, and they are too low-level to support the expression and exploration of complex designs. The problem is a fundamental one: abstract models of parallel computation lead to unrealistic algorithms, whereas machine-specific models lead to intractable analysis of even the simplest programs. The effect of the target architecture is pervasive: at a high level, different architectures often require fundamentally different algorithms to achieve optimal performance; at a low level, overall performance exhibits great sensitivity to changes in communication topology and memory hierarchy.

The result is that the developer of a parallel application must explore a complex and poorly-understood design space that contains significant architecture-dependent trade-offs. Moreover, this algorithmic exploration must be conducted using programming languages that offer exceptionally low levels of abstraction in their mechanisms for expressing concurrency. While a reasonable solution to these problems is to trade reduced access to architecture-specific performance for improved abstract models of computation, this trade may not always be the right one: the whole point of parallelism, for most applications, is performance. The goal of our effort is to provide improved capabilities for exploring the design space of a parallel application using prototypes, and evolving a prototype into a highly-specialized and efficient parallel implementation.

[Up] Back to publication list


Rate Control as a Language Construct for Parallel and Distributed Programming

Peter Mills, Jan Prins, John Reif.

Dept. of Computer Science, Duke University, Durham, NC
Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

May 1993

ABSTRACT

This paper introduces a new parallel programming language construct, the rate construct, and examines its utility for a variety of problems. The rate construct specifies constraints on the relative rates of progress of tasks executing in parallel, where progress is the amount of computational work as measured by elapsed "ticks" on a local logical clock. By prescribing expected work, the rate construct constrains the allocation of processor-time to tasks needed to achieve that work; in a parallel setting this constrains the distribution of tasks to processors and multiprocessing ratios, effected for example by load balancing. We present definitions of rate and underlying real-time primitives as orthogonal extensions to the architecture-independent parallel programming language Proteus. The utility of the rate construct is evidenced for a variety of problems, including weighted parallel search for a goal, adaptive many-body simulation in which rates abstract the requirements for load-balancing, and variable time-stepped computations in which the use of rates can alter the frequency of asynchronous iterations.

[Up] Back to publication list


Prototyping N-body Simulation in Proteus

Peter Mills, Lars Nyland, Jan Prins, John Reif.

Dept. of Computer Science, Duke University, Durham, NC
Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

April 1992

ABSTRACT

This paper explores the use of Proteus, an architecture-independent language suitable for prototyping parallel and distributed programs. Proteus is a high-level imperative notation based on sets and sequences with a single construct for the parallel composition of processes communicating through shared memory. Several different parallel algorithms for N-body simulation are presented in Proteus, illustrating how Proteus provides a common foundation for expressing the various parallel programming models. This common foundation allows prototype parallel programs to be tested and evolved without the use of machine-specific languages. To transform prototypes to implementations on specific architectures, program refinement techniques are utilized. Refinement strategies are illustrated that target broad-spectrum parallel intermediate languages, and their viability is demonstrated by refining an N-body algorithm to data-parallel CVL code.

[Up] Back to publication list


Prototyping High-Performance Parallel Computing Applications

Peter Mills, Lars Nyland, Jan Prins, John Reif.

Dept. of Computer Science, Duke University, Durham, NC
Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

May 1992

ABSTRACT

This paper explores the use of Proteus, an architecture-independent language suitable for prototyping time-sensitive parallel and distributed programs. Proteus is a high-level imperative notation based on sets and sequences with succinct yet powerful constructs for the parallel composition of processes communicating through shared memory. Several different parallel algorithms for N-body simulation in molecular dynamics are presented using Proteus, illustrating how the language provides a common foundation for expressing various parallel programming models. This common foundation supports the construction of high-performance computing applications across a wide range of parallel machines through a development methodology in which prototype parallel programs can be tested and evolved without the use of machine-specific languages. To transform prototypes to implementations on specific architectures, program refinement techniques are utilized. Refinement strategies are illustrated that target broad-spectrum parallel intermediate languages, and their viability is demonstrated by refining an N-body algorithm to data-parallel intermediate code. Time-sensitive variants of a parallel N-body algorithm are also described to illustrate how Proteus allows the expression of resource requirements through real-time constraints as well as progress constraints which abstractly specify the distribution of computational resources.

[Up] Back to publication list


Prototyping Parallel and Distributed Programs in Proteus

Peter Mills, Lars Nyland, Jan Prins, John Reif, Robert Wagner.

Dept. of Computer Science, Duke University, Durham, NC
Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

December 1991

ABSTRACT

This paper presents Proteus, an architecture-independent language suitable for prototyping parallel and distributed programs. Proteus is a high-level imperative notation based on sets and sequences with a single construct for the parallel composition of processes. Although a shared-memory model is the basis for communication between processes, this memory can be partitioned into shared and private variables. Parallel processes operate on individual copies of private variables, which are independently updated and may be merged into the shared state at specifiable barrier synchronization points. Several examples are given to illustrate how the various parallel programming models, such as synchronous data-parallelism and asynchronous control-parallelism, can be expressed in terms of this foundation. This common foundation allows prototypes to be tested, evolved and finally implemented through refinement techniques targeting specific architectures.

[Up] Back to publication list


A Semantics for Priority using Partial Orders with a Simultaneity Relation

Peter Mills

Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

February 1990

ABSTRACT

In this paper we present a semantics for concurrent systems with priority constraints, using as a basis partial orders of events with both precedence and simultaneity relations. Previous work has shown that theories based on partial orders modelling precedence alone are insufficient to capture priority. We use the prosset model of Gaifman and Pratt, an extension to partial-order theories which includes a relation for explicit simultaneity, to develop a semantics for a COSY dialect allowing the specification of priority. We show that Janicki's "sequences of multiples" semantics for priorities can be derived from a linearization of the prosset semantics. Finally, we look at a simple taxonomy to clarify the relation of these various semantics to other event-history models.

[Up] Back to publication list


A Partial-Order Characterization of Delay-Insensitive Circuits

Peter Mills, J. Dean Brock.

Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

September 1988

ABSTRACT

With todays emphasis on concurrency and asynchrony in digital circuits, it is not surprising that many languages used to specify circuits, such as CSP, Petri-nets and path expression variants, have a semantics which can be naturally captured by partial order models. In this paper we present a simple theory based on partial orders, scenarios with protocols, which can serve as a semantic basis for asynchronous circuits. This extended model has, in addition to partial orders symbolizing event causality and temporal precedence, a protocol relation which constrains the times at which modules may send output. We then examine the possibility of using such a partial order semantics to characterize delay-insensitive circuits, affording simplification over several linearly-ordered trace theories advanced so far. It is shown that scenarios with protocols are subsumed under Vaughn Pratt's pomset model of processes, an extension and repackaging of Dean Brock's original scenario theory which unifies the causal and temporal relations into one partial order. Lastly, we discuss ongoing work which uses the pomset (partially ordered multiset) model directly as a basis for a trace theory for delay-insensitive circuits, with focus on rules for tests for composability.

[Up] Back to publication list



3D Ultrasound Display using Optical Tracking

Peter Mills, Henry Fuchs.

Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

September 1990

ABSTRACT

We present a method to capture a 3D ultrasound volume from a series of 2D cross-sectional images obtained with a conventional B-mode scanner. We use optical tracking of the ultrasound transducer to locate the digitized sections within the volume. In this paper we describe a simple optical tracking technique which uses two conventional video cameras and stereopsis, and discuss its freedom-of-motion effects on reconstruction and visualization. We are in the process of investigating its utility in an experiment imaging a human fetus. This technique is but a pilot forerunner of optical methods to be employed in a project whose goal is to achieve real-time interactive display of 3D ultrasound.

[Up] Back to publication list


Imex: a Tool for Image Display and Contour Management in a Windowing Environment

Peter Mills, Henry Fuchs, Stephen Pizer, Julian Rosenman.

Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

May 1989

ABSTRACT

Medical workstations of the future will support the real-time display and interactive manipulation of 3-D objects derived from CT, MRI and other imaging modalities. As part of such an integrated system for visualizing 3-D volumes, we have developed an highly interactive, flexible, and portable program for the 2D display of image slices and contours outlining anatomical objects. The editing of these contours as well as their automatic creation by thresholding and edge-tracking is supported. The contours may later be used to generate 3D surfaces for shaded-graphics rendering, or to mask out regions of interest in the image for volume rendering. This "image executive" program, or Imex, is designed to run in a windowing environment (i.e., the X Window System). The user-interface model, which may be described as a "Macintosh for images", associates one movable and resizable window with each displayed view of a 2D image slice or of an indexed array of slices. Any number of views into one slice or into a subset of an array of slices may be present. Natural interaction is achieved by providing immediate response and by using the mouse to effect navigation and viewing functions, for example grabbing and dragging a slice onto another to copy its field-of-view or other attributes.

[Up] Back to publication list


The Virtual Simulator

Charles Mosher, George Sherouse, Peter Mills, Kevin Novins, Stephen Pizer, Julian Rosenman, Edward Chaney.

Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

October 1986

ABSTRACT

We have undertaken to provide radiotherapists with Computer-Aided Design and 3D display tools to advance radiation treatment planning. A major part of this effort is a CAD tool for radiation treatment design that allows physicians to comfortably explore alternatives to traditional treatment geometries. Care has been taken to create an intuitive and highly interactive user interface and to incorporate high quality 3D display techniques. We have explored various control configurations for the input devices to make interactions more natural. We believe that such software will allow radiation therapy physicians to more precisely design and direct radiation treatments, leading to higher cancer cure rates with fewer side effects of treatment.

[Up] Back to publication list


High-Speed Interaction on a Vibrating-Mirror 3D Display

Peter Mills, Henry Fuchs, Stephen Pizer.

Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

May 1984

ABSTRACT

An implementation of a true 3-dimensional display-system using vibrating mirror is described. This implementation is characterized by using a standard raster graphics system, and supports user-interaction for independent manipulation of multiple objects that is near real-time. The high-speed performance of the object-manipulations, which include translation, rotation, scaling, intensity-windowing, spatial-clipping, intensity-highlighting, and blinking, is achieved through five means: exploiting the power of the user-programmable graphics processor, fully utilizing the raster-device characteristics, coordinating concurrent execution of the host and graphics computers, localizing the image changes effected by interaction, and by relying upon successive refinement of the initially coarse object-display during periods of reduced interaction. Recent enhancements include the capability to display more points (circa 120,000), the integration of the functions into one system allowing multiple simultaneous manipulations, and the support for independent object motion. The system is currently being used to display CT and NMR data for medical imaging and electron microscope tomographs for molecular modeling.

[Up] Back to publication list


Interactive 3D Display of Medical Images by Varifocal Mirror

S. Pizer, H. Fuchs, E. Heinz, E. Staab, E. Chaney, J. Rosenman, J. Austin, S. Bloomberg, E. MacHardy, P.H. Mills, and D. Strickland.

Dept. of Computer Science, Univ. of North Carolina at Chapel Hill

August 1983

ABSTRACT

Not yet available online.

[Up] Back to publication list