#### Collected works of Kenneth E. Batcher, emeritus professor, and works inspired by his research and scholarship.

#### Browse the *Kenneth E. Batcher Collection: Papers from the Parallel and Associative Computing Laboratory
* Collections

**1 - 40**of

**40**| Next »

## A Multiple Associative Model to Support Branches in Data Parallel Applications Using the Manager-Worker Paradigm2005ASC (associative computing model) and MASC (multiple associative computing model) have long been studied in the Department of Computer Science at Kent State University. While the previous studies provide the background and the basic definition of the model, the description of the interactions between the instruction streams (ISs) is very brief, high level, and incomplete. One change here is that we specify the interaction between ISs and consider that all of the ISs operate on the same clock in order to support predictable worst case computation times, while earlier the ISs were assumed to interact in a MIMD type fashion. This paper provides a detailed explanation as to how these interactions can be supported in the case where only a few ISs are supported. |

## A scalable associative processor with applications in database and image processing04/2004This paper describes the implementation and use of a dedicated associative SIMD co-processor ideally suited for many applications such as database processing, image processing, genome matching, or molecular similarity analysis. The concept of associative SIMD processing is introduced, and differentiated from other associative and SIMD techniques. Then our ASC (ASsociative Computing) processor is briefly described, along with its implementation of associative SIMD processing. Finally, we demonstrate the use of our ASC processor on relational database processing and on the image processing operation of edge detection. |

## A scalable pipelined associative SIMD array with reconfigurable PE interconnection network for embedded applications11/2005This paper describes the FPGA implementation of a specialized SIMD processor array for embedded applications. An alternative to traditional SoC or MPSoC architectures, this array combines the massive parallelism inherent in SIMD architectures with the search capabilities of associative computing, producing a SIMD Processor Array System on a Chip (PASoC) well suited for applications such as data mining and bioinformatics. This paper first describes the architecture of the system, and then the pipelining of the processing elements and the addition of a reconfigurable interconnection network. Finally, the paper concludes with a brief description of a string-matching algorithm that can be used for the applications cited above. |

## A software implementation of a cycle precision simulator of a multiple associative model01/2010The Multiple Associative Computing (MASC) parallel model is a generalization model of an Associative Computing (ASC) parallel model designed to support multiple ASC data parallel threads by using control parallelism. The MASC model is designed to combine the advantages of both Single Instruction Stream Multiple Data Streams (SIMD) and Multiple Instruction Streams Multiple Data Streams (MIMD) models. Here is the first time that a complete description of MASC model has been implemented (in software) true to its original description. A cycle precision simulator is built to demonstrate the performance of MASC on various multithreaded algorithms. The simulator is a software prototype for the model with sufficient software details to allow it to be converted into a hardware prototype of the model. If a reasonable limit for the number of threads simultaneously supported is assumed, the resulting hardware design is not only easily to implement, but can easily support a huge number of processing units and is a excellent candidate architecture for supporting large scale (e.g., terascale and petascale) computing. Experimental results shows that, when processing large-scale instances using multiple workers, the algorithm executed by the MASC model using a static task assignment scheme provides strong scaling with constant time overhead. |

## A study of a byte oriented associative array processor for image processing11/19/1975Goodyear Aerospace Corporation technical report AP-122364 |

## An Associative Static and Dynamic Convex Hull Algorithm2002This paper presents a new static and dynamic recursive parallel algorithm for the convex hull problem. This algorithm is a parallel adaptation of the Graham scan and Quick Hull algorithms. The computational model selected for this algorithm is the associative computing model (ASC) which supports massive parallelism through the use of data parallelism and constant time associative search and maximum functions. Also, ASC can be supported on existing SIMD computers. The static algorithm requires O(n) space, O(log n) average case running time, and O(n) worst case running time. If O(log n) ISs are used the, static algorithm should have an average running time of O(log log n). |

## An Efficient Associative Processor Solution to an Air Traffic Control Problem2010This paper proposes a SIMD solution to air traffic control (ATC) using an enhanced SIMD machine model called an Associative Processor (AP). This differs from previous ATC systems that are designed for MIMD computers and have a great deal of difficulty meeting the predictability requirements for ATC, which are critical for meeting the strict certification standards required for safety critical software components. The proposed SIMD solution will support accurate and meaningful predictions of worst case execution times and will guarantee all deadlines are met. Also, the software will be much simpler and smaller in size than the current corresponding ATC software. An important consequence of these features is that the V&V (Validation and Verification) process will be considerably simpler than for current ATC software. Additionally, the associative processor is enhanced SIMD hardware and is considerably cheaper and simpler than the MIMD hardware currently used to support ATC. The ClearSpeed CSX600 accelerator is used to emulate the AP model. A preliminary implementation of the proposed method has been developed and experimental results comparing MIMD and CSX600 approaches are presented. The performance of CSX600 has better scalability, efficiency, and predictability than that of MIMD. |

## An Extension of the ASC Language Compiler to Support Multiple Instruction Streams in the MASC Model Using the Manager-Worker Paradigm2006In this paper, we describe and implement compiler extension for a parallel computer language called Associative Computing (ASC) language to support multiple instruction streams in a Multiple Associative Computing (MASC) model using manager-worker paradigm. A user directed MASC directive is used to enable concurrent executions of the THEN part and the ELSE part in a parallel IF- THEN-ELSE statement by using two different worker- instruction streams. For most applications, this technique should substantially improves the performance of the system over its performance using only one instruction stream; moreover it is more effective than using multiple instruction streams to execute every parallel IF-THEN-ELSE statements found in a program. When the overhead outweighs the benefit gained from using multiple instruction streams, a user can choose to use only one instruction stream to execute the IF-THEN-ELSE statement. While not explicitly covered here, parallel CASE statements can be handled similarly. |

## Architecture of a massively parallel processor05/1980The massively parallel processor (MPP) system is designed to process satellite imagery at high rates. A large number (16,384) of processing elements (PE's) are configured in a square array. For optimum performance on operands of arbitrary length, processing is performed in a bit-serial manner. On 8-bit integer data, addition can occur at 6553 million operations per second (MOPS) and multiplication at 1861 MOPS. On 32-bit floating-point data, addition can occur at 430 MOPS and multiplication at 216 MOPS. |

## ASC: An associative-computing paradigm11/1994Today's increased computing speeds allow conventional sequential machines to effectively emulate associative computing techniques. We present a parallel programming paradigm called ASC (ASsociative Computing), designed for a wide range of computing engines. Our paradigm has an efficient associative-based, dynamic memory-allocation mechanism that does not use pointers. It incorporates data parallelism at the base level, so that programmers do not have to specify low-level sequential tasks such as sorting, looping and parallelization. Our paradigm supports all of the standard data-parallel and massively parallel computing algorithms. It combines numerical computation (such as convolution, matrix multiplication, and graphics) with nonnumerical computing (such as compilation, graph algorithms, rule-based systems, and language interpreters). This article focuses on the nonnumerical aspects of ASC. |

## ASPRO debugger RSX version users guide02/1987Goodyear Aerospace Corporation technical manual GER-16891 |

## ASPRO emulator (VAE) user's guide02/1985Goodyear Aerospace Corporation technical manual GER 17267 |

## ASPRO floating point programming manual10/1985Goodyear Aerospace Corporation technical manual GER 16810 |

## ASPRO programming reference manual10/1982Goodyear Aerospace Corporation technical manual GER-16809 Revision B |

## ASPRO support software users guide08/1982Goodyear Aerospace Corporation technical manual GER-16846 |

## Bit-Serial Parallel Processing Systems05/1982About a decade ago, a bit-serial parallel processing system STARAN®1 was developed. It used standard integrated circuits that were available at that time. Now, with the availability of VLSI, a much greater processing capability can be packed in a unit volume. This has led to the recent development of two bit-serial parallel processing systems: an airborne associative processor and a ground based massively parallel processor. |

## Comparisons of air traffic control implementations on an associative processor with a MIMD and consequences for parallel computing02/2013This paper has two complementary focuses. The first is the system design and algorithmic development for air traffic control (ATC) using an associative SIMD processor (AP). The second is the comparison of this implementation with a multiprocessor implementation and the implications of these comparisons. This paper demonstrates how one application, ATC, can more easily, more simply, and more efficiently be implemented on an AP than is generally possible on other types of traditional hardware. The AP implementation of ATC will take advantage of its deterministic hardware to use static scheduling. The software will be dramatically smaller and cheaper to create and maintain. Likewise, a large AP system will be considerably simpler and cheaper than the MIMD hardware currently used. While APs were used for ATC-type applications earlier, these are no longer available. We use a ClearSpeed CSX600 accelerator to emulate the AP solutions of ATC on an ATC prototype consisting of eight data-intensive ATC real-time tasks. Its performance is compared with an 8-core multiprocessor (MP) using OpenMP. Our extensive experiments show that the AP implementation meets all deadlines while the MP will regularly miss a large number of deadlines. The AP code will be similar in size to sequential code for the same tasks and will avoid all of the additional support software needed with an MP to handle dynamic scheduling, load balancing, shared resource management, race conditions, false sharing, etc. At this point, essentially only MIMD systems are built. Many of the advantages of using an AP to solve an ATC problem would carry over to other applications. AP solutions for a wide variety of applications will be cited in this paper. Applications that involve a high degree of data parallelism such as database management, text processing, image processing, graph processing, bioinformatics, weather modeling, managing UAS (Unmanned Aircraft Systems or drones) etc., are good candidates for AP solutions. This raises the issue of whether we should routinely consider using non-multiprocessor hardware like the AP for applications where substantially simpler software solutions will normally exist. It also raises the question of whether the use of both AP and MIMD hardware in a single heterogeneous system could provide more versatility and efficiency. Either the AP or MIMD could serve as the primary system, but could hand off jobs it could not handle efficiently to the other system. |

## Efficient associative SIMD processing for non-tabular structured data2004With recent advances in VLSI technology, it is easy to develop chips containing hundreds or thousands of processors - in effect, SIMD processing on a chip. Associative SIMD processing goes further, providing efficient parallel methods for searching tabular data. The techniques described here show that associative processing can also be extended to efficiently process non-tabular structured data (e.g., trees), as we demonstrate by supporting structure codes in our associative ASC processor. |

## Functional description of ASPRO: The high speed associative processor07/16/1984Goodyear Aerospace Corporation technical report GER-16868 |

## Implementing a scalable ASC processor2003Previous papers (Walker et al. (2001); Wu et al. (2002)) have described our implementation of a small prototype processor and control unit for associative computing, called the ASC processor. That initial prototype was implemented on an Altera education board using an Altera FLEX 10K FPGA, and was limited to an unrealistic 4 processing elements (PE). This paper describes a more complete implementation - a scalable ASC processor that can scale up to 52 PE on an Altera APEX 20KE board, or further on larger FPGA. This paper also proposes extensions to support multiple control units and control parallelism. |

## Implementing associative processing: Rethinking earlier architectural decisions04/2001This paper describes an initial design of an associative processor for implementation using field-programmable logic devices (FPLDs). The processor is based loosely on earlier work on the STARAN computer, but updated to reflect modern design practices. We also draw on a large body of research at Kent State on the ASC and MASC models of associative processing, and take advantage of an existing compiler for the ASC model. The resulting design consists of an associative array of 8-bit RISC Processing Elements (PEs), operating in byte-serial fashion under the control of an Instruction Stream (IS) Control Unit that can execute assembly language code produced by a machine-specific back-end compiler. |

## Implementing Associative Search and Responder Resolution04/2002In a paper presented last year at WMPP'01 [Walker01], we described the initial prototype of an associative processor implemented using field-programmable logic devices (FPLDs). That paper presented an overview of the design, and concentrated on the processor's instruction set and its implementation using FPLDs. This paper describes the implementation of the processor's associative operation -- associative searching and responder resolution -- in more detail. |

## Massively parallel processor (MPP): July 1979 phase I final report07/23/1979Goodyear Aerospace Corporation technical report GER-16684 |

## MPP: A supersystem for satellite image processing1982In 1971 NASA Goddard Space Flight Center initiated a program to develop high-speed image processing systems. These systems use thousands of processing elements (PE's) operating simultaneously to achieve their speed (massive parallelism). A typical satellite image contains millions of picture elements (pixels) that can generally be processed in parallel. In 1979 a contract was awarded to construct a massively parallel processor (MPP) to be delivered in 1982. The processor has 16,896 PE's arranged in a 128-row by 132-column rectangular array. The PE's are in the array unit (Figure 1). Other major blocks in the massively parallel processor are the array control unit, the staging memory, the program and data management unit, and the interface to a host computer. |

## Multiple Instruction Stream Control for an Associative Model of Parallel Computation2003This paper describes a system software design for multiple instruction stream control in a massively parallel associative computing environment. The purpose of providing multiple instruction stream control is to increase throughput and reduce the amount of parallel slackness inherent in single instruction stream parallel programming constructs. The multiple associative computing (MASC) model is used to describe this technique and a brief introduction to the MASC model of parallel computation is presented. A simple parallel computing example is used to illustrate the techniques for multiple instruction stream control in a massively parallel runtime environment. |

## On using the UML to describe the MASC model of parallel computation06/2000A Unified Modeling Language (UML) description of the MASC model of parallel computation is presented. This UML description identifies MASC objects and specifies various object and inter-object relationships, dependencies, and behaviors. This was achieved by describing various views of the MASC model by using many of the UML structural and behavioral diagrams. The use of using UML to describe MASC has been highly effective for further parallel modeling techniques, comparisons to other parallel models, MASC parallel system software research, and MASC algorithm development. |

## Predictability for Real-Time Command and Control2001This paper describes a new and different paradigm for real-time command and control that we show can provide a static, polynomial-time scheduling algorithm. Current efforts in real-time scheduling have been unable to predict system performance, and use a unpredictable "dynamic" scheduling algorithm. The Nation’s Air Traffic Control (ATC) System has been unable to satisfy performance requirements, and is extremely expensive. An editorial in "USA Today" (4-19-1999) cites expenditures in ATC at $41 billion. But, current multiprocessor technology cannot do the job. The FAA wisely required a “proof of concept” study before the (AAS) contract. But that study, after expending a billion dollars, was abandoned, and production moved forward. AAS was canceled in 1995 exactly in line with real-time scheduling theory predictions. Garey, Graham, and Johnson state: "For these scheduling problems, no efficient optimization algorithm has yet been found, and indeed, none is expected.”[9]. Stankovic et. al. state: "… complexity results show most real-time multiprocessing scheduling is NP-hard.” [6]. We offer a completely different approach, that has been shown to overcome the severe limitations of multiprocessing. Additionally, we show that real-time parallel-processing techniques can be statically scheduled to solve the set of tasks making up the ATC problem for a worst-case environment. |

## Relating the power of the Multiple Associative Computing (MASC) model to that of reconfigurable bus-based models05/2010The MASC (Multiple ASsociative Computing) model is a multi-SIMD model that uses control parallelism to coordinate the interaction of data parallel threads and supports associative SIMD computing on each of its threads. There have been a wide range of algorithms developed for this model. Research on using this model in real-time system applications and building a scalable MASC architecture is currently quite active. In this paper, we present simulations between MASC and reconfigurable bus-based models, e.g., various versions of the Reconfigurable Multiple Bus Machine (RMBM). Constant time simulations of the basic RMBM by MASC and vice versa are obtained. Simulations of the segmenting RMBM, fusing RMBM, and extended RMBM by MASC in non-constant time are also discussed. By taking advantage of previously established relationships between RMBM and two other popular parallel computational models, namely, the Reconfigurable Mesh (RM) and the Parallel Random Access Machine (PRAM), we extend our simulation results to further categorize the power of the MASC model in relation to RM and PRAM. |

## Simulating PRAM with a MSIMD model (ASC)08/14/1998The ASC (MSIMD) model for parallel computation supports a generalized version of an associative style of computing that has been used since the introduction of associative SIMD computers in the early 1970's. In particular, this model supports data parallelism, constant time maximum and minimum operations, one or more instruction streams (ISs) which are sent to a unique set in a dynamic partition of the processors, and assignment of tasks to the ISs using control parallelism. ASC also allows a network to interconnect the processing elements (PEs). This paper shows how ASC can simulate synchronous PRAM, and the converse. These results provide an important step in defining the power of associative model in terms of PRAM which is the most studied parallel model. Also, these simulations will provide numerous algorithms for ASC by giving an automatic method of converting algorithms from PRAM to ASC. |

## Simulation between enhanced meshes and the multiple associative computing (MASC) model11/1999MASC (for Multiple Associative Computing) is a practical, highly scalable joint control parallel, data parallel model that naturally supports massive parallelism and a wide range of applications. In this paper, we propose efficient algorithms for the MASC model with a 2-D mesh to simulate enhanced meshes, e.g., meshes with multiple broadcasting (MMB), and basic reconfigurable meshes (BRM). The results not only show the power of the MASC model in terms of the enhanced mesh models but also provide an automatic conversion of numerous algorithms designed for enhanced meshes to the MASC model. |

## Solving a 2D knapsack problem on an associative computer augmented with a linear network1996This paper describes a parallelization of the sequential dynamic programming method for solving a 2D knapsack problem where multiples of n rectangular objects are optimally packed into a knapsack of size |

## Solving a 2D Knapsack Problem Using a Hybrid Data-Parallel/Control Style of Computing2004Summary form only given. This paper describes a parallel solution of the sequential dynamic programming method for solving a NP class, 2D knapsack (or cutting-stock) problem which is the optimal packing of multiples of n rectangular objects into a knapsack of size L/spl times/W and are only obtainable with guillotine-type (side to side) cuts. Here, we describe and analyze this problem for the associative model. Since the introduction of associative SIMD computers over a quarter of a century ago, associative computing and the data-parallel paradigm remain popular. The MASC (multiple instruction stream associative computer) parallel model supports a generalized version of an associative style of computing. This model supports data parallelism, constant time maximum and minimum operations, one or more instruction streams (ISs) which are sent to an equal number of partition sets of processors, and assignment of tasks to ISs using control parallelism. We solve this NP class problem with a parallel algorithm that runs in O(W(n+L+W)) time using L processors, where L>W for a 2D knapsack problem with a capacity of L/spl times/W. The new multiple IS version using LW processors and max{L,M} ISs runs in O(n+L+W) given practical hardware considerations. Both of these results are cost optimal with respect to the best sequential implementation. Moreover, an efficient MASC algorithm for this well-known problem should give insight to how the associative model compares to other parallel models such as PRAM. |

## Study of multistage SIMD interconnection networks04/1978Four SIMD multistage networks—Feng's data manipulator, STARAN flip network, omega network, and indirect binary n-cube—are analyzed. Three parameters—topology, interchange box, and control structure—are defined. It is shown that the latter three networks use equivalent topologies and differences in their capabilities result from the other parameters. An augmented data manipulator network using a modified control structure to perform more single pass interconnections than the other networks is presented. Some problems may be solved more efficiently if the 2n processing elements of an SIMD machine can be partitioned into submachines of size 2r. Single and multiple control partitioning are defined. The capabilities of these multistage networks to perform in these partioned environments are discussed. |

## The Architecture of Tomorrow's Massively Parallel Computer07/01/1987Goodyear Aerospace delivered the Massively Parallel Processor (MPP) to NASA/Goddard in May 1983, over three years ago. Ever since then, Goodyear has tried to look in a forward direction. There is always some debate as to which way is forward when it comes to supercomputer architecture. Improvements to the MPP's massively parallel architecture are discussed in the areas of data I/O, memory capacity, connectivity, and indirect (or local) addressing. In I/O, transfer rates up to 640 megabytes per second can be achieved. There are devices that can supply the data and accept it at this rate. The memory capacity can be increased up to 128 megabytes in the ARU and over a gigabyte in the staging memory. For connectivity, there are several different kinds of multistage networks that should be considered. |

## The universality of various types of SIMD machine interconnection networks03/1977SIMD machine architects must choose an interconnection network to provide interprocessor communication. The universality of a network is its ability to simulate arbitrary interconnections of the processing elements. We examine the universality of five particular networks which cover the types used in the llliac IV, STARAN, Omen, SlMDA, and RAP machines. They also cover the types discussed by Feng, Lang, Lawrie, Orcutt, Siegel, and Stone. We give O((log2N)2 ) algorithms, where N is the number of processing elements, for the Perfect Shuffle, PM21, WPM21, and Cube networks to simulate arbitrary interconnections (Orcutt has given an O(Nl/2log2N) algorithm for the llliac network). We analyze Batcher's bitonic sorting method and show how each network can implement it on an SIMD machine. We discuss how sorting destination tags is equivalent to simulating interconnections. |

## Virtual parallelism by self simulation of the multiple instruction stream associative model1996The ASC model for parallel computation supports a generalization of an associative style of computing that has been used since the introduction of associative SIMD computers in the early 1970's. In particular, this model supports data parallelism, constant time maximum and minimum operations, one or more instruction streams (ISs) which are sent to an equal number of partition sets of processors (PEs), and assignment of tasks to the ISs using control parallelism. Since problems often need more processors than a machine has, it is useful to have virtual processors simulated by existing architectures. This paper shows how virtual parallelism is possible where more PEs and ISs are simulated than the actual hardware possesses. The extra time needed for an ASC(n,j) machine to simulate an ASC(N,J) machine where |

## VLDC String Matching for Associative Computing and Multiple Broadcast Mesh1996This paper presents a new parallel algorithm far string matching with variable length "don't care" (VLDC). The initial computational model used is the associative computing model (ASC) enhanced with a linear network. ASC is a natural extension of the data parallel paradigm to a complete model far parallel computation. It supports massively parallelism through the use of data parallelism and constant time functions such as associative search and maximum value. It is also shown that the same algorithm is equally adaptable to the mesh with multiple broadcast. The algorithm has a run time of O(m) using O(n) processors, given a pattern of size m and a text of size n. The algorithm has the unique feature of permitting the identification of all match continuation points in the text after each "don't care" character. |