Fifth Bay Area Scientific Computing Day

Lawrence Berkeley National Laboratory
Berkeley, California
March 13, 2004

ABSTRACTS OF THE POSTERS

[click on the title to see the poster]



Structure-Preserving Macromodeling of MEMS and RF MEMS

Z. Bai, S. Bhave, D. Bindel, J. Demmel, R. Howe and J. Wang
University of California, Davis and Berkeley

Microelectromechanical systems (MEMS) are moving from simple single function devices to elaborate structures with subtle dynamics. The need for accurate simulations of 3D MEMS and RF MEMS has led to large and complex models. Reduced-order models (macromodeling) can efficiently and accurately evaluate the input/output behaviors of such complex models. In this poster, we will present a new structure-preserving model reduction technique for large-scale second-order dynamical systems and applications to MEMS simulation.


On Improving the Performance of an Iterative Linear Solver

Allison Baker
Lawrence Livermore National Laboratory

Two approaches to improving the performance, i.e. time to solution, of an iterative linear solver algorithm are particularly viable. First, algorithmic changes that improve convergence properties result in faster convergence due to fewer overall floating-point operations. Second, modifications to an algorithm that reduce the movement of data through memory greatly impact performance because of the growing gap between CPU performance and memory access time. Ideally, a balance is achieved between improving the efficiency of an iterative linear solver from a memory-usage standpoint and maintaining favorable numerical properties. We discuss the restarted generalized minimum residual (GMRES) method in the context of both approaches to improving performance.



BEE2: A High-End Reconfigurable Computer

Chen Chang
UC Berkeley

Ever since general purpose CMOS processors based on Von Neumann architecture (stored program, sequential execution, and time multiplexing hardware) became the underlying fabric of high end computing (HEC), the basic strategy to improve performance has been twofold: 1) use CMOS circuits that have the maximum performance on a single chip using the most advanced technology available, and 2) assemble N of these chips along with memory attempting to achieve an N times computational increase. However, in recent years, the ever increasing speed gap between processor and memory, the explosion of power consumption due to high clock rate, and the limited instruction/thread level parallelism, have limited the overall performance gain to far below the potential offered by the underlying silicon fabrication technology advancements.

The BEE2 project explores FPGA-based reconfigurable computing technology as an alternative to the exiting processor-based design for high-end parallel computers. The goal is to design the next generation space/cost/power efficient parallel computer using FPGAs as the basic building blocks. The first generation of the BEE platform was constructed in 2001 and currently in daily use at Berkeley Wireless Research Center, as a large-scale hardware emulator for advanced DSP and communication system designs and simulation. The Matlab/Simulink high- level block diagram programming environment replaces the traditional VHDL/Verilog description as the primary method of design entry for FPGA. Real-time I/O and matching computation is supported natively on the system to ensure proper interface with real world analog acquisition and actuation components. Many communication and DSP chip/system designs have been successfully completed on the BEE system.

The BEE2 hardware platform, currently under design, is a scaled-up version of the BEE system, and the goal is to scale beyond communication related applications to include a wide range of high performance general computing applications. The BEE2 system can provide in a single server rack space, up to 25 TOPS (32-bit integer) or 1 TFLOPS (single-precision floating point) peak performance, with 640 GB DDR2 DRAM capacity, 22 TB local disk storage, and 5Tbps I/O bandwidth to external devices, all using the latest Xilinx Virtex-II pro FPGA chips. The system is scalable from 4 to 128 Processing FPGA chips in a single rack, and up to several thousands of FPGAs in multiple racks. The new programming environment builds on top of the current Simulink design flow on BEE, and extends it to include high-level parallel programming models, such as Shared Address Space and MPI-2, as well as specialized alternative programming models for particular applications, like electromagnetic simulation for antenna design, large chip-level SPICE simulation, large dense ad-hoc wireless network simulation. One of the key challenges is to identify and implement the appropriate programming methodology for the system to achieve a balance between user productivity and high machine performance.



Power Series Methods for Inverting Matrix Valued Functions

Aaron Diaz
Santa Clara University

We consider the problem of computing power series approximations to the function A^{-1}(t)b(t) for real matrix valued functions A and b of a real variable. These procedures have application whenever systems A(t)x=b(t) need be solved for many values of the parameter t. We demonstrate power series methods to be competitive with both direct factorization and preconditioned iterative solvers. The bulk of the computation can be done offline and hence these methods are particularly suitable for real time applications. We focus on the special case when A is a polynomial.In this case there are explicit formulas for the expansion coefficients in a given basis.


Applying Automated Multi-level Substructuring to Large-scale Electromagnetic Simulation

Weiguo Gao
Lawrence Berkeley National Laboratory

The method of Automated Multi-level Sub-structuring (AMLS) has attracted a lot of attention recently. It has been shown that the method can be used to solve some large-scale (generalized) eigenvalue problems much faster than the conventional approach.

Although the method is originally developed in the engineering community for vibrational analysis of large structures, recent studies indicate that it can be effective in other disciplines as well.

In this poster presentation, we examine the convergence properties of AMLS and analyze its applicability to the generalized eigenvalue problem arising from the simulation of the electromagnetic field associated with the next generation accelerator design. One interesting characteristic of this application is that the stiffness matrix produced by the hierarchical vector finite elements scheme contains a null space of large dimension. We will present an efficient way to deflate this null space in AMLS.


Distributed Large Scale Simulation of Immersed Boundary Systems in Titanium

Edward Givelberg and Kathy Yelick
University of California at Berkeley

We present an algorithm for a distributed computation of the immersed boundary method. The immersed boundary method has been used to simulate systems containing elastic material immersed in a viscous incompressible
fluid. Our two main applications are in modeling the fluid dynamics of the heart and the cochlea (the inner ear). The algorithm has been implemented in Titanium, a Java dialect developed at Berkeley for large-scale scientific computing.


Multi-protein Complex Data Mining for Detecting Protein Interactions and Functional Organizations

Xiaofeng He
Lawrence Berkeley National Laboratory

Multi-protein Complex Data Mining for Detecting Protein Interactions and Functional Organizations" Protein Interaction Networks present a useful perspective for understanding cellular processes. Recent experiments employing high-throughput mass spectrometric characterizations have resulted in large datasets of physiologically relevant multi-protein complexes. We present a unified representation of such datasets based on an underlying bipartite graph model that present an advance over existing models of the network. Our unified representation allows for weighting of connections between proteins shared in more than one complex as well as addressing the higher level of organization that occurs when the network is viewed as consisting of protein complexes that share components. This representation also allows for the application of the rigorous MinMaxCut graph clustering algorithm for the determination of relevant protein modules in the networks. Statistically significant annotations of clusters in the protein-protein and complex-complex network using concepts from the Gene Ontology suggest that this method will be useful for posing hypotheses about uncharacterized components of protein complexes or uncharacterized relationships between protein complexes.



Scalable Scientific Computing Software Research

Sophia Lefantzi, Benjamin Allan and Jaideep Ray
Sandia National Laboratories

Scientific computing has progressed to the point where multidisciplinary teams are used to build complex simulations. In this environment we need efficiency in the construction of the software as well as in the runtime performance. Building and validating a simulation usually takes far longer than running it. We present our ongoing work enabling efficient software construction, our experiments toward improved runtime efficiency with the Brook streaming computing language, and how the two can be tied together.



Diagonal Dominance of SPD Matrices

Oren Livne
Stanford University

Let A be a symmetric matrix. If A is strictly (or irreducibly) Diagonally Dominant (DD), then A is Symmetric Positive Definite (SPD). The converse does not hold. We raise the question: does SPD still imply some weaker form of DD?

First, the answer is positive, under ``natural assumptions''. It hinges on finding a sharp lower bound for the (positive) diagonal entries of any SPD A. Standard variational bounds exist, but neither bound the a_{ii}'s _ away_ from zero, nor reflect their true magnitude. We propose (and prove!) two alternative bounds, which turn out to be sharp. Basically, if A is SPD, then it is _Quasi-Diagonally Dominant (QDD)_, that is, a_{ii} is greater than a matrix-independent constant 0 < K < 1 times the sum of the off-diagonal |a_{ij}|'s.

Second, K depends only on the _ number_ of non-zero entries per row in A, and on the degree of binormalization (the minimum row norm in A divided by the maximum row norm in A). That is, any sparse binormalized SPD matrix is QDD, regardless of its sparsity pattern, or specific entry values.



Suppression of Airfoil Trailing-edge Noise via Derivative-free Shape Optimization

Alison L. Marsden, Meng Wang, John E. Dennis, Jr. and Parviz Moin
Stanford University

The application of optimization methods to physically realistic fluid mechanics problems presents a number of challenges because of the computational cost involved in doing accurate simulations and the difficulty of obtaining gradient information. In this work, derivative-free shape optimization is applied to minimize aerodynamic noise in the flow over an airfoil trailing-edge. Reduction of trailing-edge noise is relevant to a number of engineering applications including airframe noise reduction and wind turbine, hydrofoil and rotorblade design. Aeroacoustics problems such as this necessitate the use of modern computational techniques such as large-eddy simulation (LES) in order to capture a wide range of turbulence scales which are the source of broadband noise.

Due to the high computational cost, time-dependence, and complexity of aeroacoustic simulations, constrained shape optimization for noise reduction is difficult to perform using traditional optimization methods such as adjoint-based gradient methods. In this work, a tailored version of the surrogate management framework (SMF) (Booker et al., 1999) has been implemented and applied to optimize noise reduction for an airfoil trailing-edge using several shape parameters and constraints. The SMF method provides a robust and efficient alternative to gradient-based methods. Using SMF, design space exploration is performed not with the expensive actual function but with an inexpensive surrogate function. The use of a polling step in the SMF guarantees convergence to a local minimum of the cost function. In the trailing-edge problem, constraints on lift and drag are enforced using the filter method of Audet and Dennis (2000). Using this method, several interesting and surprising optimal shapes have been identified, all of which resulted in significant reduction (as much as 80%) of trailing-edge noise. These shapes have provided motivation to study the physics of the flow, and in particular, the trade-off between noise reduction and loss of lift. The results of this study demonstrate the successful application of shape optimization to a time-dependent flow problem, and validate the use of a novel adaptation of the SMF method with constraints.



Fast, Compact Sparse Matrix Inversion for Discretized Differential
Operators

Jonathan Edward Moussa
University of California at Berkeley

Sparse factored approximate inversion is one of the many kinds of incomplete factorization techniques that fill the gray area between the extremes of iterative solvers and complete factorization. Most incomplete factorizations are only suitable for preconditioning of iterative techniques due to their necessarily imposed low accuracy. Here, we demonstrate that certain factored forms retain their sparsity as their accuracy is increased towards machine precision. All numerical tests are performed on discretized differential operators, for which the presented factored forms have a physical, intuitive derivation.

The coauthor of this work is Prof. Marvin Cohen.



TRGSS: A Trust Region Generating Set Search Algorithm for Nonlinear Optimization

Ricardo Oliva
Lawrence Berkeley National Laboratory

We describe a new nonlinear optimization algorithm which combines a quasi-Newton
Trust-Region (QNTR) algorithm with a Generating-Set Search (GSS) method to solve the
nonlinear optimization problem of minimizing  f : R^{n}-> R, where f  is a continuously
differentiable function.  Our TRGSS algorithm inherits the fast (global) convergence
properties enjoyed by Trust-Region methods, while benefiting from the robust and
parallel characteristics of Generating Set Search methods.  Results from standard tests
problems for unconstrained minimization indicate that TRGSS is competitive when the
problem size n is less that the number of available processors.



Modeling of Arterial Endothelial Cell Responsiveness to Flow

John S. Tamaresis,
Department of Mathematics
University of California, Davis

Abdul I. Barakat
Department of Mechanical and Aeronautical Engineering
University of California, Davis

Arterial endothelial cell (EC) responsiveness to flow is essential for normal vascular function and plays a role in the development of atherosclerosis. However, the mechanisms by which ECs respond to flow remain incompletely understood. EC sensitivity to mechanical stimulation likely occurs via sensing of the mechanical stimulus at the cell surface with subsequent transmission through cytoskeleton to intracellular transduction sites. Furthermore, recent evidence appears to indicate that the mechanical stimulus is transduced to a chemical stimulus at these sites which subsequently initiates a sequence of biochemical signals.

Our conceptual framework treats the mechanical stimulus as a force applied to a cellular structure. We model the conversion of this force into the deformation of an EC-surface flow sensor by assuming that it is a standard linear viscoelastic solid (three-parameter Maxwell model, TPMM). Transmission of this deformation signal from the flow sensor through the cytoskeleton to intracellular transduction sites is accomplished by networks of TPMMs connected in series, parallel, and combinations of these configurations. We postulate that the deformation signal is converted into a chemical signal at an intracellular transduction site by inducing a change in its chemical state that we call activation. Our notion of a mechanosensitive molecule (MSM) combines a TPMM for deformation and first-order kinetics for the reversible transition between inactive and active states. The deformation and kinetics are coupled via phenomenological forward and reverse rate constants of the form k=A exp{(-cx)} where A and c are parameters and x is the deformation.

The deformation of a single TPMM is formulated as a linear constant-coefficient ordinary differential equation (ODE). We formulate the TPMM network model of signal transmission as a system of implicit, linear, constant-coefficient ODEs subject to algebraic constraints (differential algebraic equation system, DAE). The activation of a single MSM is formulated as a linear variable-coefficient ODE. In addition, we perform forward sensitivity analysis to study the effect of parameter variations on the steady state concentration of activated MSM. To execute the required numerical simulations for all of these problems, we use algorithms that are based on linear multistep methods derived from backwards differentiation formulas as implemented in Matlab's ODE suite and SUNDIALS (developed at Lawrence Livermore National Laboratory).

For deformations of surface flow sensor and intracellular networks as well as activation of MSM, our results suggest that responses differ based on whether the mechanical forcing stimulus is steady or oscillatory. These findings provide insight into the mechanisms by which ECs respond differently to steady versus oscillatory flow and offer a possible rationale for the preferential development of early atherosclerotic lesions in arterial regions of oscillatory flow.

Program Participants Back to main page