Fifth Bay
Area
Scientific
Computing Day
Structure-Preserving
Macromodeling of MEMS and RF MEMS
Z. Bai, S. Bhave, D. Bindel, J. Demmel, R. Howe and J. Wang
University of California, Davis and Berkeley
Microelectromechanical systems (MEMS) are moving from simple
single function devices to elaborate structures with subtle
dynamics. The need for accurate simulations of 3D MEMS and RF MEMS
has led to large and complex models. Reduced-order models
(macromodeling) can efficiently and accurately evaluate
the input/output behaviors of such complex models. In this poster,
we will present a new structure-preserving model reduction technique
for large-scale second-order dynamical systems and applications to
MEMS simulation.
On Improving the
Performance of an
Iterative Linear Solver
Allison Baker
Lawrence Livermore National Laboratory
Two approaches to improving the performance, i.e. time to solution, of
an iterative linear solver algorithm are particularly viable. First,
algorithmic changes that improve convergence properties result in
faster convergence due to fewer overall floating-point operations.
Second, modifications to an algorithm that reduce the movement of data
through memory greatly impact performance because of the growing gap
between CPU performance and memory access time. Ideally, a balance is
achieved between improving the efficiency of an iterative linear solver
from a memory-usage standpoint and maintaining favorable numerical
properties. We discuss the restarted generalized minimum residual
(GMRES) method in the context of both approaches to improving
performance.
BEE2: A High-End Reconfigurable Computer
Chen Chang
UC Berkeley
Ever since general purpose CMOS processors based on Von Neumann
architecture (stored program, sequential execution, and time
multiplexing hardware) became the underlying fabric of high end
computing (HEC), the basic strategy to improve performance has been
twofold: 1) use CMOS circuits that have the maximum performance on a
single chip using the most advanced technology available, and 2)
assemble N of these chips along with memory attempting to achieve an N
times computational increase. However, in recent years, the ever
increasing speed gap between processor and memory, the explosion of
power consumption due to high clock rate, and the limited
instruction/thread level parallelism, have limited the overall
performance gain to far below the potential offered by the underlying
silicon fabrication technology advancements.
The BEE2 project explores FPGA-based reconfigurable computing
technology as an alternative to the exiting processor-based design for
high-end parallel computers. The goal is to design the next generation
space/cost/power efficient parallel computer using FPGAs as the basic
building blocks. The first generation of the BEE platform was
constructed in 2001 and currently in daily use at Berkeley Wireless
Research Center, as a large-scale hardware emulator for advanced DSP
and communication system designs and simulation. The Matlab/Simulink
high-
level block diagram programming environment replaces the traditional
VHDL/Verilog description as the primary method of design entry for
FPGA. Real-time I/O and matching computation is supported natively on
the system to ensure proper interface with real world analog
acquisition and actuation components. Many communication and DSP
chip/system designs have been successfully completed on the BEE system.
The BEE2 hardware platform, currently under design, is a scaled-up
version of the BEE system, and the goal is to scale beyond
communication related applications to include a wide range of high
performance general computing applications. The BEE2 system can provide
in a single server rack space, up to 25 TOPS (32-bit integer) or 1
TFLOPS (single-precision floating point) peak performance, with 640 GB
DDR2 DRAM capacity, 22 TB local disk storage, and 5Tbps I/O bandwidth
to external devices, all using the latest Xilinx Virtex-II pro FPGA
chips. The system is scalable from 4 to 128 Processing FPGA chips in a
single rack, and up to several thousands of FPGAs in multiple racks.
The new programming environment builds on top of the current Simulink
design flow on BEE, and extends it to include high-level parallel
programming models, such as Shared Address Space and MPI-2, as well as
specialized alternative programming models for particular applications,
like electromagnetic simulation for antenna design, large chip-level
SPICE simulation, large dense ad-hoc wireless network simulation. One
of the key challenges is to identify and implement the appropriate
programming methodology for the system to achieve a balance between
user productivity and high machine performance.
Power Series Methods for Inverting
Matrix Valued Functions
Aaron Diaz
Santa Clara University
We consider the problem of computing power series approximations
to the function A^{-1}(t)b(t) for real matrix valued functions A and b
of a real variable. These procedures have application whenever systems
A(t)x=b(t) need be solved for many values of the parameter t. We
demonstrate power series methods to be competitive with both direct
factorization and preconditioned iterative solvers. The bulk of the
computation can be done offline and hence these methods are
particularly
suitable for real time applications. We focus on the special case when
A is a polynomial.In this case there are explicit formulas for the
expansion coefficients in a given basis.
Applying Automated
Multi-level Substructuring to Large-scale Electromagnetic Simulation
Weiguo Gao
Lawrence Berkeley National Laboratory
The method of Automated Multi-level
Sub-structuring (AMLS)
has attracted a lot of attention recently. It has been
shown that the method can be used to solve some large-scale
(generalized) eigenvalue problems much faster than the conventional
approach.
Although the method is originally
developed in the engineering
community for vibrational analysis of large structures, recent studies
indicate that it can be effective in other disciplines as well.
In this poster presentation, we examine
the convergence properties
of AMLS and analyze its applicability to the generalized
eigenvalue problem arising from the simulation of the electromagnetic
field associated with the next generation accelerator design.
One interesting characteristic of this application is that
the stiffness matrix produced by the hierarchical vector finite
elements scheme contains a null space of large dimension. We will
present an efficient way to deflate this null space in AMLS.
Distributed
Large Scale Simulation of Immersed Boundary Systems in Titanium
Edward Givelberg and Kathy Yelick
University of California at Berkeley
We present an algorithm for a distributed computation of the immersed
boundary method. The immersed boundary method has been used to simulate
systems containing elastic material immersed in a viscous incompressible
fluid. Our two main applications are in modeling the fluid dynamics of
the heart and the cochlea (the inner ear). The algorithm has been
implemented in Titanium, a Java dialect developed at Berkeley for
large-scale
scientific computing.
Multi-protein
Complex
Data Mining for Detecting Protein
Interactions and Functional Organizations
Xiaofeng He
Lawrence Berkeley National Laboratory
Multi-protein Complex Data Mining for
Detecting Protein Interactions
and Functional Organizations" Protein Interaction Networks present a
useful perspective for understanding cellular processes. Recent
experiments employing high-throughput mass spectrometric
characterizations have resulted in large datasets of physiologically
relevant multi-protein complexes. We present a unified representation
of such datasets based on an underlying bipartite graph model that
present an advance over existing models of the network. Our unified
representation allows for weighting of connections between proteins
shared in more than one complex as well as addressing the higher level
of organization that occurs when the network is viewed as consisting of
protein complexes that share components. This representation also
allows for the application of the rigorous MinMaxCut graph clustering
algorithm for the determination of relevant protein modules in the
networks. Statistically significant annotations of clusters in the
protein-protein and complex-complex network using concepts from the
Gene Ontology suggest that this method will be useful for posing
hypotheses about uncharacterized components of protein complexes or
uncharacterized relationships between protein complexes.
Scalable
Scientific Computing Software
Research
Sophia Lefantzi, Benjamin Allan and Jaideep Ray
Sandia National Laboratories
Scientific computing has progressed to the point where
multidisciplinary teams are used to build complex simulations. In this
environment we need efficiency in the construction of the software as
well as in the runtime performance. Building and validating a
simulation usually takes far longer than
running it. We present our ongoing work enabling efficient software
construction, our experiments toward improved runtime efficiency with
the Brook streaming computing language, and how the two can be tied
together.
Diagonal Dominance of SPD Matrices
Oren Livne
Stanford University
Let A be a symmetric matrix. If A is strictly (or irreducibly)
Diagonally Dominant (DD), then A is Symmetric Positive Definite (SPD).
The converse does not hold. We raise the question: does SPD still imply
some weaker form of DD?
First, the answer is positive, under ``natural assumptions''. It
hinges on finding a sharp lower bound for the (positive) diagonal
entries of
any SPD A. Standard variational bounds exist, but neither bound the
a_{ii}'s _ away_ from zero, nor reflect their true magnitude. We
propose (and prove!) two alternative bounds, which turn out to be
sharp. Basically, if A is SPD, then it is _Quasi-Diagonally Dominant
(QDD)_, that is, a_{ii} is greater than a matrix-independent constant 0
< K < 1 times the sum of the off-diagonal |a_{ij}|'s.
Second, K depends only on the _ number_ of non-zero entries per row
in A, and on the degree of binormalization (the minimum row norm in A
divided by the maximum row norm in A). That is, any sparse binormalized
SPD matrix is QDD, regardless of its sparsity pattern, or specific
entry values.
Suppression
of Airfoil Trailing-edge Noise via Derivative-free Shape Optimization
Alison L. Marsden, Meng Wang, John E. Dennis, Jr. and Parviz
Moin
Stanford University
The application of optimization methods to physically realistic fluid
mechanics problems presents a number of challenges because of the
computational cost involved in doing accurate simulations and the
difficulty of obtaining gradient information. In this work,
derivative-free shape optimization is applied to minimize aerodynamic
noise in the flow over an airfoil trailing-edge. Reduction of
trailing-edge noise is relevant to a number of engineering applications
including airframe noise reduction and wind turbine, hydrofoil and
rotorblade design. Aeroacoustics problems such as this necessitate the
use of modern computational techniques such as large-eddy simulation
(LES) in order to capture a wide range of turbulence scales which are
the source of broadband noise.
Due to the high computational cost, time-dependence, and complexity of
aeroacoustic simulations, constrained shape optimization for noise
reduction is difficult to perform using traditional optimization
methods such as adjoint-based gradient methods. In this work, a
tailored version of the surrogate management framework (SMF) (Booker et
al., 1999) has been implemented and applied to optimize noise reduction
for an airfoil trailing-edge using several shape parameters and
constraints. The SMF method provides a robust and efficient alternative
to gradient-based methods. Using SMF, design space exploration is
performed not with the expensive actual function but with an
inexpensive surrogate function. The use of a polling step in the SMF
guarantees convergence to a local minimum of the cost function. In the
trailing-edge problem, constraints on lift and drag are enforced using
the filter method of Audet and Dennis (2000). Using this method,
several interesting and surprising optimal shapes have been identified,
all of which resulted in significant reduction (as much as 80%) of
trailing-edge noise. These shapes have provided motivation to study the
physics of the flow, and in particular, the trade-off between noise
reduction and loss of lift. The results of this study demonstrate the
successful application of shape optimization to a time-dependent flow
problem, and validate the use of a novel adaptation of the SMF method
with constraints.
Sparse factored approximate inversion is one of the many kinds
of incomplete factorization techniques that fill the gray area between
the extremes of iterative solvers and complete factorization. Most
incomplete factorizations are only suitable for preconditioning of
iterative
techniques due to their necessarily imposed low accuracy. Here, we
demonstrate that certain factored forms retain their sparsity as their
accuracy is increased towards machine precision. All numerical tests
are
performed on discretized differential operators, for which the
presented
factored forms have a physical, intuitive derivation.
The coauthor of this work is Prof. Marvin Cohen.
TRGSS: A Trust Region Generating Set
Search Algorithm for Nonlinear Optimization
Ricardo Oliva
Lawrence Berkeley National Laboratory
We describe a new nonlinear optimization algorithm which combines a
quasi-Newton
Trust-Region (QNTR) algorithm with a Generating-Set Search (GSS) method
to solve the
nonlinear optimization problem of minimizing f : R^{n}-> R, where f is a continuously
differentiable function. Our TRGSS algorithm inherits the fast
(global) convergence
properties enjoyed by Trust-Region methods, while benefiting from the
robust and
parallel characteristics of Generating Set Search methods.
Results from standard tests
problems for unconstrained minimization indicate that TRGSS is
competitive when the
problem size n is less that
the number of available processors.
Modeling
of Arterial Endothelial Cell Responsiveness to Flow
John S. Tamaresis,
Department of Mathematics
University of California, Davis
Abdul I. Barakat
Department of Mechanical and Aeronautical Engineering
University of California, Davis
Arterial endothelial cell (EC)
responsiveness to flow is
essential for normal vascular function and plays a role in the
development of atherosclerosis. However, the mechanisms by which ECs
respond to flow remain incompletely understood. EC sensitivity to
mechanical stimulation likely occurs via sensing of the mechanical
stimulus at the cell surface with subsequent transmission through
cytoskeleton to intracellular transduction sites. Furthermore, recent
evidence appears to indicate that the mechanical stimulus is transduced
to a chemical stimulus at these sites which subsequently initiates a
sequence of biochemical signals.
Our conceptual framework treats the mechanical stimulus as a force
applied to a cellular structure. We model the conversion of this force
into the deformation of an EC-surface flow sensor by assuming that it
is a standard linear viscoelastic solid (three-parameter Maxwell model,
TPMM). Transmission of this deformation signal from the flow sensor
through the cytoskeleton to intracellular transduction sites is
accomplished by networks of TPMMs connected in series, parallel, and
combinations of these configurations. We postulate that the deformation
signal is converted into a chemical signal at an intracellular
transduction site by inducing a change in its chemical state that we
call activation. Our notion of a mechanosensitive molecule (MSM)
combines a TPMM for deformation and first-order kinetics for the
reversible transition between inactive and active states. The
deformation and kinetics are coupled via phenomenological forward and
reverse rate constants of the form
k=
A exp{(-
cx)} where
A and
c are parameters and
x is the deformation.
The deformation of a single TPMM is formulated as a linear
constant-coefficient ordinary differential equation (ODE). We formulate
the TPMM network model of signal transmission as a system of implicit,
linear, constant-coefficient ODEs subject to algebraic constraints
(differential algebraic equation system, DAE). The activation of a
single MSM is formulated as a linear variable-coefficient ODE. In
addition, we perform forward sensitivity analysis to study the effect
of parameter variations on the steady state concentration of activated
MSM. To execute the required numerical simulations for all of these
problems, we use algorithms that are based on linear multistep methods
derived from backwards differentiation formulas as implemented in
Matlab's ODE suite and SUNDIALS (developed at Lawrence Livermore
National Laboratory).
For deformations of surface flow sensor and intracellular networks as
well as activation of MSM, our results suggest that responses differ
based on whether the mechanical forcing stimulus is steady or
oscillatory. These findings provide insight into the mechanisms by
which ECs respond differently to steady versus oscillatory flow and
offer a possible rationale for the preferential development of early
atherosclerotic lesions in arterial regions of oscillatory flow.