RELIABLE SYSTEMS RESEARCH @CMU

Programming Reliable Systems
Carnegie Mellon University

The Programming Reliable Systems research projects seek to develop approaches that improve the reliability of complex software systems. The focus is on both the software development process and its products.

The first goal is to identify and characterize reliability problems within applications through analysis of the source program, dynamic testing, and user studies. How did the software fail to work properly? Did it crash or reboot the machine, abort the application, or perhaps fail silently and produce incorrect results? The second goal is to identify common programming problems and then develop approaches for mitigating them without imposing added complexity or severe restrictions on the programmer. Common causes of software reliability problems include non-orthogonal error reporting, a lack of knowledge management during the software development process, and over-abstraction of the computer system.

RESEARCH THRUSTS

FlakyIO is an extreme I/O testing project that examines the ability of applications to handle exceptions. We generate software exceptions on the callee side to determine how well an application handles error conditions. The FlakyIO architecture can be applied to any subsystem or module that generates exceptions for its caller to handle. We chose to explore I/O because I/O has a well-known standard for reporting errors, which lets us test a large number of applications without customizing the exception generation for each one.

Guide to writing your own FlakyIO system
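As a rough illustration of callee-side exception generation, the C sketch below wraps the POSIX read() call and makes it fail on every Nth call, returning -1 with errno set to EIO just as a failing device would. The names flaky_read and flaky_configure and the deterministic "fail every Nth call" policy are assumptions made for this example; how FlakyIO actually interposes on an application's I/O calls and chooses when to fail is not described here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical knob: inject a failure on every Nth call (0 = never). */
static unsigned long flaky_fail_every = 0;
static unsigned long flaky_call_count = 0;

/* Reset the call counter and choose the injection interval. */
void flaky_configure(unsigned long fail_every)
{
    flaky_fail_every = fail_every;
    flaky_call_count = 0;
}

/* Drop-in replacement for read(2): either forwards to the real call or
 * reports an I/O error through the standard channel (-1 plus errno),
 * so the caller's existing error-handling path is exercised. */
ssize_t flaky_read(int fd, void *buf, size_t count)
{
    flaky_call_count++;
    if (flaky_fail_every != 0 && flaky_call_count % flaky_fail_every == 0) {
        errno = EIO;   /* injected failure: no bytes are consumed */
        return -1;
    }
    return read(fd, buf, count);
}
```

Routing an application's reads through a wrapper like this and sweeping the interval exercises error-handling branches that rarely run during ordinary testing. Because the error arrives through the same return-value and errno convention a real device uses, the application under test needs no changes.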

Program Data Characteristics is an approach that helps guide a programmer by explaining how data is actually being used in the application. For example, even if you have not declared a variable as const, does it act like one?
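To make the const example concrete, here is a small hypothetical C fragment showing the kind of fix such a hint might lead to. The variable retry_limit and the function should_retry are invented for illustration and are not output from the project's tool.

```c
/* Before (hypothetical): `static int retry_limit = 3;` was never assigned
 * after initialization anywhere in the program, so it already behaved like
 * a constant. After acting on the hint, the const qualifier documents that
 * fact, and the compiler now rejects any accidental write such as
 * `retry_limit = 0;`. */
static const int retry_limit = 3;

/* Returns nonzero while another attempt is still allowed. */
int should_retry(int attempts)
{
    return attempts < retry_limit;
}
```

The program behaves the same either way; the benefit is that the declared intent now matches the observed usage and is enforced by the compiler.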

Thread Analysis reports program characteristics that help eliminate threading problems. Our approach is to identify the thread-safe code: rather than forcing a developer to examine the entire code base, we simply eliminate what we know is thread-safe.
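The contrast below is a standard C illustration, not output from the project's analysis, of the kind of distinction involved. The first function returns a pointer to one static buffer shared by every caller, so concurrent calls can overwrite each other's results. The second touches only its arguments and locals, so it could be marked thread-safe and dropped from the code a developer has to inspect. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Not thread-safe: every caller shares one static buffer, so two threads
 * calling this at the same time can corrupt each other's result. */
const char *format_id_shared(int id)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "ID-%04d", id);
    return buf;
}

/* Thread-safe: writes only to caller-supplied storage and uses no shared
 * state, so it can be eliminated from a thread-safety review. Returns the
 * number of characters snprintf would have written. */
int format_id_r(int id, char *out, size_t outlen)
{
    return snprintf(out, outlen, "ID-%04d", id);
}
```

The same pattern appears in the C library itself, for example strtok() versus the reentrant strtok_r().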

Null Pointer Analysis examines the pointers that are passed into subroutines. It finds the subroutines that fail to check whether a NULL pointer has been passed in before dereferencing it, which causes a crash. If a subroutine does not protect itself, then the caller should. We simply let you know where the vulnerabilities are.
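The hypothetical C functions below sketch the three situations described above: a subroutine that dereferences its argument unchecked (the kind of entry point such an analysis would report), a subroutine that protects itself, and a caller that guards a call it cannot change. The function names and the -1 error convention are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Unprotected: strlen(NULL) is undefined behavior and typically crashes.
 * This is the kind of subroutine the analysis would flag. */
size_t name_length_unchecked(const char *name)
{
    return strlen(name);
}

/* Self-protecting: the subroutine checks its own pointer arguments and
 * reports an error instead of dereferencing NULL. */
int name_length_checked(const char *name, size_t *out_len)
{
    if (name == NULL || out_len == NULL)
        return -1;
    *out_len = strlen(name);
    return 0;
}

/* Caller-side protection: when the subroutine cannot be changed, the
 * caller checks the pointer before passing it in (0 chosen as a default). */
size_t caller_guarded_length(const char *maybe_name)
{
    return maybe_name != NULL ? name_length_unchecked(maybe_name) : 0;
}
```

Which side performs the check is a design choice; the point of the analysis is that at least one of them must.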

SCUM is a static program analysis mechanism that provides a robustness metric for an application. The general robustness problems it identifies can be fed back to the programmer to direct the manual insertion of error checks at the most appropriate locations in the application code.
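The name SCUM appears to correspond to the Set-Check-Use Methodology paper listed under Publications. Assuming that reading, the C sketch below illustrates the pattern such an analysis looks for, not the analysis itself. A library call sets a result, the caller checks it, and only then uses the data. The function name read_line_checked and its return codes are hypothetical.

```c
#include <stdio.h>

/* Set-check-use: fgets() *sets* its result (the returned pointer and the
 * stream's error/EOF indicators), the caller *checks* that result, and
 * only then may the buffer be *used*.
 *
 * A use without a check, which a set-check-use analysis would flag:
 *     fgets(buf, size, fp);
 *     printf("%s", buf);   // buf may hold stale or indeterminate data
 *
 * Returns 1 if buf holds a line, 0 at end of input, -1 on a read error. */
int read_line_checked(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)      /* set + check */
        return ferror(fp) ? -1 : 0;        /* distinguish error from EOF */
    return 1;                              /* buffer is now safe to use */
}
```

A static checker built on this idea would count the uses of buf that are not dominated by a check of the fgets() result; how SCUM turns such findings into a robustness metric is not specified here.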

EDUCATIONAL MATERIAL

Guide to Programming and Debugging Reliable Systems: The idea behind programming and debugging reliable systems is to get past the natural assumptions people make when writing programs. At times people create almost an urban legend about what is and is not possible. Whatever programming language you write in, this guide will help you understand how programs work, how to write more reliable applications, and how to debug those applications when something goes wrong.

Structures of Programming Languages Course is an analytical examination of modern high-level programming language structures, including design specification and implementation, advanced forms of data types, expressions and control primitives.

Social Implications of Technology Course is built around a fundamental discussion of risk, privacy, and security. The "-ility" topics include reliability, integrity, quality, security, privacy, difficulty, and humanity. Does technology dehumanize us? Do we have an obligation to make our software easier to use? Do we still need the EULA? Can we solve all reliability and integrity problems with more testing? While raising awareness of the social implications of technology, the course is designed to help students improve their critical thinking.

Programming Reliable and Secure Software Seminar is designed to illustrate the problems and issues involved in developing reliable and secure software. It is an all-day seminar for practicing engineers that covers in depth how programs work and fail, from the compilation system to memory management to interaction with the operating system and the external world. The emphasis is on debugging and code maintenance, since those phases account for a significant portion of the software development process.

PUBLICATIONS

Framework for Exercising I/O Exception Handling Code. Michael W. Bigrigg. International Journal of Information and Communication Technology, Vol. 1, No. 3/4, 2008.

Teaching a Multicultural Perspective in Software Engineering. Michael W. Bigrigg and Karen J. Filipski. Frontiers in Education, October 10th-13th, 2007, Milwaukee WI, USA.

Empirical Testing of the Handling of Reliability-Aware Storage Devices. Michael W. Bigrigg. Workshop on Empirical Evaluation of Dependability and Security, in conjunction with the International Conference on Dependable Systems and Networks, DSN-2006, June 25th-28th, 2006, Philadelphia PA, USA.

Using the FlakyIO Exception Injection Framework. Zachary Pavlov, Michael Duke, and Michael W. Bigrigg. PACISE 21st Annual Conference, April 7th-8th, 2006, Indiana PA.

Using Social Engineering to Design Secure Software. Tyler St. John, Ken Estes, and Michael W. Bigrigg. PACISE 21st Annual Conference, Indiana PA, April 8, 2006.

The Quality of Compiler Messages and Code Integrity. Lidiya Ber and Michael W. Bigrigg. PACISE 21st Annual Conference, Indiana PA, April 8, 2006. (Supersedes CMU-04-09-03)

An Evaluation of the Usefulness of Compiler Error Messages. Michael W. Bigrigg, Russell Bortz, Shyamal Chandra, Jared Sheehan, and Sara Smith. Carnegie Mellon University Technical Report CMU-04-09-03, December 2003.

Robustness Hinting for Improving End-to-End Dependability. Michael W. Bigrigg. Second Workshop on Evaluating and Architecting System Dependability (EASY), in conjunction with ASPLOS-X, San Jose CA, October 2002.

The Set-Check-Use Methodology for Detecting Error Propagation Failures in I/O Routines. Michael W. Bigrigg and Jacob J. Vos. Workshop on Dependability Benchmarking, in conjunction with the International Conference on Dependable Systems and Networks, DSN-2002, Washington DC, June 2002.

Testing the Portability of Desktop Applications to a Networked Embedded System. Michael W. Bigrigg and Joseph G. Slember. Workshop on Reliable Embedded Systems, in conjunction with the 20th IEEE Symposium on Reliable Distributed Systems, New Orleans, LA, October 28, 2001.

A Case for the Management of Graceful Degradation. Michael W. Bigrigg. Carnegie Mellon University Technical Report CMU-04-5-01, May 2001.

RELATED PROJECTS

Static Program Analysis for Reliability

LCLint, University of Virginia
Software Productivity Tools, Microsoft Research
Meta-level Compilation, Stanford
Open Source Quality, Berkeley
CodeWizard, Parasoft
Extended Static Checking, Compaq Research

Dynamic Program Analysis for Reliability

Ballista, Carnegie Mellon University
Fuzz, University of Wisconsin
Pervasive Dependability, AT&T Research
CrashMe
Bug Isolation, Berkeley

Program Analysis for Software Engineering

Aristotle, Georgia Tech
Testing Multi-threaded Programs, University of Arizona
Specification Mining, Berkeley
Program Analysis Group, MIT

PEOPLE

Michael Bigrigg
Project Scientist, Carnegie Mellon University

Student Programmers, Carnegie Mellon University
Xavier Appe, Michael Duke, Madhur Joshi, Jeff Knupp, Morgan Linton, Zachary Pavlov

Student Programmers, Indiana University of Pennsylvania
Ken Estes

Student Programmers, University of Pittsburgh
Lidiya Ber, Kevin Kibler, Alexander Poulis, Joe Slember, Tyler St. John, Jacob Vos, Christy Wilson