Mining Software Processes

Fund Coordinator

Project Description

In the proposal "Mining Software Processes", we aim to mine and assess software development processes as they manifest themselves in recorded activities, and specifically to discover how process features determine program features. The assistant to be funded was Kim Herzig, who started this project with the given funding.

Not everything has turned out as originally planned in the proposal. IBM's Jazz data turned out to be unusable for our purposes, and the expert we temporarily hired for mining structured data had substantial difficulties getting acquainted with the appropriate research methods. Relying on Kim alone, we have worked together with SAP and Microsoft to mine their process databases. The most important finding so far is that traditional product complexity metrics continue to be inadequate defect predictors, whereas process metrics, such as the number of changes, fare much better. In particular, we have been able to show that bursts of changes (that is, several changes in a short period of time) are the best defect predictors found so far for Windows, with precision and recall well over 90%. We are currently in talks with Google to mine their processes; Google already funded an intern for a two-week research stay at Saarland during April 2010.
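To make the notion of a change burst concrete, the following is a minimal sketch and not the implementation used in the study. It assumes that a burst is a run of at least `min_burst_size` changes to one component in which consecutive changes lie at most `max_gap_days` apart; the function name, both parameters, and their default values are illustrative assumptions.

```python
from datetime import date


def find_change_bursts(change_days, max_gap_days=3, min_burst_size=3):
    """Group the change dates of one component into bursts.

    Assumed definition (illustrative only): a burst is a maximal run of
    changes in which consecutive changes are at most `max_gap_days` apart
    and which contains at least `min_burst_size` changes.
    """
    bursts = []
    current = []
    for day in sorted(change_days):
        # A gap larger than the threshold closes the current run.
        if current and (day - current[-1]).days > max_gap_days:
            if len(current) >= min_burst_size:
                bursts.append(current)
            current = []
        current.append(day)
    # Flush the final run once all changes have been seen.
    if len(current) >= min_burst_size:
        bursts.append(current)
    return bursts


# Hypothetical change history of a single component.
history = [
    date(2010, 1, 1), date(2010, 1, 2), date(2010, 1, 3),
    date(2010, 2, 1),
    date(2010, 3, 1), date(2010, 3, 2), date(2010, 3, 4), date(2010, 3, 5),
]
bursts = find_change_bursts(history)
```

Under these assumptions, per-component features such as the number of bursts or the size of the largest burst could then serve as inputs to a defect prediction model.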

Focusing on changes as defect predictors led Kim to develop a model of how changes depend on each other. His concept of a change genealogy (that is, a dependence graph between the individual changes applied) captures the long-term impact of changes, such as "Whenever I changed A, someone else did work on B in the following days". The change genealogy also lets us express long-term relationships between software features; we are currently using model checking to identify matching patterns expressed in temporal logic. This infrastructure is highly promising: today, we can already extract such rules with a confidence well above 90%, and a submission to the International Conference on Software Engineering 2011 is in preparation.
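The idea of a change genealogy and of rules with a confidence value can be illustrated with a small sketch. It is not the project's infrastructure: it assumes that a change depends on the most recent earlier change touching the same entity, and it replaces model checking over temporal logic with simple counting. The `Change` class, the function names, and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    change_id: str
    day: int              # days since project start (illustrative time unit)
    entities: frozenset   # names of the code entities touched by the change


def build_genealogy(changes):
    """Map each change id to the ids of the earlier changes it depends on.

    Assumption: a change depends on the latest earlier change that touched
    one of the same entities.
    """
    last_touch = {}               # entity -> id of the latest change touching it
    parents = defaultdict(set)
    for change in sorted(changes, key=lambda c: c.day):
        parents[change.change_id]  # make sure every change appears as a node
        for entity in change.entities:
            if entity in last_touch:
                parents[change.change_id].add(last_touch[entity])
            last_touch[entity] = change.change_id
    return dict(parents)


def rule_confidence(changes, antecedent, consequent, window_days):
    """Share of changes to `antecedent` that are followed by a change to
    `consequent` within `window_days`; None if `antecedent` never changed."""
    triggers = [c for c in changes if antecedent in c.entities]
    if not triggers:
        return None
    hits = sum(
        any(
            consequent in later.entities
            and 0 < later.day - c.day <= window_days
            for later in changes
        )
        for c in triggers
    )
    return hits / len(triggers)


# Hypothetical history: four changes touching entities A and B.
changes = [
    Change("c1", 0, frozenset({"A"})),
    Change("c2", 2, frozenset({"B"})),
    Change("c3", 5, frozenset({"A", "B"})),
    Change("c4", 20, frozenset({"A"})),
]
genealogy = build_genealogy(changes)
confidence = rule_confidence(changes, "A", "B", window_days=3)
```

Here `genealogy` records, for example, that `c3` depends on both `c1` and `c2`, and `confidence` is the fraction of changes to A that were followed by a change to B within three days. A temporal-logic formulation would allow richer patterns than this single "followed within a window" rule.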