Invited Talks
Managing Socio-Technical Dependencies in Distributed Software Development
Anita Sarma, University of Nebraska, Lincoln
Tuesday, 09:00–10:00
Analyzing Changes in Software Systems: From ChangeDistiller to FMDiff
Martin Pinzger, University of Klagenfurt
Wednesday, 09:00–10:00
Software systems continuously change, and developers spend a large portion of their time keeping track of changes and understanding their effects. Current development tools provide only limited support: most of them track changes in source files only at the level of textual lines, lacking semantic and contextual information about the changes. Developers frequently need to reconstruct this information manually, which is a time-consuming and error-prone task. In this talk, I present three techniques that address this problem by extracting detailed syntactical information from changes in various source files. I start by introducing ChangeDistiller, a tool and approach to extract information on source code changes at the level of ASTs. Next, I present the WSDLDiff approach to extract information on changes in web service interface description files. Finally, I present FMDiff, an approach to extract changes from feature models defined with the Linux Kconfig language. For each approach I report on case studies and experiments that highlight the benefits of our techniques. I also point out several research opportunities opened up by our techniques and tools, and by the detailed change data they extract.
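ChangeDistiller itself works on Java ASTs and computes a fine-grained edit script. As a language-neutral illustration of the underlying idea, the following minimal sketch in Python compares the top-level statements of two versions of a snippet via the standard ast module; the function names and the coarse inserted/deleted classification are ours, not ChangeDistiller's.

```python
import ast

def stmt_signatures(src):
    """Normalize each top-level statement of a snippet to a comparable string."""
    return [ast.dump(node) for node in ast.parse(src).body]

def distill(before, after):
    """Coarse statement-level change distilling: report how many statements
    were inserted and how many were deleted between two versions."""
    old, new = stmt_signatures(before), stmt_signatures(after)
    return {
        "inserted": len([s for s in new if s not in old]),
        "deleted": len([s for s in old if s not in new]),
    }

before = "x = 1\nprint(x)"
after = "x = 2\nprint(x)"
changes = distill(before, after)  # the assignment changed, the print did not
```

A real change distiller matches subtrees between the two ASTs and emits update/move operations as well, rather than treating a changed statement as a delete plus an insert.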
Tutorials
The MetricsGrimoire toolset – a mature, industrial-strength toolset to obtain data from software repositories
Gregorio Robles, Universidad Rey Juan Carlos
Monday, 11:00–12:30
In this tutorial we will introduce the MetricsGrimoire toolset, a mature, industrial-strength toolset to obtain data from software repositories. MetricsGrimoire is the result of over 12 years of development, which started at the Universidad Rey Juan Carlos. With the creation of Bitergia in 2013, it has become the technological basis of a start-up that offers results and analyses to free/open source software companies and communities.
Empirical Software Engineering [ slides ]
Massimiliano Di Penta, University of Sannio
Wednesday, 14:00–15:30
Empirical research is of paramount importance in any field of software engineering, as it helps to gain evidence of phenomena occurring in software products or processes, as well as to conduct appropriate evaluations of existing approaches and tools. The goal of this tutorial is to provide a general overview of the principles and techniques needed when conducting empirical studies in software engineering, with particular emphasis on controlled experiments and case studies. More specifically, the tutorial will give an overview of the main principles of empirical study planning and design and, more importantly, it will provide practical insights into how to perform suitable statistical analyses on the results of empirical studies or empirical evaluations. This overview will be given using available datasets and showing how to perform statistical analyses using the R environment.
Hackathon
Process automation in software engineering
Bogdan Vasilescu, UC Davis, USA
Introduction Monday 16:30–17:00, Hackathon Monday 19:00–24:00
Talks
Session 1: Meta-Programming and Transformations (4 talks)
“A Meta-Level API for JavaScript Instrumentation Platforms”, Laurent Christophe, Elisa Gonzalez Boix, Coen De Roover and Wolfgang De Meuter.
JavaScript has become ubiquitous on the server and client tiers of contemporary web applications. JavaScript offers many dynamic and reflective features, which makes it hard to analyze JavaScript programs statically. However, dynamic analyses such as runtime taint analysis do not suffer from the same limitations and remain applicable. A well-known technique for implementing dynamic analyses is program instrumentation, which consists in inserting analysis-related code into the target program. Several instrumentation platforms have been proposed; however, they require a deep understanding of JavaScript to implement correct and precise runtime analyses. In this paper we propose a new JavaScript instrumentation platform that incorporates a meta-programming API, which handles low-level concerns and can automatically reason about elaborate built-in functionality.
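The platform described in the paper rewrites JavaScript source; as a toy, language-agnostic stand-in for the general idea of instrumentation, the sketch below wraps a function in Python so that every call is recorded for a dynamic analysis. All names here are illustrative, not part of the authors' API.

```python
import functools

def instrument(fn, log):
    """Wrap a function so that every call and its result are recorded.
    A toy stand-in for source-level instrumentation."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        log.append((fn.__name__, args, result))  # the 'analysis' observes the call
        return result
    return wrapper

def add(a, b):
    return a + b

calls = []
add = instrument(add, calls)
add(1, 2)  # the call is executed normally and logged as a side effect
```

Real instrumentation platforms operate below the function level (on property reads, writes, operators), which is precisely where the low-level concerns handled by the proposed meta-level API arise.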
“MacroRecorder: Recording and Replaying Source Code Transformations”, Gustavo Santos, Nicolas Anquetil, Anne Etien, Stéphane Ducasse and Marco Tulio Valente
During its lifetime, a software system undergoes continuous maintenance in order to improve its structure, to fix bugs, or to adapt to new APIs, for example. In such cases, developers sometimes systematically perform sequences of the same code changes (e.g. create a class, then extract a method) on groups of related code entities. In this paper we introduce MacroRecorder, a tool that records and replays a sequence of source code changes. We also discuss ongoing work on this topic.
“Reasoning over AST Changes”, Reinout Stevens and Coen De Roover
Change distilling algorithms provide researchers with concrete change operations that transform a source Abstract Syntax Tree (AST) into a target AST. They output a sequence of operations that must be applied from left to right to transform the source AST into the target one. Unfortunately, directly querying and using these change operations is non-trivial. In our latest work, we introduce a dependency-graph-based representation of these change operations that allows for change-agnostic querying. We integrate these structures into our history querying tool QwalKeko.
“Evolution of Metaprograms, or How to Transform XSLT to Rascal”, Vadim Zaytsev
Metaprogramming is a well-established methodology of constructing programs that work on other programs: analysing, parsing, transforming, compiling, evolving, mutating or transplanting them. Metaprograms themselves evolve as well, and there are times when this evolution means migrating to a different metalanguage. This fairly complicated scenario is demonstrated here by considering a concrete case of porting several rewriting systems for grammar extraction from XSLT to Rascal.
Session 2: Mining, Queries and Analysis
“Predicting the health of a project? An assessment in a major IT company”, Vincent Blondeau, Anne Etien, Nicolas Anquetil, Stéphane Ducasse, Sylvain Cresson and Pascal Croisy
More and more companies would like to mine software data with the goal of assessing the health of their software projects. The hope is that some software metrics could be tracked to predict failure risks or confirm good health. If a success factor were found, project failures could be anticipated and early actions could be taken by the organisation to help or to closely monitor the project, allowing it to act in a preventive rather than a curative mode. We were called in by a major IT company to pursue this goal. We conducted a study to check whether software metrics can be related to project failure. The study was both theoretical, with a review of the literature on the subject, and practical, with mining of past project data and interviews with project managers. We found that the metrics used in practice are not useful for assessing project outcome.
“Developer Oriented and Quality Assurance Based Simulation of Software Processes”, Verena Honsel, Daniel Honsel, Jens Grabowski and Stephan Waack
Software process planning involves the consideration of process-based factors, e.g., development strategies, but also social factors, e.g., the collaboration of developers. To support project managers in decision making during the project, we are developing an agent-based simulation tool that allows them to test alternative future scenarios. For this, it is indispensable to understand software evolution and its influences. We cover different aspects of software evolution with models tailored towards specific questions. To investigate system growth, developer networks and file dependency graphs, we performed two case studies of open source projects. This way, we infer parameters close to reality and are able to compare empirical with simulated results.
“Parsing and Analyzing SQL Queries in Stack Overflow Questions”, Csaba Nagy and Anthony Cleve
The rapid growth and increasing popularity of Stack Overflow have made it a large knowledge base on many programming topics, which also attracts researchers. To mention a few examples, they study the trends that developers follow, the design of Q&A systems, island parsing techniques to analyze posts, and recommendation systems, and they try to model the quality of posts.
In our paper, we introduce an approach to parse and analyze SQL queries in Stack Overflow questions, with the main goal of identifying common patterns among them. Such similar structures can point to problematic language constructs (e.g. antipatterns) in SQL statements that developers should avoid.
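To make the notion of an SQL antipattern concrete, here is a minimal sketch that flags two illustrative constructs with regular expressions. The pattern names and the regexes are our own simplifications; the paper's approach actually parses the queries, and real antipattern catalogues are far richer.

```python
import re

# Two illustrative patterns; a robust analysis would parse the query
# rather than scan it with regexes.
ANTIPATTERNS = {
    "select_star": re.compile(r"SELECT\s+\*", re.IGNORECASE),
    "implicit_join": re.compile(r"FROM\s+\w+\s*,\s*\w+", re.IGNORECASE),
}

def detect_antipatterns(query):
    """Return the sorted names of all antipatterns found in a query."""
    return sorted(name for name, rx in ANTIPATTERNS.items() if rx.search(query))

query = "SELECT * FROM users, orders WHERE users.id = orders.user_id"
found = detect_antipatterns(query)  # both patterns fire on this query
```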
“Example-Driven Model Queries”, Carlos Noguera
Model querying is an integral part of Model-Driven Engineering. Developers query models when specifying model transformations, when defining model constraints, or simply when they need to extract some information from a model. Model queries are often specified in a general-purpose programming language, with developers simply navigating models through their programming interfaces. OCL is the best-known model query language but, while successful, it makes it difficult to express complex structural properties featured by the sought-after model elements. In this presentation we describe an example-driven query facility that aims at easing the description of structural features in a query. In our approach, developers describe their queries in terms of model fragments augmented with variables and special relations. These templates are translated into logic queries, which in turn are executed by Ekeko.
“Weighted Multi-Factor Multi-Layer Identification of Potential Causes for Events of Interest in Software Repositories”, Philip Makedonski and Jens Grabowski
Change labeling is a fundamental challenge in software evolution. Certain kinds of changes can be labeled based on directly measurable characteristics. Labels for other kinds of changes, such as changes causing subsequent fixes, need to be estimated retrospectively, as information on whether a change can be considered the cause of a subsequent fix is usually not available at the time of the change. In this article we present a weight-based approach for identifying potential causes of events of interest, based on a cause-fix graph supporting multiple factors, such as causing a fix or a refactoring, and multiple layers reflecting different levels of granularity, such as project, file, class, and method. We outline different strategies that can be employed to refine the weight distribution across the different layers in order to obtain more specific labeling at finer levels of granularity.
Session 3: Tools and Languages + Technology Showdown
“Overview of Reverse Engineering Tools Features and Extension”, Brice Govin, Nicolas Anquetil, Anne Etien, Stéphane Ducasse and Arnaud Monegier Du Sorbier
For more than three decades, reverse engineering has been a major issue for industry wanting to capitalise on legacy systems. Many companies have developed reverse engineering tools to help developers in their work. However, those tools have focused on traditional information systems. Working on a time-critical embedded system, we found that the available solutions focus either on structuring software behaviour or on extracting data from the system. None of them seems to clearly use both approaches in a complementary way. In this paper, based on our industrial experience, we list the requirements that such a tool should fulfil. We also present a short overview of existing reverse engineering tools and their features.
“On the Evaluation of a DSL for Architectural Consistency Checking”, Andrea Caracciolo
Software architecture erodes over time and needs to be constantly monitored to be kept consistent with its original intended design. Consistency is rarely monitored using automated techniques: the cost associated with such an activity is typically not considered proportional to its benefits.
To improve this situation, we propose Dictō, a uniform DSL for specifying architectural invariants. This language is designed to reduce the cost of consistency checking by offering a framework in which existing validation tools can be matched to newly defined language constructs.
In this paper we discuss how such a DSL can be qualitatively and quantitatively evaluated in practice.
Showdown: “A Search-based Approach for Generalizing and Refining Source Code Templates”, Tim Molderez and Coen De Roover
Code templates are a convenient means to search and transform source code. However, such templates may still be difficult to specify, as they can produce too few or too many matches. To assist the users of our Ekeko/X program transformation tool, we have provided a suite of mutation operators for code templates and have designed a genetic search algorithm to recommend modifications to templates.
Showdown: “BibSLEIGH: Bibliography of Software Language Engineering in Generated Hypertext”, Vadim Zaytsev
The body of research contributions is vast and full of papers. Projects like DBLP help us navigate through it and relate authors to papers and papers to venues. Relating papers to papers is already considerably harder, and projects like Google Scholar do their best to battle all the variations in citations. Relating papers to topics is an open problem with some automated methods under development but mostly manual contributions. Relating papers to concepts and especially concepts to concepts is impossible without expert intervention and sometimes requires years of research to accomplish in a convincing manner.
BibSLEIGH started in 2014 as a personal project for pretty-printing, normalising and eventually annotating bibliographic items. At SATToSE 2015 I would like to discuss its potential in a broader scope.
Session 4: Human / Developer Factors
“Pretty Printers: anatomy and measured effects on productivity of teams of developers”, Carlos Ulderico Cirello Filho
The production of software involves writing source code. This process takes several forms, among them the layout of text so that it becomes easy for human eyes to read and understand what is being done. The formative years of a developer mold their vision of what is right and comfortable to work with. Aside from the debate over coding style choices, there is also the impact of merge conflicts in versioning systems when changed source code contains both cosmetic and content changes, thereby hampering merges of long-forked pieces of code.
“Interactive User-Oriented Views for Better Understanding Software Systems”, Truong Ho-Quang
Understanding software artefacts is a crucial task for people who want to join any software development process. However, because of the large amount of detailed and scattered information in software artefacts, understanding them is usually very time-consuming and vulnerable to human error and subjectivity. A system that helps practitioners build up an understanding of software artefacts could reduce these vulnerabilities and speed up the software development/maintenance process. Our research focuses on building a comprehensive view of a software system that enables developers to achieve two goals: (i) to save the time spent searching and navigating source code; and (ii) to gain a better understanding of software artefacts with regard to domain-specific tasks. To achieve these goals, we propose an empirical approach, which we have started to carry out and which has yielded preliminary results.
“Collaboration Networks in Software Development: Perspectives from Applying different Granularity Levels using Social Network Analysis”, Miguel Ángel Fernández, Gregorio Robles and Jesus M. Gonzalez-Barahona
This paper presents research in progress on the analysis of collaboration networks found in software development projects. Traditionally, collaboration networks are obtained by analyzing collaboration on the same file or module/directory; when two developers perform modifications on the same entity during a given time period, it is assumed that they are at least implicitly collaborating. In our research, we want to study how the granularity of the software artifact affects the research output of collaboration graphs. In this regard, we obtain traditional graphs based on collaboration on files and augment them with information on collaboration at the function/method level. In the future we want to include developer affiliation information to perform a collaboration analysis at the company level.
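The file-level construction described above can be sketched in a few lines: two developers share an edge whenever they both touched the same artifact. This is a minimal sketch under our own data layout (a flat list of developer/artifact touch events); the authors' pipeline additionally windows events by time period and varies the artifact granularity.

```python
from collections import defaultdict
from itertools import combinations

def collaboration_graph(touch_events):
    """Build a weighted developer collaboration graph from
    (developer, artifact) touch events: two developers gain one unit of
    edge weight per artifact they both modified."""
    touched = defaultdict(set)
    for developer, artifact in touch_events:
        touched[artifact].add(developer)
    edges = defaultdict(int)
    for developers in touched.values():
        for a, b in combinations(sorted(developers), 2):
            edges[(a, b)] += 1
    return dict(edges)

log = [("alice", "net.c"), ("bob", "net.c"),
       ("alice", "io.c"), ("carol", "io.c")]
graph = collaboration_graph(log)  # alice collaborates with both bob and carol
```

Switching the artifact from a file path to a (file, function) pair is all it takes to move to the finer granularity the paper studies.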
“Controlled Experiment to Assess a Test-Coverage Visualization: Lesson Learnt”, Alexandre Bergel, Vanessa Peña and Tobias Kuhn
Evaluating a software visualization is a difficult and error-prone activity. In this short paper, we report our experience in conducting a controlled experiment to evaluate Test Blueprint, a visualization for assessing test coverage. Our experiment went through two iterations. The first iteration was unfortunately inconclusive, due to some decisions we took that we now consider mistakes. After revising our experiment, we obtained exploitable results, which also match our intuition.
Session 5: Bugs and Software Quality
“Predicting Software Quality through Network Analysis”, Giulio Concas, Michele Marchesi, Cristina Monni, Matteo Orrù and Roberto Tonelli
We used a complex network approach to study the evolution of a large software system, Eclipse, with the aim of statistically characterizing software defectiveness over time. We studied the software networks associated with several releases of the system, focusing our attention specifically on their community structure, modularity and clustering coefficient. We found that the maximum average defect density is related to two different metrics: the number of detected communities inside a software network and the clustering coefficient. These two relationships both follow a power law, which leads to a linear correlation between the clustering coefficient and the number of communities. These results can be useful for making predictions about the evolution of software systems, especially with respect to their defectiveness.
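For readers unfamiliar with the metric, the local clustering coefficient measures how close a node's neighbourhood is to a clique. A minimal sketch on a toy graph (our own example, not the Eclipse networks from the study):

```python
def clustering_coefficient(adj, node):
    """Local clustering coefficient of `node` in an undirected graph given
    as a dict of adjacency sets: the fraction of pairs of neighbours of
    `node` that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for degree < 2; conventionally 0
    links = sum(1 for a in neighbours for b in neighbours
                if a < b and b in adj[a])
    return 2 * links / (k * (k - 1))

# A small graph: a triangle a-b-c plus a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

Node "a" sits in a closed triangle (coefficient 1.0), while "c" has only one connected pair among its three neighbours (coefficient 1/3).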
“Null Check Analysis”, Haidar Osman
Null dereferencing is one of the most frequent bugs in Java systems, causing programs to crash due to an uncaught NullPointerException. Developers often fix this bug by introducing a guard (i.e., a null check) on the potentially-null objects before using them.
In this paper we investigate the null checks in 717 open-source Java systems to understand when and why developers introduce null checks. We find that 35% of the if-statements are null checks. A deeper investigation shows that 71% of the checked-for-null objects are returned from method calls. This indicates that null checks have a serious impact on performance and that developers introduce null checks when they use methods that return null.
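A toy version of the measurement can be sketched with a line-based scan; this is our own regex-based simplification, not the authors' infrastructure, which works on properly parsed Java code.

```python
import re

IF_STATEMENT = re.compile(r"\bif\s*\(")
# Matches guards of the form `if (x != null` or `if (x == null`.
NULL_CHECK = re.compile(r"\bif\s*\(\s*\w+\s*[!=]=\s*null")

def null_check_ratio(java_source):
    """Fraction of if-statements that are simple null checks.
    A toy scan; a real analysis would parse the code."""
    ifs = len(IF_STATEMENT.findall(java_source))
    checks = len(NULL_CHECK.findall(java_source))
    return checks / ifs if ifs else 0.0

src = """
String s = lookup(key);
if (s != null) { use(s); }
if (flag) { other(); }
"""
ratio = null_check_ratio(src)  # one of the two if-statements is a null check
```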
“Detecting Violations of CSS Code Conventions”, Boryana Goncharenko
Code conventions are used to preserve code base consistency and express a preference for a particular programming style. Often, code conventions are described in natural language and developers need to apply them manually. Existing tools typically offer a predefined set of rules that cannot be customized. This project aims at allowing CSS developers to express an arbitrary set of code conventions and detect their violations automatically. The solution requires designing a domain-specific language capable of expressing existing conventions and implementing its interpreter to automatically detect their violations.
“Design by Contract and Modular Reasoning in Aspect-Oriented Languages”, Tim Molderez and Dirk Janssens
Aspect-oriented programming aims to separate crosscutting concerns into their own modules, called aspects. While aspects can achieve this at a syntactic level, this is done at the expense of modular reasoning: Whenever a method call is made, all aspects should be inspected to determine whether or not that call's behaviour will be affected by an aspect. To restore modular reasoning, we present a two-part approach that does not affect the programming language itself, but rather governs how to write contracts for aspect-oriented programs.
Session 6: Empirical Studies and Industrial Experience
“An empirical study of identical function clones in CRAN”, Maëlick Claes
Code clone analysis is a very active subject of study, and research on inter-project code clones is starting to emerge. This talk presents an empirical study of identical function clones in the CRAN package archive network, in order to understand the extent of this practice in the R community. Depending on too many packages may hamper maintainability, as unexpected conflicts may arise during package updates, but duplicating functions from other packages may also reduce package maintainability. We study how the characteristics of cloned functions in CRAN snapshots evolve over time, and classify these clones depending on what has prevented package developers from relying on dependencies instead.
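Identical-clone detection across packages can be sketched by hashing normalized function bodies; functions sharing a digest form a clone group. This is a minimal sketch under our own data layout (a mapping from (package, name) to source text), not the tooling used in the study.

```python
import hashlib
import re

def normalize(body):
    """Collapse whitespace so formatting differences do not hide
    otherwise identical function bodies."""
    return re.sub(r"\s+", " ", body).strip()

def identical_clones(functions):
    """functions: mapping (package, name) -> function source.
    Returns groups of keys whose normalized bodies are identical."""
    by_digest = {}
    for key, body in functions.items():
        digest = hashlib.sha1(normalize(body).encode()).hexdigest()
        by_digest.setdefault(digest, []).append(key)
    return sorted(sorted(g) for g in by_digest.values() if len(g) > 1)

functions = {
    ("pkgA", "f"): "function(x) {\n  x + 1\n}",
    ("pkgB", "f"): "function(x) { x + 1 }",
    ("pkgC", "g"): "function(x) { x * 2 }",
}
clones = identical_clones(functions)  # pkgA and pkgB share an identical body
```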
“Driving the Evolution of Cloud Software towards Energy Awareness”, Christophe Ponsard, Jean-Christophe Deprez and Dimitri Durieux
ICT energy efficiency is a growing concern. A large effort has already been spent on making hardware energy-aware and improving hardware energy efficiency; further progress in this area requires evolving the software layer towards more energy awareness. Although specific work is devoted to areas like embedded/mobile systems, much remains to be done at the software level for Cloud applications. Software developers need an energy-aware Cloud infrastructure as well as code development support to make informed decisions about energy efficiency and trade-offs with other important non-functional requirements like performance. This evolution towards energy awareness impacts a number of artifacts including requirements, design, code and tests.
In the scope of this paper, we focus on the evolution of the analysis phase (requirements, design and test load specification) with the limited goal of enabling the collection of energy measurement data that can be used for design-time evolution and will later enable dynamic adaptation scenarios. To help Cloud application developers, we propose a framework composed of (1) a Goal-Question-Metric analysis of energy goals, (2) a UML profile relating energy requirements and associated KPI metrics to application design and deployment elements, and (3) an automated Cloud deployment of energy probes able to monitor those KPIs and aggregate them back to questions and goals.
“On the use of Java database frameworks in Java projects – A large-scale historical empirical analysis”, Mathieu Goeminne
More and more software systems are becoming data-intensive and require intensive access to a database. In this paper, we empirically study a large corpus of data-intensive Java projects as well as the database frameworks used in these projects to manage the connection between the source code and the database. We also study how this usage evolves over time. In particular, we study whether certain database frameworks are used simultaneously and whether some database frameworks get replaced over time by others. Using the statistical technique of survival analysis, we also study the survival of database frameworks in the Java projects in which they are used, as well as the survival of these projects themselves.
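Survival analysis handles the fact that some frameworks are still in use at the end of the observation period (censored data). The standard estimator for this is the Kaplan-Meier curve; below is a minimal sketch on hypothetical durations, not data from the study.

```python
def kaplan_meier(durations, dropped):
    """Kaplan-Meier survival estimate.
    durations: how long each framework adoption lasted (e.g. in years);
    dropped: True if the framework was eventually removed (an event),
    False if it was still in use at the end of observation (censored)."""
    event_times = sorted({t for t, d in zip(durations, dropped) if d})
    curve, s = [], 1.0
    for t in event_times:
        at_risk = sum(1 for u in durations if u >= t)
        events = sum(1 for u, d in zip(durations, dropped) if u == t and d)
        s *= 1 - events / at_risk  # survival only drops at event times
        curve.append((t, s))
    return curve

# Hypothetical durations for five framework adoptions, two of them censored.
durations = [2, 3, 3, 5, 7]
dropped = [True, True, False, True, False]
curve = kaplan_meier(durations, dropped)
```

Censored observations still count in the at-risk denominator until their duration, which is what distinguishes this estimator from naively averaging lifetimes.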
“Mutation Testing: An Industrial Experiment”, Ali Parsai, Quinten David Soetens and Serge Demeyer
To assess the ability of a test suite to catch faults, an accurate quality metric is needed. Mutation testing is a reliable and repeatable approach that provides such a metric. However, because of its computationally intensive nature and the difficulties in applying such a technique to complex systems, it has not been widely adopted in industry. This study aims to determine whether the information gathered by this method is worth the performance costs in an industrial case.
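The core loop of mutation testing is simple: systematically inject small faults (mutants) and count how many the test suite detects (kills). A minimal sketch with a single toy mutation operator, flipping `+` into `-` (the operator set and all names here are illustrative, not the tooling used in the study):

```python
def mutants(source):
    """Generate mutants by flipping each '+' into '-',
    a single toy mutation operator."""
    for i, ch in enumerate(source):
        if ch == "+":
            yield source[:i] + "-" + source[i + 1:]

def mutation_score(source, suite):
    """Fraction of mutants killed (i.e. detected) by the test suite."""
    killed = total = 0
    for mutant in mutants(source):
        total += 1
        namespace = {}
        exec(mutant, namespace)          # load the mutated program
        if not suite(namespace):         # the suite fails => mutant killed
            killed += 1
    return killed / total if total else 0.0

source = "def add(a, b):\n    return a + b"

def suite(namespace):
    """A one-assertion test suite; returns True when all tests pass."""
    return namespace["add"](2, 3) == 5

score = mutation_score(source, suite)  # the single mutant is killed
```

The computational cost the abstract mentions is visible even here: every mutant requires re-running the whole test suite, which is what makes the technique expensive on large industrial systems.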