Yale PL Day 2015
The Yale PL Day will take place on Wednesday, November 04, 2015, at the Yale Computer Science Department. The address is 51 Prospect Street, New Haven, CT.
The morning talks will take place in room AKW 000 (the basement or ground floor of the computer science building, depending on which doors you entered through), while the afternoon talks will take place in AKW 400, also in the computer science building.
Everyone is invited to attend.
|9:15 - 9:30||Welcome and introductions|
|9:30 - 9:50||Norman Danner (Wesleyan): Provably correct extraction of cost recurrences from higher-order programs|
|10:30 - 11:00||Zvonimir Pavlinovic (NYU): (Efficiently) Finding Minimum Type Error Sources|
|11:00 - 11:30||Quentin Carbonneaux (Yale): Invariant logics and their theory|
|BREAK||hike to East Rock Park, lunch at Nica's|
|2:00 - 2:30||Philippe Suter (IBM): ActiveSheets: Stream Processing with a Spreadsheet|
|2:30 - 3:00||Ronghui Gu (Yale): Deep specifications and certified abstraction layers|
|3:00 - 3:30||Mikaël Mayer (EPFL): Disambiguation for Programming by Example|
|3:30 - 4:00||Break|
|4:00 - 4:30||Siddharth Krishna (NYU): Learning to Verify the Heap|
|4:30 - 5:00||Eric Koskinen (Yale): Reducing Crash Recoverability to Reachability|
|5:00 - 6:00||Damien Zufferey (MIT): PSync: A partially synchronous language for fault-tolerant distributed algorithms|
Damien Zufferey (MIT): PSync: A partially synchronous language for fault-tolerant distributed algorithms
Fault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and the occurrence of faults, such as the network dropping messages or computers crashing.
We introduce PSync, a domain-specific language based on the Heard-Of model, which views asynchronous faulty systems as synchronous ones with an adversarial environment that simulates asynchrony and faults by dropping messages. We define a runtime system for PSync that efficiently executes on asynchronous networks. We formalise the relation between the runtime system and PSync in terms of observational refinement. The high-level synchronous abstraction introduced by PSync simplifies the design and implementation of fault-tolerant distributed algorithms and enables automated formal verification.
We have implemented an embedding of PSync in the Scala programming language with a runtime system for partially synchronous networks. We show the applicability of PSync by implementing several important fault-tolerant distributed algorithms and we compare the implementation of consensus algorithms in PSync against implementations in other languages in terms of code size, runtime efficiency, and verification.
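The Heard-Of view can be made concrete with a small simulation (a hypothetical toy model in Python, not PSync's actual API): processes execute in lockstep rounds, and in each round an adversarial environment chooses which messages every process "hears of".

```python
import random

def ho_round(states, ho_sets):
    """One lockstep round: every process broadcasts its value, then
    updates its state from the subset of messages it 'heard of'."""
    msgs = dict(states)                               # broadcast phase
    new_states = {}
    for p, heard in ho_sets.items():
        received = [msgs[q] for q in heard]           # adversary chose `heard`
        new_states[p] = min(received + [states[p]])   # update: keep the min seen
    return new_states

def run(states, rounds, drop, seed=0):
    """The adversary drops each message independently with probability `drop`."""
    rng = random.Random(seed)
    procs = list(states)
    for _ in range(rounds):
        ho = {p: {q for q in procs if rng.random() >= drop} for p in procs}
        states = ho_round(states, ho)
    return states

# With a lossless round, everyone learns the global minimum immediately.
print(run({"p1": 3, "p2": 1, "p3": 2}, rounds=1, drop=0.0))  # {'p1': 1, 'p2': 1, 'p3': 1}
```

With `drop > 0` the same code models a lossy network, which is exactly the setting in which the synchronous abstraction pays off.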
This is joint work with Cezara Dragoi and Tom Henzinger.
Damien Zufferey has been a postdoctoral researcher in Martin Rinard's group at MIT CSAIL since October 2013. Before moving to MIT, he obtained a PhD at the Institute of Science and Technology Austria (IST Austria) under the supervision of Thomas A. Henzinger in September 2013, and a Master's in computer science from EPFL in 2009. He is interested in improving software reliability by developing theoretical models, building analysis tools, and giving the programmer the appropriate language constructs. He is particularly interested in the study of complex concurrent systems. His research lies at the intersection of formal methods and programming languages.
Time: 45-minute talk.
Mikaël Mayer (EPFL): Disambiguation for Programming by Example
Programming by Examples (PBE) has the potential to revolutionize end-user programming by enabling end users, most of whom are non-programmers, to create small scripts for automating repetitive tasks. However, examples, though often easy to provide, are an ambiguous specification of the user's intent. Because of that, a key impediment to the adoption of PBE systems is the lack of user confidence in the correctness of the program that was synthesized by the system. We present two novel user interaction models that communicate actionable information to the user to help resolve ambiguity in the examples. One of these models allows the user to effectively navigate between the huge set of programs that are consistent with the examples provided by the user. The other model uses active learning to ask directed example-based questions to the user on the test input data over which the user intends to run the synthesized program. Our user studies show that each of these models significantly reduces the number of errors in the performed task without any difference in completion time. Moreover, both models are perceived as useful, and the proactive active-learning based model has a slightly higher preference regarding the users' confidence in the result.
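The active-learning interaction can be sketched in a few lines (a hypothetical toy instance, not the actual system): given candidate programs that all satisfy the examples, the system looks for a test input on which they disagree and asks the user about that one.

```python
# Two candidate programs, both consistent with the single example
# "John Doe" -> "J. Doe" (names and transformations are made up).
candidates = {
    "initial + last word": lambda s: s.split()[0][0] + ". " + s.split()[-1],
    "initial + remainder": lambda s: s.split()[0][0] + ". " + " ".join(s.split()[1:]),
}

examples = [("John Doe", "J. Doe")]
assert all(all(p(i) == o for i, o in examples) for p in candidates.values())

def distinguishing_input(cands, test_inputs):
    """Active-learning step: find a test input on which the candidates
    disagree, so a single user answer can rule some of them out."""
    for x in test_inputs:
        outs = {name: p(x) for name, p in cands.items()}
        if len(set(outs.values())) > 1:
            return x, outs
    return None, None

# On "Jane Roe" the candidates still agree; "Mary Jane Watson" separates them.
x, outs = distinguishing_input(candidates, ["Jane Roe", "Mary Jane Watson"])
assert x == "Mary Jane Watson"
```

The user's answer on the distinguishing input eliminates every candidate that produced a different output.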
Mikaël is a fourth year PhD student at EPFL (Lausanne, Switzerland), working under the supervision of Viktor Kuncak.
Mikaël wants to reduce the gap between user intent and programming. During his Master's he developed Comfusy, a tool that synthesizes programs from arithmetic constraints. Although it was complete, he realized that most user intentions are not mathematical at all. He then started his PhD with Pong Designer, a game engine by demonstration on Android, in which users could create Pacman or brick-breaker games using touch and demonstration. Later he discovered, as many have, that demonstrations and examples are inherently ambiguous, so during an extended internship at Microsoft he found ways to resolve these ambiguities, which he is going to present.
Mikaël loves music. He plays the piano, accordion, and ukulele, and has written songs for birthday parties, symphonies for home-made 3D animations, and much more. Mikaël loves math. With a friend, he designed www.reflex4you.com, a website linked to Zazzle that lets people offer gifts with beautiful customized math art on them.
Mikaël loves education. With Upbraining©, he developed Learn to Write, an Android application for pre-writing with fun challenges for kids.
Time: 25-minute talk, 5 minutes of questions.
Our analysis collects execution traces of the program before and after transformation, annotates the traces with state information, and then runs a matching algorithm that checks that the traces are observationally equivalent. We define observational equivalence as ``same sequence of calls to library functions with the same arguments'' and show that this is appropriate under a few assumptions about limited introspection.
We demonstrate the effectiveness of our testing methodology on a commercial implementation of this transformation.
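The matching described above can be sketched as follows (a minimal Python toy, with a made-up `write` standing in for an arbitrary library function): instrument the library boundary, record each call with its arguments, and compare the resulting sequences.

```python
import functools

TRACE = []

def traced(fn):
    """Record every call to a 'library function' with its arguments."""
    @functools.wraps(fn)
    def wrapper(*args):
        TRACE.append((fn.__name__, args))
        return fn(*args)
    return wrapper

@traced
def write(s):                  # stand-in for a real library call
    pass

def original(x):
    write(x * 2)

def transformed(x):            # e.g. after a semantics-preserving rewrite
    y = x << 1                 # same observable behaviour, different internals
    write(y)

def trace_of(prog, x):
    TRACE.clear()
    prog(x)
    return list(TRACE)

# Matching succeeds: both programs call write(6), in the same order.
assert trace_of(original, 3) == trace_of(transformed, 3)
```

The "limited introspection" assumption matters here: if the program could inspect its own code, two internally different versions could be told apart even with identical library-call traces.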
Time: 40-minute talk.
Ronghui Gu (Yale): Deep specifications and certified abstraction layers
Modern computer systems consist of a multitude of abstraction layers (e.g., OS kernels, hypervisors, device drivers, network protocols), each of which defines an interface that hides the implementation details of a particular set of functionality. Client programs built on top of each layer can be understood solely based on the interface, independent of the layer implementation. Despite their obvious importance, abstraction layers have mostly been treated as a system concept; they have almost never been formally specified or verified. This makes it difficult to establish strong correctness properties, and to scale program verification across multiple layers.
In this talk, I will present a novel language-based account of abstraction layers and show that they correspond to a strong form of abstraction over a particularly rich class of specifications which we call deep specifications. Just as data abstraction in typed functional languages leads to the important representation independence property, abstraction over deep specification is characterized by an important implementation independence property: any two implementations of the same deep specification must have contextually equivalent behaviors. We present a new layer calculus showing how to formally specify, program, verify, and compose abstraction layers. We show how to instantiate the layer calculus in realistic programming languages such as C and assembly, and how to adapt the CompCert verified compiler to compile certified C layers such that they can be linked with assembly layers. Using these new languages and tools, we have successfully developed multiple certified OS kernels in the Coq proof assistant, the most realistic of which consists of 37 abstraction layers, took less than one person year to develop, and can boot a version of Linux as a guest.
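Implementation independence has a familiar miniature analogue (an analogy in Python, not the layer calculus itself): when a specification pins down behaviour completely, any two implementations of it are interchangeable in every context.

```python
# A 'deep spec' of sum-of-0..n: it determines the result for every input.
spec = lambda n: n * (n + 1) // 2

def impl_loop(n):                          # implementation 1: iterative
    total = 0
    for i in range(n + 1):
        total += i
    return total

def impl_formula(n):                       # implementation 2: closed form
    return n * (n + 1) // 2

# 'Contexts' are arbitrary client programs built on top of the interface.
contexts = [lambda f: f(10), lambda f: f(0) + f(5), lambda f: f(f(3))]

# Implementation independence: no context can distinguish the two.
for ctx in contexts:
    assert ctx(impl_loop) == ctx(impl_formula) == ctx(spec)
```

A shallow specification (say, "the result is nonnegative") would not give this guarantee, which is why the layered verification approach insists on deep ones.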
Ronghui Gu is a fifth year Ph.D. student in the Yale FLINT group, supervised by Prof. Zhong Shao. His work focuses on building reliable and secure low-level systems. During the past few years, he and his Yale colleagues built CertiKOS, the first fully verified hypervisor. Ronghui earned his B.S. at Tsinghua University, where his undergraduate thesis was about verifying the preemptive scheduler of uC/OS-II.
Time: 30-minute talk.
Quentin Carbonneaux (Yale): Invariant logics and their theory
I will present very recent work on a new semantics for Hoare logic. Instead of using Hoare logic to prove functional correctness, I recycle it to prove invariants that hold throughout the entire execution of a program. Preconditions are put front and center: they characterize the initial states from which the program executes safely (i.e., without breaking the invariant). Postconditions now simply serve compositionality and enable modular reasoning. I will show, on a trivial language, how we define semantic validity of Hoare triples in this setting and how soundness and completeness of the logic are proved. Example applications of the logic include the usual 'this program does not get stuck', but also the more interesting 'this program does not run out of memory'.
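One plausible shape for such a validity definition (an illustration under my own assumptions, not necessarily the talk's exact formulation): a triple is valid when every execution from a P-state preserves a global invariant I, while Q only serves to chain with the next command.

```latex
% Validity of a triple relative to a global invariant I (sketch):
% safety along every intermediate state, plus Q at termination for composition.
\models_I \{P\}\; c\; \{Q\}
  \quad\text{iff}\quad
  \forall \sigma \models P.\;
    \big(\forall \sigma''.\ \sigma \to_c^{*} \sigma'' \implies \sigma'' \models I\big)
    \;\wedge\;
    \big(\forall \sigma'.\ \sigma \Downarrow_c \sigma' \implies \sigma' \models Q\big)
```

Instantiating I as "the heap fits in memory" yields the 'does not run out of memory' application mentioned above.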
Quentin is a fourth year PhD student working under Zhong Shao's supervision at Yale. He works on tools and techniques for formal reasoning about the resource usage of computer programs. He does not like to talk about himself in the third person.
Time: 30-minute talk.
Norman Danner (Wesleyan): Provably correct extraction of cost recurrences from higher-order programs
A central method for analyzing the asymptotic complexity of a functional program is to extract and then solve a recurrence that expresses evaluation cost in terms of input size. The relevant notion of input size is often specific to a datatype, with measures including the length of a list, the maximum element in a list, and the height of a tree. In this work, we give a formal account of the extraction of cost and size recurrences from higher-order functional programs over inductive datatypes. Our approach allows a wide range of programmer-specified notions of size, and ensures that the extracted recurrences correctly predict evaluation cost. To extract a recurrence from a program, we first make costs explicit by applying a monadic translation from the source language to a complexity language, and then abstract datatype values as sizes. Size abstraction can be done semantically, working in models of the complexity language, or syntactically, by adding rules to a preorder judgement. Different models of the complexity language encode different notions of size, which in turn permit different kinds of cost analyses. Underlying the process is a logical relations argument that the extracted recurrences are upper bounds for evaluation cost; the proof is entirely syntactic and therefore applies to all of the models we consider.
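The monadic translation can be illustrated concretely (a toy writer-monad sketch in Python, not the paper's formal calculus): every evaluation step yields a (cost, value) pair, sequencing adds costs, and abstracting the list to its length yields the expected recurrence.

```python
# Writer-monad primitives over (cost, value) pairs.
def ret(v):
    return (0, v)

def bind(m, f):
    c1, v = m
    c2, r = f(v)
    return (c1 + c2, r)           # sequencing adds costs

def tick(m):
    """Charge one unit for a step (e.g. a match / recursive call)."""
    c, v = m
    return (c + 1, v)

def length_c(xs):
    """Cost-annotated translation of: length [] = 0; length (x:xs) = 1 + length xs."""
    if not xs:
        return tick(ret(0))
    return tick(bind(length_c(xs[1:]), lambda n: ret(n + 1)))

# Abstracting the list to its size n, the cost obeys T(0) = 1, T(n) = 1 + T(n-1),
# i.e. T(n) = n + 1 -- the recurrence one would extract by hand.
cost, value = length_c([7, 8, 9])
assert (cost, value) == (4, 3)
```

The size abstraction step of the paper replaces the concrete list `xs` with a chosen measure (length, maximum, tree height, ...), turning `length_c` into a recurrence over that measure.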
Norman Danner is an Associate Professor of Computer Science at Wesleyan University, where he has been since 2002. His earlier work in implicit computational complexity has led him to be interested in bringing tools for reasoning about correctness of programs to the domain of reasoning about complexity.
Time: 20-minute talk.
Siddharth Krishna (NYU): Learning to Verify the Heap
We present a novel data-driven verification framework able to prove memory safety and functional correctness of heap-manipulating programs. For this, we first use a new algorithm based on statistical machine learning to map observed program states (from runs on random inputs or from a test suite) to separation logic formulae representing the invariant shape of data structures at relevant program locations. We then attempt to verify these predictions using a theorem prover, where counterexamples to the proposed invariants are used as additional input for the shape predictor in a refinement loop. Once valid shape invariants have been obtained, we use a second learning algorithm to strengthen them with data invariants, again employing a refinement loop using the underlying theorem prover.
We have implemented our techniques in Cricket, an extension of the GRASShopper verification tool. Cricket is able to automatically prove memory safety and correctness of a variety of classical heap-manipulating algorithms beyond the reach of current tools.
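The refinement loop has the following schematic shape (deliberately simplified Python: an interval invariant stands in for a separation-logic shape formula, and brute-force enumeration stands in for the theorem prover):

```python
def learn(samples):
    """'Learner': propose the tightest interval invariant covering the samples."""
    lo, hi = min(samples), max(samples)
    return (lambda x: lo <= x <= hi), (lo, hi)

def check(inv, reachable):
    """'Prover': return a counterexample state, or None if the invariant holds."""
    for s in reachable:
        if not inv(s):
            return s
    return None

reachable = list(range(0, 10))        # all states of some tiny loop
samples = [2, 5, 7]                   # states observed on random test runs

while True:
    inv, box = learn(samples)
    cex = check(inv, reachable)
    if cex is None:
        break
    samples.append(cex)               # refinement: feed the cex back to the learner

assert box == (0, 9)                  # converged: invariant covers all states
```

In the real system, the learner is a statistical predictor of separation-logic formulae and the checker is GRASShopper's prover, but the guess-check-refine structure is the same.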
Siddharth Krishna is a third year PhD student at NYU Courant, working with Thomas Wies. He is interested in applications of machine learning in formal methods, and has worked on ML algorithms to learn invariants in program verification.
Time: 30-minute talk.
Zvonimir Pavlinovic (NYU): (Efficiently) Finding Minimum Type Error Sources
Compilers for statically typed functional programming languages are notorious for generating confusing type error messages. When the compiler detects a type error, it typically reports the program location where the type checking failed as the source of the error. Since other error sources are not even considered, the actual root cause is often missed. A more adequate approach is to consider all possible error sources and report the most useful one subject to some usefulness criterion. We show that this approach can be formulated as an optimization problem related to satisfiability modulo theories (SMT). This formulation cleanly separates the heuristic nature of usefulness criteria from the underlying search problem. Unfortunately, algorithms that search for an optimal error source cannot directly use principal types, which are crucial for dealing with the exponential-time complexity of the decision problem of polymorphic type checking. As a solution to this problem, we also present a new algorithm that efficiently finds an optimal error source in a given ill-typed program. Our algorithm uses an improved SMT encoding to cope with the high complexity of polymorphic typing by iteratively expanding the typing constraints from which principal types are derived. The algorithm preserves the clean separation between the heuristics and the actual search. We have implemented our algorithm for OCaml. In our experimental evaluation, we found that the algorithm reduces the running times for optimal type error localization from minutes to seconds, scaling better than previous localization algorithms, and has the potential to significantly improve the quality of type error reports produced by state-of-the-art compilers.
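The search problem can be illustrated at toy scale (hypothetical Python, with brute-force enumeration standing in for the weighted SMT search): find the smallest set of typing constraints whose removal leaves the rest satisfiable.

```python
from itertools import combinations

def satisfiable(constraints):
    """Tiny unifier: type variables start with a quote, everything else is a
    ground type. Returns True iff the equalities are simultaneously solvable."""
    binding = {}
    def resolve(t):
        while t in binding:
            t = binding[t]
        return t
    for a, b in constraints:
        a, b = resolve(a), resolve(b)
        if a == b:
            continue
        if a.startswith("'"):
            binding[a] = b
        elif b.startswith("'"):
            binding[b] = a
        else:
            return False          # two distinct ground types forced equal
    return True

def minimum_error_source(constraints):
    """Smallest set of constraints whose removal makes the rest typable
    (each candidate set is a potential error source to report)."""
    for k in range(len(constraints) + 1):
        for dropped in combinations(range(len(constraints)), k):
            rest = [c for i, c in enumerate(constraints) if i not in dropped]
            if satisfiable(rest):
                return [constraints[i] for i in dropped]

# e.g. for  let x = 1 in if x then 0 else x :
#   x = int (from the binding), x = bool (from the condition), result = int
constraints = [("'x", "int"), ("'x", "bool"), ("'r", "int")]
assert minimum_error_source(constraints) in ([("'x", "int")], [("'x", "bool")])
```

The paper's usefulness criterion corresponds to assigning weights to constraints; here every constraint has weight one, so "minimum" simply means fewest dropped.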
Zvonimir Pavlinovic is a third year PhD student at New York University supervised by Thomas Wies. Zvonimir is interested in automated debugging and repair of program errors, currently focusing on the problem of automatic localization of static type errors. His other research interests include program verification and type systems. In his very limited free time, Zvonimir likes to play soccer.
Time: 30-minute talk.
Philippe Suter (IBM): ActiveSheets: Stream Processing with a Spreadsheet
Continuous data streams are ubiquitous in finance, healthcare, engineering, etc., and it is often crucial for them to be analyzed in real-time. Stream processing is a programming paradigm that addresses this challenge. Our objective is to make it easier for analysts, with little programming experience, to develop continuous analytics applications directly. We propose enhancing a spreadsheet, a pervasive and familiar tool, to obtain a programming platform for stream processing. We present the design and implementation of an enhanced spreadsheet that enables visualizing live streams, live programming to compute new streams, and exporting computations to be run on a server where they can be shared with other users, and persisted beyond the life of the spreadsheet.
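The core idea, spreadsheet cells bound to live streams, can be sketched as follows (a toy Python model; `StreamCell` and its windowing policy are made up, not ActiveSheets' actual design):

```python
from collections import deque

class StreamCell:
    """A spreadsheet cell bound to a live stream: it holds the last `window`
    values, and dependent formula cells recompute on every tick."""
    def __init__(self, window):
        self.values = deque(maxlen=window)
        self.formulas = []                     # dependent cells
    def on_tick(self, v):
        self.values.append(v)
        return [f(list(self.values)) for f in self.formulas]

prices = StreamCell(window=3)
prices.formulas.append(lambda vs: sum(vs) / len(vs))   # like =AVERAGE(A1:A3), live

outputs = [prices.on_tick(p) for p in [10, 20, 30, 40]]
assert outputs[-1] == [30.0]    # average of the last three ticks: (20+30+40)/3
```

Exporting a sheet, in this analogy, amounts to shipping the formula closures to a server that keeps applying them as the stream continues.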
Philippe Suter is a Research Staff Member at IBM Research in Yorktown Heights, NY, working on programming languages. His interests are around making future programming environments more powerful for expert users and easier to approach for novices. Until recently, he believed ukuleles were always tuned similarly to the first four strings of a guitar.
Time: 25-minute talk.
Eric Koskinen (Yale): Reducing Crash Recoverability to Reachability
Software applications run on a wide variety of platforms (filesystems, virtual slices, mobile hardware, etc.) that do not provide 100% uptime. As such, these applications may crash at any unfortunate moment, losing volatile data, and, when re-launched, they must be able to correctly recover from potentially inconsistent states left on disk. From a verification perspective, crash recovery bugs can be particularly frustrating because, even when it has been formally proved that a program C satisfies property P, the proof is foiled by these external events that crash and restart the program.
I will discuss our recent work on this problem. First, we provide a hierarchical formal model of what it means for a program to be crash recoverable. Intuitively, we say that a program correctly recovers from a crash if the re-execution after the crash results in the same behaviors as an uncrashed execution from one of the initial states. A program is *crash recoverable* if it correctly recovers from all possible crashes. This specification captures the crash recoverability of many real world programs, including those in our evaluation.
Next, I will discuss a novel technique capable of automatically proving that a program correctly recovers from a crash via a reduction to reachability. Our technique takes an input control-flow automaton and transforms it into a novel encoding that blends the capture of snapshots of pre-crash states into a symbolic search for a proof that (i) recovery terminates and (ii) every recovered execution simulates some crash-free execution.
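At toy scale, the property being checked looks like this (a brute-force Python model of a two-step commit, not the Eleven82 encoding): enumerate crashed executions, re-run recovery, and check that each recovered outcome is one an uncrashed run could also produce.

```python
def program(disk):
    """Write a record in two steps: data first, then a commit flag."""
    steps = [lambda d: {**d, "data": 42},
             lambda d: {**d, "committed": True}]
    for s in steps:
        disk = s(disk)
        yield disk                      # a crash may happen after any step

def recover(disk):
    """Roll back a torn write: without the commit flag, discard the data."""
    if not disk.get("committed"):
        return {}
    return disk

init = {}
crash_free = {frozenset(list(program(init))[-1].items())}

recoverable = True
for crashed_disk in program(init):                       # crash after each step
    rerun = list(program(recover(crashed_disk)))[-1]     # recover, then re-launch
    recoverable = recoverable and frozenset(rerun.items()) in crash_free

assert recoverable     # every recovered run simulates some crash-free run
```

Dropping the `recover` rollback (returning `disk` unchanged) would still pass this tiny check only because re-running overwrites the torn state; with a less idempotent program, the torn `{"data": 42}` state would surface as a recoverability bug, which is what the reachability query detects.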
Finally, I will describe our implementation in a tool called Eleven82, capable of analyzing C/C++ programs to detect recoverability bugs or prove their absence. We have applied our tool to benchmark examples drawn from industrial file systems and databases, including GDBM, LevelDB, LMDB, PostgreSQL, SQLite, VMware and ZooKeeper.
[This is joint work with Junfeng Yang. To appear in POPL 2016.]
Eric Koskinen is a Research Scientist at Yale University. In recent years he has been a Visiting Professor at New York University, and a Research Staff Member at IBM TJ Watson. He received his Ph.D. in Computer Science from the University of Cambridge. Previously, he was a software engineer at Amazon.com. Eric's research centers on devising mathematical techniques that aid the development of safe concurrent software, and applying those techniques to realistic computer systems. To this end, Eric has made advances along a spectrum of fields, ranging from systems/concurrency to foundational results in formal methods.