Run your Java applications on Big Data Frameworks

Casper is a compiler that can automatically retarget sequential Java programs to Big Data processing frameworks such as Spark, Hadoop or Flink.

MapReduce is a popular programming paradigm for running large-scale, data-intensive computations. Recently, many frameworks that implement this paradigm have been developed. To leverage such frameworks, however, developers need to familiarize themselves with each framework’s API and rewrite their code. We present Casper, a new tool that automatically translates sequential Java programs to the MapReduce paradigm. Casper automatically identifies candidate code fragments to rewrite and translates them in two steps. First, Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each fragment. The summary is expressed in a high-level intermediate language resembling the MapReduce paradigm. Next, each summary found is verified to be semantically equivalent to the original fragment using a theorem prover. Casper then generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We have evaluated Casper by automatically converting real-world sequential Java benchmarks to MapReduce. The resulting benchmarks run up to 32.2x faster than the original.
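To make the idea of a MapReduce-style summary concrete, here is a minimal sketch of a sequential loop next to a map/reduce-shaped equivalent. It is an illustration only, not Casper's intermediate language or its generated output: the class name `WordCountSketch` and both method names are hypothetical, and plain Java streams stand in for the Hadoop, Spark, or Flink APIs so the example runs without any framework.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration; not produced by Casper.
public class WordCountSketch {

    // Sequential fragment: a loop that updates a shared accumulator,
    // the kind of code the text describes as input.
    public static Map<String, Integer> sequentialCount(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // Map/reduce-shaped equivalent: map each word to the pair (word, 1),
    // then reduce values that share a key with addition.
    // Java streams are an assumption made here to keep the sketch self-contained.
    public static Map<String, Integer> mapReduceCount(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        // Both versions should print the same counts.
        System.out.println(sequentialCount(words));
        System.out.println(mapReduceCount(words));
    }
}
```

The second method has the shape that a distributed framework can parallelize: the map step is independent per element, and the reduce step uses an associative, commutative operator. Checking that the two methods agree on all inputs, rather than on one test list, is the role the text assigns to the theorem prover.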

