
Facebook’s TransCoder AI converts code from one programming language into another

Facebook researchers say they’ve developed what they call a neural transcompiler, a system that converts code from one high-level programming language, such as C++, Java, or Python, into another. It’s unsupervised, meaning it looks for previously undetected patterns in data sets without labels and with a minimal amount of human supervision, and it reportedly outperforms rule-based baselines by a “significant” margin.

Migrating an existing codebase to a modern or more efficient language like Java or C++ requires expertise in both the source and target languages, and it’s often costly. For example, the Commonwealth Bank of Australia spent around $750 million over the course of five years to convert its platform from COBOL to Java. Transcompilers could help in theory, since they eliminate the need to rewrite code from scratch, but they’re difficult to build in practice because languages differ in syntax and rely on distinctive platform APIs, standard-library functions, and variable types.

Facebook’s system, TransCoder, which can translate between C++, Java, and Python, tackles the challenge with an unsupervised learning approach. TransCoder is first initialized with cross-lingual language model pretraining, which maps pieces of code expressing the same instructions to identical representations regardless of programming language. (Tokens in the input source code sequences are randomly masked out, and TransCoder is tasked with predicting the masked-out portions based on context.) A process called denoising auto-encoding then trains the system to generate valid sequences even when fed noisy input data, and back-translation allows TransCoder to generate parallel data that can be used for training.
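
To make the masking and denoising steps more concrete, here is a minimal Python sketch of how training inputs for the two objectives might be prepared. It is an illustration under stated assumptions, not TransCoder’s actual code: the `<mask>` placeholder, the function names `mask_tokens` and `add_noise`, the whitespace tokenization, and every probability value are hypothetical choices made for this example.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, not taken from the article


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Masked-language-model style input: hide random tokens.

    Returns the masked sequence plus a dict {position: original token}
    that a model would be trained to predict from the context.
    """
    if rng is None:
        rng = random.Random()
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


def add_noise(tokens, drop_prob=0.1, shuffle_window=3.0, rng=None):
    """Denoising auto-encoding style input: corrupt a sequence.

    Tokens are randomly dropped, then locally shuffled by sorting on
    (index + random offset). A model would be trained to reconstruct
    the clean sequence from this corrupted one.
    """
    if rng is None:
        rng = random.Random()
    kept = [t for t in tokens if rng.random() >= drop_prob]
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [tok for _, tok in sorted(zip(keys, kept), key=lambda p: p[0])]


if __name__ == "__main__":
    code = "int add ( int a , int b ) { return a + b ; }".split()
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    print(mask_tokens(code, mask_prob=0.3, rng=rng))
    print(add_noise(code, rng=rng))
```

In both cases the original, uncorrupted sequence serves as the training target, which is why no labeled or parallel data is needed at this stage.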

Share of TransCoder’s generations that returned the expected outputs, by translation direction:
C++ to Java: 74.8%
C++ to Python: 67.2%
Java to C++: 91.6%
Java to Python: 68.7%
Python to C++: 57.8%
Python to Java: 56.1%
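
The percentages above count a translation as correct when it returns the expected outputs. A rough Python sketch of such a check follows. It assumes each translated function is available as a Python callable and that a set of test inputs exists; the article does not describe how the inputs were chosen, and the name `computational_accuracy` is a hypothetical label used only for this example.

```python
def computational_accuracy(candidates, reference, test_inputs):
    """Fraction of candidates matching the reference on every test input.

    candidates  -- list of callables (the translated functions)
    reference   -- callable giving the expected output
    test_inputs -- list of argument tuples
    A candidate that raises an exception counts as incorrect.
    """
    def passes(fn):
        for args in test_inputs:
            try:
                if fn(*args) != reference(*args):
                    return False
            except Exception:
                return False
        return True

    if not candidates:
        return 0.0
    return sum(passes(fn) for fn in candidates) / len(candidates)


if __name__ == "__main__":
    def reference(xs):
        return sum(abs(x) for x in xs)

    translations = [
        lambda xs: sum(x if x >= 0 else -x for x in xs),  # correct
        lambda xs: sum(xs),                               # wrong on negatives
        lambda xs: xs[10],                                # crashes
    ]
    inputs = [([1, -2, 3],), ([],), ([-5],)]
    print(computational_accuracy(translations, reference, inputs))
```

A check like this measures behavior rather than textual similarity, so two translations that look different can both count as correct.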


Source: VentureBeat