Tracking atoms of confusion: Brandywine faculty research spotlight


Martin Yeh, assistant professor of information sciences and technology at Brandywine, recently traveled to Cyprus to present his team's research. 

Credit: courtesy of Martin Yeh

MEDIA, Pa. — Martin Yeh, assistant professor of information sciences and technology (IST) at Penn State Brandywine, is working with faculty from New York University and the University of Colorado to change how people understand computer code.

Yeh and his colleagues lead the “Atoms of Confusion” project, funded by the National Science Foundation (NSF). Their work examines the most common disconnects between what programmers intend their code to do and how the computer actually executes it.

“We thought it was strange that people have been writing programs for decades and seem to still have the same security issues,” said Yeh. “We posited that there must be a problem between how the machine processes code commands and how people understand the machine’s responses.”

According to Yeh, differences in how code is interpreted were not the only issue they uncovered. He and his co-researchers also found themselves studying small, self-contained snippets of code that are easy to misread and naturally lead to bugs. They call these snippets “atoms of confusion.”

To pinpoint the most common atoms of confusion, Yeh and his team studied winning entries from the International Obfuscated C Code Contest (IOCCC), an annual competition in which programmers try to write the most deliberately confusing C programs possible.

From the IOCCC entries, the researchers identified 19 code patterns to study as candidate atoms of confusion. This work became the paper “Understanding Misunderstandings in Source Code,” which the team presented in 2017 at the joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering.

“It’s interesting because even with the entries we studied, we found that you can’t trust the code written by experienced developers,” he said. “You might design something to avoid or eliminate atoms of confusion, but some of them are still prevalent in open-source projects, such as MySQL, Vim, Emacs, Httpd, Linux and FreeBSD, and are introduced by further development. We also found that bug fixes and comments are often near the atoms of confusion in the open-source projects we studied.”

Yeh is currently working with a graduate student at Penn State University Park to further explore the human side of atoms of confusion. The project uses electroencephalography, or EEG, to see whether the brain responds differently when a person works with confusing versus nonconfusing code.

The team’s research on atoms of confusion is also drawing international attention. Yeh recently traveled to Cyprus and presented twice on the subject: at the International Symposium on Methodologies for Intelligent Systems and at the Research Centre on Interactive Media, Smart Systems and Emerging Technologies.

“We have really made some fascinating discoveries about the nature of code and where the most common bugs are sourced,” said Yeh. “We have great expectations moving forward.”