ISSTA 2024

| Non-reactive Programs | Reactive Programs |
| --- | --- |
| Synchronous | Event-Driven Asynchronous |

| Patches | Accepted | Rejected | Ignored | Total |
| --- | ---: | ---: | ---: | ---: |
| w/ perf. results | 8 (53.3%) | 2 (13.3%) | 5 (33.3%) | 15 (100%) |
| w/o perf. results | 3 (20.0%) | 4 (26.7%) | 8 (53.3%) | 15 (100%) |

Before I begin, I would like to acknowledge that the main credit for this work goes to Arooba Shahoor, a former student of Dongsun Kim. Arooba has recently graduated, and Dongsun has since moved to Korea University. Alright, that was a brief update on the authors. This is Jooyong Yi from UNIST. I am going to tell you something about ...

I am aware that we are in the Debugging session. This is quite apparent when looking at who is chairing this session.

- Failing test → Passing test

But I am not going to tell you about yet another LLM-based APR tool evaluated on Defects4J, which contains functional bugs. Instead, I am going to ask the following question.

![bg left:40% fit](./img/spa-vs-mpa-architecture.webp)

In fact, we already posed the same question in our previous work presented at ASE 2023, titled "LeakPair: Proactive Repairing of Memory Leaks in Single Page Web Applications." The work was well received and earned a Distinguished Paper Award. In that work, we proposed a technique to fix memory leaks in single-page web applications. In traditional multi-page applications, the browser reloads the entire page whenever the page needs to be updated, so memory leaks are less of a concern. Single-page applications, however, update only the parts of the page that change, so memory leaks can pile up if the program is not written carefully.

In this work, we consider reactive Java programs written using reactive libraries such as Reactor, RxJava, and Vert.x. In traditional programs, say web applications, each user request is typically handled by a separate thread. If a request involves a time-consuming operation such as database access, the corresponding thread is blocked until the operation completes. This wastes computing resources. For example, a new user request might not be processed even though many threads are sitting idle, simply waiting for their blocked operations to finish. Reactive libraries such as Reactor and RxJava were developed to address this issue. Essentially, these libraries support event-driven asynchronous programming. With them, developers can more easily write non-blocking applications that make better use of computing resources.
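The thread-waste problem can be sketched with only the JDK. This is an analogy, not Reactor code. A single-thread executor stands in for a small server thread pool, and an uncompleted `CompletableFuture` stands in for a slow database call. In the blocking variant, the only worker thread parks in `join()`, so a second request cannot run. In the non-blocking variant, the first request only registers a continuation and returns, which leaves the thread free. The class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BlockingVsNonBlocking {

    /**
     * Submits request A (which waits for simulated I/O) and then request B to a pool
     * with a single thread. Returns true if B was served while A's I/O was still pending.
     */
    static boolean secondRequestServedWhileIoPending(boolean nonBlocking) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletableFuture<String> slowIo = new CompletableFuture<>(); // stands in for a DB call
        try {
            if (nonBlocking) {
                // Request A registers a callback and returns; the thread is released at once.
                pool.submit(() -> slowIo.thenAccept(row -> System.out.println("A got " + row)));
            } else {
                // Request A parks the only thread until the I/O result arrives.
                pool.submit(() -> System.out.println("A got " + slowIo.join()));
            }
            Future<?> requestB = pool.submit(() -> System.out.println("B served"));
            try {
                requestB.get(1, TimeUnit.SECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // B is stuck behind the blocked thread
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            } finally {
                slowIo.complete("row-42"); // the I/O eventually completes
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking:     " + secondRequestServedWhileIoPending(false));
        System.out.println("non-blocking: " + secondRequestServedWhileIoPending(true));
    }
}
```

Reactive libraries generalize the second variant. Work is expressed as a chain of callbacks, so a small number of threads can serve many in-flight requests.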

Because of such benefits, reactive programming is becoming increasingly popular in various domains such as web, mobile, and IoT applications. Those are the domains where high throughput and low latency are important.

Here is a concrete example of a reactive program taken from the James project, an open-source email server. This example is written using the Reactor library. At the end of this code, a pool of channels is built. And these lines of code specify how a channel is opened and destroyed. Here, the variable connectionMono is of type Mono<Connection>. Mono is a type in Reactor that represents a data stream containing at most a single value. Upon a triggering event, such as 'acquire', the connectionMono emits the value it holds, namely, a Connection object. Then, this emitted object is passed to the 'openChannel' method by this specification using 'flatMap'. The call of the openChannel method returns a value of type Mono<Channel> holding a 'Channel' object. This example also specifies that if a channel is idle for more than 30 seconds, it should be evicted. When a channel is evicted, the destroyHandler's callback function is called and channel.close() is executed.
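For readers who have not used Reactor, the `flatMap` step can be compared to `CompletableFuture.thenCompose` in the JDK. Both take a value produced asynchronously and chain a second asynchronous step that itself returns a future-like container. The sketch below illustrates that analogy only. `Connection`, `Channel`, and `openChannel` are hypothetical stand-ins, not the James types. One important difference: a `CompletableFuture` starts running as soon as it is created, whereas a `Mono` does nothing until it is subscribed to (for example, on `acquire`).

```java
import java.util.concurrent.CompletableFuture;

final class FlatMapAnalogy {
    // Hypothetical stand-ins for the James types; not the real API.
    record Connection(String host) {
        CompletableFuture<Channel> openChannel() {
            return CompletableFuture.completedFuture(new Channel(host + "#1"));
        }
    }

    record Channel(String id) {
        void close() { System.out.println("closing " + id); }
    }

    /** Roughly connectionMono.flatMap(Connection::openChannel), written with JDK futures. */
    static CompletableFuture<Channel> acquireChannel(CompletableFuture<Connection> connection) {
        return connection.thenCompose(Connection::openChannel);
    }

    public static void main(String[] args) {
        Channel ch = acquireChannel(CompletableFuture.completedFuture(new Connection("mq"))).join();
        System.out.println("opened " + ch.id()); // opened mq#1
        ch.close(); // what a destroy handler would do when the channel is evicted
    }
}
```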

So, what is the most common bug type in reactive programs? To answer this question, we investigated 29 open-source GitHub projects that meet the following criteria. They use popular reactive libraries such as Reactor, RxJava, and Vert.x. They are well maintained. They have a sufficiently large number of contributors and commits. And they have more than 10 stars, watches, or forks.

We found that blocking-call bugs are the most common bug type in reactive programs. Of the 189 reactivity-related bug reports we examined, more than half are blocking-call bugs. A blocking-call bug occurs when the program blocks its executing thread to wait for a time-consuming operation, such as I/O, to finish.

Here is a concrete example of a blocking-call bug. This is the same code snippet shown earlier. Suppose we have a channel that has been idle for more than 30 seconds. Then, this channel will be closed by executing channel.close(). Let's also assume that we receive a new request to open a channel. However, the new channel may not open immediately because the close operation, which can take a long time, may block the thread from opening the new channel.
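The scenario can be reproduced in miniature with plain JDK concurrency. This is a hypothetical sketch, not the James code and not Reactor. A single-thread executor plays the event-loop thread, `slowClose` simulates a `channel.close()` that takes 500 ms, and the method measures how long a subsequent open request waits before it is served.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class BlockingCloseBug {
    // Hypothetical stand-in for a slow channel.close().
    static void slowClose() {
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns how many milliseconds the open request waited behind the close. */
    static long openDelayMillis() {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            eventLoop.submit(BlockingCloseBug::slowClose); // eviction runs close() on the event loop
            long submitted = System.nanoTime();
            // The new open request can only start once the event-loop thread is free again.
            Future<Long> open = eventLoop.submit(() -> System.nanoTime() - submitted);
            return TimeUnit.NANOSECONDS.toMillis(open.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            eventLoop.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("open request waited ~" + openDelayMillis() + " ms");
    }
}
```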

In addition to the finding that blocking-call bugs are the most common in reactive programs, we also found that developers often ignore them. We found that about 40% of the reported blocking-call bugs remain unfixed.

To see why developers often ignore blocking-call bugs, we looked into the comments of the bug reports. Developers mentioned the following.

From the bug-patching perspective, developers are essentially indicating that there is insufficient information to accept the patch.

Let's now summarize what we have learned from our investigation of the 29 open-source projects.

Based on these findings, we did the following for the second part of our work.

So, what kind of improvement evidence did we use?

We generated improvement evidence using Java Flight Recorder, a code profiling tool. This tool provides information about CPU usage and heap usage over time. The generated report also shows thread activity, where the green section indicates that the thread is running, the red section indicates that the thread is blocked, and the grey section indicates that the thread is waiting. The upper and lower screenshots show the results before and after applying the fix, respectively. We ran the same test for both cases. The CPU usage at the peak dropped from 92% to 85.6%. We made a pull request with this report to the Apache James project. And two days later, the developer accepted the patch without any further questions. But, is this just a coincidence?
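Evidence like this can be collected without third-party tools. JFR can be started from the command line, for example `java -XX:StartFlightRecording=duration=60s,filename=before.jfr ...`, and the resulting file can be inspected in JDK Mission Control. The sketch below takes a different route. It uses the `jdk.jfr` API (JDK 11 or later) to record the built-in `jdk.CPULoad` event around a workload, then reads back the peak machine-wide CPU load. The workload, the 100 ms sampling period, and the class name are illustrative assumptions, not the setup used in this work.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

final class JfrEvidence {
    /** Samples jdk.CPULoad every 100 ms while the workload runs and returns the .jfr file. */
    static Path record(Runnable workload) {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.CPULoad").withPeriod(Duration.ofMillis(100));
            recording.start();
            workload.run();
            recording.stop();
            Path file = Files.createTempDirectory("jfr-evidence").resolve("run.jfr");
            recording.dump(file); // data is still available after stop(), until close()
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Highest machine-wide CPU load (0.0 to 1.0) found in the recording, or -1 if none. */
    static double peakCpuLoad(Path jfrFile) {
        try {
            double peak = -1;
            for (RecordedEvent event : RecordingFile.readAllEvents(jfrFile)) {
                if (event.getEventType().getName().equals("jdk.CPULoad")) {
                    peak = Math.max(peak, event.getFloat("machineTotal"));
                }
            }
            return peak;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical CPU-bound workload; the result is returned so the loop is not optimized away.
    static double busyWork(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        double sink = 0;
        while (System.nanoTime() < end) {
            sink += Math.sqrt(sink + 1);
        }
        return sink;
    }

    public static void main(String[] args) {
        Path file = record(() -> busyWork(1_000));
        System.out.printf("peak machine CPU load: %.1f%%%n", 100 * peakCpuLoad(file));
    }
}
```

Recording the same test before and after a patch and comparing the two files gives a before/after pair like the one on the slide. Thread states and heap usage come from other JFR events that would need to be enabled as well.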

To include, or not to include performance results, that is the question we asked.

To answer that question, we did the following. First, we generated 30 patches across the 29 open-source projects we investigated. I will explain a bit later how we generated those patches. Once we obtained the 30 patches, we randomly assigned 15 patches to include performance results and the other 15 to exclude performance results.
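A seeded shuffle is one simple way to make a 15/15 random split reproducible. The sketch below is hypothetical. The patch IDs and the seed are made up, and the slides do not say how the randomization was actually performed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class RandomAssignment {
    /** Hypothetical patch IDs P01..Pn. */
    static List<String> patchIds(int n) {
        return IntStream.rangeClosed(1, n)
                .mapToObj(i -> String.format("P%02d", i))
                .collect(Collectors.toList());
    }

    /** Shuffles the IDs with a fixed seed and splits them into two halves. */
    static List<List<String>> split(List<String> ids, long seed) {
        List<String> shuffled = new ArrayList<>(ids);
        Collections.shuffle(shuffled, new Random(seed));
        int half = shuffled.size() / 2;
        return List.of(
                List.copyOf(shuffled.subList(0, half)),               // submitted with performance results
                List.copyOf(shuffled.subList(half, shuffled.size()))); // submitted without them
    }

    public static void main(String[] args) {
        List<List<String>> groups = split(patchIds(30), 2024L);
        System.out.println("with results:    " + groups.get(0));
        System.out.println("without results: " + groups.get(1));
    }
}
```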

This table shows the results. When performance results were included, the developers accepted more than half of the patches (8 of 15) and rejected only two. When performance results were not included, the picture was quite the opposite: the developers ignored more than half of the patches (8 of 15), rejected four, and accepted only three.

We performed Barnard's exact test and found that the difference between these two groups is statistically significant.
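For readers unfamiliar with Barnard's exact test, the sketch below computes an unconditional exact p-value for a 2×2 table with fixed row totals. It follows the textbook recipe, with two assumptions stated up front. First, it uses the pooled z statistic, which is one common choice. Second, it approximates the supremum over the nuisance parameter with a grid search. The slides do not state which statistic was used or how the three outcomes (accepted, rejected, ignored) were handled, so this sketch does not reproduce the analysis in this work. The class name and the counts in `main` are hypothetical.

```java
final class BarnardTest {
    /** Pooled z statistic for x1/n1 vs. x2/n2; 0 when the pooled proportion is 0 or 1. */
    static double pooledZ(int x1, int n1, int x2, int n2) {
        double pooled = (double) (x1 + x2) / (n1 + n2);
        double variance = pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n2);
        if (variance == 0) {
            return 0;
        }
        return ((double) x1 / n1 - (double) x2 / n2) / Math.sqrt(variance);
    }

    /** Binomial probabilities for k = 0..n successes with success probability pi. */
    static double[] binomialPmf(int n, double pi) {
        double[] pmf = new double[n + 1];
        double choose = 1; // C(n, 0)
        for (int k = 0; k <= n; k++) {
            pmf[k] = choose * Math.pow(pi, k) * Math.pow(1 - pi, n - k);
            choose = choose * (n - k) / (k + 1); // C(n, k + 1)
        }
        return pmf;
    }

    /**
     * Two-sided p-value: the largest probability, over a grid of nuisance values pi,
     * of drawing a table at least as extreme (|z| >= |z_obs|) as the observed one.
     */
    static double pValue(int x1, int n1, int x2, int n2, int gridSize) {
        double observed = Math.abs(pooledZ(x1, n1, x2, n2));
        boolean[][] extreme = new boolean[n1 + 1][n2 + 1];
        for (int a = 0; a <= n1; a++) {
            for (int b = 0; b <= n2; b++) {
                extreme[a][b] = Math.abs(pooledZ(a, n1, b, n2)) >= observed - 1e-12;
            }
        }
        double sup = 0;
        for (int g = 0; g <= gridSize; g++) {
            double pi = (double) g / gridSize;
            double[] f1 = binomialPmf(n1, pi);
            double[] f2 = binomialPmf(n2, pi);
            double tail = 0;
            for (int a = 0; a <= n1; a++) {
                for (int b = 0; b <= n2; b++) {
                    if (extreme[a][b]) {
                        tail += f1[a] * f2[b];
                    }
                }
            }
            sup = Math.max(sup, tail);
        }
        return Math.min(1.0, sup);
    }

    public static void main(String[] args) {
        // Hypothetical counts, not the study's data: 12 of 15 vs. 3 of 15 "successes".
        System.out.printf("p = %.5f%n", pValue(12, 15, 3, 15, 1000));
    }
}
```

Unlike Fisher's exact test, Barnard's test does not condition on the column totals. That is why it maximizes over the unknown common proportion `pi` instead.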

We generated patches using a pattern-based approach. We extracted five common fix patterns from the collected patches.

I know what you are thinking. Why don't we use an LLM to generate patches for blocking-call bugs? An LLM might well work.

However, to crack a nut, we do not need a sledgehammer. A nutcracker would be enough.

Alright. This is the first fix pattern. This pattern replaces an expression E of type Mono or Flux with 'E.subscribeOn(Schedulers.boundedElastic())'. Both Mono and Flux are types in Reactor that represent a data stream. The difference between them is that a Mono contains at most one value, whereas a Flux can contain multiple values. The new expression specifies that the subscription to the data stream E should be performed on a separate worker thread maintained by the boundedElastic scheduler.
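In Reactor, subscribing on the `boundedElastic` scheduler is written `E.subscribeOn(Schedulers.boundedElastic())`. The sketch below mirrors the idea with only the JDK, as an analogy rather than Reactor code. A blocking supplier is executed on a separate, bounded worker pool through `CompletableFuture.supplyAsync(supplier, executor)`, so the caller gets a future back immediately. The pool size of 4 and the names `WORKERS` and `offload` are hypothetical. Reactor's `boundedElastic` additionally caps its thread count and reclaims idle threads, which a fixed pool does not do.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class OffloadSketch {
    // Hypothetical bounded worker pool standing in for Schedulers.boundedElastic().
    static final ExecutorService WORKERS = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "worker");
        t.setDaemon(true); // lets the JVM exit without an explicit shutdown
        return t;
    });

    /** Runs a blocking computation on WORKERS instead of the caller's thread. */
    static <T> CompletableFuture<T> offload(Supplier<T> blocking) {
        return CompletableFuture.supplyAsync(blocking, WORKERS);
    }

    /** Returns the name of the thread that actually executed the supplier. */
    static String threadThatRan() {
        return offload(() -> Thread.currentThread().getName()).join();
    }

    public static void main(String[] args) {
        System.out.println("caller: " + Thread.currentThread().getName());
        System.out.println("ran on: " + threadThatRan()); // ran on: worker
    }
}
```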

![bg right:25% fit](./img/offloaded.png)

This fix pattern matches our running example. After applying this fix pattern, the close operation is offloaded to a separate worker thread, so the main thread can open a new channel without being blocked.
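A minimal JDK-only sketch of the effect of this fix follows. The names are hypothetical, and it is an analogy, not Reactor's scheduling machinery. A single-thread executor plays the event-loop thread, and `slowClose` simulates a `channel.close()` that takes 500 ms. Because the event loop only hands the close off to a separate worker pool, the next open request is served almost at once.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class OffloadedClose {
    // Hypothetical stand-in for a slow channel.close().
    static void slowClose() {
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Milliseconds an open request waits on the event loop when close() is offloaded. */
    static long openDelayMillis() {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        ExecutorService workers = Executors.newCachedThreadPool();
        try {
            // The event loop only hands the close off; the sleep happens on a worker thread.
            eventLoop.submit(() -> { workers.submit(OffloadedClose::slowClose); });
            long submitted = System.nanoTime();
            Future<Long> open = eventLoop.submit(() -> System.nanoTime() - submitted);
            return TimeUnit.NANOSECONDS.toMillis(open.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            eventLoop.shutdown();
            workers.shutdown(); // the already-submitted close still runs to completion
        }
    }

    public static void main(String[] args) {
        System.out.println("open request waited ~" + openDelayMillis() + " ms");
    }
}
```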

Now, let's look at the second fix pattern. Suppose E here involves a blocking operation such as I/O tasks. The 'just' operator here returns a Mono object containing the value of E. The difference between these two expressions is whether E is evaluated eagerly or lazily. This fix pattern is often used in combination with the first fix pattern. Note here that the 'subscribeOn' operator is added by the first fix pattern.

This slide explains the difference between before and after applying the fix pattern. Before the fix, the 'just' operator eagerly executes the getSomething method, which can block the thread. After the fix, the getSomething method is lazily executed in a separate worker thread. The main thread can be used to do other tasks without being blocked.
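The eager/lazy distinction is not specific to Reactor, so it can be shown with a plain `Supplier`. This is an analogy, not Reactor code. Passing a value, as `Mono.just(E)` does, forces `E` to run on the calling thread immediately. Passing a `Supplier`, in the spirit of `Mono.fromCallable(() -> E)`, defers the call until a value is requested, and that request can then happen on another thread. The counting `getSomething` is a hypothetical stand-in for a blocking call.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class EagerVsLazy {
    static final AtomicInteger CALLS = new AtomicInteger();

    // Hypothetical stand-in for a blocking call; it counts invocations instead of doing I/O.
    static String getSomething() {
        CALLS.incrementAndGet();
        return "something";
    }

    /** Returns how many getSomething() calls happened before any value was requested. */
    static int callsBeforeDemand() {
        CALLS.set(0);
        String eagerValue = getSomething();                     // like Mono.just(getSomething()): runs now
        Supplier<String> lazyValue = EagerVsLazy::getSomething; // like Mono.fromCallable(...): not yet
        int callsSoFar = CALLS.get();                           // only the eager call has happened
        lazyValue.get();                                        // the deferred call happens on demand
        return callsSoFar;
    }

    public static void main(String[] args) {
        System.out.println("calls before demand: " + callsBeforeDemand()); // 1
        System.out.println("calls in total:      " + CALLS.get());          // 2
    }
}
```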

Here are the experimental results. We measured CPU usage, heap usage, latency, and memory usage before and after applying the fix pattern. Since performance measurements can vary even when the same test is run, we ran the same test 10 times. These plots show how the performance measures change at each iteration. The pink line shows the measurements before applying the fix pattern, and the purple line shows the measurements after applying it. In general, fewer resources are used after applying the fix pattern. We also ran the regression test suite and found no regressions. These are the results for Apache James.
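Because individual runs are noisy, repeated measurements are summarized rather than compared one by one. The sketch below is a hypothetical harness in that spirit, not the one used here, since the measurements in this work came from JFR and the projects' own tests. It runs a workload a fixed number of times, records the wall-clock latency of each iteration, and summarizes the samples with the mean and the median.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RepeatedRuns {
    /** Runs the workload `iterations` times and returns each run's latency in milliseconds. */
    static List<Double> latenciesMillis(Runnable workload, int iterations) {
        List<Double> samples = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload.run();
            samples.add((System.nanoTime() - start) / 1_000_000.0);
        }
        return samples;
    }

    static double mean(List<Double> xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.size();
    }

    static double median(List<Double> xs) {
        List<Double> sorted = new ArrayList<>(xs);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        return sorted.size() % 2 == 1 ? sorted.get(mid) : (sorted.get(mid - 1) + sorted.get(mid)) / 2;
    }

    public static void main(String[] args) {
        // Hypothetical workload; replace with the test that exercises the patched code.
        Runnable workload = () -> {
            long acc = 0;
            for (int i = 0; i < 5_000_000; i++) {
                acc += i % 7;
            }
            if (acc < 0) System.out.println(acc); // keeps the loop observable
        };
        List<Double> samples = latenciesMillis(workload, 10);
        System.out.printf("mean %.2f ms, median %.2f ms%n", mean(samples), median(samples));
    }
}
```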

For the other subjects, similar results were observed.

So, what are the takeaways?

They all sound obvious, don't they?

Then, let me flip these takeaways into questions. I think the answers are not a definitive yes, which should make us think about the future direction of APR and, more broadly, of debugging research.

# Common Fix Patterns

FP3. Reactive Filtering

```diff
- 𝐸1.filter(𝐸2 -> 𝐸3)
+ 𝐸1.filterWhen(𝐸2 -> Mono.fromCallable(() -> 𝐸3))
```

- 𝐸3 involves a blocking operation

---

# Common Fix Patterns

FP4. Non-blocking Chaining

```diff
- 𝐸.block(); 𝑆
+ 𝐸.then(Mono.fromRunnable(() -> 𝑆))
```

- 𝜏(𝐸) <: Mono
- 𝜏(𝑆) <: Mono | Flux

---

# Common Fix Patterns

FP5. Non-blocking Subscription

```diff
- 𝐸.block();
+ 𝐸.subscribe();
```

- 𝜏(𝐸) <: Mono
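FP3 and FP4 can also be mimicked with JDK futures, as an analogy only. This is not Reactor code, and the names are hypothetical. For FP3, `filterWhen` corresponds to evaluating a blocking predicate asynchronously and filtering on the verdicts once they arrive. The sketch runs each predicate on a small worker pool so that the caller is not blocked; in Reactor, that part would come from a scheduler. For FP4, `then` corresponds to `thenRun`, which schedules the follow-up step instead of parking the thread in `join()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class RemainingPatterns {
    // Hypothetical worker pool; daemon threads so the JVM can exit without shutdown().
    static final ExecutorService WORKERS = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "worker");
        t.setDaemon(true);
        return t;
    });

    // FP3 analogue: evaluate a blocking predicate asynchronously for every element,
    // then keep the elements whose verdict is true, preserving the input order.
    static <T> CompletableFuture<List<T>> filterWhen(List<T> items, Predicate<T> blockingTest) {
        List<CompletableFuture<Boolean>> verdicts = items.stream()
                .map(item -> CompletableFuture.supplyAsync(() -> blockingTest.test(item), WORKERS))
                .collect(Collectors.toList());
        return CompletableFuture.allOf(verdicts.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    List<T> kept = new ArrayList<>();
                    for (int i = 0; i < items.size(); i++) {
                        if (verdicts.get(i).join()) { // already complete, so join() does not wait
                            kept.add(items.get(i));
                        }
                    }
                    return kept;
                });
    }

    // FP4 analogue: instead of `e.join(); s.run();`, schedule s to run once e completes.
    static boolean chainedStepRunsOnlyAfterCompletion() {
        CompletableFuture<String> e = new CompletableFuture<>();
        AtomicBoolean sRan = new AtomicBoolean(false);
        CompletableFuture<Void> chained = e.thenRun(() -> sRan.set(true)); // caller is not blocked
        boolean ranEarly = sRan.get(); // false: e has not completed yet
        e.complete("done");            // completing e triggers the chained step
        return !ranEarly && sRan.get() && chained.isDone();
    }

    public static void main(String[] args) {
        System.out.println(filterWhen(List.of(1, 2, 3, 4, 5, 6), n -> n % 2 == 0).join()); // [2, 4, 6]
        System.out.println(chainedStepRunsOnlyAfterCompletion()); // true
    }
}
```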

Preserving Reactiveness: Understanding and Improving the Debugging Practice of Blocking-Call Bugs

Arooba Shahoor, Jooyong Yi, Dongsun Kim

Kyungpook National University, UNIST, Korea University

Automated Program Repair (APR)

How do we fix non-functional bugs?

Our Previous Work

This Work

Reactive Programs Are Getting Popular

Reactive Program Example

Most Common Bug Type in Reactive Programs?

Blocking-Call Bugs Are the Most Common

Blocking-Call Bug Example

Developers Often Ignore Blocking-Call Bugs

Why Are Blocking-Call Bugs Ignored?

Insufficient information to accept the patch!

What We Learned

What We Did

Improvement Evidence

To include, or not to include performance results?

Common Fix Patterns

Why Not Use an LLM?

Common Fix Patterns

FP1: Offloading to Separate Worker Threads

Common Fix Patterns

Experimental Results

Takeaways