poracle

| Question | Bug ID | Pattern |
|:---:|:---:|:---:|
| Q1 | Math73 | CC (Complementary Case) |
| Q2 | Math105 | EGA (Existing General Assertion) |
| Q3 | Math28 | UE (Unexpected Exception) |
| Q4 | Lang58 | RI (Reference Implementation) |

# Food for Thought

- Ideally, incorrect patches are supposed to be identified by the test suite.

![width:1000px](./img/APR-pipeline.jpg)

---

# Things to Think About

![bg left:33% fit](./img/overfitting.jpg)

- The large gap between the plausible patch space and the correct patch space suggests that the quality of the test suite is not good enough for patch validation.

---

# Difficulty of Test Generalization

$\forall \vec{v}: T(\vec{v}) = \psi(\vec{v})$

- $\vec{v}$: inputs
- $T(\vec{v})$: output of test $T$ when $\vec{v}$ is given
- $\psi$: the oracle function

---
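The condition $\forall \vec{v}: T(\vec{v}) = \psi(\vec{v})$ can be made concrete with a small sketch. The class `GeneralizationSketch`, the function `absValue`, and the choice of `Math::abs` as the oracle are all hypothetical and chosen only for illustration; they are not taken from Defects4J or from Poracle. The sketch shows why a generalized test needs an oracle $\psi$ where a concrete test needs only one hard-coded value.

```java
import java.util.function.DoubleUnaryOperator;

final class GeneralizationSketch {
    // Hypothetical function under test (illustrative only, not from Defects4J).
    static double absValue(double v) {
        return v < 0 ? -v : v;
    }

    // Concrete test: one fixed input, so the expected output can be hard-coded.
    static boolean concreteTest() {
        return absValue(-2.0) == 2.0;
    }

    // Generalized test: the input v is a parameter, so the expected output
    // must come from an oracle function psi that is defined for every v.
    static boolean generalizedTest(double v, DoubleUnaryOperator psi) {
        return absValue(v) == psi.applyAsDouble(v);
    }

    public static void main(String[] args) {
        System.out.println(concreteTest());
        // Math::abs stands in for the oracle here; in practice such an
        // oracle is rarely available, which is the difficulty stated above.
        System.out.println(generalizedTest(-3.5, Math::abs));
    }
}
```

In this toy case a trusted oracle happens to exist. For most bugs it does not, which is why the expected output becomes a blank once the inputs are generalized.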

# Preservation Condition Example

- Math105 bug: the existing assertion is reused as the preservation condition.

```java
public void testSSENonNegative(double d1, ..., double d6) {
  try {
    double[] y = {d1, d2, d3};
    double[] x = {d4, d5, d6};
    SimpleRegression reg = new SimpleRegression();
    for (int i = 0; i < x.length; i++) {
      reg.addData(x[i], y[i]);
    }
    double ret = reg.getSumSquaredErrors();
    // Original: assertTrue(ret >= 0.0);
    preserveIf(ret >= 0.0, () -> new Double[] { ret });
  } catch (Exception e) {
    failToPreserve();
  }
}
```

---

# Preservation Condition Example

- Lang58 bug: exploits a reference implementation.

```java
public void testLang300(int n, int m) {
  // NumberUtils.createNumber("1l"); // Original body

  // Test with a generalized input
  String s = "" + ((char) n) + ((char) m) + "l";
  String actOut = "";
  try {
    actOut = "" + NumberUtils.createNumber(s).longValue();
  } catch (Exception e) {
    actOut = "Exception";
  }

  // Use Long.valueOf as a reference
  String refOut = "";
  try {
    refOut = "" + Long.valueOf(s);
  } catch (Exception e) {
    refOut = "Exception";
  }

  // Copy into a final variable: a lambda may only capture effectively final locals
  final String out = actOut;
  preserveIf(actOut.equals(refOut), () -> new String[] { out });
}
```

---

# Patch Validation with Preservation Condition

![width:1500px](./img/workflow.jpeg)

---
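The validation idea can be sketched as a differential check: run the original and the patched version on the same inputs, and wherever the preservation condition holds, require the two to agree. The following is a minimal sketch under stated assumptions, not Poracle's implementation. The class `DifferentialSketch`, the three toy program versions, and the decision to evaluate the condition on the original version's output are all hypothetical. A fixed input array stands in for the inputs a fuzzer would generate.

```java
import java.util.function.DoubleUnaryOperator;

final class DifferentialSketch {
    // Hypothetical buggy original: wrong only for the failing-test input 3.0.
    static final DoubleUnaryOperator ORIGINAL = v -> (v == 3.0) ? -9.0 : v * v;
    // Hypothetical overfitting patch: fixes 3.0 but breaks every other input.
    static final DoubleUnaryOperator OVERFITTING_PATCH = v -> (v == 3.0) ? 9.0 : 0.0;
    // Hypothetical correct patch.
    static final DoubleUnaryOperator CORRECT_PATCH = v -> v * v;

    // Assumed preservation condition: the original output is non-negative,
    // i.e. the original behavior looks valid and should be kept.
    static boolean preservationCondition(double originalOutput) {
        return originalOutput >= 0.0;
    }

    // Returns false as soon as the patch changes an output that the
    // preservation condition says must be preserved.
    static boolean preservesBehavior(DoubleUnaryOperator patched, double[] inputs) {
        for (double v : inputs) {
            double expected = ORIGINAL.applyAsDouble(v);
            if (!preservationCondition(expected)) {
                continue; // condition does not hold: this input says nothing
            }
            if (Double.compare(expected, patched.applyAsDouble(v)) != 0) {
                return false; // behavior not preserved: reject the patch
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] inputs = {-2.0, 0.0, 1.5, 3.0}; // stand-in for fuzzer-generated inputs
        System.out.println(preservesBehavior(OVERFITTING_PATCH, inputs)); // false
        System.out.println(preservesBehavior(CORRECT_PATCH, inputs));     // true
    }
}
```

The input 3.0 is skipped for both patches because the original output there violates the condition, so a patch is free to change it. The overfitting patch is rejected on the first input, -2.0, where the original output 4.0 should have been preserved.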

- Patch reviewing cost reduction
  - The number of patches to be reviewed after filtering

# Patch Reviewing Cost Reduction

![bg left:33% fit](./img/APR-design-find-and-filter.jpg)

- JAID returns a ranked list of plausible patches.
- We applied Poracle to the obtained ranked list of plausible patches and compared the number of patches to be reviewed before and after filtering.

---

# Patch Reviewing Cost Reduction

![width:1200px](./img/cmp-cost.jpg)

---
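One way to read "the number of patches to be reviewed" is sketched below. It assumes that a reviewer inspects the ranked list from the top and stops at the first correct patch; this stopping rule is an assumption made for illustration, not a detail stated in the slides. The class `ReviewCostSketch`, the `Patch` type, and the five-patch list are hypothetical, and the list is made-up data rather than an experimental result.

```java
import java.util.Arrays;
import java.util.List;

final class ReviewCostSketch {
    // One entry of a ranked list of plausible patches (hypothetical type).
    static final class Patch {
        final boolean correct;          // ground truth, known only in an evaluation
        final boolean rejectedByFilter; // true if the filtering step rejected it

        Patch(boolean correct, boolean rejectedByFilter) {
            this.correct = correct;
            this.rejectedByFilter = rejectedByFilter;
        }
    }

    // Counts the patches inspected in rank order until the first correct one.
    // If no correct patch is reached, every remaining patch is inspected.
    static int reviewCost(List<Patch> ranked, boolean applyFilter) {
        int inspected = 0;
        for (Patch p : ranked) {
            if (applyFilter && p.rejectedByFilter) {
                continue; // filtered out: the reviewer never sees it
            }
            inspected++;
            if (p.correct) {
                break;
            }
        }
        return inspected;
    }

    public static void main(String[] args) {
        // Made-up ranked list: the correct patch sits at rank 4.
        List<Patch> ranked = Arrays.asList(
            new Patch(false, true),
            new Patch(false, true),
            new Patch(false, false),
            new Patch(true, false),
            new Patch(false, true));
        System.out.println(reviewCost(ranked, false)); // 4
        System.out.println(reviewCost(ranked, true));  // 2
    }
}
```

Under this reading, filtering helps only when it removes incorrect patches ranked above the correct one, and it hurts if it wrongly rejects the correct patch itself.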

- For each question, participants were divided into two groups.

---

# Ablation Study

![width:700px](./img/CoincidentallyRejected.jpg)

---

# Example

- Failing test for Math95 of Defects4J

```java
public void testSmallDegreesOfFreedom() {
  FDistributionImpl fd = new FDistributionImpl(1.0, 1.0);
  double p = fd.cumulativeProbability(0.975);
  double x = fd.inverseCumulativeProbability(p);
  assertEquals(/* expected output */ 0.975, x, /* delta */ 1e-5);
}
```

---

# Example

- Generalizing the failing test

```java
public void testSmallDegreesOfFreedom() {
  FDistributionImpl fd = new FDistributionImpl(1.0, 1.0);
  double p = fd.cumulativeProbability(0.975);
  double x = fd.inverseCumulativeProbability(p);
  assertEquals(/* expected output */ 0.975, x, /* delta */ 1e-5);
}
```

↓

```java
public void testSmallDegreesOfFreedom(double d1, double d2, double d3) {
  FDistributionImpl fd = new FDistributionImpl(d1, d2);
  double p = fd.cumulativeProbability(d3);
  double x = fd.inverseCumulativeProbability(p);
  assertEquals(/* expected output */ ________, x, /* delta */ 1e-5);
}
```

---

# Preservation Condition Example

- Math95 bug: An unexpected exception occurs.
```java
public void testSmallDegreesOfFreedom(double d1, double d2, double d3) {
  try {
    FDistributionImpl fd = new FDistributionImpl(d1, d2);
    double p = fd.cumulativeProbability(d3);
    double x = fd.inverseCumulativeProbability(p);
    preserveIf(/* preservation condition */ true,
               /* outputs to compare */ () -> new Double[] { x });
  } catch (Exception e) {
    failToPreserve();
  }
}
```

![width:900px](./img/preserveIf.png)

---

# Classification Performance

![width:700px](./img/cmp-bert-lr.jpg)

- BERT-LR: Haoye Tian et al., "Evaluating representation learning of code changes for predicting patch correctness in program repair", ASE 2020

---

# Developer Patches

![width:700px](./img/developer-patches.jpg)

---

# Correct Answer Ratio

| Top 50% Students | Bottom 50% Students |
|:---:|:---:|
| ![width:600px](./img/high_exp_scores.jpg) | ![width:600px](./img/low_exp_scores.jpg) |

---

# Manual Time Cost

| All students | Students who submitted correct answers |
|:---:|:---:|
| ![width:600px](./img/overall_time.jpg) | ![width:600px](./img/overall_time_only_correct.jpg) |

---

# Sentiment

| All students | Top 50% students |
|:---:|:---:|
| ![width:600px](./img/poracle_manual_experience_bar.jpg) | ![width:600px](./img/poracle_manual_experience_bar_grade.jpg) |

Poracle: Testing Patches Under Preservation Conditions to Combat the Overfitting Problem of Program Repair

Elkhan Ismayilzada, Md Mazba Ur Rahman, Dongsun Kim, and Jooyong Yi

UNIST, South Korea, Kyungpook National University, South Korea

Test-based APR

Overfitting Problem of APR

APR Design Space

Existing Approach: Score-based Patch Classification

Existing Approach: Evidence-based Classification

Things to Think About

Preservation Condition

Preservation Condition Example

Differential Fuzzing

Evaluation

Classification Performance

Comparison with ODS

User Study

Manual vs Semi-Automated

Correct Answer Ratio

Conclusion