Exploring ChatGPT's Code Testing Abilities and Challenges

Chapter 1: Introduction to ChatGPT's Capabilities

Since its launch on November 30, 2022, OpenAI's ChatGPT has rapidly gained traction, attracting over a hundred million users, making it the fastest-growing application globally. A significant portion of these users consists of developers and data scientists, as research indicates that ChatGPT excels in code generation, explanation, and error correction.

This article aims not to evaluate whether ChatGPT produces correct code or fixes errors but to investigate its ability to test and enhance its own code while working through three distinct coding challenges.

Section 1.1: The Fibonacci Challenge

The Fibonacci sequence is a common coding task that many IT and data science students encounter during their education. I prompted ChatGPT to create a recursive Fibonacci function in Python:

Recursive Fibonacci Function in Python
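The screenshot of ChatGPT's code is not reproduced here; a plain recursive implementation along these lines is what such a prompt typically produces (a sketch, not ChatGPT's verbatim output):

def fibonacci(n):
    # Base cases: the first two Fibonacci numbers
    if n <= 0:
        return 0
    elif n == 1:
        return 1
    # Recursive case: each number is the sum of the two preceding ones
    return fibonacci(n - 1) + fibonacci(n - 2)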

The function generated appears satisfactory. Next, I requested ChatGPT to write tests for its own function using pytest, a popular testing library for Python:

Pytest Tests for Fibonacci Function
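Again, the exact tests are not shown here; tests in this spirit, covering both standard and edge cases, might look like the following (the 0-indexed expected values and the module name in the import are my assumptions):

import pytest

# from fibonacci import fibonacci  # assuming the function lives in fibonacci.py

@pytest.mark.parametrize("n, expected", [
    (0, 0),                          # edge case: smallest input
    (1, 1),
    (2, 1),
    (10, 55),                        # standard case
    (100, 354224848179261915075),    # edge case: very large input
])
def test_fibonacci(n, expected):
    assert fibonacci(n) == expected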

Notably, ChatGPT tested both standard and edge cases, which is commendable. However, the function encountered a RecursionError when tested with an edge case of 100:

RecursionError in Fibonacci Function

Rather than pointing out the problem myself, I provided only the pytest error message, so that no outside hints influenced the process.

ChatGPT interpreted the RecursionError and opted to convert the recursive function into a non-recursive one. As a result, the updated tests, including edge cases, passed, although the initial requirement for a recursive implementation was not fulfilled. I then asked ChatGPT to revert to a recursive approach while ensuring that edge cases were still handled correctly.

In the end, ChatGPT successfully produced a recursive Fibonacci function that passed all its well-defined tests and utilized memoization for improved efficiency. Overall, I believe ChatGPT performed exceptionally well in this task.
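The final version is not shown in the article either; a recursive implementation with memoization along these lines would satisfy both requirements (a sketch using functools.lru_cache; ChatGPT's actual code may have used a hand-rolled cache instead):

from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    # Memoization caches intermediate results, so each value is computed only once
    # and fibonacci(100) finishes instantly instead of recursing exponentially.
    if n <= 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)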

Section 1.2: The Roman Numerals Challenge

Another frequently encountered task involves converting Roman numerals to decimal numbers and vice versa. First, I asked ChatGPT to construct a class to tackle this challenge:

Roman Numeral Conversion Class

The code generated seems solid, with two methods: to_roman and to_numerical, effectively converting between the two numeral systems. To ensure accuracy, I requested ChatGPT to create tests for its methods:

Tests for Roman Numeral Conversion Methods

Once again, ChatGPT included edge cases by testing values like 0, an empty string, and 3999, the maximum value representable in standard Roman numerals. All tests passed initially!
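Those tests are not reproduced here either; based on the edge cases just mentioned, they plausibly looked something like this sketch (the class and method names follow the article, while the expected return values for out-of-range inputs are assumptions on my part):

import pytest

# from roman import RomanNumeralConverter  # assuming the class lives in roman.py

converter = RomanNumeralConverter()

def test_to_roman():
    assert converter.to_roman(1) == "I"
    assert converter.to_roman(1994) == "MCMXCIV"
    assert converter.to_roman(3999) == "MMMCMXCIX"  # largest standard Roman numeral
    assert converter.to_roman(0) == ""              # edge case: below the valid range

def test_to_numerical():
    assert converter.to_numerical("I") == 1
    assert converter.to_numerical("MCMXCIV") == 1994
    assert converter.to_numerical("MMMCMXCIX") == 3999
    assert converter.to_numerical("") == 0          # edge case: empty string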

However, the code wasn't entirely foolproof, as invalid inputs like 'MMMM' and 4000 produced incorrect results. I prompted ChatGPT to explore additional edge cases.

Unexpectedly, two errors surfaced, one for each method. I allowed ChatGPT to identify the errors independently by supplying the pytest error message.

ChatGPT recognized the errors, demonstrating an understanding of the problems and generating new code to address them. Nevertheless, the solution was lengthy and complicated:

class RomanNumeralConverter:
    def __init__(self):
        self.roman_numeral_map = {
            1000: "M",
            900: "CM",
            500: "D",
            400: "CD",
            100: "C",
            90: "XC",
            50: "L",
            40: "XL",
            10: "X",
            9: "IX",
            5: "V",
            4: "IV",
            1: "I"
        }

    def to_roman(self, n):
        if not 0 < n < 4000:
            return ""
        result = ""
        for value, numeral in self.roman_numeral_map.items():
            while n >= value:
                result += numeral
                n -= value
        return result

    def from_roman(self, s):
        if not s or not all(char in self.roman_numeral_map.values() for char in s):
            return 0
        result = 0
        i = 0
        while i < len(s):
            if i + 1 < len(s) and self.roman_numeral_map.get(s[i:i+2]):
                result += self.roman_numeral_map[s[i:i+2]]
                i += 2
            else:
                result += self.roman_numeral_map[s[i]]
                i += 1
        return result

After running pytest, only one exception remained:

FAILED test_roman_numerals.py::test_from_roman - KeyError: 'M'

The invalid input caused a KeyError instead of returning 0, which is what ChatGPT had anticipated in its tests. When I provided the KeyError message, ChatGPT altered the from_roman method to check whether each letter was present in the mapping. However, 'M' is itself a valid Roman numeral; it simply cannot appear four times in a row. Consequently, the tests continued to fail.

In conclusion, while ChatGPT demonstrated proficiency in generating accurate code and tests for the Roman numerals challenge, it struggled to align its code with the test cases effectively.

Chapter 2: Removing House Numbers from Addresses

Next, we turn to a more NLP-focused challenge, specifically the removal of house numbers from Dutch addresses. This task was relevant to a recent project at my job. Dutch addresses can exhibit a variety of formats, such as:

  • Example Street 22, 1010AB, Amsterdam
  • Example Street 22A, 1010AB, Amsterdam
  • Example Street 22-A, 1010AB, Amsterdam
  • Example Street 22-3, 1010AB, Amsterdam

I asked ChatGPT to develop a solution that removes only the house numbers:

Code to Remove House Numbers

ChatGPT employed a regular expression (regex), which appears to be a more suitable choice than using split() in combination with indexing. The pattern r'\b\d{1,4}[a-zA-Z]?\b\s*' matches a number of 1 to 4 digits, optionally followed by a letter, bounded by word boundaries on both sides and followed by any trailing whitespace.
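The screenshot of the generated code is not reproduced here, but given that pattern, the function presumably looked roughly like the following sketch (a reconstruction based on the regex above, not ChatGPT's verbatim output):

import re

def remove_house_number(address: str) -> str:
    # 1-4 digits, an optional trailing letter, bounded by word boundaries,
    # plus any whitespace that follows the match
    pattern = r'\b\d{1,4}[a-zA-Z]?\b\s*'
    return re.sub(pattern, '', address)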

However, it became evident that the last two address examples would fail due to hyphens separating the numbers and their additions. Let’s see how ChatGPT responded when I asked it to write tests for the remove_house_number() function:

Tests for Remove House Number Function

Interestingly, ChatGPT crafted adequate tests that included the possibility of hyphens between the number and its addition, which was overlooked in the remove_house_number() function.
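The tests themselves are likewise not reproduced; parameterized tests in that spirit, including the hyphenated variants, could look like the sketch below (the expected strings are my assumption about what a clean result should be, not ChatGPT's actual assertions):

import pytest

# from addresses import remove_house_number  # assuming the function lives in addresses.py

@pytest.mark.parametrize("address, expected", [
    ("Example Street 22, 1010AB, Amsterdam", "Example Street, 1010AB, Amsterdam"),
    ("Example Street 22A, 1010AB, Amsterdam", "Example Street, 1010AB, Amsterdam"),
    ("Example Street 22-A, 1010AB, Amsterdam", "Example Street, 1010AB, Amsterdam"),
    ("Example Street 22-3, 1010AB, Amsterdam", "Example Street, 1010AB, Amsterdam"),
    ("Prinsengracht 123A, 1015 DX Amsterdam", "Prinsengracht, 1015 DX Amsterdam"),
])
def test_remove_house_number(address, expected):
    assert remove_house_number(address) == expected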

The primary issue causing the tests to fail was that the function also deleted postal code digits: when whitespace separates a postal code's digits from its letters, as in "1015 DX", the digits match the regex that was meant to capture house numbers only.

Once again, I refrained from providing direct guidance and merely shared the error message generated by pytest.

In this instance, ChatGPT did not recognize that the regex also removed postal code digits. Instead of adjusting the remove_house_number() function, it opted to rewrite the tests, yet the rewritten tests were identical to the originals.

To assist ChatGPT further, I explicitly requested it to modify the remove_house_number() function. Unfortunately, the tests failed once more, as the output for "Prinsengracht 123A, 1015 DX Amsterdam" remained unchanged.

In summary, while ChatGPT initially produced sound functions and tests for the house number removal task, a few iterations later, we found ourselves further from the correct solution than where we started.

Final Thoughts

ChatGPT shows promise in generating effective code for various coding challenges and creating appropriate tests, including edge cases that could potentially break the code. This suggests that it may exhibit less bias in testing compared to many human developers. However, aligning the code with the tests to ensure all tests pass remains a challenge for ChatGPT.

It's worth noting that the feedback ChatGPT received was limited to pytest output. Providing feedback in natural language from a developer might enhance the results. Additionally, the version of ChatGPT utilized for this article was GPT-3.5, while OpenAI has since released GPT-4. I plan to test the same queries with GPT-4, so stay tuned to see if it has made significant advancements in Python coding and testing compared to GPT-3.5!

Chapter 3: Video Insights

The first video titled "ChatGPT writes my code - AND my unit tests??" explores how ChatGPT approaches coding and testing tasks.

The second video titled "DON'T TRUST ChatGPT's CODE!" discusses the limitations and challenges of relying on ChatGPT for code generation.
