On a recent client project, our development team had reached an impasse over an important requirement. We knew that we needed to allow our end-users to enter goals and related activities as free-form text and that then we needed these goals/activities to be classified into a set of categories and subcategories that our client had defined.
Our initial design guidelines had indicated that this would be accomplished by the end user manually specifying this information, choosing a category and subcategory for each of the goals/activities that they enter as free-form text. The impasse came when we did UX testing (with the entire development team as guinea pigs). The front-end team had put together a validation test to help us determine if our proposed approach to classification was viable or not.
The results of this UX testing were very instructive, but not in a positive way. All of us agreed that the process was far more cumbersome than anticipated and that this approach was never going to be acceptable for our end-user audience. We needed to find a different approach. In a meeting shortly after this UX testing, our Project Manager lamented "If we can't find a way to automate the classification of these goals, it could be a show-stopper."
We tried brainstorming a few ideas but kept getting hung up on the requirement to allow the end-users to enter their goals as free-form arbitrary text descriptions –the input “space” was unbounded and essentially infinite.
Later the same day, I was still thinking about this problem when I came across an article online indicating that OpenAI had just reduced the cost of their GPT API by 90%. I had been reading and absorbing as much as I could about GPT but I had been reluctant to dive in, for fear of getting lost down the rabbit hole. But now that I had a project-related use case to try out (and the usage costs appeared to be trivial), it was time for me to get started with some hands-on work. I signed up for access to the ChatGPT and the OpenAI API.
Initial Questions and Prompts
Once I had my account set up, the first question I asked ChatGPT was if it could do the sort of classification/categorization task that we needed.
Me: Can you categorize and sub-categorize goals if I tell you the categories and subcategories in advance?
ChatGPT: Yes, I can categorize and subcategorize goals if you provide me with the categories and subcategories in advance. Please provide me with the necessary information, and I will do my best to assist you.This response was exciting to me. GPT was confirming it could accomplish the task and even further it was prompting me to provide it with "the necessary information." But I was slightly confused about what to do next. Was I going to need to learn about "model fine-tuning" to teach GPT about the categories and related subcategories that our client had defined?
I decided it would be quickest to just ask ChatGPT.
Me: How do I teach you the subcategories and which category each subcategory belongs to?
ChatGPT: You can teach me the subcategories and corresponding categories by providing me with a list of categories and subcategories along with their relationships. for example:
- Category: animals
- Subcategory: mammals
- Subcategory: birds
- Subcategory: fish
- Category: food
- Subcategory: fruits
- Subcategory: vegetables
- Subcategory: dairy
- Subcategory: meat
By providing this type of information, I will be able to understand the hierarchy of your categories and subcategories and categorize details accordingly.
ChatGPT: Sure, I'm ready to help you categorize your activities. just provide me with the goal, and I will let you know which category/subcategory it belongs to.
I grabbed the first example goal/activity text that I had and entered it into ChatGPT.
Me: Activity: Trimming tree in front yard
ChatGPT: Category: Cleanliness, Subcategory: Yardwork
This was the correct answer, matching the category and subcategory indicating in the example data. I was impressed. I then proceeded to test each of the example inputs that I had access to. I had one example for each of the 150 subcategories and 14 associated categories which our client had defined.
The first 25 or so tests worked perfectly. ChatGPT was correctly choosing the same Category/Subcategory combination which the example inputs had been categorized with. As I tested more of the examples and continued to get correct answers back from ChatGPT, my confidence in the viability of this approach was growing.
Poorly Worded example Inputs
Eventually, ChatGPT spit out a response which was much, much longer than expected for one of the example inputs.
Me: How to balance a check book
ChatGPT: Balancing a checkbook is a financial management task that involves comparing the transactions recorded in your checkbook register with the transactions shown on your bank statement. This allows you to keep track of your account balance and ensure that there are no discrepancies.
To balance a checkbook, follow these steps:
1. Start by recording all of your transactions in your checkbook register. This includes check check she write deposits you make and any other transactions that affect your count balance.
2. Make sure to record each transaction accurately, including the date, the amount, and any relevant details about the transaction.
3. At the end of each month, compare…
This caught my attention and I quickly hit the `Stop Generating` button in ChatGPT to interrupt this apparently wrong answer. Somewhat disappointed, I scolded ChatGPT:
Me: You were supposed to tell me the category and subcategory?
ChatGPT responded with an apology and an appropriate response as well.
ChatGPT: I apologize for that! Here is the category and subcategory for “How to balance a checkbook”: Category: Money Management, Subcategory: Manage bank account
Looking more closely, it was clear that part of the problem was the wording of this example input. Also, I had made the situation worse because I got lazy – I had stopped including the prefix `Activity:` with the free-form text describing the goal/activity.
Considering the exact input in this case was: `How to balance a check book` I realized that the response given by ChatGPT was not really a wrong answer -- it was something slightly different. It was a correct answer for the input but it was not expressed in the manner I had asked for -- not even close.
This type of situation is colloquially referred to as ChatGPT going “off-the-rails”. ChatGPT is conversational and non-deterministic. It is not too surprising that an “off-the-rails” response like this was generated.
Part of what makes ChatGPT so interesting and impressive is the conversational and creative nature of its responses. Further, ChatGPT is non-deterministic, meaning that you cannot be guaranteed to get identical answers for the exact same inputs (but this is configurable).
These qualities of ChatGPT can work against us when we are trying to use it for specific, bounded tasks, like this classification problem.
In Part 2, I will closely examine this type of “off-the-rails” problem response and show some approaches to help avoid it by using the following strategies:
- Open ended vs. closed ended chat
- Single-turn instructions vs. Multi-turn instructions
- Proper prompts for defining the task to keep ChatGPT from going “off-the-rails”
- Out-of-Range Classification (i.e. hallucinated Categories)
- GPT Temperature setting