How I reduced OpenAI Costs and Token Usage by 40% with Python

I’ve been using the OpenAI API for several weeks, and I keep noticing how well ChatGPT understands my prompts even when words are missing from the text.

That’s when the idea struck me: ‘What if I could reduce the words used in my prompts through the API to save money on token usage?’

That’s exactly what I’m going to demonstrate in this post, and you’ll see that it’s a straightforward process. I’ll divide the content into two sections: one for those who want to learn the code and how I accomplished it, and the other with a finished app where you can simply enter the text you desire.

How the prompt reducer works

The first thing you need to know is that our prompt is going to pass through two cleaning phases:

First, we remove the stop words from the text. Stop words are words that occur very frequently in a language; in English, “the,” “is,” and “and” are typical examples.
Then, we lemmatize the remaining words. Lemmatization reduces a word to its base (dictionary) form, so “characters” becomes “character”. This shortens and standardizes the text while keeping it readable for a language model like GPT. Note that the code in this post calls NLTK’s WordNetLemmatizer without a part-of-speech argument, so each word is treated as a noun; verb forms such as “running” are left as they are unless you pass pos='v'.
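To make the two phases concrete, here is a toy sketch that uses a tiny hand-picked stop word set and a lookup table in place of a real lemmatizer. The names `STOP_WORDS`, `LEMMAS`, and `toy_reduce` are made up for illustration; the real code in this post uses NLTK's data instead.

```python
# Toy illustration of the two cleaning phases (not NLTK's real data).
STOP_WORDS = {"the", "is", "and", "a", "to"}               # tiny hand-picked subset
LEMMAS = {"characters": "character", "stories": "story"}   # stand-in for a lemmatizer


def toy_reduce(text):
    words = text.split()
    # Phase 1: drop stop words (case-insensitive match)
    kept = [w for w in words if w.lower() not in STOP_WORDS]
    # Phase 2: map each remaining word to its base form when we know one
    return " ".join(LEMMAS.get(w.lower(), w) for w in kept)


print(toy_reduce("The players explore the stories and meet characters"))
# players explore story meet character
```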

So, let’s take an example with this description of the game Baldur’s Gate 3, taken from Wikipedia:

“Baldur’s Gate 3 is a role-playing video game with a single-player and cooperative multiplayer element. Players can create one or more characters and form a party along with a number of pre-generated characters to explore the game’s story”

If we paste this into OpenAI’s Tokenizer, it reports that the text costs 49 tokens.

After our prompt reducer, the text we get is:

“Baldurs Gate 3 roleplaying video game singleplayer cooperative multiplayer element Players create one character form party along number pregenerated character explore games story”

We went from 49 to 28 tokens, which is a 42.86% reduction in token usage.
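The percentage is simply the number of tokens saved divided by the original count. As a quick check of the arithmetic (the helper name `percent_reduction` is made up for this example):

```python
def percent_reduction(before, after):
    """Return the percentage of tokens saved, rounded to two decimals."""
    return round((before - after) / before * 100, 2)


print(percent_reduction(49, 28))  # 42.86
```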

Possible Use Cases: Why Would You Want to Reduce Token Usage?

If you’ve already used the OpenAI API, you probably know that it charges based on the number of tokens, including both your inputs and the responses from OpenAI. While using a model like GPT-3.5 may not be too expensive, using GPT-4 with extensive data can become costly.
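To get a rough feel for what this means in money, the sketch below estimates the input-side cost before and after reduction. The price and the request count are hypothetical placeholders, not real OpenAI pricing, so check the current pricing page before relying on the numbers. Also keep in mind that the reduction only affects the tokens you send; the response tokens are billed as usual.

```python
def input_cost(tokens_per_request, requests, price_per_1k_tokens):
    """Estimated cost of the input (prompt) tokens only."""
    return tokens_per_request * requests / 1000 * price_per_1k_tokens


PRICE_PER_1K = 0.03   # hypothetical placeholder, NOT an actual OpenAI price
REQUESTS = 10_000     # hypothetical number of prompts sent

before = input_cost(49, REQUESTS, PRICE_PER_1K)
after = input_cost(28, REQUESTS, PRICE_PER_1K)
print(f"before: ${before:.2f}, after: ${after:.2f}, saved: ${before - after:.2f}")
```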

In my case, I’ve developed an app that retrieves product descriptions from an .xlsx file and rewrites them. For every product on a website, I send a product description to GPT-4 and receive the rewritten product description in return. If you, like me, are sending a substantial amount of content to the API, reducing tokens might be a good idea.

I’ve also created a script that takes descriptions from web pages and generates SEO-optimized meta titles and meta descriptions. This script uses the same text cleaning method that I’m demonstrating here.

How the code in OpenAI Prompt Reducer works

Section 1: Understanding the Code

The code is pretty straightforward, and besides importing the modules, it revolves around 12 lines of code.

First, import the modules you need:

import nltk  # Natural Language Toolkit for text processing
from nltk.corpus import stopwords  # Stopwords dataset for filtering common words
from nltk.stem import WordNetLemmatizer  # Lemmatization for word normalization
import string  # Provides a collection of punctuation characters

Then, download the NLTK resources the script relies on. You only need to run this part once; after the download completes, you can remove or comment out these lines:

# Download required NLTK resources
nltk.download('stopwords')  # Download stopwords dataset
nltk.download('punkt')  # Download punkt tokenizer (the function below splits on whitespace and does not use it)
nltk.download('wordnet')  # Download WordNet dataset for lemmatization

Next, add the function that processes the text:

# Function for text preprocessing

def preprocess_text(text):

    # Create a lemmatizer object for English
    lemmatizer = WordNetLemmatizer()   

    # Load the stopwords dataset for English
    stop_words = set(stopwords.words('english'))    

    # Split the input text into words (tokenization)
    words = text.split()

    # Drop stopwords (case-insensitive), then lemmatize the words that remain.
    # Note: this check runs before punctuation is removed, so a token like "and," is not matched.
    filtered_words = [lemmatizer.lemmatize(word) for word in words if word.lower() not in stop_words and word.strip() != '']   

    # Remove single quotes, hyphens, and punctuation marks
    filtered_words = [word.replace("'", "").replace("-", "") for word in filtered_words]
    filtered_words = [''.join(char for char in word if char not in string.punctuation) for word in filtered_words]

    # Reconstruct the processed text from the filtered words
    processed_text = ' '.join(filtered_words)   

    return processed_text

Next, you’ll need to add just three more lines of code to input your text and receive the formatted output:

# Process text

text = "Your text here"
processed_text = preprocess_text(text)
print(processed_text)
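One limitation of the function above is that stop words are checked before punctuation is removed, so a word with a comma attached (like "and,") slips through, and "characters," is not lemmatized. Below is a variant sketch, not the original method, that strips the surrounding punctuation first. The name `reduce_prompt` and its parameters are hypothetical; the stop words and the lemmatizer are passed in, so you could supply `set(stopwords.words('english'))` and `WordNetLemmatizer().lemmatize` from the code above.

```python
import string

PUNCT = string.punctuation + "‘’“”"           # ASCII punctuation plus curly quotes
_DROP_PUNCT = str.maketrans("", "", PUNCT)     # translation table that deletes them


def reduce_prompt(text, stop_words, lemmatize=lambda word: word):
    """Strip outer punctuation, drop stop words, lemmatize, remove inner punctuation."""
    reduced = []
    for raw in text.split():
        core = raw.strip(PUNCT)                # "and," -> "and"
        if not core or core.lower() in stop_words:
            continue
        reduced.append(lemmatize(core).translate(_DROP_PUNCT))
    return " ".join(reduced)


# Stand-in stop word set for a quick check without NLTK
demo_stop_words = {"a", "and", "the", "with"}
print(reduce_prompt("Explore caves, and, later, the well-known city", demo_stop_words))
# Explore caves later wellknown city
```

Because more words now reach the lemmatizer without trailing punctuation, the output can differ slightly from what `preprocess_text` produces on the same input.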

Here is the full code on GitHub: https://github.com/ViniciusStanula/gpt-prompt-reducer

Section 2: Using the Finished App

If you don’t know how to code in Python and are just curious to see how this prompt reduction works, I’ve created a simple web application for you to use:

Simply go to the web app at https://gpt-prompt-reducer.streamlit.app/,
Insert your text or prompt,
Click the button,
And get your results.

So, that is basically it. I hope you liked the content. I would be happy to see what you build with it and to hear any feedback you have.

About Vinicius Stanula