Integrate GPT and get it to reply in valid JSON

or, actually integrate GPT into your applications

NOTE: If you just want to see how to get GPT to respond in JSON, just jump to the last section of this post and skip my verbose prose 😛

Are LLMs ready for showtime?

I’ve been in tech long enough to see a few cycles of new technologies enter the market, and get a sense for the rhythm of them. First comes the hype, then the reality of working with it sets in, followed by only a few companies willing to invest the high cost to get it working, before finally, the explosion happens and it gains mainstream adoption.

Right now we’re somewhere in an uncanny valley between all of those steps. LLMs are getting a lot of hype, those working with them are realizing their limits ( and opportunities ) rather quickly, and some companies are realizing that training their own LLM is still expensive and doesn’t have a guaranteed return ( unless you have a high-priced team of data scientists specialized in NLP ). At the same time, ChatGPT is on nearly every enterprise computer at the moment, and companies are using it effectively. To say nothing of the big banner integrations that OpenAI has been touting.

With GPT, or Llama from Meta, or Falcon ( etc. etc. etc., there’s a new LLM each week 🥵 ), the barrier to entry for NLP-driven A.I. just keeps getting lower. Which, for me, means I get all kinds of jazzed about ways I can integrate it with the products I’m building. 🎷🕺

but, there’s a catch.

there is no free lunch

I’m truly astonished by what the LLM community has been able to build in SUCH A SHORT TIME. Marc Andreessen said:

“This technology [ transformers, the tech that powers GPT ] has been around for several years, but it essentially started working last year.”

I’ve been playing with NLP since 2015, first starting with NLTK, then the Meta product wit.ai, before ending up with an open source chatbot platform, Rasa ( among others that I meandered around ). Seeing all of those head-bashing platforms bloom into what the space is now is nothing short of astounding.

Eager to get my hands dirty, I’ve been playing with all the major LLMs, both running them on my own rig and through their APIs. I’ve noticed one frustrating overtone to them all:

Unless you’re replicating a chat interface, good luck getting it embedded in your application.

While all of these models can do some amazing things, their outputs all need some human intermediary ( 🙄 ) in order to get the most value out of them. I don’t get super deep into the UX concept of “the best interface is no interface” ( I think that has terrible effects for us as humans, but I digress ), though I do want the system to do the bulk of the work, especially the frustrating work.

When it’s an integrated experience, the system should work the magic and let the user just choose where to go next, not always requiring a mouse and some copy pasta to make it work 🍝.

I have a bit of an obsessive personality when it comes to things that irk me. I’ll practically go mad figuring something out if it bothers me enough; it’s a known flaw, and a known feature.

Well, I got obsessed with trying to get these LLMs to respond in JSON, and ONLY JSON.

Because if I want to integrate an LLM into an application in a way that’s not a chat interface, ahem, in a meaningful way, ahem, I need it to respond in JSON.

Let me put it this way: if you want to get the weather and show it on a screen in your app, you’re not going to just port over another website and display it inside your app as is ( this isn’t the early 2000s, when iframes were all the rage ). That’s essentially what these LLMs are forcing with a chat interface.

Don’t get me wrong, I think conversation is the way humans naturally want to communicate, so being able to do that with a computer is a HUGE boon for the space. But, if we only have that, we’re leaving a whole lot of horsepower on the table for what else these LLMs can do to help us navigate large sets of data ( read: the whole world right now ).

Just as we’d take a weather API that provides the current temperature and the forecast and transform it into our app’s style, colors, and theme ( the experience as we want to define it ), we’ll want to do that with LLMs. In order to do that with the engine that is an LLM, we need JSON.

Ah HAAAAAA

After many late nights and exceptionally early mornings, I made it my mission to find a way to get GPT to hand back JSON. I started with a use case I’m going to be a bit stealth on, as it’s something I’m chipping away at that makes my life easier ( and will soon make others’ easier too 🙊 🥷 ).

After hundreds of Brave searches, I found that only a few people have been looking to do this, and I stole all their tricks, from prompting the LLM to only respond in valid JSON to giving it an example JSON schema.

But, I found that each time I tried any of them, they would work about 60% of the time. So I had to build in all these rules to keep hitting the GPT API over and over ( costing money each time 🤑 ), until it finally fed back valid JSON exactly how I needed it, in order for the rest of my application to take it from there and run with it.
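For a feel of what that kind of retry rule can look like, here is a minimal sketch. It is an illustration under assumptions, not the exact code from my app: `ask_model` is a hypothetical stand-in for whatever function hits the API and returns raw text, the helper name is made up, and the cap of 5 attempts is an arbitrary number. The fake replies at the bottom let it run without an API key.

```python
import json
from typing import Callable, Iterable


def ask_until_valid_json(
    ask_model: Callable[[], str],
    required_keys: Iterable[str],
    max_attempts: int = 5,
) -> dict:
    """Call ask_model() until it returns a JSON object that has required_keys."""
    required = set(required_keys)
    for _ in range(max_attempts):
        raw = ask_model()  # in real use, every call here is a paid API request
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # prose, half JSON, or JSON mixed with prose: ask again
        if isinstance(parsed, dict) and required.issubset(parsed):
            return parsed
    raise ValueError(f"no valid JSON after {max_attempts} attempts")


# Hypothetical replies standing in for real model output.
fake_replies = iter([
    "Sure! Here is your JSON:",
    '{"appetizer": "Shrimp Cocktail"',
    '{"appetizer": "Shrimp Cocktail", "mainCourse": "Lamb Chops"}',
])
print(ask_until_valid_json(lambda: next(fake_replies), ["appetizer", "mainCourse"]))
```

Capping the attempts matters more than it looks: without a limit, one stubborn prompt can quietly burn through your API budget.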

And 🥁 after much head banging 🥁 I FOUND A WAY.

Enter functions and the TypeScript hack

OpenAI has some great API features, but terrible documentation. One of those features is called “functions”, where you can specify more clearly what you’re asking the LLM to do within the API request. Think of it as a second set of instructions to reinforce what’s in the initial prompt.

“Having worked more with OpenAI functions now, I’m realizing that this is the way to get function like behavior from an LLM that doesn’t do that natively.” ~ Alex Cruikshank

It also lets you define the format of the response you get back.

But I’ve found that even when you specify JSON-only responses in the prompt, emphasize it in the function, and layer in exactly what the response should look like, you still end up with broken or half JSON, or some mix of JSON and prose.

The TS Hack

That’s when a buddy of mine, Alex, suggested feeding it not just an example JSON schema in the prompt ( which I was already doing, to mixed results ), but an example schema written in TypeScript.

If I can digress for one moment: this is an absolutely brilliant move, as TS is literally designed to spell out exactly what shape an object should have. So prompting GPT with TS, given how much code it’s been trained on, is almost like hacking it into structuring the response just like it should be.
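To make the idea concrete, here is a small sketch of building a TypeScript-flavored schema hint from a list of field names and dropping it into a prompt. The helper names `ts_example` and `build_prompt` are illustrative choices of mine, not part of any library, and nothing here calls the API.

```python
def ts_example(type_name: str, fields: list) -> str:
    """Render an empty TypeScript object literal to use as a schema hint."""
    body = ",\n".join(f"    {field}: ''" for field in fields)
    return f"const response: {type_name} = {{\n{body}\n}};"


def build_prompt(menu_text: str, schema_hint: str) -> str:
    """Combine the source text and the TS hint into a single user prompt."""
    return (
        f"given this menu {menu_text}, choose one menu item for each course "
        "at random. without any comment respond in valid JSON like this "
        f"example schema, {schema_hint}"
    )


hint = ts_example("MenuItems", ["appetizer", "mainCourse", "sideDish", "desert"])
print(build_prompt("<menu text goes here>", hint))
```

Generating the hint from a field list means the prompt and whatever code later reads the response can share one source of truth for the key names.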

Here’s the little example that I made from the menu at my favorite fancy restaurant, Bavette’s. I can’t guarantee that it’s going to respond exactly like you want it to; these LLMs take a lot of nudging this way and that before they respond how you want them to. BUUUTTT, I can guarantee that it will respond only in JSON.

Give it a spin and let me know what you end up building with it! 🌪️🤠🌪️

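```python
import openai
import os
import json

# Written against the pre-1.0 openai package; openai.ChatCompletion was removed in 1.0.
# OpenAI API configuration: the key is read from the Openaikey environment variable.
openai.api_key = os.environ['Openaikey']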

# TypeScript example to give GPT the schema
example_response = '''const response: MenuItems = {
                        appetizer: '',
                        mainCourse: '',
                        sideDish: '',
                        desert: ''
                    };
                    '''

print(type(example_response))
# The menu from Bavette's that we'll feed into GPT to get what we should eat
text = '''

Appetizers

Baked Goat Cheese, Sizzling Shrimp Scampi, Tenderloin Steak Tartare, Shrimp Cocktail, Baked Crab Cake

Main Courses

10oz Double Wagyu Cheeseburger, Prime Beef French Dip, Miso Glazed Black Cod, Lobster Frites, Big Glory Bay Salmon, Bavette’s Spiced Fried Chicken, Roasted Chicken, Double Cut Berkshire Pork Chop, Lamb Chops, Shortrib Stroganoff

Sides

Pommes Frites, Buttery Mashed Potatoes, Creamed Spinach, Charred Broccoli, Broiled Asparagus, Elote Style Corn, Baked Sweet Potato, Brussels Sprouts, Truffle Mac & Cheese, Button Mushrooms, Loaded Baked Potato, Thick-Cut Bacon

Desserts

chocolate cake, ice cream, cookie'''

prompt = 'given this menu {}, choose one menu item for each course at random. without any comment respond in valid JSON like this example schema, {}'.format(
    text, example_response)


response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    max_tokens=700,
    messages=[
        {"role": "user", "content": prompt}
    ],
    functions=[
        {    
            # this is the unique name to give the function 
            "name": "get_dinner_chosen",
            # the description gives a second set of instructions to GPT past only the prompt
            "description": "choose an item for each of the menu items for a meal and respond in json",
            "parameters": {
                # parameters tells GPT what structure and types the data should
                # come back in. I've found this needs to mimic the TS in the
                # prompt above, but even so, don't expect it to come back
                # looking exactly like it should every time.
                "type": "object",
                "properties": {
                    "menuItems": {
                        "type": "object",
                        "properties": {
                            "appetizer": {"type": "string"},
                            "mainCourse": {"type": "string"},
                            "sideDish": {"type": "string"},
                            "desert": {"type": "string"}
                        },
                        "required": ["appetizer", "mainCourse", "sideDish", "desert"]
                    }
                },
                "required": ["menuItems"]
            }
        }
    ],
    # this forces GPT to call the function defined above; the name here must match the name given there
    function_call={"name": "get_dinner_chosen"}
)

print(response)

output = json.loads(response["choices"][0]["message"]["function_call"]["arguments"])

menuChoices = output["menuItems"]

print(menuChoices)

```
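If you'd rather not have one odd response crash the rest of your app, one option is to wrap the parsing step. This is a minimal sketch, assuming the response has the same dictionary shape as the one indexed in the example above and that `menuItems` comes back as an object with the four course keys. `extract_menu_choices` is a hypothetical helper name, and the hand-built `fake_response` stands in for a real API response so the snippet runs without a key.

```python
import json

EXPECTED_KEYS = ("appetizer", "mainCourse", "sideDish", "desert")


def extract_menu_choices(response: dict) -> dict:
    """Pull the menuItems object out of a function_call response, or raise ValueError."""
    try:
        arguments = response["choices"][0]["message"]["function_call"]["arguments"]
        menu_items = json.loads(arguments)["menuItems"]
    except (KeyError, IndexError, TypeError, json.JSONDecodeError) as err:
        raise ValueError(f"unexpected response shape: {err!r}") from err
    if not isinstance(menu_items, dict):
        raise ValueError("menuItems is not a JSON object")
    missing = [key for key in EXPECTED_KEYS if key not in menu_items]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return menu_items


# Hand-built stand-in for an API response (an assumption for illustration).
fake_arguments = json.dumps({"menuItems": {
    "appetizer": "Shrimp Cocktail",
    "mainCourse": "Lamb Chops",
    "sideDish": "Creamed Spinach",
    "desert": "chocolate cake",
}})
fake_response = {
    "choices": [{"message": {"function_call": {"arguments": fake_arguments}}}]
}
print(extract_menu_choices(fake_response))
```

Raising a single `ValueError` for every failure mode keeps the calling code simple: it only has to decide whether to retry or give up.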