After ChatGPT took the world by storm late last year, it was refreshing to see last week’s announcement of the imminent arrival of the chatbot’s next iteration receive relatively modest coverage.
German publication Heise quoted senior members of Microsoft’s in-country leadership last week as saying GPT-4 would arrive this week.
Microsoft Germany CTO Andreas Braun said: “We will introduce GPT-4 next week, there we will have multimodal models that will offer completely different possibilities – for example videos.” ChatGPT (built on GPT-3.5) launched in December 2022, while GPT-3 launched in 2020.
Both Braun and Microsoft Germany CEO Marianne Janik stressed that while generative AI was a game changer, it wouldn’t be replacing human jobs. The big takeaway from the GPT-4 announcement was that it will be “multimodal”, meaning it will be able to generate text, audio, images and videos.
This may sound like a big deal, but in reality it’s simply a consolidation of pre-existing AI technologies — including OpenAI’s own DALL-E image generator. In fact, GPT-4 sounds like it will tread on the toes of a range of third-party software, including Midjourney, ElevenLabs and D-ID, which together can create an AI-animated avatar with voiceovers.
Small Steps
All in all, this is a small step forward and might disappoint those who were expecting something more profound. However, it bears out the point that OpenAI CEO Sam Altman made back in January when he warned that “people are begging to be disappointed” by GPT-4.
Altman didn’t mince his words when talking at StrictlyVC, discounting viral projections that the number of parameters in GPT-4 would climb to 100 trillion from 175 billion in GPT-3 as “complete bullshit”.
And yet, Altman acknowledged that he was caught off guard by the ChatGPT hype train. He said he simply viewed GPT-3.5 as an iterative step on from GPT-3, which didn’t generate as much excitement when it launched.
It seems Altman underestimated humanity’s preoccupation with the self. There are a number of reasons why seeing our own reflection is important to us, including the fact that reflections “help us develop our sense of self”. ChatGPT is the closest an AI program has come to mirroring humans’ conversational ability, creating a shared lightbulb moment in the cultural consciousness.
It managed this thanks to focused training on a smaller dataset combined with human feedback. I’m not going to try to distill the technical side of this endeavor here, but if you want a more in-depth take on GPT-3.5’s development process then head on over to Jesus Rodriguez’s exploration of reinforcement learning from human feedback (RLHF).
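For the curious, the kernel of RLHF’s reward-modelling stage can be sketched in a few lines. This is a toy illustration only — the function name and values are mine, not from OpenAI’s codebase — showing the pairwise (Bradley–Terry-style) loss that teaches a reward model to score the human-preferred answer above the rejected one:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss used when training an RLHF reward model.

    Computes -log(sigmoid(r_chosen - r_rejected)): the loss shrinks as
    the model scores the human-preferred answer higher than the one
    the human rejected.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the model can't tell the answers apart, the loss sits at ln(2);
# a bigger margin in favour of the preferred answer drives it toward zero.
print(preference_loss(0.0, 0.0))  # ~0.693
print(preference_loss(3.0, 0.0))  # much smaller
```

In the full pipeline this loss trains a reward model on human preference pairs, and that model then guides a reinforcement-learning step over the language model itself.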
A Tool’s a Tool
At the end of the day, ChatGPT is just a tool, albeit an exciting and powerful one. While it shows us the potential of generative AI, we still need to put the work in on our end to get the most out of it.
Over the weekend I read Business Insider reporter Aaron Mok’s take on leveraging AI tools to boost his productivity. The long and short of it was that, with one notable exception, these tools made his life harder. It got me thinking about our expectations when it comes to software learning curves.
The best software solutions are the ones that make the underlying technology invisible — Google search is one of the best examples here. Among AI tools, ChatGPT is at the forefront of that invisibility, letting people imagine a world where a low-tech conversation with a chatbot leads to a completed tax return.
And that world is coming, but we’re some ways off. As it stands, the AI tools we have now can deliver very specific outcomes and we need to remember that or we’re going to be unnecessarily outraged every time generative AI hallucinates information.
Abraham Maslow’s famous quote about hammers and nails doesn’t quite fit the context of this topic, but I’m going to hammer it in regardless. We need to stop thinking about AI as a one-size-fits-all solution to productivity woes. AI comes in many forms, each of which requires a serious time commitment from its users to extract true value — in other words, humans need to upskill.
Picking up a hammer doesn’t mean you can build a house. You’ll need an array of other tools to get the job done and, even then, if you don’t have any skills you’re just asking for trouble.