I ran OpenAI’s o1-preview through four AI coding tests, and I was pleasantly surprised.

Sankai/Getty Images

Typically, if a software company announces a major new release in May, it doesn’t top it with an even more major new release four months later. But the pace of innovation in the AI business is anything but normal.


OpenAI released its new GPT-4o model in mid-May, but the company has been busy. Reuters had earlier reported a rumor that OpenAI was working on a next-generation language model known as Q*. It doubled down on that report in May, saying Q* was in development under the code name “Strawberry.”

Strawberry, as it turns out, is a model called o1-preview, and it is currently available as an option for ChatGPT Plus subscribers. You can choose the model from the selection dropdown.

menu — Screenshot by David Gewirtz/ZDNET

As you can imagine, when a new ChatGPT model becomes available, I want to test it thoroughly, and that is what I am doing here.


The new Strawberry model focuses on reasoning, breaking down prompts and problems into steps, and OpenAI showcases this approach through reasoning summaries that can be viewed before each answer.

When you ask o1-preview a question, it takes a moment to think, then tells you how long it spent thinking. Toggle the dropdown to see its reasoning. Here’s an example from one of my coding tests:

inference — Screenshot by David Gewirtz/ZDNET

It’s good that the AI understood it well enough to add error handling, but I find it interesting that o1-preview categorizes that step as “regulatory compliance.”

I also found that the o1-preview model provided much more detailed explanations after the code. In my initial test of creating a WordPress plugin, the model provided descriptions of the header, class structure, admin menus, admin pages, logic, security measures, compatibility, installation instructions, operation instructions, and even test data. That is much more information than previous models provided.


But really, the proof is in the testing. Let’s put this new model through my standard tests and see how well it works.

1. Creating a WordPress Plugin

This simple coding test requires knowledge of the PHP programming language and the WordPress framework. The challenge asks the AI to write both interface code and functional logic, but with a twist: instead of removing duplicate entries, it needs to separate duplicate entries so that they are not next to each other.

The o1-preview model did great. It first presented the UI with just the entry field.

Entry Field — Screenshot by David Gewirtz/ZDNET

Once I entered the data and clicked Randomize Lines, the AI generated an output field with appropriately randomized output data. It found that Abigail Williams was a duplicate and, as the test instructions require, did not list the two entries side by side.

In my tests of other LLMs, only four of 10 models passed this test. The o1-preview model completed it perfectly.
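To make the "separate the duplicates" requirement concrete, here is a minimal Python sketch. It is not the PHP plugin the test asks for, and it is not the code o1-preview produced. The function names, the retry limit, and the sample names in the usage note (other than Abigail Williams) are assumptions made for illustration. The sketch simply reshuffles until no two identical lines are adjacent, which is one of several possible approaches.

```python
import random


def has_adjacent_duplicates(lines):
    """Return True if any two neighboring lines are identical."""
    return any(a == b for a, b in zip(lines, lines[1:]))


def randomize_lines(lines, rng=None, max_attempts=1000):
    """Shuffle lines so that identical entries never sit next to each other.

    Hypothetical helper: keeps every line (duplicates are NOT removed) and
    retries the shuffle until the arrangement has no adjacent duplicates.
    """
    rng = rng or random.Random()
    shuffled = list(lines)
    for _ in range(max_attempts):
        rng.shuffle(shuffled)
        if not has_adjacent_duplicates(shuffled):
            return shuffled
    # Reached only when duplicates are so frequent that separation is
    # impossible or very unlikely (for example, one name on most lines).
    raise ValueError("could not separate duplicate lines")
```

Calling `randomize_lines(["Abigail Williams", "Sample Name A", "Abigail Williams", "Sample Name B"])` returns all four lines in random order, with the two Abigail Williams entries kept apart.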

2. Rewriting string functions

The second test asks the AI to fix a string regular expression that was the subject of a user-reported bug. The original code was designed to check whether an entered number was a valid dollars-and-cents amount. Unfortunately, the code only allowed integers (i.e., 5 was allowed, but 5.25 was not).


The o1-preview LLM successfully rewrote the code. It joins the four LLMs from my previous tests that also passed.
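The kind of fix involved can be illustrated with a short Python sketch. The article does not show the original code or its exact validation rules, so the pattern below is an assumption: whole dollars, optionally followed by a decimal point and one or two digits for cents. The names `INTEGER_ONLY`, `DOLLARS_AND_CENTS`, and `is_valid_amount` are hypothetical.

```python
import re

# Integer-only pattern, standing in for the buggy behavior described:
# it accepts "5" but rejects "5.25".
INTEGER_ONLY = re.compile(r"[0-9]+")

# Assumed corrected pattern: digits, then an optional "." with 1-2 digits.
DOLLARS_AND_CENTS = re.compile(r"[0-9]+(\.[0-9]{1,2})?")


def is_valid_amount(text, pattern=DOLLARS_AND_CENTS):
    """Return True when the whole (trimmed) string matches the pattern."""
    return pattern.fullmatch(text.strip()) is not None
```

With these definitions, `is_valid_amount("5.25")` is accepted, while `is_valid_amount("5.25", INTEGER_ONLY)` is rejected, which mirrors the reported bug.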

3. Finding nasty bugs

This test was created from a real bug that I struggled to solve. Identifying the root cause requires knowledge of the programming language (PHP in this case) and the nuances of the WordPress API.

The error message provided was not technically accurate: it referred to the beginning and end of the calling sequence I was making, but the bug was related to a middle part of my code.


I wasn’t the only one who struggled to solve the problem. Three of the other LLMs I tested couldn’t identify the root cause and recommended a more obvious (but incorrect) solution: changing the beginning and end of the calling sequence.

The o1-preview model provided the correct solution, and the model explanation also pointed me to the WordPress API documentation for the function I had used incorrectly, providing additional resources to help me understand why it was recommended. I found this extremely helpful.

4. Creating the script

The challenge requires the AI to integrate knowledge of three different coding areas: the AppleScript language, the Chrome DOM (the internal structure of a web page), and Keyboard Maestro (a specialized programming tool written by a single programmer).

To answer this question, you need to understand all three technologies and how they work together.

Once again, o1-preview was successful, joining three of the other 10 LLMs that have passed this test.

A very chatty chatbot

o1-preview’s new reasoning approach doesn’t hurt ChatGPT’s ability to excel in programming tests, and in particular the output from my initial WordPress plugin tests made it seem like a more sophisticated piece of software than previous versions.


It’s great that ChatGPT provides reasoning steps at the beginning of the task and explanatory data at the end. However, the explanations can get long-winded. As a comparison, I asked for “Hello world” in C#, a standard test line in programming. Here’s GPT-4o’s response:

csharp-gpt4o — Screenshot by David Gewirtz/ZDNET

For the same test, o1-preview responded as follows:

sharp — Screenshot by David Gewirtz/ZDNET

Wow, right? That’s a lot of chat from a chatbot. You can also toggle the reasoning dropdown to get even more details.

csharp's approach — Screenshot by David Gewirtz/ZDNET

All this information is great, but it’s a lot of text to filter through. I prefer a concise explanation, with the additional information available in dropdowns kept out of the main answer.

Still, ChatGPT’s o1-preview model performed well, and we’re excited to see how well it performs when more fully integrated with GPT-4o features like file analysis and web access.

Have you tried coding with o1-preview? What was your experience like? Let us know in the comments below.

You can follow my daily project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X (David Gewirtz), on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.
