OpenAI has launched gpt-realtime, a new conversational model that follows complex instructions better, produces more natural-sounding voices, and is 20% cheaper. The model comes with two new voices.
In October 2024, OpenAI launched the Realtime API, which allows developers to create low-latency, multimodal experiences. Since then, thousands of developers have used this API to build natural-language conversation experiences into their applications. Now OpenAI has announced gpt-realtime, a more advanced conversational model with enhanced features.
More Advanced and Cheaper

The new model is better at following complex instructions, and the error rate in tasks like hailing a ride has been reduced. The voices generated by the model are also reported to be more natural and expressive. OpenAI states that the model interprets system messages and developer instructions much better than previous models.
The initial release of the Realtime API offered six voice options, with two new voices added later. Now, two more voices, named Marin and Cedar, have been announced. Additionally, the eight existing voices have been updated to offer a more natural and fluid experience.
The new model also scores well in benchmark tests. In the Big Bench Audio test, gpt-realtime achieved 82.8% accuracy, surpassing the 65.6% score of the previous model from December 2024. In the MultiChallenge Audio Benchmark test, the model scored 30.5%, exceeding the previous score of 20.6%.
Along with the new model and voices, the Realtime API itself has been updated. The API now supports remote MCP servers, visual input, and phone calls via Session Initiation Protocol (SIP). Developers can also save and reuse their prompts.
Despite all these improvements, OpenAI has lowered the price of the Realtime API. gpt-realtime is now 20% cheaper than the previous gpt-4o-realtime-preview. The cost is $32 per 1M audio input tokens and $64 per 1M audio output tokens.
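To make the pricing concrete, here is a minimal Python sketch that estimates the audio cost of a session from the per-token prices quoted above. The function names and the example token counts are hypothetical, chosen only for illustration, and the "implied previous price" is simply derived from the 20% figure rather than taken from a price list.

```python
# Prices quoted in the article, in USD per 1M audio tokens.
INPUT_PRICE_PER_M = 32.0
OUTPUT_PRICE_PER_M = 64.0
DISCOUNT = 0.20  # "20% cheaper" than gpt-4o-realtime-preview, per the article


def audio_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated audio cost in USD for the given token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def implied_previous_price(current_price: float) -> float:
    """Back out the earlier price implied by a 20% reduction (an inference)."""
    return current_price / (1 - DISCOUNT)


if __name__ == "__main__":
    # Hypothetical session: 50,000 input and 20,000 output audio tokens.
    cost = audio_cost(50_000, 20_000)
    print(f"Estimated session cost: ${cost:.2f}")
    previous = implied_previous_price(INPUT_PRICE_PER_M)
    print(f"Implied previous input price: ${previous:.2f} per 1M tokens")
```

Because output tokens cost twice as much as input tokens, sessions in which the model does most of the talking will be noticeably more expensive than listening-heavy ones.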