Part of Microsoft’s multi-billion-dollar investment in OpenAI includes access to its latest GPT models, and the newest of those, GPT-4o, becomes part of Azure AI starting today.
During the keynote delivered by Microsoft CEO Satya Nadella at the company’s annual BUILD developer conference, attendees learned that GPT-4o is now available in Azure AI Studio as an API.
Developers can now make use of multimodal models that integrate text, image, and audio processing within a single model.
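For developers who want a feel for what that looks like in practice, here is a minimal sketch of a multimodal call through the Azure OpenAI Python SDK. The endpoint, API version, deployment name, and image URL are placeholders for illustration, not values from the announcement.

```python
import os
from openai import AzureOpenAI  # pip install openai

# Assumed placeholders: the endpoint, key, API version, and the "gpt-4o"
# deployment name all depend on your own Azure OpenAI resource.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

# A single chat request can mix text and an image reference,
# which the multimodal model processes together.
response = client.chat.completions.create(
    model="gpt-4o",  # the name of your GPT-4o deployment
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what this chart shows."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```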
Another multimodal experience announced during BUILD was the addition of Phi-3-vision to the Phi-3 family, which is now home to Phi-3-small, Phi-3-medium, and Phi-3-vision. Microsoft has been working diligently on specialized small language models to complement its investments in large language models. The Phi-3 family represents some of those efforts, with Phi-3-vision designed specifically for personal devices.
According to Microsoft, Phi-3-vision will offer “the availability to input images and text and receive text responses. For example, users can ask questions about a chart or ask an open-ended question about specific images.”
At 4.2 billion parameters, Microsoft’s new Phi-3-vision should comfortably handle general visual reasoning tasks as well as reasoning over charts, graphs, and tables.
Depending on how developers implement the new APIs, users could in theory ask AI solutions questions about charts or pose open-ended questions about specific images.
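To illustrate that chart-question scenario, the sketch below runs Phi-3-vision locally with the Hugging Face Transformers library. The model identifier, chat formatting, and file name are assumptions based on the public model release rather than details from the BUILD announcement; Azure AI also offers the model as a hosted endpoint.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumed model identifier from the public Hugging Face release.
model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# The <|image_1|> placeholder marks where the image belongs in the prompt.
image = Image.open("quarterly_sales.png")  # hypothetical local chart image
messages = [
    {"role": "user", "content": "<|image_1|>\nWhich quarter had the highest revenue?"}
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)

# Strip the prompt tokens and decode only the generated answer.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```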
Microsoft also announced that it is enabling new capabilities across current APIs for more multimodal experiences, including speech analytics and universal translation.