Understanding Visual Language Models

Salesforce releases ‘xGen-MM’ open-source multimodal AI models to advance visual language understanding

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Salesforce, the enterprise software giant, ...

Nasdaq

Alibaba Cloud Releases Latest AI Models For Enhanced Visual Understanding

(RTTNews) - Chinese tech giant Alibaba Cloud on Wednesday unveiled its latest visual-language model, Qwen2.5-VL, which it claims to be a significant improvement from its predecessor, Qwen2-VL. The ...

The Robot Report

Vision-language-action models are the next leap in autonomous robotics

Explore how vision-language-action models like Helix, GR00T N1, and RT-1 are enabling robots to understand instructions and act autonomously.

EurekAlert!

Assessing and understanding creativity in large language models

A TTCT-inspired dataset was constructed to evaluate LLMs under varied prompts and role-play settings. GPT-4 served as the evaluator to score model outputs. In recent years, the realm of artificial ...

Ars Technica

Microsoft unveils AI model that understands image content, solves visual puzzles

On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ ...

Forbes

The Next Leap In AI: From Large Language Models To Large World Models?

The realm of artificial intelligence (AI) may be on the cusp of a new transformative leap, transitioning from Large Language Models (LLMs) to an innovative and expansive concept, which we may call ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results