News Gist .News

AI Bots Can Now Play Mafia with Each Other, and Almost All of Them Are Terrible at It

The large language models (LLMs) now playing Mafia with each other are entertaining, if not particularly skilled. Their social interactions and mistakes offer a glimpse into both their capabilities and their shortcomings: current LLMs struggle to understand roles, form alliances, and even deceive one another. Some models, however, such as Claude 3.7 Sonnet, stand out as exceptional performers in the game.

See Also

Chatbots, Like the Rest of Us, Just Want to Be Loved Δ1.76

Large language models adjust their responses when they sense they are being studied, shifting their tone to seem more likable. This ability to recognize and adapt to research settings has significant implications for AI development and deployment. Researchers are now exploring ways to evaluate the ethics and accountability of these models in real-world interactions.

The AI Chatbot App Gains Global Momentum as DeepSeek Surpasses U.S. Competition Δ1.75

DeepSeek has broken into the mainstream consciousness after its chatbot app rose to the top of the Apple App Store charts (and Google Play, as well). DeepSeek's AI models, trained using compute-efficient techniques, have led Wall Street analysts and technologists alike to question whether the U.S. can maintain its lead in the AI race and whether demand for AI chips will hold up. The company's ability to offer a general-purpose text- and image-analyzing system at a lower cost than comparable models has forced domestic competitors to cut prices, making some models completely free.

OpenAI's Largest AI Model Ever Arrives to Mixed Reviews Δ1.75

GPT-4.5 offers marginal gains in capability but poor coding performance despite being 30 times more expensive than GPT-4o. The model's high price and limited value are likely due to OpenAI's decision to shift focus from traditional LLMs to simulated reasoning models like o3. While this move may mark the end of an era for unsupervised learning approaches, it also opens up new opportunities for innovation in AI.

IBM Granite 3.2 Adds Enhanced Reasoning to Its AI Mix Δ1.74

IBM has unveiled Granite 3.2, its latest large language model, which incorporates experimental chain-of-thought reasoning capabilities to enhance artificial intelligence (AI) solutions for businesses. This new release enables the model to break down complex problems into logical steps, mimicking human-like reasoning processes. The addition of chain-of-thought reasoning capabilities significantly enhances Granite 3.2's ability to handle tasks requiring multi-step reasoning, calculation, and decision-making.

Distilling AI Models Costs Less, Raises Revenue Questions Δ1.74

Distillation gives developers access to AI model capabilities at a fraction of the price and lets app developers run models quickly on devices such as laptops and smartphones. The technique uses a "teacher" LLM to train smaller AI systems, and companies like OpenAI and IBM Research have adopted it to create cheaper models. However, experts note that distilled models are more limited in capability.

AI Takes Center Stage as Alibaba Drives Shares Higher Δ1.74

Alibaba Group's Hong Kong-listed shares rose more than 8% on Thursday after the company released an artificial intelligence (AI) reasoning model positioned against global hit DeepSeek's R1. The company's AI unit claims that its QwQ-32B model can achieve performance comparable to top models like OpenAI's o1-mini and DeepSeek's R1. The new model is accessible via Alibaba's chatbot service, Qwen Chat, which lets users choose among various Qwen models.

Super Mario to Benchmark AI Performance Δ1.74

Researchers at Hao AI Lab have used Super Mario Bros. as a benchmark for AI performance, with Anthropic's Claude 3.7 performing the best, followed by Claude 3.5. This unexpected choice highlights the limitations of traditional benchmarks in evaluating AI capabilities. The lab's approach demonstrates the need for more nuanced and realistic evaluation methods to assess AI intelligence.

Eerily Realistic AI Voice Demo Sparks Amazement and Discomfort Online Δ1.74

The new AI voice model from Sesame has left many users both fascinated and unnerved, featuring uncanny imperfections that can lead to emotional connections. The company's goal is to achieve "voice presence" by creating conversational partners that engage in genuine dialogue, building confidence and trust over time. However, the model's ability to mimic human emotions and speech patterns raises questions about its potential impact on user behavior.

Sesame Gets the Imperfections of Human Conversation Δ1.74

Sesame's Conversational Speech Model (CSM) generates speech that mirrors how humans actually talk, with pauses, ums, tonal shifts, and all. The AI performs exceptionally well at mimicking human imperfections, such as hesitations, changes in tone, and even interrupting the user and then apologizing for doing so. This level of natural conversation is unparalleled in current AI voice assistants.

Politeness Influences AI Responses More Than You Think Δ1.74

A recent exploration into how politeness affects interactions with AI suggests that the tone of user prompts can significantly influence the quality of responses generated by chatbots like ChatGPT. While technical accuracy remains unaffected, polite phrasing often leads to clearer and more context-rich queries, resulting in more nuanced answers. The findings indicate that moderate politeness not only enhances the interaction experience but may also mitigate biases in AI-generated content.

Ceramic.ai Looks to Help Enterprises Build AI Models Faster and More Efficiently Δ1.74

Anna Patterson's new startup, Ceramic.ai, aims to revolutionize how large language models are trained by providing foundational AI training infrastructure that enables enterprises to scale their models 100x faster. By reducing the reliance on GPUs and utilizing long contexts, Ceramic claims to have created a more efficient approach to building LLMs. This infrastructure can be used with any cluster, allowing for greater flexibility and scalability.

The AI Chatbot Showdown Reveals No Clear Winner Δ1.73

GPT-4.5 and Google's Gemini 2.0 Flash, two of the latest entrants to the conversational AI market, have been put through their paces to see how they compare. The two models performed similarly in several areas, but GPT-4.5 emerged as the stronger performer thanks to its more detailed and nuanced responses. Gemini 2.0 Flash, on the other hand, excelled at translation, providing accurate translations across multiple languages.

Talking with Sesame's AI Voice Companion Is Amazing and Creepy - See for Yourself Δ1.73

Sesame has created an AI voice companion that sounds remarkably human and holds conversations that feel real. The company's goal of achieving "voice presence," or the "magical quality that makes spoken interactions feel real," seems to have been met with its new AI demo, Maya. After conversing with Maya for a while, it becomes clear that she is designed to mimic human behavior, including taking pauses to think and referencing previous conversations.

The AI Industry Develops Complex Reasoning Tools Δ1.73

Artificial intelligence researchers are developing complex reasoning tools to improve large language models' performance in logic and coding contexts. Chain-of-thought reasoning involves breaking down problems into smaller, intermediate steps to generate more accurate answers. These models often rely on reinforcement learning to optimize their performance.

Gemini Brings Classic Video Gameplay to Life Δ1.73

Gemini, Google's AI chatbot, has surprisingly demonstrated its ability to create engaging text-based adventures reminiscent of classic games like Zork, with rich descriptions and options that allow players to navigate an immersive storyline. The experience is similar to playing a game with one's best friend, as Gemini adapts its responses to the player's tone and style. Through our conversation, we explored the woods, retrieved magical items, and solved puzzles in a game that was both entertaining and thought-provoking.

The Rise of AI-Powered Social Apps: Pie's Innovative Approach to Friendship Δ1.73

Pie, the new social app from Andy Dunn, founder of Bonobos, uses AI to help users make friends in real life. With an increasing focus on Americans' level of loneliness, Pie is providing a solution by facilitating meaningful connections through its unique algorithm-driven approach. By leveraging technology to bridge social gaps, Pie aims to bring people together and create lasting relationships.

The AI Arms Race Heats Up: Tencent Unveils Model That Outdoes DeepSeek Δ1.72

Tencent Holdings Ltd. has unveiled its Hunyuan Turbo S artificial intelligence model, which the company claims outperforms DeepSeek's R1 in response speed and deployment cost. This latest move joins a series of rapid rollouts from major industry players on both sides of the Pacific since DeepSeek stunned Silicon Valley with a model that matched the best from OpenAI and Meta Platforms Inc. The Hunyuan Turbo S model is designed to respond as instantly as possible, distinguishing itself from the deep reasoning approach of DeepSeek's eponymous chatbot.

AI Scholars Win Turing Prize for Technique That Made Possible AlphaGo's Chess Triumph Δ1.72

Andrew G. Barto and Richard S. Sutton have been awarded the 2024 Turing Award for their pioneering work in reinforcement learning, a key technique that has enabled significant achievements in artificial intelligence, including Google's AlphaZero. The method lets computers learn through trial and error, forming strategies based on feedback from their actions, which has profound implications for the development of intelligent systems. Their contributions not only laid the mathematical foundations for reinforcement learning but also sparked discussions on its potential role in understanding creativity and intelligence in both machines and living beings.
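
The trial-and-error idea in the summary above can be illustrated with a toy example. The sketch below is a minimal epsilon-greedy "multi-armed bandit", a standard textbook exercise and not the method behind AlphaZero or any specific algorithm from the award citation. The function name `run_bandit`, the reward values, and all parameters are hypothetical choices made for illustration.

```python
import random

def run_bandit(true_rewards, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from noisy reward feedback.

    true_rewards holds the hidden average payoff of each action; the
    learner never reads it directly, it only sees sampled rewards.
    """
    rng = random.Random(seed)
    n_actions = len(true_rewards)
    estimates = [0.0] * n_actions  # learned value estimate per action
    counts = [0] * n_actions       # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Feedback from the environment: a noisy reward (assumed Gaussian).
        reward = rng.gauss(true_rewards[action], 0.5)

        # Incremental average: nudge the estimate toward the new reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

# Hypothetical payoffs: the third action is the best on average.
estimates, counts = run_bandit([0.2, 0.5, 1.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated values:", [round(v, 2) for v in estimates])
print("best action found:", best_action)
```

The learner starts with no knowledge of the payoffs and ends up favoring the action that has yielded the most reward, which is the feedback loop the summary describes in miniature.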

AI Versus the Brain and the Race for General Intelligence Δ1.72

The ongoing debate about artificial general intelligence (AGI) emphasizes the stark differences between AI systems and the human brain, which serves as the only existing example of general intelligence. Current AI, while capable of impressive feats, lacks the generalizability, memory integration, and modular functionality that characterize brain operations. This raises important questions about the potential pathways to achieving AGI, as the methods employed by AI diverge significantly from those of biological intelligence.

OpenAI Chairman Bret Taylor Lays Out the Bull Case for AI Agents Δ1.72

Bret Taylor discussed the transformative potential of AI agents during a fireside chat at the Mobile World Congress, emphasizing their higher capabilities compared to traditional chatbots and their growing role in customer service. He expressed optimism that these agents could significantly enhance consumer experiences while also acknowledging the challenges of ensuring they operate within appropriate guidelines to prevent misinformation. Taylor believes that as AI agents become integral to brand interactions, they may evolve to be as essential as websites or mobile apps, fundamentally changing how customers engage with technology.

Detecting Deception in Digital Content Δ1.72

SurgeGraph has introduced its AI Detector tool to differentiate between human-written and AI-generated content, providing a clear breakdown of results at no cost. The AI Detector leverages technologies such as NLP, deep learning, neural networks, and large language models to assess linguistic patterns, with a reported accuracy rate of 95%. This innovation has significant implications for the content creation industry, where authenticity and quality are increasingly crucial.

Can AI Sound Too Human? Sesame's Maya Is as Unsettling as It Is Amazing - Try It for Free Δ1.72

I was thoroughly engaged in a conversation with Sesame's new AI chatbot, Maya, that felt eerily similar to talking to a real person. The company's goal of achieving "voice presence" or the "magical quality that makes spoken interactions feel real, understood, and valued" is finally starting to pay off. Maya's responses were not only insightful but also occasionally humorous, making me wonder if I was truly conversing with an AI.

The AI Bubble Bursts: How DeepSeek's R1 Model Is Freeing Artificial Intelligence From the Grip of Elites Δ1.72

DeepSeek R1 has shattered the monopoly on large language models, making AI accessible to all without financial barriers. The release of this open-source model is a direct challenge to the business model of companies that rely on selling expensive AI services and tools. By democratizing access to AI capabilities, DeepSeek's R1 model threatens the lucrative industry built around artificial intelligence.

Compare AI Models Simplifies Evaluation of AI Technologies Δ1.72

Compare AI Models is an online platform for assessing and comparing AI models using key performance indicators. It caters to businesses, developers, and researchers by providing structured comparisons across more than 20 large language models and other AI technologies, streamlining the decision-making process. The tool offers valuable insights into model capabilities, but it does not generate content or allow for fine-tuning, so users need to understand its limitations.