Models — AI news — AI News Digest

Snowflake CEO finds GLM-5.2 competitive with Opus 4.7 at a fraction of the cost

Zhipu AI's GLM-5.2 has demonstrated competitive performance with Anthropic's Claude Opus 4.7 in a benchmark involving 103 coding tasks, achieving similar results at one-fifth the cost per output token. However, GLM-5.2 consumes nearly twice as many tokens per task, highlighting a trade-off between cost efficiency and token usage. This pricing disparity could impact the market dynamics and valuations of Western AI companies.

The Decoder | 32 d ago | found 12 d ago | #glms, #benchmark, #cost
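
The trade-off in the GLM-5.2 item above can be checked with one multiplication. This is a rough sketch: it rounds "nearly twice as many tokens" up to 2.0 (an assumption) and ignores input-token pricing, which the summary does not give. The function and variable names are illustrative only.

```python
# Relative cost per task = (relative price per output token) * (relative tokens per task).
# Both ratios come from the summary; 2.0 is an assumed rounding of "nearly twice".

def relative_task_cost(price_ratio: float, token_ratio: float) -> float:
    """Cost per task relative to the baseline model (1.0 means equal cost)."""
    return price_ratio * token_ratio

glm_price_ratio = 1 / 5  # one-fifth the cost per output token
glm_token_ratio = 2.0    # assumed upper bound for "nearly twice as many tokens"

ratio = relative_task_cost(glm_price_ratio, glm_token_ratio)
print(f"GLM-5.2 output cost per task is about {ratio:.0%} of the baseline")
```

Under these assumptions the per-task output cost is still well below the baseline, but the saving is closer to 2.5x than the 5x suggested by the per-token price alone.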

How Baidu's newly released Unlimited-OCR transcribes dozens of pages in one forward pass

Baidu has released Unlimited-OCR, an end-to-end OCR model capable of transcribing dozens of pages in a single forward pass using a novel attention mechanism called Reference Sliding Window Attention (R-SWA). This model, based on DeepSeek-OCR, maintains a fixed visual context while allowing generated text to attend only to a sliding window of previous tokens, significantly reducing memory overhead. Benchmarks indicate a performance improvement to 93.92% on OmniDocBench v1.6 compared to DeepSeek-OCR's 87.01%, although independent validation is recommended before final assessments. The model is available under the MIT license on platforms like Hugging Face and ModelScope, making it a valuable tool for practitioners in the OCR domain.

Reddit r/LocalLLaMA | 33 d ago | found 12 d ago | #ocr, #unlimited-ocr, #baidu
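
The attention pattern described in the Unlimited-OCR item above (a fixed visual context plus a sliding window over previously generated text) can be pictured as a boolean mask. The sketch below is only an illustration of that description, not Baidu's implementation; the function name, the key layout, and the window convention (the current token counts toward the window) are assumptions.

```python
def rswa_mask(num_visual: int, num_text: int, window: int) -> list[list[bool]]:
    """Return mask[i][j]: may generated text token i attend to key j?

    Keys are laid out as [visual tokens..., text tokens...] (assumed layout).
    """
    mask = []
    for i in range(num_text):
        # The visual reference tokens stay visible to every generated token.
        row = [True] * num_visual
        for j in range(num_text):
            # Causal and limited to the most recent `window` text tokens,
            # counting the current token itself.
            row.append(i - window < j <= i)
        mask.append(row)
    return mask


def max_keys_per_step(num_visual: int, window: int) -> int:
    """Upper bound on keys attended per decoding step, independent of output length."""
    return num_visual + window


mask = rswa_mask(num_visual=3, num_text=5, window=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Because each step attends to at most `num_visual + window` keys, the text part of the key-value cache does not need to grow with the length of the transcription, which is consistent with the memory saving the summary describes.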

llama.cpp updates - granite-speech-4.1-2b, LFM2.5-ColBERT/Embedding-350M, Vulkan backend related changes & Misc items

The latest updates to llama.cpp include support for the granite-speech-4.1-2b model and the LFM2.5-ColBERT/Embedding-350M models. Significant enhancements to the Vulkan backend have been made, including support for 3D convolutions and various mathematical operations, which can improve performance in high-throughput scenarios. These updates are crucial for AI practitioners as they enhance model compatibility and computational efficiency, enabling better utilization of GPU resources.

Reddit r/LocalLLaMA | 33 d ago | found 12 d ago | #llama, #updates, #granite-speech

New EU model (Domyn) will be 400b.

The startup Domyn has announced that it is developing a 400-billion-parameter language model. Its current lineup comprises Domyn Large, a closed 260-billion-parameter model aimed at enterprise applications, and a smaller 10-billion-parameter model available on Hugging Face. The planned model would be a substantial step up in size over Domyn Large, which may translate into stronger performance on complex language tasks.

Reddit r/LocalLLaMA | 33 d ago | found 12 d ago | #domyn, #model

Qwen-AgentWorld-397B-A17B

Qwen-AgentWorld-397B-A17B has been announced as a larger sibling of the previously released Qwen-AgentWorld-35B-A3B. The name suggests roughly 397 billion total parameters with about 17 billion active per token, following the usual naming convention for mixture-of-experts models, but the source post provides no architecture details or benchmark results.

Reddit r/LocalLLaMA | 33 d ago | found 12 d ago | #qwen, #model

Unlimited-OCR is now on ModelScope! A 3.3B multilingual OCR model for one-shot parsing across single images, multi-page documents, and PDFs. License: MIT

Unlimited-OCR, a 3.3-billion-parameter multilingual OCR model, has been released on ModelScope under the MIT license. It performs one-shot parsing of single images, multi-page documents, and PDFs, with a maximum output length of 32K tokens for full-document parsing. It offers Base and Gundam image modes for different document layouts, and it supports both Transformers inference and SGLang serving for OpenAI-compatible streaming requests.

Reddit r/LocalLLaMA | 33 d ago | found 12 d ago | #ocr, #multilingual, #modelscope

ScaleToT: Generalizing Structured LLM Reasoning for Billion-Scale Low-Activity User Modeling

ScaleToT is a new model designed to enhance user modeling for billions of low-activity users by leveraging structured reasoning techniques. It employs a bounded entropy-guided Tree-of-Thought (ToT) refinement to create typed user-state chains from a small LLM-processed subset, which are then used to train a lightweight profile encoder via supervised fine-tuning and Outcome-Driven Segment-Aware Implicit Reward Policy Optimization (OSIPO). This approach significantly reduces computational costs associated with LLM inference while improving lifetime value (LTV) prediction, as demonstrated by a 6.738% increase in LT30 during a billion-scale advertising deployment.

arXiv cs.AI | 33 d ago | found 10 d ago | #llm, #user modeling, #structured reasoning

video-SALMONN-R$^3$: Learning to ReWatch, ReAsk, and ReAnswer for Efficient Video Understanding

The paper presents video-SALMONN-R$^3$, an end-to-end video large language model designed for efficient video understanding through a two-stage approach in which the model re-watches selected segments at higher fidelity. It uses reinforcement learning to avoid costly chain-of-thought data annotation, adds a re-answer strategy to improve response accuracy after re-watching, and includes a re-ask mechanism to keep the query relevant. Experimental results indicate that video-SALMONN-R$^3$ outperforms previous models on video question answering benchmarks while reducing computational cost.

arXiv cs.AI | 33 d ago | found 10 d ago | #video-llm, #reinforcement-learning, #qa

Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDFs Using Schemas

Datalab has released lift, a 9B open-weights vision model designed to convert PDFs and images into structured JSON that adheres to predefined schemas. It employs schema-constrained decoding to ensure valid output and utilizes trained abstention to avoid hallucinating absent fields, achieving a field accuracy of 90.2% on a benchmark of 225 documents. This model is significant for practitioners as it enhances the extraction of structured data from unstructured documents, improving data processing workflows in various applications.

MarkTechPost | 33 d ago | found 21 d ago | #datalab, #vision_model, #json

FUTO Swipe – A new swipe typing model

FUTO Swipe is a newly announced swipe typing model designed to enhance text input on mobile devices. While specific technical details such as model size or architecture changes were not disclosed, the model aims to improve user experience in typing efficiency and accuracy. This development is significant for practitioners focusing on natural language processing and user interface design in mobile applications.

Hacker News | 33 d ago | found 12 d ago | #typing, #model

7 Chinese companies are already shipping H100/H200-class AI chips, most IPO'd in the last 6 months. I mapped all of them.

Seven Chinese companies have begun shipping AI accelerators comparable to NVIDIA's H100 and H200, with most having gone public in the last six months. Notably, Huawei shipped 812,000 AI cards last year, and their Ascend 950 targets H200-class performance. Additionally, Alibaba's new server can host 1.5TB of VRAM, sufficient for running frontier models fully on-premises, indicating a significant shift in the AI hardware landscape in China as local production gains ground against NVIDIA's market share.

Reddit r/LocalLLaMA | 33 d ago | found 21 d ago | #ai-chips, #h100, #h200

Cursor announces its own AI model, a new Git platform, and a mobile app

Cursor has announced its first in-house trained AI model, alongside a new Git platform and a mobile application. Specific technical details regarding the model's architecture, size, or benchmark results were not disclosed. This release is significant for practitioners as it indicates a growing trend of companies developing proprietary AI models and tools, potentially impacting the competitive landscape of AI development environments.

The Decoder | 33 d ago | found 21 d ago | #cursor, #ai_model, #git

OpenAI says new GPT-5.5-Cyber outperforms Anthropic's Mythos on cybersecurity benchmark

OpenAI announced the release of the GPT-5.5-Cyber model, which reportedly outperforms Anthropic's Mythos on cybersecurity benchmarks. This model is part of the expanded Daybreak cybersecurity initiative, featuring an updated Codex Security plugin that emphasizes automatic patching of vulnerabilities rather than just detection. This development is significant for practitioners as it enhances automated security measures, potentially reducing the time and resources needed for vulnerability management in AI applications.

The Decoder | 34 d ago | found 21 d ago | #openai, #gpt, #cybersecurity

NNiT: Width-Agnostic Neural Network Generation with Structurally Aligned Weight Spaces

The paper introduces Neural Network Diffusion Transformers (NNiTs), which enable width-agnostic generation of neural network weights by tokenizing weight matrices into patches and utilizing Graph HyperNetworks (GHNs) with a CNN decoder for structural alignment. This method allows for the generation of fully functional Multilayer Perceptrons (MLPs) across various architectures, achieving over 85% success on unseen architecture topologies in ManiSkill3 robotics tasks, while traditional approaches struggle with generalization. This advancement is significant for practitioners as it facilitates the creation of adaptable neural network architectures that can effectively generalize across diverse applications.

arXiv cs.AI | 34 d ago | found 14 d ago | #neural networks, #weight generation, #diffusion

TinyGiantALM: A Compact Audio-Language Model for Intent-Aware Reasoning under Resource Constraints

TinyGiantALM is a compact 1.5B parameter audio-language model designed for efficient intent-aware reasoning in resource-constrained environments. Utilizing an Instruction-Aware Feature Refinement framework with a Query-guided Projector and Semantic Gating, it achieves 46.4% zero-shot accuracy on the MMAR benchmark, outperforming larger 7B-13B models. This model presents a viable solution for practitioners needing robust audio reasoning capabilities without the resource demands of larger models, particularly in mixed-modality scenarios.

arXiv cs.CL | 34 d ago | found 12 d ago | #audio-language-model, #intent-aware, #resource-constraints

FiLM-Coordinated Dual-Branch Transformer for Global-Local Dependency Modeling in Language Modeling

The article introduces a FiLM-coordinated dual-branch Transformer architecture designed for improved modeling of global and local dependencies in language tasks. This model features distinct global and local branches within each layer, utilizing feature-wise linear modulation (FiLM) for dynamic coordination, which enhances channel-wise calibration over traditional methods. Experimental results demonstrate that this architecture outperforms single-branch baselines on benchmarks like TinyShakespeare and a subset of WikiText-2, indicating its potential for more efficient representation learning in language modeling.

arXiv cs.AI | 34 d ago | found 16 d ago | #transformers, #language modeling, #attention
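
Feature-wise linear modulation (FiLM), the coordination mechanism named in the dual-branch Transformer item above, scales and shifts each feature channel with a per-channel pair (gamma, beta). The sketch below shows only that generic operation on plain Python lists. How the paper derives gamma and beta, and which branch modulates which, is not stated in the summary, so the values here are made up for illustration.

```python
def film(features: list[float], gamma: list[float], beta: list[float]) -> list[float]:
    """Apply FiLM channel by channel: out[c] = gamma[c] * features[c] + beta[c]."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta must have the same length")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


# Hypothetical example: activations from one branch are modulated by
# parameters that, in a real model, would be predicted from the other branch.
local_features = [1.0, 2.0, 3.0]
gamma = [2.0, 0.5, 1.0]   # per-channel scale (illustrative values)
beta = [0.0, 1.0, -1.0]   # per-channel shift (illustrative values)

print(film(local_features, gamma, beta))
```

Since gamma and beta act on each channel independently, the modulation can recalibrate individual channels, which matches the "channel-wise calibration" wording in the summary.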

BioMatrix: Towards a Comprehensive Biological Foundation Model Spanning the Modality Matrix of Sequences, Structures, and Language

BioMatrix is introduced as the first multimodal foundation model that integrates molecular sequences, structures, and natural language within a single decoder-only architecture. It utilizes a unified tokenization scheme to map various biological modalities into a shared discrete token space, allowing for uniform processing under a next-token prediction objective. Built on the Qwen3 model with sizes of 1.7B and 4B parameters, it has been pretrained on 304.4 billion tokens and demonstrates state-of-the-art performance across 77 out of 80 tasks in biological applications, highlighting its potential as a versatile tool for practitioners in the field.

arXiv cs.AI | 34 d ago | found 16 d ago | #foundation model, #biological, #multimodal

B[FM]$^2$: Brain Foundation Model via Flow Matching with SplitUNet

The article presents B[FM]$^2$, a novel EEG foundation model that utilizes continuous-time flow matching to learn from raw EEG signals without discretization, tokenization, or masking. It introduces SplitUNet, an architecture that separates 1D temporal and electrode convolutions to address the challenges posed by the dense sampling of time and limited electrode channels. B[FM]$^2$ achieves state-of-the-art performance on 7 out of 9 standard EEG classification tasks with significantly reduced pretraining requirements, demonstrating its potential for efficient transfer learning in clinical and brain-computer interface applications.

arXiv cs.AI | 34 d ago | found 16 d ago | #eeg, #foundation, #model

What Language is This? Ask Your Tokenizer

UniLID is a novel language identification (LID) method that utilizes the UnigramLM tokenization algorithm, focusing on probabilistic language modeling to determine the most likely language of a given string. It is designed to be data- and compute-efficient, allowing for the incremental addition of new languages without retraining, and integrates seamlessly into existing language model pipelines. Empirical results indicate that UniLID achieves approximately 70% accuracy with only five labeled samples per language, outperforming traditional baselines like fastText and CLD3, particularly in low-resource and dialect identification scenarios, making it a valuable tool for enhancing multilingual NLP applications.

arXiv cs.CL | 34 d ago | found 12 d ago | #language identification, #tokenization
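
The idea in the UniLID item above, scoring a string under one unigram token model per language and picking the most likely language, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's method: the vocabularies and probabilities are invented, the score is the best single segmentation (Viterbi) rather than whatever scoring UniLID actually uses, and unknown characters get a fixed fallback penalty.

```python
import math


def best_segmentation_logprob(text: str, logprobs: dict[str, float],
                              unk_logprob: float = -20.0) -> float:
    """Viterbi search: highest log-probability of any segmentation of `text`."""
    n = len(text)
    best = [-math.inf] * (n + 1)
    best[0] = 0.0
    max_len = max(len(token) for token in logprobs)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in logprobs:
                score = logprobs[piece]
            elif len(piece) == 1:
                score = unk_logprob  # fallback so every string can be segmented
            else:
                continue
            best[end] = max(best[end], best[start] + score)
    return best[n]


def identify_language(text: str, models: dict[str, dict[str, float]]) -> str:
    """Return the language whose unigram model assigns the highest score."""
    return max(models, key=lambda lang: best_segmentation_logprob(text, models[lang]))


# Toy per-language unigram models (hypothetical tokens and probabilities).
TOY_MODELS = {
    "en": {"the": math.log(0.5), " ": math.log(0.3), "cat": math.log(0.2)},
    "de": {"die": math.log(0.5), " ": math.log(0.3), "katze": math.log(0.2)},
}

print(identify_language("the cat", TOY_MODELS))
print(identify_language("die katze", TOY_MODELS))
```

In this toy setup a new language is added by inserting one more entry into `TOY_MODELS` without touching the existing ones, which mirrors the incremental-addition property the summary mentions.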

YOLO26 vs. YOLOv8: A Comprehensive Architectural Benchmark of Next-Generation Real-Time Object Detection Models

This paper evaluates the performance of Ultralytics YOLO26 against YOLOv8, focusing on NMS-free architectures for real-time object detection in edge deployment scenarios. YOLO26 features a spectral-constrained CSP-Muon backbone and eliminates Distribution Focal Loss, achieving superior accuracy on the Pascal VOC dataset (0.635 mAP_50:95) while showing minimal performance differences on the VisDrone dataset. The study highlights that YOLOv8 maintains lower GPU inference latency (6.92 ms vs. 8.38 ms for YOLO26), indicating that NMS-free designs may not universally outperform traditional architectures, thus informing practitioners about architecture selection based on specific application needs.

arXiv cs.AI | 34 d ago | found 12 d ago | #yolo, #object detection, #benchmark

BLUEX v2: Benchmarking LLMs on Open-Ended Questions from Brazilian University Entrance Exams

BLUEX v2 introduces a new benchmark for evaluating Large Language Models (LLMs) on open-ended, discursive tasks derived from the second-phase entrance exams of Brazil's UNICAMP and USP, covering exam years 2022-2025. The benchmark includes 395 questions and 919 graded subquestions, with 55.7% containing images, and evaluates 21 state-of-the-art LLMs using an LLM-as-a-judge protocol, revealing a performance spread from 4.18 to 9.10 on a 0-10 scale. This resource is significant for practitioners as it addresses the gap in Portuguese-language LLM assessment, particularly for complex reasoning tasks, and provides a publicly available dataset and evaluation framework.

arXiv cs.CL | 34 d ago | found 13 d ago | #llm, #benchmark, #portuguese

Vesta: A Generalist Embodied Reasoning Model

Vesta is introduced as a unified embodied reasoning model that integrates localization, spatial reasoning, navigation, and long-horizon planning into a single foundation model, addressing the inefficiencies of multi-model stacks. It utilizes a large curated corpus for spatial grounding and a multimodal memory system, achieving over 20% improvement on average against individual state-of-the-art (SOTA) models and over 10% against ensembles of top-performing models across various benchmarks. This model significantly enhances task success rates by more than 35% in real-world robotic applications, suggesting that generalist models can effectively replace specialized systems, offering a scalable solution for practitioners in AI and robotics.

arXiv cs.AI | 34 d ago | found 16 d ago | #robotics, #reasoning, #generalist

Structured Recurrent Mixers for Massively Parallelized Sequence Generation

The article introduces the Structured Recurrent Mixer (SRM), a novel architecture that enables the conversion between sequence parallel representation during training and recurrent representation during inference, optimizing both efficiency and throughput. Experimental results indicate that SRMs achieve 12x higher throughput and 170x greater concurrency compared to traditional Transformers, alongside a 30% increase in compute-constant GSM8k Pass@k performance. This architecture is particularly significant for practitioners as it enhances the handling of extended sequence lengths and information-rich inputs while maintaining training efficiency and scalability in batch processing.

arXiv cs.CL | 34 d ago | found 12 d ago | #sequence-generation, #training, #architecture

dMoE: dLLMs with Learnable Block Experts

The dMoE framework introduces a block-level Mixture-of-Experts (MoE) architecture designed for Diffusion Large Language Models (dLLMs), addressing the inefficiencies of token-level expert selection during parallel decoding. By aggregating token-level expert distributions into a unified block-level distribution, dMoE significantly decreases the number of uniquely activated experts from 69.5 to 14.6, while maintaining 99.11% of original performance. This innovation results in a 76.64% to 79.84% reduction in memory usage and a latency speedup of 1.14× to 1.66×, making it a valuable advancement for practitioners working with large-scale language models.

arXiv cs.CL | 34 d ago | found 12 d ago | #diffusion-models, #mixture-of-experts, #inference
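
The block-level routing idea in the dMoE item above can be illustrated by comparing how many distinct experts a block of tokens activates under per-token top-k selection versus one shared selection for the whole block. The sketch assumes a plain mean over token-level router probabilities as the aggregation; the paper describes a learnable mechanism, so this is a simplification, and the router probabilities are invented.

```python
def token_level_unique_experts(token_probs: list[list[float]], k: int) -> set[int]:
    """Union of each token's own top-k experts (what must be loaded for the block)."""
    chosen: set[int] = set()
    for row in token_probs:
        top = sorted(range(len(row)), key=lambda e: row[e], reverse=True)[:k]
        chosen.update(top)
    return chosen


def block_level_experts(token_probs: list[list[float]], k: int) -> list[int]:
    """Average router probabilities over the block, then pick one shared top-k."""
    num_experts = len(token_probs[0])
    block = [sum(row[e] for row in token_probs) / len(token_probs)
             for e in range(num_experts)]
    top = sorted(range(num_experts), key=lambda e: block[e], reverse=True)[:k]
    return sorted(top)


# Hypothetical router outputs: 3 tokens in a block, 4 experts.
probs = [
    [0.6, 0.3, 0.1, 0.0],
    [0.1, 0.5, 0.0, 0.4],
    [0.5, 0.1, 0.4, 0.0],
]

print(sorted(token_level_unique_experts(probs, k=2)))
print(block_level_experts(probs, k=2))
```

Here per-token routing touches all four experts while the shared block-level choice touches two, which is the kind of reduction in uniquely activated experts the summary reports at larger scale.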

LALE: Lightweight-Transformer Architecture for Land-Cover Estimation

LALE (Lightweight-transformer Architecture for Land-cover Estimation) is a novel end-to-end semantic segmentation model designed for remote sensing imagery, integrating ConvMixer and transformer stages to optimize for both local and global features while minimizing computational overhead. The architecture achieves a strong efficiency-performance trade-off, with its smallest variant having only 1.6M parameters, yet performing within 2.6 F1 points of the best baseline (UPerNet) while using significantly fewer resources (4.5x fewer parameters, 7x less storage, and 17x fewer GMACs). This model is particularly relevant for practitioners seeking efficient segmentation solutions in resource-constrained environments.

arXiv cs.AI | 34 d ago | found 12 d ago | #semantic segmentation, #remote sensing, #lightweight

VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination

VoidPadding introduces a novel padding token, [VOID], to separate the roles of padding and semantic termination in Masked Diffusion Language Models (MDLMs), specifically addressing the issue of [EOS] token overflow during large-block decoding. Implemented on the Dream-7B-Instruct model, VoidPadding achieves a significant improvement of +17.84 points in a four-task mean across mathematical reasoning and code generation benchmarks, while also reducing decoding NFE by 55.7% compared to previous methods. This approach enhances the efficiency and effectiveness of instruction tuning in MDLMs, making it a valuable development for practitioners in AI model training.

arXiv cs.CL | 34 d ago | found 12 d ago | #masked-diffusion, #padding, #llm

ACTIVA: Amortized Causal Effect Estimation via Transformer-based Variational Autoencoder

ACTIVA is a transformer-based conditional variational autoencoder designed for amortized causal effect estimation from observational data. It introduces a conditional latent prior enabling zero-shot inference and demonstrates superior performance on synthetic datasets and gene-expression simulations compared to correlational and other amortized baselines. This model addresses the challenges of causal ambiguity and restrictive assumptions, making it relevant for practitioners focused on causal inference and interventional distribution estimation in AI applications.

arXiv cs.AI | 34 d ago | found 14 d ago | #causal inference, #variational autoencoder, #transformers

Latent Personal Memory: Represent personal memory as dynamic soft prompts

The article introduces Latent Personal Memory (LPM), a framework for personalizing large language models (LLMs) by encoding user-specific behavioral patterns as a compact matrix of latent slots. LPM utilizes a cross-attention projection network to generate dynamic soft prompts that are prepended to the input of a frozen LLM, demonstrating superior performance on the PersonaMem v1 and LoCoMo benchmarks with Qwen3 models, achieving up to 8.8% and 54.4% accuracy improvements over LoRA and Prompt Tuning, while significantly reducing KV-cache usage. This approach enhances efficiency with increasing context lengths, making it a valuable method for practitioners looking to optimize personalization in LLMs.

arXiv cs.AI | 34 d ago | found 16 d ago | #llm, #memory, #personalization

L20-Edu-135M: An Auditable Single-GPU Study of Data-Efficient Small Language Modeling

L20-Edu-135M is a 134.5M-parameter language model trained on approximately 13 billion tokens using a single NVIDIA L20 GPU. Its data-efficient training recipe includes cross-source MinHash/LSH near-deduplication and supervised fine-tuning with weight interpolation. The summary reports a mean score of 0.4150 on the GSM8K benchmark, which it puts at 87.1% of the performance of the similarly sized SmolLM-135M and SmolLM2-135M. The work illustrates what small language models can achieve under tight compute budgets and how they compare with established models of the same scale.

arXiv cs.AI | 34 d ago | found 16 d ago | #language modeling, #data-efficient, #small models
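
MinHash, the near-deduplication building block named in the L20-Edu-135M item above, estimates the Jaccard similarity of two documents from short signatures. The sketch below is a generic standard-library illustration, not the paper's pipeline: the shingle size, the number of hash functions, and the sample documents are assumptions, and the LSH banding step that makes the search scalable is omitted.

```python
import hashlib


def shingles(text: str, n: int = 3) -> set[str]:
    """Set of overlapping word n-grams from the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def minhash_signature(items: set[str], num_hashes: int = 64) -> list[int]:
    """For each seeded hash function, keep the minimum hash value over the set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(), "big")
            for item in items))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy dog near the river shore"
doc_c = "completely unrelated sentence about training small language models"

sig_a, sig_b, sig_c = (minhash_signature(shingles(d)) for d in (doc_a, doc_b, doc_c))
print(estimated_jaccard(sig_a, sig_b))  # near-duplicate pair: high estimate
print(estimated_jaccard(sig_a, sig_c))  # unrelated pair: low estimate
```

A deduplication pass would then drop one document from every pair whose estimated similarity exceeds a chosen threshold; the threshold itself is a tuning decision the summary does not specify.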

GLM-5.2 vs Claude Opus

The article compares the GLM-5.2 model with Claude Opus, focusing on their respective architecture and performance metrics. It highlights differences in model size, training methodologies, and benchmark results that impact their efficiency in various AI applications. This comparison is significant for practitioners as it provides insights into model selection based on specific use cases and performance requirements in LLM deployments.

Reddit r/LocalLLaMA | 34 d ago | found 21 d ago | #GLM-5.2, #Claude

The text in Claude Code’s “Extended Thinking” output

No summary is available for this item: the linked content includes no details on releases, model sizes, benchmark results, or architecture changes related to Claude Code's "Extended Thinking" output.

Hacker News | 34 d ago | found 21 d ago | #claude, #extended thinking

GLM5.2 @7tg on 4x3090 + 192GB on budget motherboard + cpu

A user has built a home lab with four NVIDIA GeForce RTX 3090 GPUs, 192GB of overclocked DDR5 RAM, and a budget motherboard and CPU, and runs GLM5.2 on it at about 7 tokens per second of text generation (the "7tg" in the title). The same machine also runs other models locally, including MiniMax 2.7 for coding and Qwen3.6 for testing, which shows that consumer hardware can handle such workloads despite the limitations of non-ECC memory. The build is of interest to practitioners who want to keep costs down while running large models locally.

Reddit r/LocalLLaMA | 34 d ago | found 21 d ago | #GLM5.2, #hardware, #setup

GLM 5.2 vs. Opus

The article compares GLM 5.2 with Opus, highlighting differences in performance and architecture. GLM 5.2, a large language model, offers improvements in efficiency and accuracy, while Opus focuses on multi-modal capabilities. This comparison is significant for AI practitioners as it provides insights into model selection based on specific application needs and performance benchmarks.

Hacker News | 35 d ago | found 21 d ago | #glm, #opus

GLM-5.2 is on DeepSWE

GLM-5.2 has been released on the DeepSWE benchmarking platform, which provides insights into model performance and pricing dynamics. The benchmarks suggest that pricing becomes more competitive as model scores improve, with DeepSWE indicating a significant reduction in costs for certain models post-discount. This release is relevant for practitioners as it offers a new resource for evaluating model performance and cost-effectiveness, potentially influencing decisions on model selection and deployment in real-world applications.

Reddit r/LocalLLaMA | 35 d ago | found 21 d ago | #GLM-5.2, #DeepSWE

Gemma 4 26b a4b is genuinely the best model I have tried for language learning and scientific queries!

A user reports that Gemma 4 26B-A4B (26 billion total parameters, with the "a4b" suffix indicating about 4 billion active per token) is the best model they have tried for language learning and scientific queries, particularly in health-related fields, ahead of competitors such as Qwen 3.5/3.6. The model is weaker at coding, but its strength in these niches is relevant to practitioners with use cases beyond coding and agentic tasks. The discussion also calls for more models in the 20-30 billion parameter range to cover diverse application requirements.

Reddit r/LocalLLaMA | 36 d ago | found 22 d ago | #gemma, #language learning, #scientific queries

[NEW MODEL] SupraLabs just released supra-title-FFT-preview, 115K samples, almost 10x our first chat title dataset

SupraLabs has released the supra-title-FFT-preview model, which is trained on 115K samples from a filtered dataset, significantly increasing the dataset size compared to its predecessor, the Supra-Title-350M-exp, which was trained on only 12K samples. The model, based on the LiquidAI/LFM2.5-350M-Base architecture with approximately 0.4 billion parameters, utilizes BF16 precision and is designed specifically for chat title generation without the need for a system prompt. This enhanced model offers improved coverage and performance for practitioners focusing on generating relevant titles in conversational AI applications.

Reddit r/LocalLLaMA | 37 d ago | found 22 d ago | #supra, #model, #dataset

VibeThinker-3B: A 3B Dense Reasoning Model Built on Qwen2.5-Coder-3B With the Spectrum-to-Signal Post-Training Pipeline

VibeThinker-3B is a 3 billion parameter dense reasoning model developed on the Qwen2.5-Coder-3B architecture, utilizing the Spectrum-to-Signal post-training pipeline. It demonstrates competitive performance against models such as DeepSeek V3.2 and Kimi K2.5 on verifiable benchmarks. This model is significant for practitioners as it offers a robust option for reasoning tasks, enhancing capabilities in applications requiring dense reasoning.

MarkTechPost | 37 d ago | found 22 d ago | #vibethinker, #model, #reasoning

GLM-5.2-REAP50-GGUF

The GLM-5.2-REAP50 models have been released in two variants: GLM-5.2-REAP50-Q3_K_M-GGUF with a size of 182 GB and GLM-5.2-REAP50-Q2_K-GGUF at 139 GB. The release is significant as it provides large model options for practitioners, potentially enhancing performance in tasks compared to existing models like Qwen 3.6 27b. The models are available on Hugging Face, enabling easy access for integration and experimentation.

Reddit r/LocalLLaMA | 37 d ago | found 24 d ago | #glm, #qwen, #comparison

GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2

The article compares the hallucination rates of GPT-5.5 and MIT-licensed GLM-5.2, revealing that GPT-5.5 exhibits three times more hallucinations than GLM-5.2. This finding is significant for practitioners as it highlights the reliability issues of newer models like GPT-5.5, suggesting that developers may need to implement additional validation mechanisms when integrating such models into applications.

Hacker News | 37 d ago | found 22 d ago | #gpt-5.5, #hallucination

What's more impressive, GLM 5.1 -> 5.2 or Qwen 3.5 -> 3.6?

GLM 5.2 has been released with improved capabilities, particularly in handling German-language inputs on specific topics such as Döner kebabs, which the post presents as evidence of better contextual understanding. Qwen 3.6 is discussed in its 35-billion-parameter version, run with the Unsloth Q8_K_XL quantization via llama.cpp. The thread compares how large each generational jump is for practitioners who depend on robustness in specialized applications.

Reddit r/LocalLLaMA | 37 d ago | found 24 d ago | #glm, #qwen, #model, #release

GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index

GLM-5.2 has been announced as the leading open weights model on the Artificial Analysis Intelligence Index. While specific technical details such as model size, benchmark results, and architecture changes are not provided in the content, its status as a leading model indicates significant advancements in performance metrics relevant to practitioners. This development may influence the selection of models for applications requiring robust performance in AI tasks.

Reddit r/LocalLLaMA | 37 d ago | found 24 d ago | #glm, #open weights, #ai index

spec: support eagle3 for qwen3.5 & 3.6 by ruixiang63 · Pull Request #24593 · ggml-org/llama.cpp

A pull request (PR #24593) has been submitted to the ggml-org/llama.cpp repository to add EAGLE-3 speculative decoding support for the Qwen 3.5 and 3.6 models. EAGLE-3 is a speculative decoding method rather than a model architecture: a lightweight draft component proposes tokens that the main model then verifies. The change makes it possible to compare its speedup against the existing MTP (multi-token prediction) configuration, which could give practitioners another option for faster Qwen inference.

Reddit r/LocalLLaMA | 38 d ago | found 24 d ago | #qwen, #eagle, #support

Lagrange: An Open-Vocabulary, Energy-Based Sparse Framework for Generalized End-to-End Driving

The article introduces Lagrange, an open-vocabulary, energy-based sparse framework designed for end-to-end autonomous driving, which addresses the limitations of existing models in handling complex, open-world environments. It utilizes Masked Latent Fields (MLF) and Vision-Language Models (VLMs) to generate continuous semantic visual tokens, implementing an intent-driven masked cross-attention mechanism for effective entity filtering and decision-making through Lagrangian action minimization. The framework shows promising results in offline evaluations on nuScenes and CODA benchmarks, offering a robust and interpretable solution for real-world driving scenarios that require compliance with vehicle kinematics and collision avoidance.

arXiv cs.AI | 38 d ago | found 24 d ago | #autonomous driving, #energy-based, #planning

Scaling Generative Foundation Models for Chest Radiography with Rectified Flow Transformers

The article presents a new generative foundation model for chest radiograph synthesis, boasting over 1.3 billion parameters and trained on 1.6 trillion tokens from a diverse dataset of 1.2 million radiographs. This model enhances the fidelity of synthesized images, achieving results indistinguishable from real radiographs, and supports controlled generation across various demographic subgroups and pathologies. This advancement is significant for practitioners as it addresses the limitations of existing models in generalization and clinical applicability, facilitating the creation of more robust diagnostic tools.

arXiv cs.AI | 38 d ago | found 23 d ago | #generative-models, #chest-radiography, #foundation-model

eCNNTO: A Highly Generalizable ConvNet for Accelerating Topology Optimization

The article presents eCNNTO, a novel element-based Convolutional Neural Network designed to accelerate density-based Topology Optimization (TO) by predicting near-optimal densities for finite element analysis. This approach incorporates residual connections to capture spatial correlations among neighboring elements and introduces a training strategy that utilizes final stage density histories, significantly improving optimization efficiency and generalization across various conditions. eCNNTO demonstrates impressive performance, achieving up to 90% and 97% reductions in iterations for two-dimensional and three-dimensional problems, respectively, making it a valuable tool for practitioners seeking to enhance TO processes.

arXiv cs.AI38 d agofound 24 d ago#cnn#topology-optimization

PrototypeNAS: Rapid Design of Deep Neural Networks for Microcontroller Units

PrototypeNAS is a zero-shot neural architecture search (NAS) method designed to optimize deep neural networks (DNNs) for microcontroller units (MCUs) without the need for extensive training of multiple models. It employs a three-step search strategy that integrates structural optimization of various architectures, utilizes an ensemble of zero-shot proxies for optimization, and applies Hypervolume subset selection for effective model distillation. This approach enables rapid identification of compact DNN models that maintain high accuracy, making it significant for practitioners aiming to deploy efficient AI solutions on resource-constrained edge devices.

arXiv cs.AI38 d agofound 23 d ago#neural architecture search#dnn#edge devices

From Construction to Injection: Edit-Based Fingerprints for Large Language Models

The article presents an end-to-end injected fingerprinting framework for large language models (LLMs) aimed at improving ownership verification against unauthorized use. It introduces Code-mixing Fingerprints (CF) and Multi-Candidate Editing (MCEdit) to overcome imperceptibility trade-offs and maintain persistent behaviors despite model modifications. This framework is crucial for practitioners as it enhances the robustness of model ownership verification while minimizing impacts on model utility.

arXiv cs.AI38 d agofound 23 d ago#llm#fingerprinting#model_protection

Large Language Models Do Not Always Need Readable Language

The paper presents BabelTele, a novel model-centric textual representation that allows for encoding semantic information in non-human-readable forms while maintaining high semantic fidelity. It demonstrates that LLMs can achieve 99.5% semantic fidelity with representations condensed to 27.9% of their original length, suggesting a potential reduction in context overhead for downstream tasks. This approach indicates a shift towards model-native representations, which could enhance efficiency and performance in multi-agent communication and cross-model transfer scenarios.

arXiv cs.AI38 d agofound 23 d ago#large language models#representation#BabelTele

Variable-Length Tokenization via Learnable Global Merging for Diffusion Transformers

The article presents a novel variable-length tokenizer for Latent Diffusion Models (LDMs) that uses learnable global merging to improve token representation across varying lengths. This approach allows for adaptive compression without the semantic inconsistencies associated with traditional token truncation methods. The proposed tokenizer achieves superior generative performance on ImageNet 256×256, with better quality-compute trade-offs than existing variable-length tokenization methods, which matters for practitioners aiming to optimize LDMs in visual synthesis tasks.

arXiv cs.AI38 d agofound 23 d ago#tokenization#diffusion models#transformers

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

The DeepSeek-V4 series introduces two Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6 trillion parameters (49 billion activated) and DeepSeek-V4-Flash with 284 billion parameters (13 billion activated), both capable of processing contexts up to one million tokens. Key architectural advancements include a hybrid attention mechanism utilizing Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), along with Manifold-Constrained Hyper-Connections (mHC) and the Muon optimizer, which collectively enhance efficiency and stability. This development significantly reduces inference FLOPs and KV cache usage for long-context scenarios, making it a valuable resource for practitioners focusing on large-scale, long-horizon tasks in AI applications.
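
To put the activated-parameter figures above in perspective, a short arithmetic sketch can compute what share of each model's weights is used per token. The parameter counts are taken from the summary; the helper name `active_fraction` is made up for illustration and is not part of any DeepSeek tooling.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters activated per token, as a fraction of the total."""
    return active_params_b / total_params_b


# Parameter counts in billions, as reported in the summary above.
models = {
    "DeepSeek-V4-Pro": (1600.0, 49.0),
    "DeepSeek-V4-Flash": (284.0, 13.0),
}

for name, (total, active) in models.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.1%} of weights active per token")
```

Both models activate only a few percent of their weights per token, which is the usual reason an MoE model's compute cost tracks its activated size rather than its total size.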

arXiv cs.AI38 d agofound 23 d ago#Mixture-of-Experts#context length#language models

IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources

IHUBERT is a newly released monolingual Persian pretrained language model based on the RoBERTa-base architecture with 125 million parameters, trained on a 45 GB subset of the Sepahr-Danesh collection, amounting to approximately 7-8 billion tokens. It incorporates a multi-stage preprocessing pipeline for semantic deduplication and distribution balancing, and it achieves state-of-the-art results on several Persian NLU benchmarks, particularly in extractive question answering. This model enhances the capabilities for practitioners working with Persian language processing by providing a high-quality, semantically curated resource that addresses previous limitations in the availability of training data and evaluation diversity.

arXiv cs.AI38 d agofound 23 d ago#persian#language models#pretraining

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

This study benchmarks four open-source local LLMs (Qwen 2.5 Coder 7B, Llama 3.1 8B, Mistral 7B, and Meditron 7B) for natural-language-to-SQL query generation in biopharmaceutical manufacturing, utilizing a synthetic database with 63,000 records. The evaluation, conducted via a FastAPI platform, revealed that Llama 3.1 8B achieved the highest SQL compliance, while Qwen 2.5 Coder 7B excelled in text similarity and factual consistency, indicating that general-purpose LLMs can outperform specialized models in this domain. These findings suggest that while local, GxP-aligned systems are viable on consumer-grade hardware, they still necessitate human oversight for compliance in regulated environments.

arXiv cs.CL38 d agofound 22 d ago#natural-language-to-sql#biopharmaceuticals#llm-benchmarking

Emyx: Fast and efficient all-atom protein generation

Emyx is a new 140M-parameter conditional flow matching model designed for efficient all-atom protein generation, emphasizing geometric accuracy and structural diversity. It uses standard transformer blocks with lightweight conditional representations, cutting training cost to 682 GPU-hours, roughly a quarter of what RFdiffusion3 required. Emyx outperforms larger models such as Proteína-Complexa and RFdiffusion3 on the strict evaluation metrics of the AME enzyme design benchmark, making it a valuable tool for practitioners in computational enzyme design.

arXiv cs.AI38 d agofound 23 d ago#protein-generation#generative-models#all-atom#emyx

Residual-Space Evolutionary Optimization via Flow-based Generative Models

The article presents a novel framework called residual-space evolutionary optimization, which integrates flow-based generative models with evolutionary algorithms to enhance data editing capabilities. This model-agnostic approach leverages conditional flow matching (CFM) to operate in residual space, allowing for targeted local exploitation and broader exploration of data. Validation on the MorphoMNIST dataset and crystal data showcases its effectiveness in balancing target alignment, instance preservation, and diversity, making it relevant for practitioners aiming to improve generative editing in both synthetic and scientific applications.

arXiv cs.AI38 d agofound 24 d ago#llm#bim#benchmark

QMFOL: Benchmarking Large Language Model Reasoning via Quantifiable Monadic First-Order Logic Test Case Generation

QMFOL is a newly proposed framework for generating monadic first-order logic reasoning tasks, aimed at enhancing the evaluation of large language models (LLMs) in deductive reasoning. It allows for precise control over logical complexity through the creation of formal structures, which are then translated into natural language, ensuring logical consistency via external provers. The accompanying benchmark, QMFOLBench, includes 2880 instances and reveals that model performance varies significantly with logical complexity and semantic diversity, highlighting the need for more nuanced evaluation metrics in LLMs.

arXiv cs.AI38 d agofound 24 d ago#llm#benchmarking#reasoning

Giving GLM-5.2 a spin locally on CPU only! (poor man's rig for big models)

The article discusses running the GLM-5.2 model locally on a CPU-only setup using a Dell PowerEdge R740 with dual Xeon 6248R CPUs and 768 GB of RAM. Utilizing the ik_llama.cpp framework for improved CPU inference, the author reports generation speeds of 4 to 5.5 tokens per second with a context size of 1 million tokens, although performance declines with increasing context length. This exploration demonstrates the feasibility of deploying large models on local hardware, highlighting potential advancements in accessibility for practitioners working with AI models.

Reddit r/LocalLLaMA38 d agofound 24 d ago#glm-5.2#cpu-inference

LFM2.5-Embedding-350M & LFM2.5-ColBERT-350M

LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M have been released, offering state-of-the-art multilingual retrieval capabilities. The LFM2.5-Embedding-350M model is a dense bi-encoder that produces a single vector per document for efficient cross-lingual search across 11 languages, while LFM2.5-ColBERT-350M employs a late interaction approach with token-level vector storage and MaxSim matching for high-accuracy retrieval. Both models demonstrate inference speeds comparable to smaller models and can be integrated into existing RAG pipelines, making them valuable for practitioners seeking to enhance multilingual retrieval performance.
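
The difference between the two scoring schemes described above can be sketched in a few lines. This is a toy illustration with hand-written 2-d vectors, not the LFM2.5 models' actual API; the function names `dense_score` and `maxsim_score` are assumptions chosen for clarity.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def dense_score(query_vec, doc_vec):
    """Bi-encoder style: one vector per text, scored by cosine similarity."""
    return dot(normalize(query_vec), normalize(doc_vec))


def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: each query token keeps its best-matching document
    token (MaxSim), and the per-token maxima are summed."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


# Toy token embeddings (hypothetical values).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has an exact match per query token
doc_b = [[1.0, 1.0]]                          # only a blended token

print(maxsim_score(query_tokens, doc_a))
print(maxsim_score(query_tokens, doc_b))
```

The trade-off is storage: the dense model keeps one vector per document, while the late-interaction model keeps one vector per token so that fine-grained matches like those in `doc_a` can be rewarded.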

Reddit r/LocalLLaMA38 d agofound 25 d ago#lfm#embedding#colbert

poolside/Laguna-M.1 · Hugging Face - 225B-A23B

Laguna M.1 is a newly released 225 billion parameter Mixture-of-Experts (MoE) model featuring 23 billion activated parameters per token, optimized for agentic coding and long-horizon tasks. It employs a 70-layer architecture with 67 sparse MoE layers, 256 experts, and global attention, achieving competitive benchmark results on SWE-bench and Terminal-Bench 2.0. The model supports interleaved reasoning and has a context window of 262,144 tokens, making it a significant advancement for practitioners focusing on high-capacity AI applications.

Reddit r/LocalLLaMA38 d agofound 25 d ago#laguna#hugging_face#moe

Two Qwen3 models on one DGX Spark: the residency math

The article discusses the deployment of two Qwen3 models on a single DGX Spark system, focusing on the computational requirements and residency calculations for effective utilization. It highlights the model's architecture and performance benchmarks, emphasizing the efficiency of running multiple models concurrently on high-performance hardware. This information is crucial for practitioners aiming to optimize resource allocation and performance when working with large language models in a distributed environment.

Hacker News38 d agofound 21 d ago#qwen3#models#dgx#spark

unsloth GLM-5.2-GGUF , including 2bit at 238GB

Unsloth has released GGUF quantizations of GLM-5.2, including a 2-bit variant that comes in at 238 GB. This is significant for practitioners because the smaller quantizations reduce storage and memory requirements, potentially enabling broader deployment of the model in resource-constrained environments.

Reddit r/LocalLLaMA38 d agofound 25 d ago#glm#gguf#open_source

GLM's founder says GLM-fable before the end of the year?!

GLM's founder has indicated that the GLM-fable model is expected to be released before the end of the year. While specific technical details such as model size or architecture changes were not disclosed, the announcement suggests potential advancements in the GLM framework that could impact practitioners working with large language models (LLMs) and AI applications. This release may enhance capabilities or performance benchmarks relevant to the development of AI systems.

Reddit r/LocalLLaMA38 d agofound 25 d ago#glm#open_source#model_release

Kwai-Keye/Keye-VL-2.0-30B-A3B-GGUF · Hugging Face

Kwai-Keye has released Keye-VL-2.0-30B-A3B, a 30 billion parameter model designed for advanced long-video understanding and multimodal agent capabilities. It features a DSA-native long-context architecture that utilizes sparse attention for efficient processing of hour-long videos, and it leads benchmarks in video comprehension while offering robust post-training mechanisms to enhance reasoning and reduce hallucinations. This model is significant for practitioners as it integrates agent functionalities that support complex tasks like tool usage and web searches, marking a notable advancement in multimodal AI capabilities.

Reddit r/LocalLLaMA39 d agofound 25 d ago#keye#hugging_face#video_understanding

Google's Gemini co-lead Noam Shazeer joins OpenAI after two-year return stint

Noam Shazeer, a co-author of the "Attention Is All You Need" paper that introduced the Transformer and co-lead of Google's Gemini models, has joined OpenAI after a two-year return stint at Google. His transition follows significant talent movements in the AI sector, including Andrej Karpathy's shift to Anthropic, highlighting ongoing competitive dynamics in AI leadership. This move could influence OpenAI's future model development and research direction, given Shazeer's expertise in transformer architectures.

The Decoder39 d agofound 25 d ago#google#gemini#openai

Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition

Dango is a newly introduced 1.8B-parameter large language model specifically designed for studying L1-to-L2 transfer in second language acquisition, focusing on Japanese-to-English. It utilizes a novel filtering method to mitigate L2 contamination during pretraining, enabling more realistic simulations of L2 acquisition. The model has been fine-tuned on L2-learning lessons and demonstrates superior performance in generating human-like L2 production patterns compared to existing multilingual baselines, making it a valuable tool for researchers and practitioners in computational SLA.

arXiv cs.CL39 d agofound 25 d ago#llm#language model#second language acquisition

DreamReasoner-8B: Block-Size Curriculum Learning for Diffusion Reasoning Models

DreamReasoner-8B is an open-source block diffusion reasoning model that employs a block-size curriculum learning approach to enhance long chain-of-thought reasoning. The model demonstrates that training with small block sizes significantly improves reasoning performance compared to larger block sizes, achieving competitive results against leading autoregressive models like Qwen3-8B on mathematical and code reasoning benchmarks. This advancement provides a practical framework for developing efficient diffusion models capable of complex reasoning tasks.

arXiv cs.CL39 d agofound 25 d ago#block diffusion#reasoning#training

Sumi: Open Uniform Diffusion Language Model from Scratch

The article introduces Sumi, a 7 billion parameter uniform diffusion language model (UDLM) pretrained from scratch on 1.5 trillion tokens. Sumi demonstrates competitive performance on knowledge, reasoning, and coding benchmarks compared to autoregressive models trained with similar token budgets, although it underperforms in commonsense tasks due to its specific data mixture. This release, including model weights and training recipes, aims to provide a reference point for future research on the scaling behavior and dynamics of uniform diffusion models in AI.

arXiv cs.CL39 d agofound 25 d ago#diffusion#language-model#pretraining

OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric

OpenAI has released LifeSciBench, a benchmark comprising 750 expert-authored tasks designed to evaluate AI models on real life-science research across seven workflows and biological domains. The benchmark, developed by 173 PhD scientists with 19,020 rubric criteria, emphasizes reasoning and decision-making rather than mere recall. The top-performing model, GPT-Rosalind, achieves a passing rate of only 36.1%, indicating substantial room for improvement in AI's performance in life sciences. This benchmark is useful for practitioners as it provides a structured way to assess and enhance AI models in complex scientific contexts.

MarkTechPost39 d agofound 25 d ago#benchmark#lifescibench#openai

I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model.

The Inflect-Nano-v1 model has been released, featuring a total of 4.63 million parameters, with 3.46 million dedicated to the acoustic model and 1.17 million for the vocoder. It operates at 24 kHz and is designed for English speech synthesis using a single male voice, making it approximately 17 times smaller than Kokoro and 108 times smaller than Chatterbox. This model is significant for practitioners focusing on ultra-compact TTS solutions suitable for low-resource environments, embedded systems, and local voice applications, despite its limitations in quality and performance compared to larger models.

Reddit r/LocalLLaMA39 d agofound 25 d ago#tts#inflect#nano

Zhipu AI's GLM-5.2 closes in on closed-source leaders in coding marathons

Zhipu AI has released GLM-5.2, an open-source model featuring a stable 1-million-token context, under the MIT license. In the FrontierSWE benchmark for coding tasks, GLM-5.2 closely competes with Anthropic's Claude Opus 4.8, trailing by only one percentage point, although it still lags behind closed-source models in reasoning capabilities. This advancement is significant for practitioners as it demonstrates the growing competitiveness of open-source models in complex coding scenarios.

The Decoder39 d agofound 29 d ago#zhipu-ai#glm-5.2#coding-marathons

GLM-5.2 is a win for local AI

GLM-5.2 has been released with 753 billion parameters, trained on a corpus of 28.5 trillion tokens, and features a native context window of 1 million tokens with the ability to generate up to 131,072 output tokens. VRAM requirements depend on the quantization level, ranging from 176 GB for 1-bit dynamic quantization to 890 GB for FP8 weights. This release is significant for practitioners as it enables the potential for fine-tuning smaller models on GLM-5.2's datasets, promising improvements in local AI applications.
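
A quick way to read the VRAM figures above is to convert them into implied bits per parameter. The sketch below assumes 1 GB = 10^9 bytes and that each figure covers all 753 billion parameters; because the figures are VRAM requirements rather than file sizes, the result is an upper bound that may include runtime overhead. The helper name `bits_per_weight` is hypothetical.

```python
def bits_per_weight(size_gb: float, params_billion: float) -> float:
    """Implied bits of memory per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 8.0 / params_billion


PARAMS_B = 753.0  # total parameters in billions, from the summary above

# Memory figures in GB, from the summary above.
footprints = {
    "1-bit dynamic": 176.0,
    "FP8": 890.0,
}

for label, size_gb in footprints.items():
    print(f"{label}: {bits_per_weight(size_gb, PARAMS_B):.2f} bits per parameter")
```

Under these assumptions the "1-bit" dynamic quantization averages closer to two bits per parameter, which is consistent with dynamic schemes keeping some tensors at higher precision, and the FP8 figure sits above eight bits, which suggests it includes more than the raw weights.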

Reddit r/LocalLLaMA39 d agofound 29 d ago#glm-5.2#local-ai#coding-agent

GLM-5.2 is the new leading open weights model on Artificial Analysis

GLM-5.2 has been released as the new leading open weights model on Artificial Analysis. While specific technical details such as model size, benchmark results, and architectural changes were not provided in the content, this release is significant for practitioners as it represents an advancement in open-access language models, potentially enhancing the capabilities and accessibility of AI tools for various applications.

Hacker News40 d agofound 25 d ago#glm#open weights#model

GLM-5.2: Built for Long-Horizon Tasks

GLM-5.2 has been released with a focus on enhancing performance in long-horizon tasks. Key improvements include an updated architecture designed to better manage extended contexts, although specific model size and benchmark results were not detailed. This model's advancements are significant for practitioners working on applications requiring sustained reasoning over longer sequences, potentially improving task completion and user interaction in complex scenarios.

Reddit r/LocalLLaMA40 d agofound 29 d ago#glm-5.2#long-horizon#huggingface

GLM-5.2: Built for Long-Horizon Tasks

The release of GLM-5.2 introduces enhancements tailored for long-horizon tasks, featuring an expanded model size and refined architecture to improve performance in sequential decision-making scenarios. Key updates include optimizations in memory management and inference speed, which are crucial for applications requiring extended context understanding. This model's advancements are significant for practitioners focused on developing AI systems that operate over longer timeframes and complex task sequences.

Hugging Face Blog40 d agofound 25 d ago#glm#model#long-horizon

It looks like Rio 3.5 397B could've simply been a semi-failed embezzling of funding

The article discusses the Rio 3.5 397B model, which was initially claimed to be an advanced version built on Qwen 3.5 with significant enhancements but was later revealed to be merely a merge with the Nex N2 Pro without any additional training. The project's funding of R$500K (approximately $100K USD) has come under scrutiny, as the developers admitted to delivering a subpar result and stated that they would need to start the training process from scratch after the initial model was removed from Hugging Face. This incident highlights the importance of transparency and accountability in AI model development, particularly when public funding is involved.

Reddit r/LocalLLaMA40 d agofound 29 d ago#rio-3.5#model-training

GLM-5.2 (max) is currently the third best model available, across both open and proprietary.

GLM-5.2 (max) has been identified as the third best model available, ranking among both open-source and proprietary options. The specific details regarding model size, architecture changes, or benchmark results are not provided in the content. This ranking is significant for practitioners as it highlights GLM-5.2's competitive performance, suggesting it may be a viable choice for applications requiring high-quality language models.

Reddit r/LocalLLaMA40 d agofound 29 d ago#glm-5.2#performance

Blueprint First, Model Second: A Framework for Deterministic LLM Workflow

The article introduces the Source Code Agent framework, which employs a "Blueprint First, Model Second" approach to enhance the deterministic execution of large language models (LLMs) in structured environments. By separating workflow logic from generative processes, this framework allows for expert-defined Execution Blueprints to guide operations, resulting in a 35.56% pass rate on the TravelPlanner benchmark, a significant improvement over the ATLAS baseline. This architecture notably reduces constraint violations by 96% and improves execution efficiency by 27.1%, making it applicable for reliable autonomous agent deployment in procedural and constraint-heavy tasks.

arXiv cs.AI40 d agofound 28 d ago#llm#workflow#deterministic

SoftMoE: Soft Differentiable Routing for Mixture-of-Experts in LLMs

SoftMoE introduces a soft differentiable routing mechanism for Sparse Mixture-of-Experts (MoE) architectures in large language models, replacing the discrete top-k routing with a truncated soft top-k LapSum relaxation. This innovation allows for gradient-based optimization of expert selection, enabling a flexible allocation of expert capacity across layers while maintaining autoregressive compatibility. Practitioners can leverage SoftMoE to achieve improved performance on language modeling tasks with reduced computational costs, as it activates fewer experts while still meeting a global budget constraint.
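
For context, the discrete top-k routing that SoftMoE relaxes can be sketched as follows. This shows only the conventional baseline, not the paper's LapSum relaxation, whose details are not given in the summary. Renormalizing the gate weights over the selected experts is one common convention and is an assumption here, as is the function name `hard_top_k_route`.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hard_top_k_route(router_logits, k):
    """Return {expert_index: gate_weight} for the k highest-scoring experts.

    The index selection (a sort plus a cut) is a discrete step: small changes
    in the logits either leave the chosen set unchanged or swap an expert
    outright, so no gradient flows through the choice itself.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    gates = softmax([router_logits[i] for i in top])
    return dict(zip(top, gates))


logits = [0.1, 2.0, -1.0, 1.5]  # hypothetical router scores for 4 experts
routing = hard_top_k_route(logits, k=2)
print(routing)
```

A soft relaxation replaces the hard cut with a smooth function of the logits, which is what makes the expert selection itself trainable by gradient descent.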

arXiv cs.AI40 d agofound 28 d ago#moe#routing

OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization

OmniSapiens-7B 2.0 is a newly released foundation model designed for processing social behavior, leveraging Heterogeneity-Aware Relative Policy Optimization to effectively learn from diverse and imbalanced behavioral data. This model achieves superior performance across 10 behavioral tasks and five held-out benchmarks, with improvements of up to +12.02% and +9.37%, respectively, while providing more consistent and interpretable reasoning traces. This advancement is significant for practitioners in AI, as it enhances the ability to develop socially intelligent systems capable of adapting to varied human behaviors and contexts.

arXiv cs.AI40 d agofound 28 d ago#foundation-model#social-behavior#policy-optimization

Human-in-the-Loop Atlas-Based 3D Asset Segmentation for Interactive Content Workflows

The article presents a human-in-the-loop pipeline for generating a segmented 2D parameterized atlas from 3D models, enhancing interactive workflows in media, gaming, and XR. The method employs a greedy set cover strategy to select rendered views and utilizes SAM 2 and Label Studio for interactive segmentation, producing unified segmented atlases for tasks like material assignment and style transfer. The evaluation demonstrated effectiveness across diverse geometries, highlighting areas needing manual correction, which is crucial for practitioners aiming to improve segmentation accuracy in complex 3D assets.
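
The greedy set cover step mentioned above can be illustrated with a minimal sketch. It assumes each candidate view is described by the set of mesh faces it makes visible; the view names, face IDs, and the function `greedy_view_selection` are hypothetical, and the paper's actual coverage criterion may differ.

```python
def greedy_view_selection(view_coverage, all_faces):
    """Repeatedly pick the view that covers the most still-uncovered faces.

    Returns the chosen views in order, plus any faces no view can see.
    """
    uncovered = set(all_faces)
    chosen = []
    while uncovered:
        best = max(view_coverage,
                   key=lambda v: len(view_coverage[v] & uncovered))
        gain = view_coverage[best] & uncovered
        if not gain:  # no remaining view adds coverage
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical visibility sets: view name -> face IDs visible from that view.
views = {
    "front": {1, 2, 3, 4},
    "left": {3, 4, 5},
    "top": {5, 6},
    "back": {6, 7, 8},
}

chosen, missed = greedy_view_selection(views, range(1, 9))
print(chosen, missed)
```

Greedy selection does not guarantee the smallest possible set of views, but it is simple and is a standard approximation for set cover, which keeps the number of views a human has to annotate low.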

arXiv cs.AI40 d agofound 28 d ago#3D segmentation#interactive workflows

LiveStarPro: Proactive Streaming Video Understanding with Hierarchical Memory for Long-Horizon Streams

LiveStarPro is a new live streaming assistant designed for proactive video understanding, addressing challenges in processing continuous streams and maintaining long-horizon contextual memory. It features three key components: Streaming Verification Decoding (SVeD) for response timing, Streaming Causal Attention Masks (SCAM) for video-language alignment, and Tree-Structured Hierarchical Memory (TSHM) for efficient retrieval of historical data. LiveStarPro demonstrates significant improvements over existing methods, with a 28.9% increase in semantic correctness and an 18.2% reduction in timing error, along with a 1.58x speedup in inference, making it a valuable tool for practitioners working with long-duration video data.

arXiv cs.AI40 d agofound 28 d ago#video understanding#long-horizon streams#video-llms

Discrete Autoregressive Transformer for Generative Mechanism Synthesis

The article presents a Discrete Autoregressive Transformer for synthesizing mechanisms that match specified trajectories, utilizing a dataset of over one million mechanisms. The model employs a decoder-only transformer architecture, integrating a variational autoencoder (VAE) for latent representation and conditional autoregressive sequence modeling, achieving a mean Chamfer distance of 0.0132 and mean dynamic time warping of 0.153 on held-out tests. This approach enables diverse and accurate mechanism generation without the need for dataset lookup, which is significant for practitioners in robotics and mechanical design seeking efficient synthesis of complex mechanisms.
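
The two trajectory metrics quoted above can be sketched in plain Python. Conventions vary between papers (squared versus unsquared distances, summing versus averaging, length normalization), so the definitions below are one common choice and not necessarily the one behind the reported 0.0132 and 0.153; the sample curves are made up.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance a->b plus b->a."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def dtw_distance(a, b):
    """Dynamic time warping cost between two point sequences (unnormalized)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: step in a, step in b, or step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical target curve and a generated curve offset by 0.1 in y.
target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
generated = [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)]

print(chamfer_distance(target, generated))
print(dtw_distance(target, generated))
```

Chamfer distance ignores point order and compares the curves as shapes, while dynamic time warping respects the order in which points are traced, so reporting both captures geometry and timing.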

arXiv cs.AI40 d agofound 28 d ago#autoregressive#mechanism synthesis#transformer

RooseBERT: A New Deal For Political Language Modelling

RooseBERT, a novel pre-trained language model specifically designed for political discourse analysis, has been released, trained on an 11GB corpus of political debates and speeches. It has been fine-tuned on various downstream tasks such as stance detection, sentiment analysis, and named entity recognition, demonstrating improved performance over general-purpose language models in these areas. This model addresses the unique challenges of analyzing political language, providing practitioners with a specialized tool for enhancing the understanding of political debates.

arXiv cs.AI40 d agofound 28 d ago#political language#language modeling

Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers

The paper introduces the Fixed-Point Reasoning Model (FPRM), a Transformer-based architecture designed to enhance compositional reasoning through looped structures. It employs pre-norm layers and residual scaling to mitigate signal propagation issues, enabling fixed-point convergence as a halting mechanism that adapts computational resources based on task difficulty. FPRM demonstrates effectiveness on reasoning benchmarks such as Sudoku, Maze, state-tracking, and ARC-AGI, highlighting its potential for practitioners focused on improving reasoning capabilities in AI systems.
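
The idea of using fixed-point convergence as a halting rule can be illustrated with a toy loop. The sketch below is not the FPRM architecture: a simple contraction stands in for the looped Transformer block, and the names `run_until_fixed_point` and `make_toy_block` as well as the tolerance are assumptions.

```python
def run_until_fixed_point(step, state, tol=1e-6, max_loops=100):
    """Apply `step` repeatedly and halt once the state stops changing.

    Returns the final state and the number of loops actually used.
    """
    for loops in range(1, max_loops + 1):
        new_state = step(state)
        change = max(abs(n - s) for n, s in zip(new_state, state))
        state = new_state
        if change < tol:
            return state, loops
    return state, max_loops


def make_toy_block(strength):
    """Stand-in for a looped block: moves the state toward a fixed target.

    A smaller `strength` converges more slowly, mimicking a harder input.
    """
    target = [1.0, -2.0]

    def step(state):
        return [s + strength * (t - s) for s, t in zip(state, target)]

    return step


easy_state, easy_loops = run_until_fixed_point(make_toy_block(0.9), [0.0, 0.0])
hard_state, hard_loops = run_until_fixed_point(make_toy_block(0.2), [0.0, 0.0])
print(easy_loops, hard_loops)
```

The same loop spends few iterations on the fast-converging case and many on the slow one, which is the sense in which a convergence-based halting rule adapts compute to difficulty.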

arXiv cs.AI40 d agofound 29 d ago#transformers#reasoning#architecture#deep learning

GLM-5.2 is now 1st on Design Arena — ahead of the now unavailable Claude Fable 5.

GLM-5.2 has achieved the top position on Design Arena, surpassing the now unavailable Claude Fable 5. While specific model size and benchmark results are not provided, this ranking indicates GLM-5.2's competitive performance in design tasks. Its leading position is significant for practitioners as it highlights advancements in model capabilities, potentially influencing future developments in AI design applications.

Reddit r/LocalLLaMA40 d agofound 29 d ago#glm-5.2#design-arena#claude-fable

GLM-5.2 just dropped open weights and it already looks weirdly strong for coding

GLM-5.2 has been released with open weights under an MIT license, featuring a 1M context window and two reasoning effort modes. Initial benchmarks indicate strong performance in coding tasks, suggesting its potential utility for practitioners looking for robust models that extend beyond typical API offerings. This release may provide a valuable alternative for developers needing high-performance capabilities in coding applications.

Reddit r/LocalLLaMA40 d agofound 34 d ago#glm-5.2#coding#open-weights

Hashicorp founder thinks local models "aren't good ENOUGH yet"

The article discusses a statement from the founder of HashiCorp, who said that local models are "not good enough yet" for widespread use. This view contrasts with the experiences of developers who have successfully used small local models for coding tasks over the past year. The discussion reflects the ongoing debate over whether local AI models are ready for practical applications, which matters for practitioners evaluating model deployment strategies.

Reddit r/LocalLLaMA40 d agofound 34 d ago#local-models#hashicorp#models

GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench and beats every other open model available

GLM-5.2 has been announced as the first open-weights model to achieve over 80% on the Terminal-Bench benchmark, outperforming all other available open models, including Gemini. This development indicates a significant advancement in model performance at a reduced cost, making it a notable option for practitioners looking to deploy high-performing LLMs without proprietary restrictions.

Reddit r/LocalLLaMA40 d agofound 34 d ago#glm#benchmark#open weights

GLM-5.2 Takes #2 Spot on WebDew Arena

GLM-5.2 has achieved the #2 ranking on the WebDew Arena benchmark. While specific model size and architectural details are not disclosed in the announcement, this ranking indicates competitive performance in natural language processing tasks. This development is significant for practitioners as it highlights GLM-5.2's capabilities, potentially influencing model selection for applications in AI-driven projects.

Reddit r/LocalLLaMA40 d agofound 34 d ago#glm-5.2#webdew#models

A benchmark for tiny LLMs based on a real world problem: natural language file search (using monkeSearch)

A new benchmark for tiny LLMs (under 3 billion parameters) has been introduced through the monkeSearch project, which focuses on natural language file search. The benchmark evaluates models like Gemma-3 (270M), SmolLM2 (360M), and TinyLlama (1.1B) on their ability to parse queries into structured JSON, assessing file type, temporal awareness, and specificity across 80 queries. Initial results indicate that models between 0.8B and 1.5B parameters outperform those below 0.5B, suggesting potential benefits from fine-tuning smaller models for enhanced performance in CPU-inference environments.

Reddit r/LocalLLaMA40 d agofound 33 d ago#llm#benchmark#search

GPT‑NL: a sovereign language model for the Netherlands

The article introduces GPT-NL, a sovereign language model specifically designed for the Dutch language. It features a model size of 1.5 billion parameters and has been fine-tuned on a diverse dataset that includes Dutch literature, news articles, and social media content. This release is significant for practitioners as it provides a tailored solution for natural language processing tasks in Dutch, enhancing the performance of applications in local contexts.

Hacker News40 d agofound 29 d ago#gpt#nl#language model

zai-org/GLM-5.2 is here!

The zai-org/GLM-5.2 model has been released and is available on Hugging Face. The article gives no architectural details or benchmark results, but the release is relevant to practitioners because it may extend the performance and capabilities available to local LLM applications.

Reddit r/LocalLLaMA40 d agofound 34 d ago#glm#open weights

Meet Qwen-RobotSuite: Three Embodied AI Models for VLA Manipulation, Video World Modeling, and Navigation

Qwen-RobotSuite introduces three embodied AI models: RobotManip, a Vision-Language-Action model based on Qwen3.5 with 4 billion parameters for manipulation tasks; RobotWorld, a language-conditioned video world model featuring a 60-layer MMDiT architecture; and RobotNav, a navigation model utilizing Qwen3-VL available in sizes of 2B, 4B, and 8B parameters. Each model is detailed with its architecture, data pipelines, and benchmark results, providing valuable insights for practitioners focused on advanced manipulation, video modeling, and navigation in AI applications.

MarkTechPost40 d agofound 33 d ago#embodied#ai#vla

Be wary of Qwen/Claude distillations - they're often worse than the base model

Recent discussions raise doubts about Qwen models distilled from Claude outputs, in particular the "Qwopus" model, which is fine-tuned on only 4,000 samples. That sample size is considered too small to yield meaningful improvements, and evidence suggests these distillations often perform worse than their base models, such as Qwen 3.6. Practitioners should evaluate them critically, as they may introduce coherence issues without delivering the expected gains in capability or efficiency.

Reddit r/LocalLLaMA41 d agofound 34 d ago#distillation#qwen#claude

ChatGPT’s market share slips below 50% for first time

ChatGPT's market share has fallen below 50% for the first time, although it continues to be the leading AI assistant with over 1.1 billion monthly users. In comparison, Gemini has 662 million users and Claude has 245 million. This shift highlights the increasing competition in the conversational AI space, which may influence development strategies and user engagement approaches for practitioners.

TechCrunch AI41 d agofound 33 d ago#chatgpt#market share#gemini#claude

PVminerLLM2: Improving Structured Extraction of Patient Voice via Preference Optimization

PVminerLLM2 introduces an enhanced framework for structured extraction of patient-generated text, utilizing preference optimization to mitigate token-critical errors that traditional supervised fine-tuning struggles with. Key innovations include a token-level gated stabilization term and confusion-aware preference pair construction, along with token-importance weighting to address class skew. Benchmark results show PVminerLLM2 surpassing previous models by up to 4.43% in various extraction tasks, highlighting its potential for improving patient-centered outcomes research in AI applications.
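
To make token-importance weighting concrete, the sketch below applies per-token weights inside a generic DPO-style preference loss. It is an assumption-laden illustration, not the paper's formulation: the function names, the weighting scheme, and the default `beta` are hypothetical, and the gated stabilization term and confusion-aware pair construction are not modeled.

```python
import math


def weighted_logprob(token_logprobs, token_weights):
    """Sum of per-token log-probabilities, each scaled by an importance weight."""
    if len(token_logprobs) != len(token_weights):
        raise ValueError("need one weight per token")
    return sum(w * lp for lp, w in zip(token_logprobs, token_weights))


def preference_loss(policy_chosen, ref_chosen, w_chosen,
                    policy_rejected, ref_rejected, w_rejected, beta=0.1):
    """DPO-style loss: -log sigmoid(beta * (chosen margin - rejected margin))."""
    chosen_margin = (weighted_logprob(policy_chosen, w_chosen)
                     - weighted_logprob(ref_chosen, w_chosen))
    rejected_margin = (weighted_logprob(policy_rejected, w_rejected)
                       - weighted_logprob(ref_rejected, w_rejected))
    z = beta * (chosen_margin - rejected_margin)
    # Numerically stable -log(sigmoid(z)), i.e. softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

Raising the weight on a label-bearing token makes an improvement on that token count for more in the margin, which is one plausible way to counter class skew in extraction outputs.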

arXiv cs.AI41 d agofound 32 d ago#patient voice#extraction#llm

PH-KAN: Port-Hamiltonian Kolmogorov-Arnold Network

The article introduces the Port-Hamiltonian Kolmogorov-Arnold Network (PH-KAN), a novel framework for identifying nonlinear port-Hamiltonian systems using Kolmogorov-Arnold Networks (KANs). This model enhances interpretability by parameterizing key components such as the interconnection and dissipation matrices, as well as the Hamiltonian, while ensuring adherence to port-Hamiltonian constraints. This approach is significant for practitioners as it allows for clearer insights into the learned physical relationships in complex systems, improving the usability of machine learning in physics-informed modeling.
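
The port-Hamiltonian constraints mentioned above are usually stated as dx/dt = (J - R) grad H(x) + G u, with J skew-symmetric and R symmetric positive semidefinite. The sketch below shows one common way to satisfy both constraints by construction for a two-state toy system. It uses constant matrices and a quadratic Hamiltonian as simplifying assumptions, whereas PH-KAN parameterizes these components with KANs; all function names are hypothetical.

```python
def skew(a):
    """2x2 interconnection matrix J = [[0, a], [-a, 0]]; skew-symmetric for any a."""
    return [[0.0, a], [-a, 0.0]]


def psd(l11, l21, l22):
    """2x2 dissipation matrix R = L L^T with L lower-triangular; PSD for any entries."""
    return [[l11 * l11, l11 * l21],
            [l11 * l21, l21 * l21 + l22 * l22]]


def grad_h(x):
    """Gradient of the toy Hamiltonian H(x) = 0.5 * (x1^2 + x2^2)."""
    return [x[0], x[1]]


def dynamics(x, u, a, l_params, g):
    """dx/dt = (J - R) grad H(x) + g * u for a 2-state, single-input system."""
    J, R, dh = skew(a), psd(*l_params), grad_h(x)
    return [sum((J[i][k] - R[i][k]) * dh[k] for k in range(2)) + g[i] * u
            for i in range(2)]


def energy_rate(x, u, a, l_params, g):
    """dH/dt = grad H(x) . dx/dt; never positive when u = 0."""
    dx = dynamics(x, u, a, l_params, g)
    return sum(gh * d for gh, d in zip(grad_h(x), dx))
```

Because J contributes nothing to grad H . dx/dt and R can only remove energy, the unforced system is passive whatever values the free parameters take, which is what makes the learned components physically interpretable.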

arXiv cs.AI41 d agofound 33 d ago#port-hamiltonian#nonlinear systems#kan

Unifying Acoustic Features and Text with Multimodal LLMs for Neurodegenerative Screening

The paper introduces NeurMLLM, a multimodal generative framework that integrates acoustic features and text for the staging of neurodegenerative diseases like Alzheimer's and Parkinson's. Utilizing vision transformers to encode audio spectrograms and Mel-frequency cepstral coefficients, NeurMLLM combines these representations with transcript and demographic data within a large language model's embedding space. It employs Low-Rank Adaptation for instruction tuning, achieving superior performance on the Bridge2AI-Voice dataset compared to traditional machine learning and existing LLM methods, highlighting its potential for enhancing diagnostic accuracy and accessibility in clinical settings.

arXiv cs.AI41 d agofound 33 d ago#neurodegenerative#multimodal#screening

GeoRoPE: Ground-Aware Rotary Adaptation for Remote Sensing Foundation Models

GeoRoPE is a proposed spatial adaptation method for remote-sensing foundation models (RSFMs) that addresses scale mismatches during downstream adaptation by providing ground-aware positional corrections. It employs Geo-Coordinate Calibration (GCC) to rescale token-grid offsets based on ground distances and Geo-Frequency Calibration (GFC) to adjust RoPE frequencies for scene-specific adaptations. This lightweight adapter enhances cross-resolution robustness and improves scale-sensitive representation learning in RSFMs, making it a significant advancement for practitioners working with heterogeneous remote sensing data.
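
The idea of rescaling token-grid offsets by ground distance can be sketched with standard RoPE angles. This is a rough illustration under stated assumptions, not the paper's GCC or GFC formulation: the function names, the reference ground sample distance `ref_gsd_m`, and the multiplicative `freq_scale` standing in for frequency calibration are all hypothetical.

```python
def rope_frequencies(dim, base=10000.0, freq_scale=1.0):
    """Standard RoPE frequencies base^(-2i/dim), times an optional scale factor."""
    return [freq_scale * base ** (-2.0 * i / dim) for i in range(dim // 2)]


def ground_position(token_index, patch_size_px, gsd_m, ref_gsd_m=1.0):
    """Token-grid index converted to a ground offset in reference-pixel units."""
    return token_index * patch_size_px * gsd_m / ref_gsd_m


def rope_angles(position, freqs):
    """Rotation angle for each frequency pair at the given position."""
    return [position * f for f in freqs]
```

With these definitions, token 4 in a 0.5 m/pixel image and token 2 in a 1.0 m/pixel image (both with 16-pixel patches) sit at the same ground offset and therefore receive the same rotation angles, whereas index-based RoPE would treat them as different positions.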

arXiv cs.AI41 d agofound 33 d ago#remote sensing#foundation models#adaptation

LLMs on Tabular Data with Limited Semantics: Evidence from Industrial Car Retrofit Prediction

The study evaluates the performance of LLM-based strategies against traditional tabular machine learning models for industrial car retrofit prediction, using a dataset of 284,271 vehicles linked to retrofit management. Key findings indicate that while classical tree ensembles outperform LLMs in standalone tasks, embedding features from LLMs (e.g., Amazon Titan) show utility, achieving a binary AUC of 0.982, whereas direct prompting and hashing significantly degrade performance. The results emphasize that LLMs can serve as complementary tools in privacy-sensitive environments, rather than as replacements for established tabular methods.

arXiv cs.AI41 d agofound 32 d ago#tabular data#llm#industrial prediction

Separable Neural Architectures as Physical World Models: from Mathematical Theory to Applications

The article presents the Separable Neural Architecture (SNA), a model that integrates neural approximation with tensor decomposition to effectively solve partial differential equations (PDEs). Utilizing a variational SNA (VSNA) framework, it achieves well-posedness and stability while significantly reducing computational costs, scaling algebraically in high-dimensional settings. The SNA demonstrates impressive performance, executing a 1,000,000-query Monte Carlo sweep in 102 seconds on a standard CPU, offering a 150,000x speedup compared to traditional finite element methods, making it a valuable tool for real-time simulations and optimizations in engineering applications.
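
The cost advantage of a separable representation can be illustrated with a sum of products of one-dimensional factors. The sketch below is a toy under stated assumptions: it uses plain Python callables as factors, whereas the paper uses neural approximations, and the function names are hypothetical.

```python
import math


def separable_eval(factors, point):
    """Evaluate sum_r prod_k factors[r][k](point[k])."""
    return sum(
        math.prod(f(xk) for f, xk in zip(term, point))
        for term in factors
    )


def storage_counts(n_per_axis, dims, rank):
    """Values stored by a full grid vs a rank-`rank` separable form."""
    full_grid = n_per_axis ** dims
    separable = rank * dims * n_per_axis
    return full_grid, separable
```

For example, `storage_counts(100, 6, 10)` compares 10^12 grid values with 6,000 factor values, which is the kind of algebraic rather than exponential growth the summary refers to. Taking the reported figures at face value, 1,000,000 queries in 102 seconds is about 102 microseconds per query, and a 150,000x speedup would imply roughly 15.3 seconds per query for the finite element baseline.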

arXiv cs.AI41 d agofound 32 d ago#neural architecture#PDE#tensor decomposition