Machine Learning

machinelearning@lemmy.ml

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 days ago

The Attention Mechanism Born for Cost Optimization

oilbeater.com

thickertoofan@lemm.ee · English · 4 days ago

dcdaML - devanagari character detection dataset training framework

github.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 10 days ago

Neural Graffiti is an experiment in adding a "Spray Layer" to a transformer model, which injects a memory trace into the final stages of inference without finetuning or retraining

github.com

fubarx@lemmy.world · 17 days ago

Breaking GPT-5 News!

4Robato@lemmy.world · English · 17 days ago

I want to open source a dataset but I'm not sure what license to use

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 20 days ago

Why do LLMs make stuff up? New research peers under the hood.

arstechnica.com

oba@lemmy.world · English · 29 days ago

MLOps tips I gathered recently

www.readyforagents.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

DeepSeek open source DeepEP – library for MoE training and Inference

github.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

transformer-circuits.pub

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

transformer-circuits.pub

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

arxiv.org

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 4 months ago

Neurosymbolic AI -- Why, What, and How

arxiv.org

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 4 months ago

Classical Sorting Algorithms as a Model of Morphogenesis: self-sorting arrays reveal unexpected competencies in a minimal model of basal intelligence

arxiv.org

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 4 months ago

Genie 2: A large-scale foundation world model

deepmind.google

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 5 months ago

A good primer on what to expect running local LLMs

nullprogram.com

Shamar@feddit.it · English · 6 months ago

A community statement supporting the Open Source Definition (OSD)

osd.fyi

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 7 months ago

How ‘Embeddings’ Encode What Words Mean

www.quantamagazine.org

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 7 months ago

New AI model “learns” how to simulate Super Mario Bros. from video footage

arstechnica.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 7 months ago

Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o)

huggingface.co

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 8 months ago

It’s Not Intelligent If It Always Halts: A Critical Perspective on Current Approaches to AGI

www.lifeiscomputation.com
