Or about half a year if we’re only counting the time during which I’ve been alive.
13.787 ± 0.020 billion years
Why does this look like another bot post?
The simulation terminates.
I’m curious, how do you run the 4x3090s? The FE Cards would be 4x3=12 PCIe slots and 4x16=64 PCIe lanes… Did you nvlink them? What about transient power spikes? Any clock or even VBIOS mods?
I’m also on p2p 2x3090 with 48GB of VRAM. Honestly it’s a nice experience, but still somewhat limiting…
I’m currently running deepseek-r1-distill-llama-70b-awq with the aphrodite engine, though the same applies to llama-3.3-70b. It works great and is way faster than ollama, for example. But my max context is around 22k tokens. More VRAM would allow me more context, and even more VRAM would allow for speculative decoding, CUDA graphs, …
Maybe I’ll drop down to a 35b model to get more context and a bit of speed. But I don’t really want to justify the possible decrease in answer quality.
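For a rough feel of why context is the thing that eats VRAM here, this is a back-of-the-envelope sketch of the KV cache size. It assumes the published Llama 3 70B shape (80 layers, 8 KV heads via GQA, head dimension 128) and an fp16 cache; the function name is made up for illustration, and what the engine actually allocates (paging, activations, fragmentation) will differ.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache needed per token (hypothetical helper)."""
    # Factor 2: one key vector and one value vector per layer and KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


# Assumed Llama 3 70B shape: 80 layers, 8 KV heads, head_dim 128, fp16 cache.
per_token = kv_cache_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128)

context_tokens = 22_000  # the max context mentioned above
total_gib = per_token * context_tokens / 2**30

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{total_gib:.1f} GiB for {context_tokens} tokens")
```

So under these assumptions the cache alone is several GiB on top of the quantized weights, which is roughly where 48GB runs out.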
Thanks for the writeup! So far I’ve been using ollama, but I’m always open to trying out alternatives. To be honest, it seems I was oblivious to the existence of alternatives.
Your post is suggesting that the same models with the same parameters generate different results when run on different backends?
I can see how the backend would have an influence on handling concurrent API calls, RAM/VRAM efficiency, supported hardware/drivers and general speed.
But going as far as having different context windows and quality degrading issues is news to me.
Is there an inherent benefit to using NVLink? Should I specifically try out Aphrodite over the other recommendations when having 2x 3090 with NVLink available?
Please tell ^^
I cleaned my bin.
All that’s left is a symlink: sh -> /nix/store/…
Trivium - Vengeance Falls
This, or slackhq/nebula
I feel like the floating point suggestion would backfire quickly due to imprecision.
I mean, it’s a hard problem to solve if you’ve never worked with moduli before.
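To illustrate what I mean, here is a minimal sketch. It assumes the suggestion was something like checking parity by dividing as a float and looking for a fractional part; both function names are made up for the example.

```python
def is_even_float(n: int) -> bool:
    """Parity check via float division (the assumed 'floating point suggestion')."""
    half = n / 2  # true division always produces a float
    return half == int(half)


def is_even_mod(n: int) -> bool:
    """Parity check via the modulus, exact for arbitrarily large ints."""
    return n % 2 == 0


small = 7
big_odd = 2**53 + 1  # odd, but its half is not representable as a 64-bit float

print(is_even_float(small), is_even_mod(small))
print(is_even_float(big_odd), is_even_mod(big_odd))
```

The float version is fine for small numbers but reports the big odd number as even, because the .5 gets rounded away. The modulus version stays correct.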
We don’t do that here, we have a: Great Compiler
This work is licensed under CC BY-NC-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/
Nix can put a stop to my distro-hopping.
What would you tell a direct ancestor of yourself, living in the year 2024?
I was in a building that was rebuilt after a fighter jet crashed into the one before it…