• 0 Posts
  • 419 Comments
Joined 2 years ago
Cake day: June 13th, 2023




  • Assuming we had proper bike infrastructure (which we don’t), you’d be hard-pressed to top the speed a car can go, and you would still have to stop frequently at lights, just like a car

    Here it’s the exact opposite. There is no way a car can keep up with a bike in the city. Let’s say I wanted to go to the city center by car, which is about 2 kilometers. I would encounter 5 traffic lights just in that short drive. On a working day it would be slow, on a Saturday? Forget it. It would probably be faster to walk. Alternatively, I could go by bike and encounter exactly zero traffic lights. I would ride from my house to the bicycle highway (a few hundred meters), and from there it’s an uninterrupted route to the city center. It’s a completely separate path and there are bridges crossing all major roads. Near the city center it turns into a shared space where bicycles have priority over cars. The city center isn’t accessible by car at all, so if you go by car you have to park at the edge of the city center (paid) and walk the rest. By contrast, I can cycle right up to any store and park my bicycle right in front of it.









  • Yes, there are massive advantages. It’s basically what makes unified memory possible on modern Macs. Especially with all the interest in AI nowadays, you really don’t want a machine with a discrete GPU/VRAM, a discrete NPU, etc.

    Take for example a modern high-end PC with an RTX 4090. That card has only 24GB of VRAM, and anything the GPU works on first has to be copied into that VRAM from system RAM over the (relatively slow) PCIe bus. AI models can get really big, and 24GB can be too little for the bigger models. You can spec an M2 Ultra with 192GB RAM and most of it is accessible by the GPU directly. Even better, the GPU can access it without copying data back and forth over a PCIe bus, so there is essentially no copy overhead.

    The advantages of this multiply when you have more dedicated silicon. For example: if you have an NPU, it can use the same memory pool and access the same shared data as the CPU and GPU without any copying. The M-series chips also have dedicated video encoder/decoder hardware, which again can access the unified memory without copies.

    For example: you could have an application that replaces the background on a video using AI. It takes a video and decompresses it using the video decoder, and the decompressed video frames are immediately available to all other components. The GPU can then be used to pre-process the frames, the NPU can use the processed frames as input to some AI model and generate a new frame, and the video encoder can immediately access that result and compress it into a new video file.

    The overhead of just copying data for such an operation on a system with non-unified memory would be huge. That’s why I think that the AI revolution is going to be one of the driving factors in killing systems with non-unified memory architectures, at least for end-user devices.
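To put a rough number on that copy overhead, here is a back-of-the-envelope sketch of the background-replacement pipeline described above. Everything in it is an assumption for illustration: the `Stage` type, the device labels, the uncompressed 4K RGBA frames at 60 fps, and the worst case where every component has its own memory pool so each hand-off costs one full frame copy. The 31.5 GB/s figure is the approximate theoretical peak of a PCIe 4.0 x16 link in one direction; real-world throughput is lower.

```python
from dataclasses import dataclass

# Approximate theoretical one-direction peak of PCIe 4.0 x16, in bytes/second.
PCIE4_X16_BYTES_PER_S = 31.5e9


@dataclass
class Stage:
    """One step of the hypothetical pipeline and the device it runs on."""
    name: str
    device: str


def frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one uncompressed frame (default: 8-bit RGBA)."""
    return width * height * bytes_per_pixel


def copies_needed(stages: list[Stage]) -> int:
    """Count hand-offs between consecutive stages on different devices.

    Assumption: with non-unified memory, each such hand-off copies the frame.
    """
    return sum(1 for a, b in zip(stages, stages[1:]) if a.device != b.device)


def copy_traffic(stages: list[Stage], width: int, height: int,
                 fps: int, unified: bool) -> int:
    """Bytes per second spent purely on moving frames between memory pools.

    For unified memory this idealized model returns 0: it only counts copies
    and ignores bandwidth contention and cache synchronization.
    """
    if unified:
        return 0
    return copies_needed(stages) * frame_bytes(width, height) * fps


# The pipeline from the comment: decode -> pre-process -> AI model -> encode.
pipeline = [
    Stage("decode", "video decoder"),
    Stage("pre-process", "GPU"),
    Stage("background model", "NPU"),
    Stage("encode", "video encoder"),
]

discrete = copy_traffic(pipeline, 3840, 2160, fps=60, unified=False)
shared = copy_traffic(pipeline, 3840, 2160, fps=60, unified=True)

print(f"copies per frame:   {copies_needed(pipeline)}")
print(f"non-unified copies: {discrete / 1e9:.2f} GB/s "
      f"({discrete / PCIE4_X16_BYTES_PER_S:.0%} of a PCIe 4.0 x16 link)")
print(f"unified copies:     {shared} bytes/s")
```

Under these assumptions the non-unified system spends a noticeable fraction of the bus just shuttling frames around before any actual work is done, and that is for a single video stream. A real discrete setup may keep some stages in the same memory pool (for instance, decoder and GPU sharing VRAM), which would lower the count, so treat this as an upper-bound illustration rather than a measurement.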