• Buddahriffic@lemmy.world
    link
    fedilink
    arrow-up
    5
    ·
    7 days ago

    Another angle to try is to set the date one day ahead and see if the bug shows up then. Might need to disconnect from network and set it in the BIOS for the test to work properly.

    I could be wrong, but I figure after being off for an hour, all capacitors should have discharged by then, so it’s probably not based on how long the hardware has been unpowered.

    Though one other angle I just thought of, if you have something that runs periodically, maybe the bug is related to that period being missed once or n times. Or it could be related to something that is meant to wake the computer to run some job and then go back to sleep but instead just sets it in a bad state.

    • ramenshaman@lemmy.world
      link
      fedilink
      arrow-up
      5
      ·
      7 days ago

      The date/time aspect is an interesting thought. For a bit more context, this machine is a Raspberry Pi connected to several other devices, some via USB and some via a CAN network. The system gets powered on manually, the user performs a task, then shuts it down until they need it again. We only use the date/time for logging. The system is connected to our wifi at our facility but after we ship it then it’s likely it will never be connected to the internet except maybe when we’re servicing it and updating code. I don’t think the Pi has a RTC. I don’t really see how the date/time could be causing the issue I’m seeing (seems to be lag in communication with the devices on the CAN network) but I guess stranger things have happened.

      • Buddahriffic@lemmy.world
        link
        fedilink
        arrow-up
        4
        ·
        7 days ago

        Ah that’s interesting. If you can swap the devices from one pi to another, try powering it all up on machine A, then swap the devices to machine B and power that on. Might tell you if the issue is with on the pi side or with the devices.

        Is latency higher on the first boot than on subsequent ones? I’d be looking into race conditions if you’re seeing a bit of lag cascade out into bigger problems. Race conditions are the worst, especially when the race most often goes the right way and just occasionally goes the wrong way. Though you can force the wrong way by adding delays in your code, if you have an idea of where the race is happening.

        • ramenshaman@lemmy.world
          link
          fedilink
          arrow-up
          2
          ·
          7 days ago

          We have 3 theoretically identical systems here and this same issue occurs on 2 of them. The 3rd one… has bigger issues right now. That would be interesting to see what happens if I swap the Pis around but I’d give it >95% chance the same thing happens.

          • Buddahriffic@lemmy.world
            link
            fedilink
            arrow-up
            2
            ·
            7 days ago

            The important bit is to power one on first before the swap, then you’ll have one setup where the pi was recently powered on and another setup where the connected devices were recently powered on. You might see the issue on only one of the devices, at which point you can say if it’s the pi being off for a while or the devices that triggers the issue.

            • ramenshaman@lemmy.world
              link
              fedilink
              arrow-up
              2
              ·
              edit-2
              7 days ago

              Good point. I disabled the internet on both systems so when I come in on Monday hopefully I can confirm whether or not the date/time aspect is a problem. I’ll try this as well.

      • skuzz@discuss.tchncs.de
        link
        fedilink
        arrow-up
        1
        ·
        7 days ago

        Oh, apologies for my suggestion before seeing this comment hahaha!

        CAN devices I have limited experience with, but I know at least in the automotive industry, vehicles often have various CAN devices that have various sleep states. Like, shut car off, it holds brake system for a few minutes and then unlocks the brakes and that ECU shuts down. Later on, an emissions ECU may run a self-diagnostic. After a few days being powered off, the security ECU goes into low power and turns off wireless doorlocks. After the voltage drops too low, the ECU in the head unit ostensibly shuts down, and the next time the car is started, the head unit has to do a cold-reboot and takes a fortnight.

        Could be one of those CAN devices takes some time to get into the “off-adjacent” state to manifest the bug?

        • ramenshaman@lemmy.world
          link
          fedilink
          arrow-up
          1
          ·
          7 days ago

          Largely for safety reasons, anytime the system is turned off power is instantly cut to the entire system. All of our CAN devices boot up much faster than the Pi does. Once the Pi boots, it sets up CAN communication.