Incompetent half-assing is rarely this morally righteous an act, too, since your one act of barely-competent-enough incompetence is transmuted into endless incompetence once it becomes training data/QC feedback.

  • folkrav@lemmy.ca · 3 days ago · ↑6 ↓1

    That kind of data sanitization is just standard practice. You need some level of confidence in your data’s accuracy, and for anything normally distributed, throwing out obvious outliers is a safe bet.
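A minimal sketch of the kind of outlier filtering described above, assuming a simple z-score cutoff. The function name `drop_outliers`, the threshold of 3 standard deviations, and the synthetic data are illustrative assumptions, not anything a real captcha pipeline is known to use.

```python
import random
import statistics


def drop_outliers(values, z_max=3.0):
    """Keep only values within z_max standard deviations of the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # All values are identical, so there is nothing to discard.
        return list(values)
    return [v for v in values if abs(v - mean) / stdev <= z_max]


# Hypothetical data: 200 roughly normal measurements plus two wild entries.
rng = random.Random(42)
measurements = [rng.gauss(2.0, 0.2) for _ in range(200)]
measurements += [40.0, -30.0]

cleaned = drop_outliers(measurements)
print(len(measurements), "->", len(cleaned))
```

This only works as intended when the bad entries are rare and far from the bulk of the data, which is the assumption the comment is making.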

    • supersquirrel@sopuli.xyz (OP) · 3 days ago (edited) · ↑6 ↓2

      If you cut the outliers out of a dataset where 30% of the contributors are bullshitters who are skilled and motivated to bullshit, that doesn’t magically make the system more accurate; it only makes it more precise. Bullshitters have been training their whole lives to bullshit convincingly (some went to school for it starting at a very young age) and can often come across as much more authentic than non-bullshitters. Honestly, it makes me happy to know that big tech thinks the same way you do on this. It is glorious how poorly it positions these much more dangerous bullshitters to anticipate or respond to how these systems will naturally decay.
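The accuracy-versus-precision distinction in this comment can be sketched with a toy simulation. Everything here is an assumption for illustration: the true value of 10, a 30% share of contributors who consistently answer around 13, and a 2-standard-deviation trim. It is not a model of any real captcha system.

```python
import random
import statistics

TRUE_VALUE = 10.0  # hypothetical ground truth


def simulate(n=10_000, bad_share=0.30, seed=0):
    """Mix honest, noisy answers with consistent, confidently wrong ones."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n):
        if rng.random() < bad_share:
            # Wrong on purpose, but tightly clustered and therefore "convincing".
            answers.append(rng.gauss(13.0, 0.3))
        else:
            answers.append(rng.gauss(TRUE_VALUE, 1.0))
    return answers


def trim(values, z_max=2.0):
    """Discard values more than z_max standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= z_max * stdev]


raw = simulate()
kept = trim(raw)

raw_error = abs(statistics.fmean(raw) - TRUE_VALUE)    # accuracy (lower is better)
kept_error = abs(statistics.fmean(kept) - TRUE_VALUE)
raw_spread = statistics.pstdev(raw)                    # precision (lower is tighter)
kept_spread = statistics.pstdev(kept)

print(f"raw:     error={raw_error:.2f} spread={raw_spread:.2f}")
print(f"trimmed: error={kept_error:.2f} spread={kept_spread:.2f}")
```

Under these assumptions the trim tightens the spread while the mean stays well away from the true value, because the wrong answers sit inside the cutoff and only honest stragglers get discarded. With a much smaller share of bad answers, the same trim would behave as the parent comment expects.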

      At a certain point, and it doesn’t surprise me in the least that people who think rigidly along the lines of statistics and automation don’t get this at all, when misinformation is rampant in a system it is often the outliers that are the critical voices of truth.

      If you discard outliers because they are outliers, and keep doing it, you will get a more refined system precisely because it has gotten better at bullshitting. Everybody jumps on the bandwagon, and meaning collapses into byzantine conformism.

      I take my schadenfreude where I can get it : )

      • folkrav@lemmy.ca · 2 days ago (edited) · ↑1 ↓1

        We do get what you mean (extremely condescending and reductive take, if you ask me). I was thinking rigidly along the lines of data engineering, as this is, well, a data engineering problem… There just isn’t 30% of people doing this on Google captchas, and this isn’t a “take”, just a reality of the scale and number of people interacting with Google products. Have fun all you want, but if you do this, your data most likely gets thrown out, that’s all.

        We’re still talking about image recognition, aren’t we? This feels like a general commentary on how Big Tech sees their customer base, which I don’t disagree with, but in my mind was just another discussion entirely…

        • supersquirrel@sopuli.xyz (OP) · 2 days ago (edited) · ↑1 ↓1

          “condescending and reductive”

          I consider big tech’s relationship to, and valuing of, human beings condescending and reductive, so *shrugs* don’t come at me; I am the powerless one.

          I didn’t light trust and decency on fire, so excuse me if sometimes I don’t extend the grace they refuse to extend to all of us.

          “There just isn’t 30% of people doing this on Google captchas”

          Damn, find me something else people are forced to do for no gain to themselves that isn’t 30% full of bullshitters. What makes people so honest and hardworking when it comes to captchas?