• VonReposti@feddit.dk
    1 day ago

    It already is. Take a look at devstral, qwen3.6, and deepseek coder. All of them can run on a high-end GPU, and if you’re a developer you likely have one.

    • makeshift0546@lemmy.today
      1 day ago

      The vast majority of users ain’t running anything bigger than 27b, more likely 14b, and that shit just ain’t nearly as good as older SaaS models, much less the dominant ones like Opus. Maybe it works for small shit, but complex tasks just ain’t fitting on home hardware.
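
A rough way to sanity-check those size limits: the weights alone take about (parameters × bits per weight ÷ 8) bytes. The sketch below assumes that rule of thumb plus a made-up `headroom_gb` allowance for KV cache and runtime overhead; the function names and the 2 GB default are illustrative, not taken from any tool.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


def fits(params_billion: float, bits_per_weight: float, vram_gb: float,
         headroom_gb: float = 2.0) -> bool:
    """True if weights plus an assumed headroom fit in the given VRAM.

    headroom_gb is a guess standing in for KV cache and runtime overhead;
    real usage grows with context length.
    """
    return weight_memory_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    # Print the weight footprint for the sizes mentioned in the thread.
    for size in (14, 27):
        for bits in (4, 8, 16):
            print(f"{size}b @ {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

By this estimate a 27b model needs roughly 13.5 GB for weights at 4-bit and 54 GB at 16-bit, which is why the larger sizes only show up on home hardware in quantized form.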

      • VonReposti@feddit.dk
        1 day ago

        Completely agree, I forgot to mention that part. I am testing a few models ranging from 18b to 26b on my 7900xt. They are far from “make this complete system”, but they can handle some smaller tasks. I think that will be the end goal anyway, since cloud models fail a lot at maintainability, security, and the other higher levels of thought that go into coding. They can make a convincing prototype, but I wouldn’t hook it up to production.

        Local models are already functioning well as a force multiplier. They can help explain logic, do minor refactoring, debugging etc., but with a bit of latency. I do think this is where we’re headed, since the frontier models required for generating a full prototype can’t make production-quality code and running them is prohibitively expensive. As far as I’ve heard, the providers are generally spending ten times as much as they earn per token.

        • partofthevoice@lemmy.zip
          21 hours ago

          My guess is the next big thing will be squeezing a lot more reliability out of smaller models. But their workflows, context, validations, etc. will need to be very tightly optimized.

          I can see harnesses coming with their own highly specialized lightweight models in the future. Some for very efficiently converting a basic prompt into chain-of-thought steps. Some for very efficiently determining relevant parts of a repository. Some for… a lot of highly specialized stuff. Then the harness would orchestrate these under the hood, reducing the cognitive load placed on any larger generalized LLMs. Those “larger generalized LLMs” could be something like 12b parameters.

          Hopefully, soon after, we can start benchmarking how much different harnesses and augmentations improve baseline model performance. Ideally, in the long run, that gives us a deeper understanding of how to tailor a harness to a workload and get more procedural determinism. Then we can start configuring harnesses like data pipelines and run them through higher-level orchestration like Airflow too.
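
To make the harness idea a bit more concrete, here is a minimal sketch of stages run in a fixed order with a validation step at the end. Everything in it is hypothetical: `Task`, `plan_steps`, `select_files`, `validate`, and `run_harness` are invented names, and the "specialized models" are plain-Python stand-ins, not real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    prompt: str
    files: dict[str, str]                              # file name -> contents
    steps: list[str] = field(default_factory=list)
    relevant: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)       # stages that have run


Stage = Callable[[Task], Task]


def plan_steps(task: Task) -> Task:
    """Stand-in for a small planner model: split the prompt on ' then '."""
    task.steps = [s.strip() for s in task.prompt.split(" then ") if s.strip()]
    return task


def select_files(task: Task) -> Task:
    """Stand-in for a retrieval model: keep files mentioning a prompt word."""
    words = [w.lower() for w in task.prompt.split() if len(w) > 3]
    task.relevant = sorted(
        name for name, body in task.files.items()
        if any(w in body.lower() for w in words)
    )
    return task


def validate(task: Task) -> Task:
    """Deterministic check, so failures surface before a larger model runs."""
    if not task.steps:
        raise ValueError("planner produced no steps")
    return task


def run_harness(task: Task, stages: list[Stage]) -> Task:
    """Run each stage in order and record which stages ran."""
    for stage in stages:
        task = stage(task)
        task.log.append(stage.__name__)
    return task
```

Because the stages are just an ordered list, swapping one out or timing each one is straightforward, which is roughly what benchmarking a harness against a baseline model would need.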

      • naeap@sopuli.xyz
        1 day ago

        Sadly, that’s true

        Tried to refactor a spaghetti-code state machine and thought, well, AI should handle this well. All the logic is there, it just needs to be separated into small functions to clean up the large one.

        None of them managed it; the context window alone was already too small.

        To be fair though, I tried Mistral online and it also stumbled around. ChatGPT was a complete clusterfuck - haven’t tried Claude.

        To be even fairer… it’s a really large state machine, which was written on site with a fever and under stress. That’s also my defense for how it even came to that ;-)

        But it seems I’ll need to go through this myself.
        I actually thought this would be a perfect use case for AI…
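
A quick sketch of why the context window bites on a job like this: a refactor has to fit both the original code and a rewrite of about the same length. The 4-characters-per-token figure is only a common rule of thumb, not a real tokenizer, and both function names are made up for illustration.

```python
from typing import Optional


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; real tokenizers vary by model and language."""
    return int(len(text) / chars_per_token) + 1


def fits_context(source: str, context_tokens: int,
                 reserve_for_output: Optional[int] = None) -> bool:
    """Check whether input plus an expected rewrite fits in the window.

    By default the output is assumed to be about as long as the input,
    since a refactor re-emits the whole function.
    """
    needed_in = estimate_tokens(source)
    needed_out = needed_in if reserve_for_output is None else reserve_for_output
    return needed_in + needed_out <= context_tokens
```

If this returns False for the whole function, the practical option is to feed the state machine in pieces, for example one state handler at a time.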

        • BeigeAgenda@lemmy.ca
          1 day ago

          Yeah, LLMs can help with many tasks, but then there are times they just spout nonsense, or syntactically correct nonsense. The model size and context window just change when they hit their limit.

          Sometimes you have to call it quits, and try another way.

        • GoatSynagogue@lemmy.world
          23 hours ago

          Most developers use their work-provided machines, which aren’t gaming machines with giant GPUs because, again, GPUs don’t help development at all.

        • NotMyOldRedditName@lemmy.world
          23 hours ago

          Also, developers often want more RAM, and if you’re on the Mac side, the M-series RAM works as video RAM for loading and running models. So there’s a good chance you can already run something better than what’s typical for others, and Apple is focusing on this by adding more NPUs and increasing memory bandwidth. They aren’t good at training, but they can do inference.

          • partofthevoice@lemmy.zip
            21 hours ago

            I’m on a MacBook with an M2 and 32GB RAM. Literally just tried:

            • gemma4:12b - very slow, unworkable
            • qwen3:8b - very slow, unworkable
            • qwen2.5-coder:7b - slow but workable. Doesn’t use tools properly in OpenCode.

            Well, I guess I’ll try again next year.

            For context: my home pc is running gemma4:31b just fine. It’s also a beefy ass desktop, though.

            • fluxx@mander.xyz
              8 hours ago

              Are you running an MLX model? If not, try that. My M4 MacBook runs qwen3.6-35b-a3b lightning fast. It has its issues, but it’s fast nonetheless.

            • NotMyOldRedditName@lemmy.world
              20 hours ago

              You might be doing something wrong. Models that size shouldn’t be that slow on a 32GB M2 if properly configured.

              You need a Metal-optimized client and model, not the same models you’d run on your desktop machine.