Wondering about services to test on either a 16 GB RAM “AI-capable” arm64 board or a laptop with a modern RTX GPU. Only looking for open source options, but curious to hear what people say. Cheers!

  • L_Acacia@lemmy.ml · 22 hours ago

    Well, they are fully closed source, except for the open source project they are a wrapper around. The open source part is llama.cpp.

    • ikidd@lemmy.world · 22 hours ago

      Fair enough, but it’s damn handy and simple to use. And I don’t know how to do speculative decoding with Ollama, which massively speeds up the models for me.

      • L_Acacia@lemmy.ml · 22 hours ago

        Their software is pretty nice. That’s what I’d recommend to someone who doesn’t want to tinker. It’s just a shame they don’t want to open source their software, so we have to reinvent the wheel 10 times. If you are willing to tinker a bit, koboldcpp + Open WebUI/LibreChat is a pretty nice combo.

        • ikidd@lemmy.world · 18 hours ago

          That koboldcpp is pretty interesting. Looks like I can load a draft model for spec decode as well as a pile of other things.
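
For anyone unfamiliar with the draft-model idea, here is a minimal toy sketch of greedy speculative decoding. It is not how koboldcpp or llama.cpp implement it: `target_model` and `draft_model` are made-up stand-ins (plain functions over integer tokens), and the "one pass" counter only approximates the batched verification a real engine would do.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    # Hypothetical stand-in for the large, slow model.
    return (sum(ctx) * 7 + len(ctx)) % 50


def draft_model(ctx: List[int]) -> int:
    # Hypothetical stand-in for the small, fast model: it agrees with the
    # target except when the prefix length is a multiple of 4.
    guess = target_model(ctx)
    return (guess + 1) % 50 if len(ctx) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        # 1. The cheap draft model proposes k tokens in a row.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks the proposal. A real engine scores all k
        #    positions in one batched forward pass, so count this as one pass.
        target_passes += 1
        for tok in proposal:
            expected = target(out)
            out.append(expected)  # the kept token is always the target's choice
            if expected != tok or len(out) >= goal:
                break  # first mismatch (or enough tokens): discard the rest
        else:
            # Every drafted token was accepted, so the same pass also
            # yields one extra token from the target.
            out.append(target(out))
    return out, target_passes


prompt = [1, 2, 3]
spec_out, passes = speculative_decode(target_model, draft_model, prompt, n_new=20)
base_out = greedy_decode(target_model, prompt, n_new=20)
print(spec_out == base_out, passes)
```

The output matches plain greedy decoding from the target, since every kept token is the target's own pick. The speedup comes from needing fewer target passes than tokens generated, and it shrinks as the draft model disagrees more often.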

          What local models have you been using for coding? I’ve been disappointed with things like deepseek-coder and qwen-coder; they’re not even a patch on Claude, but that damn Anthropic cost has been killing me.

          • L_Acacia@lemmy.ml · 5 hours ago

            As much as I’d like to praise the open-weight models, nothing comes close to Claude Sonnet in my experience either. I use local models when the info is sensitive and Claude when the problem requires being somewhat competent.

            What setup do you use for coding? I might have a tip for minimizing your Claude costs, depending on what your setup is.