• 0 Posts
  • 116 Comments
Joined 1 year ago
cake
Cake day: July 30th, 2023

help-circle


  • I mean… DX 9, 10, and 11 were all released prior to Nadella being CEO/chairman.

    But in software, it’s very commonplace for library versions not to be backwards compatible without recompiling the software. This isn’t the same thing as being able to open a word doc last saved on a floppy disk in 1997 on Word 365 2024 version, this is about loading executable code. Even core libraries in Linux (like OpenSSL and ncurses) respect this same schema, and more strongly than MS.

    Using OpenSSL as an example, RHEL 7 provides an interface to OpenSSL 1.0. But 1.1 is not available in the core OS, you’d have to install it separately. 1.1 was introduced to the core in RHEL 8, with a compatibility library on a separate package to support 1.0 packages that hadn’t been recompiled against 1.1 yet. In RHEL 9, the same was true of OpenSSL 3 - a compatibility library for 1.1, and 1.0 support fully dropped from core. So no matter which version you use, you still have to install the right library package. That library package will then also have to work on your version of libc - which is often reasonably wide, but it has it limits just the same.

    Edit because I forgot a sentence in the last paragraph - like DirectX, VC++, and OpenGL, you have to match the version of ncurses, OpenSSL, etc exactly to the major (and often the minor) version or else the executable won’t load up and will generate a linking error. Even if you did mangle the binary code to link it, you’d still end up with data corruption or crashes because the library versions are too different to operate.



  • DirectX, OpenGL, Visual C++ Redist and many other support libraries in software programs typically require the same major version of the support libraries that they were shipped with.

    For DirectX, that major version is 9, 10, 11, 12. Any major library change has to be recompiled into the game by the original developer. (Or a very VERY dedicated modder with solid low level knowledge)

    Same goes for OpenGL, except I think they draw the line at the second number as well - 2.0, 3.0, 4.0, 4.1, 4.2, 4.3, 4.4.

    For VC++, these versions come in years - typically you’ll see 2008, 2010, 2013, and the last version 2015-2022 is special. Programs written in the 2013 version or lower only require the latest version of that year to run. For the 2015-2022 library, they didn’t change the major version spec so any program requiring 2015+ can (usually) just use the latest version installed.

    The one library that does weird things to this rule is DXVK and Intel’s older DX9-on-12. These are translation shim libraries that allow the application to speak DX9 etc and translate it on the fly to the commands of a much more modern library - Vulkan in the case of DXVK or DX12 in Intel’s case.

    Edited to remove a reference to 9-on-12 that I think I had backwards.










  • Getting production servers back online with a low level fix is pretty straightforward if you have your backup system taking regular snapshots of pet VMs. Just roll back a few hours. Properly managed cattle, just redeploy the OS and reconnect to data. Physical servers of either type you can either restore a backup (potentially with the IPMI integration so it happens automatically), but you might end up taking hours to restore all data, limited by the bandwidth of your giant spinning rust NAS that is cost cut to only sustain a few parallel recoveries. Or you could spend a few hours with your server techs IPMI booting into safe mode, or write a script that sends reboot commands to the IPMI until the host OS pings back.

    All that stuff can be added to your DR plan, and many companies now are probably planning for such an event. It’s like how the US CDC posted a plan about preparing for the zombie apocalypse to help people think about it, this was a fire drill for a widespread ransomware attack. And we as a world weren’t ready. There’s options, but they often require humans to be helping it along when it’s so widespread.

    The stinger of this event is how many workstations were affected in parallel. First, there do not exist good tools to be able to cover a remote access solution at the firmware level capable of executing power controls over the internet. You have options in an office building for workstations onsite, there are a handful of systems that can do this over existing networks, but more are highly hardware vendor dependent.

    But do you really want to leave PXE enabled on a workstation that will be brought home and rebooted outside of your physical/electronic perimeter? The last few years have showed us that WFH isn’t going away, and those endpoints that exist to roam the world need to be configured in a way that does not leave them easily vulnerable to a low level OS replacement the other 99.99% of the time you aren’t getting crypto’d or receive a bad kernel update.

    Even if you place trust in your users and don’t use a firmware password, do you want an untrained user to be walked blindly over the phone to open the firmware settings, plug into their router’s Ethernet port, and add https://winfix.companyname.com as a custom network boot option without accidentally deleting the windows bootloader? Plus, any system that does that type of check automatically at startup makes itself potentially vulnerable to a network-based attack by a threat actor on a low security network (such as the network of an untrusted employee or a device that falls into the wrong hands). I’m not saying such a system is impossible - but it’s a super huge target for a threat actor to go after and it needs to be ironclad.

    Given all of that, a lot of companies may instead opt that their workstations are cattle, and would simply be re-imaged if they were crypto’d. If all of your data is on the SMB server/OneDrive/Google/Nextcloud/Dropbox/SaaS whatever, and your users are following the rules, you can fix the problem by swapping a user’s laptop - just like the data problem from paragraph one. You just have a team scale issue that your IT team doesn’t have enough members to handle every user having issues at once.

    The reality is there are still going to be applications and use cases that may be critical that don’t support that methodology (as we collectively as IT slowly try to deprecate their use), and that is going to throw a Windows-sized monkey wrench into your DR plan. Do you force your uses to use a VDI solution? Those are pretty dang powerful, but as a Parsec user that has operated their computer from several hundred miles away, you can feel when a responsive application isn’t responding quite fast enough. That VDI system could be recovered via paragraph 1 and just use Chromebooks (or equivalent) that can self-reimage if needed as the thin clients. But would you rather have annoyed users with a slightly less performant system 99.99% of the time or plan for a widespread issue affecting all system the other 0.01%? You’re probably already spending your energy upgrading from legacy apps to make your workstations more like cattle.

    All in trying to get at here with this long winded counterpoint - this isn’t an easy problem to solve. I’d love to see the day that IT shops are valued enough to get the budget they need informed by the local experts, and I won’t deny that “C-suite went to x and came back with a bad idea” exists. In the meantime, I think we’re all going to instead be working on ensuring our update policies have better controls on them.

    As a closing thought - if you audited a vendor that has a product that could get a system back online into low level recovery after this, would you make a budget request for that product? Or does that create the next CrowdStruckOut event? Do you dual-OS your laptops? How far do you go down the rabbit hole of preparing for the low probability? This is what you have to think about - you have to solve enough problems to get your job done, and not everyone is in an industry regulated to have every problem required to be solved. So you solve what you can by order of probability.