Solution 1 (https://lemmy.ml/post/25346014/16383487)

#! /bin/bash

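# $files must hold the path of the markdown file to edit; it is not set in this script.
# Collect every "](...)" link target that does not start with https.
mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
# Keep only the part from the first # onward.
mdlinks2="$(grep -Po '#.*' <<<"$mdlinks")"

while IFS= read -r line; do
	# Converts 1.2 to 1-2 (a third-level heading such as 1.2.3 needs an additional [0-9] group)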
	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
	sed -i "s/$line/${dashlink}/" "$files"

	#Puts everything to lowercase after a hashtag
	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
	sed -i "s/$dashlink/${lowercaselink}/" "$files"

	#Removes spaces (%20) from markdown links after a hashtag
	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
	sed -i "s/$lowercaselink/${spacelink}/" "$files"

done <<<"$mdlinks2"

Solution 2 (https://lemmy.ml/post/25346014/16453351)

sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'

Solution 3 (https://lemmy.ml/post/25346014/16453161)

perl -pe 's/\[[^]]+\]\((?!https?)[^#]*#\K[^)]+(?=\))/lc $&=~s:%20|\d\K\.(?=\d):-:gr/ge'

Hi everyone !

I’m in need of some help with string manipulation using sed and regex. I spent a whole day on trial & error and searching the web for a solution, but it’s way beyond my abilities. Maybe there are some sed/regex gurus here who are willing to give me a helping hand !

From everything I gathered around the web, it seems to be a rather complicated regex and sed substitution. Here we go !

What am I trying to achieve?

I have a lot of markdown guides I want to host on a self-hosted, forgejo-based git markdown. However, the classic markdown links are not the same as the ones on github/forgejo…

Convert the following string:

[Some text](#Header%20Linking%20MARKDOWN.md)

to:

[Some text](#header-linking-markdown.md)

As you can see, the requirements are the following:

  • Pattern: [Some text](#link%20to%20header.md)
  • Only edit what’s between parentheses
  • Replace space (%20) with -
  • Everything as lowercase
  • Links are sometimes in nested parentheses
    • e.g. (look here [Some text](#link%20to%20header.md))
  • Do not change a link that begins with https (external links)
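
A minimal sketch of how these requirements fit together, assuming a grep that supports -o (GNU and BSD grep do) plus sed and tr. The function names extract_targets and normalize_target are made up for this illustration and appear nowhere in the thread. The sketch only prints the rewritten link targets; it does not edit any file.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not from the thread.

# Print every link target found on stdin, including the leading "](" and
# the trailing ")". Anchoring on "](" skips the outer parentheses of a
# nested link such as "(look here [Some text](#x))".
extract_targets() {
    grep -o ']([^)]*)'
}

# Apply the rules to one target: leave https links alone, otherwise
# replace %20 with - and lowercase everything.
normalize_target() {
    case "$1" in
        ']('https*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "$1" | sed 's/%20/-/g' | tr '[:upper:]' '[:lower:]' ;;
    esac
}

# Usage: prints ](#link-to-header.md)
printf '%s\n' '(look here [Some text](#Link%20To%20Header.md))' |
    extract_targets |
    while IFS= read -r target; do
        normalize_target "$target"
    done
```

Matching on "](" rather than on a bare "(" is what sidesteps the nested parentheses: the outer pair never follows a closing bracket.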

While everything is probably a bit complex as a whole the trickiest part is probably the nested parentheses :/

What I tried

The furthest I got was the following:

sed -Ei 's|\(([^\)]+)\)|\L&|g' test3.md #make everything between parentheses lowercase

sed -i '/https/ ! s/%20/-/g' test3.md #change every %20 occurrence to -

These sed/regex substitutions are what I put together while roaming the web, but they have a lot of flaws and don’t work with nested parentheses. Also, the second one would change every %20 occurrence in the file, not only those inside links.
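
To make the flaw concrete, here is a small sketch that runs the second command on stdin instead of on test3.md. The sample text is invented for illustration. It shows that a %20 in ordinary prose is rewritten too, and that a line containing https anywhere is skipped entirely, including any internal link on that line.

```bash
#!/usr/bin/env bash
# Sample input, invented for this illustration.
sample='A 100%20discount, see [Some text](#Link%20To%20Header.md)
[Mixed](#Local%20Anchor) and [external](https://example.com/a%20b)'

# Same sed expression as above, reading stdin instead of editing a file.
printf '%s\n' "$sample" | sed '/https/ ! s/%20/-/g'
# Line 1 becomes: A 100-discount, see [Some text](#Link-To-Header.md)
# Line 2 is printed unchanged because it contains "https".
```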

The closest solution I found on stackoverflow looks similar, but I wasn’t able to make it fit my needs. My lack of regex/sed understanding makes it impossible for me to adapt it to my requirements.

I would appreciate any help, even if a change of tool is needed. However, I’m more into the learning process, so a script or CLI alternative would be very much appreciated :) Actually, any help is appreciated :D !

Thanks in advance.

  • N0x0n@lemmy.mlOP
    1 month ago

    Second, YOU MISSED A DAMNED parenthesis you fool xD ! mdlinks="$(grep -Po ']\((?!https).*\)' ~/mkdn)" It took me some time to figure it out with a very uninformative error, bashscript.sh: line 8: unexpected EOF while looking for matching "' but as expected it works !

    [Just a test](#Just%20a%20test.md)
    [Just a link](https://mylink/%20with%20space.com)
    [Just a test](#Just-a-test.md)
    [Just a link](https://mylink/%20with%20space.com)

    Next, to show you my appreciation, and so as not to take everything for granted and be spoon-fed, I tried to find a solution myself for something else. I will try to explain as best I can how I solved it.

    [Just a test](Another%20markdown%20file.md#Hello%20World)
    [Just a test](Another%20markdown%20file.md#hello-world)

    The part before the hashtag needs to keep its initial form (it links to the original markdown file). So, instead of just playing around with Perl and regex (which doesn’t end well when done blindly without the proper knowledge), I did some simple string manipulation. It’s not very elegant but it does the trick, thanks to your well-written breakdown.

    • I printed out the $mdlinks variable just to see what it contains
    • Copied and changed your Perl/regex to find the first hashtag (#) and save the result into a new variable ($mdlinks2)
    • Fed your $mdlinks variable into my new Perl/regex
    • Fed my new variable into done? (I’m a bit confused here but okay xD)
    #! /bin/bash
    mdlinks="$(grep -Po ']\((?!https).*\)' "/home/dany/newtest.md")"
    echo $mdlinks
    mdlinks2="$(grep -Po '#.*' <<<$mdlinks)"
    echo $mdlinks2
    while IFS= read -r line; do
    	dashlink="$(echo "$line" | sed 's|%20|-|g')"
    	sed -i "s/$line/${dashlink}/" "/home/dany/newtest.md"
    done <<<"$mdlinks2"

    Yes, not very elegant but It’s the best I could do currently :/ However, I still got a YES effect :P
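
    The same split at the first # can also be sketched with plain shell parameter expansion, so that the file name before the # is never touched. This is only an illustration under that assumption; fix_anchor is a made-up name and not part of the script above.

    ```bash
    #!/usr/bin/env bash
    # Hypothetical helper: dash and lowercase only the part after the first "#".
    fix_anchor() {
        case "$1" in
            *'#'*) ;;                          # has an anchor, continue below
            *) printf '%s\n' "$1"; return ;;   # no anchor, nothing to change
        esac
        file_part="${1%%#*}"   # everything before the first "#"
        anchor="${1#*#}"       # everything after the first "#"
        anchor="$(printf '%s' "$anchor" | sed 's/%20/-/g' | tr '[:upper:]' '[:lower:]')"
        printf '%s#%s\n' "$file_part" "$anchor"
    }

    fix_anchor 'Another%20markdown%20file.md#Hello%20World'
    # prints Another%20markdown%20file.md#hello-world
    ```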

    To answer your question:

    Quick question as I’m working on this, in the new link example, is the BDMV and other capitalized text in this link supposed to be converted to lowercase, or to remain uppercase?

    As you can see in my string manipulation above, the part before the # needs to keep its original form :) (Sorry, I wasn’t aware of this before working with the original files.) I solved it with some string manipulation as shown above.

    I’m a bit tired from all this searching and trial & error. Tomorrow I will try to wrap everything up and answer your post below :) ! Also, I need to clean up the mess I made in my home directory xD.

    Thanks again for your help ! Have a good night/day !

    • harsh3466@lemmy.ml
      1 month ago

      Oh god! I’m sorry about the missing )! I must have dropped it when copying things from my notes over to post the comment! (≧▽≦)

      Despite my error, I’m glad it worked, and even happier that you were able to take what we had worked out and modify it further to fit your other requirements. It’s fun helping each other out, and it’s also great learning.

      I learn by problem solving, so I’ve got all my notes from working on this in my knowledge base as well!

      In the future, feel free to ping me if you need help with other linux/cli/bash things. As I’ve mentioned before I’m no expert, but happy to help where I can.

      • N0x0n@lemmy.mlOP
        1 month ago

        Hello :) I promise this is the last time I will bother you (I know what you are going to say :P) ! If it’s not too much, could you give me just a few hints on how I could improve the final script a bit?

        #! /bin/bash
        mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
        mdlinks2="$(grep -Po '#.*' <<<$mdlinks)"
        while IFS= read -r line; do
        	#Converts 1.2 to 1-2 (For a third level heading needs to add a supplementary [0-9]) 
        	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
        	sed -i "s/$line/${dashlink}/" "$files"
        	#Puts everything to lowercase after a hashtag
        	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
        	sed -i "s/$dashlink/${lowercaselink}/" "$files"
        	#Removes spaces (%20) from markdown links after a hashtag
        	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
        	sed -i "s/$lowercaselink/${spacelink}/" "$files"
        done <<<"$mdlinks2"

        This works perfectly and fulfills all my needs (thanks !!) ! However, I’m not very fond of the variable string manipulation ($mdlinks2). If you have some tips that don’t spoil too much, that would be great; otherwise it’s okay, it works exactly how I imagined it and covers all use cases. Also, if you could give some pointers for an overall improvement, or if you see something that could potentially create some strange loop or looks off, feel free to comment in your spare time :).
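
        One possible direction, sketched here under the assumption that every dot between two digits should become a dash (as Solution 3 does): the three rewrites can be chained in a single pipeline per fragment, so the file would only need one sed -i call per link instead of three. rewrite_fragment is a hypothetical name, not part of the script above.

        ```bash
        #!/usr/bin/env bash
        # Hypothetical helper: apply all three rewrites to one "#...)" fragment.
        rewrite_fragment() {
            printf '%s\n' "$1" |
                # Loop (:a ... ta) until no digit.digit dot is left, then dash the %20s.
                sed -e ':a' -e 's/\([0-9]\)\.\([0-9]\)/\1-\2/' -e 'ta' -e 's/%20/-/g' |
                # Lowercase the whole fragment.
                tr '[:upper:]' '[:lower:]'
        }

        rewrite_fragment '#1.2.3%20Install%20Guide)'
        # prints #1-2-3-install-guide)
        ```

        The :a / ta loop handles headings of any depth (1.2, 1.2.3, ...) without adding a [0-9] group per level.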

        Another question, which has nothing to do with the post and is a bit off topic… You gave me the push I needed, and I saw the power and usefulness of proper knowledge of sed/bash/Perl. It’s time I finally learned a scripting language ! I’d like to hear your opinion: what tools would you recommend? Most people would say Python for beginners, but I have heard so many good things about Perl (Exiftool is a good example of how powerful Perl can be), although the syntax scares me a little compared to Python.

        Any good book material you have in mind for a beginner?

        Thanks again for everything !!!

        • harsh3466@lemmy.ml
          30 days ago

          Hello! I will take a look at it, I just haven’t had a chance over the last day. Give me a couple days and I will give some feedback. Bear in mind I am not an expert, so I might not have much to offer, but I’ll share what I can. :)

          • N0x0n@lemmy.mlOP
            29 days ago

            Hey, take your time :) Don’t worry even if you forget, you did more than enough to help some random person on the web ! Two other users came up with plain, bare-bones regex solutions, if you want to have a look. Maybe there’s something you can learn from them? (I doubt it xD).

            Plain sed regex (https://lemmy.ml/post/25346014/16453351)

            sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'

            Plain Perl regex (https://lemmy.ml/post/25346014/16453161)

            perl -pe 's/\[[^]]+\]\((?!https?)[^#]*#\K[^)]+(?=\))/lc $&=~s:%20|\d\K\.(?=\d):-:gr/ge'

            Nonetheless, I really prefer your solution because, as someone else said, I will have an easier time changing a script I “understand”. So thanks again !