Hi everyone!
I need some help with string manipulation using sed and regex. I spent a whole day on trial and error and searching the web for a solution, but it’s way beyond my abilities. Maybe there are some sed/regex gurus here who are willing to lend me a hand!
From everything I’ve gathered around the web, this seems to be a rather complicated regex and sed substitution. Here we go!
What am I trying to achieve?
I have a lot of markdown guides I want to host on a self-hosted, Forgejo-based git instance. However, the classic markdown links are not the same as the ones on GitHub/Forgejo…
Convert the following string:
[Some text](#Header%20Linking%20MARKDOWN.md)
Into
[Some text](#header-linking-markdown.md)
As you can see, these are the requirements:
- Pattern: [ ]( ) - only edit what’s between the parentheses
- Replace space (%20) with -
- Everything as lowercase
- Links are sometimes in nested parentheses
  - e.g. (look here [ ]( ))
- Do not change a link target that begins with https (external links)
While the whole thing is probably a bit complex, the trickiest part is likely the nested parentheses :/
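One way to sidestep the nested parentheses, sketched here under a few assumptions: internal link targets always start with `#`, they contain no `)` themselves, and GNU sed is available (`\L` is a GNU extension). Anchoring the match on `](#` means the outer parentheses never come into play, and external `](https` targets are skipped. `fix_links` is a made-up name for illustration.

```shell
#!/bin/sh
# Hypothetical helper: rewrites only the "(#...)" target of markdown links.
# Assumes GNU sed (\L is a GNU extension) and targets that start with "#".
# The :a/ta loop replaces one %20 per round, but only inside a "](#...)" target;
# the last expression then lowercases each target as a whole.
fix_links() {
  sed -E \
    -e ':a' \
    -e 's/(\]\(#[^)]*)%20/\1-/' \
    -e 'ta' \
    -e 's/\]\(#[^)]*\)/\L&/g' \
    "$@"
}

printf '%s\n' '(look here [Some text](#Header%20Linking%20MARKDOWN.md))' | fix_links
# prints: (look here [Some text](#header-linking-markdown.md))
```

Because `[^)]*` stops at the first closing parenthesis, the extra `)` of the surrounding pair is simply left alone.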
What I tried
The furthest I got was the following:
sed -Ei 's|\(([^\)]+)\)|\L&|g' test3.md #make everything between parentheses lowercase
sed -i '/https/ ! s/%20/-/g' test3.md #change every %20 occurrence to -
These sed/regex substitutions are what I put together while roaming the web, but they have a lot of flaws and don’t work with nested parentheses. Also, this would change every %20 occurrence in the file.
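To see where those two commands go wrong, they can be run as a pipeline on sample lines instead of editing the file in place. This is only a demonstration, assuming GNU sed; `naive_fix` and the sample text are made up.

```shell
#!/bin/sh
# Hypothetical wrapper around the two substitutions above, reading stdin
# instead of editing test3.md in place. Assumes GNU sed for \L.
naive_fix() {
  sed -E 's|\(([^\)]+)\)|\L&|g' | sed '/https/ ! s/%20/-/g'
}

# Unrelated parentheses are lowercased, and %20 in the link text is replaced too.
printf '%s\n' 'See (Important NOTE) and [My%20Text](#My%20Header)' | naive_fix
# prints: See (important note) and [My-Text](#my-header)

# With an outer pair of parentheses, the match starts at the outer "(",
# so the link text gets lowercased as well.
printf '%s\n' '(look here [Some Text](#Some%20Header))' | naive_fix
# prints: (look here [some text](#some-header))
```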
The closest solution I found on Stack Overflow looks similar, but I wasn’t able to fit it to my needs. My lack of regex/sed understanding makes it impossible for me to adapt it to my requirements.
I would appreciate any help, even if a change of tool is needed. However, I’m more interested in the learning process, so a script or CLI alternative would be very welcome :) Actually, any help is appreciated :D
Thanks in advance.
Oh god! I’m sorry about the missing )! I must have dropped it when copying things from my notes over to post the comment! (≧▽≦) Despite my error, I’m glad it worked, and even happier that you were able to take what we had worked out and modify it further to fit your other requirements. It’s fun helping each other out, and it’s also great learning.
I learn by problem solving, so I’ve got all my notes from working on this in my knowledge base as well!
In the future, feel free to ping me if you need help with other linux/cli/bash things. As I’ve mentioned before I’m no expert, but happy to help where I can.
Hello :) I promise this is the last time I will bother you (I know what you are going to say :P)! If it’s not too much, could you give me just a few hints on how I could improve the final script a bit?
#! /bin/bash

files="/home/USER/projects/test.md"

mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
mdlinks2="$(grep -Po '#.*' <<<$mdlinks)"

while IFS= read -r line; do
    #Converts 1.2 to 1-2 (For a third level heading needs to add a supplementary [0-9])
    dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
    sed -i "s/$line/${dashlink}/" "$files"

    #Puts everything to lowercase after a hashtag
    lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
    sed -i "s/$dashlink/${lowercaselink}/" "$files"

    #Removes spaces (%20) from markdown links after a hashtag
    spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
    sed -i "s/$lowercaselink/${spacelink}/" "$files"
done <<<"$mdlinks2"
This works perfectly and fulfills all my needs (thanks!!). However, I’m not very fond of the variable string manipulation ($mdlinks2). If you have some tips without spoiling too much, that would be great; otherwise it’s okay, since it works exactly how I imagined it and ticks all use cases. Also, if you could give some pointers for an overall improvement, or if you see something that could potentially create a strange loop or looks off, feel free to comment in your spare time :).
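A few hints, as a rough sketch rather than a finished script: `sed -i "s/$line/${dashlink}/"` treats `$line` as a regex, so characters like `.` or `/` inside a link can misfire, and every link costs three rewrites of the file. One alternative is to let a single sed pass do all three steps directly on the `](#` targets, which also makes `$mdlinks` and `$mdlinks2` unnecessary. This assumes GNU sed and targets that start with `#` and contain no `)`; `convert_links` is a made-up name.

```shell
#!/bin/sh
# Hypothetical single-pass variant. Assumes GNU sed (\L) and "#..." targets.
#   1. loop: a dot between two digits becomes a dash (1.2.3 -> 1-2-3)
#   2. loop: each %20 becomes a dash
#   3. the whole "](#...)" target is lowercased
convert_links() {
  sed -E \
    -e ':dots'   -e 's/(\]\(#[^)]*[0-9])\.([0-9])/\1-\2/' -e 'tdots' \
    -e ':spaces' -e 's/(\]\(#[^)]*)%20/\1-/'              -e 'tspaces' \
    -e 's/\]\(#[^)]*\)/\L&/g' \
    "$@"
}

printf '%s\n' '[Setup](#1.2%20Install%20Steps) and [Docs](https://example.com/A%20B)' | convert_links
# prints: [Setup](#1-2-install-steps) and [Docs](https://example.com/A%20B)
```

Called as `convert_links -i test.md`, the arguments are handed straight to sed, so the file is edited in place; without `-i` the result goes to stdout, which makes it easy to diff against the original first.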
Another question, which has nothing to do with the post and gets a bit off topic… You gave me the push I needed, and I saw the power and usefulness of proper knowledge of sed/bash/Perl. It’s time I finally learn a scripting language! What tools would you recommend? Most people would say Python for beginners, but I have heard so many good things about Perl (ExifTool is a good example of how powerful Perl can be), although the syntax scares me off a little compared to Python.
Do you have any good books in mind for a beginner?
Thanks again for everything !!!