Let’s work hard

We are going to extract coordinates from a string like this:

COORDINATES(((1 2,3 4)),((5 6,7 8,9 10)))

The desired output will be:

((1 2,3 4))
((5 6,7 8,9 10))

So, the expression for finding these values will be:

$ grep -Po "\(\([0-9]{1,}\s{1}[0-9]{1,}(?:,[0-9]{1,}\s{1}[0-9]{1,}){1,}\)\)" file.txt
((1 2,3 4))
((5 6,7 8,9 10))

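OMG, what is this? Let’s breathe and understand what we did. The -P option turns on Perl-compatible regular expressions and -o prints only the matched text, one match per line. The expression itself breaks down like this: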

\(\( : two opening parentheses; since ( is a metacharacter, we need to escape each one with \
[0-9]{1,} : 1 or more digits from 0 to 9
\s{1} : exactly one whitespace character (here, a blank space)
(?:,REGEX){1,} : 1 or more repetitions of a comma followed by REGEX, where REGEX is the same number-space-number expression explained above (quite simple!); (?: ) groups without capturing
\)\) : two closing parentheses; since ) is a metacharacter, we need to escape each one with \
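
If the one-liner is still hard to read, here is a small sketch that builds the same expression from named pieces. The variable and function names (num, pair, pattern, extract_polygons) are made up for this example, and it assumes bash plus GNU grep, since -P is not available in every grep.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; requires GNU grep for the -P (PCRE) option.

num='[0-9]{1,}'                              # 1 or more digits
pair="${num}\s{1}${num}"                     # two numbers separated by one whitespace character
pattern="\(\(${pair}(?:,${pair}){1,}\)\)"    # (( pair,pair[,pair...] ))

# Read text on stdin and print every matching block on its own line.
extract_polygons() {
  grep -Po "$pattern"
}

# Same sample string as above; prints the two ((...)) blocks.
printf '%s\n' 'COORDINATES(((1 2,3 4)),((5 6,7 8,9 10)))' | extract_polygons
```

Naming the pieces means the "x y" pair is written once and reused, so a later change to what counts as a number only touches the num line.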

Now let’s make it a little more difficult: say the numbers expressing the coordinates can be signed decimals such as -4.875 or 9.669. Let’s use this string:

COORDINATES(((-102.084546466 26.45688853338312,-102.1490725018084 26.13520439439342,-102.15047246071138 26.1264975143505,-102.14981952668641 26.1384657784035,-102.12671666102124 26.149049533554354,-102.1188178 26.152923456807,-102.10091893897877 26.149049533554354,-102.0878160633126 26.1384657784035,-102.08653386170432 26.13460053338312)),((-102.15271126918533 26.129748541924556,-102.14764154329896 26.12984171790165,-102.14755705371499 26.13408877990621,-102.14815585687757 26.13514319847587,-102.13090187683702 26.13494986238747,-102.10652676348738 26.13471792646315,-102.0865197014023 26.13455784661961,-102.08302007795753 26.1240081,-102.0878160633126 26.109550421596495,-102.10091893897877 26.09896666644564,-102.1188178 26.09509274319299,-102.12671666102124 26.09896666644564,-102.14981952668641 26.109550421596495,-102.15461552204248 26.1240081,-102.15271126918533 26.129748541924556))) 

The desired output will be:

((-102.084546466 26.45688853338312,-102.1490725018084 26.13520439439342,-102.15047246071138 26.1264975143505,-102.14981952668641 26.1384657784035,-102.12671666102124 26.149049533554354,-102.1188178 26.152923456807,-102.10091893897877 26.149049533554354,-102.0878160633126 26.1384657784035,-102.08653386170432 26.13460053338312))
((-102.15271126918533 26.129748541924556,-102.14764154329896 26.12984171790165,-102.14755705371499 26.13408877990621,-102.14815585687757 26.13514319847587,-102.13090187683702 26.13494986238747,-102.10652676348738 26.13471792646315,-102.0865197014023 26.13455784661961,-102.08302007795753 26.1240081,-102.0878160633126 26.109550421596495,-102.10091893897877 26.09896666644564,-102.1188178 26.09509274319299,-102.12671666102124 26.09896666644564,-102.14981952668641 26.109550421596495,-102.15461552204248 26.1240081,-102.15271126918533 26.129748541924556))

So, the expression for finding these values will be:

$ grep -Po '\(\([-+]?\d+(\.\d+)?\s{1}[-+]?\d+(\.\d+)?(?:,[-+]?\d+(\.\d+)?\s{1}[-+]?\d+(\.\d+)?){1,}\)\)' file.txt
((-102.084546466 26.45688853338312,-102.1490725018084 26.13520439439342,-102.15047246071138 26.1264975143505,-102.14981952668641 26.1384657784035,-102.12671666102124 26.149049533554354,-102.1188178 26.152923456807,-102.10091893897877 26.149049533554354,-102.0878160633126 26.1384657784035,-102.08653386170432 26.13460053338312))
((-102.15271126918533 26.129748541924556,-102.14764154329896 26.12984171790165,-102.14755705371499 26.13408877990621,-102.14815585687757 26.13514319847587,-102.13090187683702 26.13494986238747,-102.10652676348738 26.13471792646315,-102.0865197014023 26.13455784661961,-102.08302007795753 26.1240081,-102.0878160633126 26.109550421596495,-102.10091893897877 26.09896666644564,-102.1188178 26.09509274319299,-102.12671666102124 26.09896666644564,-102.14981952668641 26.109550421596495,-102.15461552204248 26.1240081,-102.15271126918533 26.129748541924556))

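OMG, what is this? Don’t cry please: we only changed the previous expression, replacing each [0-9]{1,} with [-+]?\d+(\.\d+)?, which means an optional sign, one or more digits, and an optional decimal part (a literal dot followed by more digits). That’s it.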
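
To see the replacement more clearly, here is a sketch of the signed-decimal version built from named pieces. The names (num, pair, pattern, extract_decimal_polygons) and the short sample string are invented for illustration, and bash with GNU grep is assumed. The pieces sit in single quotes so the shell leaves every backslash alone, and the inner groups use (?: ) because nothing needs to be captured.

```shell
#!/usr/bin/env bash
# Hypothetical helper names and sample data; requires GNU grep for -P (PCRE).

num='[-+]?\d+(?:\.\d+)?'                  # optional sign, digits, optional decimal part
pair="${num}\s${num}"                     # "x y": two numbers separated by one whitespace character
pattern="\(\(${pair}(?:,${pair})+\)\)"    # (( pair,pair[,pair...] ))

# Read text on stdin and print every matching block on its own line.
extract_decimal_polygons() {
  grep -Po "$pattern"
}

# Made-up short input mixing signed, unsigned, integer and decimal values.
printf '%s\n' 'COORDINATES(((-102.08 26.45,-102.14 26.13)),((+1.5 -2,3 4.25)))' \
  | extract_decimal_polygons
```

Only the num line differs from the integer-only case; the rest of the expression keeps the same shape.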
