Regex to match an HTML tag with content

In our previous post, we discussed how to match only HTML tags and not the content between them. Here we discuss a slightly different question: how to select an HTML tag together with the content between its opening and closing tag. We have tested the regex given below for this situation.

For example, to match a p tag together with its content, use the regex below. To match a different tag, replace p with that tag name in both the opening and the closing tag of the regex.

<p>("[^"]*"|'[^']*'|[^'">])*<\/p>

This regex also matches content that spans several lines, because the character classes in it match line breaks (the Enter key) as well.
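As a rough illustration, here is how the pattern could be tried out in Python's re module. This is only a sketch under a few assumptions: the quotes in the pattern are plain ASCII quotes, and tag_with_content_pattern is a hypothetical helper name made up for this example to show the "change p to another tag" idea.

```python
import re

def tag_with_content_pattern(tag):
    # Hypothetical helper (not part of any library): builds the regex
    # from this post for the given tag name.
    body = r"""("[^"]*"|'[^']*'|[^'">])*"""
    return re.compile("<" + re.escape(tag) + ">" + body + r"<\/" + re.escape(tag) + ">")

# Sample input made up for this example; the first p tag spans two lines.
html = "<h1>Title</h1>\n<p>first line\nsecond line</p>\n<p>another</p>"

p_pattern = tag_with_content_pattern("p")
matches = [m.group(0) for m in p_pattern.finditer(html)]
print(matches)

# Changing the tag only means passing a different name.
print(tag_with_content_pattern("h1").search(html).group(0))
```

Note that the content itself may not contain a > character or an unpaired quote, so a paragraph with a nested tag such as <p>a <b>b</b></p> is not matched by this pattern.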

Explanation of the Regex

<p> matches the characters <p> literally (case sensitive).

1st Capturing Group ("[^"]*"|'[^']*'|[^'">])*
The * quantifier matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy). A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations, or use a non-capturing group instead if you're not interested in the data.

1st Alternative "[^"]*"
" matches the character " with index 34 (22 in hexadecimal or 42 in octal) literally (case sensitive).
[^"]* matches a single character not present in the list, that is, any character except ", between zero and unlimited times.
" matches the character " literally again.

2nd Alternative '[^']*'
' matches the character ' with index 39 (27 in hexadecimal or 47 in octal) literally (case sensitive).
[^']* matches a single character not present in the list, that is, any character except ', between zero and unlimited times.
' matches the character ' literally again.

3rd Alternative [^'">]
Matches a single character not present in the list, that is, any character except ', " and >.

<\/p> matches the characters </p> literally (case sensitive).
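The note about a repeated capturing group only capturing the last iteration is easy to miss, so here is a small Python sketch of it. It assumes ASCII quotes in the pattern; the variable names and the sample text are made up for this example.

```python
import re

# The pattern as given: the capturing group itself is repeated.
capturing = re.compile(r"""<p>("[^"]*"|'[^']*'|[^'">])*<\/p>""")

# Variant: the repeated group is non-capturing, and one capturing
# group is put around the whole repetition.
whole_content = re.compile(r"""<p>((?:"[^"]*"|'[^']*'|[^'">])*)<\/p>""")

text = "<p>Hi!</p>"

print(capturing.search(text).group(0))      # the full match
print(capturing.search(text).group(1))      # only the last iteration of the group
print(whole_content.search(text).group(1))  # everything between <p> and </p>
```

If you only need the full match, group(0) is enough and the inner group can simply be made non-capturing.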


Regex to match all HTML tags with content

This problem might sound strange, but many people ask about it. If you have already found the solution to your question, you can stop here, or read on in case you need a solution for this topic in the future.

The solution to this problem is similar to the one above; the previous regex needs only a small change to work here. Here it is:

<("[^"]*"|'[^']*'|[^'">])*>

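Compared with the previous regex, the literal <p> and <\/p> are replaced by a single opening angle bracket < at the start and a closing angle bracket > at the end. The regex therefore selects everything between the two angle brackets, that is, any complete HTML tag together with the attributes inside it. A > inside a quoted attribute value does not end the match, because quoted strings are matched as a whole. It does not select the text between an opening tag and its closing tag.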
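To see what this pattern actually picks up, here is a short Python sketch. It assumes ASCII quotes in the pattern and > as the last excluded character; the sample HTML and the variable names are made up for this example.

```python
import re

# The pattern from above, written as a raw triple-quoted string so that
# both kinds of quote can appear in it unescaped.
any_tag = re.compile(r"""<("[^"]*"|'[^']*'|[^'">])*>""")

# The title attribute contains a > inside quotes on purpose.
html = '<a href="x.html" title="a > b">link</a><br>'

tags = [m.group(0) for m in any_tag.finditer(html)]
print(tags)
```

Each match runs from a < to the next > that is outside quotes, so the word link between the opening and closing a tag is not part of any match.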


That's all for this topic. If you have any problem or question about it, ask in the comment section. You can also post your new regex-related problems there.
