Regex - extract all headers from markdown string
I am using gray-matter in order to parse .MD files from the file system into a string. The result the parser produces is a string like this:
\n# Clean-er ReactJS Code - Conditional Rendering\n\n## TL;DR\n\nMove render conditions into appropriately named variables. Abstract the condition logic into a function. This makes the render function code a lot easier to understand, refactor, reuse, test, and think about.\n\n## Introduction\n\nConditional rendering is when a logical operator determines what will be rendered. The following code is from the examples in the official ReactJS documentation. It is one of the simplest examples of conditional rendering that I can think of.\n\n
I am now trying to write a regular expression that would extract all the heading text from the string. Headers in markdown start with a # (there can be from 1-6), and in my case always end with a new line.
I've tried using the following regular expression but calling it on my test string returns nothing:
const testString = "\n# Clean-er ReactJS Code - Conditional Rendering\n\n## TL;DR\n\nMove render conditions into appropriately named variables. Abstract the condition logic into a function. This makes the render function code a lot easier to understand, refactor, reuse, test, and think about.\n\n## Introduction\n\nConditional rendering is when a logical operator determines what will be rendered. The following code is from the examples in the official ReactJS documentation. It is one of the simplest examples of conditional rendering that I can think of.\n\n"
const HEADING_R = /(?<!#)#{1,6} (.*?)(\\r(?:\\n)?|\\n)/gm;
const headings = HEADING_R.exec(testString);
console.log('headings: ', headings);
This logs headings as null (no matches found). The result that I am looking for would be: ["# Clean-er ReactJS Code - Conditional Rendering", "## TL;DR", "## Introduction"].
I believe the regular expression is wrong, but have no idea why.
Solution 1:
/#{1,6}.+(?=\n)/g
- #{1,6} ... matches the character # at least once, or as a sequence of at most 6 such characters.
- .+ ... matches any character (except for line terminators) at least once and as many times as possible (greedy) ...
  - and does so until the positive lookahead (?=\n) matches ...
  - which is ... \n ... a newline / line feed.
- The g (global) modifier makes the regex find every match instead of stopping after the first one.
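As a minimal sketch of how the lookahead behaves, the pattern can be run against a short made-up string (the names sample and withLookahead are illustrative and not from the question). The last heading deliberately has no trailing line feed, so the lookahead rejects it:

```javascript
// Hypothetical input; the last heading is NOT followed by a line feed.
const sample = "\n# Title\n\nSome text.\n\n## Section\n\nMore text.\n### Last";

// (?=\n) requires a line feed right after the heading but does not consume it,
// so the matched text never includes the newline itself.
const withLookahead = sample.match(/#{1,6}.+(?=\n)/g);

// Logs the first two headings only; "### Last" is skipped because
// nothing follows it that could satisfy the lookahead.
console.log(withLookahead);
```

This only matters if a heading can be the very last line of the string; the OP states that headings always end with a new line.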
Edit
Having mentioned "matches any character (except for line terminators)", a regex like /#{1,6}.+/g should already do the job (no need for a positive lookahead) for the OP's use case, which is ...
"Headers in markdown start with a # (there can be from 1-6), and in my case always end with a new line."
"The result that I am looking for would be: ["# Clean-er ReactJS Code - Conditional Rendering", "## TL;DR", "## Introduction"]."
const testString = `\n# Clean-er ReactJS Code - Conditional Rendering\n\n## TL;DR\n\nMove render conditions into appropriately named variables. Abstract the condition logic into a function. This makes the render function code a lot easier to understand, refactor, reuse, test, and think about.\n\n## Introduction\n\nConditional rendering is when a logical operator determines what will be rendered. The following code is from the examples in the official ReactJS documentation. It is one of the simplest examples of conditional rendering that I can think of.\n\n`;
// see...[https://regex101.com/r/n6XQub/2]
const regXHeader = /#{1,6}.+/g
console.log(
testString.match(regXHeader)
);
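One caveat, hedged: the unanchored pattern also starts a match at a # that sits in the middle of a line. If headings are assumed to always begin at the start of a line, anchoring with ^ together with the m (multiline) flag is one way to rule that out. The input below is made up for illustration, and the variable names are hypothetical:

```javascript
// Hypothetical input with a "#" inside a normal sentence.
const body = "# Title\n\nIssue #42 is fixed.\n\n## Section\n";

// Unanchored: any run of 1-6 "#" followed by at least one character matches.
const unanchored = body.match(/#{1,6}.+/g);

// Anchored: with the m flag, ^ matches at the start of every line,
// so only lines that begin with "#" characters and a space qualify.
const anchored = body.match(/^#{1,6} .+/gm);

console.log(unanchored); // includes "#42 is fixed."
console.log(anchored);   // only the two real headings
```

For the OP's test string both variants return the same three headings, since it contains no # outside of headings.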
Bonus
Refactoring the above regex into e.g. /(?<flag>#{1,6})\s+(?<content>.+)/g, which uses named capturing groups, and combining it with matchAll and a mapping task, one can achieve a result like the one computed by the following example code ...
const testString = `\n# Clean-er ReactJS Code - Conditional Rendering\n\n## TL;DR\n\nMove render conditions into appropriately named variables. Abstract the condition logic into a function. This makes the render function code a lot easier to understand, refactor, reuse, test, and think about.\n\n## Introduction\n\nConditional rendering is when a logical operator determines what will be rendered. The following code is from the examples in the official ReactJS documentation. It is one of the simplest examples of conditional rendering that I can think of.\n\n`;
// see...[https://regex101.com/r/n6XQub/4]
const regXHeader = /(?<flag>#{1,6})\s+(?<content>.+)/g
console.log(
Array
.from(
testString.matchAll(regXHeader)
)
.map(({ groups: { flag, content } }) => ({
heading: `h${ flag.length }`,
content,
}))
);
Solution 2:
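The issue is that you are using a regex literal, where the backslash must not be double escaped: inside a literal, \\n matches a backslash followed by the letter n rather than a newline. With single escapes the pattern becomes (?<!#)#{1,6} (.*?)(\r(?:\n)?|\n)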
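A small sketch of the escaping difference, using a made-up one-line input (all variable names here are illustrative). The double-escaped form is only needed when the pattern is written as a string and passed to the RegExp constructor:

```javascript
// Hypothetical input containing a real line feed character.
const line = "# Title\n";

// In a regex literal, \\n means: a literal backslash followed by "n".
const doubled = /#{1,6} (.*?)\\n/;

// In a regex literal, \n means: a line feed.
const single = /#{1,6} (.*?)\n/;

// In a string, "\\n" becomes the two characters \ and n,
// which the RegExp constructor then reads as a line feed escape.
const fromString = new RegExp("#{1,6} (.*?)\\n");

console.log(doubled.test(line));    // false
console.log(single.test(line));     // true
console.log(fromString.test(line)); // true
```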
You can shorten the pattern by capturing the heading in a group and matching the trailing newline, which removes the need for the lookbehind assertion:
(#{1,6} .*)\r?\n
Retrieving all capture group 1 values with matchAll (a single exec call returns only one match):
const testString = "\n# Clean-er ReactJS Code - Conditional Rendering\n\n## TL;DR\n\nMove render conditions into appropriately named variables. Abstract the condition logic into a function. This makes the render function code a lot easier to understand, refactor, reuse, test, and think about.\n\n## Introduction\n\nConditional rendering is when a logical operator determines what will be rendered. The following code is from the examples in the official ReactJS documentation. It is one of the simplest examples of conditional rendering that I can think of.\n\n"
const HEADING_R = /(#{1,6} .*)\r?\n/g;
const headings = Array.from(testString.matchAll(HEADING_R), m => m[1]);
console.log('headings: ', headings);
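If you would rather stay with exec, as in the question, the same values can be collected by calling it in a loop. This is only a sketch under the assumption that the regex keeps the g flag, which is what makes exec resume after the previous match; the input and variable names are made up:

```javascript
// Hypothetical input; every heading is followed by a line feed.
const markdown = "\n# Title\n\n## Section\n\nBody text.\n";

// Same pattern as above; the g flag makes exec advance lastIndex on each call.
const headingRe = /(#{1,6} .*)\r?\n/g;

const found = [];
let match;
// exec returns null once no further match exists, which ends the loop.
while ((match = headingRe.exec(markdown)) !== null) {
  found.push(match[1]); // capture group 1 holds the heading without the newline
}

console.log(found);
```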