Best HashTag Regex

I'm trying to find all the hashtags in a string. The hashtags come from a stream like Twitter and can appear anywhere in the text, for example:

this is a #awesome event, lets use the tag #fun

I'm using the .NET Framework (C#), and I was thinking this would be a suitable regex pattern to use:

#\w+

Is this the best regex for this purpose?


If you are pulling statuses containing hashtags from Twitter, you no longer need to find them yourself. You can now specify the include_entities parameter to have Twitter automatically call out mentions, links, and hashtags.

For example, take the following call to statuses/show:

http://api.twitter.com/1/statuses/show/60183527282577408.json?include_entities=true

In the resultant JSON, notice the entities object.

"entities":{"urls":[{"expanded_url":null,"indices":[68,88],"url":"http:\/\/bit.ly\/gWZmaJ"}],"user_mentions":[],"hashtags":[{"text":"wordpress","indices":[89,99]}]}

You can use the above to locate the specific entities in the tweet (which occur between the string positions denoted by the indices property) and transform them appropriately.
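As a rough illustration of using the indices, here is a minimal Python sketch. The payload is hypothetical: the text is the sample string from the question, the indices were worked out by hand rather than returned by the API, and the `link_hashtags` helper and the `/tags/` URL scheme are made up for the example.

```python
import json

# Hypothetical payload: the text is the sample from the question and the
# indices were computed by hand for it (this is not real API output).
payload = json.loads("""
{
  "text": "this is a #awesome event, lets use the tag #fun",
  "entities": {
    "urls": [],
    "user_mentions": [],
    "hashtags": [
      {"text": "awesome", "indices": [10, 18]},
      {"text": "fun", "indices": [43, 47]}
    ]
  }
}
""")


def link_hashtags(text, hashtags):
    """Wrap each hashtag span in an anchor tag, using the entity indices."""
    # Work from the end of the string so the earlier indices stay valid
    # while the text grows.
    for tag in sorted(hashtags, key=lambda h: h["indices"][0], reverse=True):
        start, end = tag["indices"]
        anchor = '<a href="/tags/{0}">{1}</a>'.format(tag["text"], text[start:end])
        text = text[:start] + anchor + text[end:]
    return text


linked = link_hashtags(payload["text"], payload["entities"]["hashtags"])
print(linked)
```

Replacing from the last entity backwards avoids having to shift the remaining indices after each substitution.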

If you just need the regular expression to locate the hashtags, Twitter provides these in an open source library.

Hashtag Match Pattern

(^|[^&\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7])(#|\uFF03)(?!\uFE0F|\u20E3)([\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*[\p{L}\p{M}][\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*)

The above pattern can be pieced together from this Java file (retrieved 2015-11-23). Validation tests for this pattern are located in this file around line 128.


After looking at the previous answers here and making some test tweets to see what Twitter liked, I think I've come up with a solid regular expression that should do the trick. It requires lookaround functionality in the regular expression engine, so it might not work with all engines out there. It should still work fine for .NET and PCRE.

(?:(?<=\s)|^)#(\w*[A-Za-z_]+\w*)

According to RegexBuddy, this does the following: [screenshot: RegexBuddy Create View]

And again, according to RegexBuddy, here is what it matches: [screenshot: RegexBuddy Test View]

Anything highlighted is part of the match. The darker highlighted part indicates what is returned from the capture.
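To make the difference from the question's `#\w+` concrete, here is a small sketch in Python's `re` module (chosen only for illustration; .NET's `Regex` class should give the same results on these ASCII inputs, though I have not run it there). The `tricky` test string is made up for the example.

```python
import re

# The pattern from the question and the stricter pattern from this answer.
NAIVE = re.compile(r"#\w+")
STRICT = re.compile(r"(?:(?<=\s)|^)#(\w*[A-Za-z_]+\w*)")

sample = "this is a #awesome event, lets use the tag #fun"
# Made-up input: a digits-only tag, a tag starting with a digit,
# and a '#' glued to the preceding word.
tricky = "#1 fan of #2fast2furious, not a#tag"

sample_hits = STRICT.findall(sample)   # ['awesome', 'fun']
naive_hits = NAIVE.findall(tricky)     # ['#1', '#2fast2furious', '#tag']
strict_hits = STRICT.findall(tricky)   # ['2fast2furious']

print(sample_hits, naive_hits, strict_hits)
```

The stricter pattern skips `#1` because the tag must contain at least one letter or underscore, and skips `a#tag` because the `#` must follow whitespace or start the string. Note that it returns the tag text from the capture group, without the leading `#`.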

Edit Dec 2014:
Here's a slightly simplified version from zero323 that should be functionally equivalent:

(?<=\s|^)#(\w*[A-Za-z_]+\w*)
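One portability caveat, sketched in Python as an example of an engine with stricter lookbehind rules: as far as I know, Python's built-in `re` module only accepts fixed-width lookbehinds, so it rejects the simplified form (the alternatives `\s` and `^` have different widths) while still accepting the original. .NET supports variable-width lookbehind, so both forms should compile there. The `compiles` helper is just for this illustration.

```python
import re

ORIGINAL = r"(?:(?<=\s)|^)#(\w*[A-Za-z_]+\w*)"
SIMPLIFIED = r"(?<=\s|^)#(\w*[A-Za-z_]+\w*)"


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


original_ok = compiles(ORIGINAL)      # lookbehind contains only \s (width 1)
simplified_ok = compiles(SIMPLIFIED)  # lookbehind mixes width 1 and width 0

print(original_ok, simplified_ok)
```

If the pattern has to run on more than one engine, the original form is the safer choice.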