Search in html source with GOOGLE? [closed]

I've come across the following resources on my travels (some already mentioned above):

HTML Mark-up-focused search engines

NerdyData

I'd also like to throw in the following:

Huge website crawl data archives

Common Crawl - 'years of free web page data to help change the world' (over 250 TB)

How can we analyze this crawl data?

For an idea of how to begin analyzing this much data, take a look at Big Data/MapReduce-style framework(s).
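The map/reduce idea can be shown in miniature without any framework. The sketch below assumes the pages are already available as `(url, html)` pairs and counts, per domain, how many pages contain a given source snippet. `map_page`, `reduce_counts` and `count_domains` are hypothetical names for illustration, not part of Spark, Hadoop or Common Crawl.

```python
from collections import defaultdict
from typing import Iterable, Iterator
from urllib.parse import urlsplit


def map_page(url: str, html: str, snippet: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (domain, 1) if the page source contains the snippet."""
    if snippet.lower() in html.lower():
        yield (urlsplit(url).hostname or ""), 1


def reduce_counts(pairs: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Reduce step: sum the counts emitted for each domain."""
    totals: dict[str, int] = defaultdict(int)
    for domain, count in pairs:
        totals[domain] += count
    return dict(totals)


def count_domains(pages: Iterable[tuple[str, str]], snippet: str) -> dict[str, int]:
    # In a real framework the map calls would run in parallel on many
    # machines and the pairs would be shuffled by key; here they simply
    # run one after another in a single process.
    pairs = (pair for url, html in pages for pair in map_page(url, html, snippet))
    return reduce_counts(pairs)
```

For example, `count_domains([("https://a.example/1", "<script src='widget.js'>"), ("https://a.example/2", "<p>WIDGET.JS</p>"), ("https://b.example/", "<p>nothing</p>")], "widget.js")` returns `{"a.example": 2}`. Because the map step only looks at one page at a time and the reduce step only needs pairs with the same key, the same two functions could in principle be handed to a distributed framework unchanged.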

Google lists some ideas on using Apache's Spark project to analyze Common Crawl's dump(s). To understand the file format(s) used by Common Crawl, refer to the following:

So you’re ready to get started [with Common Crawl]
Navigating the WARC file format [by Common Crawl]
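To make the WARC layout concrete, here is a minimal, standard-library-only reader: each record is a `WARC/...` version line, `Name: value` header lines, a blank line, then exactly `Content-Length` bytes of content. This is a sketch under simplifying assumptions (it ignores header continuation lines and assumes the HTTP payloads in `response` records are neither chunked nor compressed); the function names are hypothetical, and for real work a dedicated library such as warcio is the safer choice.

```python
import gzip
from typing import BinaryIO, Iterator


def iter_warc_records(stream: BinaryIO) -> Iterator[tuple[dict[str, str], bytes]]:
    """Yield (headers, content block) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        yield headers, stream.read(int(headers["Content-Length"]))


def find_in_responses(stream: BinaryIO, snippet: bytes) -> Iterator[str]:
    """Yield the target URI of every response record whose HTTP body contains snippet."""
    for headers, block in iter_warc_records(stream):
        if headers.get("WARC-Type") != "response":
            continue
        # The block is a raw HTTP response: drop the status line and headers.
        _, _, body = block.partition(b"\r\n\r\n")
        if snippet in body:
            yield headers.get("WARC-Target-URI", "")


def search_warc_gz(path: str, snippet: bytes) -> list[str]:
    # .warc.gz files are typically gzipped record by record; gzip.open
    # reads the concatenated gzip members as one continuous stream.
    with gzip.open(path, "rb") as f:
        return list(find_in_responses(f, snippet))
```

Calling `search_warc_gz("some-file.warc.gz", b"widget.js")` on a downloaded file would return the URLs of the captured pages whose source contains that string, which is essentially a "search in HTML source" over one slice of the crawl.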

The article, Accessing-Common-Crawl-Dataset-on-S3, outlines how to access Common Crawl's 250 TB+ dump(s) at low cost, without transferring that data outside of Amazon's AWS/S3 network. Of course, that assumes you are going to use some combination of AWS/EC2/S3 etc. to analyze the crawl data.
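When only a handful of pages are needed, a complementary way to keep transfer costs down is to fetch single records instead of whole files. The sketch below assumes that Common Crawl serves its files over HTTPS from `data.commoncrawl.org` and that an index lookup has already given you a record's `filename`, `offset` and `length`; please check both assumptions against Common Crawl's current documentation. `record_request` is a hypothetical helper that only builds the HTTP request, it does not send it.

```python
import urllib.request

# Assumption: public HTTPS endpoint for Common Crawl files (inside AWS the
# same paths are also reachable in the "commoncrawl" S3 bucket).
BASE_URL = "https://data.commoncrawl.org/"


def record_request(filename: str, offset: int, length: int) -> urllib.request.Request:
    """Build a ranged GET request for exactly one gzipped WARC record.

    filename, offset and length are the values returned by an index lookup.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends, hence the -1.
    byte_range = f"bytes={offset}-{offset + length - 1}"
    return urllib.request.Request(BASE_URL + filename, headers={"Range": byte_range})
```

Sending it with `urllib.request.urlopen(req)` and passing the response bytes to `gzip.decompress` should yield one WARC record, since each record is stored as its own gzip member; only `length` bytes cross the network instead of the whole file.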

Finally, Patrick Durusau maintains some interesting Common-Crawl-usage-related blog pages.

Personally, I find this subject intriguing. I suggest we get this crawl data while it's HOT! ;-)

You can try PublicWWW to search in source/mark-up. It lets you find any HTML, JavaScript, CSS and plain text in the web page source code of 167+ million websites.

With PublicWWW you can:

Find related websites through the unique HTML code they share, e.g. widgets & publisher IDs.
Identify sites using certain images or badges.
Find out who else is using your theme.
Identify sites mentioning you.
Find your competitor's affiliates.
Identify sites where your competitors personally collaborate or interact.
Find references to a library or a platform being used.
Find code examples on the net.
Figure out who is using what JS widgets on their sites.
...
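To illustrate the "who is using what JS widgets" kind of lookup on a small scale, the sketch below parses page source with Python's standard `html.parser` and collects every external `<script src>` URL. It is only a local, single-page illustration of the idea, not PublicWWW's API or method; `ScriptSrcCollector`, `script_sources` and `sites_using` are hypothetical names, and the pages are assumed to be already downloaded into a `{url: html}` dict.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcCollector(HTMLParser):
    """Collect the absolute URL of every <script src="..."> in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    # Resolve relative paths against the page URL.
                    self.sources.append(urljoin(self.base_url, value))


def script_sources(html: str, base_url: str) -> list[str]:
    parser = ScriptSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.sources


def sites_using(pages: dict[str, str], widget_fragment: str) -> list[str]:
    """Return the page URLs whose external scripts contain widget_fragment."""
    return [
        url
        for url, html in pages.items()
        if any(widget_fragment in src for src in script_sources(html, url))
    ]
```

Parsing tags rather than grepping raw text means a widget URL that merely appears in a comment or in visible text is not counted as a use; the trade-off is that widgets injected by other JavaScript at runtime are invisible to both approaches.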

Of course, you are not limited to your own websites: you can find any site that uses a given code/mark-up snippet.
