How can we copy text from Wikipedia without the citation parts "[1]", "[2]", "[3]"?

If we copy text from a Wikipedia page, this is roughly what we get:

Sentence spacing is the horizontal space between sentences in typeset text. It is a matter of typographical convention.^[1] Since the introduction of movable-type printing in Europe, various sentence spacing conventions have been used in languages with a Latin-derived alphabet.^[2] These include a normal word space (as between the words in a sentence), a single enlarged space, two full spaces, and, most recently in digital media, no space.^[3] Although modern digital fonts can automatically adjust a single word space to create visually pleasing and consistent spacing following terminal punctuation,^[4] most debate is about whether to strike a keyboard's spacebar once or twice between sentences.^[5]

I do not wish to copy the parts ^[1] and ^[2] etc. This is actually what I wanted to copy:

Sentence spacing is the horizontal space between sentences in typeset text. It is a matter of typographical convention. Since the introduction of movable-type printing in Europe, various sentence spacing conventions have been used in languages with a Latin-derived alphabet. These include a normal word space (as between the words in a sentence), a single enlarged space, two full spaces, and, most recently in digital media, no space. Although modern digital fonts can automatically adjust a single word space to create visually pleasing and consistent spacing following terminal punctuation, most debate is about whether to strike a keyboard's spacebar once or twice between sentences.

The selected answer below uses a regex, but it doesn't work every time. (If the actual text itself contains [ and ], the regex shouldn't remove them.)

Are there better solutions?

Solution 1:

A bookmarklet is your friend...

Create a new browser bookmark and paste the JavaScript code below into its URL field. When you want to copy some text from Wikipedia, click the bookmark first and it will remove all instances of ^[n], which is what the question asks for.

javascript:function a (){document.body.innerHTML=document.body.innerHTML.replace(/<sup\b[^>]*>(.*?)<\/sup>/gi, "" );return;}; a();

Behind the scenes, it just runs a regular expression search and replace that deletes every <sup>...</sup> element on the page, contents included.
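
To see what that replace does, the same pattern can be tried on a small string. This is only an illustration: the sample markup is made up, loosely modeled on how a citation marker tends to appear in the page source, and the variable names are hypothetical.

```javascript
// The same pattern the bookmarklet uses, applied to a plain string.
const supPattern = /<sup\b[^>]*>(.*?)<\/sup>/gi;

// Hypothetical sample markup (an assumption, not copied from a real page).
const sample =
  'It is a matter of convention.<sup class="reference"><a href="#cite_note-1">[1]</a></sup> ' +
  'Brackets in the text [like these] are untouched, but x<sup>2</sup> loses its exponent.';

// Every <sup>...</sup> element is dropped together with its contents.
const stripped = sample.replace(supPattern, "");
console.log(stripped);
```

Literal [ and ] in the running text survive because the pattern only matches <sup> elements, but any other superscript on the page (an exponent, an ordinal) is removed along with the citations.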

I've just tried this in IE7 and it works fine, so hopefully it should be OK in other browsers too.

I'll credit this SO thread with pointing me in the right direction - I knew a bookmarklet was the way to go, but had never written one before.
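
A narrower variant is sketched below, in case removing every superscript is too aggressive. It assumes that citation markers are rendered as <sup class="reference"> elements, which should be checked against the page source before relying on it. The function name removeCitations is hypothetical, and Element.remove() is not available in old browsers such as IE7.

```javascript
// Sketch: remove only citation markers, leaving other <sup> content
// (exponents, ordinals) and any literal [ ] in the text alone.
// Assumption: citation markers are <sup class="reference"> elements.
function removeCitations(root) {
  // querySelectorAll returns a static list, so removing nodes
  // while iterating over it is safe.
  const markers = root.querySelectorAll("sup.reference");
  let removed = 0;
  for (const marker of markers) {
    marker.remove();
    removed += 1;
  }
  return removed; // number of markers taken out
}

// Run immediately only when a browser document is available.
if (typeof document !== "undefined") {
  removeCitations(document);
}
```

Working on DOM nodes instead of rewriting document.body.innerHTML also avoids re-parsing the whole page, so scripts and event handlers already attached to it are left in place. To use it as a bookmarklet, the body would need to be collapsed onto one line and prefixed with javascript: in the same way as the original.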
