How can I edit the source of HTML in the clipboard?

If I cut some HTML from a Pidgin instant messenger window, I can easily paste it verbatim into a new HTML email in Thunderbird. All formatting (fonts, colors, etc.) is preserved, so my Ubuntu 13.10 desktop clipboard must hold the HTML source somewhere.

But I'd like to tweak the HTML source.

How can I actually get at the HTML source when it is in the clipboard? I'd like to just throw it into a text file, work on the markup in Vim or whatever, then use this HTML source in a webpage or feed it to Thunderbird's "Insert → HTML".

Hmm, maybe something like PasteImg (mentioned in Getting a graphic on the clipboard to the disk?), but using request_rich_text() instead of request_image()? I wouldn't mind using a small Python script the rare times I want to get HTML source from the clipboard.

What's in the clipboard may actually be "rich text".

The Python script from this answer outputs

Current clipboard offers formats: ('TIMESTAMP', 'TARGETS', 'MULTIPLE',
'SAVE_TARGETS', 'COMPOUND_TEXT', 'STRING', 'TEXT', 'UTF8_STRING', 'text/html',
'text/plain')

Turns out my Pidgin logs are in HTML, so that's one way to get at this HTML source, bypassing the clipboard entirely. I'm still interested in the answer to the original question (how to retrieve HTML from the clipboard).


Solution 1:

Found it! Here's how to get at the HTML source when there's some on your clipboard:

#!/usr/bin/env python
import gtk
print(gtk.Clipboard().wait_for_contents('text/html').data)

This helped.

This didn't work for me. My callback was never entered.
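One possible reason a request comes back empty is that the current clipboard owner does not offer text/html at all. The sketch below first checks the list of offered targets and falls back to plain text. It assumes PyGTK 2 under Python 2, as in the one-liner above; pick_target() and read_clipboard() are hypothetical helper names, not part of PyGTK.

```python
#!/usr/bin/env python

def pick_target(offered, preferred=('text/html', 'UTF8_STRING', 'STRING')):
    """Return the first preferred target that the clipboard offers, or None."""
    offered = offered or ()  # wait_for_targets() returns None on an empty clipboard
    for target in preferred:
        if target in offered:
            return target
    return None


def read_clipboard():
    """Return (target, raw_data), or (None, None) if nothing usable is offered."""
    # Assumption: PyGTK 2 is installed. It is imported lazily so that
    # pick_target() can be used and tested without a display.
    import gtk
    clipboard = gtk.Clipboard()
    target = pick_target(clipboard.wait_for_targets())
    if target is None:
        return None, None
    contents = clipboard.wait_for_contents(target)
    if contents is None:  # the owner went away between the two requests
        return None, None
    return target, contents.data
```

Calling read_clipboard() returns the chosen target name together with the raw data, so the caller can tell whether it received HTML or only plain text.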

Solution 2:

I see what you're trying to get at. Try pasting into a WYSIWYG editor, editing it there, then copying and pasting the result into Thunderbird.

BlueGriffon or LibreOffice Writer might work.

Solution 3:

Here's a modification of your script that lets you edit the HTML directly.

It also handles problems with character encoding: if you reply to an e-mail from someone using Windows, chances are the encoding is stuck in UTF-16, which is no good for editing. You may need to install the chardet module in Python.

In the subprocess.call(...) line, replace 'vi' with your text editor of choice.

#!/usr/bin/env python
# Python 2 / PyGTK: edit the clipboard's text/html content in a text editor.
import io
import os
import subprocess

import chardet
import gtk

dtype = 'text/html'
clipboard = gtk.Clipboard()
htmlclip = clipboard.wait_for_contents(dtype).data
# Guess the encoding (e.g. UTF-16 for content copied from a Windows e-mail)
encoding = chardet.detect(htmlclip)['encoding'] or 'utf-8'

# Shove the clipboard to a temporary file, stored as UTF-8 for easy editing
tmpfn = '/tmp/htmlclip_%i' % os.getpid()

with io.open(tmpfn, 'w', encoding='utf-8') as editfile:
    editfile.write(htmlclip.decode(encoding))

# Manually edit the temporary file
subprocess.call(['vi', tmpfn])

# Read the edited text back and restore the original encoding
with io.open(tmpfn, 'r', encoding='utf-8') as editfile:
    htmlclip = editfile.read().encode(encoding)

# Put the modified data back to clipboard
clipboard.set_with_data(
    [(dtype, 0, 0)],
    lambda cb, sd, info, data: sd.set(dtype, 8, htmlclip),
    lambda cb, d: None)
clipboard.set_can_store([(dtype, 0, 0)])
clipboard.store()

This does a full edit cycle, modifying the clipboard “in place”.

I use it to make up for Thunderbird's annoying lack of an HTML source editor:

  1. Select all with Ctrl+A in the message compose window.
  2. Copy with Ctrl+C.
  3. Run the above script, which opens an editor:
    1. Make your changes to the HTML source.
    2. Save and quit.
  4. Press Ctrl+V in the compose window to overwrite the entire content with your edited HTML.
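
If installing chardet is not an option, the encoding guess in the script could be approximated with the standard library alone. The sketch below is a rough stand-in under stated assumptions, not a reimplementation of chardet: the hypothetical helper decode_clipboard_html() looks for a byte-order mark, then counts NUL bytes to spot UTF-16 without a mark, and otherwise assumes UTF-8.

```python
import codecs


def decode_clipboard_html(raw):
    """Decode raw clipboard bytes; return (text, encoding_name)."""
    # A byte-order mark identifies the encoding unambiguously.
    for bom, name in ((codecs.BOM_UTF8, 'utf-8-sig'),
                      (codecs.BOM_UTF16_LE, 'utf-16'),
                      (codecs.BOM_UTF16_BE, 'utf-16')):
        if raw.startswith(bom):
            return raw.decode(name), name
    if b'\x00' in raw:
        # Heuristic (assumption): mostly ASCII-range markup encoded as UTF-16
        # has NUL bytes at odd offsets (little endian) or even offsets (big endian).
        odd_nuls = raw[1::2].count(b'\x00')
        even_nuls = raw[0::2].count(b'\x00')
        name = 'utf-16-le' if odd_nuls >= even_nuls else 'utf-16-be'
        return raw.decode(name), name
    # Assumption: anything else is UTF-8.
    return raw.decode('utf-8'), 'utf-8'
```

The returned encoding name can be passed to encode() when writing the edited text back, so the clipboard content keeps the encoding it arrived in.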