Word Importing HTML "Paragraph" Styles
Hoping someone has an answer. Been working on it on & off for a week.
Goal: Convert an old online blog (for the owner) into 10 Word docs totaling 3200 pages. Each part of each blog has 1 of 5 formats (title, date, type, summary, content). Each part within the docs will have the same style associated with it, so it can be tweaked via the style definition alone (e.g. change the font size of every title).
My approach so far: I have scraped each blog page and have each part of each page saved in a database. My thought was to export all the text from the database with <SPAN class="...">...</SPAN> added to each part and saved as a text file. Then import the text file into Word.
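For reference, the export step described above could be sketched roughly like this. The class names (articleTitle, articleDate, and so on), the dictionary layout of a post, and the function names are assumptions for illustration; only articleTitle appears in my sample further down.

```python
from html import escape

# Hypothetical mapping from the five parts to CSS class names.
# Only "articleTitle" comes from the sample; the rest are made up.
CLASS_FOR_PART = {
    "title": "articleTitle",
    "date": "articleDate",
    "type": "articleType",
    "summary": "articleSummary",
    "content": "articleContent",
}


def render_post(post):
    """Wrap each non-empty part of one post in a <span> carrying its class."""
    lines = []
    for part, css_class in CLASS_FOR_PART.items():
        text = post.get(part, "")
        if text:
            # escape() keeps characters like & and < from breaking the markup.
            lines.append(f'<span class="{css_class}">{escape(text)}</span><p/>')
    return "\n".join(lines)


def render_document(posts, css=""):
    """Build one HTML file (head, style block, body) from a list of posts."""
    body = "\n".join(render_post(post) for post in posts)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<style>\n"
        f"{css}\n"
        "</style>\n</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )
```

Writing the result of render_document(...) to a text file gives the file that is then pulled into Word.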
I'm almost successful. Word imports the styles adding them to the style gallery & imports the text applying the new style to it.
The problem is that the style added to the gallery is a CHARACTER style type only. So I can change the font, but not the paragraph formatting: no line spacing, no paragraph spacing, no centering, etc.
No matter what I have tried, I can't get Word to make it a PARAGRAPH style type, which is what I need. I have tried several different tags besides SPAN. I have tried adding a text-align: center to the class to force Word to see it as a paragraph style, but it's just ignored. I have also tried to define the new style within Word before importing, but then the text is just imported without any style.
Is anyone able to help? If you want to try the import yourself, save the sample code below to a text file then in Word: Insert - Object - Text from File. Thanks!
EDIT: Because of some sample content, I can see that the tag <LI class="..."> imports as a paragraph style type. So I do know Word is capable of doing it.
<!DOCTYPE html>
<html>
<head>
<style>
.articleTitle{
font-family: Georgia;
font-size: 16pt;
text-align: center;
}
</style>
</head>
<body>
<span class="articleTitle">A few of my favorite fruits</span><p/>
</body>
</html>
What about modifying styles.xml in the Word document after you've created it? A .docx is a zip archive, so if you rename it to .zip and extract word/styles.xml, you can change the type attribute on your styles to w:type="paragraph". I did a quick test with a blank Word doc: I created a character style, modified the XML, and reopened the doc, and it was a paragraph style which I could then modify. Obviously, only experiment with a copy of your Word doc to see what happens.
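With 10 documents, the manual edit could be scripted. Below is a minimal sketch using only Python's standard library; the function names are made up for illustration, and it assumes the style's w:styleId in styles.xml matches the name you pass in (Word may assign a different ID than the CSS class name, so check the XML first). It only flips the style definition. Runs in document.xml refer to character styles through w:rStyle, so whether the already-imported text keeps the style applied is something to verify on a copy.

```python
import re
import zipfile


def patch_styles(xml, style_id):
    """Change w:type from "character" to "paragraph" on one <w:style> element."""
    def fix(match):
        tag = match.group(0)
        # Only touch the opening tag of the style we were asked about.
        if f'w:styleId="{style_id}"' in tag:
            tag = tag.replace('w:type="character"', 'w:type="paragraph"')
        return tag

    # \b keeps the pattern from matching the enclosing <w:styles> element.
    return re.sub(r"<w:style\b[^>]*>", fix, xml)


def make_paragraph_style(docx_in, docx_out, style_id):
    """Copy docx_in to docx_out, rewriting only word/styles.xml on the way."""
    with zipfile.ZipFile(docx_in) as src, \
            zipfile.ZipFile(docx_out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/styles.xml":
                data = patch_styles(data.decode("utf-8"), style_id).encode("utf-8")
            dst.writestr(item, data)
```

Usage would be something like make_paragraph_style("blog-part1.docx", "blog-part1-fixed.docx", "articleTitle"), where both file names are placeholders. Writing to a new file leaves the original untouched.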