Replace image in Word doc using OpenXML
Solution 1:
Although the documentation for OpenXML isn't great, there is an excellent tool for seeing how existing Word documents are built. The Open XML SDK ships with DocumentReflector.exe, found under the Open XML Format SDK\V2.0\tools directory.
An image in a Word document consists of the image data plus an ID assigned to it, and the body of the document references the image by that ID. Your problem therefore breaks down into two parts: finding the ID of the image in the document, and then rewriting the image data for it.
To find the ID of the image, you'll need to parse the MainDocumentPart. Images are stored in Runs as a Drawing element:
<w:p>
  <w:r>
    <w:drawing>
      <wp:inline>
        <wp:extent cx="3200400" cy="704850" /> <!-- describes the size of the image -->
        <wp:docPr id="2" name="Picture 1" descr="filename.JPG" />
        <a:graphic>
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic>
              <pic:nvPicPr>
                <pic:cNvPr id="0" name="filename.JPG" />
                <pic:cNvPicPr />
              </pic:nvPicPr>
              <pic:blipFill>
                <a:blip r:embed="rId5" /> <!-- this is the ID you need to find -->
                <a:stretch>
                  <a:fillRect />
                </a:stretch>
              </pic:blipFill>
              <pic:spPr>
                <a:xfrm>
                  <a:ext cx="3200400" cy="704850" />
                </a:xfrm>
                <a:prstGeom prst="rect" />
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</w:p>
In the above example, you need to find the ID of the image stored in the blip element. How you go about finding that is dependent on your problem, but if you know the filename of the original image you can look at the docPr element:
// Requires: using System.Linq; using DocumentFormat.OpenXml.Packaging;
// using A = DocumentFormat.OpenXml.Drawing;
// using DW = DocumentFormat.OpenXml.Drawing.Wordprocessing;
using (WordprocessingDocument document = WordprocessingDocument.Open("docfilename.docx", true)) {
    // go through the document and select the inline image whose docPr description
    // matches the original filename (first match if there are many, null if none)
    DW.Inline selectedImage = document.MainDocumentPart.Document
        .Descendants<DW.Inline>()
        .FirstOrDefault(image => image.DocProperties != null &&
                                 image.DocProperties.Description != null &&
                                 image.DocProperties.Description.Value == "filename.JPG");
    // get the relationship ID from the blip inside the inline element
    string imageId = null;
    A.Blip blipElement = selectedImage == null
        ? null
        : selectedImage.Descendants<A.Blip>().FirstOrDefault();
    if (blipElement != null && blipElement.Embed != null) {
        imageId = blipElement.Embed.Value;
    }
    // imageId stays null if no matching image was found; anything that uses
    // imageId or document has to run here, before the using block closes
}
Then when you have the image ID, you can use that to rewrite the image data. I think this is how you would do it:
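// run this inside the using block above, while the document is still open
ImagePart imagePart = (ImagePart)document.MainDocumentPart.GetPartById(imageId);
byte[] imageBytes = File.ReadAllBytes("new_image.jpg");
// FileMode.Create truncates the part first, so a smaller replacement image
// does not leave bytes of the old image behind
using (Stream partStream = imagePart.GetStream(FileMode.Create, FileAccess.Write)) {
    partStream.Write(imageBytes, 0, imageBytes.Length);
}
// the part keeps its content type, so the new image should be the same format as the old one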
Solution 2:
I'd like to update this thread and add to Adam's answer above for the benefit of others.
I actually managed to hack some working code together the other day (before Adam posted his answer), but it was pretty difficult. The documentation really is poor and there isn't a lot of info out there.
I didn't know about the Inline and Run elements which Adam uses in his answer, but the trick seems to be getting to the Descendants<T>() method; from there you can query for pretty much any element, much as you would with a normal XML mapping.
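byte[] docBytes = File.ReadAllBytes(_myFilePath);
using (MemoryStream ms = new MemoryStream())
{
    // copy into an expandable stream so the document can grow when edited
    ms.Write(docBytes, 0, docBytes.Length);
    using (WordprocessingDocument wpdoc = WordprocessingDocument.Open(ms, true))
    {
        MainDocumentPart mainPart = wpdoc.MainDocumentPart;
        Document doc = mainPart.Document;
        // now you can use doc.Descendants<T>()
    }
    // any edits live only in ms at this point; to keep them, write ms.ToArray()
    // to a file of your choice once the inner using block has closed
}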
Once you've got this it's fairly easy to search for things, although you have to work out what everything is called. For example, the <pic:nvPicPr> element is Picture.NonVisualPictureProperties, etc.
As Adam correctly says, the element you need to find to replace the image is the Blip element. But you need to find the correct blip, the one which corresponds to the image you're trying to replace.
Adam shows a way using the Inline element. I just dived straight in and looked for all the picture elements. I'm not sure which is the better or more robust way (I don't know how consistent the XML structure is between documents, or whether this could cause the code to break).
// Picture here is DocumentFormat.OpenXml.Drawing.Pictures.Picture (<pic:pic>),
// not the Wordprocessing one
Blip GetBlipForPicture(string picName, Document document)
{
    return document.Descendants<Picture>()
        .Where(p => picName == p.NonVisualPictureProperties.NonVisualDrawingProperties.Name)
        .Select(p => p.BlipFill.Blip)
        .Single(); // return First or ToList or whatever here, there can be more than one
}
See Adam's XML example to make sense of the different elements here and see what I'm searching for.
The blip has an ID in the Embed property, e.g. <a:blip r:embed="rId4" cstate="print" />. This ID maps the Blip to an image in the media folder (you can see all these folders and files if you rename your .docx to a .zip and unzip it). You can find the mapping in word\_rels\document.xml.rels:
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png" />
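The same mapping can also be read from code instead of unzipping the file. This is only a minimal sketch, assuming the Open XML SDK is referenced; the class name RelationshipDump and the method name PrintImageParts are made up for illustration.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;

static class RelationshipDump
{
    // Hypothetical helper: lists every image part of the main document
    // together with the relationship ID that blips use to refer to it.
    public static void PrintImageParts(string docxPath)
    {
        // open read-only, nothing is modified here
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            MainDocumentPart mainPart = doc.MainDocumentPart;
            foreach (ImagePart imagePart in mainPart.ImageParts)
            {
                // relationship ID, e.g. the value found in r:embed on a blip
                string relId = mainPart.GetIdOfPart(imagePart);
                Console.WriteLine(relId + " -> " + imagePart.Uri + " (" + imagePart.ContentType + ")");
            }
        }
    }
}
```

Calling RelationshipDump.PrintImageParts with the path to your document should print one line per image, which makes it easier to see which ID a given blip points at.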
So what you need to do is add a new image, and then point this blip at the id of your newly created image:
// add new ImagePart
ImagePart newImg = mainPart.AddImagePart(ImagePartType.Png);
// Put image data into the ImagePart (from a filestream), closing the stream afterwards
using (FileStream imgStream = File.Open(_myImgPath, FileMode.Open, FileAccess.Read))
{
    newImg.FeedData(imgStream);
}
// Get the blip
Blip blip = GetBlipForPicture("MyPlaceholder.png", doc);
// Point blip at new image
blip.Embed = mainPart.GetIdOfPart(newImg);
I presume this just orphans the old image in the Media folder which isn't ideal, although maybe it's clever enough to garbage collect it so to speak. There may be a better way to do it, but I couldn't find it.
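If the leftover image is a concern, the old part can be deleted explicitly. What follows is a sketch rather than a tested recipe: the helper name ReplaceAndCleanUp is hypothetical, it assumes a PNG replacement, and its "still used" check only looks at blips in the main document body. Anything else that might reference the old image through the same relationship (VML pictures, for instance) is not checked, so treat that part as an assumption.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using A = DocumentFormat.OpenXml.Drawing;

static class ImageSwap
{
    // Hypothetical helper: points the blip at a new PNG image part and then
    // deletes the old part if no other blip in the main document uses it.
    public static void ReplaceAndCleanUp(MainDocumentPart mainPart, A.Blip blip, string newImagePath)
    {
        // remember which relationship the blip pointed at before the swap
        string oldId = blip.Embed == null ? null : blip.Embed.Value;

        ImagePart newImg = mainPart.AddImagePart(ImagePartType.Png);
        using (FileStream fs = File.OpenRead(newImagePath))
        {
            newImg.FeedData(fs);
        }
        blip.Embed = mainPart.GetIdOfPart(newImg);

        if (oldId == null)
        {
            return; // the blip had no embedded image, nothing to clean up
        }

        // assumption: only blips in the document body are considered
        bool stillUsed = mainPart.Document.Descendants<A.Blip>()
            .Any(b => b.Embed != null && b.Embed.Value == oldId);
        if (!stillUsed)
        {
            // removes the relationship and, if nothing else refers to it, the part
            mainPart.DeletePart(mainPart.GetPartById(oldId));
        }
    }
}
```

In the snippet above this would take the place of the AddImagePart, FeedData and Embed lines, e.g. ImageSwap.ReplaceAndCleanUp(mainPart, blip, _myImgPath).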
Anyway, there you have it. This thread is now the most complete documentation on how to swap an image anywhere on the web (I know this, I spent hours searching). So hopefully some people will find it useful.