Calling wkhtmltopdf to generate PDF from HTML
I'm attempting to create a PDF file from an HTML file. After looking around a little I've found: wkhtmltopdf to be perfect. I need to call this .exe from the ASP.NET server. I've attempted:
Process p = new Process();
p.StartInfo.UseShellExecute = false;
p.StartInfo.FileName = HttpContext.Current.Server.MapPath("wkhtmltopdf.exe");
p.StartInfo.Arguments = "TestPDF.htm TestPDF.pdf";
p.Start();
p.WaitForExit();
With no success of any files being created on the server. Can anyone give me a pointer in the right direction? I put the wkhtmltopdf.exe file at the top level directory of the site. Is there anywhere else it should be held?
Edit: If anyone has better solutions to dynamically create pdf files from html, please let me know.
Update:
My answer below, creates the pdf file on the disk. I then streamed that file to the users browser as a download. Consider using something like Hath's answer below to get wkhtml2pdf to output to a stream instead and then send that directly to the user - that will bypass lots of issues with file permissions etc.
My original answer:
Make sure you've specified an output path for the PDF that is writeable by the ASP.NET process of IIS running on your server (usually NETWORK_SERVICE I think).
Mine looks like this (and it works):
/// <summary>
/// Convert Html page at a given URL to a PDF file using open-source tool wkhtml2pdf
/// </summary>
/// <param name="Url"></param>
/// <param name="outputFilename"></param>
/// <returns></returns>
public static bool HtmlToPdf(string Url, string outputFilename)
{
// assemble destination PDF file name
string filename = ConfigurationManager.AppSettings["ExportFilePath"] + "\\" + outputFilename + ".pdf";
// get proj no for header
Project project = new Project(int.Parse(outputFilename));
var p = new System.Diagnostics.Process();
p.StartInfo.FileName = ConfigurationManager.AppSettings["HtmlToPdfExePath"];
string switches = "--print-media-type ";
switches += "--margin-top 4mm --margin-bottom 4mm --margin-right 0mm --margin-left 0mm ";
switches += "--page-size A4 ";
switches += "--no-background ";
switches += "--redirect-delay 100";
p.StartInfo.Arguments = switches + " " + Url + " " + filename;
p.StartInfo.UseShellExecute = false; // needs to be false in order to redirect output
p.StartInfo.RedirectStandardOutput = true;
p.StartInfo.RedirectStandardError = true;
p.StartInfo.RedirectStandardInput = true; // redirect all 3, as it should be all 3 or none
p.StartInfo.WorkingDirectory = StripFilenameFromFullPath(p.StartInfo.FileName);
p.Start();
// read the output here...
string output = p.StandardOutput.ReadToEnd();
// ...then wait n milliseconds for exit (as after exit, it can't read the output)
p.WaitForExit(60000);
// read the exit code, close process
int returnCode = p.ExitCode;
p.Close();
// if 0 or 2, it worked (not sure about other values, I want a better way to confirm this)
return (returnCode == 0 || returnCode == 2);
}
I had the same problem when i tried using msmq with a windows service but it was very slow for some reason. (the process part).
This is what finally worked:
private void DoDownload()
{
var url = Request.Url.GetLeftPart(UriPartial.Authority) + "/CPCDownload.aspx?IsPDF=False?UserID=" + this.CurrentUser.UserID.ToString();
var file = WKHtmlToPdf(url);
if (file != null)
{
Response.ContentType = "Application/pdf";
Response.BinaryWrite(file);
Response.End();
}
}
public byte[] WKHtmlToPdf(string url)
{
var fileName = " - ";
var wkhtmlDir = "C:\\Program Files\\wkhtmltopdf\\";
var wkhtml = "C:\\Program Files\\wkhtmltopdf\\wkhtmltopdf.exe";
var p = new Process();
p.StartInfo.CreateNoWindow = true;
p.StartInfo.RedirectStandardOutput = true;
p.StartInfo.RedirectStandardError = true;
p.StartInfo.RedirectStandardInput = true;
p.StartInfo.UseShellExecute = false;
p.StartInfo.FileName = wkhtml;
p.StartInfo.WorkingDirectory = wkhtmlDir;
string switches = "";
switches += "--print-media-type ";
switches += "--margin-top 10mm --margin-bottom 10mm --margin-right 10mm --margin-left 10mm ";
switches += "--page-size Letter ";
p.StartInfo.Arguments = switches + " " + url + " " + fileName;
p.Start();
//read output
byte[] buffer = new byte[32768];
byte[] file;
using(var ms = new MemoryStream())
{
while(true)
{
int read = p.StandardOutput.BaseStream.Read(buffer, 0,buffer.Length);
if(read <=0)
{
break;
}
ms.Write(buffer, 0, read);
}
file = ms.ToArray();
}
// wait or exit
p.WaitForExit(60000);
// read the exit code, close process
int returnCode = p.ExitCode;
p.Close();
return returnCode == 0 ? file : null;
}
Thanks Graham Ambrose and everyone else.
OK, so this is an old question, but an excellent one. And since I did not find a good answer, I made my own :) Also, I've posted this super simple project to GitHub.
Here is some sample code:
var pdfData = HtmlToXConverter.ConvertToPdf("<h1>SOO COOL!</h1>");
Here are some key points:
- No P/Invoke
- No creating of a new process
- No file-system (all in RAM)
- Native .NET DLL with intellisense, etc.
- Ability to generate a PDF or PNG (
HtmlToXConverter.ConvertToPng
)
Check out the C# wrapper library (using P/Invoke) for the wkhtmltopdf library: https://github.com/pruiz/WkHtmlToXSharp
There are many reason why this is generally a bad idea. How are you going to control the executables that get spawned off but end up living on in memory if there is a crash? What about denial-of-service attacks, or if something malicious gets into TestPDF.htm?
My understanding is that the ASP.NET user account will not have the rights to logon locally. It also needs to have the correct file permissions to access the executable and to write to the file system. You need to edit the local security policy and let the ASP.NET user account (maybe ASPNET) logon locally (it may be in the deny list by default). Then you need to edit the permissions on the NTFS filesystem for the other files. If you are in a shared hosting environment it may be impossible to apply the configuration you need.
The best way to use an external executable like this is to queue jobs from the ASP.NET code and have some sort of service monitor the queue. If you do this you will protect yourself from all sorts of bad things happening. The maintenance issues with changing the user account are not worth the effort in my opinion, and whilst setting up a service or scheduled job is a pain, its just a better design. The ASP.NET page should poll a result queue for the output and you can present the user with a wait page. This is acceptable in most cases.