Links: How to save an HTML file to PDF

I want to save an HTML file generated by ASP.NET as a PDF.

I was pointed to the open source iTextSharp project.

I found a few links discussing how to do it:

http://www.velocityreviews.com/forums/t72716-using-itextsharp-to-generate-pdf-from-aspnet.html

 iTextSharp Tutorial Chapter 7: XML and (X)HTML

 iTextSharp Demo (ASP.NET 2.0): http://rubypdf.com/itextsharp/tutorial01/ap07Chap0707.cs.html introduces HtmlParser.Parse (see the source code here).

We tried to use it.

HtmlParser.Parse does NOT always throw an error, but the PDF file it generates can be blank/empty.
If the HTML file has an invalid structure, the parser's messages show up in the debug output.

This is a big problem: HtmlParser.Parse is very strict, and any minor mistake in the HTML causes either an exception or the almost silent creation of an empty PDF file.

The post "Creating pdf in .NET from html" has a lot of interesting comments, including a suggestion to use HTML Agility Pack.



We are going to test how tolerant HtmlParser.Parse is of HTML that has been regenerated by HTML Agility Pack.
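Here is a minimal sketch of that experiment, not a tested solution. It assumes HTML Agility Pack and an iTextSharp 4.x build (one that still ships iTextSharp.text.html.HtmlParser) are both referenced. The class name HtmlToPdfSketch, the method name Convert and the three path parameters are made up for illustration. The HTML is loaded by HTML Agility Pack, saved again as XML, and the regenerated file is handed to HtmlParser.Parse the same way the tutorial demo does it.

```csharp
using System.IO;
using HtmlAgilityPack;
using iTextSharp.text;
using iTextSharp.text.html;
using iTextSharp.text.pdf;

// Hypothetical helper: names and parameters are illustrative only.
public static class HtmlToPdfSketch
{
    public static void Convert(string htmlPath, string xhtmlPath, string pdfPath)
    {
        // Step 1: let HTML Agility Pack read the (possibly sloppy) HTML
        // and write it back out as XML.
        var html = new HtmlDocument();
        html.OptionOutputAsXml = true;
        html.Load(htmlPath);
        html.Save(xhtmlPath);

        // Step 2: feed the regenerated file to the strict iTextSharp parser,
        // following the pattern of the tutorial demo.
        var pdf = new Document();
        using (var output = new FileStream(pdfPath, FileMode.Create))
        {
            PdfWriter.GetInstance(pdf, output);
            HtmlParser.Parse(pdf, xhtmlPath);
        }
    }
}
```

A call would look like HtmlToPdfSketch.Convert("page.html", "page.xhtml", "page.pdf"). Whether the regenerated markup is clean enough for HtmlParser.Parse is exactly what still has to be checked, so open the resulting PDF and watch the debug output rather than trusting the absence of an exception.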

The thread "[ 1819614 ] Error parsing images in HTML files" has a description of the fix.

Another option is to always use XML-compliant HTML, verified by http://validator.w3.org/#validate_by_input, but it could take some time to tidy up the HTML generated by ASP.NET.
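As a rough local pre-check before going to the validator, the markup can be run through the .NET XML reader to see whether it is at least well-formed XML. This is only a sketch: the class name WellFormedCheck and the method FindXmlError are made up for illustration, it needs .NET 4.0 or later for DtdProcessing, and well-formed is a weaker condition than valid XHTML. Because the DOCTYPE is skipped, named entities such as &nbsp; will be reported as errors.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: names are illustrative only.
public static class WellFormedCheck
{
    // Returns null when the markup parses as XML,
    // otherwise the message reported by the XML parser.
    public static string FindXmlError(string html)
    {
        var settings = new XmlReaderSettings
        {
            // Skip the DOCTYPE and never try to download a DTD.
            DtdProcessing = DtdProcessing.Ignore,
            XmlResolver = null
        };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(html), settings))
            {
                // Reading to the end forces the whole document to be parsed.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }
}
```

For example, FindXmlError("<html><body><p>ok</p></body></html>") returns null, while an unclosed <br> inside the body returns the parser's complaint about the mismatched tag. That is the kind of minor mistake that seems to trip HtmlParser.Parse.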



 http://www.google.com.au/search?source=ig&hl=en&rlz=&q=HtmlParser.Parse&meta=

 Links to other products:  

"Generate PDF from ASP.NET" gives a few references to different products, including iTextSharp.

 Dynamically Generating PDFs in .NET : http://www.developerfusion.co.uk/show/6623/ 

 Another option is to try (and possibly buy) the commercial product ABCpdf.

 

I saw a suggestion to use http://www.htmldoc.org/, the command line version of HTMLDoc, to convert HTML to PDF, but it is not good for programmatic access.

 