We've used an HtmlAgilityPack-based function, IsValidHtmlFragment(string html), to validate dynamically loaded HTML fragments before inserting them into the main page (invalid fragments are not inserted). Recently we noticed that it doesn't return false for some fragments with unclosed tags.
I decided to look for another tool to call from code, and found that most HTML parsers are either too forgiving (like browsers) or too strict, checking for full XHTML conformance.
I am going to add a manual procedure to run Tidy.exe (parameters are listed here) or use Tidy Online to fix the errors before adding new HTML to the system, but I still want to ensure that HTML with an incorrect structure is not inserted.
I found an article, "C# Validate XHTML", with source code and decided to use it as a starting point. Unfortunately, quite a few things in the original code didn't work as I expected or wanted, so I had to spend much more time changing it than I originally thought. Thanks to Sam Allen for his very responsive answers.
I've put the source code of this class at http://geekswithblogs.net/mnf/archive/2011/06/01/htmlvalidator-class.aspx and am considering adding it to CodePlex at http://htmlvalidator.codeplex.com/
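To illustrate the "too strict" end of the spectrum mentioned above, here is a minimal sketch that treats a fragment as XML and reports whether it is well-formed. It is not the code from the HtmlValidator class linked above; the class name FragmentCheck and the method IsWellFormedXmlFragment are hypothetical names chosen for this example. It uses only the standard System.Xml reader.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration only; not the HtmlValidator class.
public static class FragmentCheck
{
    // Returns true if the text parses as a well-formed XML fragment.
    // On failure, 'error' carries the parser's message (with line/position).
    public static bool IsWellFormedXmlFragment(string html, out string error)
    {
        var settings = new XmlReaderSettings
        {
            // Fragment mode allows several top-level elements and top-level text,
            // so the input does not need a single root element.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(html ?? string.Empty), settings))
            {
                // Reading to the end forces the parser to check every tag.
                while (reader.Read()) { }
            }
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            // Thrown for unclosed or mismatched tags, unquoted attributes,
            // undeclared entities such as &nbsp;, and so on.
            error = ex.Message;
            return false;
        }
    }
}
```

With this sketch, "<div><p>text</p></div>" passes and "<div><p>text</div>" fails, which is the kind of unclosed tag we want to catch. The price is that perfectly normal HTML such as "<br>" or "&nbsp;" is rejected too, so a check like this only suits fragments that are already written in XHTML style.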
Other options that I've looked at:
http://www.webpronews.com/using-the-api-for-the-wc-html-validator-2006-11 - useless, because the download link is broken.
http://www.blackbeltcoder.com/Articles/strings/parsing-html-tags-in-c (CodeProject has a slightly older version of the same article: http://www.codeproject.com/Articles/57176/Parsing-HTML-Tags-in-Csharp.aspx) - does parsing, but not validation.
http://social.msdn.microsoft.com/Forums/en/regexp/thread/6aebedb1-9dc2-468b-9bb4-a1ecda3d0311 How to write a regular expression to validate text input as Html? - regex isn't the best solution for parsing HTML.
W3C Markup Validator library in C# http://sourceforge.net/projects/w3cmarkupvalida/
If I understood it correctly, it assumes that the file is a valid XML document. It also has a restrictive GNU license.
Free Broken Links Validators
W3C Link Checker
Worried about broken links in your web documents? This online validator from the W3 Consortium is able to recursively check your document for dead links. You simply enter a URL in the form provided, and it will visit your site and check the links.
It doesn't validate frame pages from a frameset and is quite slow, e.g. my site's main page was processed in 264.04 seconds.
Xenu Link Sleuth: Find Broken Links
Xenu is a utility for Windows that checks your web site for broken links. It can work both with a “live” website as well as on a copy of your web site residing on your own hard disk. It’s a favourite of many webmasters for checking broken links on their site.
I haven't tried it yet.