I’ve tested every exploit I know of against HTML Purifier, and it held up very well. It filters not only HTML but also CSS and URLs.
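Parsing URLs can be tricky. For example, both of these are valid: http://spoof.com:email@example.com (here spoof.com:email is the user name and password, and the real host is example.com) and //evil.com (a protocol-relative URL that points to an external host). Internationalized domain names (IDN) can also be written in two ways: Unicode and punycode.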
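To make those URL pitfalls concrete, here is a minimal sketch using Python's standard library rather than HTML Purifier itself. The helper names `describe_url` and `to_punycode` are made up for illustration, and `bücher.example` is just a sample host. Note that Python's built-in `idna` codec follows the older IDNA 2003 rules, so treat the conversion as illustrative only.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL and return the parts a filter should actually inspect."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # empty string for protocol-relative URLs
        "username": parts.username,  # text before ':' in the userinfo part
        "password": parts.password,  # text after ':' in the userinfo part
        "hostname": parts.hostname,  # the host that is really contacted
    }


def to_punycode(hostname):
    """Return the ASCII (punycode) form of a Unicode hostname."""
    return hostname.encode("idna").decode("ascii")


print(describe_url("http://spoof.com:email@example.com"))
print(describe_url("//evil.com"))
print(to_punycode("bücher.example"))
```

The point is that a filter has to compare the parsed host, not the raw string: the first URL starts with "spoof.com" but resolves to example.com, the second has no scheme yet still leaves your site, and the same domain can show up in either its Unicode or its `xn--` form.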
Go with HTML Purifier: it already handles most of these cases. If you just want to fix broken HTML, use HTML Tidy instead (it’s available as a PHP extension).