Fixed attribute parsing to tolerate more bad stuff (from Google..)
author Frederic Jolliton <frederic@jolliton.com>
Wed, 7 Sep 2005 21:44:20 +0000 (21:44 +0000)
committer Frederic Jolliton <frederic@jolliton.com>
Wed, 7 Sep 2005 21:44:20 +0000 (21:44 +0000)
 * Fixed attribute parsing to tolerate some badly formatted values
   (such as those that contain an '=' character without being
   quoted properly).

git-archimport-id: frederic@jolliton.com--2005-main/tx--main--0.1--patch-8
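
The effect of the change can be sketched in isolation. The two patterns below are simplified stand-ins for the unquoted-value alternative only, not the full reAttr expression from htmlparser.py, and the sample value is a hypothetical malformed attribute chosen for illustration.

```python
import re

# Simplified stand-ins for the unquoted-value alternative only; the real
# reAttr in htmlparser.py contains further alternatives not reproduced here.
OLD_UNQUOTED = re.compile(r'''[^'"\s=]+''')  # before the patch: stops at '='
NEW_UNQUOTED = re.compile(r'''[^'"\s]+''')   # after the patch: '=' is allowed

# Hypothetical unquoted attribute value that contains an '=' character.
value = 'search?q=foo'

old_match = OLD_UNQUOTED.match(value).group(0)
new_match = NEW_UNQUOTED.match(value).group(0)

print(old_match)  # search?q
print(new_match)  # search?q=foo
```

With the old character class the value is cut off at the first '=', so the rest of the attribute would be left for the parser to misread; with the new class the whole unquoted token is taken as the value.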

htmlparser.py

index 1e4053c..8e66dcf 100644
@@ -68,7 +68,7 @@ reAttr = re.compile(
        '('
        r'\s*=\s*'         # spaces then =
        '('
-       '[^\'"\\s=]+'      # anything but spaces and quote characters
+       '[^\'"\\s]+'       # anything but spaces and quote characters
        '|'
        '"[^"]*"'          # double-quote string
        '|'