Changeset 3ac1078 in unitok


Timestamp: Jun 23, 2015, 12:33:00 PM
Author: Jan Michelfeit <jan.michelfeit@…>
Branches: master
Parents: c351917
Message: Fix hexadecimal HTML entities
File: 1 edited

  • unitok/uninorm.py

    --- unitok/uninorm.py (r166252d)
    +++ unitok/uninorm.py (r3ac1078)
    @@ -5,5 +5,5 @@
     from htmlentitydefs import name2codepoint
     
    -HTMLENTITY_RE = re.compile(ur"&(#x?[0-9]+|\w+);")
    +HTMLENTITY_RE = re.compile(ur"&(#x?[0-9A-F]+|\w+);")
     SPECIAL_ENTITIES = [u'gt', u'lt', u'quot']
     def entity2unicode(mo, dont_convert):
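To see what the one-line change does in practice, here is a minimal sketch. It is not code from unitok.

The sketch makes three assumptions:
- The old and new HTMLENTITY_RE patterns are ported to Python 3 raw strings, because the ur"..." prefix is Python 2 only.
- A hypothetical decode_entity callback stands in for entity2unicode. The changeset does not show that function's body, and the sketch ignores its dont_convert argument.
- CI_RE is a hypothetical case-insensitive variant added only for comparison.

```python
import re
from html.entities import name2codepoint

# Old and new HTMLENTITY_RE patterns from the changeset, ported to Python 3
# raw strings (the original ur"..." literals are Python 2 only).
OLD_RE = re.compile(r"&(#x?[0-9]+|\w+);")
NEW_RE = re.compile(r"&(#x?[0-9A-F]+|\w+);")
# Hypothetical variant, not part of the changeset: also accepts lowercase hex.
CI_RE = re.compile(NEW_RE.pattern, re.IGNORECASE)


def decode_entity(mo):
    """Hypothetical re.sub callback; the real entity2unicode body is not shown."""
    ent = mo.group(1)
    try:
        if ent[:2].lower() == "#x":
            return chr(int(ent[2:], 16))  # hexadecimal reference, e.g. &#x4E;
        if ent.startswith("#"):
            return chr(int(ent[1:]))      # decimal reference, e.g. &#65;
    except (ValueError, OverflowError):
        return mo.group(0)                # malformed or out-of-range number
    if ent in name2codepoint:
        return chr(name2codepoint[ent])   # named entity, e.g. &amp;
    return mo.group(0)                    # unknown name: leave it as-is


sample = "&#x41;&#x4E;&#x4e;&#65;&amp;"
print(OLD_RE.sub(decode_entity, sample))  # A&#x4E;&#x4e;A&
print(NEW_RE.sub(decode_entity, sample))  # AN&#x4e;A&
print(CI_RE.sub(decode_entity, sample))   # ANNA&
```

With the old pattern, &#x4E; is never matched, because [0-9]+ cannot consume the hex digit E before the semicolon.

The new pattern fixes this for uppercase hex digits only. Two forms still fall through:
- lowercase digits, such as &#x4e;
- the uppercase &#X prefix

Compiling the pattern with re.IGNORECASE, as in the hypothetical CI_RE, would cover both.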