I have a string of HTML stored in a database. Unfortunately it contains characters such as ® I want to replace these characters by their HTML equivalent, either in the DB itself or using a Find Replace in my Python / Django code.
Any suggestions on how I can do this?
You can use that the ASCII characters are the first 128 ones, so get the number of each character with
ord and strip it if it's out of range
# -*- coding: utf-8 -*- def strip_non_ascii(string): ''' Returns the string without non ASCII characters''' stripped = (c for c in string if 0 < ord(c) < 127) return ''.join(stripped) test = u'éáé123456tgreáé@€' print test print strip_non_ascii(test)
Please note that
@ is included because, well, after all it's an ASCII character. If you want to strip a particular subset (like just numbers and uppercase and lowercase letters), you can limit the range looking at a ASCII table
EDITED: After reading your question again, maybe you need to escape your HTML code, so all those characters appears correctly once rendered. You can use the
escape filter on your templates.