Remove non-ASCII characters from a string using Python / Django


I have a string of HTML stored in a database. Unfortunately, it contains characters such as ®. I want to replace these characters with their HTML equivalents, either in the DB itself or with a find-and-replace in my Python / Django code.

Any suggestions on how I can do this?

11/22/2015 1:34:26 AM

ASCII characters are the first 128 code points (0 to 127), so you can get the code point of each character with ord() and drop the character if it falls outside that range:

# -*- coding: utf-8 -*-

def strip_non_ascii(string):
    ''' Returns the string without non-ASCII characters'''
    # ASCII covers code points 0 through 127 inclusive
    stripped = (c for c in string if ord(c) < 128)
    return ''.join(stripped)

test = u'éáé123456tgreáé@€'
print(test)
print(strip_non_ascii(test))



Note that @ is kept because it is, after all, an ASCII character. If you want to keep only a particular subset (for example, just digits and uppercase and lowercase letters), you can narrow the allowed range by looking at an ASCII table.
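
To restrict the result to letters and digits only, one option is to test membership in an explicit set of allowed characters instead of comparing code points. This is only a sketch: the helper name keep_alphanumeric and the ALLOWED set are made up for illustration.

```python
import string

# Hypothetical allow-list: ASCII letters and digits only
ALLOWED = set(string.ascii_letters + string.digits)

def keep_alphanumeric(text):
    """Return text with every character outside ALLOWED removed."""
    return ''.join(c for c in text if c in ALLOWED)

print(keep_alphanumeric(u'éáé123456tgreáé@€'))  # prints 123456tgre
```

Here @ is dropped as well, because it is not in the allowed set.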

EDITED: After reading your question again, maybe you need to escape your HTML code so that all those characters appear correctly once rendered. You can use the escape filter in your templates.
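
Django's escape filter targets HTML-special characters such as <, > and &. If the goal is instead to turn non-ASCII characters like ® into HTML character references while leaving the markup untouched, a possible approach is Python's built-in xmlcharrefreplace error handler. This is a minimal sketch; the function name to_html_entities is hypothetical, and it produces numeric references (&#174;) rather than named ones (&reg;).

```python
def to_html_entities(text):
    """Replace each non-ASCII character with a numeric character reference."""
    # Encoding to ASCII with 'xmlcharrefreplace' swaps every character that
    # cannot be encoded for '&#NNN;'; decoding turns the bytes back into str.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')

print(to_html_entities(u'Acme® <b>café</b>'))  # prints Acme&#174; <b>caf&#233;</b>
```

Because the existing tags are plain ASCII, they pass through unchanged, so the same function could be applied to the stored HTML before saving it back to the database.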

4/30/2010 8:25:11 AM

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow