Counting repeated characters in a string in Python


Question

I want to count the number of times each character is repeated in a string. Is there any particular way to do it apart from comparing each character of the string from A-Z and incrementing a counter?

Update (in reference to Anthony's answer): Whatever you have suggested till now I have to write 26 times. Is there an easier way?

1
41
5/23/2017 10:30:52 AM

Accepted Answer

My first idea was to do this:

chars = "abcdefghijklmnopqrstuvwxyz"
check_string = "i am checking this string to see how many times each character appears"

for char in chars:
  count = check_string.count(char)
  if count > 1:
    print char, count

This is not a good idea, however! This is going to scan the string 26 times, so you're going to potentially do 26 times more work than some of the other answers. You really should do this:

count = {}
for s in check_string:
  if count.has_key(s):
    count[s] += 1
  else:
    count[s] = 1

for key in count:
  if count[key] > 1:
    print key, count[key]

This ensures that you only go through the string once, instead of 26 times.

Also, Alex's answer is a great one - I was not familiar with the collections module. I'll be using that in the future. His answer is more concise than mine is and technically superior. I recommend using his code over mine.

34
6/13/2009 8:10:32 PM

import collections

d = collections.defaultdict(int)
for c in thestring:
    d[c] += 1

A collections.defaultdict is like a dict (subclasses it, actually), but when an entry is sought and not found, instead of reporting it doesn't have it, it makes it and inserts it by calling the supplied 0-argument callable. Most popular are defaultdict(int), for counting (or, equivalently, to make a multiset AKA bag data structure), and defaultdict(list), which does away forever with the need to use .setdefault(akey, []).append(avalue) and similar awkward idioms.

So once you've done this d is a dict-like container mapping every character to the number of times it appears, and you can emit it any way you like, of course. For example, most-popular character first:

for c in sorted(d, key=d.get, reverse=True):
  print '%s %6d' % (c, d[c])

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon