How can I parse a C header file with Perl?


Question

I have a header file in which there is a large struct. I need to read this structure using some program and make some operations on each member of the structure and write them back.

For example I have some structure like

const BYTE Some_Idx[] = {
4,7,10,15,17,19,24,29,
31,32,35,45,49,51,52,54,
55,58,60,64,65,66,67,69,
70,72,76,77,81,82,83,85,
88,93,94,95,97,99,102,103,
105,106,113,115,122,124,125,126,
129,131,137,139,140,149,151,152,
153,155,158,159,160,163,165,169,
174,175,181,182,183,189,190,193,
197,201,204,206,208,210,211,212,
213,214,215,217,218,219,220,223,
225,228,230,234,236,237,240,241,
242,247,249};

Now, I need to read this and apply some operation on each of the member variable and create a new structure with different order, something like:

const BYTE Some_Idx_Mod_mul_2[] = {
8,14,20, ...
...
484,494,498};

Is there any Perl library already available for this? If not Perl, something else like Python is also OK.

Can somebody please help!!!

1
6
6/15/2009 5:03:53 PM

Accepted Answer

Keeping your data lying around in a header makes it trickier to get at using other programs like Perl. Another approach you might consider is to keep this data in a database or another file and regenerate your header file as-needed, maybe even as part of your build system. The reason for this is that generating C is much easier than parsing C, it's trivial to write a script that parses a text file and makes a header for you, and such a script could even be invoked from your build system.

Assuming that you want to keep your data in a C header file, you will need one of two things to solve this problem:

  • a quick one-off script to parse exactly (or close to exactly) the input you describe.
  • a general, well-written script that can parse arbitrary C and work generally on to lots of different headers.

The first case seems more common than the second to me, but it's hard to tell from your question if this is better solved by a script that needs to parse arbitrary C or a script that needs to parse this specific file. For code that works on your specific case, the following works for me on your input:

#!/usr/bin/perl -w

use strict;

open FILE, "<header.h" or die $!;
my @file = <FILE>;
close FILE or die $!;

my $in_block = 0;
my $regex = 'Some_Idx\[\]';
my $byte_line = '';
my @byte_entries;
foreach my $line (@file) {
    chomp $line;

    if ( $line =~ /$regex.*\{(.*)/ ) {
        $in_block = 1;
        my @digits = @{ match_digits($1) };
        push @digits, @byte_entries;
        next;
    }

    if ( $in_block ) {
        my @digits = @{ match_digits($line) };
        push @byte_entries, @digits;
    }

    if ( $line =~ /\}/ ) {
        $in_block = 0;
    }
}

print "const BYTE Some_Idx_Mod_mul_2[] = {\n";
print join ",", map { $_ * 2 } @byte_entries;
print "};\n";

sub match_digits {
    my $text = shift;
    my @digits;
    while ( $text =~ /(\d+),*/g ) {
        push @digits, $1;
    }

    return \@digits;
}

Parsing arbitrary C is a little tricky and not worth it for many applications, but maybe you need to actually do this. One trick is to let GCC do the parsing for you and read in GCC's parse tree using a CPAN module named GCC::TranslationUnit. Here's the GCC command to compile the code, assuming you have a single file named test.c:

gcc -fdump-translation-unit -c test.c

Here's the Perl code to read in the parse tree:

  use GCC::TranslationUnit;

  # echo '#include <stdio.h>' > stdio.c
  # gcc -fdump-translation-unit -c stdio.c
  $node = GCC::TranslationUnit::Parser->parsefile('stdio.c.tu')->root;

  # list every function/variable name
  while($node) {
    if($node->isa('GCC::Node::function_decl') or
       $node->isa('GCC::Node::var_decl')) {
      printf "%s declared in %s\n",
        $node->name->identifier, $node->source;
    }
  } continue {
    $node = $node->chain;
  }
9
6/16/2009 2:30:03 AM

Sorry if this is a stupid question, but why worry about parsing the file at all? Why not write a C program that #includes the header, processes it as required and then spits out the source for the modified header. I'm sure this would be simpler than the Perl/Python solutions, and it would be much more reliable because the header would be being parsed by the C compilers parser.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon