How to implement common bash idioms in Python?


Question

I currently do my textfile manipulation through a bunch of badly remembered AWK, sed, Bash and a tiny bit of Perl.

I've seen mentioned a few places that python is good for this kind of thing. How can I use Python to replace shell scripting, AWK, sed and friends?

1
242
4/13/2018 11:58:59 PM

Accepted Answer

Any shell has several sets of features.

  • The Essential Linux/Unix commands. All of these are available through the subprocess library. This isn't always the best first choice for doing all external commands. Look also at shutil for some commands that are separate Linux commands, but you could probably implement directly in your Python scripts. Another huge batch of Linux commands are in the os library; you can do these more simply in Python.

    And -- bonus! -- more quickly. Each separate Linux command in the shell (with a few exceptions) forks a subprocess. By using Python shutil and os modules, you don't fork a subprocess.

  • The shell environment features. This includes stuff that sets a command's environment (current directory and environment variables and what-not). You can easily manage this from Python directly.

  • The shell programming features. This is all the process status code checking, the various logic commands (if, while, for, etc.) the test command and all of it's relatives. The function definition stuff. This is all much, much easier in Python. This is one of the huge victories in getting rid of bash and doing it in Python.

  • Interaction features. This includes command history and what-not. You don't need this for writing shell scripts. This is only for human interaction, and not for script-writing.

  • The shell file management features. This includes redirection and pipelines. This is trickier. Much of this can be done with subprocess. But some things that are easy in the shell are unpleasant in Python. Specifically stuff like (a | b; c ) | something >result. This runs two processes in parallel (with output of a as input to b), followed by a third process. The output from that sequence is run in parallel with something and the output is collected into a file named result. That's just complex to express in any other language.

Specific programs (awk, sed, grep, etc.) can often be rewritten as Python modules. Don't go overboard. Replace what you need and evolve your "grep" module. Don't start out writing a Python module that replaces "grep".

The best thing is that you can do this in steps.

  1. Replace AWK and PERL with Python. Leave everything else alone.
  2. Look at replacing GREP with Python. This can be a bit more complex, but your version of GREP can be tailored to your processing needs.
  3. Look at replacing FIND with Python loops that use os.walk. This is a big win because you don't spawn as many processes.
  4. Look at replacing common shell logic (loops, decisions, etc.) with Python scripts.
144
10/22/2015 6:40:38 AM

Yes, of course :)

Take a look at these libraries which help you Never write shell scripts again (Plumbum's motto).

Also, if you want to replace awk, sed and grep with something Python based then I recommend pyp -

"The Pyed Piper", or pyp, is a linux command line text manipulation tool similar to awk or sed, but which uses standard python string and list methods as well as custom functions evolved to generate fast results in an intense production environment.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon