Scientific and Numeric

Python is widely used for scientific and numeric computing. Libraries such as NumPy, SciPy, Pandas, matplotlib, and IPython make many problems in this field easier to solve, and the Python ecosystem here continues to grow quickly. The Python language itself focuses on productivity and readability.

Why Python for scientific and numeric computing?

  • Very clear, readable syntax through whitespace indentation
  • Strong introspection capabilities
  • Full modularity, supporting hierarchical packages
  • Exception-based error handling
  • Interactive Python console
  • Dynamic data types & automatic memory management
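Several of the features listed above can be seen in a few lines of standard-library Python: introspection, exception-based error handling, and dynamic typing. This is an illustrative sketch only; `hypotenuse` and `safe_sqrt` are hypothetical helpers, not part of any library.

```python
import inspect
import math


def hypotenuse(a, b):
    """Return the length of the hypotenuse of a right triangle."""
    return math.sqrt(a * a + b * b)


def safe_sqrt(x):
    """Return the square root of x, or NaN if x is negative (hypothetical helper)."""
    try:
        return math.sqrt(x)
    except ValueError:
        # math.sqrt signals invalid input by raising ValueError ("math domain error").
        return float("nan")


# Introspection: functions are objects whose signature and docstring can be read at runtime.
signature = str(inspect.signature(hypotenuse))
doc = inspect.getdoc(hypotenuse)

# Dynamic typing: the same function accepts ints and floats with no type declarations.
int_result = hypotenuse(3, 4)
float_result = hypotenuse(5, 12.0)

print(signature)       # (a, b)
print(int_result)      # 5.0
print(float_result)    # 13.0
print(safe_sqrt(-1))   # nan
```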

Natural Language Processing

Libraries for working with human languages.

  • NLTK - A leading platform for building Python programs to work with human language data.
  • jieba - Chinese word segmentation utilities.
  • langid.py - Stand-alone language identification system.
  • Pattern - A web mining module for Python. It has tools for natural language processing and machine learning, among others.
  • SnowNLP - A library for processing Chinese text.
  • TextBlob - Provides a consistent API for diving into common NLP tasks. Stands on the giant shoulders of NLTK and Pattern.
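Tokenizing text and counting word frequencies is one of the basic tasks these libraries handle. The sketch below does it with the standard library only, assuming English text and a deliberately naive regular-expression tokenizer (`TOKEN_PATTERN`, `tokenize`, and `top_words` are hypothetical names). NLTK and TextBlob provide far more robust tokenizers, and languages without spaces between words, such as Chinese, need dedicated segmenters like jieba.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple tokenizer: runs of lowercase letters and apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def top_words(text, n=3):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(n)


sample = "The cat sat on the mat. The mat was flat."
print(tokenize(sample)[:4])   # ['the', 'cat', 'sat', 'on']
print(top_words(sample, 2))   # [('the', 3), ('mat', 2)]
```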

Science and Data Analysis

Libraries for scientific computing and data analysis.

  • astropy - A community Python library for Astronomy.
  • bcbio-nextgen - A toolkit providing best-practice pipelines for fully automated high-throughput sequencing analysis.
  • bccb - Collection of useful code related to biological analysis.
  • Biopython - Biopython is a set of freely available tools for biological computation.
  • blaze - NumPy and Pandas interface to Big Data.
  • cclib - A library for parsing and interpreting the results of computational chemistry packages.
  • NetworkX - High-productivity software for complex networks.
  • Numba - A just-in-time (JIT) compiler for Python, built on LLVM and aimed at scientific Python, by the developers of Cython and NumPy.
  • NumPy - A fundamental package for scientific computing with Python.
  • Open Babel - A chemical toolbox designed to speak the many languages of chemical data.
  • Open Mining - Business Intelligence (BI) in Python (Pandas web interface).
  • orange - Data mining, data visualization, analysis and machine learning through visual programming or Python scripting.
  • Pandas - A library providing high-performance, easy-to-use data structures and data analysis tools.
  • PyDy - Short for Python Dynamics; assists with the workflow of modeling dynamic motion, built around NumPy, SciPy, IPython, and matplotlib.
  • PyMC - Markov Chain Monte Carlo sampling toolkit.
  • RDKit - Cheminformatics and Machine Learning Software.
  • SciPy - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
  • statsmodels - Statistical modeling and econometrics in Python.
  • SymPy - A Python library for symbolic mathematics.
  • zipline - A Pythonic algorithmic trading library.
  • FEniCS - The FEniCS Project is a collection of free software for the automated, efficient solution of differential equations.
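Much of this stack (SciPy, Pandas, and others) is built on NumPy arrays, whose operations apply to whole arrays at once instead of through explicit Python loops. A minimal sketch, assuming NumPy is installed; the data values are made up for illustration.

```python
import numpy as np

# Element-wise arithmetic on whole arrays, without explicit Python loops.
x = np.linspace(0.0, 1.0, 5)   # [0.0, 0.25, 0.5, 0.75, 1.0]
y = 2.0 * x + 1.0              # [1.0, 1.5, 2.0, 2.5, 3.0]

# Descriptive statistics as array methods.
mean_y = y.mean()              # 2.0
std_y = y.std()                # population standard deviation (ddof=0)

# Solve the linear system A @ v = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
v = np.linalg.solve(A, b)      # approximately [2.0, 3.0]

print(mean_y, std_y, v)
```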

Data Visualization

Libraries for visualizing data. See: awesome-javascript.

  • matplotlib - A Python 2D plotting library.
  • bokeh - An interactive Python visualization library that targets modern web browsers for presentation. It aims to provide elegant, concise construction of novel graphics in the style of D3.js, with high-performance interactivity over very large or streaming datasets.
  • ggplot - A plotting library with the same API as ggplot2 for R.
  • plotly - Collaborative web plotting for Python and matplotlib.
  • pygal - A Python SVG chart creator.
  • pygraphviz - Python interface to Graphviz.
  • PyQtGraph - Interactive and realtime 2D/3D/Image plotting and science/engineering widgets.
  • vincent - A Python to Vega translator.
  • VisPy - High-performance scientific visualization based on OpenGL.
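As an example of the most common entry point, matplotlib can draw a simple line plot and save it to an image file. This is a minimal sketch assuming matplotlib is installed; the non-interactive "Agg" backend is selected so no display is needed, and the output filename `squares.png` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt

xs = [0, 1, 2, 3, 4]
ys = [x * x for x in xs]

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(xs, ys, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A minimal matplotlib plot")
ax.legend()
fig.tight_layout()
fig.savefig("squares.png", dpi=100)  # hypothetical output filename
plt.close(fig)                       # release the figure's memory
```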

Machine Learning

Libraries for Machine Learning. See: awesome-machine-learning.

  • Crab - A flexible, fast recommender engine.
  • gensim - Topic Modelling for Humans.
  • hebel - GPU-Accelerated Deep Learning Library in Python.
  • NuPIC - Numenta Platform for Intelligent Computing.
  • pattern - Web mining module for Python.
  • PyBrain - Another Python Machine Learning Library.
  • Pylearn2 - A Machine Learning library based on Theano.
  • python-recsys - A Python library for implementing a Recommender System.
  • scikit-learn - A Python module for machine learning built on top of SciPy.
  • vowpal_porpoise - A lightweight Python wrapper for Vowpal Wabbit.
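To show what the classifiers in these libraries do underneath, here is a from-scratch k-nearest-neighbours classifier using only the standard library (Python 3.8+ for `math.dist`). It is a teaching sketch, not any listed library's API: `knn_predict` is a hypothetical name and the toy data are made up. scikit-learn ships a production implementation as `sklearn.neighbors.KNeighborsClassifier`.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Majority vote over the labels of the k closest points.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy 2-D data set: two well-separated clusters (made-up illustrative values).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.2, 1.4)))  # a
print(knn_predict(points, labels, (8.8, 8.1)))  # b
```

A real library adds spatial indexes (such as k-d trees) so that queries do not have to compute the distance to every training point, as this sketch does.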

MapReduce

Frameworks and libraries for MapReduce.

  • dpark - A Python clone of Spark, a MapReduce-like framework in Python.
  • dumbo - Python module that allows one to easily write and run Hadoop programs.
  • luigi - A module that helps you build complex pipelines of batch jobs.
  • mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services.
  • PySpark - The Spark Python API.
  • streamparse - Run Python code against real-time streams of data. Integrates with Apache Storm.
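The MapReduce programming model splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that combines each group. Frameworks such as mrjob, dumbo, and PySpark run these phases across a cluster; the in-process word-count sketch below only illustrates the model, and its function names are hypothetical, not any of those libraries' APIs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents in a single process."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["the quick brown fox", "the lazy dog", "The fox"]
counts = word_count(docs)
print(counts["the"], counts["fox"])  # 3 2
```

Because each call to `map_phase` depends only on its own document and each call to `reduce_phase` only on its own key, a framework can run them in parallel on different machines.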

Tutorials

Books/Resources

Online courses/Videos

Contributors

The following people helped in creating the above content.