Packages


Python packages

  • Palladium

    • It is a pluggable framework for developing real-world machine learning solutions. It provides generic implementations for things commonly needed in machine learning, such as dataset loading, model training with parameter search, a web service, and persistence capabilities, allowing you to concentrate on the core task of developing an accurate machine learning model.
    • Built on scikit-learn (sklearn)
  • Nashpy

  • gc
    • This standard-library module provides an interface to the optional garbage collector. It lets you disable the collector, tune the collection frequency, and set debugging options.

  • pathlib2

  • dabl (Data Analysis Baseline Library)

  • missingno

  • AutoViz

  • Bamboolib

  • FlashText

  • PyFlux

  • Numerizer

  • Emot
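
The gc entry in the list above mentions disabling the collector and tuning its frequency. A minimal sketch of those calls, using only the standard library, might look like this (the `Node` class and the threshold values are made up for illustration):

```python
import gc

# Remember the current settings so they can be restored afterwards.
was_enabled = gc.isenabled()
old_thresholds = gc.get_threshold()

gc.disable()                    # turn off automatic collection
gc.set_threshold(1000, 15, 15)  # tune collection frequency (illustrative values)


class Node:
    """Hypothetical object that refers to itself, forming a reference cycle."""

    def __init__(self):
        self.ref = self


# Reference counting alone cannot free these; only the cycle collector can.
for _ in range(100):
    Node()

unreachable = gc.collect()      # run a full collection manually
print(f"collector found {unreachable} unreachable objects")

# Restore the original configuration.
gc.set_threshold(*old_thresholds)
if was_enabled:
    gc.enable()
```

Debugging output can be switched on with `gc.set_debug(gc.DEBUG_STATS)` and off again with `gc.set_debug(0)`.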

Useful Python libraries for data science


Machine learning


Data versioning

  • DVC
  • Quilt: a versioned data portal for AWS

Data visualization

  • PyLab is a module that belongs to the Python plotting library Matplotlib. It combines the numerical module numpy with the graphical plotting module pyplot in a single namespace. PyLab was designed with the interactive Python interpreter in mind, and therefore many of its functions are short and require minimal typing.

  • Pydicom is a pure Python package for working with DICOM files such as medical images, reports, and radiotherapy objects.

  • imgaug

  • pillow


Data science pipeline

  • pypeln: creating concurrent data pipelines
  • PyFunctional: makes creating data pipelines easy by using chained functional operators
  • Joblib: running Python functions as pipeline jobs
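
The idea these libraries share, passing items through a chain of stages that may run concurrently, can be sketched with the standard library alone. This is not the API of pypeln, PyFunctional, or Joblib; the stage functions `parse` and `square` and the sample records are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def parse(line):
    """Stage 1: turn a raw text record into an integer."""
    return int(line.strip())


def square(value):
    """Stage 2: transform the parsed value."""
    return value * value


raw_records = [" 1", "2 ", "3", " 4 ", "5"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map runs the calls in worker threads and yields results
    # in input order, so the stages can be chained like ordinary maps.
    parsed = pool.map(parse, raw_records)
    squared = list(pool.map(square, parsed))

# Final stage: an ordinary filter on the collected results.
even_squares = [v for v in squared if v % 2 == 0]

print(squared)
print(even_squares)
```

The dedicated libraries add what this sketch lacks, such as per-stage worker counts, process-based parallelism, and lazy chaining syntax.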

How to manage application dependencies?

  • Poetry

  # Install with the official installer (the older get-poetry.py script is deprecated)
  $ curl -sSL https://install.python-poetry.org | python3 -

  # Make the poetry command available in your current shell
  $ export PATH="$HOME/.local/bin:$PATH"

  • pigar: generate requirements.txt from a project's source

  # To install it with pip
  $ [sudo] pip install pigar
  
  # To install it with conda
  $ conda install -c conda-forge pigar

  # Generate requirements.txt for current directory.
  $ pigar

  # Generate requirements.txt for a given directory and write it to a given file.
  $ pigar -p ../dev-requirements.txt -P ../
  • Manage dependencies for Python projects using Pipenv

Pipenv is a dependency manager for Python projects. If you’re familiar with Node.js’ npm or Ruby’s bundler, it is similar in spirit to those tools. While pip alone is often sufficient for personal use, Pipenv is recommended for collaborative projects as it’s a higher-level tool that simplifies dependency management for common use cases.

  $ pip install --user pipenv
  $ cd myproject
  $ pipenv install requests
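
Tools such as pigar build a requirements file by looking at what a project actually imports. A rough sketch of that first step, using only the standard-library `ast` module, is shown below. It is not pigar's implementation; `top_level_imports` and the sample source are hypothetical, and mapping module names to package names and versions (for example `sklearn` to `scikit-learn`) is left out.

```python
import ast


def top_level_imports(source):
    """Return the sorted top-level module names imported by `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import (from . import x); skip those,
            # since they refer to the project itself, not to a dependency.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return sorted(modules)


example_source = """
import os
import numpy as np
from sklearn.linear_model import LinearRegression
from . import helpers
"""

found = top_level_imports(example_source)
print(found)
```

A real tool would also drop standard-library modules such as `os` from the result before writing the file.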

How to integrate R and Python in a single workflow?


How to operate and manipulate physical quantities in Python?

Pint is a Python package to define, operate and manipulate physical quantities: the product of a numerical value and a unit of measurement. It allows arithmetic operations between them and conversions from and to different units.

   # Install
   $ pip install pint

   >>> import pint
   >>> ureg = pint.UnitRegistry()
  
   # 1) Add quantities given in different units (meters and centimeters)
   >>> 3 * ureg.meter + 4 * ureg.cm
   <Quantity(3.04, 'meter')>
   
   # 2) Or use numpy 
   >>> import numpy as np
   >>> [3, 4] * ureg.meter + [4, 3] * ureg.cm
   <Quantity([ 3.04  4.03], 'meter')>

   >>> np.sum(_)
   <Quantity(7.07, 'meter')>
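
To make the idea of a quantity as "numerical value times unit" concrete, here is a toy sketch in plain Python. It is not how Pint is implemented: the `Length` class and the `TO_METER` table are hypothetical and cover only three length units.

```python
from dataclasses import dataclass

# Conversion factors to the base unit (meter). Only lengths are covered here.
TO_METER = {"meter": 1.0, "cm": 0.01, "km": 1000.0}


@dataclass(frozen=True)
class Length:
    value: float
    unit: str

    def to(self, unit):
        """Convert to another unit by going through the base unit."""
        meters = self.value * TO_METER[self.unit]
        return Length(meters / TO_METER[unit], unit)

    def __add__(self, other):
        # Convert the right operand first; the result keeps the unit of the
        # left operand, matching the 3 m + 4 cm example above.
        return Length(self.value + other.to(self.unit).value, self.unit)


total = Length(3, "meter") + Length(4, "cm")
print(total)
print(total.to("cm"))
```

Pint generalizes this with a full unit registry, dimensional checks, and numpy support, so adding a length to a time raises an error instead of silently producing a number.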

How to do geospatial operations?

A library that provides basic geospatial operations such as distance calculation and conversion of decimal coordinates to sexagesimal and vice versa. The library is currently 2D: altitude/elevation is not yet supported by any of its functions.
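
The two operations named above can be sketched in plain Python. This is an illustration of the underlying arithmetic, not the library's API: the function names are hypothetical, the distance uses the haversine formula on a sphere with an assumed mean Earth radius, and, like the library, it ignores altitude.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def decimal_to_sexagesimal(decimal_degrees):
    """Split decimal degrees into (sign, degrees, minutes, seconds)."""
    sign = -1 if decimal_degrees < 0 else 1
    total_seconds = abs(decimal_degrees) * 3600
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return sign, int(degrees), int(minutes), seconds


def sexagesimal_to_decimal(sign, degrees, minutes, seconds):
    """Inverse of decimal_to_sexagesimal."""
    return sign * (degrees + minutes / 60 + seconds / 3600)


# One degree of longitude along the equator.
print(haversine_distance(0.0, 0.0, 0.0, 1.0))
print(decimal_to_sexagesimal(51.5))
```

Keeping the sign separate from the degrees avoids losing it for coordinates between -1 and 0 degrees, where the integer degree part is zero.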


Data science glossary