Next Steps

In this section, we’ve introduced the concept of scientific computing along with two very popular packages: NumPy and pandas. These packages are powerful, but they only get you to the point of working with tabular data in Python. They are great for managing, manipulating, and summarizing data. However, they won’t help you analyze your data, and they only scratch the surface of visualizing it.

The goal of this last chapter is simply to point you in the direction of where you could learn more about analyzing data in Python.

Data Visualization

There are many ways to visualize data. We saw that pandas has some built-in plotting capabilities; however, they are limited. To this end, we’ll point you in the direction of three additional packages for plotting and visualizing data in Python:

  1. matplotlib | This package is well-established and used ubiquitously. It allows for generation and customization of data visualizations and publication-quality plots.

  2. seaborn | This package is built on top of matplotlib but has a high-level interface for generating good-looking plots with less code and time than matplotlib. Works well with DataFrames from pandas.

  3. altair | Implements the grammar of graphics in Python, enabling a consistent and simple API. Not quite as popular as the two above, but incredibly powerful!
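As a first taste of what plotting with one of these packages looks like, here is a minimal sketch using matplotlib with a pandas DataFrame. The data, the column names (`hours_studied`, `exam_score`), and the output filename `scores.png` are all made up for illustration; this is one simple way to draw a scatter plot, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented purely for this example
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 58, 65, 70, 78, 85],
})

# Create a figure with a single set of axes
fig, ax = plt.subplots(figsize=(5, 4))

# Draw one point per row of the DataFrame
ax.scatter(df["hours_studied"], df["exam_score"], color="tab:blue")

# Label the plot so it can be read on its own
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
ax.set_title("Exam score vs. hours studied")

# Write the figure to an image file (the filename is an arbitrary choice)
fig.tight_layout()
fig.savefig("scores.png", dpi=150)
```

In a Jupyter notebook you would typically skip the `matplotlib.use("Agg")` line and let the figure display inline instead of saving it.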

Data Analysis

  1. statsmodels | This package is the go-to for statistical modeling, enabling users to explore data and carry out statistical tests. Works well with DataFrames from pandas.

  2. SciPy | Includes algorithms for optimization, differential equations, algebraic equations, etc. Builds on top of NumPy and works well with numerical data.

  3. scikit-learn | Popular package for machine learning in Python, and a good place to start learning it. Built on NumPy, SciPy, and matplotlib. Has really great documentation.

  4. PyTorch | Ecosystem for building production-ready machine learning models; a widely used alternative to TensorFlow.
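To give a sense of what a first analysis step might look like, the sketch below fits a simple linear regression with SciPy's `scipy.stats.linregress` on two NumPy arrays. The numbers and variable names are invented for illustration, and a real analysis would involve checking assumptions and exploring the data first.

```python
import numpy as np
from scipy import stats

# Hypothetical data, invented purely for this example
hours_studied = np.array([1, 2, 3, 4, 5, 6])
exam_score = np.array([52, 58, 65, 70, 78, 85])

# Fit a straight line: exam_score ~ intercept + slope * hours_studied
result = stats.linregress(hours_studied, exam_score)

# The result object carries the fitted parameters and some summary statistics
print(f"slope:     {result.slope:.3f}")
print(f"intercept: {result.intercept:.3f}")
print(f"r-squared: {result.rvalue ** 2:.3f}")
print(f"p-value:   {result.pvalue:.4g}")
```

For richer model summaries (coefficient tables, confidence intervals, formula-style model specification), statsmodels is usually the more natural next stop.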