Scientific Computing
Scientific computing extends everything we’ve learned thus far, with the specific goal of using computing to work with data. A host of tools has been developed within the Python ecosystem to support the needs and goals of individuals working with data, including a suite of packages that work well together for handling and analyzing data. These packages are not made by the same people who manage the Python language and standard library. However, they are incredibly popular, maintained and supported by teams of developers, and regularly updated.
The most central tool in scientific computing (and what we’ll focus on most in this section) is a package called pandas. pandas is a “fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.” [1] Wes McKinney originally developed the pandas package in 2008, with its first official release in 2009 [2], helping to cement Python as one of the primary languages for analyzing data, a core task for data scientists and researchers alike.
We’ll also introduce numpy, which enables matrix operations in Python and upon which pandas is built, along with a few additional packages for future exploration. The goal of this section is not to teach these packages in depth (many resources do this [3]) but rather to introduce the main functions necessary to get started when working with data.
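As a first taste of how the two packages relate, here is a minimal sketch that assumes both are already installed (see Package Installation below). The values and the column names "a" and "b" are made up for illustration. It multiplies a small matrix with numpy, wraps the same values in a pandas DataFrame, and then converts the DataFrame back to a numpy array.

```python
import numpy as np
import pandas as pd

# A 2x2 matrix stored as a NumPy array; @ performs matrix multiplication.
matrix = np.array([[1, 2], [3, 4]])
product = matrix @ matrix

# Wrap the same values in a pandas DataFrame, adding column labels.
# The labels "a" and "b" are hypothetical names chosen for this example.
df = pd.DataFrame(matrix, columns=["a", "b"])

# Converting back gives a NumPy array holding the DataFrame's values.
values = df.to_numpy()

print(product)
print(df)
print(type(values))
```

The DataFrame adds labels and data-analysis methods on top of the array; the numeric values themselves can always be retrieved as a numpy array.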
Package Installation
A reminder that since these packages are not part of the standard library, they will need to be installed prior to use. Remember, you install once and then import the package wherever needed after that.
The most straightforward approach to install these packages is to run pip install packagename. For example, to install pandas, run pip install pandas. (Note that if you do not have full install rights on your system, you may need to use the --user flag, for example: pip install --user pandas.)
If you’re using a different package management system, such as conda, you can install the packages used here with that system instead.
For this section, you’ll have to install both pandas and numpy to complete all included exercises and examples.
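If you are unsure whether the packages are already installed, a quick check from within Python can help. The following is a minimal sketch using only the standard library. The helper names missing_packages and pip_command are hypothetical and chosen for this example. It reports which of the required packages cannot be found and prints (but does not run) a pip command for the interpreter you are currently using.

```python
import importlib.util
import sys

# Packages this section relies on.
REQUIRED = ["pandas", "numpy"]


def missing_packages(names):
    """Return the names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_command(names, user=False):
    """Build (but do not run) a pip install command for this interpreter."""
    command = [sys.executable, "-m", "pip", "install"]
    if user:
        # Install into the user's own site-packages directory.
        command.append("--user")
    return command + list(names)


missing = missing_packages(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
    print("Install with:", " ".join(pip_command(missing)))
else:
    print("pandas and numpy are both installed.")
```

Invoking pip as a module of the running interpreter (python -m pip) is one way to make sure the packages end up in the same Python installation you will import them from.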
1. https://pandas.pydata.org/
2. https://en.wikipedia.org/wiki/Pandas_(software)
3. https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html