No question about it, Python is a crucial part of modern data science. Powerful and easy to use, Python connects data scientists and developers with a whole galaxy of tools and functionality, all accessible programmatically.
Still, those tools sometimes come with a little, or a lot, of assembly required. Because Python is a general-purpose programming language, the way it’s packaged and delivered isn’t tailored to data scientists. But several projects now deliver Python to that audience prepackaged, with little to no assembly required, and regular Python users can benefit from them, too.
Anaconda Inc.’s Anaconda distribution (the company was originally called Continuum Analytics) is a repackaging of Python aimed at developers who use Python for data science. It provides a management GUI, a slew of scientifically oriented work environments, and tools that simplify using Python for data crunching. It can also serve as a general replacement for the standard Python distribution, as long as you understand how and why it differs from the stock version of Python.
Anaconda comes in four editions, each aimed at a different audience and use case.
Anaconda Individual Edition
The free-to-use Individual Edition of Anaconda includes the core features found in all Anaconda editions: the Anaconda Navigator, Jupyter Notebooks, the Spyder IDE, and so on. (More on these later.) The Individual Edition is the best place to start with Anaconda, because it lets you gain experience with all of Anaconda’s major elements and how they behave.
Anaconda Commercial Edition
The Commercial Edition provides access to a package repository that has been curated for commercial use, with uptime guarantees. It is also the edition you must buy if you use Anaconda commercially (as opposed to for individual or academic research). Each seat license starts at US$14.95 per month.
Anaconda Team Edition
The Team Edition provides teams of developers with user management features, high-priority updates to packages, and fine-grained package controls (block/allow lists). It is licensed for commercial use, with prices beginning at US$10,000 for a team of five users for one year.
Anaconda Enterprise Edition
The Enterprise Edition is aimed at enterprises that want to develop machine learning models and deploy them into production. It therefore provides infrastructure for every stage of the machine learning lifecycle, such as containerisation for projects. Pricing is available only on request.
What’s included in Anaconda
CPython, the reference implementation of Python, includes a few things to make life easier: the standard library, the IDLE mini-IDE, and the Tkinter user-interface library. But everything you might need for data science, even the most basic tools, is an add-on. Anaconda, by contrast, tries to include a decent selection of data-science tools out of the box.
Here’s what’s included by default in Anaconda.
The Python interpreter
By default, Anaconda includes the most recent release of the Python interpreter. This is not the stock CPython build from the Python Software Foundation; it’s a custom build created by Anaconda Inc. specifically for the Anaconda distribution. According to Anaconda CTO Peter Wang, the interpreter has “more secure compiler flags on some platforms, better performance optimisations on others.”
That said, Anaconda’s Python interpreter should be drop-in compatible with CPython, so C extensions written for CPython should work as is. On Microsoft Windows, for example, the interpreter is compiled with the same Microsoft Visual C/C++ compiler version (1928) as the stock CPython build.
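To check which build you are actually running, you can ask the interpreter itself. The sketch below uses only the standard library: `platform.python_compiler()` reports the compiler the interpreter was built with, and `platform.python_implementation()` should report `CPython` for both the stock and the Anaconda builds. Treating a `conda-meta` directory under `sys.prefix` as a sign of a Conda-managed environment is a common heuristic, not an official API, and the helper name `describe_interpreter` is hypothetical.

```python
import os
import platform
import sys


def describe_interpreter() -> dict:
    """Summarise the running Python build (hypothetical helper)."""
    return {
        # "CPython" for the stock build and for Anaconda's custom build alike
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        # Compiler string, such as "MSC v.1928 64 bit (AMD64)" on Windows
        "compiler": platform.python_compiler(),
        # Heuristic (assumption): Conda environments keep package metadata
        # in a conda-meta directory under the environment prefix
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running the same script under a stock CPython install and under Anaconda is a quick way to compare the two builds side by side.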
The Anaconda Navigator
The most noticeable thing Anaconda adds to the experience of working with Python is a GUI, the Anaconda Navigator. It is not an IDE, and it doesn’t try to be one, because most Python-aware IDEs can register and use the Anaconda Python runtime themselves. Instead, the Navigator is an organisational tool for the larger components of Anaconda.
With the Navigator, you can add and launch high-level applications like RStudio or JupyterLab; manage virtual environments and packages; set up “projects,” a way to organise work in Anaconda; and perform various administrative functions.
Although the Navigator provides the convenience of a GUI, it doesn’t replace any command-line functionality in Anaconda, or in Python generally. For example, although you can manage packages through the GUI, you can also use the command line to do so.
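Because the command line stays fully available, the same information the Navigator shows can also be scripted. The sketch below lists Conda environments from Python by running `conda env list --json`, which prints a JSON object with an `envs` key. It assumes `conda` is on the `PATH` and simply returns an empty list when it isn’t; the function name `list_conda_envs` is hypothetical.

```python
import json
import shutil
import subprocess


def list_conda_envs() -> list[str]:
    """Return paths of known Conda environments, or [] if conda is absent (hypothetical helper)."""
    conda = shutil.which("conda")
    if conda is None:
        # No conda executable on PATH, so there is nothing to report
        return []
    result = subprocess.run(
        [conda, "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    # The output is a JSON object of the form {"envs": ["/path/to/env", ...]}
    return json.loads(result.stdout).get("envs", [])


if __name__ == "__main__":
    for env_path in list_conda_envs():
        print(env_path)
```

Running `conda env list` directly in a terminal gives the same listing in human-readable form.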
CPython, by contrast, has no formal GUI. It does come with IDLE, a mini-IDE suitable for quick one-off tasks, but any tool for managing Python itself has to come from third parties. Some IDEs fill that gap with GUI front ends to CPython’s components. Microsoft Visual Studio, for example, has a GUI for Python’s Pip package-management system, akin to the UI Anaconda provides for its own Conda package manager.
The Conda package manager
Python comes with the Pip package manager for installing and managing third-party Python packages. Although Python’s developers have expanded Pip’s capabilities over the years, it is still limited: it manages only packages for Python itself, not for the rest of the system.
Anaconda’s developers struggled with this limitation and eventually decided to engineer their own solution: Conda, a package manager that handles not only Python packages but also dependencies outside the Python ecosystem.
Here’s an example of what Conda helps with: If you have multiple Conda packages that rely on a compiler, like GCC or LLVM, Conda can resolve that external dependency for all those packages. It can install a single instance of a specific version of GCC for all Conda packages that need it. Pip would either have to assume you already have GCC installed somewhere on your system—or bundle a copy of GCC with each package that used it, a horribly inefficient and cumbersome solution.
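The principle of sharing one external dependency can be illustrated with a toy resolver. This sketch is far simpler than Conda’s real solver and is not how Conda works internally; it only shows the idea of choosing a single GCC version that satisfies every package at once. The package names, the available GCC versions, and the simple inclusive min/max constraints are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int]

# Hypothetical packages and the inclusive GCC version range each one accepts
REQUIREMENTS: Dict[str, Tuple[Version, Version]] = {
    "pkg-a": ((9, 0), (11, 2)),
    "pkg-b": ((10, 1), (12, 0)),
    "pkg-c": ((8, 0), (11, 0)),
}

# Hypothetical GCC builds available for installation
AVAILABLE_GCC: List[Version] = [(8, 5), (9, 4), (10, 3), (11, 1), (12, 2)]


def resolve_shared(
    requirements: Dict[str, Tuple[Version, Version]],
    available: List[Version],
) -> Optional[Version]:
    """Return the newest single version every package accepts, or None on conflict."""
    candidates = [
        v for v in available
        if all(low <= v <= high for low, high in requirements.values())
    ]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    chosen = resolve_shared(REQUIREMENTS, AVAILABLE_GCC)
    print(f"Install GCC {chosen} once for: {', '.join(REQUIREMENTS)}")
```

Here the three ranges overlap only between 10.1 and 11.0, so one install of GCC 10.3 serves all three packages instead of three bundled copies.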
Thus, Conda isn’t interchangeable with Pip. It doesn’t even use the same package format; packages created for Pip have to be re-created for Conda. But almost every package of significance used in the Python ecosystem is available through Conda.
How Anaconda makes data work easier
A fair number of Anaconda’s improvements concern the workaday use of Python, things that benefit almost any Python user. But the most important benefits target a problem specific to data science: users often find themselves at odds with their Python environments.
Python packages, even when managed with Conda, don’t always play nice with each other. Sometimes you need different versions of the same package for different projects. Python’s virtual environments feature, aka venv, was developed to address this problem, but Conda takes the idea a step further.
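For reference, here is the standard-library baseline: the `venv` module creates an isolated environment for a single project. The sketch below creates one without Pip (so it runs quickly and offline) inside a temporary directory; the helper `make_project_env` and the directory name `project-env` are hypothetical. A venv always reuses the Python interpreter it was created from, whereas a Conda environment can pin its own Python version, for example with `conda create --name myproject python=3.9` (the environment name there is illustrative).

```python
import tempfile
import venv
from pathlib import Path


def make_project_env(parent: Path, name: str = "project-env") -> Path:
    """Create a bare virtual environment without pip and return its directory (hypothetical helper)."""
    env_dir = parent / name
    # with_pip=False skips bootstrapping pip, so no network access is needed
    venv.create(env_dir, with_pip=False, clear=True)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = make_project_env(Path(tmp))
        # pyvenv.cfg records the base interpreter the environment was built from
        print((env / "pyvenv.cfg").read_text())
```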