Why the Python programming language shines for data science, machine learning, systems automation, web and API development, and more
Dating from 1991, the Python programming language was considered a gap-filler, a way to write scripts that “automate the boring stuff” (as one popular book on learning Python put it) or to rapidly prototype applications that will be implemented in other languages.
However, over the past few years, Python has emerged as a first-class citizen in modern software development, infrastructure management, and data analysis. It is no longer a back-room utility language, but a major force in web application creation and systems management, and a key driver of the explosion in big data analytics and machine intelligence.
Python’s key advantages
Python’s success revolves around several advantages it provides for beginners and experts alike.
Python is easy to learn and use
The number of features in the language itself is modest, requiring relatively little investment of time or effort to produce your first programs. The Python syntax is designed to be readable and straightforward. This simplicity makes Python an ideal teaching language, and it lets newcomers pick it up quickly. As a result, developers spend more time thinking about the problem they’re trying to solve and less time thinking about language complexities or deciphering code left by others.
Python is broadly adopted and supported
Python is both popular and widely used, as the high rankings in surveys like the Tiobe Index and the large number of GitHub projects using Python attest. Python runs on every major operating system and platform, and most minor ones too. Many major libraries and API-powered services have Python bindings or wrappers, letting Python interface freely with those services or directly use those libraries.
Python is not a “toy” language
Even though scripting and automation cover a large chunk of Python’s use cases (more on that later), Python is also used to build professional-quality software, both as standalone applications and as web services. Python may not be the fastest language, but what it lacks in speed, it makes up for in versatility.
Python keeps moving forward
Each revision of the Python language adds useful new features to keep pace with modern software development practices. Asynchronous operations and coroutines, for instance, are now standard parts of the language, making it easier to write Python apps that perform concurrent processing.
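As a rough illustration of the coroutine support mentioned above, here is a minimal sketch using the standard asyncio module. The fetch function and its delays are hypothetical stand-ins for real I/O such as network calls.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # asyncio.sleep stands in for a real network or disk wait (assumption).
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # gather() runs both coroutines concurrently and returns their results
    # in the order the coroutines were passed in, not completion order.
    return await asyncio.gather(
        fetch("first", 0.02),
        fetch("second", 0.01),
    )


results = asyncio.run(main())
print(results)
```

Both waits overlap, so the total run time is close to the longest single delay rather than the sum of the two.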
What Python is used for
The most basic use case for Python is as a scripting and automation language. Python isn’t just a replacement for shell scripts or batch files; it is also used to automate interactions with web browsers or application GUIs or to do system provisioning and configuration in tools such as Ansible and Salt. But scripting and automation represent only the tip of the iceberg with Python.
General application programming with Python
You can create both command-line and cross-platform GUI applications with Python and deploy them as self-contained executables. Python doesn’t have the native ability to generate a standalone binary from a script, but third-party packages like cx_Freeze and PyInstaller can be used to accomplish that.
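A command-line tool of the kind described here might be sketched as follows, using only the standard library's argparse module. The program name, arguments, and greet function are invented for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "greet", "name", and "--shout" are hypothetical names for this sketch.
    parser = argparse.ArgumentParser(prog="greet", description="Print a greeting.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--shout", action="store_true", help="use upper case")
    return parser


def greet(argv: list) -> str:
    # Parse an explicit argument list so the function is easy to test.
    args = build_parser().parse_args(argv)
    message = f"Hello, {args.name}!"
    return message.upper() if args.shout else message


print(greet(["Ada"]))
print(greet(["Ada", "--shout"]))
```

In a real script you would typically pass sys.argv[1:] to greet, and a tool such as PyInstaller could then bundle the script into an executable.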
Data science and machine learning with Python
Sophisticated data analysis has become one of the fastest-moving areas of IT and one of Python’s star use cases. The vast majority of the libraries used for data science or machine learning have Python interfaces, making the language the most popular high-level command interface for machine learning libraries and other numerical algorithms.
Web services and RESTful APIs in Python
Python’s native libraries and third-party web frameworks provide fast and convenient ways to create everything from simple REST APIs in a few lines of code to full-blown, data-driven sites. Python’s latest versions have strong support for asynchronous operations, letting sites handle tens of thousands of requests per second with the right libraries.
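To give a sense of how small a JSON endpoint can be, here is a sketch that uses only the standard library's wsgiref package rather than a third-party framework. The /status route and the call helper are hypothetical, and a production service would more likely use a framework such as Flask or Django.

```python
import json
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Hypothetical endpoint: GET /status returns a small JSON document.
    if environ["REQUEST_METHOD"] == "GET" and environ["PATH_INFO"] == "/status":
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call(path):
    # Invoke the app directly with a synthetic request; no network is needed.
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD=GET and other keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


print(call("/status"))
print(call("/missing"))
```

To serve the same app over HTTP for local experiments, wsgiref.simple_server.make_server("", 8000, app).serve_forever() should be enough.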
Metaprogramming and code generation in Python
In Python, everything in the language is an object, including Python modules and libraries themselves. This lets Python work as a highly efficient code generator, making it possible to write applications that manipulate their own functions and have the kind of extensibility that would be difficult or impossible to pull off in other languages.
Python can also be used to drive code-generation systems, such as LLVM, to efficiently create code in other languages.
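The idea that functions, classes, and modules are ordinary objects can be sketched in a few lines. The logged decorator, the area function, and the Point class below are hypothetical examples, not part of any library.

```python
import functools
import math


def logged(func):
    # A decorator receives a function object and returns a replacement.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append(args)
        return func(*args, **kwargs)

    wrapper.calls = []  # functions are objects, so they can carry attributes
    return wrapper


@logged
def area(radius):
    return math.pi * radius ** 2


# Build a class at run time with type(name, bases, namespace).
Point = type(
    "Point",
    (),
    {"x": 0, "y": 0, "norm": lambda self: math.hypot(self.x, self.y)},
)

p = Point()
p.x, p.y = 3, 4
print(area(1.0), area.calls, p.norm(), type(math).__name__)
```

The last line also shows that the math module itself is an object whose type is named "module".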
“Glue code” in Python
Python is often described as a “glue language,” meaning it can let disparate code (typically libraries with C language interfaces) interoperate. Its use in data science and machine learning is in this vein, but that’s just one incarnation of the general idea. If you have applications or program domains that you would like to hitch up, but cannot talk to each other directly, you can use Python to connect them.
Where Python falls short
Also worth noting are the sorts of tasks Python is not well-suited for.
Python is a high-level language, so it’s not suitable for system-level programming—device drivers or OS kernels are out of the picture.
It’s also not ideal for situations that call for cross-platform standalone binaries. You could build a standalone Python app for Windows, macOS, and Linux, but not elegantly or simply.
Finally, Python is not the best choice when speed is an absolute priority in every aspect of the application. For that, you’re better off with C/C++ or another language of that caliber.
How Python makes programming simple
Python’s syntax is meant to be readable and clean, with little pretense. A standard “hello world” in Python 3.x is nothing more than:
print("Hello world!")
Python provides many syntactical elements to concisely express many common program flows. Consider a sample program for reading lines from a text file into a list object, stripping each line of its terminating newline character along the way:
with open('myfile.txt') as my_file:
    file_lines = [x.rstrip('\n') for x in my_file]
The with/as construction is a context manager, which provides an efficient way to instantiate an object for a block of code and then dispose of it outside that block. In this case, the object is my_file, instantiated with the open() function. This takes the place of several lines of boilerplate to open the file, read individual lines from it, then close it up.
The [x … for x in my_file] construction is another Python idiosyncrasy, the list comprehension. It lets an item that contains other items (here, my_file and the lines it contains) be iterated through, and it lets each iterated element (that is, each x) be processed and automatically appended to a list.
You could write such a thing as a formal for… loop in Python, much as you would in another language. The point is that Python has a way to economically express things like loops that iterate over multiple objects and perform a simple operation on each element in the loop, or to work with things that require explicit instantiation and disposal.
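For comparison, here is a sketch of the longer form next to the compact one. It writes a small temporary file first so that it runs on its own; the file contents are made up for the example.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

# Long form: explicit open and close, and an explicit loop.
my_file = open(path)
try:
    lines_long = []
    for x in my_file:
        lines_long.append(x.rstrip("\n"))
finally:
    my_file.close()

# Short form: context manager plus list comprehension.
with open(path) as my_file:
    lines_short = [x.rstrip("\n") for x in my_file]

os.remove(path)
print(lines_long == lines_short, lines_short)
```

Both versions produce the same list, and in both cases the file is closed by the time the list is used.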
Constructions like this let Python developers balance terseness and readability.
Python’s other language features are meant to complement common use cases. Most modern object types—Unicode strings, for example—are built directly into the language. Data structures—like lists, dictionaries (i.e., hashmaps or key-value stores), tuples (for storing immutable collections of objects), and sets (for storing collections of unique objects)—are available as standard-issue items.
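A brief sketch of those built-in types in use follows; all of the values are arbitrary examples.

```python
text = "caf\u00e9"                 # str holds Unicode code points
langs = ["python", "c"]            # list: ordered and mutable
langs.append("go")

counts = {}                        # dict: key-value store
for letter in "mississippi":
    counts[letter] = counts.get(letter, 0) + 1

point = (3, 4)                     # tuple: immutable
try:
    point[0] = 10
except TypeError:
    tuple_is_immutable = True

unique = set("mississippi")        # set: duplicates are dropped

print(len(text), len(text.encode("utf-8")))
print(langs, counts, sorted(unique), tuple_is_immutable)
```

The string has four characters but five bytes when encoded as UTF-8, which is the distinction that built-in Unicode strings handle for you.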
Python 2 vs. Python 3
Python is available in two versions, which are different enough to trip up many new users. Python 2.x, the older “legacy” branch, will continue to be supported (that is, receive official updates) through 2020, and it might persist unofficially after that. Python 3.x, the current and future incarnation of the language, has many useful and important features not found in Python 2.x, such as new syntax features (e.g., the “walrus operator”), better concurrency controls, and a more efficient interpreter.
Python 3 adoption was slowed for the longest time by the relative lack of third-party library support. Many Python libraries supported only Python 2, making it difficult to switch. But over the last couple of years, the number of libraries supporting only Python 2 has dwindled; all of the most popular libraries are now compatible with both Python 2 and Python 3. Today, Python 3 is the best choice for new projects; there is no reason to pick Python 2 unless you have no choice. If you are stuck with Python 2, you have various strategies at your disposal.
The success of Python rests on a rich ecosystem of first- and third-party software. Python benefits from both a strong standard library and a generous assortment of easily obtained and readily used libraries from third-party developers. Python has been enriched by decades of expansion and contribution.
Python’s standard library provides modules for common programming tasks—math, string handling, file and directory access, networking, asynchronous operations, threading, multiprocess management, and so on. But it also includes modules that manage common, high-level programming tasks needed by modern applications: reading and writing structured file formats like JSON and XML, manipulating compressed files, working with internet protocols and data formats (webpages, URLs, email). Most any external code that exposes a C-compatible foreign function interface can be accessed with Python’s
The default Python distribution also provides a rudimentary, but useful, cross-platform GUI library via Tkinter, and an embedded copy of the SQLite 3 database.
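As a small sketch of the "batteries included" idea, the following uses only the standard json and sqlite3 modules. The table name and the sample records are invented for the example.

```python
import json
import sqlite3

# Parse structured data with the standard json module.
records = json.loads(
    '[{"name": "Ada", "score": 90}, {"name": "Linus", "score": 85}]'
)

# Use the bundled SQLite engine with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
# Named placeholders are filled from each dictionary in the list.
conn.executemany("INSERT INTO scores VALUES (:name, :score)", records)
top = conn.execute(
    "SELECT name, score FROM scores ORDER BY score DESC LIMIT 1"
).fetchone()
conn.close()

summary = json.dumps({"top": top[0], "score": top[1]})
print(summary)
```

Nothing here needs to be installed separately; both modules ship with the default distribution.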
The thousands of third-party libraries, available through the Python Package Index (PyPI), constitute the strongest showcase for Python’s popularity and versatility.
- The BeautifulSoup library provides an all-in-one toolbox for scraping HTML—even tricky, broken HTML—and extracting data from it.
- Requests makes working with HTTP requests at scale painless and simple.
- Frameworks like Flask and Django allow rapid development of web services that encompass both simple and advanced use cases.
- Multiple cloud services can be managed through Python’s object model using Apache Libcloud.
- NumPy, Pandas, and Matplotlib accelerate math and statistical operations, and make it easy to create visualizations of data.
Like C#, Java, and Go, Python has garbage-collected memory management, meaning the programmer doesn’t have to implement code to track and release objects. Normally, garbage collection happens automatically in the background, but if that poses a performance problem, you can trigger it manually or disable it entirely, or declare whole regions of objects exempt from garbage collection as a performance enhancement.
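The controls described above live in the standard gc module. The sketch below assumes Python 3.7 or later, since it treats gc.freeze() as one reading of "exempting" objects from collection; the Node class is hypothetical.

```python
import gc


class Node:
    def __init__(self):
        self.partner = None


def make_cycle():
    a, b = Node(), Node()
    # A reference cycle: reference counting alone cannot reclaim these.
    a.partner, b.partner = b, a


gc.disable()                   # turn off automatic cyclic collection
was_enabled = gc.isenabled()   # now False
for _ in range(100):
    make_cycle()

found = gc.collect()           # trigger a full collection manually

gc.freeze()                    # move currently tracked objects out of GC's view
frozen = gc.get_freeze_count()
gc.unfreeze()                  # put them back under normal tracking
gc.enable()                    # restore automatic collection

print(was_enabled, found, frozen)
```

gc.collect() returns the number of unreachable objects it found, so the value reflects the cycles created while collection was disabled.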
An important aspect of Python is its dynamism. Everything in the language, including functions and modules themselves, is handled as an object. This comes at the expense of speed (more on that later), but makes it far easier to write high-level code. Developers can perform complex object manipulations with only a few instructions, and even treat parts of an application as abstractions that can be altered if needed.
Python’s use of significant whitespace has been cited as both one of Python’s best and worst attributes. The indentation on the second line below isn’t just for readability; it is part of Python’s syntax. Python interpreters will reject programs that don’t use proper indentation to indicate control flow.
with open('myfile.txt') as my_file:
    file_lines = [x.rstrip('\n') for x in my_file]
Syntactical whitespace might cause noses to wrinkle, and some people do reject Python for this reason. But strict indentation rules are far less obtrusive in practice than they might seem in theory, even with the most minimal of code editors, and the result is code that is cleaner and more readable.
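To see the interpreter enforce indentation, the following sketch compiles two small snippets held in strings. The check helper and the snippets are hypothetical.

```python
# The same loop, once with a properly indented body and once without.
good = "for n in range(3):\n    total = n * 2\n"
bad = "for n in range(3):\ntotal = n * 2\n"


def check(source):
    # compile() parses the source without running it.
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except IndentationError as err:
        return f"IndentationError: {err.msg}"


print(check(good))
print(check(bad))
```

The second snippet is rejected before any of it runs, because the loop body is not indented.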
Another potential turnoff, especially for those coming from languages like C or Java, is how Python handles variable typing. By default, Python uses dynamic or “duck” typing, which is great for quick coding but potentially problematic in large code bases. That said, Python has added support for optional type hints, which external tools can check before the program runs, so projects that might benefit from static typing can use it.
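A short sketch of how type hints and duck typing coexist follows. It assumes Python 3.9 or later for the list[float] syntax, and the mean function is a hypothetical example.

```python
from fractions import Fraction


def mean(values: list[float]) -> float:
    # The hints document intent; the interpreter does not enforce them.
    return sum(values) / len(values)


print(mean([1.0, 2.0, 3.0]))
print(mean.__annotations__)

# Duck typing still applies at run time: Fraction objects support + and /,
# so this call works even though it does not match the hints.
print(mean([Fraction(1, 2), Fraction(3, 2)]))
```

A static checker such as mypy would be expected to flag the last call, while the program itself runs without complaint.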
Is Python slow? Not necessarily
One common caveat about Python is that it’s slow. Objectively, it’s true. Python programs generally run much more slowly than corresponding programs in C/C++ or Java. Some Python programs will be slower by an order of magnitude or more.
Why so slow? It isn’t just because most Python runtimes are interpreters rather than compilers. It is also due to the fact that the inherent dynamism and the malleability of objects in Python make it difficult to optimize the language for speed, even when it is compiled. That said, Python’s speed may not be as much of an issue as it might seem, and there are ways to alleviate it.
Python performance optimizations
It isn’t always the fate of a slow Python program to be forever slow. Many Python programs are slow because they don’t properly use the functionality in Python or its standard library. Novice Python programmers often write Python as if it were C or Java, and leave performance on the table. Math and statistics operations can be sped up dramatically by using libraries such as NumPy and Pandas.
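As one illustration of writing Python "as if it were C," the sketch below times a hand-rolled index loop against the built-in sum() using the standard timeit module. The function names and data size are arbitrary, and the timings will vary by machine.

```python
import timeit

data = list(range(100_000))


def c_style_total(values):
    # Manual index loop, in the style of a C for loop.
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total


def idiomatic_total(values):
    # The built-in does the same work inside the interpreter's C code.
    return sum(values)


slow = timeit.timeit(lambda: c_style_total(data), number=20)
fast = timeit.timeit(lambda: idiomatic_total(data), number=20)
print(f"index loop: {slow:.4f}s  sum(): {fast:.4f}s")
```

Both functions return the same total; on a typical CPython install the built-in version tends to finish noticeably faster.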
A common adage of software development is that 90 percent of the activity for a program tends to be in 10 percent of the code, so optimizing that 10 percent can yield major improvements. With Python, you can selectively convert that 10 percent to C or even assembly, using projects like Cython or Numba. The result is often a program that runs within striking distance of a counterpart written entirely in C, but without being cluttered with C’s memory-micromanagement details.
Finally, alternative Python runtimes have speed optimizations that the stock CPython runtime lacks. PyPy, for instance, is a just-in-time (JIT) Python compiler that converts Python to native machine code on the fly. PyPy can provide orders-of-magnitude speedups for many common operations.
Developer time often trumps machine time
Or to put it another way: For many tasks, speed of development beats speed of execution.
A given Python program might take six seconds to execute versus a fraction of a second in another language. But it might take only ten minutes for a developer to put that Python program together, versus an hour or more of development time in another language. The amount of time lost in the execution of the Python program is more than gained back by the time saved in the development process.
Obviously, this is less true when you’re writing software that has high-throughput, high-concurrency demands, such as a trading application or database. But for many real-world applications, in domains ranging from systems management to machine learning, Python will prove to be fast enough.
Plus, the flexibility and pace of development that Python enables may allow for innovation that would be more difficult and time-consuming to achieve in other languages.
When speed of development and programmer comfort are more important than shaving a few seconds off the machine clock, Python may well be the best tool for the job.