If Python is interpreted, what are .pyc files?

I've been given to understand that Python is an interpreted language...

This popular meme is incorrect, or, rather, constructed upon a misunderstanding of (natural) language levels: a similar mistake would be to say "the Bible is a hardcover book". Let me explain that simile...

"The Bible" is "a book" in the sense of being a class of (actual, physical objects identified as) books; the books identified as "copies of the Bible" are supposed to have something fundamental in common (the contents, although even those can be in different languages, with different acceptable translations, levels of footnotes and other annotations) -- however, those books are perfectly well allowed to differ in a myriad of aspects that are not considered fundamental -- kind of binding, color of binding, font(s) used in the printing, illustrations if any, wide writable margins or not, numbers and kinds of builtin bookmarks, and so on, and so forth.

It's quite possible that a typical printing of the Bible would indeed be in hardcover binding -- after all, it's a book that's typically meant to be read over and over, bookmarked at several places, thumbed through looking for given chapter-and-verse pointers, etc, etc, and a good hardcover binding can make a given copy last longer under such use. However, these are mundane (practical) issues that cannot be used to determine whether a given actual book object is a copy of the Bible or not: paperback printings are perfectly possible!

Similarly, Python is "a language" in the sense of defining a class of language implementations which must all be similar in some fundamental respects (syntax, most semantics except those parts of those where they're explicitly allowed to differ) but are fully allowed to differ in just about every "implementation" detail -- including how they deal with the source files they're given, whether they compile the sources to some lower level forms (and, if so, which form -- and whether they save such compiled forms, to disk or elsewhere), how they execute said forms, and so forth.

The classical implementation, CPython, is often called just "Python" for short -- but it's just one of several production-quality implementations, side by side with Microsoft's IronPython (which compiles to CLR codes, i.e., ".NET"), Jython (which compiles to JVM codes), PyPy (which is written in Python itself and can compile to a huge variety of "back-end" forms including "just-in-time" generated machine language). They're all Python (=="implementations of the Python language") just like many superficially different book objects can all be Bibles (=="copies of The Bible").

If you're interested in CPython specifically: it compiles the source files into a Python-specific lower-level form (known as "bytecode"), does so automatically when needed (when there is no bytecode file corresponding to a source file, or the bytecode file is older than the source or compiled by a different Python version), usually saves the bytecode files to disk (to avoid recompiling them in the future). OTOH IronPython will typically compile to CLR codes (saving them to disk or not, depending) and Jython to JVM codes (saving them to disk or not -- it will use the .class extension if it does save them).

These lower level forms are then executed by appropriate "virtual machines" also known as "interpreters" -- the CPython VM, the .Net runtime, the Java VM (aka JVM), as appropriate.

So, in this sense (what do typical implementations do), Python is an "interpreted language" if and only if C# and Java are: all of them have a typical implementation strategy of producing bytecode first, then executing it via a VM/interpreter.

More likely the focus is on how "heavy", slow, and high-ceremony the compilation process is. CPython is designed to compile as fast as possible, as lightweight as possible, with as little ceremony as feasible -- the compiler does very little error checking and optimization, so it can run fast and in small amounts of memory, which in turns lets it be run automatically and transparently whenever needed, without the user even needing to be aware that there is a compilation going on, most of the time. Java and C# typically accept more work during compilation (and therefore don't perform automatic compilation) in order to check errors more thoroughly and perform more optimizations. It's a continuum of gray scales, not a black or white situation, and it would be utterly arbitrary to put a threshold at some given level and say that only above that level you call it "compilation"!-)

They contain byte code, which is what the Python interpreter compiles the source to. This code is then executed by Python's virtual machine.

Python's documentation explains the definition like this:

Python is an interpreted language, as opposed to a compiled one, though the distinction can be blurry because of the presence of the bytecode compiler. This means that source files can be run directly without explicitly creating an executable which is then run.

There is no such thing as an interpreted language. Whether an interpreter or a compiler is used is purely a trait of the implementation and has absolutely nothing whatsoever to do with the language.

Every language can be implemented by either an interpreter or a compiler. The vast majority of languages have at least one implementation of each type. (For example, there are interpreters for C and C++ and there are compilers for JavaScript, PHP, Perl, Python and Ruby.) Besides, the majority of modern language implementations actually combine both an interpreter and a compiler (or even multiple compilers).

A language is just a set of abstract mathematical rules. An interpreter is one of several concrete implementation strategies for a language. Those two live on completely different abstraction levels. If English were a typed language, the term "interpreted language" would be a type error. The statement "Python is an interpreted language" is not just false (because being false would imply that the statement even makes sense, even if it is wrong), it just plain doesn't make sense, because a language can never be defined as "interpreted."

In particular, if you look at the currently existing Python implementations, these are the implementation strategies they are using:

IronPython: compiles to DLR trees which the DLR then compiles to CIL bytecode. What happens to the CIL bytecode depends upon which CLI VES you are running on, but Microsoft .NET, GNU Portable.NET and Novell Mono will eventually compile it to native machine code.
Jython: interprets Python sourcecode until it identifies the hot code paths, which it then compiles to JVML bytecode. What happens to the JVML bytecode depends upon which JVM you are running on. Maxine will directly compile it to un-optimized native code until it identifies the hot code paths, which it then recompiles to optimized native code. HotSpot will first interpret the JVML bytecode and then eventually compile the hot code paths to optimized machine code.
PyPy: compiles to PyPy bytecode, which then gets interpreted by the PyPy VM until it identifies the hot code paths which it then compiles into native code, JVML bytecode or CIL bytecode depending on which platform you are running on.
CPython: compiles to CPython bytecode which it then interprets.
Stackless Python: compiles to CPython bytecode which it then interprets.
Unladen Swallow: compiles to CPython bytecode which it then interprets until it identifies the hot code paths which it then compiles to LLVM IR which the LLVM compiler then compiles to native machine code.
Cython: compiles Python code to portable C code, which is then compiled with a standard C compiler
Nuitka: compiles Python code to machine-dependent C++ code, which is then compiled with a standard C compiler

You might notice that every single one of the implementations in that list (plus some others I didn't mention, like tinypy, Shedskin or Psyco) has a compiler. In fact, as far as I know, there is currently no Python implementation which is purely interpreted, there is no such implementation planned and there never has been such an implementation.

Not only does the term "interpreted language" not make sense, even if you interpret it as meaning "language with interpreted implementation", it is clearly not true. Whoever told you that, obviously doesn't know what he is talking about.

In particular, the .pyc files you are seeing are cached bytecode files produced by CPython, Stackless Python or Unladen Swallow.

If Python is interpreted, what are .pyc files?

Related

Recent Posts