What is the computer science definition of entropy?

I've recently started a course on data compression at my university. However, I find the use of the term "entropy" as it applies to computer science rather ambiguous. As far as I can tell, it roughly translates to the "randomness" of a system or structure.

What is the proper definition of computer science "entropy"?


Solution 1:

Entropy can mean different things:

Computing

In computing, entropy is the randomness collected by an operating system or application for use in cryptography or other uses that require random data. This randomness is often collected from hardware sources, either pre-existing ones such as mouse movements or specially provided randomness generators.

Information theory

In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information contained in a message, usually in units such as bits. Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable

Entropy in data compression

Entropy in data compression may denote the randomness of the data that you are inputing to the compression algorithm. The more the entropy, the lesser the compression ratio. That means the more random the text is, the lesser you can compress it.

Shannon's entropy represents an absolute limit on the best possible lossless compression of any communication: treating messages to be encoded as a sequence of independent and identically-distributed random variables, Shannon's source coding theorem shows that, in the limit, the average length of the shortest possible representation to encode the messages in a given alphabet is their entropy divided by the logarithm of the number of symbols in the target alphabet.

Solution 2:

My favorite definition, with a more practical focus, is found in Chapter 1 of the excellent book The Pragmatic Programmer: From Journeyman to Master by Andrew Hunt and David Thomas:

Software Entropy

While software development is immune from almost all physical laws, entropy hits us hard. Entropy is a term from physics that refers to the amount of "disorder" in a system. Unfortunately, the laws of thermodynamics guarantee that the entropy in the universe tends toward a maximum. When disorder increases in software, programmers call it "software rot."

There are many factors that can contribute to software rot. The most important one seems to be the psychology, or culture, at work on a project. Even if you are a team of one, your project's psychology can be a very delicate thing. Despite the best laid plans and the best people, a project can still experience ruin and decay during its lifetime. Yet there are other projects that, despite enormous difficulties and constant setbacks, successfully fight nature's tendency toward disorder and manage to come out pretty well.

...

...

A broken window.

One broken window, left unrepaired for any substantial length of time, instills in the inhabitants of the building a sense of abandonment—a sense that the powers that be don't care about the building. So another window gets broken. People start littering. Graffiti appears. Serious structural damage begins. In a relatively short space of time, the building becomes damaged beyond the owner's desire to fix it, and the sense of abandonment becomes reality.

The "Broken Window Theory" has inspired police departments in New York and other major cities to crack down on the small stuff in order to keep out the big stuff. It works: keeping on top of broken windows, graffiti, and other small infractions has reduced the serious crime level.

Tip 4

Don't Live with Broken Windows

Don't leave "broken windows" (bad designs, wrong decisions, or poor code) unrepaired. Fix each one as soon as it is discovered. If there is insufficient time to fix it properly, then board it up. Perhaps you can comment out the offending code, or display a "Not Implemented" message, or substitute dummy data instead. Take some action to prevent further damage and to show that you're on top of the situation.

Text taken from: http://pragprog.com/the-pragmatic-programmer/extracts/software-entropy

Solution 3:

I always encountered entropy in the sense of Shannon Entropy.

From http://en.wikipedia.org/wiki/Information_entropy:

In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information contained in a message, usually in units such as bits. Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable.

Solution 4:

alt text
(source: mit.edu)

from University of Mexico

The information theoretic notion of Entropy is a generalization of the physical notion. There are many ways to describe Entropy. It is a measure of the randomness of a random variable. It is also a measure of the amount of information a random variable or stochastic process contains. It is also a lower bound on the amount a message can be compressed. And finally it is the average number of yes/no questions that need to be asked about an random entity to determine its value.

Equation for Entropy in a sample application for probability calculation:

it is the sum over all values of a rv of the probability of that value times the log of that prob(i.e. p(x)logp(x)). This equation can be derived from first principles of the properties of information.