Information entropy is in fact a more fundamental definition than the familiar Boltzmann entropy in statistical mechanics.
When we use the Boltzmann entropy $S = k_B \ln W$, where $W$ is the total number of microscopic states of the system with a given macroscopic state, we assume that all microscopic states have the same probability. This is true for most thermal systems. However, if the probabilities of the microstates are not all the same, we have to modify the definition of entropy, and we get the Gibbs entropy, which is equivalent to the Shannon (information) entropy: $S = -k_B \sum_i p_i \ln p_i$, where $p_i$ is the probability of a state $i$, and the sum runs over all $i$'s. You can check that if all the $p_i$ are equal ($p_i = 1/W$), this definition reduces to the Boltzmann entropy.
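To make this concrete, here is a minimal numerical check (my own sketch, in Python) that the Gibbs/Shannon formula reduces to the Boltzmann form when all $W$ microstates are equally likely; I set $k_B = 1$ so entropy comes out in natural units.

```python
# Minimal check: Gibbs/Shannon entropy S = -sum_i p_i ln p_i reduces to the
# Boltzmann form S = ln W when all W microstates are equiprobable (k_B = 1).
import math

def gibbs_entropy(probs):
    """Shannon/Gibbs entropy in units of k_B (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

W = 1000
uniform = [1.0 / W] * W
print(gibbs_entropy(uniform))   # ~6.9078
print(math.log(W))              # ln(1000) = 6.9078: matches Boltzmann's S = ln W

# A non-uniform distribution over the same W states has strictly lower entropy:
skewed = [0.5] + [0.5 / (W - 1)] * (W - 1)
print(gibbs_entropy(skewed))    # ~4.15 < ln(W)
```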
So we can give a ‘modern’ interpretation of thermal entropy: the entropy is the amount of information we need to put into the system to determine its microstate, given its macrostate. In other words, order means predictability, and for a specific state, predictability means probability. If a thermal system has a larger number of microstates, then each microstate has a smaller probability, which means we have a smaller chance of predicting the right microstate; that is why such a system has a higher entropy.
A very useful lesson from thermal theory for understanding order and disorder in information theory is: for two systems with the same macrostate, the one with independent subsystems has higher entropy than the one with correlated subsystems. For example, an ideal gas has higher entropy than an interacting gas with the same macrostate. We can carry this intuition into information theory. For example, compare two pages of paper with the same number of letters on them. The letters on one page are totally random, while the other page is a well-written article. So we say the letters on the first page are independent, while the letters on the second page are correlated with each other. To determine the microstructure of the first page, we have to put in the information of each letter, with probability $1/26$ for each one. To determine the microstructure of the second page, we only need to put in the information of each word, with probability larger than $(1/26)^n$, where $n$ is the number of letters in the word, since we know that some combinations of letters are definitely not words. So now you can calculate the probability of each microstate: the second should be larger than the first. So we say the second page is more ‘predictable’, and the information comes from correlation. It is interesting that we have just interpreted information entropy by using thermal entropy, since a smaller probability for a microstate means a larger number of microstates.
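Here is a toy illustration of the two-pages argument (my own sketch; the 26-letter alphabet and the sample text are just assumptions for the demo). It compares the per-letter entropy of a uniformly random page, $\log_2 26 \approx 4.70$ bits, with the empirical single-letter entropy of an English sentence; even ignoring word-level correlations, the correlated page already comes out lower.

```python
# Toy comparison: per-letter entropy of a uniformly random page vs. an
# English-like page. Only single-letter frequencies are used here; counting
# word- and grammar-level correlations would lower the second number further
# (real English is roughly 1-1.5 bits/letter).
import math
from collections import Counter

def per_letter_entropy_bits(text):
    """Empirical single-letter Shannon entropy in bits, from letter frequencies."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# Upper bound: uniformly random, independent letters (p = 1/26 each).
print(math.log2(26))  # ~4.70 bits per letter

article = "it is interesting that we have just interpreted " \
          "information entropy by using thermal entropy"
print(per_letter_entropy_bits(article))  # ~3.9 bits, below the 4.70 bound
```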
But there are two cases in which we can only use information entropy. One is, as I mentioned, when the probabilities of the microstates are not all the same; the other is non-equilibrium processes. I am not familiar with either of them. Does anyone know any examples of these cases?
The following is Lightsaber’s explanation of entropy in non-equilibrium processes:
Let me start with non-equilibrium situations. In my lecture, I mentioned that “Information is …… boundary condition.” Many thought that “boundary condition” was a phrase I picked at random; it was not. Actually, I was referring to the non-equilibrium cases. Non-equilibrium is characterized by a non-uniform distribution and time dependence. By possessing the information of the system’s spatial distribution, we reduce its entropy. When equilibrium is established, all this information becomes invalid, and the entropy increases. That is why the establishment of equilibrium is always entropy-increasing. Such a piece of information could be valid only at a particular time, or valid during the entire time interval we study. In the former case it is an “initial value condition” in PDE language, but it is still a boundary condition in the time domain.
Let us start from the simplest example in non-equilibrium thermodynamics: diffusion. If we connect a full bottle of nitrogen dioxide (bottle A) and a full bottle of air (bottle B) with a glass tube, we can see the red color gradually propagate, until the gas in both bottles has the same color. This is an entropy-increasing process. At the very beginning, we do know (know = possess a piece of valid information) that there is no air in A and no NO2 in B. With this knowledge, the entropy is relatively low. In the language of probability theory as Shannon used it, the probability of an infinitesimal domain in bottle A being filled with NO2 is 1, while it is 0 in bottle B. After the establishment of equilibrium, this piece of information becomes completely invalid, and the entropy is larger.
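As a concrete picture of this (my own sketch, not part of Lightsaber's argument), here is a minimal Ehrenfest-urn simulation of the two-bottle setup: all $N$ molecules start in bottle A, at each step a random molecule hops between bottles, and the macrostate entropy $S = \ln \binom{N}{n_A}$ (with $k_B = 1$) climbs toward its maximum at $n_A = N/2$.

```python
# Ehrenfest-urn toy model of the two-bottle diffusion: N "NO2 molecules" all
# start in bottle A; each step one random molecule hops to the other bottle.
# The macrostate entropy S = ln C(N, n_A) rises toward its maximum at n_A = N/2.
import math, random

N = 100           # number of NO2 molecules
n_A = N           # at t = 0 we *know* every molecule is in bottle A, so S = 0
random.seed(0)

for step in range(500):
    # pick a molecule uniformly at random; it sits in A with probability n_A / N
    n_A += -1 if random.random() < n_A / N else +1
    if step % 100 == 0:
        S = math.log(math.comb(N, n_A))  # entropy of the macrostate (n_A, N - n_A)
        print(step, n_A, round(S, 2))    # S climbs toward ln C(100, 50) ~ 66.8
```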
However, how can we describe the dynamical process (the formal term is “transport process”, I think) between the start and the equilibrium? Can we use information theory to treat it? I believe so.
In my PERSONAL opinion, which has not appeared in any reference material I have read so far, introducing fuzzy mathematics is a possible way to solve it. Darthmaverick is an outstanding expert in this area, but I can discuss it to the best of my knowledge. We can define a “membership function of validity” for any piece of information, which depends on time. This membership function can be taken as “the probability that the piece of information is true”. For example, for the statement “Bottle A is full of NO2 without air”, we can define the membership function as “P(an infinitesimal domain is filled with NO2) − P(an infinitesimal domain is filled with air)”. At the start of the transport process, this membership function is 1, and it becomes 0 eventually. It can be proven (though I have not done it myself) that this membership function can be constructed to be proportional to the “entropy-decreasing capability” of the corresponding information, as in the example above.
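A small numerical sketch of this membership function for the bottle example (the exponential relaxation law is purely my assumption; any monotone relaxation toward equilibrium gives the same qualitative behavior):

```python
# Sketch of the proposed "membership function of validity" for the statement
# "bottle A is full of NO2", using an assumed exponential relaxation of the
# concentrations from (1, 0) toward the equilibrium (1/2, 1/2).
import math

def mu(t, tau=1.0):
    """mu(t) = P(domain in A is NO2) - P(domain in A is air)."""
    p_no2 = 0.5 * (1 + math.exp(-t / tau))  # concentration of NO2 in bottle A
    p_air = 0.5 * (1 - math.exp(-t / tau))  # concentration of air in bottle A
    return p_no2 - p_air                    # = exp(-t/tau): 1 at t = 0, -> 0

for t in [0.0, 0.5, 1.0, 2.0, 5.0]:
    print(t, round(mu(t), 4))  # the information "A is pure NO2" loses validity
```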
The utilization of this measure is still to be explored. After all, I came up with this combination of non-equilibrium thermodynamics, information theory and fuzzy mathematics independently, and I expect that some original work can be done in this direction.
That’s all for now, thank you.
As to the cases in which “the probability of each microstate is not the same”, my intuition tells me that this is EQUIVALENT to the non-equilibrium cases, or at least corresponds to them one-to-one. I wish I could prove it mathematically, but I cannot do it in a rigorous way. Possibly it is wrong anyway.
One possible way of proving it is to consider the symmetry of the system. What is the connection between the asymmetry among different microstates and the broken symmetry of the macroscopic system?
Plus, according to the second law of thermodynamics (LOT2), a closed system tends to maximize its entropy. Can a system reach the GLOBAL maximum of entropy without eliminating the asymmetry among microstates?
I wish I could find an example in which the symmetry among microstates is essentially and permanently broken. If such an example exists, and it can be stabilized given time, then my hypothesis in the first paragraph is wrong.
Then CoolPro asked an interesting question (in Chinese):
My answer is:
Another piece of LS’s comments:
A microstate is a specific, detailed configuration that includes the state of all the particles inside. For an N-particle system, it is a single point in the 6N-dimensional phase space. For example, for a system described by the canonical ensemble, its equilibrium macrostate consists of numerous microstates, EACH OF WHICH obeys the Gibbs distribution (cuz each microstate contains a complete set of information about all the particles and thus has its OWN distribution), and has the same probability as every other. If we change the parameters (like T), the equilibrium state will consist of another set of microstates, but each of them will still obey the Gibbs distribution and will have the same probability as the others.
To my knowledge, the Gibbs distribution (more commonly called the ‘Gibbs measure’, or the ‘Boltzmann distribution’) gives the probability of a microstate of a system, not the distribution of the particles in the system. If this is not what GL means, that’s fine; we don’t need to debate terminology. But I do have some more comments. The phase space you mentioned is for microcanonical ensembles, in which the density of states is constant, which means all microstates, no matter what macrostate they correspond to, have the same probability. This is true for microcanonical ensembles (i.e., isolated systems), but not for canonical ensembles, which are exactly what the Gibbs measure describes. In the phase space of a canonical ensemble, the density at a given microstate is not constant: a microstate can occur many times in the ensemble, and its density, i.e. its probability, is proportional to the number of microstates of the reservoir that ‘coexist’ with it. When you take this into account and do the calculation, you get the Gibbs measure.
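For concreteness, here is a small sketch of the Gibbs measure in this microstate sense (the toy energy levels are my own choice): the probability of microstate $i$ of the system is $p_i = e^{-E_i/k_B T}/Z$, so microstates of different energies are not equiprobable, unlike the microcanonical case.

```python
# Gibbs measure over microstates of a system in contact with a reservoir:
# p_i = exp(-E_i / kT) / Z. Unlike the microcanonical ensemble, microstates
# with different energies have different probabilities.
import math

def gibbs_measure(energies, kT):
    weights = [math.exp(-E / kT) for E in energies]
    Z = sum(weights)                      # canonical partition function
    return [w / Z for w in weights]

E = [0.0, 1.0, 1.0, 2.0]                  # toy microstate energies (arbitrary units)
for kT in (0.5, 1.0, 10.0):
    print(kT, [round(p, 3) for p in gibbs_measure(E, kT)])
# As kT grows, the Gibbs measure approaches the uniform (microcanonical-like) one.
```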
We should distinguish the ensemble language from the distribution language. The former is only interested in microstates of the whole system. It is so abstract that it never cares what kind of system it is, or about the fate of a single particle in it; yet this fate, the distribution probability of a single particle, is exactly the focus of the distribution language, and its result varies with the type of system (classical, quantum, interacting or not, etc.). Furthermore, a given microstate in the ensemble language has no ‘probability distribution’, cuz it is totally determined; the so-called distribution is just a description of this state, and it can be way off the Boltzmann distribution, e.g., some higher energy level may hold more particles than some lower energy level. Some of us may confuse the ‘Boltzmann distribution’ in the ensemble language with that in the distribution language. They have exactly the same mathematical form, but very different meanings: one is for the microstates of an ensemble of systems, the other is for the particles in a single system. That they have the same form is just a coincidence: the ensemble members are defined to be classical and independent of each other, which is the same property as the particles in a classical non-interacting thermal system. But note, we have other kinds of thermal systems, e.g., quantum boson or fermion systems, and they do not obey the Boltzmann distribution.
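To see the last point numerically, here is a quick comparison (my sketch; the values of $\epsilon$, $\mu$ and $k_B T$ are arbitrary) of the Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac occupation numbers for a single energy level; all three agree in the classical limit $(\epsilon - \mu) \gg k_B T$.

```python
# Single-level mean occupation numbers in the three statistics:
#   Maxwell-Boltzmann: n = exp(-(eps - mu)/kT)
#   Bose-Einstein:     n = 1 / (exp((eps - mu)/kT) - 1)
#   Fermi-Dirac:       n = 1 / (exp((eps - mu)/kT) + 1)
import math

def occupations(eps, mu, kT):
    x = (eps - mu) / kT
    return math.exp(-x), 1 / (math.exp(x) - 1), 1 / (math.exp(x) + 1)

mb, be, fd = occupations(eps=1.0, mu=0.0, kT=0.5)
print(mb, be, fd)  # BE > MB > FD; all three converge when (eps - mu) >> kT
```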