Seeking to explain thermodynamics based on moving and interacting atoms

Chapter 7 – Entropy and the 2nd Law of Thermodynamics

Entropy is usually introduced only in the later chapters of thermodynamics textbooks, largely because—unlike temperature, pressure, or volume—it is not an intuitively accessible property of matter. Yet entropy and the concepts built upon it, including the Boltzmann distribution of energy (discussed in Chapter 8), play a central role in thermodynamics, far beyond the familiar Clausius statement that the entropy of the world tends toward a maximum. Because these ideas are so important to our micro-to-macro understanding of thermodynamics, I prefer to introduce them early rather than delay their treatment.

With that in mind, we begin here—with entropy and, more broadly, the Second Law of Thermodynamics.

My Perspective on Entropy and the Second Law of Thermodynamics

Entropy and probability

Entropy and heat

As will be developed in later chapters, the discovery of entropy in the mid-1800s, initially in the context of heat and later in the context of probability, enabled theorists to formulate thermodynamic equations that account not only for changes in internal energy due to heat addition, but also for changes in what I call structural energy that result from chemical reactions. The structural energy of a system reflects the total temperature-adjusted energy (∫δQ/T) required to establish that system; the difference in structural energy between reactants and products is T ∆Srxn. These concepts will be elaborated later.

Background on how I arrived at my perspective

[The following is an excerpt from Chapter 24, the lead-in to Part IV, of my book, Block by Block – The Historical and Theoretical Foundations of Thermodynamics.]

 I never understood entropy – anonymous engineer

We can’t measure it directly.  We rarely use it directly.  Few of us truly understand it.  It has frustrated us no end.  And yet…

And yet, shortly after its discovery by Rudolf Clausius in 1865, the concept of entropy became one of the founding cornerstones of classical thermodynamics.  This extremely powerful concept, one that would have died an early death were it not for the genius of such individuals as J. Willard Gibbs, provided a mathematically rigorous means of ruling out perpetual motion while also serving as the critical bridge from the mechanical world to the chemical world, giving physicists, chemists and engineers the means to quantitatively master both phase and reaction equilibria.

Yes, entropy has frustrated us since its inception, its underlying meaning elusive to our continued inquiries.  But while many don’t fully understand it, this hasn’t stopped us from using it.

In keeping with the motivation of this book, the intent of Part IV is to raise your level of understanding of entropy and its role in the 2nd Law of Thermodynamics, especially regarding the world of chemistry.  As with previous parts, this lead-in chapter to Part IV serves to introduce the fundamental science behind entropy, while subsequent chapters share the history that brought us to the discovery or creation of such fundamentals—at some point the concepts of discovery and creation blur.  It is through sharing the history that I take an even deeper dive into the fundamental science involved.  So let’s start.

Entropy as a consequence of our discrete world

The understanding of [temperature and entropy]…is dependent on the atomic constituency of matter.  – Arieh Ben-Naim [1]

To understand entropy, one must zoom into the atomic world and consider what happens with real atoms and molecules—it is because nature is discrete [2] that entropy exists. 

Nature is composed of discrete entities: atoms and molecules, photons and fundamental particles.  We’re taught in physics how to apply Newton’s laws of motion to analyze two colliding bodies, but we’re rarely taught how to analyze a large system of such colliding bodies, as the concepts are more complicated.  It’s in the large system that entropy exists, which is rather fascinating: who would think that any kind of structure would exist inside such a seemingly frenetic system, or that such a system could be characterized by a single number called entropy?

Atoms in a box

Put a single moving atom into an isolated box and, in the ideal Classical World, it will continue to move, bouncing off the walls, this way and that, like a billiard ball ricocheting off the sides of a pool table. Since the system is isolated (and ideal), according to the 1st Law of Thermodynamics its energy remains constant.  And since the only energy in this system is the kinetic energy of the atom, this too remains constant.  The only parameters that change are the atom’s location and direction of movement, or its momentum, due to successive collisions with the box’s walls, which we treat as ideal reflectors.  For this system, ignoring quantum mechanical effects, knowing the atom’s initial location and velocity, we can predict its motion until the end of time.

Now add another atom to this system.  We can similarly model this but with the added feature of the two atoms colliding with each other.  During such collisions, both kinetic energy and momentum are conserved as per Newton.  With such a simple system, we can again predict the future motions of both atoms to the end of time.

Now add a third atom.  The same conservation laws apply.  Continuing to ignore quantum mechanics, the particles will move exactly as Newton would predict.  It’s just that the mathematics become a little more complicated.

Now add more atoms.  Add a trillion atoms.  There’s no difference.  The conservation laws continue to apply and all trillion atoms will move according to Newton’s dictums.  But while there may be no inherent physical difference as we move from the small to the large, there is a tremendous difference in how we approach the modeling.  In short, we can calculate the small but not the large, for that task is nearly impossible.  We’ve made significant progress over the years in modeling such an ideal system due to the increase in available computing power, but when we start accounting for the large numbers involved and start adding the real-world complexity of intermolecular interactions and quantum effects, we reach ‘impossible’ rather quickly. [3]
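
To make the “we can calculate the small” point concrete, here is a minimal simulation sketch (my own illustration, not taken from the book): a handful of equal-mass atoms in a two-dimensional box, moving according to Newton, reflecting off ideal walls and colliding elastically with one another.  The numbers (box size, radius, time step) are arbitrary; the point is simply that the total kinetic energy stays constant, exactly as the 1st Law requires.

```python
# Minimal sketch (not from the book): a few ideal "atoms" in a 2-D box,
# moving ballistically, reflecting elastically off the walls, and undergoing
# elastic hard-sphere collisions with one another (equal masses assumed).
import numpy as np

rng = np.random.default_rng(0)
N, L, radius, dt = 3, 1.0, 0.02, 1e-4          # a small system: easy to compute
pos = rng.uniform(radius, L - radius, (N, 2))
vel = rng.normal(0.0, 1.0, (N, 2))

def step(pos, vel):
    pos = pos + vel * dt
    # elastic reflection off the walls (ideal reflectors)
    for d in range(2):
        low = (pos[:, d] < radius) & (vel[:, d] < 0)
        high = (pos[:, d] > L - radius) & (vel[:, d] > 0)
        vel[low | high, d] *= -1.0
    # elastic collisions between equal-mass hard spheres:
    # exchange the velocity components along the line of centers
    for i in range(N):
        for j in range(i + 1, N):
            r = pos[i] - pos[j]
            dist = np.linalg.norm(r)
            if 0 < dist < 2 * radius:
                n = r / dist
                dv = np.dot(vel[i] - vel[j], n)
                if dv < 0:                      # approaching, not separating
                    vel[i] -= dv * n
                    vel[j] += dv * n
    return pos, vel

E0 = 0.5 * np.sum(vel**2)                       # total kinetic energy (m = 1)
for _ in range(100_000):
    pos, vel = step(pos, vel)
print(E0, 0.5 * np.sum(vel**2))                 # 1st Law: energy unchanged
```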

Fortunately for us though, there’s a way around this limitation, a way based on the fundamental fact that nature is not arbitrary, although things get fuzzy at the scale of Heisenberg’s uncertain world, which we can safely ignore for now.  Because nature largely follows a cause-effect world at the microscopic level, certain macroscopic properties and relationships between these properties exist, which tremendously simplifies our quantitative approach to modeling nature.  Instead of following trillions of colliding atoms, we can instead follow such macroscopic properties as temperature and pressure.

We saw this in Part III.  In an isolated system, regardless of what’s going on inside the system, if no heat enters the system and no work is done on the system, then the energy of the system remains constant.  This 1st Law of Thermodynamics arises directly from the behavior of atoms at the microscopic level where energy is conserved during each collision.  We can pretty quickly grasp the meaning of this law.  It makes sense to us.  Energy just doesn’t appear or disappear.  Things don’t just happen.  You can’t get something for nothing.  There’s no such thing as a free lunch.  As human beings, we intuitively grasp the meaning of this law as reflected by the fact that it pops up frequently during our daily conversations.  We may not always apply the law correctly, but we do grasp its meaning.

The 2nd Law of Thermodynamics, in which entropy is embedded, is more complicated.  While this law similarly arises as a consequence of the cause-effect nature of the microscopic world, it unfortunately doesn’t lend itself to such a quick grasp of meaning.  But let’s try.

The appearance of structure in large systems of colliding atoms

In the above system of a trillion atoms, for now assumed to behave as an ideal gas (no interactions), the atoms follow the cause-effect world according to Newton.  The cause-effect is there, operating in the background, during each and every collision, beyond our limits of detection and modeling.  But while all of these collisions are going on, creating a seemingly muddy pool of kinetic confusion, something quite remarkable happens.  Two beautifully symmetric and smooth population distributions appear, one for location and the other for velocity. [4] Given enough time for many collisions to occur, the distribution of the atoms becomes uniform with respect to location and Gaussian—forming the famous shape of the bell curve—with respect to velocity. [5]  This fascinating behavior always happens for such a large number of atoms.  It is this behavior that is the basis for the highest-level definition of the 2nd Law of Thermodynamics and that explains the lower-level manifestations of this law, such as the facts that heat always flows from hot to cold and that perpetual motion is impossible.  While we’ll get to all of this later, for right now, it’s important to realize that this is really it.  Given enough time, the atoms become uniformly distributed with respect to location and Gaussian-distributed with respect to velocity.

So why do these two specific distributions result?  Probability.  There’s no driving force at work other than probability.  They result as a natural consequence of the probability associated with a large number of events such as collisions between atoms in the world of Avogadro-sized systems. 

In short, nature moves towards its most probable state.  If you flip a coin a thousand times, the most probable result is that you’ll end up with 500 heads and 500 tails, even though each individual flip is random.  There’s no driving force involved.  This is simply the result of how probability works for large numbers. [6] Is it possible to observe a thousand heads for a thousand flips?  Yes.  Probable?  No.  You’ll never see this in your lifetime.  After many flips, the 50:50 distribution (with some very small error bars) is a near certainty.  This same basic probability logic applies to colliding atoms.  Are other distributions possible?  Yes.  Probable?  No.  The most probable distributions for a system of colliding atoms are uniform-location and Gaussian-velocity.  The probability that each occurs is (just shy of) 100%.
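
As a quick check on these claims (my own back-of-the-envelope sketch, not from the book), the coin-flip probabilities can be computed exactly from the binomial distribution:

```python
# Minimal sketch: probabilities for 1000 fair coin flips, computed exactly.
from math import comb

n = 1000
p_exact_500 = comb(n, 500) / 2**n                      # the single most probable count
p_all_heads = 1 / 2**n                                 # one specific extreme outcome
p_near_even = sum(comb(n, k) for k in range(475, 526)) / 2**n   # within 25 of 500

print(f"P(exactly 500 heads) ~ {p_exact_500:.3f}")     # ~ 0.025
print(f"P(1000 heads)        ~ {p_all_heads:.0e}")     # ~ 9e-302
print(f"P(475 to 525 heads)  ~ {p_near_even:.3f}")     # ~ 0.89, a near certainty
```

A specific sequence of a thousand heads is exactly as probable as any other specific sequence of heads and tails; what makes the near-even counts overwhelming is that vastly more sequences produce them.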

Why uniform-location and Gaussian-velocity distributions make sense

Let’s dig a little bit deeper into this discussion to get a better feel for the concepts involved.  And let’s do this by considering some examples, starting first with location.  Envision all the trillion atoms located solely in one half of the box.  You would look at this and realize that something isn’t right.  Nature doesn’t behave this way.  On the other hand, envision all the atoms spread evenly throughout the entire box.  You would look at this and say, this looks right, just as you would if you placed a drop of dye into a glass of water and saw it eventually spread and create a uniform coloring throughout the entire glass.  You would expect to see such an even distribution for location.  This has been your experience based on your own real-world observations.  And this is indeed how nature behaves.  Atoms move in an isolated system until density is uniform throughout (assuming no external field like gravity is acting on the system).  The cause is uniform probability.  If all of the atoms have access to all the volume, then, given enough time, they’ll eventually populate this volume without preference.  As a result, out of all the possible distributions that could possibly exist, the uniform distribution prevails. 

With velocity, it’s a little more complicated.  Imagine you have the power to ‘see’ the velocities—here based on both speed and direction—of all of these many atoms, and say you saw just a few really fast atoms accounting for all the energy,[7] while all of the (many) other atoms were moving really slowly.  You would similarly look at this and realize that something isn’t right.  Nature doesn’t behave this way.  We just empirically know that given enough time, all of the atoms would be in motion, hitting each other, with no atom left untouched for long.  Head on or glancing blow, nature has no preference.  Every type of collision you could imagine occurs.  If in this scenario you could ‘see’ the velocities of all the atoms or, better yet, somehow watch the distribution of velocities and how it changes over time, you would see a Gaussian distribution (assuming you’re looking at a single velocity component, such as vx) fluctuating very tightly around some average; that average is in fact zero if the entire system is stationary.  Every once in a while a non-Gaussian distribution would flicker across your screen, but this would be a rare event.  You’d look at this movie and think, this looks right.  It would look like many other Gaussian distributions you’ve seen before, such as a population distribution based on height.

Initially, even though the distribution might look right, you likely wouldn’t be able to explain why.  But if you thought about it some, you’d start to realize that there’s a reason for this shape.  When atoms collide, energy is conserved.  If one gains, the other loses.  Moreover, it’s typically the slower one that gains and the faster one that loses, meaning that fast atoms typically don’t get faster.  Statistically, the faster a given atom becomes, the greater the fraction of slower atoms in its immediate neighborhood and the higher the probability that it will hit a slower atom and thus be pulled back towards the average, which is either the mean kinetic energy or zero velocity for the x, y or z component, depending on which property you’re studying.  In the world of statistics, this “pull” is termed “regression toward the mean.”  The pull towards the mean grows stronger and stronger as a given atom’s velocity moves further and further away from the mean.
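
To see this relaxation toward the Gaussian play out, here is a minimal toy model (my own construction, not the book’s): pairs of equal-mass ‘atoms’ are picked at random and ‘collided’ by rotating their relative velocity through a random angle, which conserves both momentum and kinetic energy.  Even starting from an improbable state in which a few fast atoms carry all the energy, the x-component of velocity relaxes toward a zero-centered Gaussian.

```python
# Minimal sketch: random pairwise collisions in 2-D that conserve momentum and
# kinetic energy.  Starting with all the energy held by a few fast atoms,
# the velocity component vx relaxes toward a zero-centered Gaussian.
import numpy as np

rng = np.random.default_rng(1)
N = 5_000
vel = np.zeros((N, 2))
vel[:4, 0], vel[4:8, 0] = 100.0, -100.0   # improbable start: 8 atoms hold all the energy

for _ in range(50 * N):                   # many random binary collisions
    a, b = rng.integers(0, N, 2)
    if a == b:
        continue
    t = rng.uniform(0, 2 * np.pi)
    v_cm = 0.5 * (vel[a] + vel[b])        # center-of-mass velocity (equal masses)
    w = 0.5 * (vel[a] - vel[b])           # half the relative velocity
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    w = rot @ w                           # random scattering angle; |w| unchanged
    vel[a], vel[b] = v_cm + w, v_cm - w   # energy and momentum both conserved

vx = vel[:, 0]
print("mean of vx (should be ~0):", round(float(vx.mean()), 3))
print("excess kurtosis of vx (near 0 for a Gaussian):",
      round(float(((vx - vx.mean())**4).mean() / vx.std()**4 - 3), 3))
```

Histogramming vx before and after the loop (e.g., with matplotlib) makes the relaxation visible: the initial spikes at ±100 give way to the bell curve.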

Having said all this, it is possible that you could have such a highly unusual distribution of atoms inside the system—the flicker I mentioned above—in which the motion of a single atom amongst the trillion accounts for all of the energy, such as could be the case if the atom were bouncing exactly perpendicularly between two parallel walls with all the other atoms sitting off to the side with zero velocity.  Is this possible?  Yes.  Probable?  Most assuredly not.  And you’d simply know this by looking at such a distribution.  You’d think that it simply doesn’t look right.  You’d think that nature is more symmetrical than this.  And you’d be correct.

The microstate and statistical mechanics

At this point, you’re likely wondering what this whole conversation has to do with entropy.  I’m getting there but along a path that’s different than typically taught.  Please bear with me.  I’m introducing you first to one physical model—there are others—used to guide the mathematical approach to analyzing such a system.  This is a very short preview of the path taken by Ludwig Boltzmann to arrive at the probabilistic nature of entropy.

* * *

Imagine an isolated system of a trillion monatomic gas-phase atoms for which total energy (U), volume (V) and number of atoms (N) are all fixed.  Further imagine that you could watch a high-speed movie of these atoms.  As discussed above, you would see them evenly distributed throughout the system with small rare fluctuations.  You’d see fast, you’d see slow.  You’d see constant collisions.  It’d be like watching a game of Pong on your television with a huge number of balls in colliding motion.

Now say that you wanted to analyze this in more detail.  Taking the earlier discussion to a higher level that enables application of advanced mathematics, you could take the movie, cut out all the individual frames, take many measurements and then for each and every frame record each atom’s location and velocity (speed and direction; you would need to use two sequential frames to determine velocities).  Furthermore, you could convert the raw data into population graphs, showing the number of atoms as a function of location, velocity or energy.  In this way, each single frame of the movie becomes a single data point representing a unique way of arranging the N atoms in the given volume with fixed energy.  You could even make a new movie wherein each frame consists of one of the population distributions you want to analyze.  You could run the movie and watch how the distribution changes with time.

In the world of statistical mechanics, each frame of the movie is called a microstate and represents a unique distribution of the available atoms that complies with fixed U-V-N constraints.  While the microstate is a man-made concept representing a significant breakthrough in our attempt to model and understand the thermodynamics of such large systems, it is also an excellent approximation of reality as it captures the real fluctuations that occur in nature.  At steady state, the time-average of the microstates equals the macroscopic state of the system.[8]  When we measure macroscopic properties, such as temperature and pressure, the numbers aren’t exactly constant because of these microstate fluctuations.  We typically don’t see such fluctuations, as they’re easy to miss if you’re not looking for them.  But if you do want to see them, there is one place to look—under the microscope, for it is there that you can observe the famed Brownian motion, which is more accurately a consequence of the fluctuations rather than the fluctuations themselves.  It was Einstein who made sense of this phenomenon by using Boltzmann’s work to quantify the fluctuations and resulting motions and in so doing provided substantive evidence towards the existence of atoms.

The microstate is a mathematical construct that was developed to model nature.  While nature doesn’t know anything about microstates, I again emphasize that the above is not simply a mathematical exercise.  It’s a highly effective means of understanding and predicting nature, just as Einstein demonstrated.  Experiments have since further proven that the models used in statistical mechanics truly do reflect how nature behaves, and fascinatingly creative experiments have even made the Gaussian distribution of velocity visible to us.  While so far we’ve only addressed the simplest of systems—monatomic, non-interacting atoms—the approach also works (with adjustments) for more complicated systems involving interactions and/or vibrating, rotating, reacting molecules.  No matter how complicated the system, the population distributions based on location, velocity and energy follow similar distribution laws.  The distributions are the predictable consequence of Newtonian mechanics combined with energy conservation playing itself out over a very large number of atoms.

Mathematical approach to analyzing the microstates – balls in buckets

Of course, it gets more complicated than this.  You have a viable physical model.  Now you need to bring in the mathematics.  How would you go about modeling the cut-up movie?

* * *

If you’ll recall from probability theory, there are only so many ways to arrange a given number of balls into a given number of buckets.  For example, if you had two balls and two buckets, there would be four possible arrangements:  both in the left bucket, both in the right bucket and then, assuming a red and a blue ball, red in left / blue in right and then blue in left / red in right.[9] The point here is that the mathematics exists to calculate this number of possible arrangements.
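
A short sketch (my own illustration) of this counting exercise, for the two-ball, two-bucket example just described:

```python
# Minimal sketch: enumerate every way of placing distinguishable balls into
# buckets, then group the arrangements by how many balls land in each bucket.
from itertools import product
from collections import Counter

balls = ["red", "blue"]
buckets = ["left", "right"]

# each arrangement assigns every ball to one bucket
arrangements = list(product(buckets, repeat=len(balls)))
print(len(arrangements), arrangements)
# 4 [('left', 'left'), ('left', 'right'), ('right', 'left'), ('right', 'right')]

# group arrangements by their occupancy "distribution" (balls per bucket)
by_distribution = Counter(tuple(a.count(b) for b in buckets) for a in arrangements)
print(by_distribution)   # Counter({(1, 1): 2, (2, 0): 1, (0, 2): 1})
```

Even at this toy scale, the even 1–1 split already accounts for more arrangements than either lopsided split, which is the pattern that dominates once the numbers get large.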

Now imagine that the above buckets are not physical objects but instead conceptual ones based on location or velocity or both, with infinitesimally sized ranges for each variable (ranges are needed to create the buckets).  And further imagine that instead of balls you have atoms.  The same mathematics applies.  You could (with much work) sit down and calculate the total number of different ways to place the atoms into the location buckets and the velocity buckets.  It would get kind of complicated as you’d be working in six dimensions—three for location (the x-y-z coordinates) and three for velocity (the vx, vy and vz components)—but the math would still work.  In the end you would arrive at a finite number, meaning that there is a fixed number of ways to arrange a fixed number of atoms into a huge group of location-velocity buckets, all the while maintaining fixed U-V-N.

Each and every one of these arrangements aligns with a unique frame or microstate in your movie.  Each microstate is thus comprised of its own unique distribution of atoms based on both location and velocity.

An interesting fact about each and every one of these microstates or movie frames or arrangements is this.  Each is just as likely to occur as any other.  Each is equiprobable with all the others.  This sounds unusual, right?  Consider that one possible frame is for all atoms to be in one half of the given volume V.  It is indeed possible that this will occur.  Consider that another possible frame is for all atoms to be spread evenly across the entire volume.  It is equally possible that this will occur.  But this doesn’t make sense, does it?  How can this be?  The ½ volume frame looks very unusual if not highly improbable, like getting a thousand heads in a row while flipping a coin.  Understanding the logic here gets you to the heart of this discussion.

If you count up all the possible movie frames or all the possible arrangements, you’d arrive at a finite number.  Let’s call this number J.  If you were to start analyzing the different arrangements comprising J you’d realize something very quickly.  Almost all of the arrangements look the same.  Almost all of the arrangements would show a similar distribution of uniform-location and Gaussian-velocity.  In fact, you could even do the mathematics to prove this.  You could pose the question, which distribution accounts for the greatest number of arrangements?, and then use advanced calculus and probability-based mathematics to solve it.  Make “number of arrangements” the dependent y-variable, “distribution” the independent x-variable, differentiate (dy/dx), set equal to zero, and find the maximum.  The answer:  uniform-location and Gaussian-velocity.  These two distributions account for (by far) the greatest number of arrangements.  You could then quantify the total number of these arrangements and call this number W.

The final fascinating thing about this exercise is that for large numbers, W is almost exactly equal to J, and for all intents and purposes the two can be assumed to be the same, an assumption which significantly simplifies subsequent mathematics.  Looked at this way, the reason that uniform-location and Gaussian-velocity are the most probable distributions is that the arrangements showing them occur much, much more frequently than those showing any other distribution.  Yes, a given arrangement with all atoms on one side has the same probability of occurring as a given arrangement with uniform-location.  But there are overwhelmingly more arrangements corresponding to uniform-location, thus making that distribution the most frequent and thus the most probable.  The frequency of occurrence of all such unusual arrangements—the “flickers”—is contained in the value of J minus W, and this value is very small but not zero, or else Brownian motion would never exist.  (Highly improbable events are not impossible.)
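
Here is a minimal counting sketch (my own, location only, ignoring velocity) that makes the W ≈ J claim concrete.  Split the box into left and right halves and call an arrangement “near uniform” if the left-half count stays within one percent of an even split (that is, within N/200 of N/2).  As N grows, these near-uniform arrangements account for essentially all of the J = 2^N total arrangements:

```python
# Minimal sketch: of the J = 2**N ways to place N distinguishable atoms in the
# left or right half of a box, what fraction lie within 1% of an even split?
# Working in log space avoids astronomically large integers.
from math import lgamma, log, exp

def log_comb(n, k):
    # log of the binomial coefficient C(n, k)
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

for N in (100, 10_000, 1_000_000):
    half, width = N // 2, N // 200                  # band: N/2 +/- 1% of N/2
    logJ = N * log(2.0)
    W_over_J = sum(exp(log_comb(N, k) - logJ)
                   for k in range(half - width, half + width + 1))
    print(f"N = {N:>9,}:  W/J ~ {W_over_J:.6f}")
# roughly 0.08 for N = 100, 0.68 for N = 10,000, 1.000000 for N = 1,000,000
```

Tighten or loosen the definition of “near uniform” and the numbers shift, but the trend is the same: for Avogadro-sized N, the uniform distribution and its immediate neighbors are, to any practical precision, all there is.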

Nature shows no preference for how atoms and molecules distribute themselves within an isolated system so long as the U-V-N constraints are maintained.  There’s no “driving force” to move towards a given distribution other than pure probability.  Let’s look at what this means.  Say you were to set up an improbable system, such as one in which all atoms were in one half of the volume or one in which all of the faster moving atoms were in one half of the volume with the slower moving atoms in the other half.  You could do this by putting a sliding door into the system to separate one half from the other and then adding all atoms to one side or heating up one of the sides.  In this way you’d create a pressure gradient or a temperature gradient, respectively, between the two halves.  At the exact moment you removed this internal constraint by either pulling out the door for the former or by somehow enabling thermal conduction across the door for the latter, you would have created a highly improbable system.  How would you know?  Because just as the drop of dye in a glass of water doesn’t last for long, neither does the state of these two systems.  If you sat and watched what happened, you’d see the atoms rush to fill the void and you’d see the two temperatures move towards each other.  In other words, just as a drop of dye moves to fill the entire glass of water, you’d see these two improbable initial states move towards the most probable final states of uniform-location and Gaussian-velocity.

Finally, the relevance of the number of microstates

At this point (again) you might be asking, even more emphatically, so what?  Here’s where we arrive at the answer.  Consider the statement:  a given system of U-V-N has a certain specific number of ways (called microstates) in which it can be arranged.  This statement has profound implications because it effectively says that given U-V-N, the number of arrangements is fixed and thus this number is itself a property of the system, just like temperature, pressure and internal energy.  An unusual property, yes, but a property nonetheless.

Well, along the path taken by Boltzmann that I chose to share here, the concept of “number of microstates” is directly related to the main theme of this chapter: entropy.  In fact, entropy, which we denote as S, is calculated from this number, which we denote as W, per the famous Boltzmann equation, the one carved on his tombstone,

            S = k ln W

in which k later became kB, the famed Boltzmann constant.

Entropy is a fundamental property of matter, a state property, and is related to the number of different ways atoms, molecules and even photons can be arranged while maintaining fixed U-V-N.  Once U-V-N are fixed, the number W also becomes fixed, just like the number of different arrangements of balls-in-buckets is fixed once you fix the numbers of both balls and buckets. 
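
A small numerical sketch (my own example) of Boltzmann’s equation in action: let one mole of an ideal gas expand to twice its volume.  Each atom then has access to twice as many location ‘buckets’, so W grows by a factor of 2^N.  That factor is far too large to write out, so we work with ln W rather than W itself:

```python
# Minimal sketch: entropy change from S = kB ln W when one mole of ideal gas
# doubles its volume.  W2/W1 = 2**N (each atom has twice the location buckets),
# so dS = kB ln(W2/W1) = N kB ln 2 -- we never need W itself, only its logarithm.
from math import log

kB = 1.380649e-23          # J/K, Boltzmann's constant
NA = 6.02214076e23         # 1/mol, Avogadro's number
N = NA                     # one mole of atoms

dS = N * kB * log(2)
print(f"dS = {dS:.3f} J/K")                                  # ~ 5.763 J/K
print(f"n R ln(V2/V1) = {(kB * NA) * log(2):.3f} J/K")       # the classical result
```

The same number, n R ln(V2/V1), is what classical thermodynamics gives for the isothermal doubling of an ideal gas’s volume, which is exactly the kind of bridge between the atomic and bulk worlds described here.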

Entropy is a difficult-to-comprehend property, but it’s a property nonetheless.  As we’ll see later, because we know from classical thermodynamics that S is dependent on U-V-N, then we also know that W is dependent on U-V-N as well and can thus establish a direct link between the atomic world and the bulk properties of matter.  Boltzmann’s equation is the bridge connecting the two worlds.

* * *

From the standpoint of classical thermodynamics, we don’t really need this link between entropy and the number of microstates.  We don’t need Boltzmann’s equation.  In fact, we pretty much don’t need any aspect of the entire preceding discussion.  It’s largely irrelevant to our use of entropy in the solving of problems relating to, for example, phase and reaction equilibria.

So why am I spending time telling you about it?  Because this link has everything to do with explaining what entropy is.  In classical thermodynamics we use entropy without really understanding why.  While there’s nothing wrong with that, there is something very limiting about that, something that limits our deeper understanding of what we’re really doing.  We end up not understanding entropy and so become almost afraid to confront it, embrace it and use it.

The discovery of entropy – Clausius and the steam engine

One logical question to ask based on the above is, why don’t we learn the deeper meaning behind entropy first, right at the beginning?  And that’s a good question.  One reason is that we don’t need to understand it; we can use it perfectly well to solve problems without the understanding.  Another reason has to do with history.  It turns out that if you start from statistical mechanics and apply some very powerful mathematics, you can arrive at the following rather familiar and famous equation

dS  =  δQ/T        reversible process

This equation—valid for a reversible process involving an equilibrated system, concepts we’ll get to later—states that the increase in entropy of a system, regardless of the contents of that system, is equal to the heat absorbed by the system divided by absolute temperature, another nontrivial concept we’ll get to later.  The contents could be hydrogen gas, water or titanium.  It doesn’t matter.  The relationship remains the same.  This equation is a consequence of the probabilistic nature of entropy and is buried deep down in the mathematics of statistical mechanics as just one of many equations and relationships that can be derived. 
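
As a brief worked example of using this equation (my own illustrative numbers, assuming a constant heat capacity), consider reversibly heating 1 kg of water from 300 K to 310 K with c ≈ 4184 J/kg·K, for which δQ = m c dT:

            ∆S = ∫δQ/T = ∫(m c dT)/T = m c ln(T2/T1)
               = (1 kg)(4184 J/kg·K) ln(310/300) ≈ 137 J/K

Note that T sits in the denominator: the same amount of heat produces a larger entropy change when it is added at a lower temperature, a fact that will matter shortly.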

It just so happens that there’s another way to discover this equation: inside a steam engine.  Who would have thought?  Through his deep, insightful contemplation of Sadi Carnot’s work, Clausius realized, in a truly monumental step in history that broke open a huge log-jam in the world of physical chemistry, that not only does entropy exist as a property but that it changes in such a way as to govern each step of Carnot’s four-step reversible engine cycle:  the two isothermal volume-change steps, for which the change in entropy equals the total heat absorbed or rejected divided by the (constant) temperature, and the two adiabatic volume-change steps, for which there is no change in entropy since there is no heat exchange.
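
To sketch that bookkeeping in symbols (my own compact summary of the standard result, with Qh and Qc taken as positive magnitudes and Th and Tc the hot and cold temperatures):

            ∆Scycle = Qh/Th + 0 – Qc/Tc + 0 = 0       since S is a state property
            ⇒   Qh/Th = Qc/Tc
            W = Qh – Qc   (1st Law)   ⇒   W/Qh = 1 – Tc/Th

The last expression is Carnot’s maximum efficiency, falling out of the entropy bookkeeping in a few lines.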

Entropy enabled completion of thermodynamics

Clausius’ above equation is not the sole defining equation of entropy.  As noted before, it’s not even the defining equation as that prize belongs to Boltzmann’s equation.  But it is one way and in fact the way that enabled completion of classical thermodynamics.  Recall Clausius’ separate defining equation for the 1st Law of Thermodynamics:

            dU  =  δQ  –  δW

We know from his work that heat flow into the system, Q, and work done by the system, W, do not describe properties of matter as such, but instead describe changes in properties.  Each quantifies a specific change inside the system.  We know from Part III that the infinitesimal work (δW) done by a fluid (gas or liquid) in a reversible process is equal to the pressure of the system times the infinitesimal change in volume, PdV.  We also know, from the preceding discussion, that the infinitesimal heat (δQ) added reversibly is equal to TdS.  Combining these two relationships enabled conversion of the 1st Law into one based purely on properties of the system, independent of the heat / work path taken to move from one state to another.

            dU  =  TdS  –  PdV

The arrival of entropy in 1865 enabled the 1st Law of Thermodynamics to be upgraded to the first fundamental equation of thermodynamics, or more accurately, the differential of the first fundamental equation.  This differential equation is based solely on thermodynamic properties without reference to heat and work processes and is the source from which all other thermodynamic equations can be derived.  It was this equation that served, in part, as the basis for Gibbs’ 300-page monumental treatise On the Equilibrium of Heterogeneous Substances.  The other part that Gibbs relied on was a fascinating characteristic of entropy that Clausius discovered.
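
As one small illustration (mine, not the book’s) of how other relationships fall out of dU = TdS – PdV: because U here is a state function of S and V, the coefficients are its partial derivatives, and equality of the mixed second derivatives immediately yields one of the Maxwell relations:

            T = (∂U/∂S)V        –P = (∂U/∂V)S
            ∂²U/∂V∂S = ∂²U/∂S∂V   ⇒   (∂T/∂V)S = –(∂P/∂S)V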

Die Entropie der Welt strebt einem Maximum zu [10] – Clausius’ version of the 2nd Law

Clausius’ entire nine-memoir analysis of Carnot’s steam engine, an analysis in which the announcement of entropy was but a single part, firmly established the two laws of thermodynamics.  The 1st Law as written above was based on energy and its conservation.  The conservation inherent to this law said that an energy increase in one system must be compensated by an energy decrease of identical magnitude in another system.  This law in and of itself had a tremendous impact on the science community.  But it wasn’t enough.

Carnot wrote Reflexions to address “whether the motive power of heat is unbounded.”[11]  His theoretical approach, while groundbreaking, could not be completed since it was based on the false caloric theory of heat.  In correcting Carnot’s work by shifting to the mechanical theory of heat as quantified by Joule, Clausius and also Thomson were left with the residual challenge of how to quantify the maximum performance of an engine.  They struggled with identifying what it was that limited performance.  It was Clausius who ultimately solved this problem by first showing that the conservation of energy embedded in the 1st Law was a necessary but not sufficient condition for defining the limit.  Why?  Because the 1st Law did not prevent the flow of heat from cold to hot.  Without this constraint, the immense thermal energy contained in the ocean on account of its mass could be used to generate an immense amount of work, a situation both he and Thomson deemed impossible as will be addressed in a later chapter.  To solve this problem, Clausius added a 2nd Law saying that heat can only flow from hot to cold, which was really conventional wisdom at that point in time, but then upgraded this to the unprecedented statement: the entropy of the universe increases to a maximum.  Whether or not he meant this statement to also be valid for an isolated system is open to question and I’ll address this later.  But in fact this statement is valid for an isolated system—and I’ll continue to refer to it as being owned by Clausius—and became one of the critical starting hypotheses for the subsequent work of Gibbs.  Because of its importance, additional discussion is warranted.

Towards a deeper understanding of Clausius’ 2nd Law

The increasing-entropy version of the 2nd Law of Thermodynamics that started with Clausius is more complicated than it sounds.  Let’s consider it in more depth by first creating an improbable system having internal gradients in pressure (mechanical), temperature (thermal) or chemical potential (chemical), the last being a property that will be covered in more detail later.  A way to create such a system would be to connect two separate systems having different values for any of these properties and then enable them to exchange energy with each other by installing, for example, a flexible membrane, a conducting wall, or a porous wall, respectively.  In this scenario, the combined system will move towards its most probable state, the gradients will dissipate, and isobaric, isothermal and iso-potential conditions will result.

But where does entropy fit into this discussion?  Well, entropy is an extensive property just like volume and mass.  So take the two separate systems presented above.  Assume that each is initially equilibrated (a reasonable assumption), that each has its own entropy value, and that these two entropy values can be added together to quantify the total entropy of the initial set-up prior to combination.  Now at the exact moment you connect the two systems, the situation immediately changes and the combined system evolves towards one in which temperature, pressure, and chemical potential are constant throughout.  When you do the math, this final evolved state, for which the energy gradients have dissipated, is the most probable state and comprises the greatest number of arrangements or microstates.  When the atoms and molecules are distributed in accordance with this, equilibrium is achieved and the entropy is established.[12]  Now here’s the critical issue in all of this.  This final value of entropy for the combined system is always greater than or equal to the sum of the entropies of the two initial systems prior to combination.

To repeat, the final entropy of the combined equilibrated system will always be greater than or equal to the combined entropies of the two separate systems prior to contact.  A combined system of many parts will always strive towards its most probable state, the entropy of which will always be greater than the sum of the entropies of all the components.

            ∑ S(separate parts)  ≤  S(combined system)

It was this characteristic of entropy that Gibbs employed to fully define the concept of chemical equilibria in his groundbreaking publication.  I’ll discuss this in greater detail later but wanted to address it now because this understanding serves to emphasize the importance of not only entropy but also its relevance to the field of equilibria.

* * *

Let’s look even further into the scenario involving the contact of two separate systems, this time solely involving temperature.  If you graphed the initial population distribution of all of the atoms in the combined system against their respective kinetic energies (as opposed to their respective velocities), you’d see a bimodal distribution with each peak corresponding to one or the other of the two temperatures.  For the same reason that an isothermal system will never naturally split into a bimodal distribution like this, an initial bimodal distribution will never persist.  It will always (eventually) move towards a single distribution indicative of the most probable isothermal state.

Even though in the above scenario the atoms of the two systems don’t mix, the pair can still be viewed as a single system in the sense that the energies of the two systems do “mix” via conduction.  As the atoms of the two systems “talk” to each other at the conducting boundary between them by colliding with the boundary wall atoms, the hot ones slow down, the cold ones speed up and the two temperatures move towards each other.  When the two temperatures equal each other, no thermal energy gradient remains and the population distribution becomes a single distribution based on kinetic energy, i.e., temperature, which reflects a Gaussian distribution based on velocity.

If you were to calculate the entropy of each body before and after combination, you’d find that the sum of the two entropies before contact is less than the total entropy of the combined and equilibrated system.  This is the nature of entropy.  Energy gradients of any kind reflect an improbable state.  As a system moves from improbable to probable, gradients dissipate and total entropy increases.  This is because the entropy increase of the colder (lower energy) system is greater than the entropy decrease of the hotter (higher energy) system, which is why the sum of the two changes is always positive.  This is the context of Clausius’ version of the 2nd Law:  entropy increases to a maximum in an isolated system.
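
A minimal numerical sketch of this bookkeeping (my own numbers, assuming a constant heat capacity and no losses): bring two identical 1 kg copper blocks, one at 400 K and one at 300 K, into thermal contact and let them equilibrate at 350 K.

```python
# Minimal sketch: total entropy change when a hot and a cold copper block
# equilibrate.  The colder block's entropy gain outweighs the hotter block's
# entropy loss, so the total change is positive.
from math import log

m = 1.0                    # kg per block
c = 385.0                  # J/(kg K), approximate specific heat of copper
T_hot, T_cold = 400.0, 300.0
T_final = (T_hot + T_cold) / 2          # equal masses and heat capacities

dS_hot = m * c * log(T_final / T_hot)   # ~ -51.4 J/K (entropy decrease)
dS_cold = m * c * log(T_final / T_cold) # ~ +59.3 J/K (larger entropy increase)
print(f"dS_hot + dS_cold = {dS_hot + dS_cold:+.1f} J/K")   # ~ +7.9 J/K, > 0
```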

The historical path towards entropy

The historical path leading towards our understanding of entropy is fascinating.  It began with early forms of the 2nd Law, then continued with the discovery of entropy and subsequent upgrading of the 2nd Law, and finally ended with the arrival of statistical mechanics.  While fascinating, to this author, this path is arguably not the ideal path by which to learn about entropy as it’s difficult to learn about any subject by first learning about its consequences.  It was our good fortune that Clausius remarkably discovered entropy from a very unusual direction, the steam engine, well before statistical mechanics was developed.  But it was in statistical mechanics where entropy would eventually be understood.

So to this author, a challenge in understanding entropy and the 2nd Law is caused, in part, by the fact that the cart came before the horse during entropy’s history.[13] The property arrived before we had any concept of what it was.  For better or for worse, the structure of our education reflects this cart-before-horse history.  We are educated according to history’s timeline.  Learning about entropy from a steam engine is not easy.  But then again learning about entropy from statistical distributions isn’t easy either.  Many links in the logic chain separate the simple two-body collision physics of Newton from the statistical arrangement physics of Boltzmann.  Each link along the way isn’t that difficult to understand but the full leap from Newton to Boltzmann is difficult to make, especially since entropy doesn’t lend itself to the visceral understanding we have for other properties, such as temperature and pressure, which we can directly sense and measure.  Perhaps learning about entropy is simply an inherent challenge, no matter which way you approach it.  But I prefer not to give up so easily.  In fact, one of the overriding goals for writing this book was to provide some new ideas about how to better teach the concept to students, and so here we are.

* * *

As undergraduates, we learn the classical thermodynamics of Clausius, Thomson, Gibbs and others and while we may not understand what entropy is, we can still learn how to use it to solve problems.  The statistical thermodynamics of Maxwell, Boltzmann and Gibbs again provides insight into entropy but is taught second, typically in graduate school, if at all.  So we learn how to get by without truly understanding entropy.  Not helping matters is that classical thermodynamics is itself confusing.  Gibbs’ treatise, in which entropy played a central role, is very difficult to read.  And his work competed with other works to establish the field.  Many had their own version of this field:  Clausius, Maxwell, Helmholtz, Duhem, Planck, Nernst and others.  Entropy was important to some, not so much to others.  The atomic theory of matter came into being after completion of classical thermodynamics and so helped transform this subject into the statistical mechanics of Boltzmann.  Indeed, the success of statistical mechanics helped validate its starting atomic-theory hypothesis (Chapter 4).  Again, this history is indeed fascinating, but the sequence of how this all happened and how it ended up in the textbooks with different and antiquated terminologies led to even more confusion, above and beyond the technical challenge itself.  Each author had his own way of explaining the material.  As Feynman said, “This is why thermodynamics is hard, because everyone uses a different approach.  If we could only sit down once and decide on our variables, and stick to them, it would be fairly easy.” [14]

* * *

Man’s incessant need to figure things out, to determine the cause of an effect, led him into a brick wall many times regarding the journey towards entropy, its meaning and its relation to the 2nd Law of Thermodynamics.  Many interpretations were given, many books were written and many mistakes were made about this concept.  And while we have finally arrived at a detailed definition of this concept, to this day, 150 years after its discovery by Rudolf Clausius, many otherwise successful graduates in physics, chemistry and engineering still don’t fully understand entropy.  Part IV is my own approach to bringing clarity to the subject by sharing with you the rich, technical history that led to its discovery and use.

References

[1] Ben-Naim, Arieh. 2008. A Farewell to Entropy: Statistical Thermodynamics Based on Information: S=logW. Hackensack, N.J.: World Scientific, p. 2.

[2] Ignoring wave-particle duality for now.

[3] Kaznessis, Yiannis Nikolaos. 2012. Statistical Thermodynamics and Stochastic Kinetics: An Introduction for Engineers. Cambridge; New York: Cambridge University Press. See p. 5 for a discussion of this.

[4] Even if you were small enough, you couldn’t directly see these distributions with your eyes as you’d be one-step removed.  You’d have to find some magical way to measure the instantaneous location and velocity of every atom and then graph the location and velocity population distributions to see them. 

[5] Technically the Gaussian distribution is based on the population distribution of one component of velocity such as vx, as opposed to speed, which is √(vx² + vy² + vz²), or energy.  Because these properties are all tied together, you’ll often see population distributions based on one or the other or both.  When I speak of a Gaussian-velocity distribution, I am referring to vx, which has both positive and negative values centered around zero for a non-moving system.

[6] As another example, have you ever visited a museum in which a large-scale machine called the Galton board demonstrates how balls cascading down a vertical board of interleaved rows of pins, bouncing either left or right as they hit each pin, build up the Normal or Gaussian distribution as a large number of such balls collect in the one-ball-wide bins at the bottom?  Such demonstrations are pretty cool to watch unfold and reflect how many sequential 50:50 decisions result in a Gaussian distribution.

[7] Recall that velocity converts to kinetic energy per the equation E = ½mv².  For an ideal monatomic gas with no interactions, the total energy is comprised solely of the kinetic energy of the atoms.

[8] The assumption that the time average represents the macroscopic state, or the ensemble average, is called the ergodic hypothesis.

[9] I don’t address in this book the issue of distinguishable versus indistinguishable particles as it unnecessarily raises the complexity of the content beyond my objectives. 

[10] Clausius, R. 1867. The Mechanical Theory of Heat: With Its Applications to the Steam-Engine and to the Physical Properties of Bodies. Edited by T. Archer Hirst. London: John Van Voorst, p. 365.  “The entropy of the world increases to a maximum.”  Clausius concluded his Ninth and final memoir with this bold and far-reaching statement.  Gibbs used this statement as the epigraph to his own masterpiece, On the Equilibrium of Heterogeneous Substances.

[11] Carnot, Sadi, E. Clapeyron, and R. Clausius. 1988. Reflections on the Motive Power of Fire by Sadi Carnot and Other Papers on the Second Law of Thermodynamics by E. Clapeyron and R. Clausius. Edited with an Introduction by E. Mendoza. Mineola, N.Y.: Dover, p. 5.

[12] Technically, entropy is defined at the outset of this scenario even in a state of non-equilibrium.  Once you define the state of the system by defining U-V-N, you define W and thus S.  Even in the non-equilibrium state, the values of U-V-N and thus S are valid.  But it is only in the equilibrium state that the atoms’ locations and velocity distributions match those associated with the most probable state as defined by S. 

[13] The author’s feeling is that teaching entropy by following this cart-before-the-horse history creates some unnecessary challenges.  The concept is indeed challenging enough to grasp without trying to figure it out while also figuring out Carnot’s heat engine.  Perhaps starting with the fundamental probabilistic explanation of entropy and then moving into the consequences, including the fact that heat only flows from hot to cold, would be more effective.  Discussion of engine limitations would come second in this methodology.

[14] Feynman, Richard P., Robert B. Leighton, and Matthew L. Sands. 1989. The Feynman Lectures on Physics. Volume I: Mainly Mechanics, Radiation, and Heat. Redwood City, Calif.: Addison-Wesley, pp. 44-9.

END