Pipelines

2.1. Introduction

One of the fundamental concepts in a shell is called the pipeline. It also forms the basis of one of the most significant advances that PowerShell brings to the table. A pipeline is a big name for a simple concept—a series of commands where the output of one becomes the input of the next. A pipeline in a shell is much like an assembly line in a factory: it successively refines something as it passes between the stages, as shown in Example 2-1.

Example 2-1. A PowerShell pipeline
Get-Process | Where-Object { $_.WorkingSet -gt 500kb } | Sort-Object -Descending Name

In PowerShell, you separate each stage in the pipeline with the pipe (|) character.

In Example 2-1, the Get-Process cmdlet generates objects that represent actual processes on the system. These process objects contain information about the process’s name, memory usage, process ID, and more. The Where-Object cmdlet then works directly with those process objects, keeping only those that use more than 500 KB of memory. It passes those along, allowing the Sort-Object cmdlet to also work directly with those process objects, sorting them by name in descending order. This brief example illustrates a significant advancement in the power of pipelines: PowerShell passes full-fidelity objects along the pipeline, not their text representations.
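To make the idea of full-fidelity objects concrete, the following sketch runs the same filter and sort as Example 2-1 against a small set of made-up objects. The sample data (the names alpha, bravo, and charlie, and their Id and WorkingSet values) is hypothetical and only stands in for real Get-Process output so that the result is predictable; on a live system you would pipe Get-Process instead.

```powershell
# Hypothetical sample data standing in for the objects Get-Process would emit.
$processes = @(
    [pscustomobject]@{ Name = 'alpha';   Id = 101; WorkingSet = 300kb }
    [pscustomobject]@{ Name = 'bravo';   Id = 102; WorkingSet = 2mb }
    [pscustomobject]@{ Name = 'charlie'; Id = 103; WorkingSet = 750kb }
)

# Same stages as Example 2-1: filter on a property, then sort on a property.
$result = $processes |
    Where-Object { $_.WorkingSet -gt 500kb } |
    Sort-Object -Descending Name

# Every property is still available at the end of the pipeline,
# including Id, which no earlier stage mentioned.
$result | ForEach-Object {
    '{0} ({1}) uses {2} KB' -f $_.Name, $_.Id, ($_.WorkingSet / 1kb)
}
# charlie (103) uses 750 KB
# bravo (102) uses 2048 KB
```

Because each stage receives whole objects, the final stage can read a property such as Id even though the filter and sort stages never referred to it.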

In contrast, traditional shells pass data as plain text between the stages. Extracting meaningful information from plain-text output turns the authoring of pipelines into a black art. Expressing the previous example is exceedingly difficult in a traditional Unix-based shell, and nearly impossible in cmd.exe.

Traditional text-based shells make writing pipelines so difficult because they require you to deeply understand the peculiarities of output formatting for each command in the pipeline, as shown in Example 2-2.

Example 2-2. A traditional text-based pipeline
lee@trinity:~$ ps -F | awk '{ if($5 > 500) print }' | sort -r -k 64,70
UID        PID  PPID  C    SZ   RSS PSR STIME TTY             TIME CMD
lee       8175  7967  0   965  1036   0 21:51 pts/0       00:00:00 ps -F
lee       7967  7966  0  1173  2104   0 21:38 pts/0       00:00:00 -bash

In this example, you have to know that, for every line, field number five represents the memory usage. You have to know another language (that of the awk tool) to filter by that field. Finally, you have to know the column range that contains the process name (columns 64 to 70 on this system) and then provide that to the sort command. And that’s just a simple example.
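The dependence on position can be sketched in PowerShell itself. The snippet below takes the header line and the -bash line from the output in Example 2-2, reads the fifth field by index the way awk's $5 does, and then, purely for illustration, builds an object whose property names come from the header so the same value can be read by name. The variable names are hypothetical, and the header-to-field mapping only works here because this particular CMD value contains no spaces.

```powershell
# Two lines copied from the output of Example 2-2.
$header = 'UID        PID  PPID  C    SZ   RSS PSR STIME TTY             TIME CMD'
$line   = 'lee       7967  7966  0  1173  2104   0 21:38 pts/0       00:00:00 -bash'

# Unary -split breaks a string on runs of whitespace.
$names  = -split $header
$fields = -split $line

# Text approach: you must know that the fifth field (index 4) is the one you want.
$byPosition = [int]$fields[4]

# Object approach (illustrative): pair each header name with its field.
# Assumes the command column has no embedded spaces, as is the case for -bash.
$props = [ordered]@{}
for ($i = 0; $i -lt $names.Count; $i++) {
    $props[$names[$i]] = $fields[$i]
}
$process = [pscustomobject]$props

# The same value, now addressed by name rather than by position.
$byName = [int]$process.SZ

"By position: $byPosition; by name: $byName"
# By position: 1173; by name: 1173
```

If the layout of the text changed, the index 4 would silently point at the wrong value, whereas a property name keeps its meaning. Objects that arrive with named properties, as they do in a PowerShell pipeline, remove the need for this kind of parsing in the first place.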

An object-based pipeline opens up enormous possibilities, making system administration both immensely simpler and more powerful.
