5.3 Pipes

1. What are Pipes?

In computing, a pipe is a mechanism for inter-process communication (IPC) that passes data from one process to another. It acts as a one-way channel: one process writes its output into the write end, and another process reads that data as its input from the read end.

Pipes are commonly used in Unix-like operating systems (such as Linux and macOS), and they can be a powerful tool for building complex, multi-process programs.

1.1 Types of Pipes

Anonymous Pipes:
- Anonymous pipes are the simplest form of pipe. They are used between related processes that share a common ancestor, such as a parent and its child, because the pipe's file descriptors are passed on through fork().
- They are created using the pipe() system call in Unix-like systems.
- These pipes are unidirectional: data always flows from the write end to the read end.
Named Pipes (FIFOs):
- Named pipes (also called FIFOs, which stands for First In, First Out) are similar to anonymous pipes but are persistent and can be accessed via a specific filename in the filesystem.
- They are not limited to communication between related processes, and they can be used between unrelated processes.
- Named pipes are created with the mkfifo() function (or the mkfifo shell command) and appear as special files in the filesystem.

Example of a Named Pipe

Any process with permission to open the file can read from or write to the pipe. Here is an example in a bash terminal:

mkfifo my_pipe
gzip -c < my_pipe >> out.gz &

The mkfifo command creates a named pipe called my_pipe. We then start gzip -c in the background; it compresses whatever arrives on my_pipe and appends the result to out.gz. Now we can send data into the pipe, for example:

cat file > my_pipe

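The gzip process waiting in the background reads the data, compresses it, and appends it to out.gz. When cat finishes and closes its end of the pipe, gzip sees end-of-file and exits, so the gzip command has to be started again before more data is sent. The pipe itself stays in the filesystem until it is removed with: rm my_pipe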
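The same idea can be sketched in C. This is a minimal illustration, not a definitive implementation: the helper name fifo_roundtrip and its parameters are hypothetical, and a forked child stands in for the "other" process only to keep the example self-contained. The two sides share nothing but the FIFO's path, so the writer could equally be an unrelated program.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child writes msg into the FIFO at `path`,
 * the parent reads it back into `out`. Returns bytes read, or -1. */
ssize_t fifo_roundtrip(const char *path, const char *msg,
                       char *out, size_t outsz)
{
    if (outsz == 0)
        return -1;

    /* Create the FIFO; it shows up in the filesystem like a file. */
    if (mkfifo(path, 0600) == -1) {
        perror("mkfifo");
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        unlink(path);
        return -1;
    }

    if (pid == 0) {
        /* Writer: open() blocks until a reader opens the FIFO. */
        int wfd = open(path, O_WRONLY);
        if (wfd == -1)
            _exit(1);
        size_t len = strlen(msg);
        ssize_t w = write(wfd, msg, len);
        close(wfd);  /* closing the write end gives the reader end-of-file */
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    /* Reader: open() blocks until the writer has opened its end. */
    int rfd = open(path, O_RDONLY);
    if (rfd == -1) {
        perror("open");
        kill(pid, SIGKILL);  /* the writer would otherwise wait forever */
        waitpid(pid, NULL, 0);
        unlink(path);
        return -1;
    }

    /* Read until end-of-file or until the output buffer is full. */
    size_t total = 0;
    ssize_t n;
    while (total < outsz - 1 &&
           (n = read(rfd, out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';

    close(rfd);
    waitpid(pid, NULL, 0);
    unlink(path);  /* same effect as rm my_pipe */
    return (ssize_t)total;
}
```

Calling fifo_roundtrip("/tmp/my_pipe", "hello", buf, sizeof buf) from main() would create the FIFO, pass the text through it, and remove it again; the path is only an example.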

1.2 Key Concepts

Pipe System Calls:
- pipe(): Creates a pipe and returns two file descriptors, one for reading and one for writing.
- read(): Reads data from the read end of the pipe.
- write(): Writes data to the write end of the pipe.
- close(): Closes a file descriptor (read or write end).
- dup2(): Duplicates a file descriptor onto a chosen descriptor number, often used to redirect standard input or output to a pipe.
- mkfifo(): Creates a named pipe (FIFO) in the filesystem.
Unidirectional Communication:
- Pipes are typically unidirectional, meaning data flows from the write end to the read end. However, you can create multiple pipes for bidirectional communication between two processes.
File Descriptors:
- A pipe is represented by two file descriptors:
  - One for reading (pipefd[0]).
  - One for writing (pipefd[1]).
- These file descriptors are used to interact with the pipe in the same way you interact with regular files, through read(), write(), and close() system calls.
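The point about using two pipes for bidirectional communication can be sketched as follows. This is only an illustration under stated assumptions: the helper name uppercase_via_child is hypothetical, and the message is assumed to be short enough to fit in the pipe buffer (the parent writes everything before it starts reading, so a very large message could deadlock).

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: sends `in` to a child over one pipe and reads the
 * upper-cased reply over a second pipe. Returns reply length, or -1. */
ssize_t uppercase_via_child(const char *in, char *out, size_t outsz)
{
    int to_child[2], to_parent[2];

    if (outsz == 0 || pipe(to_child) == -1)
        return -1;
    if (pipe(to_parent) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]);  close(to_child[1]);
        close(to_parent[0]); close(to_parent[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: reads from to_child, writes to to_parent. */
        close(to_child[1]);
        close(to_parent[0]);
        char c;
        while (read(to_child[0], &c, 1) == 1) {
            c = (char)toupper((unsigned char)c);
            if (write(to_parent[1], &c, 1) != 1)
                _exit(1);
        }
        close(to_child[0]);
        close(to_parent[1]);
        _exit(0);
    }

    /* Parent: writes to to_child, reads from to_parent. */
    close(to_child[0]);
    close(to_parent[1]);

    size_t len = strlen(in);
    ssize_t w = write(to_child[1], in, len);
    close(to_child[1]);  /* end-of-file tells the child the request is complete */

    size_t total = 0;
    ssize_t n;
    while (total < outsz - 1 &&
           (n = read(to_parent[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';

    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    return w == (ssize_t)len ? (ssize_t)total : -1;
}
```

Each process closes the ends it does not use. Otherwise a stray open write end would keep the other side from ever seeing end-of-file.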

1.3 How Pipes Work

When a pipe is created, the kernel sets up a buffer between its two ends. The writing process puts data into the buffer through the write end, and the reading process takes it out through the read end. Data comes out in the same order it went in, that is, in a first-in, first-out (FIFO) manner.

Example Workflow

Process A (parent) writes data to the pipe:
- It uses the write-end (pipefd[1]) to write data to the pipe.
Process B (child) reads data from the pipe:
- It uses the read-end (pipefd[0]) to read the data written by Process A.

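The operating system handles the transfer of data from the write end to the read end of the pipe. If the pipe is empty, the read() call blocks (waits) until there is something to read. Once every write end has been closed and the buffer has been drained, read() returns 0 to signal end-of-file.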
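The behaviour of read() on a pipe can be observed within a single process. The sketch below is illustrative only; the function name read_then_eof is hypothetical. It writes three bytes, closes the write end, and then reads twice.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stores the results of two consecutive read() calls.
 * Returns 0 if the pipe could be set up, -1 otherwise. */
int read_then_eof(ssize_t *first, ssize_t *second)
{
    int pipefd[2];
    char buf[16];

    if (pipe(pipefd) == -1)
        return -1;

    /* Three bytes fit easily in the pipe buffer, so write() does not block. */
    if (write(pipefd[1], "abc", 3) != 3) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    /* Close the only write end. If it stayed open, the second read()
     * below would block, waiting for data that never arrives. */
    close(pipefd[1]);

    *first = read(pipefd[0], buf, sizeof buf);   /* buffered data is still delivered */
    *second = read(pipefd[0], buf, sizeof buf);  /* no data and no writers: end-of-file */

    close(pipefd[0]);
    return 0;
}
```

The first call returns the 3 buffered bytes and the second returns 0, which is how a reader learns that the writer is done.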

1.4 Example of Using Pipes in C (Anonymous Pipe)

Here is an example where we create a pipe, fork a child process, and have the child process write data to the pipe while the parent reads it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>   // wait()
#include <unistd.h>

int main(void) {
    int pipefd[2];
    pid_t pid;

    // Create a pipe
    if (pipe(pipefd) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }

    // Fork a child process
    pid = fork();

    if (pid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    } else if (pid == 0) {
        // Child process: Write to pipe
        const char *msg = "Hello from child!";
        close(pipefd[0]);  // Close unused read end
        if (write(pipefd[1], msg, strlen(msg)) == -1) {
            perror("write");
            exit(EXIT_FAILURE);
        }
        close(pipefd[1]);  // Close write end after use
        exit(EXIT_SUCCESS);
    } else {
        // Parent process: Read from pipe
        close(pipefd[1]);  // Close unused write end
        char buffer[100];
        ssize_t n = read(pipefd[0], buffer, sizeof(buffer) - 1);
        if (n == -1) {
            perror("read");
            exit(EXIT_FAILURE);
        }
        buffer[n] = '\0';  // read() does not add a terminating NUL
        printf("Parent received: %s\n", buffer);
        close(pipefd[0]);  // Close read end after use
        wait(NULL);  // Wait for child to finish
    }

    return 0;
}

Explanation of the Example:

Pipe Creation: The pipe() system call creates a pipe, resulting in two file descriptors (pipefd[0] for reading and pipefd[1] for writing).
Forking: The fork() system call creates a child process.
Child Process:
- The child process writes a message ("Hello from child!") to the pipe using the write end (pipefd[1]).
- After writing, it closes the write end and exits.
Parent Process:
- The parent process reads the data from the pipe using the read end (pipefd[0]).
- It then prints the data to the terminal and waits for the child to finish.

1.5 Use Cases for Pipes

Command Piping in Shells:
- Pipes are commonly used in command-line environments (such as Unix/Linux shells) to chain commands together. For example:
```
ls | grep "dev"
```
  This command lists the contents of the current directory and pipes the output to the grep command to filter the results.
Process Communication:
- Pipes can be used for communication between a parent and a child process, or between any related processes that inherited the pipe's file descriptors from a common ancestor.
Data Streaming:
- Pipes can be used in systems that require streaming data between processes, such as logging systems, real-time data processing, etc.

1.6 Advantages and Limitations of Pipes

Advantages

Simple: Pipes are a simple and efficient method of communication between processes.
Speed: Data passes through a kernel buffer with little overhead, which makes pipes fast for small, streaming transfers. Shared memory can be faster for large volumes of data, but it requires explicit synchronization.
Built-In: Available natively in Unix-like operating systems and easy to implement using system calls.

Limitations

Unidirectional: Standard pipes are unidirectional. For bidirectional communication, you need two pipes or another IPC method.
Buffer Size: A pipe has a limited kernel buffer (for example, 64 KB by default on modern Linux; other systems use different sizes). A write() to a full pipe blocks until the reader has drained some data.
Related Processes Only: Anonymous pipes are typically used only for communication between related processes (parent-child), not unrelated processes.
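The buffer limit can be measured rather than assumed. The following sketch (the function name measure_pipe_capacity is hypothetical) switches the write end to non-blocking mode and writes until the kernel refuses more data. It is a rough probe, not an official way to query the size, and the result varies between systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns how many bytes fit into an empty pipe
 * before a write would block, or -1 on error. */
long measure_pipe_capacity(void)
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    /* Make the write end non-blocking so a full pipe yields EAGAIN
     * instead of suspending the process. */
    int flags = fcntl(pipefd[1], F_GETFL);
    if (flags == -1 || fcntl(pipefd[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    char chunk[1024] = {0};
    long total = 0;
    for (;;) {
        ssize_t n = write(pipefd[1], chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;  /* interrupted by a signal: retry */
        break;         /* typically EAGAIN: the buffer is full */
    }

    close(pipefd[0]);
    close(pipefd[1]);
    return total;
}
```

Printing the return value from main() shows the capacity on the machine at hand. Nobody reads from the pipe here, which is exactly why it fills up.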

2. Sample Program

Write a C program that uses fork, exec and pipe to perform the equivalent of the shell command: [user@pc]$ ls /dev | head -25

The following C program demonstrates inter-process communication (IPC) using pipes. It creates a pipe, forks a child process, and redirects the output of the ls command (which lists files in the /dev directory) to the parent process via the pipe. The parent then reads the first 25 lines of the output from the pipe and prints them to the terminal.

2.1 Header Files

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

stdio.h: Provides functions like printf() for printing to the console and perror() for reporting errors.
stdlib.h: Includes functions like exit() for terminating the program with a status code.
unistd.h: Declares system call wrappers such as pipe(), fork(), dup2(), read(), close(), and execv().
sys/types.h: Defines data types like pid_t, which is used for process IDs.
sys/wait.h: Declares wait() and waitpid(), along with macros for interpreting a child's exit status.

2.2 Create a Pipe

int pipefd[2];
if (pipe(pipefd) == -1) {
    perror("pipe");
    exit(EXIT_FAILURE);
}

pipe(pipefd) creates a pipe with two file descriptors:
- pipefd[0] is the read end of the pipe.
- pipefd[1] is the write end of the pipe.
If pipe() fails (e.g., due to resource limitations), it prints an error and exits the program.

2.3 Fork a Child Process

pid_t pid = fork();
if (pid == -1) {
    perror("fork");
    exit(EXIT_FAILURE);
} else if (pid == 0) {
    // Child process code
} else {
    // Parent process code
}

fork() creates a new process:
- The parent process receives the child’s PID (a positive integer).
- The child process receives 0.
If fork() fails, it prints an error and exits the program.

2.4 Child Process

if (pid == 0) {
    // Child process

    // Redirect stdout to the write end of the pipe
    if (dup2(pipefd[1], STDOUT_FILENO) == -1) {
        perror("dup2");
        exit(EXIT_FAILURE);
    }

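    // Close both pipe descriptors: stdout now refers to the write end,
    // and the child never reads from the pipe
    close(pipefd[0]);
    close(pipefd[1]);

    // Execute the command "ls /dev" using exec
    char *args[] = {"ls", "/dev", NULL};
    execv("/bin/ls", args);

    // execv() returns only if it fails
    perror("execv");
    exit(EXIT_FAILURE);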
}

Redirecting stdout:
- The dup2() system call redirects the child’s standard output (STDOUT_FILENO, file descriptor 1) to the write end of the pipe (pipefd[1]).
- After this, any output the child writes to stdout will go into the pipe instead of to the terminal.
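Close Unused Descriptors:
- The child process closes pipefd[0], as it will not be using the read end of the pipe. It can also close pipefd[1] once dup2() has succeeded, because file descriptor 1 now refers to the write end.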
Execute the Command:
- execv("/bin/ls", args) replaces the child process with the ls command that lists files in the /dev directory.
- On success, execv() does not return. The process keeps its PID and its open file descriptors (including the redirected stdout), but its code and data are replaced by the ls program.
- If execv() fails (e.g., because the executable is not found), the child process prints an error message and exits.
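The effect of dup2() can also be seen without fork() or exec. The sketch below is an illustration with a hypothetical helper name, capture_puts: it temporarily points file descriptor 1 at a pipe, prints through stdout, restores the original descriptor, and reads the text back from the pipe. It assumes the text is short enough to fit in the pipe buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: prints `text` to stdout while stdout is redirected
 * into a pipe, then returns what arrived in the pipe via `out`. */
ssize_t capture_puts(const char *text, char *out, size_t outsz)
{
    int pipefd[2];
    if (outsz == 0 || pipe(pipefd) == -1)
        return -1;

    fflush(stdout);                  /* flush anything already buffered */
    int saved = dup(STDOUT_FILENO);  /* remember the real stdout */
    if (saved == -1 || dup2(pipefd[1], STDOUT_FILENO) == -1) {
        if (saved != -1)
            close(saved);
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    close(pipefd[1]);                /* descriptor 1 is now the write end */

    fputs(text, stdout);             /* goes into the pipe, not the terminal */
    fflush(stdout);

    dup2(saved, STDOUT_FILENO);      /* restore stdout; closes the last write end */
    close(saved);

    ssize_t n = read(pipefd[0], out, outsz - 1);
    close(pipefd[0]);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```

In the sample program the same redirection happens in the child just before execv(), so ls writes into the pipe without knowing it.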

2.5 Parent Process

else {
    // Parent process

    // Close unused write end of the pipe
    close(pipefd[1]);

    // Read from the read end of the pipe and print the first 25 lines
    char buffer;
    int count = 0;

    while (count < 25 && read(pipefd[0], &buffer, sizeof(buffer)) > 0) {
        printf("%c", buffer);
        if (buffer == '\n') {
            count++;
        }
    }

    // Close the read end of the pipe
    close(pipefd[0]);

    // Wait for the child process to finish
    wait(NULL);
}

Close Unused Write End:
- The parent closes pipefd[1] because it will only be reading from the pipe (not writing).
Reading from the Pipe:
- The parent reads one character at a time from the pipe using read(pipefd[0], &buffer, sizeof(buffer)). The loop stops after 25 newlines (\n) have been seen, or earlier if ls produces no more output, so it prints at most the first 25 lines of the ls /dev output.
Close the Read End:
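- After reading, the parent closes pipefd[0] to release the descriptor. If ls is still writing at that point, its next write fails with SIGPIPE and it terminates instead of blocking on a full pipe.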
Waiting for the Child:
- The parent calls wait(NULL) to wait for the child process to finish executing before it terminates. This prevents the parent from finishing before the child and avoids leaving a “zombie” child process.

2.6 Program Execution Flow

Step 1: The parent creates a pipe.
Step 2: The program calls fork() to create a child process.
Step 3 (Child Process):
- The child redirects its output to the pipe using dup2().
- It then executes the ls /dev command using execv(). The output of ls /dev will be written into the pipe.
Step 4 (Parent Process):
- The parent closes the write end of the pipe, then reads the output from the pipe.
- It prints the first 25 lines of output from ls /dev.
Step 5: The parent waits for the child process to finish using wait(), ensuring that both processes terminate cleanly.

Key Concepts

Pipe: A pipe is used to enable communication between processes. One process writes to the pipe, and the other reads from it.
Forking: The fork() system call creates a child process. The parent and child processes then run concurrently.
Redirection: The dup2() system call redirects the standard output of the child process to the write end of the pipe.
Exec: The execv() system call replaces the child process’s image with the specified command, in this case, ls /dev.
IPC (Inter-Process Communication): Pipes are used for communication between the parent and child processes. The child writes output to the pipe, and the parent reads from it.

Example Output

When the program runs, it prints the first 25 lines of the output of ls /dev. Because ls sorts its output, these are the entries that sort first on your system. The /dev directory contains device files, with names such as:

tty
tty0
tty1
tty2
...

(Note: The actual output will depend on your system’s /dev directory.)

2.7 Complete C Code

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main() {
    int pipefd[2];
    pid_t pid;

    // Create a pipe
    if (pipe(pipefd) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }

    // Fork a child process
    pid = fork();

    if (pid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    } else if (pid == 0) {
        // Child process

        // Redirect stdout to the write end of the pipe
        if (dup2(pipefd[1], STDOUT_FILENO) == -1) {
            perror("dup2");
            exit(EXIT_FAILURE);
        }

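        // Close both pipe descriptors: stdout now refers to the write end,
        // and the child never reads from the pipe
        close(pipefd[0]);
        close(pipefd[1]);

        // Execute the command "ls /dev" using exec
        char *args[] = {"ls", "/dev", NULL};
        execv("/bin/ls", args);

        // execv() returns only if it fails
        perror("execv");
        exit(EXIT_FAILURE);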
    } else {
        // Parent process

        // Close unused write end of the pipe
        close(pipefd[1]);

        // Read from the read end of the pipe and print the first 25 lines
        char buffer;
        int count = 0;

        while (count < 25 && read(pipefd[0], &buffer, sizeof(buffer)) > 0) {
            printf("%c", buffer);
            if (buffer == '\n') {
                count++;
            }
        }

        // Close the read end of the pipe
        close(pipefd[0]);

        // Wait for the child process to finish
        wait(NULL);
    }

    return 0;
}
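A variant that is closer to what a shell does runs both commands as child processes, with head doing the line counting instead of the parent. This is a sketch, not the program above: the function name run_ls_head is hypothetical, and it uses execlp() rather than execv() so that ls and head are looked up on PATH, which is an assumption about the environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the equivalent of "ls /dev | head -25".
 * Returns the exit status of head, or -1 on failure. */
int run_ls_head(void)
{
    int pipefd[2];
    if (pipe(pipefd) == -1) {
        perror("pipe");
        return -1;
    }

    pid_t ls_pid = fork();
    if (ls_pid == -1) {
        perror("fork");
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (ls_pid == 0) {
        /* First child: stdout goes into the pipe, then becomes ls. */
        dup2(pipefd[1], STDOUT_FILENO);
        close(pipefd[0]);
        close(pipefd[1]);
        execlp("ls", "ls", "/dev", (char *)NULL);
        perror("execlp ls");
        _exit(127);
    }

    pid_t head_pid = fork();
    if (head_pid == -1) {
        perror("fork");
        close(pipefd[0]);
        close(pipefd[1]);
        waitpid(ls_pid, NULL, 0);
        return -1;
    }
    if (head_pid == 0) {
        /* Second child: stdin comes from the pipe, then becomes head. */
        dup2(pipefd[0], STDIN_FILENO);
        close(pipefd[0]);
        close(pipefd[1]);
        execlp("head", "head", "-25", (char *)NULL);
        perror("execlp head");
        _exit(127);
    }

    /* The parent uses neither end. Closing both is required, or head
     * would never see end-of-file. */
    close(pipefd[0]);
    close(pipefd[1]);

    int status = 0;
    waitpid(ls_pid, NULL, 0);
    waitpid(head_pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_ls_head() from main() prints the same kind of listing as the complete program, with the pipe connecting two exec'd programs instead of a program and its parent.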

3. stdout vs STDOUT_FILENO

3.1 What is `STDOUT_FILENO`?

STDOUT_FILENO is a macro, defined in <unistd.h> on POSIX systems, that names the file descriptor for standard output. It is typically used in system calls (like write() or dup2()) that take a file descriptor.

The value of STDOUT_FILENO is 1, because in Unix-like systems standard output corresponds to file descriptor 1. So when you use STDOUT_FILENO, you’re working with the file descriptor for standard output.

3.2 Is `STDOUT_FILENO` the same as `stdout`?

No, STDOUT_FILENO is not the same as stdout (the lowercase name is the one C defines; there is no STDOUT). Here’s how they differ:

STDOUT_FILENO:
- It is a file descriptor (an integer value, typically 1).
- It is used in low-level system calls like write(), dup2(), or close(), which work with file descriptors.
- It represents the standard output stream at the file descriptor level.
For example:
```
write(STDOUT_FILENO, "Hello, World!\n", 14);
```
This uses the file descriptor to write directly to the standard output.
stdout:
- It is an expression of type FILE * (a pointer to a FILE object), declared in <stdio.h>.
- It is used in high-level standard I/O functions like fprintf(), fputs(), or fwrite(), which work with FILE * pointers.
- It represents the standard output stream as a FILE object, which is buffered (usually line-buffered when it refers to a terminal and fully buffered otherwise).
For example:
```
fprintf(stdout, "Hello, World!\n");
```
Here, stdout is used with the fprintf() function, which writes to a FILE *.
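The two interfaces can be combined in one program, as long as the buffering difference is kept in mind. The sketch below is illustrative and the function name print_both_ways is hypothetical. It prints one line through each interface and uses fileno() to show that the stdout stream wraps the same descriptor that STDOUT_FILENO names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: prints one line through each interface and
 * returns the descriptor number behind the stdout stream, or -1. */
int print_both_ways(void)
{
    const char *raw = "written with write()\n";

    /* Buffered: the text may sit in stdout's buffer until it is flushed. */
    fprintf(stdout, "written with fprintf()\n");
    fflush(stdout);  /* flush so the two lines appear in program order */

    /* Unbuffered: the bytes go straight to file descriptor 1. */
    if (write(STDOUT_FILENO, raw, strlen(raw)) == -1)
        return -1;

    /* fileno() reports which descriptor a FILE * stream wraps. */
    return fileno(stdout);
}
```

Without the fflush() call, the write() line can appear before the fprintf() line when output is redirected to a file or a pipe, because stdout is then fully buffered.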