Built-ins, Utilities, Functions, and System Calls Disambiguation (2024-07-25)

The differences between these haven't always been clear to me in the context of Linux. Many shared names also caused me some confusion. I'm assuming this is the case for others and an explanation may serve them.

Core Utilities

Core utilities or coreutils are just sets of programs commonly installed on systems by default. The most common and ubiquitous set of coreutils are are the GNU coreutils. If you're on Linux, you're probably using them. The coreutils are discrete binaries that can each be ran for basic and essential functionality required to use a system. For example, ls and mv are coreutils. There are alternative sets of coreutils such as uutils, which are the gnu ones rewritten in Rust. There is also Busybox which are coreutils reimplemented as a single binary. Busybox is usually found on distributions of Linux like Alpine that are extremely small intended for embedded environments. And other operating systems have their equivalents such as the BSDs. In fact there are some that I personally prefer over gnu counterparts such as doas.

Shell Built-ins

The first bit of confusion arises with the introduction of shell built-ins. A shell being an environment used to interact with the components and indirectly the kernel of the operating system; a built-in is a command that is in the shell itself. It is not its own program or binary and does not exist anywhere on the system. The which command can tell you if a command is built-in or not. which outputs the path of executable files, so if you run a command as an argument and a path isn't output, its built-in.
The printf command is a case where there is likely both a binary installed and a shell built-in (at least in the case of bash). To force the use of the binary one would just run the path to it. For example:

 /usr/bin/printf "Hello World!\n"
Hello World!

To force the use of the built-in the command builtin (which funnily enough is itself built-in) can be ran with the desired built-in command as an argument. For example:

builtin printf "Hello World!\n"
Hello World!

By default the binary is used.

C Functions

You may have noticed that printf shares the name of the c standard function c printf(). At the end of the day almost everything in Linux is ultimately C and the printf command is a kind of wrapper that just invokes the c function to output and format text. glibc is the gnu implementation of the c standard library which is, like the coreutils, on the vast majority of Linux distributions.

System Calls

Like commands that share the names of c functions, they can also share the names of system calls. System calls a c function that utilizes the kernel API to interface with the kernel. There exists c wrapper functions for system calls invoke the system call. Let's use the wait command as an example going down the layers of abstraction from top to bottom.

wait Built-in

The wait command is a built-in that detects changes in the status of a process.

$ sleep 10 &
$ jobs -l
[1] + 18057 Running              sleep 10
$ wait 18057
[1] + Done                       sleep 10
$

wait terminates once the job ends.

wait C Function

The wait() function in the c standard library is what the wait built-in is effectively wrapping.

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    } else if (pid == 0) {
        // Child process
        printf("Child process is running\n");
        sleep(1);  // Simulate some work with sleep
        exit(EXIT_SUCCESS);
    } else {
        // Parent process
        wait(NULL);  // Wait for the child to terminate
        printf("Child process has terminated\n");
    }

    return 0;
}

The above program contains fork() and wait(), two c functions that wrap system calls. This code demonstrates the use of wait() as a c function.

wait System Call

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>

int main() {
    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    } else if (pid == 0) {
        // Child process
        printf("Child process is running\n");
        sleep(1);  // Simulate some work with sleep
        exit(0);
    } else {
        // Parent process
        syscall(SYS_wait4, -1, NULL, 0, NULL);  // Wait for the child to terminate
        printf("Child process has terminated\n");
    }

    return 0;
}

The above code utilizes syscall() which is a function that accepts system calls themselves as parameters. SYS_wait is the actual system call. And all system calls share the same naming scheme.

System Calls All The Way Down

So every command - whether it be a binary or built-in - issued is ultimately some c code and a system call. And many share the same name! So when you're reading some documentation or a man page, understand the distinctions between them, or how there all parts of a greater whole at times abstracting one another. Then there is no need to be confused.


Some System Calls

  • SYS_read: Read from a file descriptor.
  • SYS_write: Write to a file descriptor.
  • SYS_open: Open a file.
  • SYS_close: Close a file descriptor.
  • SYS_stat: Get file status.
  • SYS_fstat: Get file status of an open file.
  • SYS_lstat: Get file status of a symbolic link.
  • SYS_poll: Wait for some event on a file descriptor.
  • SYS_lseek: Reposition read/write file offset.
  • SYS_mmap: Map files or devices into memory.
  • SYS_mprotect: Set protection on a region of memory.
  • SYS_munmap: Unmap a region of memory.
  • SYS_brk: Change data segment size.
  • SYS_rt_sigaction: Examine and change a signal action.
  • SYS_rt_sigprocmask: Examine and change blocked signals.
  • SYS_ioctl: Control device.
  • SYS_pread64: Read from a file descriptor at a given offset.
  • SYS_pwrite64: Write to a file descriptor at a given offset.
  • SYS_readv: Read data into multiple buffers.
  • SYS_writev: Write data from multiple buffers.
  • SYS_access: Check user's permissions for a file.
  • SYS_pipe: Create a pipe.
  • SYS_select: Synchronously monitor multiple file descriptors.
  • SYS_sched_yield: Yield the processor.
  • SYS_mremap: Remap a virtual memory address.
  • SYS_msync: Synchronize a file with a memory map.
  • SYS_mincore: Determine whether pages are resident in memory.
  • SYS_madvise: Give advice about use of memory.
  • SYS_shmget: Get a System V shared memory segment.
  • SYS_shmat: Attach a System V shared memory segment.
  • SYS_shmctl: Control operations on a shared memory segment.
  • SYS_dup: Duplicate a file descriptor.
  • SYS_dup2: Duplicate a file descriptor to a given descriptor.
  • SYS_pause: Wait for a signal.
  • SYS_nanosleep: High-resolution sleep.
  • SYS_getitimer: Get the value of an interval timer.
  • SYS_alarm: Set an alarm clock for delivery of a signal.
  • SYS_setitimer: Set the value of an interval timer.
  • SYS_getpid: Get process ID.
  • SYS_sendfile: Transfer data between file descriptors.
  • SYS_socket: Create an endpoint for communication.
  • SYS_connect: Initiate a connection on a socket.
  • SYS_accept: Accept a connection on a socket.
  • SYS_sendto: Send a message on a socket.
  • SYS_recvfrom: Receive a message from a socket.
  • SYS_sendmsg: Send a message on a socket.
  • SYS_recvmsg: Receive a message from a socket.
  • SYS_shutdown: Shut down part of a full-duplex connection.
  • SYS_bind: Bind a name to a socket.
  • SYS_listen: Listen for connections on a socket.
  • SYS_getsockname: Get socket name.
  • SYS_getpeername: Get name of connected peer socket.
  • SYS_socketpair: Create a pair of connected sockets.
  • SYS_setsockopt: Set options on sockets.
  • SYS_getsockopt: Get options on sockets.
  • SYS_clone: Create a child process.
  • SYS_fork: Create a child process.
  • SYS_vfork: Create a child process and block parent.
  • SYS_execve: Execute program.
  • SYS_exit: Terminate the calling process.
  • SYS_wait4: Wait for process to change state.
  • SYS_kill: Send a signal to a process.
  • SYS_uname: Get system information.
  • SYS_semget: Get a System V semaphore set.
  • SYS_semop: Operate on a System V semaphore set.
  • SYS_semctl: Control a System V semaphore set.
  • SYS_shmdt: Detach a System V shared memory segment.
  • SYS_msgget: Get a System V message queue.
  • SYS_msgsnd: Send a message to a System V message queue.
  • SYS_msgrcv: Receive a message from a System V message queue.
  • SYS_msgctl: Control operations on a System V message queue.
  • SYS_fcntl: Manipulate file descriptor.
  • SYS_flock: Apply or remove an advisory lock on an open file.
  • SYS_fsync: Synchronize a file's in-core state with storage device.
  • SYS_fdatasync: Synchronize a file's in-core state with storage device.
  • SYS_truncate: Truncate a file to a specified length.
  • SYS_ftruncate: Truncate an open file to a specified length.
  • SYS_getdents: Get directory entries.
  • SYS_getcwd: Get current working directory.
  • SYS_chdir: Change working directory.
  • SYS_fchdir: Change working directory of the calling process.
  • SYS_rename: Change the name or location of a file.
  • SYS_mkdir: Create a directory.
  • SYS_rmdir: Remove a directory.
  • SYS_creat: Create a file.
  • SYS_link: Create a new name for a file.
  • SYS_unlink: Delete a name and possibly the file it refers to.
  • SYS_symlink: Create a symbolic link.
  • SYS_readlink: Read value of a symbolic link.
  • SYS_chmod: Change permissions of a file.
  • SYS_fchmod: Change permissions of an open file.
  • SYS_chown: Change ownership of a file.
  • SYS_fchown: Change ownership of an open file.
  • SYS_lchown: Change ownership of a file, not following symbolic links.
  • SYS_umask: Set file mode creation mask.
  • SYS_gettimeofday: Get time.
  • SYS_getrlimit: Get resource limits.
  • SYS_getrusage: Get resource usage.
  • SYS_sysinfo: Get system information.
  • SYS_times: Get process times.
  • SYS_ptrace: Trace a process.
  • SYS_syslog: Read and/or clear kernel message ring buffer; set console_loglevel.
  • SYS_setuid: Set user identity.
  • SYS_setgid: Set group identity.
  • SYS_setpgid: Set process group ID.
  • SYS_getppid: Get parent process ID.
  • SYS_getpgrp: Get process group ID.
  • SYS_setsid: Create a session and set process group ID.
  • SYS_setreuid: Set real and effective user IDs.
  • SYS_setregid: Set real and effective group IDs.
  • SYS_getgroups: Get list of supplementary group IDs.
  • SYS_setgroups: Set list of supplementary group IDs.
  • SYS_setresuid: Set real, effective, and saved user IDs.
  • SYS_getresuid: Get real, effective, and saved user IDs.
  • SYS_setresgid: Set real, effective, and saved group IDs.
  • SYS_getresgid: Get real, effective, and saved group IDs.
  • SYS_getpgid: Get process group ID.
  • SYS_setfsuid: Set user ID used for filesystem checks.
  • SYS_setfsgid: Set group ID used for filesystem checks.