← Back to work
/05·Apr 2025 – May 2025·Operating Systems·15 min read

PintOS User Programs & System Calls

User-space program execution, syscalls, and parent-child synchronization on top of PintOS.

  • C
  • x86 Stack Layout
  • Syscalls
  • Semaphores

The second half of the OS-internals sequence: turning the PintOS kernel into something that can actually run user programs. This meant implementing process creation with proper argument passing, building out the system-call interface, validating user-space pointers before dereferencing, and making the wait / exit synchronization race-free.

The hardest part wasn't writing the code — it was making sure that none of the racy edge cases (parent exits before child, child dies during load, signal arrives mid-syscall) leave the kernel in a broken state.

Implementing User Program Execution and System Call Support in PintOS

Introduction

This document describes the design and implementation of Project 2: User Programs for PintOS (CSE 521). The primary objectives are:

  1. Argument Parsing and Stack Setup

    • Parse a command-line string into individual arguments
    • Construct the user stack so that the child process's main(argc, argv) sees the correct layout
  2. Basic System Call Support

    • Implement file descriptor management (open, read, write, close)
    • Implement wait and exit so that a parent can retrieve a child's exit status
    • Synchronize between parent and child during exec, loading, and termination
  3. User-Kernel Memory Safety

    • Validate every user-provided pointer before dereferencing or copying
    • Ensure that invalid or unmapped pointers trigger a safe termination (exit(-1))

Below, we present in-depth details of data structures, algorithms, synchronization mechanisms, and design rationales. All code snippets are in C and assume a Unix/x86-32 calling convention on a 4 KB user stack.


1. Argument Parsing and Stack Construction

1.1 Data Structures

We extended the kernel's thread structure (struct thread) to track information related to argument passing and parent/child synchronization:

/* In thread.h */
struct thread {
    /* … existing fields … */

    int exit_status;                /* Child's exit code (e.g., -1 on error) */
    struct semaphore load_sema;     /* Parent waits here until child finishes loading */
    struct semaphore exit_sema;     /* Parent waits here until child calls exit */
    bool has_parent_waited;         /* Prevent multiple wait() calls on the same child */

    /* … remaining fields … */
};
  • exit_status: Stores the child's exit code. When process_exit() runs, it sets this field before signaling the parent.
  • load_sema: Initialized to 0 in init_thread(). The parent calls sema_down(&child->load_sema) immediately after creating the child thread; the child calls sema_up(&child->load_sema) once it finishes loading the executable (or fails).
  • exit_sema: Initialized to 0. The parent calls sema_down(&child->exit_sema) inside sys_wait. The child calls sema_up(&child->exit_sema) in process_exit after setting exit_status.
  • has_parent_waited: A boolean flag to prevent multiple wait() calls on the same child. Once the parent wakes up from exit_sema, it sets this flag to true, and further calls to wait(child_tid) immediately return –1.

1.2 Algorithms

1.2.1 Tokenization with strtok_r

To split the command line into individual arguments, we use strtok_r for reentrancy and thread safety.

char *save_ptr;
for (char *token = strtok_r(cmdline, " 	
", &save_ptr);
     token != NULL;
     token = strtok_r(NULL, " 	
", &save_ptr)) {
    /* Each `token` is one argument, e.g., "exec", "arg1", "arg2". */
    push_argument_onto_stack(token);
}

Why strtok_r and not strtok?

  • strtok uses a static buffer to track its position, which is not safe if two threads (or nested calls) parse arguments simultaneously.
  • strtok_r takes a caller-supplied save_ptr pointer to store intermediate state, ensuring that each thread's parsing context is independent.

1.2.2 Stack Construction

After splitting the command line into tokens, we must build the user stack so that when the new process begins execution, it sees:

[ argument strings … ]  ← higher addresses
[ word-align padding ]
[ argv pointers (argv[0], argv[1], …, NULL) ]
[ argc ]
[ return address (fake) ]
[ ↓ initial %esp points here ]

Steps:

  1. Push Argument Strings
    • For each argument arg_i (in reverse order), copy the null-terminated string onto the stack.
    • Keep track of each string's user-space address.
    • Example: If arguments are ["a", "b"], push "b�" first, then "a�".
/* Pseudo-code inside load/userstack.c */
size_t total_args = count_tokens(cmdline);
char *arg_addresses[MAX_ARGS];

/* Push strings in reverse order */
for (i = total_args - 1; i >= 0; i--) {
    size_t len = strlen(tokens[i]) + 1;  /* include '�' */
    esp -= len;                          /* grow stack downward */
    memcpy(esp, tokens[i], len);
    arg_addresses[i] = esp;              /* record address in user memory */
}
  1. Word-Align the Stack
    • The x86 calling convention requires %esp to be 16-byte aligned at the time of a function call; however, PintOS does not strictly enforce 16-byte alignment. We align to 4 bytes for simplicity.
    • Compute align = ((uintptr_t)esp) & 0x3; if align != 0, subtract align from esp to reach a multiple of 4.
uintptr_t align = ((uintptr_t)esp) & 0x3;
if (align != 0) {
    esp -= align;
    /* (Contents of these padding bytes are undefined, so no need to fill.) */
}
  1. Push Argument Pointers (argv array)
    • Push the addresses of each argument string (again, in reverse order), followed by a terminating NULL.
    • After this, the stack looks like:
+----------------+  ← esp + 0
|   NULL         |  (argv[argc] = NULL)
+----------------+
|   &arg_str[n]  |  (argv[n-1])
+----------------+
|     ...        |
+----------------+
|   &arg_str[0]  |  (argv[0])
+----------------+
/* Push NULL sentinel */
esp -= sizeof(char *);
*((char **)esp) = NULL;

/* Push addresses of argument strings */
for (i = total_args - 1; i >= 0; i--) {
    esp -= sizeof(char *);
    *((char **)esp) = arg_addresses[i];
}
  1. Push argc and Fake Return Address
    • Push the integer argc.
    • Push a fake return address (0) so that if main ever returns, the process will exit cleanly.
/* Push argc */
esp -= sizeof(int);
*((int *)esp) = total_args;

/* Push fake return address */
esp -= sizeof(void *);
*((void **)esp) = 0;
  1. Overflow Prevention
    • Before pushing each string or pointer, compute the total space required (sum_of_string_lengths + N × sizeof(char*) + sizeof(int) + padding).
    • If stack_bottom - (sum_needed) < PHYS_BASE - PGSIZE, then the command line is too large to fit on the stack. In that case, call exit(-1) (which invokes process_exit and terminates the thread).
size_t sum_strings = 0;
for (i = 0; i < total_args; i++)
    sum_strings += strlen(tokens[i]) + 1;

size_t sum_pointers = (total_args + 1) * sizeof(char *);
size_t overhead = sizeof(int) + sizeof(void *); /* argc + fake ret */
size_t total_needed = sum_strings + sum_pointers + 3 + overhead; /* +3 for worst-case 4-byte align */

if ((PHYS_BASE - PGSIZE) + total_needed > PHYS_BASE) {
    /* Not enough stack space: terminate with error code -1 */
    exit(-1);
}

1.3 Rationale

Why Use strtok_r Instead of strtok?

  • Reentrancy: strtok_r requires a save_ptr local to the caller, so two threads can simultaneously parse different strings without corrupting a global parsing state.
  • Thread Safety: In a multi-threaded kernel, any use of strtok risks undefined behavior if, for example, one thread is preempted in the middle of parsing and another thread calls strtok.

Unix Shell–Style Argument Separation

  • Flexibility: Delegating parsing to a shell allows for richer syntax (e.g., quotes, wildcards). Although PintOS's argument parsing is simpler (space-delimited), we follow the Unix convention to remain compatible with typical user expectations.
  • Security: Keeping parsing logic in user space reduces the kernel's attack surface. A bug in the kernel's argument-parsing code is much harder to mitigate than a shell-level buffer overflow, because the kernel must remain correct at all times.

2. System Calls

This section covers our implementation of basic system calls: exit, wait, and file-related calls (open, read, write, close). We focus on data structures, algorithms for validating user memory, and synchronization between parent and child.

2.1 Data Structures

2.1.1 Thread-Level Changes

/* In thread.h */
#define FD_TABLE_MAX 128

struct thread {
    /* … from Section 1 … */

    struct file *file_descriptors[FD_TABLE_MAX];
    int next_fd;               /* Next available file descriptor (starts at 2) */

    /* … existing fields … */
};
  • file_descriptors: An array of pointers to struct file. Index 0 and 1 are reserved for stdin and stdout, respectively. The rest (2–127) map to open files.
  • next_fd: Initialized to 2. Each time sys_open() returns a new FD, we increment next_fd to find the first unused slot.

2.1.2 Global Filesystem Lock

/* In filesys.c or a global sync file */
struct lock fs_lock;
  • Protects all calls into the file system (e.g., file_open(), file_read(), file_write(), file_close(), filesys_remove()).
  • Ensures that two threads do not corrupt on-disk structures or in-memory data when they concurrently create/remove/lookup files.

2.1.3 File Descriptor Management

  • Each user process maintains its own file_descriptors array. FDs are unique per process:
    • FD 2 in process A refers to a different struct file * than FD 2 in process B.
  • On sys_open(char *filename):
    1. Acquire fs_lock.
    2. Call file_open(filename). If NULL, release fs_lock and return –1.
    3. Otherwise, scan file_descriptors from index 2 to FD_TABLE_MAX – 1 to find the first NULL slot; set that entry to the returned struct file * and return its index.
    4. Release fs_lock.

2.2 Algorithms

2.2.1 Validating User Pointers: require_user_ptr

Before any system call that reads or writes user data—such as sys_read(int fd, void *buffer, unsigned size) or sys_write(int fd, const void *buffer, unsigned size)—we must ensure that all user-supplied pointers:

  1. Are non-NULL.
  2. Are within the user address space (is_user_vaddr(uaddr)).
  3. Are mapped to physical memory (pagedir_get_page(thread_current()->pagedir, uaddr) != NULL).

For multi-byte ranges, we validate each page along the buffer:

void
require_user_ptr(const void *uaddr) {
    if (uaddr == NULL)
        exit(-1);

    if (!is_user_vaddr(uaddr))
        exit(-1);

    if (pagedir_get_page(thread_current()->pagedir, uaddr) == NULL)
        exit(-1);
}

void
validate_buffer(const void *buffer, unsigned size) {
    const uint8_t *byte_ptr = (const uint8_t *)buffer;
    for (unsigned i = 0; i < size; i++) {
        require_user_ptr(byte_ptr + i);
    }
}

Precision vs. Performance

  • In the best case (buffer lies entirely within one page, e.g., addresses 0x1000–0x13FF), only one page-table lookup is needed.
  • In the worst case (buffer spans two pages, e.g., 0x1FFC–0x2003), two lookups.
  • We chose correctness over performance, since a single system call's overhead is dominated by blocking and context switches anyway.

2.2.2 sys_read / sys_write Data Transfer

static int
sys_write(int fd, const void *buffer, unsigned size) {
    struct thread *cur = thread_current();

    /* Validate user buffer pointers */
    validate_buffer(buffer, size);

    if (fd == 1) {
        /* Write to stdout */
        putbuf(buffer, size);
        return size;
    }

    if (fd < 0 || fd >= FD_TABLE_MAX || cur->file_descriptors[fd] == NULL)
        return -1;

    struct file *f = cur->file_descriptors[fd];
    char kbuf[size];

    /* Copy user data into kernel buffer */
    memcpy(kbuf, buffer, size);

    /* Serialize file system operations */
    lock_acquire(&fs_lock);
    int bytes_written = file_write(f, kbuf, size);
    lock_release(&fs_lock);

    return bytes_written;
}

static int
sys_read(int fd, void *buffer, unsigned size) {
    struct thread *cur = thread_current();
    validate_buffer(buffer, size);

    if (fd == 0) {
        /* Read from stdin */
        int count = 0;
        while (count < (int)size) {
            ((char *)buffer)[count++] = input_getc();
        }
        return count;
    }

    if (fd < 0 || fd >= FD_TABLE_MAX || cur->file_descriptors[fd] == NULL)
        return -1;

    struct file *f = cur->file_descriptors[fd];
    char kbuf[size];

    lock_acquire(&fs_lock);
    int bytes_read = file_read(f, kbuf, size);
    lock_release(&fs_lock);

    /* Copy data back to user buffer */
    memcpy(buffer, kbuf, bytes_read);
    return bytes_read;
}
  • Buffer Validation: We call validate_buffer on both buffer and buffer + size − 1 (to ensure the final byte is valid).
  • File System Locking: All calls into the file system are wrapped by lock_acquire(&fs_lock) / lock_release(&fs_lock).
  • Copying Data:
    • For sys_read, copy from kernel space (kbuf) back into user space using memcpy.
    • For sys_write, copy from user space into kbuf.

2.2.3 wait System Call

static int
sys_wait(pid_t child_tid) {
    struct thread *cur = thread_current();
    struct thread *child = find_child_by_tid(cur, child_tid);

    if (child == NULL || child->has_parent_waited)
        return -1;

    /* Parent blocks until child signals exit_sema. */
    sema_down(&child->exit_sema);

    /* Child has exited: retrieve exit_status and mark as waited. */
    int status = child->exit_status;
    child->has_parent_waited = true;
    remove_child_from_list(cur, child);
    return status;
}
  • Finding the Child: We maintain a list of child threads in each parent's struct thread. find_child_by_tid scans this list for child_tid.
  • Preventing Multiple Waits: If child->has_parent_waited is already true, return –1.
  • Blocking on exit_sema: The parent calls sema_down(&child->exit_sema). If the child has already exited, its exit_sema is already up, so sema_down returns immediately. Otherwise, the parent blocks until process_exit does sema_up(&child->exit_sema).
  • Cleanup: After waking up, the parent records child->exit_status, sets child->has_parent_waited = true, and removes the child from the parent's list. Then it returns the exit code.

2.2.4 exit and process_exit

void
process_exit(void) {
    struct thread *cur = thread_current();
    int status = cur->exit_status;  /* Already set by sys_exit or default 0 */

    /* Close all open files */
    for (int fd = 2; fd < FD_TABLE_MAX; fd++) {
        if (cur->file_descriptors[fd] != NULL) {
            file_close(cur->file_descriptors[fd]);
            cur->file_descriptors[fd] = NULL;
        }
    }

    /* Signal parent that we are exiting */
    sema_up(&cur->exit_sema);

    /* Free page directory, other cleanup… */
    /* … existing PintOS cleanup … */
}
  • Setting exit_status:
    • In sys_exit(int status), we set cur->exit_status = status.
    • Then call process_exit().
  • Closing File Descriptors: Iterate through file_descriptors[2..127] and call file_close().
  • Signaling the Parent: sema_up(&cur->exit_sema) unblocks any parent waiting in sys_wait().

2.2.5 Error-Handling Strategy

  • Prevalidation: Every user pointer is validated via require_user_ptr (as shown above). If any check fails, we call exit(-1).
  • Immediate Termination: If any system call encounters an invalid pointer or an unexpected condition (e.g., fd not in range, file descriptor slot empty), we either return an error code (usually –1) or call exit(-1) when the kernel must abort that process.
/* Example from sys_write: */
static int
sys_write(int fd, const void *buffer, unsigned size) {
    require_user_ptr(buffer);                      /* Validate buffer start */
    require_user_ptr(buffer + size - 1);           /* Validate buffer end */

    if (fd == 1) {
        putbuf(buffer, size);
        return size;
    }

    /* … handle other FDs or return –1 … */
}

3. Synchronization

Proper synchronization is critical to prevent race conditions between parent and child threads during exec, loading the executable, and termination.

3.1 exec System Call Synchronization

  1. Parent Creates Child

    • sys_exec(const char *cmdline) calls process_execute(cmdline), which creates a new thread to run start_process.
    • The new thread is given a copy of cmdline so that the parent can continue using its own stack.
  2. Parent Waits for Child to Load

    • Immediately after calling process_execute, the parent calls:
    sema_down(&child->load_sema);
    

    This blocks the parent until the child finishes loading the executable.

  3. Child Loads Executable

    • In start_process(void *cmd_copy), the child attempts to load the ELF binary and set up its initial stack.
    • If loading succeeds, the child sets load_success = true in its struct thread and then calls:
    sema_up(&parent->load_sema);
    

    This wakes the parent.

    • If loading fails, the child sets load_success = false and still calls sema_up(&parent->load_sema) before calling exit(-1).
  4. Parent Receives Load Status

    • After sema_down, the parent checks child->load_success.
    • If false, sys_exec returns –1. Otherwise, it returns the child's tid.

3.2 Parent-Child Synchronization on wait / exit

Case 1: Parent Calls wait Before Child Exits

  1. Parent executes sys_wait(child_tid) and finds child->exit_sema.
  2. Parent calls sema_down(&child->exit_sema) and blocks.
  3. Child eventually calls process_exit(), which does sema_up(&child->exit_sema).
  4. Parent wakes, retrieves exit_status, sets child->has_parent_waited = true, and returns the status.

Case 2: Child Exits Before Parent Calls wait

  1. Child calls process_exit(), performing cleanup and sema_up(&child->exit_sema).
  2. The exit_sema count becomes 1.
  3. When the parent later calls sys_wait(child_tid), the call to sema_down(&child->exit_sema) returns immediately (because the semaphore is already up).
  4. Parent collects exit_status and proceeds as above.

Resource Cleanup and "Zombie" Processes

  • If the parent exits first (e.g., calls exit() before collecting its children's statuses), the child becomes a zombie.
  • We do not implement a reaper thread in PintOS, so these zombies remain until the entire process terminates.
  • When a child calls exit, it closes all open files and releases page tables. After signaling the parent, it calls thread_exit(), which frees the thread's TCB.

4. Detailed Design Rationales

4.1 User Memory Access

  • Strict Validation of Every Byte
    • For multi-byte buffers (e.g., in sys_write), we validate each byte's address to ensure it lies in a user page.
    • This prevents scenarios such as a buffer starting in valid memory but extending into kernel or unmapped pages.
    • Trade-Off: We accepted the performance overhead of repeated page-table lookups in exchange for correctness and security. A single pointer dereference error in the kernel can cause a complete system panic, which is far more disruptive than a slightly slower read or write call.

4.2 File Descriptor Management

  • Array-Based FD Table
    • We chose a fixed-size array of 128 entries (FD_TABLE_MAX) for simplicity.
    • Advantage: O(1) lookup, minimal code complexity.
    • Disadvantage: Limit of 126 user-opened files per process. Should a process need more, it must close existing ones.
    • We considered a dynamically resizable data structure (e.g., a hash table), but that would increase complexity without significant benefit for an educational OS.

4.3 tid_t as pid_t

  • We did identity mapping: the thread ID (tid_t) is used as the process ID (pid_t) in user-space calls.
  • Rationale:
    • Simplifies tracking parent/child relationships.
    • Avoids extra translation tables.
  • Alternative: Maintain a separate namespace for PIDs (e.g., hashed or incremented independently). This complicates bookkeeping without clear benefit in PintOS.

5. Survey Questions

  1. Difficulty

    • Argument Parsing & Stack Alignment: Ensuring correct 4-byte alignment while pushing variable-length strings onto the stack was challenging. We used hex_dump to inspect the stack contents and verify correctness.
    • Synchronization (exec/wait): Coordinating load_sema and exit_sema to avoid deadlocks and zombie processes required careful ordering of sema_down / sema_up calls.
  2. Key Insight

    • Implementing wait and exit demonstrated how modern operating systems manage orphaned processes and resource cleanup. We realized the importance of atomic semaphore operations to ensure that parents do not miss signals or wait indefinitely.
  3. Hints for Future Students

    • Stack Alignment Testing: Use hex_dump() extensively to verify that:
      • Argument strings are in the correct order.
      • The argv array ends with a NULL sentinel.
      • %esp is a multiple of 4 bytes before entering main.
    • Pointer Validation: In require_user_ptr, always check both the start and end of a buffer before performing memcpy(). A single off-by-one error can cause a kernel panic.
  4. TA Suggestions

    • Provide annotated diagrams showing the stack layout for various argument counts (e.g., 1 argument, 5 arguments, 20 arguments).
    • Give examples of edge cases, such as passing exactly 4 KB of argument data, to highlight overflow conditions.
  5. Final Thoughts

    • PintOS's minimal documentation forced us to read the source code of threads/init.c, process.c, and lib/user/syscall.c to understand how interrupts, page tables, and file I/O are implemented. This deep dive solidified our understanding of user–kernel interactions in a simple educational OS.

6. Implementation Notes

  1. Stack Alignment

    • Used a helper macro align_stack_word(uint8_t *esp) that does:
    uintptr_t align = ((uintptr_t)esp) & 0x3;
    if (align != 0)
        esp -= align; /* ensure esp mod 4 == 0 */
    
    • Verified with hex_dump((uint32_t *)esp, 64) after pushing all arguments, before calling intr_exit, to ensure %esp is 4-byte aligned.
  2. Semaphore Initialization

    • In init_thread(), after allocating a new struct thread, we initialized:
    sema_init(&t->load_sema, 0);
    sema_init(&t->exit_sema, 0);
    t->has_parent_waited = false;
    t->exit_status = -1;     /* Default exit code if child crashes before setting */
    
  3. Testing

    • Wrote custom test scripts (in the PintOS test suite) to cover:
      • Edge Cases in Argument Parsing: Exactly 1 argument, exactly 128 arguments (exceeding 4 KB), arguments of length 0 (empty string).
      • Invalid Pointers: Pass a buffer starting at a valid address but extending into unmapped pages to read. Expect exit(-1).
      • Wait/Exit Synchronization: Parent calls wait after child has already exited. Expect immediate return. Child exits before parent calls wait. Expect no deadlock.
      • File Descriptor Limits: Attempt to open more than 126 files in a single process. Expect sys_open to return –1 once the table is full.
    • Used msg() to print debug messages in user programs and hex_dump() to inspect memory.
    • Verified that zombies do not resurrect: once a child calls process_exit, its TCB is freed after the parent collects its status.

Conclusion

This completes the detailed, technically accurate description of Project 2: User Programs for PintOS. We have shown how to:

  • Parse command-line arguments safely and build the initial user stack
  • Implement exit, wait, and basic file I/O system calls with rigorous user-pointer validation
  • Synchronize parent and child threads during exec and wait

With these mechanisms, PintOS can successfully run simple C user programs, handle multiple open files, and ensure user–kernel memory safety.

Next case study

Distributed Network Chat App

Continue