/04·Apr 2025 – May 2025·Operating Systems·10 min read

PintOS Thread Scheduler & Priority Donation

Replacing PintOS's busy-wait scheduler with an interrupt-driven, fairness-aware MLFQ.

  • C
  • PintOS Kernel
  • GCC
  • QEMU
  • GDB
  • MLFQ

The first half of an OS-internals two-project sequence built on top of the PintOS educational kernel. I replaced the kernel's basic round-robin scheduler with three significant additions: an interrupt-driven sleep queue, nested priority donation, and a Multi-Level Feedback Queue scheduler driven by fixed-point arithmetic.

By the end, PintOS handles I/O-bound sleeps without burning CPU, prevents priority inversion across chained locks, and dynamically rebalances thread priorities based on observed CPU usage — the same job a real OS scheduler does, just at a smaller scale.

Building a Robust PintOS Thread Scheduler

Alarm Clock, Priority Donation, and MLFQ Scheduling

Introduction

In this comprehensive guide, we'll transform PintOS's basic round-robin thread scheduler into a sophisticated, production-quality scheduling system. Our journey encompasses three critical enhancements:

  1. Alarm Clock: Replace busy-waiting in timer_sleep() with an interrupt-driven system

  2. Priority Scheduling: Implement nested priority donation to prevent priority inversion

  3. MLFQS: Integrate a Multi-Level Feedback Queue Scheduler for dynamic priority adjustment

By the end of this guide, you'll understand how to build a scheduler that efficiently handles I/O-bound sleeps, ensures high-priority threads never starve, and dynamically adapts to workload patterns using fixed-point arithmetic.

Project Overview

Objective

Transform PintOS's thread scheduler with three critical features:

| Feature | Purpose | Key Benefit |
|---------|---------|-------------|
| **Alarm Clock** | Replace busy-waiting with interrupt-driven blocking | Eliminate CPU waste during sleeps |
| **Priority Scheduling** | Implement nested priority donation | Prevent high-priority thread starvation |
| **MLFQS** | Dynamic priority adjustment based on CPU usage | Balance fairness and responsiveness |

Key Challenges

  • Eliminating CPU waste during sleep operations
  • Preventing high-priority thread starvation due to lock contention
  • Balancing fairness and responsiveness in scheduling decisions

1. 🔔 Alarm Clock: Efficient Sleep Implementation

The Busy-Waiting Problem

The original timer_sleep() implementation suffered from a critical inefficiency:

void timer_sleep(int64_t ticks) {
    int64_t start = timer_ticks();
    while (timer_ticks() < start + ticks) {
        /* Busy-wait: keep looping until enough ticks have passed. */
    }
}

Problems with this approach:

  • 100% CPU utilization during seemingly idle periods
  • Wasted cycles - threads perform no useful work while spinning
  • Poor system responsiveness due to unnecessary CPU consumption

Solution: A Sorted Sleep Queue

Our solution maintains a sleep queue sorted by wake-up time, enabling efficient O(1) wake-up checks.

Data Structure

struct sleeping_thread {
    int64_t wakeup_tick;       /* Absolute wake time in ticks */
    struct thread *thread;     /* Thread that is blocked */
    struct list_elem elem;     /* Sorted insertion element */
};

/* Global sleep queue, sorted by wakeup_tick */
static struct list sleeping_threads_list;

Implementation

/* Comparison function for sorting by wakeup_tick */
static bool sleep_cmp_wakeup_tick(const struct list_elem *a,
                                  const struct list_elem *b,
                                  void *aux UNUSED) {
    struct sleeping_thread *sa = list_entry(a, struct sleeping_thread, elem);
    struct sleeping_thread *sb = list_entry(b, struct sleeping_thread, elem);
    return sa->wakeup_tick < sb->wakeup_tick;
}

/* Insert a sleeping_thread into the sorted sleep queue */
static void add_to_sleep_list(struct sleeping_thread *entry) {
    list_insert_ordered(&sleeping_threads_list,
                        &entry->elem,
                        sleep_cmp_wakeup_tick,
                        NULL);
}

/* Enhanced timer_sleep() implementation */
void timer_sleep(int64_t ticks) {
    if (ticks <= 0) return;

    int64_t start = timer_ticks();
    int64_t wakeup_tick = start + ticks;

    struct sleeping_thread *entry = malloc(sizeof(*entry));
    if (entry == NULL)
        return;  /* Allocation failed; return without sleeping */
    entry->wakeup_tick = wakeup_tick;
    entry->thread = thread_current();

    /* Interrupts must be off so the timer interrupt cannot fire
       between the insertion and thread_block(). */
    enum intr_level old_level = intr_disable();
    add_to_sleep_list(entry);
    thread_block();
    intr_set_level(old_level);
}

Wake-Up in Timer Interrupt Handler

void timer_interrupt(struct intr_frame *args UNUSED) {
    ticks++;

    /* Wake up threads whose sleep time has arrived */
    while (!list_empty(&sleeping_threads_list)) {
        struct sleeping_thread *first =
            list_entry(list_front(&sleeping_threads_list),
                       struct sleeping_thread, elem);

        if (ticks >= first->wakeup_tick) {
            list_pop_front(&sleeping_threads_list);
            thread_unblock(first->thread);
            /* NOTE: PintOS's allocator takes locks, so freeing inside an
               interrupt handler is unsafe in the real kernel; embedding the
               entry in struct thread (see trade-offs below) avoids the
               dynamic allocation entirely. */
            free(first);
        } else {
            /* Since the list is sorted, no later entry can be due yet */
            break;
        }
    }

    thread_tick();
}

🎉 Key Benefits

  • O(1) Wake-Up Checks: Only inspect the first element in sorted list
  • Zero CPU Usage While Sleeping: Blocked threads removed from ready queue
  • Safe Synchronization: Interrupt disabling ensures atomic operations

2. Priority Scheduling & Nested Donation

The Priority Inversion Problem

Consider this classic scenario:

Timeline: Thread L (priority 30) → Thread M (priority 40) → Thread H (priority 63)

1. Thread L acquires Lock A
2. Thread H attempts to acquire Lock A and blocks
3. Thread M preempts Thread L, preventing Lock A release
4. Thread H waits indefinitely despite having highest priority

Result: High-priority threads starve behind medium-priority threads! 😱

Solution: Nested Priority Donation

When a high-priority thread blocks on a lock held by a lower-priority thread, we donate the high priority to the lock holder. This donation propagates recursively through lock chains.

Enhanced Data Structures

struct thread {
    int priority;              /* Effective priority (may be donated) */
    int ori_priority;          /* Original base priority */
    struct lock *waiting_lock; /* Lock the thread is currently blocked on */
    struct list locks;         /* Locks currently held, sorted by highest waiter priority */
    struct list_elem elem;     /* For insertion into ready list or donation list */
    /* … existing PintOS thread fields … */
};

struct lock {
    struct thread *holder;     /* Current owner of the lock, or NULL */
    struct list waiters;       /* Threads blocked on this lock, sorted by priority */
    struct list_elem lock_elem; /* For insertion into a thread's locks list */
    int priority;              /* Highest priority among threads waiting on this lock */
    /* … existing PintOS lock fields … */
};

Priority Donation Implementation

void donate_priority(struct thread *donor, struct lock *lock) {
    struct thread *holder = lock->holder;
    if (holder == NULL)
        return;

    if (holder->priority < donor->priority) {
        holder->priority = donor->priority;

        /* Recursive donation through lock chains */
        if (holder->waiting_lock != NULL) {
            donate_priority(holder, holder->waiting_lock);
        }
    }
}

Lock Acquisition with Donation

void lock_acquire(struct lock *lock) {
    enum intr_level old_level = intr_disable();
    struct thread *cur = thread_current();

    if (lock->holder != NULL) {
        cur->waiting_lock = lock;
        donate_priority(cur, lock);

        /* Insert cur into lock->waiters, highest priority first */
        list_insert_ordered(&lock->waiters,
                            &cur->elem,
                            thread_cmp_priority,
                            NULL);
        thread_block();

        /* lock_release() has already made us the holder and inserted the
           lock into cur->locks before unblocking us, so only clear the
           wait pointer here — re-inserting would corrupt the list. */
        cur->waiting_lock = NULL;
    } else {
        /* Uncontended path: take the lock directly */
        lock->holder = cur;
        list_insert_ordered(&cur->locks,
                            &lock->lock_elem,
                            lock_cmp_priority,
                            NULL);
    }

    cur->priority = max(cur->ori_priority, get_highest_donated_priority(cur));

    intr_set_level(old_level);
}

Lock Release and Priority Restoration

void lock_release(struct lock *lock) {
    enum intr_level old_level = intr_disable();
    struct thread *cur = thread_current();

    /* Remove lock from cur->locks */
    list_remove(&lock->lock_elem);

    if (!list_empty(&lock->waiters)) {
        /* Wake up highest-priority waiter */
        struct list_elem *max_elem = list_pop_front(&lock->waiters);
        struct thread *next = list_entry(max_elem, struct thread, elem);

        lock->holder = next;
        list_insert_ordered(&next->locks,
                            &lock->lock_elem,
                            lock_cmp_priority,
                            NULL);
        next->waiting_lock = NULL;
        thread_unblock(next);
    } else {
        lock->holder = NULL;
    }

    /* Restore or recompute cur's priority */
    int donated = get_highest_donated_priority(cur);
    cur->priority = max(cur->ori_priority, donated);

    /* Preempt if necessary */
    if (!list_empty(&ready_list)) {
        struct thread *highest_ready =
            list_entry(list_front(&ready_list), struct thread, elem);
        if (cur->priority < highest_ready->priority) {
            thread_yield();
        }
    }

    intr_set_level(old_level);
}

Why Nested Donation Matters

Example Scenario:

  • Thread H (priority 63) waits for Lock A → held by Thread M (priority 40)
  • Thread M is blocked on Lock B → held by Thread L (priority 30)

Without nested donation:

  1. H donates 63 to M, but M remains blocked on B
  2. L stays at priority 30, allowing medium-priority threads to preempt
  3. H waits indefinitely!

With nested donation:

  1. H's donation propagates: M becomes priority 63
  2. M donates 63 to L (recursive propagation)
  3. L runs immediately at priority 63, releases B, then A
  4. H can finally proceed!

3. Multi-Level Feedback Queue Scheduler (MLFQS)

Why MLFQS?

Static priority scheduling has limitations:

  • CPU-bound high-priority threads can monopolize the CPU
  • Manual priority tuning burden on programmers
  • No adaptation to changing workload patterns

MLFQS automatically:

  1. Tracks CPU usage (recent_cpu) with exponential decay
  2. Monitors system load (load_avg) reflecting runnable threads
  3. Dynamically adjusts priorities using the formula:
priority = PRI_MAX - (recent_cpu / 4) - (nice × 2)

This BSD-inspired design promotes fairness between CPU-bound and I/O-bound workloads.

Fixed-Point Arithmetic Implementation

Since the PintOS kernel cannot use floating-point instructions, we implement 17.14 fixed-point arithmetic (17 integer bits, 14 fractional bits, plus a sign bit):

typedef int fixed_point_t;

/* Convert integer n to fixed point */
static fixed_point_t int_to_fp(int n) {
    return n << 14;
}

/* Convert fixed-point x to nearest integer (round) */
static int fp_to_int_round(fixed_point_t x) {
    if (x >= 0)
        return (x + (1 << 13)) >> 14;
    else
        return (x - (1 << 13)) >> 14;
}

/* Fixed-point arithmetic operations */
static fixed_point_t fp_add(fixed_point_t x, fixed_point_t y) {
    return x + y;
}

static fixed_point_t fp_sub(fixed_point_t x, fixed_point_t y) {
    return x - y;
}

static fixed_point_t fp_mul(fixed_point_t x, fixed_point_t y) {
    return ((int64_t)x * y) >> 14;
}

static fixed_point_t fp_div(fixed_point_t x, fixed_point_t y) {
    return ((int64_t)x << 14) / y;
}

Enhanced Thread Structure

struct thread {
    int nice;                  /* Niceness: -20 (high priority) .. +20 (low priority) */
    fixed_point_t recent_cpu;  /* Scaled CPU usage (17.14 fixed-point) */
    /* … existing fields … */
};

static fixed_point_t load_avg; /* System load average (17.14 fixed-point) */

Key MLFQS Formulas

1. System Load Average

Updated once per second (every TIMER_FREQ ticks):

load_avg = (59/60) × load_avg + (1/60) × ready_threads

void update_load_avg(void) {
    /* Count ready threads (exclude idle) */
    size_t ready_threads = list_size(&ready_list);
    if (thread_current() != idle_thread)
        ready_threads++;

    fixed_point_t term1 = fp_mul(fp_div(int_to_fp(59), int_to_fp(60)), load_avg);
    fixed_point_t term2 = fp_mul(fp_div(int_to_fp(1), int_to_fp(60)), 
                                 int_to_fp((int) ready_threads));
    load_avg = fp_add(term1, term2);
}

2. Recent CPU Usage

Updated per tick for running thread and per second for all threads:

recent_cpu = (2×load_avg)/(2×load_avg + 1) × recent_cpu + nice

void update_recent_cpu_all(void) {
    struct list_elem *e;
    for (e = list_begin(&all_list); e != list_end(&all_list); e = list_next(e)) {
        struct thread *t = list_entry(e, struct thread, allelem);
        if (t != idle_thread) {
            fixed_point_t coeff = fp_div(fp_mul(int_to_fp(2), load_avg),
                                        fp_add(fp_mul(int_to_fp(2), load_avg),
                                               int_to_fp(1)));
            t->recent_cpu = fp_add(fp_mul(coeff, t->recent_cpu), int_to_fp(t->nice));
        }
    }
}

3. Priority Calculation

Updated every 4 ticks and when threads unblock:

void update_priority(struct thread *t) {
    if (t == idle_thread) return;

    /* Compute recent_cpu / 4, rounding to nearest */
    int recent_div_4 = fp_to_int_round(fp_div(t->recent_cpu, int_to_fp(4)));
    int new_priority = PRI_MAX - recent_div_4 - (t->nice * 2);

    /* Clamp to valid range */
    if (new_priority > PRI_MAX) new_priority = PRI_MAX;
    if (new_priority < PRI_MIN) new_priority = PRI_MIN;

    t->priority = new_priority;
}

Integration into Timer Interrupt

void thread_tick(void) {
    struct thread *cur = thread_current();

    if (thread_mlfqs) {
        /* Increment recent_cpu for running thread */
        if (cur != idle_thread) {
            cur->recent_cpu = fp_add(cur->recent_cpu, int_to_fp(1));
        }

        /* Every second: update load_avg and all threads' recent_cpu */
        if (timer_ticks() % TIMER_FREQ == 0) {
            update_load_avg();
            update_recent_cpu_all();
        }

        /* Every 4 ticks: update running thread's priority */
        if (timer_ticks() % 4 == 0) {
            update_priority(cur);
        }
    }

    /* … existing scheduling logic … */
}

📊 MLFQS Behavior Example

Consider two threads A and B, both with nice = 0:

Initial State:

priority_A = 63 - (0/4) - (0 × 2) = 63
priority_B = 63 - (0/4) - (0 × 2) = 63

Timeline:

| Ticks | Running Thread | recent_cpu_A | recent_cpu_B | priority_A | priority_B | Next Scheduled |
|-------|----------------|--------------|--------------|------------|------------|----------------|
| 0-7   | A | 8 | 0 | 61 | 63 | B |
| 8-11  | B | 8 | 4 | 61 | 62 | B |
| 12-15 | B | 8 | 8 | 61 | 61 | Round-robin |

Key Insight: CPU-bound threads accumulate higher recent_cpu, leading to lower priorities and better fairness! 🎯


4. 🔧 Design Trade-Offs and Optimizations

Alarm Clock: Overhead vs. Efficiency

| Pros ✅ | Cons ❌ |
|---------|---------|
| Sleeping threads consume zero CPU | O(N) insertion into sorted list |
| O(1) wake-up checks per tick | Memory allocation/deallocation overhead |

🚀 Optimization Ideas:

  • Use binary min-heap for O(log N) insertion, O(1) peek
  • Embed sleep entries in thread structure to avoid dynamic allocation

Priority Donation: Power vs. Complexity

| Pros ✅ | Cons ❌ |
|---------|---------|
| Solves priority inversion completely | Recursive donation can be expensive |
| Works with nested lock scenarios | Complex bookkeeping for held locks |

🚀 Optimization Ideas:

  • Limit donation depth to prevent unbounded recursion
  • Use max-heap for O(1) highest-donated priority lookup

MLFQS: Fairness vs. Performance

| Pros ✅ | Cons ❌ |
|---------|---------|
| Automatic workload balancing | Fixed-point arithmetic adds overhead |
| No manual priority tuning needed | O(T) updates every second for all threads |

🚀 Optimization Ideas:

  • Stagger per-thread updates across multiple ticks
  • Update priorities only on unblock/yield events

🎉 Conclusion

Our enhanced PintOS scheduler now delivers:

  1. ⚡ Idle-Time Efficiency: Zero CPU usage for sleeping threads via sorted sleep queues
  2. 🔒 Inversion-Resilient: Nested priority donation prevents high-priority thread starvation
  3. 🔄 Dynamic Fairness: MLFQS automatically balances CPU-bound vs. I/O-bound workloads

🔑 Key Takeaways

  • Sorted Wait Queues enable O(1) wake-up performance critical for timer interrupts
  • Nested Donation must correctly propagate through lock chains and restore priorities
  • Fixed-Point Arithmetic allows BSD-style MLFQS without floating-point support
  • Feature vs. Overhead Balance requires careful consideration of complexity costs

With these enhancements, PintOS transforms from a basic round-robin scheduler into a production-grade thread scheduler capable of handling diverse workloads efficiently and fairly! 🚀


📚 References

  • Anderson, T., & Dahlin, M. Operating Systems: Principles and Practice
  • 4.4BSD Scheduler Documentation (load_avg, recent_cpu, priority formulas)
  • PintOS Project Specification: "Project 2: Alarm Clock, Priority Scheduling, and MLFQS"
  • PintOS Source Code (Stanford University)

Built with ❤️ for operating systems education
