netatalk.io

Dev Docs Caching Architecture

Caching Architecture

Parent page: Architecture Overview

Each afpd worker child maintains its own private, per-process cache hierarchy. There is no shared memory between workers — cross-process coherence is handled by the IPC Cache Hint Relay described below.

Implementation Files:

Component File Description
Directory cache (ARC/LRU) etc/afpd/dircache.c Core cache, eviction, ARC algorithm, hint receiver
Directory cache header etc/afpd/dircache.h Constants, API, rfork cache externs
AD metadata cache etc/afpd/ad_cache.c AppleDouble metadata (org.netatalk.Metadata) + resource fork (org.netatalk.ResourceFork) content caching
AD cache header etc/afpd/ad_cache.h AD cache API
Cache entry struct include/atalk/directory.h struct dir — all cache fields inline
IPC hint wire format include/atalk/server_ipc.h Hint types, payload struct, buffer constants
IPC hint sender/relay/flush libatalk/util/server_ipc.c Child→parent sender, parent buffer+relay, batched flush
Session forking (pipe setup) libatalk/dsi/dsi_getsess.c Creates hint pipe per child

Cache Tier Overview

Tier What Eviction Storage
Dircache struct dir entries (files & dirs) with stat fields ARC (adaptive) or LRU Hash tables + ARC/LRU queues
AD cache AppleDouble metadata (FinderInfo, dates, attributes) inline in struct dir Piggybacks on dircache entry lifecycle; invalidated by ctime change Byte arrays inline in struct dir
Rfork cache Resource fork content buffers Dedicated LRU with byte budget malloc()‘d buffers attached to struct dir
%%{ init: { 'themeVariables': { 'fontSize': '14px' }, 'flowchart': { 'nodeSpacing': 15, 'rankSpacing': 40 } } }%%
graph TB
    classDef blue fill:#74b9ff,stroke:#0984e3,color:#2d3436,rx:10,ry:10
    classDef green fill:#55efc4,stroke:#00b894,color:#2d3436,rx:10,ry:10
    classDef yellow fill:#ffeaa7,stroke:#fdcb6e,color:#2d3436,rx:10,ry:10
    classDef purple fill:#a29bfe,stroke:#6c5ce7,color:#2d3436,rx:10,ry:10
    classDef grey fill:#dfe6e9,stroke:#b2bec3,color:#2d3436,rx:10,ry:10
    classDef cream fill:#f8f4e8,stroke:#d4c5a9,color:#2d3436,rx:10,ry:10
    classDef salmon fill:#fab1a0,stroke:#e17055,color:#2d3436,rx:10,ry:10

    REQ["AFP command<br/>(e.g. FPGetFileDirParms)"]:::blue

    subgraph "Tier 1: struct dir (ARC/LRU dircache)"
        STAT["Cached stat fields<br/>dcache_ctime, dcache_ino,<br/>dcache_mode, dcache_size"]:::green
        AD["Cached AD metadata<br/>dcache_finderinfo[32]<br/>dcache_filedatesi[16]<br/>dcache_afpfilei[4]<br/>dcache_rlen"]:::green
    end

    subgraph "Tier 2: rfork content (byte-budget LRU)"
        RFORK["dcache_rfork_buf<br/>malloc'd resource fork data<br/>budget: rfork_cache_budget bytes"]:::yellow
    end

    DISK_STAT["stat() syscall"]:::grey
    DISK_AD["ad_metadata() → getxattr()"]:::grey
    DISK_RF["ad_read() → read/pread()"]:::grey

    REQ --> STAT
    STAT -->|"hit"| AD
    STAT -->|"miss / validate"| DISK_STAT --> STAT
    AD -->|"hit (rlen ≥ 0)"| RFORK
    AD -->|"miss (rlen = -1)"| DISK_AD --> AD
    AD -->|"no AD (rlen = -2)"| REQ
    RFORK -->|"hit (buf ≠ NULL)"| REQ
    RFORK -->|"miss (buf = NULL)"| DISK_RF --> RFORK

Tier 1: ARC Directory Cache

The directory cache (etc/afpd/dircache.c) implements the Adaptive Replacement Cache (ARC) algorithm from Megiddo & Modha (IBM, 2004). ARC automatically balances between recency and frequency by maintaining four lists:

List Purpose Contents
T1 Recently accessed entries Full struct dir (cached data)
T2 Frequently accessed entries Full struct dir (cached data)
B1 Ghost list from T1 evictions Full struct dir with DIRF_ARC_GHOST flag
B2 Ghost list from T2 evictions Full struct dir with DIRF_ARC_GHOST flag

The adaptation parameter p (target size for T1) shifts automatically: - B1 ghost hit → increase p (favor recency): arc_case_ii_adapt_and_replace() - B2 ghost hit → decrease p (favor frequency): arc_case_iii_adapt_and_replace()

%%{ init: { 'themeVariables': { 'fontSize': '14px' }, 'flowchart': { 'nodeSpacing': 15, 'rankSpacing': 40 } } }%%
graph LR
    classDef blue fill:#74b9ff,stroke:#0984e3,color:#2d3436,rx:10,ry:10
    classDef green fill:#55efc4,stroke:#00b894,color:#2d3436,rx:10,ry:10
    classDef yellow fill:#ffeaa7,stroke:#fdcb6e,color:#2d3436,rx:10,ry:10
    classDef purple fill:#a29bfe,stroke:#6c5ce7,color:#2d3436,rx:10,ry:10
    classDef grey fill:#dfe6e9,stroke:#b2bec3,color:#2d3436,rx:10,ry:10
    classDef cream fill:#f8f4e8,stroke:#d4c5a9,color:#2d3436,rx:10,ry:10
    classDef salmon fill:#fab1a0,stroke:#e17055,color:#2d3436,rx:10,ry:10

    subgraph "ARC Cache (size c)"
        T1["T1: Recent<br/>(target p entries)<br/>LRU → MRU"]:::blue
        T2["T2: Frequent<br/>(c − p entries)<br/>LRU → MRU"]:::green
    end

    subgraph "Ghost Directory (size c)"
        B1["B1: Ghost/Recent<br/>evicted from T1<br/>DIRF_ARC_GHOST"]:::grey
        B2["B2: Ghost/Frequent<br/>evicted from T2<br/>DIRF_ARC_GHOST"]:::grey
    end

    MISS["Cache miss<br/>arc_case_iv()"]:::cream
    HIT["Cache hit<br/>arc_case_i()"]:::cream

    MISS -->|"new entry"| T1
    T1 -->|"2nd access<br/>promote"| T2
    T1 -->|"evict LRU"| B1
    T2 -->|"evict LRU"| B2
    B1 -->|"ghost hit<br/>p increases<br/>Case II"| T2
    B2 -->|"ghost hit<br/>p decreases<br/>Case III"| T2
    HIT -->|"T1 hit"| T2
    HIT -->|"T2 hit → MRU"| T2

ARC Implementation Details

Configuration (from etc/afpd/dircache.h):

Constant Value Description
MIN_DIRCACHE_SIZE 1,024 Minimum cache entries
DEFAULT_DIRCACHE_SIZE 65,536 Default cache entries
MAX_DIRCACHE_SIZE 1,048,576 Maximum cache entries
DIRCACHE_FREE_QUANTUM 256 LRU mode: entries evicted per cycle

Hash indexes — the dircache maintains two parallel hash tables: - CNID index (hash_vid_did): lookup by volume ID + directory/file CNID — FNV-1a hash - DID/name index (hash_didname): lookup by parent DID + Unix filename — SuperFastHash

In ARC mode, hash tables are sized to 2c (c cached entries + c ghost entries per the ARC paper). In LRU mode, sized to c. See dircache_init() in dircache.c.

Ghost entries retain ALL data (full struct dir with DIRF_ARC_GHOST flag set). This differs from the IBM paper’s lightweight ghosts but allows zero-allocation promotion back to T2 via queue_move_to_tail_of(). Ghost entries stay in both hash tables for O(1) lookup.

ARC cases (from the paper, implemented in dircache.c):

Case Trigger Action Function
I Hit in T1 or T2 Move to MRU of T2 arc_case_i()
II Ghost hit in B1 Adapt p↑, REPLACE, promote B1→T2 arc_case_ii_adapt_and_replace()
III Ghost hit in B2 Adapt p↓, REPLACE, promote B2→T2 arc_case_iii_adapt_and_replace()
IV Complete miss Evict if full, add to T1 arc_case_iv()
REPLACE Subroutine Evict LRU of T1→B1 or T2→B2 based on p arc_replace()

LRU fallback: configurable via afp.conf dircache mode option. Uses a single queue with chunk eviction (DIRCACHE_FREE_QUANTUM = 256 entries at a time).

Cache Validation

Cache entries are validated against the filesystem using probabilistic validation to reduce stat() syscalls:

dircache_report_invalid_entry() tracks entries that passed validation but were found stale on actual use — this metric helps tune the validation frequency.

Tier 1b: AD Metadata Cache

AppleDouble metadata is cached inline in each struct dir entry (defined in include/atalk/directory.h):

Field Size AD Entry Description
dcache_finderinfo[32] 32 bytes ADEID_FINDERI Finder file type, creator, flags
dcache_filedatesi[16] 16 bytes ADEID_FILEDATESI Create/modify/backup dates (SERVED values with pre-computed mdate)
dcache_afpfilei[4] 4 bytes ADEID_AFPFILEI AFP file attributes
dcache_rlen off_t Resource fork length -1 = not yet loaded, -2 = confirmed no AD, ≥0 = cached

dcache_rlen State Machine

%%{ init: { 'themeVariables': { 'fontSize': '14px' }, 'flowchart': { 'nodeSpacing': 15, 'rankSpacing': 40 } } }%%
stateDiagram-v2
    [*] --> unloaded: calloc() init
    unloaded --> cached: ad_metadata() → rlen ≥ 0
    unloaded --> no_ad: ad_metadata() → ENOENT
    cached --> unloaded: ctime change → invalidate
    cached --> cached: AFP write → update
    no_ad --> unloaded: ctime change → re-check

    state "dcache_rlen = -1\n(not yet loaded)" as unloaded
    state "dcache_rlen ≥ 0\n(AD cached)" as cached
    state "dcache_rlen = -2\n(no AD file)" as no_ad

Unified AD Access: ad_metadata_cached()

ad_metadata_cached() (in etc/afpd/ad_cache.c) is the single entry point for read-only AD metadata:

On disk read (cache miss): ad_metadata()ad_store_to_cache() saves FinderInfo, FileDatesI, AFPFileI, and rlen into the struct dir. The ad_close() is called internally — callers needing a writable fd must call ad_open() separately.

Pre-computed mdate: ad_store_to_cache() stores max(ad_mdate, st_mtime) in dcache_filedatesi + AD_DATE_MODIFY. This is the served FILPBIT_MDATE value — avoids recomputation on every enumerate hit.

AD Cache Statistics

Logged by log_dircache_stat() in dircache.c on session close:

Counter Description
ad_cache_hits Served from dcache_* fields
ad_cache_misses Fell through to ad_metadata() disk read
ad_cache_no_ad Returned -2 “no AD” without disk I/O

Tier 2: Resource Fork Content Cache

For files with small resource forks (icons, custom metadata), the actual resource fork content is cached in a malloc()‘d buffer attached to struct dir:

%%{ init: { 'themeVariables': { 'fontSize': '14px' }, 'flowchart': { 'nodeSpacing': 15, 'rankSpacing': 40 } } }%%
graph TB
    classDef blue fill:#74b9ff,stroke:#0984e3,color:#2d3436,rx:10,ry:10
    classDef green fill:#55efc4,stroke:#00b894,color:#2d3436,rx:10,ry:10
    classDef yellow fill:#ffeaa7,stroke:#fdcb6e,color:#2d3436,rx:10,ry:10
    classDef grey fill:#dfe6e9,stroke:#b2bec3,color:#2d3436,rx:10,ry:10
    classDef cream fill:#f8f4e8,stroke:#d4c5a9,color:#2d3436,rx:10,ry:10
    classDef salmon fill:#fab1a0,stroke:#e17055,color:#2d3436,rx:10,ry:10

    READ["FPRead (resource fork)"]:::blue
    CHECK{"dcache_rfork_buf<br/>!= NULL?"}:::cream

    subgraph "Rfork Cache Hit"
        HIT["memcpy from<br/>dcache_rfork_buf"]:::green
    end

    subgraph "Rfork Cache Miss"
        AD_CHECK{"dcache_rlen > 0<br/>AND ≤ max_entry?"}:::cream
        DISK["ad_read(adp, eid,<br/>0, buf, rlen)"]:::grey
        BUDGET{"rfork_cache_used<br/>+ rlen ≤ budget?"}:::cream
        EVICT["rfork_cache_evict_to_budget()<br/>LRU head → free oldest"]:::salmon
        STORE["rfork_cache_store_from_fd()<br/>malloc, ad_read, enqueue LRU"]:::yellow
        SKIP["Serve from fd<br/>(not cached)"]:::grey
    end

    READ --> CHECK
    CHECK -->|"hit"| HIT
    CHECK -->|"miss"| AD_CHECK
    AD_CHECK -->|"yes"| BUDGET
    AD_CHECK -->|"too large or rlen ≤ 0"| SKIP
    BUDGET -->|"fits"| STORE
    BUDGET -->|"over budget"| EVICT --> STORE
    DISK -.->|"ad_read inside"| STORE
Field in struct dir Purpose
dcache_rfork_buf malloc()‘d buffer containing rfork data, or NULL
rfork_lru_node Position in dedicated rfork LRU list, or NULL

Invariant: dcache_rfork_buf != NULL implies dcache_rlen >= 0 (Tier 1 AD must be loaded first).

Budget Management

Globals defined in etc/afpd/dircache.c, declared in etc/afpd/dircache.h:

Variable Description
rfork_cache_budget Total byte budget (configured via afp.conf, stored in KB)
rfork_max_entry_size Per-entry size limit (forks larger than this are never cached)
rfork_cache_used Current bytes consumed
rfork_lru_count Number of entries in rfork LRU

API Functions (in etc/afpd/ad_cache.c)

Function Description
rfork_cache_store_from_fd() Reads full fork via ad_read(), allocates buffer, adds to LRU. Validates ad_read() returns exactly dcache_rlen bytes — if not, self-heals by invalidating all AD metadata.
rfork_cache_evict_to_budget() Walks rfork LRU from head (oldest) freeing buffers until rfork_cache_used + needed ≤ rfork_cache_budget
rfork_cache_free() Frees a single entry’s buffer, decrements budget, removes from LRU. Handles invariant violations gracefully.

Rfork Cache Statistics

Counter Description
rfork_stat_lookups Total rfork cache lookups
rfork_stat_hits Served from dcache_rfork_buf
rfork_stat_misses Fell through to ad_read()
rfork_stat_added New entries stored
rfork_stat_evicted Entries evicted by LRU budget pressure
rfork_stat_invalidated Entries invalidated by ctime change or AD write
rfork_stat_used_max Peak bytes consumed (high-water mark)

Cross-Process Cache Invalidation (IPC Hint System)

Since each worker child has its own private dircache, a cross-process hint relay system propagates invalidation signals when one child modifies the filesystem.

Architecture

%%{ init: { 'themeVariables': { 'fontSize': '14px' }, 'flowchart': { 'nodeSpacing': 15, 'rankSpacing': 40 } } }%%
sequenceDiagram
    participant W1 as Worker Child 1<br/>(modifying)
    participant P as afpd Parent<br/>(hint relay)
    participant W2 as Worker Child 2
    participant W3 as Worker Child 3

    Note over W1: AFP command modifies file<br/>(e.g., FPWrite, FPDelete)
    W1->>P: ipc_send_cache_hint()<br/>via IPC socketpair<br/>{REFRESH, vid, cnid}

    Note over P: ipc_server_read() →<br/>ipc_relay_cache_hint()<br/>appends to hint_buf[]

    Note over P: poll() returns (50ms timeout<br/>or hint_buf full)
    P->>P: hint_flush_pending()<br/>priority sort, serialize

    P->>W2: write(hint_pipe)<br/>PIPE_BUF-safe chunks
    P->>W3: write(hint_pipe)<br/>(excludes source W1)

    Note over W2: poll() on hint_pipe →<br/>process_cache_hints()
    W2->>W2: hash_lookup(cnid) →<br/>dir_modify() or dir_remove()

    Note over W3: poll() on hint_pipe →<br/>process_cache_hints()
    W3->>W3: hash_lookup(cnid) →<br/>dir_modify() or dir_remove()

Communication Channels

Each worker child has two channels to the parent (created in dsi_getsession() in libatalk/dsi/dsi_getsess.c):

Channel Type Direction Purpose
IPC socketpair socketpair(PF_UNIX, SOCK_STREAM) Bidirectional Session transfer (send_fd/recv_fd), state updates, cache hints child→parent
Hint pipe pipe() Parent→Child Cache invalidation hint delivery

Why separate channels? The hint pipe is separated from the IPC socketpair to avoid interfering with send_fd()/recv_fd() + SIGURG signaling used for AFP session transfer (primary reconnect). Pipe writes ≤ PIPE_BUF (4096) are POSIX-guaranteed atomic — the 22-byte hint messages always arrive as complete messages.

Wire Format

IPC header (IPC_HEADERLEN = 14 bytes):

[command:2][child_pid:4][uid:4][len:4]

Cache hint payload (struct ipc_cache_hint_payload in include/atalk/server_ipc.h, 8 bytes, __attribute__((packed))):

Field Size Description
event 1 byte Hint type (see below)
reserved 1 byte Alignment padding
vid 2 bytes Volume ID (network byte order)
cnid 4 bytes CNID of affected file/dir (network byte order)

Total message size: 14 + 8 = 22 bytes per hint.

Hint Types

Constant Value Receiver action
CACHE_HINT_REFRESH 0 ostat() + dir_modify(DCMOD_STAT | DCMOD_AD_INV | DCMOD_NO_PROMOTE) — stat refresh + AD invalidation, no ARC promotion
CACHE_HINT_DELETE 1 Direct dir_remove() by CNID
CACHE_HINT_DELETE_CHILDREN 2 dircache_remove_children() on parent dir + remove/refresh parent itself

Sender: ipc_send_cache_hint()

Called directly from AFP command handlers (e.g., afp_write(), afp_delete(), afp_moveandrename()). Writes to the child’s IPC socketpair fd (obj->ipc_fd). Independent of the external FCE system — always active when IPC is available.

Parent Relay: Batching

%%{ init: { 'themeVariables': { 'fontSize': '14px' }, 'flowchart': { 'nodeSpacing': 15, 'rankSpacing': 40 } } }%%
graph TB
    classDef purple fill:#a29bfe,stroke:#6c5ce7,color:#2d3436,rx:10,ry:10
    classDef yellow fill:#ffeaa7,stroke:#fdcb6e,color:#2d3436,rx:10,ry:10
    classDef green fill:#55efc4,stroke:#00b894,color:#2d3436,rx:10,ry:10
    classDef cream fill:#f8f4e8,stroke:#d4c5a9,color:#2d3436,rx:10,ry:10
    classDef cyan fill:#81ecec,stroke:#00cec9,color:#2d3436,rx:10,ry:10
    classDef grey fill:#dfe6e9,stroke:#b2bec3,color:#2d3436,rx:10,ry:10

    IPC["ipc_server_read()<br/>receives hint from child"]:::purple
    RELAY["ipc_relay_cache_hint()<br/>rate check (1000/sec)<br/>append to hint_buf[]"]:::purple
    BUF["hint_buf[128]<br/>accumulating"]:::yellow

    POLL{"poll() returns?"}:::cream
    TIMEOUT["50ms timeout<br/>(ret == 0)"]:::grey
    FULL["buffer full<br/>(count ≥ 128)"]:::grey

    FLUSH["hint_flush_pending()"]:::green
    SORT["Priority sort:<br/>REFRESH → DELETE →<br/>DELETE_CHILDREN"]:::green
    WRITE["Per-child: serialize<br/>skip source_pid<br/>PIPE_BUF chunked write"]:::cyan

    IPC --> RELAY --> BUF
    BUF --> POLL
    POLL -->|"50ms timeout"| TIMEOUT --> FLUSH
    POLL -->|"buffer full"| FULL --> FLUSH
    POLL -->|"fd event, not full"| BUF
    FLUSH --> SORT --> WRITE

Safety Mechanisms

Mechanism Constant Description
Rate limiting HINT_RATE_LIMIT = 1000/sec per child Prevents runaway child from flooding siblings
Pipe atomicity Message ≤ PIPE_BUF POSIX guarantees atomic delivery
Source exclusion source_pid in hint_entry Source child never receives its own hints
Validate before use DCMOD_NO_PROMOTE flag Cross-process hints don’t promote entries in ARC (avoids pollution)
Dead child safety SIGCHLD processed before flush Parent reaps dead children before iterating child table

Receiver: process_cache_hints()

process_cache_hints() (in etc/afpd/dircache.c) runs in the worker child, called from the afp_over_dsi() event loop when the hint pipe has data:

  1. Bulk-reads up to MAX_HINTS_PER_CYCLE (32) × 22 bytes in a single read()
  2. Parses in 22-byte increments (IPC header + payload)
  3. Resolves volume via getvolbyvid(hint.vid)
  4. Uses direct hash_lookup() on the CNID hash table (not dircache_search_by_did() which would remove file entries)
  5. Dispatches on hint.event:
  6. CACHE_HINT_REFRESH: ostat() + dir_modify(DCMOD_STAT | DCMOD_AD_INV | DCMOD_NO_PROMOTE)
  7. CACHE_HINT_DELETE: dir_remove(vol, entry, 0)
  8. CACHE_HINT_DELETE_CHILDREN: dircache_remove_children_defer() + parent ostat/remove

Handles partial reads via a static carry buffer for message boundary alignment.

Hint Statistics

Logged by log_dircache_stat() in dircache.c on session close:

Sender side (per child): - hints_sent: total hints written to IPC socketpair

Parent side (aggregate): - hints_batched: total hints processed through buffer - hints_rate_dropped: hints dropped by rate limiter - flush_count: number of flush cycles

Receiver side (per child): - hints_received: total hints read from pipe - hints_acted_on: hints that matched a local cache entry and triggered invalidation - hints_no_match: hints for entries not in local cache (zero cost — no work done)

Deferred Cleanup (Idle Worker)

Large dircache_remove_children() operations (e.g., after directory rename) can be deferred to an idle worker thread that runs during poll() idle periods:

struct dir Cache Fields

All cache data is stored inline in struct dir (defined in include/atalk/directory.h) for cache locality. Fields grouped by alignment:

Pointer Fields (8-byte)

Field Description
d_fullpath Complete unix path (bstring)
d_m_name Mac name (bstring)
d_u_name Unix name (bstring, may alias d_m_name)
d_m_name_ucs2 Mac name as UCS2
qidx_node Position in ARC/LRU queue
dcache_rfork_buf Tier 2 rfork content buffer (or NULL)
rfork_lru_node Position in rfork LRU list (or NULL)

Time/Size Fields (8-byte)

Field Description
d_ctime Inode ctime (reenumeration)
dcache_ctime Inode ctime (dircache validation)
dcache_ino Inode number (change detection)
dcache_mtime st_mtime modification time
dcache_size st_size file size
dcache_rlen Resource fork length state (-1/-2/≥0)

ID/Flag Fields (4-byte)

Field Description
d_flags DIRF_ISFILE, DIRF_ARC_GHOST, etc.
d_pdid Parent directory CNID
d_did This entry’s CNID
d_vid Volume ID (2 bytes)
d_offcnt Offspring count
dcache_mode st_mode permissions
dcache_uid st_uid owner
dcache_gid st_gid group

Byte Fields (1-byte + arrays)

Field Size Description
arc_list 1 byte ARC list membership: 0=NONE, 1=T1, 2=T2, 3=B1, 4=B2
dcache_afpfilei 4 bytes ADEID_AFPFILEI AFP attributes
dcache_finderinfo 32 bytes ADEID_FINDERI Finder info
dcache_filedatesi 16 bytes ADEID_FILEDATESI dates (served values)

Footnotes

This is a mirror of the Netatalk GitHub Wiki

Last updated 2026-04-06