Multithreaded Proxy Server

Complete Architecture & Flow Analysis

🏗️ System Architecture Overview

🌐 Client Layer

Multiple clients send HTTP requests to the proxy server on port 8080

🔄 Proxy Server

Main process handling connections via a pool of worker threads and an LRU cache

🖥️ Remote Servers

Target web servers that the proxy fetches content from

🧵 Thread Pool

Up to 400 concurrent threads, admission-controlled by a counting semaphore

💾 LRU Cache

200MB cache with a 10MB max element size, implemented as a linked list

🔒 Synchronization

A counting semaphore for connection control, a mutex for cache protection

📊 Detailed System Flow

1. Server Initialization
• Create proxy socket on specified port
• Initialize semaphore (MAX_CLIENTS = 400)
• Initialize cache mutex lock
• Start listening for connections
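
A minimal sketch of this startup sequence. Names such as proxy_socketId, lock, and seamaphore (the project's own spelling) follow the descriptions in this document; details like SO_REUSEADDR are conventional additions, not confirmed from the source.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <pthread.h>
    #include <semaphore.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #define MAX_CLIENTS 400

    sem_t seamaphore;       // counting semaphore: free connection slots
    pthread_mutex_t lock;   // protects the shared cache linked list

    int init_proxy(int port) {
        sem_init(&seamaphore, 0, MAX_CLIENTS);   // 400 slots available
        pthread_mutex_init(&lock, NULL);

        int proxy_socketId = socket(AF_INET, SOCK_STREAM, 0);
        if (proxy_socketId < 0) { perror("socket"); exit(1); }

        int reuse = 1;   // allow quick restart on the same port
        setsockopt(proxy_socketId, SOL_SOCKET, SO_REUSEADDR,
                   &reuse, sizeof(reuse));

        struct sockaddr_in server_addr;
        memset(&server_addr, 0, sizeof(server_addr));
        server_addr.sin_family      = AF_INET;
        server_addr.sin_port        = htons(port);
        server_addr.sin_addr.s_addr = INADDR_ANY;

        if (bind(proxy_socketId, (struct sockaddr*)&server_addr,
                 sizeof(server_addr)) < 0) { perror("bind"); exit(1); }
        if (listen(proxy_socketId, MAX_CLIENTS) < 0) { perror("listen"); exit(1); }
        return proxy_socketId;
    }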
2. Connection Acceptance
• Main thread accepts client connections in infinite loop
• Extract client IP and port information
• Store socket descriptor in Connected_socketId[] array
• Create new thread for each client
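
The accept loop might look like the following, building on the initialization sketch above. thread_fn is the per-client handler, and the modular index is a simplification of the project's bookkeeping, so treat this as illustrative rather than the exact source.

    #include <arpa/inet.h>   // inet_ntop

    int Connected_socketId[MAX_CLIENTS];
    pthread_t tid[MAX_CLIENTS];
    int i = 0;

    while (1) {
        struct sockaddr_in client_addr;
        socklen_t client_len = sizeof(client_addr);
        int client_socketId = accept(proxy_socketId,
                                     (struct sockaddr*)&client_addr, &client_len);
        if (client_socketId < 0) { perror("accept"); continue; }

        // Extract client IP and port for logging
        char ip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &client_addr.sin_addr, ip, sizeof(ip));
        printf("Client %s:%d connected\n", ip, ntohs(client_addr.sin_port));

        Connected_socketId[i] = client_socketId;   // keep the fd alive for the thread
        pthread_create(&tid[i], NULL, thread_fn, &Connected_socketId[i]);
        i = (i + 1) % MAX_CLIENTS;
    }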
3. Thread Synchronization
• New thread calls sem_wait() on semaphore
• If semaphore count > 0: thread proceeds
• If semaphore count = 0: thread blocks until slot available
• This limits concurrent connections to MAX_CLIENTS
4. HTTP Request Reception
• Receive data from client socket into buffer
• Loop until complete request received (look for \r\n\r\n)
• Handle partial requests arriving in multiple packets
• Create copy of request for cache key
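
One way to write that receive loop, assuming a MAX_BYTES chunk size and the client_socketId from the accept sketch above:

    #define MAX_BYTES 4096   // per-recv chunk size

    char *buffer = (char*)calloc(MAX_BYTES, sizeof(char));
    int bytes = recv(client_socketId, buffer, MAX_BYTES - 1, 0);

    // A request may arrive split across several TCP segments: keep
    // appending until the blank line that terminates the headers shows up.
    while (bytes > 0 && strstr(buffer, "\r\n\r\n") == NULL) {
        size_t len = strlen(buffer);
        bytes = recv(client_socketId, buffer + len, MAX_BYTES - len - 1, 0);
    }

    // Copy the raw request: it doubles as the cache key in the next step.
    char *tempReq = (char*)malloc(strlen(buffer) + 1);
    strcpy(tempReq, buffer);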
5. Cache Lookup
• Acquire cache mutex lock
• Search linked list for matching URL
• If found: update LRU timestamp and return cached response
• If not found: proceed to remote server request
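
A sketch of the lookup, using the cache_element fields shown later in this document; head is the global list head and lock is the cache mutex from the initialization sketch (requires <time.h>):

    cache_element* find(char* url) {
        pthread_mutex_lock(&lock);                  // serialize cache access
        cache_element* site = head;
        while (site != NULL) {
            if (strcmp(site->url, url) == 0) {
                site->lru_time_track = time(NULL);  // refresh LRU timestamp
                break;                              // cache hit
            }
            site = site->next;
        }
        pthread_mutex_unlock(&lock);
        return site;                                // NULL means cache miss
    }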
6. Request Parsing & Validation
• Parse HTTP request using ParsedRequest_parse()
• Validate HTTP method (only GET supported)
• Check HTTP version (1.0/1.1 accepted)
• Ensure host, path, and version are present
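
Sketched with the proxy_parse library's ParsedRequest API; sendErrorMessage and handle_request stand in for the proxy's error and fetch paths (names assumed for illustration):

    struct ParsedRequest* request = ParsedRequest_create();

    if (ParsedRequest_parse(request, tempReq, strlen(tempReq)) < 0) {
        sendErrorMessage(client_socketId, 400);      // invalid HTTP syntax
    } else if (strcmp(request->method, "GET") != 0) {
        sendErrorMessage(client_socketId, 501);      // only GET is supported
    } else if (request->host && request->path && request->version &&
               (strcmp(request->version, "HTTP/1.0") == 0 ||
                strcmp(request->version, "HTTP/1.1") == 0)) {
        handle_request(client_socketId, request, tempReq);   // steps 7-9
    } else {
        sendErrorMessage(client_socketId, 500);      // missing required fields
    }
    ParsedRequest_destroy(request);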
7. Remote Server Connection
• Resolve hostname using gethostbyname()
• Create new socket to remote server
• Connect to remote server on appropriate port
• Modify request headers (add Connection: close, Host header)
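
A sketch of the connection helper (connectRemoteServer is an assumed name). Note that gethostbyname() is a legacy call and not thread-safe on all platforms, which matters in a server running hundreds of threads; getaddrinfo() is the modern alternative.

    #include <netdb.h>   // gethostbyname, struct hostent

    int connectRemoteServer(char* host_addr, int port_num) {
        int remoteSocket = socket(AF_INET, SOCK_STREAM, 0);
        if (remoteSocket < 0) return -1;

        struct hostent* host = gethostbyname(host_addr);   // DNS lookup
        if (host == NULL) return -1;                       // resolution failed

        struct sockaddr_in server_addr;
        memset(&server_addr, 0, sizeof(server_addr));
        server_addr.sin_family = AF_INET;
        server_addr.sin_port   = htons(port_num);
        memcpy(&server_addr.sin_addr.s_addr,
               host->h_addr_list[0], host->h_length);

        if (connect(remoteSocket, (struct sockaddr*)&server_addr,
                    sizeof(server_addr)) < 0)
            return -1;
        return remoteSocket;
    }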
8. Request Forwarding & Response Streaming
• Send modified request to remote server
• Receive response in chunks (4KB buffers)
• Simultaneously forward to client AND cache
• Dynamically expand cache buffer as needed
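
The streaming loop might look like this: every 4KB chunk is sent to the client immediately, while temp_buffer (an assumed name) grows via realloc to hold the complete response for step 9.

    char buf[MAX_BYTES];
    int  temp_size  = MAX_BYTES;
    int  temp_index = 0;
    char *temp_buffer = (char*)malloc(temp_size);   // accumulates the full response

    int bytes = recv(remoteSocket, buf, MAX_BYTES, 0);
    while (bytes > 0) {
        send(client_socketId, buf, bytes, 0);       // forward to client immediately

        // dynamically expand the cache buffer as needed
        while (temp_index + bytes > temp_size) {
            temp_size *= 2;
            temp_buffer = (char*)realloc(temp_buffer, temp_size);
        }
        memcpy(temp_buffer + temp_index, buf, bytes);
        temp_index += bytes;

        bytes = recv(remoteSocket, buf, MAX_BYTES, 0);
    }
    add_cache_element(temp_buffer, temp_index, tempReq);   // step 9
    free(temp_buffer);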
9. Cache Storage
• Acquire cache mutex lock
• Check if response size exceeds MAX_ELEMENT_SIZE (if so, the response is not cached)
• Evict LRU elements until sufficient space available
• Add new cache element at head of linked list
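
A sketch of that storage routine. cache_size and remove_cache_element (which evicts the node with the oldest lru_time_track, an O(n) scan discussed under Bottlenecks below) are assumed helpers.

    void add_cache_element(char* data, int size, char* url) {
        int element_size = size + strlen(url) + sizeof(cache_element) + 1;
        if (element_size > MAX_ELEMENT_SIZE)
            return;                           // over the 10MB limit: never cached

        pthread_mutex_lock(&lock);
        while (cache_size + element_size > MAX_SIZE)
            remove_cache_element();           // evict least recently used

        cache_element* element = (cache_element*)malloc(sizeof(cache_element));
        element->data = (char*)malloc(size + 1);
        memcpy(element->data, data, size);    // responses may contain binary data
        element->data[size] = '\0';
        element->len  = size;
        element->url  = strdup(url);
        element->lru_time_track = time(NULL);
        element->next = head;                 // insert at head of the list
        head = element;
        cache_size += element_size;
        pthread_mutex_unlock(&lock);
    }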
10. Connection Cleanup
• Close remote server socket
• Close client socket
• Free allocated memory buffers
• Release semaphore with sem_post()
• Thread terminates
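
The tail of the per-client thread function, in the order listed above. Calling shutdown before close is a common courtesy to flush pending data, not confirmed from the source.

    // end of thread_fn
    shutdown(remoteSocket, SHUT_RDWR);
    close(remoteSocket);                  // 1. remote server socket
    shutdown(client_socketId, SHUT_RDWR);
    close(client_socketId);               // 2. client socket
    free(buffer);                         // 3. request buffer from step 4
    free(tempReq);                        //    cache-key copy from step 4
    sem_post(&seamaphore);                // 4. free a connection slot
    return NULL;                          // 5. thread terminates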

🔧 Thread Pool Management

Semaphore-Based Connection Control

Illustrative snapshot (small pool for clarity): Threads 1, 2, 3, 5, and 7 are ACTIVE; Threads 4, 6, and 8 are WAITING, blocked in sem_wait() until an active thread releases its slot.
    sem_t seamaphore;                        // Counting semaphore
    sem_init(&seamaphore, 0, MAX_CLIENTS);   // Initialize to 400

    // In each thread:
    sem_wait(&seamaphore);                   // Acquire slot (blocks if full)
    // ... handle client request ...
    sem_post(&seamaphore);                   // Release slot

💾 Cache Architecture (LRU Implementation)

Linked List Structure

HEAD (most recent, time: 1640995200) -> Node 2 (time: 1640995100) -> Node 3 (time: 1640995000) -> TAIL (least recent, time: 1640994900) [EVICT first]

Cache Operations

Cache Hit Process:

1. Acquire mutex lock
2. Linear search through linked list
3. Update lru_time_track to current time
4. Release mutex lock
5. Send cached response to client

Cache Miss Process:

1. Forward request to remote server
2. Stream response to client while buffering
3. Acquire mutex lock for cache addition
4. Remove LRU elements until space available
5. Add new element at head
6. Release mutex lock
    typedef struct cache_element cache_element;   // lets next refer to the struct by its short name

    struct cache_element {
        char* data;               // HTTP response
        int len;                  // Data length
        char* url;                // Complete HTTP request (key)
        time_t lru_time_track;    // Last access timestamp
        cache_element* next;      // Next element
    };

    // Cache limits:
    #define MAX_SIZE 200*(1<<20)          // 200MB total cache
    #define MAX_ELEMENT_SIZE 10*(1<<20)   // 10MB per element

🔄 Request Processing States

🟦 WAITING

Thread blocked on semaphore until connection slot available

🟧 RECEIVING

Reading complete HTTP request from client socket

🟪 CACHE_LOOKUP

Searching cache with mutex protection

🟩 CACHE_HIT

Serving response directly from cache

🟥 REMOTE_FETCH

Connecting to remote server and streaming response

🟪 CACHING

Adding response to cache with LRU eviction

⚡ Performance Characteristics

Strengths:

Low Latency: Streaming responses while caching
High Concurrency: 400 simultaneous connections
Memory Efficient: Fixed cache size with LRU eviction
Thread Safe: Proper synchronization primitives

Bottlenecks:

Cache Search: O(n) linear search through linked list
Cache Eviction: O(n) to find LRU element
Serialized Cache Access: Mutex blocks all cache operations
Memory Allocation: Frequent malloc/free operations

🔧 Error Handling & Edge Cases

⚠️
Connection Errors
DNS resolution failures, connection timeouts, socket creation errors → Send HTTP 500 Internal Server Error
⚠️
Malformed Requests
Invalid HTTP syntax, missing headers, unsupported methods → Send HTTP 400 Bad Request or 501 Not Implemented
⚠️
Resource Exhaustion
Memory allocation failures, file descriptor limits → Graceful degradation with error responses
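
A sketch of an error helper covering those status codes. sendErrorMessage is an assumed name and the body text is placeholder HTML; a production version would also set Content-Length.

    int sendErrorMessage(int socket, int status_code) {
        char str[1024];
        switch (status_code) {
        case 400:
            snprintf(str, sizeof(str), "HTTP/1.1 400 Bad Request\r\n"
                     "Connection: close\r\n\r\n"
                     "<html><body><h1>400 Bad Request</h1></body></html>");
            break;
        case 500:
            snprintf(str, sizeof(str), "HTTP/1.1 500 Internal Server Error\r\n"
                     "Connection: close\r\n\r\n"
                     "<html><body><h1>500 Internal Server Error</h1></body></html>");
            break;
        case 501:
            snprintf(str, sizeof(str), "HTTP/1.1 501 Not Implemented\r\n"
                     "Connection: close\r\n\r\n"
                     "<html><body><h1>501 Not Implemented</h1></body></html>");
            break;
        default:
            return -1;   // unhandled status code
        }
        send(socket, str, strlen(str), 0);
        return 0;
    }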