API Performance Diagnosis & Fix Guide
TOP 5 ROOT CAUSES (In Order of Likelihood)
1. Synchronous/Blocking HTTP Code ⭐ MOST LIKELY (80% probability)
Symptoms:
- TIME_WAIT connections > 500
- HTTP connections > 1000
- Throughput approximately matches 100KB response ÷ 270ms (about 370 KB/sec, against the measured 362 KB/sec)
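The throughput symptom can be checked with simple arithmetic: effective throughput is the response size divided by the time spent receiving it. A minimal sketch using only the figures quoted in this guide (100KB body, 270ms receive time); the function name `throughput_kb_per_sec` is illustrative and not part of any script referenced here.

```bash
#!/usr/bin/env bash
# Effective throughput = response size / receive time.
# Arguments: size in KB, time in milliseconds. Prints KB/sec with one decimal.
throughput_kb_per_sec() {
  awk -v kb="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", kb / (ms / 1000) }'
}

# 100KB response drained in 270ms
throughput_kb_per_sec 100 270   # prints 370.4
```

The result is within a few percent of the measured 362 KB/sec, which is consistent with each response taking roughly 270ms to drain.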
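Why:
Your XConnect .NET code fans out requests with Task.WhenAll(). Task.WhenAll() does not itself block a thread, but each awaited GetAsync() call defaults to HttpCompletionOption.ResponseContentRead, which buffers the entire body before the call completes. With 100KB Agoda responses, every request holds its connection until the whole body has been read.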
Fix:
// Stream the body instead of buffering it
using var response = await httpClient.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
using var stream = await response.Content.ReadAsStreamAsync();
// Process while reading; do not buffer the entire response
2. Small TCP Receive Buffer ⭐ VERY LIKELY (70% probability)
Symptoms:
- TCP Auto-Tuning disabled
- TcpWindowSize < 65535
- Throughput consistently low
Why: With receive window auto-tuning disabled, Windows holds the TCP receive window at its 64KB default, which is smaller than a 100KB Agoda response:
- Buffer fills → TCP window shrinks → Agoda slows down → HAProxy waits
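The reasoning above rests on the standard bound that one TCP connection cannot move more than one receive window per round trip. A rough sketch of that bound follows; the round-trip times are hypothetical, since this guide does not state the measured RTT between HAProxy and the client, and the function name is illustrative.

```bash
#!/usr/bin/env bash
# Upper bound on single-connection throughput: receive window / round-trip time.
# Arguments: window in bytes, RTT in milliseconds.
# Prints KB/sec (1 KB = 1024 bytes), rounded to the nearest integer.
max_throughput_kb_per_sec() {
  awk -v win="$1" -v rtt="$2" 'BEGIN { printf "%.0f\n", (win / 1024) / (rtt / 1000) }'
}

max_throughput_kb_per_sec 65535 1      # 64KB window, hypothetical 1 ms LAN RTT
max_throughput_kb_per_sec 65535 180    # 64KB window, hypothetical 180 ms RTT
max_throughput_kb_per_sec 262140 180   # 256KB window, same hypothetical RTT
```

It is worth measuring the real RTT (for example with ping) before relying on this cause: a 64KB window only limits throughput to a few hundred KB/sec when the round trip is long, or when the application is slow to drain the buffer.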
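Quick Fix:
# Run as Administrator
netsh int tcp set global autotuninglevel=normal
# Note: the TcpWindowSize registry value is ignored on Windows Vista / Server 2008 and later,
# where auto-tuning sizes the window; this line only matters on older systems
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" -Name "TcpWindowSize" -Value 262140
# Restarting the adapter briefly drops every active connection on it
Restart-NetAdapter -Name "Ethernet"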
3. HttpClient Anti-Pattern ⭐ LIKELY (60% probability)
Symptoms:
- Many TIME_WAIT connections
- .NET HTTP connections counter > 1000
- Socket exhaustion
Why:
A new HttpClient instance is created for each request instead of one being reused:
// BAD - each new HttpClient has its own connection pool; disposed clients leave sockets in TIME_WAIT
var client = new HttpClient();
var response = await client.GetAsync(url);
Fix:
// GOOD - Use IHttpClientFactory, which pools and recycles handlers
services.AddHttpClient("AgodaClient")
    .SetHandlerLifetime(TimeSpan.FromMinutes(5));
4. CPU/Memory Pressure ⭐ POSSIBLE (40% probability)
Symptoms:
- CPU > 80%
- Available memory < 500MB
- High Gen 2 GC collections
- Thread count > 500
Why: The system cannot schedule socket read operations fast enough because of resource contention.
Fix:
- Add more CPU/RAM
- Optimize application code
- Scale horizontally
5. Antivirus Scanning ⭐ POSSIBLE (30% probability)
Symptoms:
- Windows Defender real-time protection enabled
- IOAV protection enabled
- Consistent 200-300ms delay (matches scanning time)
Why: Every 100KB HTTP response gets scanned before delivery to the application.
Test:
# Disable temporarily (test only!)
Set-MpPreference -DisableRealtimeMonitoring $true
# Run load test
# Re-enable
Set-MpPreference -DisableRealtimeMonitoring $false
📋 DIAGNOSTIC SCRIPTS CREATED
1. Comprehensive Diagnostic Script
File: /home/monitor/diagnose_api2_slowness.ps1
What it does:
- ✅ Tests CPU & memory pressure
- ✅ Analyzes application processes
- ✅ Checks TCP configuration
- ✅ Counts network connections
- ✅ Detects antivirus interference
- ✅ Measures disk I/O
- ✅ Analyzes .NET performance counters
- ✅ Tests network throughput
How to run:
# On Windows Server 10.32.8.135
# Copy file from monitoring server
scp monitor@10.32.8.209:/home/monitor/diagnose_api2_slowness.ps1 C:\temp\
# Run as Administrator
powershell -ExecutionPolicy Bypass -File C:\temp\diagnose_api2_slowness.ps1
# Or run it once and save the output at the same time
powershell -ExecutionPolicy Bypass -File C:\temp\diagnose_api2_slowness.ps1 | Tee-Object C:\temp\diagnostic_results.txt
Output interpretation:
- ❌ RED = Critical issue found
- ⚠️ YELLOW = Warning, possible issue
- ✓ GREEN = Normal, no issue
2. Quick Fix Script (No Code Changes)
File: /home/monitor/fix_api2_tcp_performance.ps1
What it does:
- ✅ Backs up current TCP settings
- ✅ Enables TCP Auto-Tuning
- ✅ Enables TCP Chimney Offload (deprecated in Windows Server 2016 and later, so this step may have no effect there)
- ✅ Sets CTCP congestion provider
- ✅ Increases TCP window size to 256KB
- ✅ Enables window scaling (RFC 1323)
- ✅ Increases max connections
- ✅ Optionally restarts network adapter
How to run:
# On Windows Server 10.32.8.135
# Copy file from monitoring server
scp monitor@10.32.8.209:/home/monitor/fix_api2_tcp_performance.ps1 C:\temp\
# Run as Administrator
powershell -ExecutionPolicy Bypass -File C:\temp\fix_api2_tcp_performance.ps1
Expected result:
- Client receive time drops from 270ms → <50ms
- Total response time drops from 467ms → <250ms
- Throughput increases from 362 KB/sec → 10+ MB/sec
🚀 STEP-BY-STEP FIX PROCESS
Phase 1: Immediate (No Downtime)
Step 1: Run diagnostic script
Step 2: Review results and identify which of the 5 causes applies
Step 3: Apply TCP fixes (if cause #2 detected)
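Step 4: Wait 60 seconds for new connections to pick up the settings, or restart the network adapter (this briefly drops active connections, so it is not strictly zero-downtime)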
Step 5: Monitor HAProxy logs for improvement
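# On HAProxy server (10.32.8.209)
# Print the total time: the last component of the first timers field (e.g. 0/0/1/197/467)
tail -f /var/log/haproxy.log | grep --line-buffered "10.32.8.135" | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[-+0-9]+\/[-+0-9]+\/[-+0-9]+\/[-+0-9]+\/[-+0-9]+$/) { n = split($i, t, "/"); print t[n]; break } }'
# Watch this number (Tt) - should drop significantly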
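To quantify the improvement rather than eyeball a scrolling log, the share of slow requests can be computed from the same log. The sketch below assumes HAProxy's HTTP log format, in which the timers appear as five slash-separated numbers whose last component is the total time (named Tt or Ta depending on HAProxy version). The function name and the four sample lines are illustrative, not taken from the real log.

```bash
#!/usr/bin/env bash
# Reads HAProxy HTTP-format log lines on stdin and prints the percentage of
# requests whose total time exceeds the threshold (in ms) given as $1.
slow_request_pct() {
  awk -v limit="$1" '
    {
      for (i = 1; i <= NF; i++) {
        # Timers field: five slash-separated numbers, e.g. 0/0/1/197/467
        if ($i ~ /^[-+]?[0-9]+\/[-+]?[0-9]+\/[-+]?[0-9]+\/[-+]?[0-9]+\/[-+]?[0-9]+$/) {
          n = split($i, t, "/")
          total++
          if (t[n] + 0 > limit) slow++
          break   # a later five-part field holds connection counts, not timers
        }
      }
    }
    END { if (total) printf "%.1f\n", 100 * slow / total; else print "0.0" }
  '
}

# Illustrative sample lines (made up for this example)
slow_request_pct 1000 <<'EOF'
Feb  6 12:14:14 lb haproxy[1438]: 10.32.8.135:33317 [06/Feb/2025:12:14:14.655] fe_api be_api/srv1 0/0/1/197/467 200 102400 - - ---- 4/4/2/2/0 0/0 "GET /search HTTP/1.1"
Feb  6 12:14:15 lb haproxy[1438]: 10.32.8.135:33318 [06/Feb/2025:12:14:15.102] fe_api be_api/srv1 0/0/1/190/455 200 102400 - - ---- 4/4/2/2/0 0/0 "GET /search HTTP/1.1"
Feb  6 12:14:16 lb haproxy[1438]: 10.32.8.135:33319 [06/Feb/2025:12:14:16.310] fe_api be_api/srv1 0/0/1/201/1204 200 102400 - - ---- 4/4/2/2/0 0/0 "GET /search HTTP/1.1"
Feb  6 12:14:17 lb haproxy[1438]: 10.32.8.135:33320 [06/Feb/2025:12:14:17.020] fe_api be_api/srv1 0/0/1/199/480 200 102400 - - ---- 4/4/2/2/0 0/0 "GET /search HTTP/1.1"
EOF
```

With the four sample lines this prints 25.0. On real data, `grep "10.32.8.135" /var/log/haproxy.log | slow_request_pct 1000` would give a figure to compare against the 9.9% of requests over 1000ms reported in this guide.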
Phase 2: Application Fix (Requires Deployment)
If diagnostic shows cause #1 or #3 (code issues), update XConnect code:
Fix 1: Enable Streaming
// In your HTTP call method
using var response = await httpClient.GetAsync(
    url,
    HttpCompletionOption.ResponseHeadersRead // ← Add this
);
response.EnsureSuccessStatusCode();
using var stream = await response.Content.ReadAsStreamAsync();
var result = await JsonSerializer.DeserializeAsync<T>(stream);
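Fix 2: Use IHttpClientFactory
// In Program.cs or Startup.cs
services.AddHttpClient("AgodaClient", client => {
    client.Timeout = TimeSpan.FromSeconds(10);
})
.ConfigurePrimaryHttpMessageHandler(() => new HttpClientHandler {
    MaxConnectionsPerServer = 50
})
.SetHandlerLifetime(TimeSpan.FromMinutes(5));
// In your service
public class AgodaService {
    private readonly IHttpClientFactory _httpClientFactory;
    public AgodaService(IHttpClientFactory httpClientFactory) {
        _httpClientFactory = httpClientFactory;
    }
    // T must be declared on the method; the url parameter is illustrative
    public async Task<T> CallAgoda<T>(string url) {
        var client = _httpClientFactory.CreateClient("AgodaClient");
        using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        using var stream = await response.Content.ReadAsStreamAsync();
        return await JsonSerializer.DeserializeAsync<T>(stream);
    }
}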
Phase 3: Monitoring (Ongoing)
Monitor these metrics:
- HAProxy logs - Client receive time (should be <50ms)
- Windows Performance Counters - TCP connections
- Network throughput
- .NET HTTP connections (should stay low)
📊 EXPECTED IMPROVEMENTS
Before Fix:
Backend Response Time (Tr): 197ms ✓
Client Receive Time: 270ms ❌
Total Time (Tt): 467ms ❌
Throughput: 362 KB/sec ❌
Requests >1000ms: 9.9% ❌
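The figures above are consistent with the total time being, to a first approximation, the backend response time plus the time spent delivering the body to the client (the other HAProxy timers being near zero). A small sketch of that decomposition using only numbers from this guide; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Splits total time into backend time and client receive time.
# Arguments: backend response time (ms), client receive time (ms).
# Prints the implied total and the share of it spent on client delivery.
receive_share() {
  awk -v tr="$1" -v rx="$2" 'BEGIN {
    printf "total=%dms receive_share=%.0f%%\n", tr + rx, 100 * rx / (tr + rx)
  }'
}

receive_share 197 270   # prints total=467ms receive_share=58%
```

More than half of each request is spent after the backend has already answered, which is why the fixes in this guide target the client side rather than the backend.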
After TCP Fix:
Backend Response Time (Tr): 197ms ✓
Client Receive Time: <50ms ✓
Total Time (Tt): <250ms ✓
Throughput: 10+ MB/sec ✓
Requests >1000ms: <1% ✓
After Code Fix:
Backend Response Time (Tr): 197ms ✓
Client Receive Time: <10ms ✓
Total Time (Tt): <210ms ✓
Throughput: 50+ MB/sec ✓
Requests >1000ms: 0% ✓
🎯 VERIFICATION CHECKLIST
After applying fixes, verify:
- Diagnostic script shows no ❌ CRITICAL issues
- TCP Auto-Tuning is "normal" (not disabled)
- TcpWindowSize = 262140
- TIME_WAIT connections < 500
- .NET HTTP connections < 100
- CPU usage < 60%
- Available memory > 1GB
- HAProxy client receive time < 50ms
- HAProxy total time < 250ms
- No timeout errors in application logs
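The upper-bound items in this checklist can be turned into a repeatable check once the measurements have been collected. A minimal sketch, assuming the values are gathered by hand or from the diagnostic output; `check_below` and the sample measurements are hypothetical.

```bash
#!/usr/bin/env bash
# Prints PASS when the measured value is strictly below the limit, FAIL otherwise.
# Arguments: check name, measured value, limit (numbers are compared numerically).
check_below() {
  local name="$1" value="$2" limit="$3"
  if awk -v v="$value" -v l="$limit" 'BEGIN { exit !(v < l) }'; then
    echo "PASS: $name ($value < $limit)"
  else
    echo "FAIL: $name ($value is not below $limit)"
  fi
}

# Hypothetical measurements; replace with real values
check_below "TIME_WAIT connections" 120 500
check_below "HAProxy total time in ms" 467 250
```

The limits passed in are the thresholds from the checklist above. Lower-bound items such as available memory would need the comparison reversed.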
📞 ESCALATION PATH
If fixes don't work:
- Collect full diagnostic output and share with dev team
- Enable detailed .NET logging:
<system.diagnostics>
  <sources>
    <source name="System.Net.Http" switchValue="All">
      <listeners>
        <add name="myListener"/>
      </listeners>
    </source>
  </sources>
  <sharedListeners>
    <add name="myListener" type="System.Diagnostics.TextWriterTraceListener" initializeData="httpclient.log"/>
  </sharedListeners>
</system.diagnostics>
- Capture a network trace
- Contact Microsoft Support if OS-level issue suspected
🔗 RELATED FILES
- Diagnostic Script: /home/monitor/diagnose_api2_slowness.ps1
- Quick Fix Script: /home/monitor/fix_api2_tcp_performance.ps1
- HAProxy Analysis: Previous conversation output
- XConnect Code Analysis: /home/monitor/XCONNECT_SUPPLIER_RESILIENCE_ANALYSIS.md
📝 SUMMARY
Root Cause: Client at 10.32.8.135 is slow to consume 100KB responses from HAProxy (only 362 KB/sec throughput).
Most Likely: Combination of:
1. Synchronous HTTP code blocking socket reads
2. Small TCP receive buffer
3. HttpClient anti-pattern creating socket exhaustion
Quick Fix: Run fix_api2_tcp_performance.ps1 (5 minutes; no downtime unless the optional network adapter restart is used)
Long-term Fix: Update XConnect code to use streaming and IHttpClientFactory
Expected Result: 270ms client delay drops to <50ms, total response time drops from 467ms to <250ms
Ready to fix? Run the diagnostic script first to confirm the cause!