API Performance Diagnosis & Fix Guide


TOP 5 ROOT CAUSES (In Order of Likelihood)

1. Synchronous/Blocking HTTP Code ⭐ MOST LIKELY (80% probability)

Symptoms:

  • TIME_WAIT connections > 500
  • HTTP connections > 1000
  • Throughput is close to 100 KB response ÷ 270 ms

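Why: Your XConnect .NET code issues its requests in parallel with Task.WhenAll(). Task.WhenAll() itself does not block, but by default GetAsync() buffers the entire response body before it completes. If the read path also contains blocking calls such as .Result or .Wait() (an assumption to verify in the code), thread-pool threads are held while each 100 KB Agoda response arrives. The observed rate is consistent with this:

100 KB ÷ 270 ms ≈ 370 KB/sec ← Close to what HAProxy sees (362 KB/sec)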
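
The throughput comparison can be reproduced with a few lines of arithmetic. The sketch below uses the response size, receive time, and HAProxy rate quoted in this guide; the helper name `throughput_kb_per_sec` is hypothetical and only there for illustration.

```python
def throughput_kb_per_sec(response_kb: float, receive_ms: float) -> float:
    """Per-response throughput implied by a body size and a receive time."""
    return response_kb / (receive_ms / 1000.0)

estimated = throughput_kb_per_sec(100, 270)   # figures quoted in this guide
observed = 362.0                              # KB/sec reported by HAProxy
gap_pct = abs(estimated - observed) / observed * 100

print(f"estimated: {estimated:.0f} KB/sec")
print(f"gap vs. observed: {gap_pct:.1f}%")
```

A gap of a few percent suggests the transfer is paced by the receive phase of each response. On its own it does not tell the five causes apart, so treat it as a consistency check rather than proof.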

Fix:

// Use streaming instead of buffering
// Disposing the response returns the connection to the pool
using var response = await httpClient.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
using var stream = await response.Content.ReadAsStreamAsync();
// Process while reading, don't buffer entire response


2. Small TCP Receive Buffer ⭐ VERY LIKELY (70% probability)

Symptoms:

  • TCP Auto-Tuning disabled
  • TcpWindowSize < 65535
  • Throughput consistently low

Why: With auto-tuning disabled, the Windows TCP receive window stays at about 64 KB, which is smaller than a 100 KB Agoda response. Buffer fills → TCP window shrinks → the sender has to slow down → HAProxy waits for the client to drain the response.
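
A rough way to judge whether the receive window can explain the observed rate is the window-over-RTT bound: a sender can keep at most one window of unacknowledged data in flight per round trip. The sketch below is a back-of-the-envelope check, not a measurement. The round-trip times are hypothetical placeholders, so substitute the RTT actually measured between HAProxy and 10.32.8.135.

```python
def max_throughput_kb_per_sec(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on TCP throughput: one full window per round trip."""
    return (window_bytes / 1024) / (rtt_ms / 1000.0)

WINDOW_BYTES = 65535          # 64 KB window, the maximum without window scaling
for rtt_ms in (1, 10, 100):   # hypothetical RTTs, replace with measured values
    bound = max_throughput_kb_per_sec(WINDOW_BYTES, rtt_ms)
    print(f"RTT {rtt_ms:>3} ms -> at most {bound:,.0f} KB/sec")

# RTT at which a 64 KB window alone would cap throughput at 362 KB/sec
rtt_for_observed_ms = (WINDOW_BYTES / 1024) / 362 * 1000
print(f"362 KB/sec needs an RTT of about {rtt_for_observed_ms:.0f} ms")
```

If the measured RTT is far below that figure, a fixed 64 KB window is unlikely to be the whole story, and an application that drains the socket slowly (which also shrinks the advertised window) deserves a closer look.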

Quick Fix:

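# Run as Administrator
netsh int tcp set global autotuninglevel=normal
# TcpWindowSize is only honored by Windows Server 2003 and earlier; later versions ignore it and rely on auto-tuning
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" -Name "TcpWindowSize" -Value 262140
# Restarting the adapter briefly drops every connection on the server
Restart-NetAdapter -Name "Ethernet"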


3. HttpClient Anti-Pattern ⭐ LIKELY (60% probability)

Symptoms:

  • Many TIME_WAIT connections
  • .NET HTTP connections counter > 1000
  • Socket exhaustion

Why: Creating new HttpClient instances instead of reusing:

// BAD - Creates new socket every time
var client = new HttpClient();
var response = await client.GetAsync(url);
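
To see why one connection per request piles up TIME_WAIT sockets, Little's law gives a quick estimate: the number of lingering sockets is the rate of new connections multiplied by how long each one stays in TIME_WAIT. The sketch below is illustrative only. Both the connection rates and the 120-second TIME_WAIT duration are assumptions, so check the value configured on the server.

```python
def steady_state_time_wait(new_connections_per_sec: float, time_wait_sec: float) -> float:
    """Little's law: sockets lingering = arrival rate x time spent lingering."""
    return new_connections_per_sec * time_wait_sec

# Both inputs are hypothetical; measure the real request rate and check the
# TIME_WAIT duration configured on the server.
for rate in (5, 10, 50):
    count = steady_state_time_wait(rate, time_wait_sec=120)
    print(f"{rate:>3} new connections/sec -> ~{count:.0f} sockets in TIME_WAIT")
```

Under these assumptions even a modest request rate crosses the 500-connection symptom threshold, whereas pooled connections are reused and rarely enter TIME_WAIT at all.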

Fix:

// GOOD - Use IHttpClientFactory
services.AddHttpClient("AgodaClient")
    .SetHandlerLifetime(TimeSpan.FromMinutes(5));


4. CPU/Memory Pressure ⭐ POSSIBLE (40% probability)

Symptoms:

  • CPU > 80%
  • Available memory < 500MB
  • High Gen 2 GC collections
  • Thread count > 500

Why: System can't schedule socket read operations fast enough due to resource contention.

Fix:

  • Add more CPU/RAM
  • Optimize application code
  • Scale horizontally


5. Antivirus Scanning ⭐ POSSIBLE (30% probability)

Symptoms:

  • Windows Defender real-time protection enabled
  • IOAV protection enabled
  • Consistent 200-300ms delay (matches scanning time)

Why: Every 100KB HTTP response gets scanned before delivery to application.

Test:

# Disable temporarily (test only!)
Set-MpPreference -DisableRealtimeMonitoring $true
# Run load test
# Re-enable
Set-MpPreference -DisableRealtimeMonitoring $false


📋 DIAGNOSTIC SCRIPTS CREATED

1. Comprehensive Diagnostic Script

File: /home/monitor/diagnose_api2_slowness.ps1

What it does:

  ✅ Tests CPU & memory pressure
  ✅ Analyzes application processes
  ✅ Checks TCP configuration
  ✅ Counts network connections
  ✅ Detects antivirus interference
  ✅ Measures disk I/O
  ✅ Analyzes .NET performance counters
  ✅ Tests network throughput

How to run:

# On Windows Server 10.32.8.135
# Copy file from monitoring server
scp monitor@10.32.8.209:/home/monitor/diagnose_api2_slowness.ps1 C:\temp\

# Run as Administrator
powershell -ExecutionPolicy Bypass -File C:\temp\diagnose_api2_slowness.ps1

# Or run it and save the output to a file at the same time
powershell -ExecutionPolicy Bypass -File C:\temp\diagnose_api2_slowness.ps1 | Tee-Object C:\temp\diagnostic_results.txt

Output interpretation:

  ❌ RED = Critical issue found
  ⚠️ YELLOW = Warning, possible issue
  ✓ GREEN = Normal, no issue


2. Quick Fix Script (No Code Changes)

File: /home/monitor/fix_api2_tcp_performance.ps1

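What it does:

  ✅ Backs up current TCP settings
  ✅ Enables TCP Auto-Tuning
  ✅ Enables TCP Chimney Offload (deprecated; may be unavailable on recent Windows Server versions)
  ✅ Sets CTCP congestion provider
  ✅ Increases TCP window size to 256KB (TcpWindowSize registry value; ignored by Windows versions later than Server 2003, which rely on auto-tuning)
  ✅ Enables window scaling (RFC 1323)
  ✅ Increases max connections
  ✅ Optionally restarts network adapter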

How to run:

# On Windows Server 10.32.8.135
# Copy file from monitoring server
scp monitor@10.32.8.209:/home/monitor/fix_api2_tcp_performance.ps1 C:\temp\

# Run as Administrator
powershell -ExecutionPolicy Bypass -File C:\temp\fix_api2_tcp_performance.ps1

Expected result:

  • Client receive time drops from 270ms → <50ms
  • Total response time drops from 467ms → <250ms
  • Throughput increases from 362 KB/sec → 2+ MB/sec (100 KB delivered in under 50 ms)


🚀 STEP-BY-STEP FIX PROCESS

Phase 1: Immediate (No Downtime)

Step 1: Run diagnostic script

powershell -ExecutionPolicy Bypass -File C:\temp\diagnose_api2_slowness.ps1 > C:\temp\results.txt

Step 2: Review results and identify which of the 5 causes applies

Step 3: Apply TCP fixes (if cause #2 detected)

powershell -ExecutionPolicy Bypass -File C:\temp\fix_api2_tcp_performance.ps1

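Step 4: Wait 60 seconds so new connections pick up the settings, or restart the network adapter (a restart briefly drops every connection on the server, so it is not a zero-downtime step)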

Step 5: Monitor HAProxy logs for improvement

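# On HAProxy server (10.32.8.209)
# Assumes the default HTTP log format, where the first group of five slash-separated numbers holds the timers
tail -f /var/log/haproxy.log | grep --line-buffered "10.32.8.135" | \
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^-?[0-9]+\/-?[0-9]+\/-?[0-9]+\/-?[0-9]+\/\+?-?[0-9]+$/) { n = split($i, t, "/"); print t[n]; break } }'
# Watch the printed number (Tt, the last timer) - should drop significantly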
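
For offline analysis of saved logs, the same extraction can be sketched in Python. This assumes HAProxy's default HTTP log format, in which the timers appear as the first group of five slash-separated numbers with the total time last. The sample line is hypothetical and only shaped like a real entry, and `total_time_ms` is an illustrative helper name.

```python
import re
from typing import Optional

# Five slash-separated timers, e.g. "0/0/1/197/467"; the last one may carry
# a "+" prefix when HAProxy logs before the transfer has finished.
TIMERS = re.compile(r"(?<!\S)(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/\+?(-?\d+)(?!\S)")

def total_time_ms(line: str) -> Optional[int]:
    """Return the last timer (total time) from an HTTP-format log line, or None."""
    match = TIMERS.search(line)
    return int(match.group(5)) if match else None

# Hypothetical log line, shaped like HAProxy's default HTTP log format.
sample = ('haproxy[1234]: 10.32.8.135:51234 [01/Jan/2025:10:00:00.000] '
          'fe_api be_agoda/srv1 0/0/1/197/467 200 102400 - - ---- '
          '5/5/1/1/0 0/0 "GET /search HTTP/1.1"')

print(total_time_ms(sample))  # prints 467
```

The connection-count field in the same line also has five slash-separated numbers, which is why the sketch takes only the first match. A custom log-format would need a different pattern.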


Phase 2: Application Fix (Requires Deployment)

If diagnostic shows cause #1 or #3 (code issues), update XConnect code:

Fix 1: Enable Streaming

// In your HTTP call method
using var response = await httpClient.GetAsync(
    url,
    HttpCompletionOption.ResponseHeadersRead // ← Add this
);

using var stream = await response.Content.ReadAsStreamAsync();
var result = await JsonSerializer.DeserializeAsync<T>(stream);

Fix 2: Use IHttpClientFactory

// In Program.cs or Startup.cs
services.AddHttpClient("AgodaClient", client => {
    client.Timeout = TimeSpan.FromSeconds(10);
})
.ConfigurePrimaryHttpMessageHandler(() => new HttpClientHandler {
    MaxConnectionsPerServer = 50
})
.SetHandlerLifetime(TimeSpan.FromMinutes(5));

// In your service
public class AgodaService {
    private readonly IHttpClientFactory _httpClientFactory;

    public AgodaService(IHttpClientFactory httpClientFactory) {
        _httpClientFactory = httpClientFactory;
    }

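    // url is an illustrative parameter; requires using System.Text.Json
    public async Task<T> CallAgoda<T>(string url) {
        var client = _httpClientFactory.CreateClient("AgodaClient");
        using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        using var stream = await response.Content.ReadAsStreamAsync();
        return await JsonSerializer.DeserializeAsync<T>(stream);
    }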
}


Phase 3: Monitoring (Ongoing)

Monitor these metrics:

  1. HAProxy logs - Client receive time (should be <50ms)

    tail -f /var/log/haproxy.log | grep affiliateapi5861 | grep "10.32.8.135"
    

  2. Windows Performance Counters - TCP connections

    Get-Counter '\TCPv4\Connections Established' -Continuous
    

  3. Network throughput

    Get-Counter '\Network Interface(*)\Bytes Received/sec' -Continuous
    

  4. .NET HTTP connections (should stay low)

    Get-Counter '\.NET CLR Networking 4.0.0.0(*)\Connections Established' -Continuous
    


📊 EXPECTED IMPROVEMENTS

Before Fix:

Backend Response Time (Tr):  197ms  ✓
Client Receive Time:         270ms  ❌
Total Time (Tt):             467ms  ❌
Throughput:                  362 KB/sec ❌
Requests >1000ms:            9.9%  ❌

After TCP Fix:

Backend Response Time (Tr):  197ms  ✓
Client Receive Time:         <50ms  ✓
Total Time (Tt):             <250ms ✓
Throughput:                  2+ MB/sec ✓
Requests >1000ms:            <1%   ✓

After Code Fix:

Backend Response Time (Tr):  197ms  ✓
Client Receive Time:         <10ms  ✓
Total Time (Tt):             <210ms ✓
Throughput:                  10+ MB/sec ✓
Requests >1000ms:            0%    ✓
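
The figures in these tables are tied together by two relations: total time is the backend response time plus the client receive time, and per-response throughput is the response size divided by the receive time. The sketch below is a consistency check under that assumption. It treats the "<50ms" and "<10ms" targets as exact upper bounds, so the totals it prints are ceilings and the throughputs are floors.

```python
RESPONSE_KB = 100   # response size quoted in this guide
TR_MS = 197         # backend response time (Tr), unchanged by the fixes

def summarize(receive_ms: float) -> tuple:
    """Return (total time in ms, per-response throughput in KB/sec)."""
    total_ms = TR_MS + receive_ms
    throughput = RESPONSE_KB / (receive_ms / 1000.0)
    return total_ms, throughput

for label, receive_ms in [("before", 270), ("after TCP fix", 50), ("after code fix", 10)]:
    total_ms, throughput = summarize(receive_ms)
    print(f"{label:>15}: Tt = {total_ms} ms, {throughput:,.0f} KB/sec")
```

Re-running this with the receive times actually observed after each fix gives a quick sanity check on whether the logged Tt and throughput values are plausible.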

🎯 VERIFICATION CHECKLIST

After applying fixes, verify:

  • Diagnostic script shows no ❌ CRITICAL issues
  • TCP Auto-Tuning is "normal" (not disabled)
  • TcpWindowSize = 262140
  • TIME_WAIT connections < 500
  • .NET HTTP connections < 100
  • CPU usage < 60%
  • Available memory > 1GB
  • HAProxy client receive time < 50ms
  • HAProxy total time < 250ms
  • No timeout errors in application logs
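
The numeric items in this checklist can be checked mechanically once the values have been collected. The sketch below is one hypothetical way to do that: the metric names and the sample readings are invented for illustration, and only the thresholds come from the checklist.

```python
# Thresholds from the checklist: metric -> (comparison, limit)
THRESHOLDS = {
    "time_wait_connections": ("<", 500),
    "dotnet_http_connections": ("<", 100),
    "cpu_percent": ("<", 60),
    "available_memory_mb": (">", 1024),
    "client_receive_ms": ("<", 50),
    "total_time_ms": ("<", 250),
}

def failed_checks(measured: dict) -> list:
    """Return the names of metrics that miss their threshold or are missing."""
    failures = []
    for name, (op, limit) in THRESHOLDS.items():
        value = measured.get(name)
        ok = value is not None and (value < limit if op == "<" else value > limit)
        if not ok:
            failures.append(name)
    return failures

# Hypothetical readings, for illustration only.
readings = {
    "time_wait_connections": 320,
    "dotnet_http_connections": 45,
    "cpu_percent": 72,
    "available_memory_mb": 2048,
    "client_receive_ms": 38,
    "total_time_ms": 241,
}
print(failed_checks(readings))  # prints ['cpu_percent']
```

A missing metric counts as a failure, so an incomplete collection run is not mistaken for a clean result.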

📞 ESCALATION PATH

If fixes don't work:

  1. Collect full diagnostic output and share with dev team
  2. Enable detailed .NET logging:

    <system.diagnostics>
      <sources>
        <source name="System.Net.Http" switchValue="All">
          <listeners>
            <add name="myListener"/>
          </listeners>
        </source>
      </sources>
      <sharedListeners>
        <add name="myListener" type="System.Diagnostics.TextWriterTraceListener" initializeData="httpclient.log"/>
      </sharedListeners>
    </system.diagnostics>
    

  3. Capture network trace:

    # Start packet capture
    netsh trace start capture=yes tracefile=C:\temp\network.etl
    
    # Reproduce issue (1-2 minutes)
    
    # Stop capture
    netsh trace stop
    
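    # Analyze in Wireshark after converting the .etl file to pcapng (for example with Microsoft's etl2pcapng tool); Microsoft Message Analyzer has been retired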
    

  4. Contact Microsoft Support if OS-level issue suspected


Reference files:

  • Diagnostic Script: /home/monitor/diagnose_api2_slowness.ps1
  • Quick Fix Script: /home/monitor/fix_api2_tcp_performance.ps1
  • HAProxy Analysis: Previous conversation output
  • XConnect Code Analysis: /home/monitor/XCONNECT_SUPPLIER_RESILIENCE_ANALYSIS.md

📝 SUMMARY

Root Cause: Client at 10.32.8.135 is slow to consume 100KB responses from HAProxy (only 362 KB/sec throughput).

Most Likely: A combination of:

  1. Synchronous HTTP code blocking socket reads
  2. Small TCP receive buffer
  3. HttpClient anti-pattern creating socket exhaustion

Quick Fix: Run fix_api2_tcp_performance.ps1 (5 minutes, no downtime)

Long-term Fix: Update XConnect code to use streaming and IHttpClientFactory

Expected Result: 270ms client delay drops to <50ms, total response time drops from 467ms to <250ms


Ready to fix? Run the diagnostic script first to confirm the cause!