Skip to main content

12 posts tagged with "Bug Fix"

View all tags

Airflow PostgresHook Multi-Statement SQL Silently Drops Results? Split by Semicolon and Execute One by One

ยท 6 min read

When an Airflow DAG reads a .sql template file as a single string and passes it to PostgresHook.get_pandas_df(), prior SELECT results are silently dropped โ€” the DAG reports "SQL query returned no results", but copying the same SQL into psql returns data normally.

Encountered this while building AI Analytics โ€” an LLM-powered analytics pipeline where an Airflow DAG reads multi-query report templates from .sql files and executes them.

TL;DRโ€‹

PostgresHook.get_pandas_df(sql) internally calls pandas.read_sql(sql, conn) โ†’ psycopg2 cursor.execute(sql). When sql is a single string with multiple ;-separated SELECTs, the DBAPI only exposes the cursor of the last result set โ€” prior query results are silently dropped with no error. Fix: split by top-level semicolons into a list[str] and call get_pandas_df per statement, or pass the list directly so DbApiHook runs them sequentially.

Symptomโ€‹

The DAG task executing shop_monthly_overview.sql reports "SQL query returned no results":

sql_count = 1   โ† template clearly contains 4 queries
result = "โŒ SQL query returned no results"

But the same SQL pasted into psql against the same database with the same parameters returns data for all 4 SELECTs.

Reproductionโ€‹

Verify get_pandas_df behavior with multi-statement SQL directly inside the Airflow container:

from airflow.providers.postgres.hooks.postgres import PostgresHook

hook = PostgresHook(postgres_conn_id="postgres_default")

# Three SELECTs concatenated into one string
sql = "SELECT 1 AS a; SELECT 2 AS b; SELECT 99 AS c WHERE 1=0;"

df = hook.get_pandas_df(sql)
print(df.columns.tolist()) # ['c'] โ† only got the last statement's columns
print(df) # Empty โ† the last statement itself returns 0 rows

Expected three result sets, got only the last one (SELECT 99 ... WHERE 1=0, 0 rows). The first two completely disappear with no error or warning.

Root Causeโ€‹

The call chain is PostgresHook.get_pandas_df โ†’ DbApiHook.get_pandas_df โ†’ pandas.io.sql.read_sql โ†’ psycopg2 cursor.execute(sql).

The DBAPI protocol (PEP 249) allows execute to accept a string with multiple ;-separated statements. PostgreSQL executes all of them, but the cursor only exposes the last result set โ€” this is inherent PostgreSQL wire protocol behavior, not an Airflow or pandas bug.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ SELECT 1; โ† executed, result set 1 dropped at once โ”‚
โ”‚ SELECT 2; โ† executed, result set 2 dropped at once โ”‚
โ”‚ SELECT 99 WHERE 1=0; โ† executed, result set 3 exposed โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ†“
pandas.read_sql only fetches result set 3

The root cause in our code is task_execute_sql reading the entire .sql file as a single string and passing it to get_pandas_df:

# โŒ Problematic code
sql_text = open(sql_path).read() # full text with 4 SELECTs
df = pg_hook.get_pandas_df(sql_text) # only gets the last result

Why does psql return data? Because the psql frontend actively iterates through all result sets and prints them one by one, while a DBAPI cursor does not.

Solutionโ€‹

Suitable for .sql template files โ€” they contain comments, quotes, and multiple queries that need robust splitting.

def split_sql_statements(sql: str) -> list:
"""
Split SQL by top-level semicolons, correctly handling:
- Semicolons inside single-quoted strings ('a;b' is not split)
- SQL-standard '' escape ('it''s' is not split)
- Semicolons inside -- line comments (-- note; not split is not split)
"""
statements = []
buf = []
i, n = 0, len(sql)
in_quote = False

while i < n:
ch = sql[i]

# Inside a single-quoted string
if in_quote:
buf.append(ch)
if ch == "'":
# '' = literal single quote, does not end the string
if i + 1 < n and sql[i + 1] == "'":
buf.append(sql[i + 1])
i += 2
continue
in_quote = False
i += 1
continue

# Top level
if ch == "'":
in_quote = True
buf.append(ch)
elif ch == '-' and i + 1 < n and sql[i + 1] == '-':
# Line comment, swallow to end of line (; inside is not a split point)
while i < n and sql[i] != '\n':
buf.append(sql[i])
i += 1
continue
elif ch == ';':
stmt = ''.join(buf).strip()
if stmt:
statements.append(stmt)
buf = []
i += 1
continue
else:
buf.append(ch)
i += 1

# Trailing block without a final semicolon
stmt = ''.join(buf).strip()
if stmt:
statements.append(stmt)

return statements


# Caller
sql_text = open(sql_path).read()
statements = split_sql_statements(sql_text)

# Execute one by one, collect all results
all_results = []
for idx, stmt in enumerate(statements, start=1):
df = pg_hook.get_pandas_df(stmt)
if not df.empty:
all_results.append({
"sql_index": idx,
"sql": stmt,
"data": df.to_dict("records"),
"columns": df.columns.tolist(),
"row_count": len(df),
})

Option B: Pass a list directly to DbApiHookโ€‹

Airflow's DbApiHook.run and get_records accept list[str] and execute sequentially โ€” but get_pandas_df return behavior in list mode is inconsistent across providers. For production, Option A gives you full control.

Why not sqlparse.split?โ€‹

Community answers often recommend sqlparse.split(sqlparse.format(sql, strip_comments=True)), but strip_comments=True discards comments. If your downstream processor depends on metadata in comments (e.g. -- dimension: shop), you lose context. A hand-rolled splitter preserves the original comment text and gives you control.

Caveatsโ€‹

Caveats

  • Do not use sql.split(';') โ€” it will mis-cut semicolons inside quoted strings like WHERE name = 'a;b', and inside -- comment; line comments
  • split_sql_statements only handles single-quoted strings and -- line comments; if your SQL uses /* block comments */ or dollar-quoted strings ($$...$$), extend the splitter
  • After the fix, the semantics of sql_index for downstream processors change (1-based sequential index); audit all df.iloc[sql_index] style usages
  • If your SQL is program-generated rather than file-read, the safer pattern is to build a list at generation time rather than split later
  • A related trap: if you've also hit issues with SQL expressions being silently parameterized in Drizzle ORM, see Drizzle sql template mixing parameterized values with SQL expressions โ€” same family of "the framework did a transformation you didn't expect" bugs

FAQโ€‹

How do I execute multiple SQL statements in Airflow PostgresHook?โ€‹

Pass list[str] instead of a single string. DbApiHook.get_pandas_df and run accept sql as a list and execute sequentially; a single string with semicolon-separated statements causes psycopg2 to return only the last result set. For production, split yourself and call per-statement so you control result aggregation and sql_index.

Why does get_pandas_df only return the last result for multi-statement SQL?โ€‹

pandas.io.sql.read_sql calls psycopg2 cursor.execute with the full string; the DBAPI protocol only exposes the cursor of the last result set for multi-statement execution, and prior SELECT results are dropped by the server immediately, with no error or warning. psql returns data because its frontend actively iterates all result sets, while a DBAPI cursor does not.

How do I split SQL by semicolon safely with comments and quotes?โ€‹

Scan character by character and split only at top-level semicolons outside single-quoted strings and -- line comments. Single-quote literals use the SQL-standard '' escape; do not use str.split(';'), it will mis-cut semicolons inside comments and strings. If you use sqlparse.split, note that strip_comments=True discards the original comment text.


CCLEE

Independent developer, 24 years in e-commerce, focused on grounding AI in real business scenarios.

Work with me

WordPress REST API Image Upload Returns 405? Check Your Hostinger CDN

ยท 4 min read

While building a WooCommerce product import tool for a client, POST /wp-json/wp/v2/media would succeed for the first few images, then suddenly return 405 Not Allowed for all subsequent requests.

TL;DRโ€‹

Hostinger CDN (hcdn) blocks POST /wp-json/wp/v2/media requests by default. The response headers server: hcdn and x-hcdn-request-id are the smoking gun. Disable CDN or contact Hostinger support to whitelist /wp-json/* POST requests.

The Problemโ€‹

Uploading images to WordPress Media Library via REST API:

curl -X POST 'https://example.com/wp-json/wp/v2/media' \
-u 'user:app_password' \
-H 'Content-Disposition: attachment; filename="product-01.jpg"' \
-H 'Content-Type: image/jpeg' \
--data-binary @image.jpg

The first 2-4 images return 201 Created, then all subsequent requests fail with:

<html>
<head><title>405 Not Allowed</title></head>
<body>
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx</center>
</body>
</html>

This "partial success" pattern is misleading โ€” it looks like rate limiting, but the real cause is entirely different.

Root Causeโ€‹

Using curl -v to inspect the full response headers revealed:

< HTTP/2 405
< server: hcdn
< x-hcdn-request-id: cfc5ad1198938cd9f1e02ce71ed0ae61-kul-edge1

Key findings:

  • server: hcdn โ€” This is Hostinger's custom CDN (hcdn), not the origin nginx server
  • x-hcdn-request-id โ€” CDN edge node ID (kul-edge1 = Kuala Lumpur), confirming the request was blocked at the CDN layer before reaching WordPress

Hostinger CDN's default security rules block POST requests to /wp-json/wp/v2/media. The initial successes were likely due to CDN rule cold-start or cache misses.

Solutionโ€‹

Option 1: Disable CDN (Quick Fix)โ€‹

Go to Hostinger hPanel โ†’ Website โ†’ CDN โ†’ Disable.

This takes effect immediately but removes CDN acceleration. Suitable for staging environments or emergency fixes.

Submit a support ticket requesting to whitelist POST requests to /wp-json/*. Hostinger's Manage panel currently doesn't offer custom CDN rule configuration โ€” you must go through support.

Option 3: Add Retry Logic in Code (Defensive Measure)โ€‹

Even with correct CDN configuration, retry logic handles occasional CDN throttling:

import time
import random

def upload_image(url, image_bytes, filename, auth, max_retries=3):
for attempt in range(max_retries):
resp = httpx.post(
url,
content=image_bytes,
headers={
"Content-Disposition": f'attachment; filename="{filename}"',
"Content-Type": "image/jpeg",
},
auth=auth,
timeout=30,
)
if resp.status_code != 405:
return resp
delay = 3 * (attempt + 1) + random.uniform(0, 2)
time.sleep(delay)
resp.raise_for_status()

Troubleshooting Journeyโ€‹

This issue led down several dead ends. Here's the fullๆŽ’ๆŸฅ path for reference:

HypothesisActionResult
WP plugin blockingDisabled Speed Optimizer / Auto Upload ImagesStill 405, ruled out
Rate limitingAdded 2-5s delay between uploads + retryStill 405, ruled out
REST API disabledGET /wp-json/wp/v2/settingsReturned normally, ruled out
Auth credentialsWC Test ConnectionSucceeded, ruled out
CDN blockingcurl -v to inspect response headersserver: hcdn confirmed CDN blocking

The turning point was using curl -v and spotting server: hcdn โ€” only then did we realize the requests never reached WordPress.

Important Notes

  • After disabling CDN, DNS cache may take a few minutes to refresh โ€” don't retry immediately
  • If your site is on Hostinger and uses REST API for batch operations, test CDN behavior before going live
  • WooCommerce WC API (/wc/v3/products) uses different authentication (Consumer Key) and is typically unaffected; this mainly impacts WP REST API (/wp-json/wp/v2/*) write operations

FAQโ€‹

Why does WordPress REST API image upload return 405 Not Allowed?โ€‹

Check the server field in response headers. If it shows hcdn (Hostinger CDN) or another CDN identifier, the request is being blocked at the CDN layer before reaching WordPress. Disable the CDN or contact your hosting provider to whitelist the endpoint.

How to tell if 405 comes from CDN or WordPress?โ€‹

Use curl -v and inspect response headers: a server value of hcdn, cloudflare, or other CDN identifiers indicates CDN-level blocking; a server value of nginx/apache with X-WP-* or X-RateLimit-* headers means the request reached WordPress.


Encountered this issue while building a WooCommerce product import tool for a client. If you're also developing with Hostinger + WordPress and running into REST API issues, reach out.

CCLEE

Independent developer, 24 years in e-commerce, focused on grounding AI in real business scenarios.

Work with me

WordPress Block Theme Changes Not Taking Effect? FSE Development Troubleshooting Guide

ยท 7 min read

Encountered these five issues repeatedly while developing WordPress Block Themes for clients. Each one took significant debugging time. This guide covers the root causes and provides ready-to-use solutions.

TL;DRโ€‹

Five issues ranked by frequency: file changes not applying (database cache overrides files), block nesting errors (unclosed comments), child theme content not rendering (missing post-content block), SVG icons disappearing (WP_Filesystem polluted by plugins), and WP-CLI mail failures (SMTP plugins don't hook in CLI). Each scenario includes copy-paste diagnostic commands.

Embedding a Timeline in SITE123 Event Pages? First Dodge These 5 Platform Limits

ยท 5 min read

TL;DRโ€‹

Embedding a self-hosted timeline into a SITE123 event page hits 5 platform walls: Custom Code can't target a page or position, scripts run before DOM is ready, selectors hit hidden elements, float layout pushes adjacent blocks out of place, and platform cache injects the script twice. Fix: JS DOM manipulation + horizontal fishbone layout + DOMContentLoaded wrapper.

Fix React List Key Duplication Causing DOM Errors

ยท 2 min read

Encountered this issue while building an AI Agent chat interface. Here's the root cause and solution.

TL;DRโ€‹

Date.now() millisecond timestamps can duplicate within the same millisecond. When used as React list keys, this causes DOM errors. Fix by adding a random suffix or using crypto.randomUUID().