Preferences

SQL Injection: UNION-based extraction

Exfiltrate the entire database one column at a time with UNION SELECT. Discover the column count, align the types, dump whole tables.

Intermediate18 min

SQLiUNIONData Exfil

Loading lesson…

Preferences

SQL Injection·Lesson 6 of 20

SQL Injection: UNION-based extraction

Exfiltrate the entire database one column at a time with UNION SELECT. Discover the column count, align the types, dump whole tables.

Intermediate18 min

SQLiUNIONData Exfil

The login bypass from Lesson 1 only proved you could get into the building. UNION-based injection is when you walk out with the filing cabinets. The UNION operator stitches two SELECT statements together - if the original query returns four columns, an attacker can append a second query that returns whatever they want, and the application dutifully renders the combined result.

What you'll be able to do

Discover the column count of a target query with ORDER BY.
Construct a UNION SELECT payload with matching columns and types.
Toggle the same payload between vulnerable and parameterised code to see the difference.
Identify the precondition: the page must echo data back.

Key terms

UNION: A SQL operator that combines the result sets of two SELECT statements, provided they have the same number of columns with compatible types.
Column count discovery: The technique of incrementing ORDER BY n until an error appears, which reveals the number of columns the original query returns.
information_schema: A metadata catalog present in MySQL and PostgreSQL that lists every table and column in every database on the server.

What is it?

Two queries, one response

A profile page might look up a user by id with SELECT id, name, email, role FROM users WHERE id = {id}. The original query returns four columns. Replace the id with 0 UNION SELECT username, password, role, email FROM users and the page will dump every username, password, role, and email from the database, one row at a time. With the right column count, the right data types, and the right column ordering, the entire database - users, sessions, payment data, secrets - is one HTTP request away.

The trade-off is loud. The application needs to actually display the data for UNION to be useful. If the page swallows errors and renders nothing, you move to blind techniques (Lesson 8). If the driver blocks stacked queries, or if the application rejects multi-row responses, the attacker falls back to blind or out-of-band exfiltration. UNION is the easy case - when it works, it works fast.

Stitching two queries together

Try it

Exfiltrate a fake user table

The sandbox simulates a user lookup endpoint. Type a number to fetch a single user. Then try 0 UNION SELECT username, password, role, email FROM users. The toggle in the corner switches the underlying code from string concatenation to a parameterised query - watch the same payload fail when the fix is in place.

stagingexport.staging.acme.io/users

data-export-console

User lookup

SELECT id, name, email, role FROM users WHERE id = ''

Real-world relevance

The mass-exfil playbook

Most public SQLi disclosures are UNION-based. A single parameter - a search box, a product filter, a profile id - yields the entire schema. Automated tools like sqlmap can map a database in under a minute once they find a working injection point, dumping every table, every row, every column, one UNION at a time. When a breach announcement says “we believe the attacker may have accessed” some subset of records, the union-based variety is usually what happened.

The 2019 disclosure of a stored XSS on a legacy stats subdomain of a major game platform is illustrative: the XSS planted in the stats page (which was reachable from a separate auth subdomain) chained with a SQLi on the same legacy host to hijack OAuth tokens via an SSO redirect. The endpoint was supposed to return a single boolean. With the right payload, the attacker pivoted through the legacy stack and into the live session database. Patched within a week, but the architecture was the lesson: legacy hosts are the foothold, the modern stack is the target.

Mitigation

Parameterise - and stop displaying raw errors

Parameterised queries remain the primary defense. But UNION attacks have a second precondition: the application must echo the query result back to the user. Defenses in depth include rejecting the request when input does not match the expected shape (integers that are not integers, enums that are not in the whitelist), enforcing least privilege on the database user, and disabling verbose error messages in production.

javascriptparameterised

// VULNERABLE
app.get('/user/:id', (req, res) => {
  const q = "SELECT id, name, email, role FROM users WHERE id = '" + req.params.id + "'";
  db.query(q).then(rows => res.json(rows));
});

// SAFE: parse + bind
app.get('/user/:id', (req, res) => {
  const id = Number.parseInt(req.params.id, 10);
  if (!Number.isInteger(id) || id < 1) return res.status(400).end();
  db.query('SELECT id, name, email, role FROM users WHERE id = $1', [id]).then(rows => res.json(rows));
});

Notice the second example does two things: it parses the input as an integer before binding it, and it rejects the request with HTTP 400 if the parse fails. The bind is the primary defense; the parse is the second layer.

What to remember

UNION SELECT turns a one-row response into an N-row response - the entire database if needed.
The column count and types in the injected query must match the original SELECT.
Parameterised queries defeat UNION entirely, because the input is never interpreted as SQL.
Whitelist input shape (integer, enum, slug) before binding. Defense in depth.
Disable verbose database errors in production - they leak schema information even when the query fails.

Knowledge check

0/2 answered · 0 correct

1.You are testing a search box that produces no visible error and always returns 200 OK. Which attack family is most likely to work?
2.You send 1 ORDER BY 5-- and the page 500s. You send 1 ORDER BY 4-- and the page 200s. What does this tell you?