# URL canonicalization and normalization guide

This document defines the standard flow for query-string canonicalization in page/controllers.
Use it for all new route work and when touching existing page logic.

## Why this exists

Canonical URLs make route behavior predictable and secure by:
- removing unknown query parameters,
- normalizing known parameters to expected types,
- preventing duplicate URL variants for the same page state,
- reducing controller-specific ad-hoc redirect logic.

## Shared helper

All route canonicalization must use:
- `app/helpers/url_canonicalizer.php`

Core functions:
- `app_url_build_query_from_policy(array $sourceQuery, array $policy): array`
- `app_url_redirect_to_canonical_query(string $appRoot, array $currentQuery, array $canonicalQuery): void`
- `app_url_build_internal(string $appRoot, array $query): string`
- `app_url_policy_value(string $targetKey, array $rule, array $sourceQuery)`

## Standard controller flow

For GET routes, follow this order:

1. Resolve request context (`app_root`, user/session state, etc.).
2. Resolve a defensive GET guard (`$isGetRequest`) from `$_SERVER['REQUEST_METHOD']`.
3. Define canonical policy rules for the route.
4. Build canonical query from `$_GET`.
5. Redirect if current query differs from canonical query.
6. Continue regular page logic (rendering, DB loading, etc.).

Reference pattern:

```php
require_once APP_PATH . 'helpers/url_canonicalizer.php';

$isGetRequest = strtoupper((string)($_SERVER['REQUEST_METHOD'] ?? 'GET')) === 'GET';
if ($isGetRequest) {
    $canonicalPolicy = [
        'page' => [
            'type' => 'literal',
            'value' => 'example',
        ],
    ];

    $canonicalQuery = app_url_build_query_from_policy($_GET, $canonicalPolicy);

    // Keep example URLs constrained to supported route state.
    app_url_redirect_to_canonical_query((string)$app_root, $_GET, $canonicalQuery);
}
```

## Policy rule types

Supported rule `type` values:
- `literal`: fixed value from policy (`value`)
- `string`: trimmed scalar string
- `int`: integer with optional bounds (`min`, `max`)
- `enum`: string limited to `allowed` values
- `bool_flag`: emits `value_true` for truthy request inputs
- `string_list`: normalized list values (optionally `unique`)

Useful options:
- `source`: map canonical key from another source key
- `default`: fallback value
- `include_if`: callable gate to include rule conditionally
- `omit_if`: drop key when value equals sentinel
- `transform`: callable value transformer
- `validator`: callable final validator
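As a rough illustration of how these options might combine, the sketch below defines a policy for a hypothetical search route. The key names (`q`, `query`, `archived`, `tags`), the literal values, and the callable signatures are assumptions made for this example; check `app/helpers/url_canonicalizer.php` for the exact option semantics before relying on them.

```php
<?php
declare(strict_types=1);

// Hypothetical search route. All key names and values are illustrative only.
$canonicalPolicy = [
    'page' => ['type' => 'literal', 'value' => 'search'],

    // Read the legacy `query` parameter but emit it under the canonical key `q`.
    'q' => [
        'type' => 'string',
        'source' => 'query',
        // Assumption: the helper passes the current value as the first argument.
        'transform' => static fn($value = null): string => strtolower((string)$value),
        'validator' => static fn($value = null): bool => strlen((string)$value) <= 64,
    ],

    // Emit `archived=1` only when the request carries a truthy flag.
    'archived' => ['type' => 'bool_flag', 'value_true' => '1'],

    // Normalize a list of tags and drop duplicates.
    'tags' => ['type' => 'string_list', 'unique' => true],
];
```

The closures declare their parameter as optional so they stay callable regardless of how many arguments the helper actually passes.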

## Route design rules

When adding canonicalization:
- Always include `page` as `literal`.
- Keep allowed query set minimal.
- Use `enum` for fixed states (`tab`, `action`, `status`, etc.).
- Use `int` with bounds for IDs and pagination.
- Use `omit_if` to avoid noisy defaults in URLs (for example `p=1`).
- Preserve only query keys that materially represent page state.
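Put together, the rules above might produce a policy like the following for a hypothetical paginated `orders` list. The route name, the `status` values, and the upper page bound of 500 are illustrative assumptions, not values taken from the codebase.

```php
<?php
declare(strict_types=1);

// Hypothetical list route: only `page`, `status`, and `p` represent page state.
$canonicalPolicy = [
    // The route identifier is always a fixed literal.
    'page' => ['type' => 'literal', 'value' => 'orders'],

    // Fixed states go through `enum` so arbitrary strings never reach the URL.
    'status' => ['type' => 'enum', 'allowed' => ['open', 'closed']],

    // Bounded pagination; the first page is the default, so it is omitted.
    'p' => ['type' => 'int', 'min' => 1, 'max' => 500, 'omit_if' => 1],
];
```

Assuming the helper behaves as described in this guide, a request such as `?page=orders&p=1&utm_source=mail` would be expected to redirect to `?page=orders`: `utm_source` is not in the policy, and `p=1` matches the `omit_if` sentinel.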

## What not to canonicalize as page URLs

Do not force page-style canonicalization on non-page endpoints that intentionally behave as API/callback streams, for example:
- JSON suggestion endpoints,
- payment webhook/callback handlers,
- binary/document output handlers,
- static asset streaming handlers.

For these endpoints, keep strict input validation and explicit allowlists as currently implemented.
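For contrast with the redirect-based page flow, here is a minimal sketch of what strict allowlist validation could look like on a JSON suggestion endpoint. The function name, the parameter names (`q`, `limit`), and the limits are hypothetical; the existing endpoints may validate differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical validator for a JSON suggestion endpoint.
 * Returns the validated input, or null when the request should be rejected.
 */
function sketch_validate_suggest_query(array $query): ?array
{
    $allowedKeys = ['q', 'limit'];

    // Reject unknown keys outright instead of redirecting to a canonical URL.
    if (array_diff(array_keys($query), $allowedKeys) !== []) {
        return null;
    }

    // The search term must be a non-empty string of at most 64 bytes.
    $term = $query['q'] ?? null;
    if (!is_string($term)) {
        return null;
    }
    $term = trim($term);
    if ($term === '' || strlen($term) > 64) {
        return null;
    }

    // The limit is optional, defaults to 10, and must be an integer in 1..50.
    $limit = filter_var(
        $query['limit'] ?? 10,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1, 'max_range' => 50]]
    );
    if ($limit === false) {
        return null;
    }

    return ['q' => $term, 'limit' => $limit];
}
```

A caller would typically answer a `null` result with an error response (for example HTTP 400) rather than a redirect, since API clients do not benefit from canonical URLs.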

## Redirect behavior

`app_url_redirect_to_canonical_query` compares normalized current and canonical queries.
If different, it sends a `Location` header and exits.

Implications:
- Logic after the call runs only for canonical request URLs.
- Downstream code may continue reading `$_GET`; by that point the redirect gate guarantees the values are canonical.
- If custom redirect URL construction is needed after POST actions, use `app_url_build_internal` with a policy-built query.
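To make the comparison step concrete, here is a minimal sketch of what "compares normalized queries" could mean. `sketch_normalize_query` and `sketch_needs_redirect` are hypothetical stand-ins written for this guide, not the helper's actual code, and the normalization rules (string casting, key sorting, flat list values) are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical normalization: cast values to strings and sort by key so that
 * parameter order and int-vs-string differences do not count as a mismatch.
 */
function sketch_normalize_query(array $query): array
{
    $normalized = [];
    foreach ($query as $key => $value) {
        // Assumption: list values are flat and compared element by element.
        $normalized[(string)$key] = is_array($value)
            ? array_map('strval', array_values($value))
            : (string)$value;
    }
    ksort($normalized, SORT_STRING);

    return $normalized;
}

/** Returns true when the current query is not already in canonical form. */
function sketch_needs_redirect(array $currentQuery, array $canonicalQuery): bool
{
    return sketch_normalize_query($currentQuery) !== sketch_normalize_query($canonicalQuery);
}
```

Under these assumptions, `['p' => '2', 'page' => 'orders']` and `['page' => 'orders', 'p' => 2]` count as the same query, while any extra or missing key triggers a redirect.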

## Update checklist for new/edited routes

When changing a route:
1. Add/confirm `require_once` for `url_canonicalizer.php`.
2. Use the standardized defensive guard:
   `$isGetRequest = strtoupper((string)($_SERVER['REQUEST_METHOD'] ?? 'GET')) === 'GET';`
3. Add/adjust GET canonical policy near route entry.
4. Keep existing business logic unchanged unless explicitly requested.
5. Add a concise inline comment for non-trivial policy/condition blocks.
6. Update deployment-facing route documentation used in your environment.
7. Run syntax checks and PHPUnit as part of the validation cadence.

## Deployment notes

Coverage is deployment-scoped.

When auditing a specific environment:
- verify enabled route entry points use policy-based canonicalization,
- keep non-page API/callback/document/asset endpoints on strict allowlist
  validation,
- keep local operational/developer documentation updated according to the
  documentation set available in that installation.