WIP: Add support for running raw SQL files #729

Open: wants to merge 3 commits into base: main


rolandboon (Contributor) commented Nov 24, 2020

Work in progress on this issue: #422

Is this the direction you're looking for?

While setting this up I had some questions/concerns:

  • On which routes do we want to serve these SQL files?
  • How do we distinguish between GET, POST, PUT etc.? Maybe some sort of comment/annotation on the top of the SQL file? Or do we introduce a PHP SQL parser dependency?
  • Do we need a root JSON key for the result? Similar to records?
  • Do we always serialize the result to an array? Or do we want to annotate SQL files for "single" record responses and serialize the first result into an object?
  • Generating an OpenAPI spec for these endpoints is a bit hard. Do we want this? Maybe it's possible with a SQL parser package.
  • I haven't read the code of all the middlewares, but I guess middlewares such as Authorization, MultiTenancy, PageLimits and Validation are pretty much tied to database tables, so those won't be usable on these "views"
  • I discovered actual views can also be queried through this package. What is the exact "business case" of this feature? To support complex insert/updates?

Todos on this PR:

  • Accept/handle input arguments
  • Tests
  • Readme
  • Additional routes
  • JSON root key?
  • OpenAPI spec generation?

mevdschee (Owner)

This is so awesome! Thank you. Let me try to answer your questions/concerns:

  • On which routes do we want to serve these SQL files?

I like the /procedures endpoint.

  • How do we distinguish between GET, POST, PUT etc.? Maybe some sort of comment/annotation on the top of the SQL file? Or do we introduce a PHP SQL parser dependency?

Other projects (like pREST) do it in the filename/path.

  • Do we need a root JSON key for the result? Similar to records?

I think we need to model the response after the results returned by the database (driver). I'm not sure we need a root key, maybe an array of result sets will do.

  • Do we always serialize the result to an array? Or do we want to annotate SQL files for "single" record responses and serialize the first result into an object?

I think we may need to iterate the result sets of the database driver, not making any assumptions about the contents of each result set.
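A minimal sketch of what iterating the driver's result sets could look like, assuming the procedure is executed through a `PDOStatement`. `fetchAllRowsets()` is a hypothetical helper, not part of php-crud-api. It returns a plain array of result sets and makes no assumptions about their columns. Not every PDO driver supports `nextRowset()` (the SQLite driver does not), so the sketch stops after the first set when the driver raises an error.

```php
<?php
// Hypothetical helper: collect every result set a statement produces,
// without assuming anything about the shape of each set.
function fetchAllRowsets(PDOStatement $stmt): array
{
    $rowsets = [];
    do {
        // Statements such as INSERT/UPDATE produce no columns; skip those sets.
        if ($stmt->columnCount() > 0) {
            $rowsets[] = $stmt->fetchAll(PDO::FETCH_ASSOC);
        }
        try {
            $hasMore = $stmt->nextRowset();
        } catch (PDOException $e) {
            // Drivers without multiple-rowset support (e.g. SQLite) end up here.
            $hasMore = false;
        }
    } while ($hasMore);
    return $rowsets;
}
```

The controller could then serialize this array as-is, which gives the "array of result sets" response shape without a root key.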

  • Generating an OpenAPI spec for these endpoints is a bit hard. Do we want this? Maybe it's possible with a SQL parser package.

We may need some way of specifying input parameters, not sure how though.

  • I haven't read the code of all the middlewares, but I guess middlewares such as Authorization, MultiTenancy, PageLimits and Validation are pretty much tied to database tables, so those won't be usable on these "views"

I think the Authorization middleware can be mapped, the others may not be usable.

  • I discovered actual views can also be queried through this package. What is the exact "business case" of this feature? To support complex insert/updates?

Mainly to support an API to whatever is in your database.

rolandboon (Contributor, Author)

Quick update:

I like the pREST approach for handling HTTP verbs, so I replicated that 👍
https://docs.postgres.rest/executing-sql-scripts/#scripts-templates-rules
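For reference, here is a sketch of a pREST-style mapping from HTTP method to script suffix. The suffixes (`read`, `write`, `update`, `delete`) follow pREST's script template rules as we understand them from the page linked above. `procedureFileSuffix()` is a hypothetical name and does not come from this PR.

```php
<?php
// Hypothetical helper mirroring pREST's script naming rules (as we understand
// them): the file extension encodes which HTTP method may run the script.
function procedureFileSuffix(string $method): ?string
{
    switch (strtoupper($method)) {
        case 'GET':
            return 'read';
        case 'POST':
            return 'write';
        case 'PUT':
        case 'PATCH':
            return 'update';
        case 'DELETE':
            return 'delete';
        default:
            return null; // method not mapped to any script
    }
}
```

Under this convention, `GET /procedures/example` would run `example.read.sql`, and a method without a mapping would be rejected before any file is touched.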

And I've added input argument handling in two ways:

  1. Query and request body params are bound to the PDO statement if the SQL string contains these parameter identifiers. This is the preferred approach for passing user input into the query, because PDO handles the bound values safely and prevents SQL injection:
    https://github.com/rolandboon/php-crud-api/blob/a43ae835e897c4dc17b4217839e7845ddeff838a/src/Tqdev/PhpCrudApi/Database/GenericDB.php#L327

  2. The SQL string is parsed as PHP, with all input arguments extracted into variables that the template can use. It's a bit experimental and I'm not 100% sure we should keep this, but it offers far more powerful templating features. It's inspired by pREST's template functions. An alternative (safer) solution would be to include a simple template parser.
    https://github.com/rolandboon/php-crud-api/blob/a43ae835e897c4dc17b4217839e7845ddeff838a/src/Tqdev/PhpCrudApi/Procedure/ProcedureService.php#L26
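A minimal sketch of approach 1, assuming the input arrives as a flat name => value array. It binds only the names that occur as `:name` placeholders in the SQL, because PDO rejects a parameter array that contains names the statement does not use. `boundParams()` is hypothetical and is not the code in the linked commit.

```php
<?php
// Hypothetical helper: keep only the input parameters whose named placeholder
// (e.g. ":id") actually occurs in the SQL, so PDO never sees unknown names.
// Note: this simple regex also matches inside quoted string literals.
function boundParams(string $sql, array $input): array
{
    // Match ":name" but not PostgreSQL casts such as "x::int".
    preg_match_all('/(?<!:):([A-Za-z_][A-Za-z0-9_]*)/', $sql, $matches);
    $bound = [];
    foreach (array_unique($matches[1]) as $name) {
        if (array_key_exists($name, $input)) {
            $bound[':' . $name] = $input[$name];
        }
    }
    return $bound;
}

// Usage with a PDO connection ($pdo, $queryParams and $bodyParams are assumed):
// $stmt = $pdo->prepare($sql);
// $stmt->execute(boundParams($sql, array_merge($queryParams, $bodyParams)));
```

A placeholder with no matching input is deliberately left unbound, so `execute()` fails instead of silently running with a missing value.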

mevdschee (Owner)

I will review your parsing/template options shortly.

@@ -22,6 +22,7 @@ class Config
'debug' => false,
'basePath' => '',
'openApiBase' => '{"info":{"title":"PHP-CRUD-API","version":"1.0.0"}}',
'proceduresDir' => './procedures/'

mevdschee (Owner), Nov 28, 2020

Could this be 'procedurePath' as key and 'procedures' as the value?

rolandboon (Contributor, Author)

I've refactored this.


class ProcedureService {
private $db;
private $baseDir;

mevdschee (Owner)

I would not rename procedureDir/procedurePath here to baseDir.

rolandboon (Contributor, Author)

I've refactored this.

{
$file = RequestUtils::getPathSegment($request, 2);
$operation = RequestUtils::getOperation($request);
$queryParams = RequestUtils::getParams($request, false);

mevdschee (Owner)

Shouldn't we also (or primarily) support path parameters? The body parameters may be reserved for parameters of a table type (for multiple inserts, for instance).

rolandboon (Contributor, Author)

To support path parameters we'd need to extend the SQL file with some annotations, so users can describe which path parameters are required for the file.

Annotations would make this feature a bit more "robust": we could use them to describe all user input and sanitize it before plugging it into the query. However, they would also make this custom SQL feature more complex to use, since users would need to learn whatever annotation format we come up with before they can start.

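private function parseSqlTemplate(string $path, array $context) {
    ob_start();
    // EXTR_SKIP stops user-supplied keys such as "path" from overwriting $path before the include
    extract($context, EXTR_SKIP);
    include($path);
    return ob_get_clean();
}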

mevdschee (Owner), Nov 28, 2020

Maybe we need to split the file into a head with some meta information and a body containing the SQL query. The include is a risk, as the $path variable should not contain (unchecked) user input, to avoid path traversal.
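One way to read the head/body suggestion, sketched under an assumed file format: leading `-- @key value` comment lines form the head, and the rest is the SQL body. Both the format and `splitProcedureFile()` are hypothetical. Because the file is read as text (for example with `file_get_contents()`) and never `include`d, it is never executed as PHP.

```php
<?php
// Hypothetical file format: leading "-- @key value" comment lines form the head
// (meta information); everything from the first other line on is the SQL body.
function splitProcedureFile(string $contents): array
{
    $meta = [];
    $lines = preg_split('/\R/', $contents);
    $i = 0;
    for (; $i < count($lines); $i++) {
        if (!preg_match('/^--\s*@(\w+)\s+(.*)$/', $lines[$i], $m)) {
            break; // first non-meta line starts the body
        }
        // Repeated keys (e.g. several @param lines) accumulate into a list.
        $meta[$m[1]][] = trim($m[2]);
    }
    $body = trim(implode("\n", array_slice($lines, $i)));
    return ['meta' => $meta, 'body' => $body];
}
```

Since the head uses SQL comment syntax, the whole file also remains valid SQL that can be run directly in a database client.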

rolandboon (Contributor, Author)

I don't think path traversal is an issue at this point, because the path is limited to a single URL path segment. The value of $path at this location is defined as follows:

$path = './' . $config->getProcedurePath() . '/' . RequestUtils::getPathSegment($request, 2) . '.' . $operation . '.sql';

getPathSegment() blocks path traversal.
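The check can also be made independent of getPathSegment()'s behaviour with a defensive resolver. This minimal sketch assumes procedure names use only letters, digits, `_` and `-`, and that operation names are purely alphabetic. `resolveProcedureFile()` is hypothetical.

```php
<?php
// Hypothetical resolver: turn a URL path segment and operation into a file
// inside the procedures directory, refusing anything that could escape it.
function resolveProcedureFile(string $dir, string $name, string $operation): ?string
{
    // Allow only plain names: no slashes, dots or other special characters.
    if (!preg_match('/^[A-Za-z0-9_-]+$/', $name) || !preg_match('/^[A-Za-z]+$/', $operation)) {
        return null;
    }
    $base = realpath($dir);
    $file = realpath($dir . DIRECTORY_SEPARATOR . $name . '.' . $operation . '.sql');
    // realpath() returns false for missing files; the prefix check also
    // rejects symlinks that resolve outside the procedures directory.
    if ($base === false || $file === false || strpos($file, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return $file;
}
```

With both checks in place, a later change to how the segment is extracted cannot reopen the traversal issue.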

However, some meta info would allow us to handle user input and server responses better. We could define the files like this:

./procedures/example.GET.php

<?php
$procedure = [
	/*
	 * Define user input from the path.
	 * In this example the route will be GET /procedures/example/:id
	 * The value of :id will be available as :id in the PDO statement
	 */
	'path' => ['id'],

	/*
	 * Define user input from the query string.
	 * Route will be /procedures/example?foo=hello&bar=world
	 * The values will be available as :foo and :bar in the PDO statement
	 */
	'query' => ['foo', 'bar'],

	/*
	 * Define user input from the request body
	 */
	'body' => ['buz'],

	'statement' => '
		SELECT p.id, p.content, c.name
		FROM posts p
		INNER JOIN categories c ON c.id = p.category_id
		WHERE p.id = :id AND c.name = :foo
	',
];
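A sketch of how a controller might consume a definition like the `$procedure` array above. It collects each declared input from its source and reports the names that were not supplied. The positional mapping of path segments and the helper `collectProcedureParams()` are assumptions for illustration. A declared name that the statement does not use (here `:bar`) would still need to be filtered out before `execute()`, because PDO rejects parameters the statement does not define.

```php
<?php
// Hypothetical helper: gather the inputs a $procedure definition declares from
// the path segments, query string and body of a request, keyed for PDO.
function collectProcedureParams(array $procedure, array $pathSegments, array $query, array $body): array
{
    // Path values are positional: the first declared name takes the first segment.
    $pathValues = [];
    foreach ($procedure['path'] ?? [] as $i => $name) {
        if (isset($pathSegments[$i])) {
            $pathValues[$name] = $pathSegments[$i];
        }
    }
    $params = [];
    $missing = [];
    $sources = ['path' => $pathValues, 'query' => $query, 'body' => $body];
    foreach ($sources as $kind => $values) {
        foreach ($procedure[$kind] ?? [] as $name) {
            if (array_key_exists($name, $values)) {
                $params[':' . $name] = $values[$name];
            } else {
                $missing[] = $name; // declared but not supplied
            }
        }
    }
    return [$params, $missing];
}
```

For `GET /procedures/example/42?foo=hello`, this yields `:id` and `:foo` and reports `bar` and `buz` as missing. The controller can then decide whether to respond with a 4xx error.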
@@ -19,12 +19,14 @@ public static function getHeader(ServerRequestInterface $request, string $header
return isset($headers[0]) ? $headers[0] : '';
}

public static function getParams(ServerRequestInterface $request): array
public static function getParams(ServerRequestInterface $request, bool $forceArray = true): array

mevdschee (Owner), Nov 28, 2020

I don't like this addition here. I feel it could also be solved in the controller by getting the first/last element of all the entries returned.
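A sketch of the controller-side alternative. It assumes, as the `$forceArray` flag suggests, that `getParams()` returns every value as a list, so `?id=1&id=2` arrives as `id => ['1', '2']`. `lastParamValues()` is hypothetical.

```php
<?php
// Hypothetical controller-side helper: reduce each list of values to its last
// entry, so "?id=1&id=2" yields id => "2" (use reset() instead for the first).
function lastParamValues(array $params): array
{
    $flat = [];
    foreach ($params as $name => $values) {
        if (is_array($values)) {
            if (count($values) > 0) {
                $flat[$name] = end($values);
            }
        } else {
            $flat[$name] = $values;
        }
    }
    return $flat;
}
```

This keeps `RequestUtils::getParams()` unchanged and confines the single-value behaviour to the procedures controller.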

rolandboon (Contributor, Author)

I've refactored this.

@@ -63,6 +65,17 @@ public static function getOperation(ServerRequestInterface $request): string
case 'PATCH':
return 'increment';
}
case 'procedures':

mevdschee (Owner), Nov 28, 2020

Can't we merge this with the 'records' way of determining the operation? Or we could set the operation to the HTTP verb in the case of procedures. We could also support /procedures/{table_name}/{verb}.sql, where {table_name} is a name you give this group of stored procedures (it could be related to a table).

rolandboon (Contributor, Author)

Verb or 'records' are fine with me. If we go for 'records', how do we distinguish between read and list?
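One possible answer to the read/list question is to let an optional id segment decide. As far as we can tell, this is how php-crud-api's `records` routes already separate the two. This is a sketch only: `procedureOperation()` is hypothetical, and the other cases copy the records mapping, including PATCH as `increment`, which is visible in the diff above.

```php
<?php
// Hypothetical sketch: derive a records-style operation for a procedure
// route such as /procedures/{name} or /procedures/{name}/{id}.
function procedureOperation(string $method, ?string $idSegment): ?string
{
    switch (strtoupper($method)) {
        case 'GET':
            // The presence of an id segment separates "read" from "list".
            return ($idSegment !== null && $idSegment !== '') ? 'read' : 'list';
        case 'POST':
            return 'create';
        case 'PUT':
            return 'update';
        case 'DELETE':
            return 'delete';
        case 'PATCH':
            return 'increment';
        default:
            return null;
    }
}
```

A procedure could then ship as `example.list.sql` and `example.read.sql`, with the id bound as a parameter in the latter.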

SELECT p.id, p.content, c.name
FROM posts p
INNER JOIN categories c ON c.id = p.category_id
WHERE p.id = <?= $id ?>

mevdschee (Owner), Nov 28, 2020

This is a serious security issue: user input is concatenated into the SQL string. Judging from your earlier comments and the different approach above, it seems that you are already aware that this is not the way to go.

rolandboon (Contributor, Author)

Yeah, of course this is a particularly bad example. But this feature (ob_start, include and ob_get_clean instead of file_get_contents) does give a lot of flexibility, for example:

INSERT INTO users SET username = :username, password = "<?= password_hash($password, PASSWORD_DEFAULT) ?>"

Or

INSERT INTO logs SET foo = :foo, user_ip = "<?= $_SERVER['REMOTE_ADDR'] ?>"
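Derived values like these can also be bound as parameters, so the SQL text never contains interpolated values. Below is a minimal sketch of the first example rewritten this way. The constant `INSERT_USER_SQL` and the helper `userInsertParams()` are hypothetical.

```php
<?php
// SQL stays static; every value, including derived ones, is a placeholder.
const INSERT_USER_SQL = 'INSERT INTO users SET username = :username, password = :password';

// Hypothetical helper: build the bind array, hashing the password in PHP
// instead of echoing password_hash() into the SQL string.
function userInsertParams(array $input): array
{
    return [
        ':username' => (string) ($input['username'] ?? ''),
        ':password' => password_hash((string) ($input['password'] ?? ''), PASSWORD_DEFAULT),
    ];
}

// Usage ($pdo and $bodyParams are assumed to exist):
// $stmt = $pdo->prepare(INSERT_USER_SQL);
// $stmt->execute(userInsertParams($bodyParams));
```

The logs example works the same way: `$_SERVER['REMOTE_ADDR']` is bound to a `:user_ip` placeholder instead of being echoed into the string.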

mevdschee (Owner) left a comment

These changes are a great start, let's continue to make this functionality even better! Thank you.

@mevdschee mevdschee self-assigned this Nov 28, 2020
@mevdschee mevdschee mentioned this pull request Dec 11, 2020
2 participants