@joshiggins

This PR makes pyrqlite use the params API instead of client-side string substitution.

It fixes an issue where placeholder characters could not appear inside string literals in the SQL statement.

It also potentially improves security by allowing the server to prepare and bind the parameters.

These changes should not break existing use cases for pyrqlite.
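
As a rough illustration of the difference, here is a minimal sketch. The request body shape follows rqlite's documented parameterized-statement format (a JSON array of `[sql, param, ...]` arrays). The helper names `substitute_client_side` and `build_params_request` and the `notes` table are hypothetical and are not pyrqlite's actual code.

```python
import json


def substitute_client_side(sql, params):
    # Hypothetical naive substitution: fills '?' left to right, even inside literals.
    for value in params:
        sql = sql.replace("?", repr(value), 1)
    return sql


def build_params_request(sql, params):
    # Request body in rqlite's parameterized form: [[sql, param1, param2, ...]].
    return json.dumps([[sql, *params]])


sql = "INSERT INTO notes(body, author) VALUES('why?', ?)"

# The '?' inside the literal is consumed first, corrupting the statement.
broken = substitute_client_side(sql, ["fiona"])

# With the params API the SQL text is sent untouched and the value travels separately.
body = build_params_request(sql, ["fiona"])
```

With the params approach the client never has to decide which `?` is a real placeholder; that decision is left to SQLite on the server.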

Copilot AI review requested due to automatic review settings August 4, 2025 12:24

Copilot AI left a comment

Pull Request Overview

This PR migrates pyrqlite from client-side string substitution to using the rqlite params API for parameter binding. This change improves security by allowing server-side parameter preparation and binding, and fixes an issue where placeholder characters could not appear inside string literals.

  • Refactored parameter handling to use JSON-based parameter passing instead of string substitution
  • Updated adapter functions to prepare Python values for JSON serialization rather than string formatting
  • Enhanced parameter validation with proper string literal parsing to avoid false placeholder matches

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/test/test_dbapi.py | Removes expected failure decorator and adds iterator methods to test parameter sequence handling |
| src/pyrqlite/extensions.py | Updates adapters to prepare values for JSON serialization instead of string formatting |
| src/pyrqlite/cursors.py | Replaces string substitution with params API, adds string literal parsing and parameter validation |

Comment on lines 292 to 293
assert x == 0
if x >= self.__len__():

Copilot AI Aug 4, 2025

This bounds check will never trigger because it's placed after assert x == 0. The assertion will fail for any x >= 1, making the IndexError check unreachable. Consider removing the assertion or restructuring the logic.

Suggested change
- assert x == 0
- if x >= self.__len__():
+ if x < 0 or x >= self.__len__():

Author

Removed redundant bounds check

@otoolep
Member

otoolep commented Aug 7, 2025

Hi @joshiggins -- is this ready for review?

cc @zmedico

@joshiggins
Author

Yes, ready for review, thanks.

Member

@otoolep otoolep left a comment

Is there a unit test you can add that would fail before your change, but now passes?


def __getitem__(self, x):
assert x == 0
assert x == 0
Member

Trailing whitespace -- please remove.

@joshiggins
Author

@otoolep I've added 2 tests to check that qmark and colon characters appearing inside string literals in the SQL statement are not interpreted as parameter placeholders.

Failure message before this change:

FAILED src/test/test_dbapi.py::CursorTests::test_CheckExecuteWithColonInString - sqlite3.ProgrammingError: parameter required but not given: create table testc(id integer primary key, name text ...
FAILED src/test/test_dbapi.py::CursorTests::test_CheckExecuteWithQmarkInString - sqlite3.ProgrammingError: parameter required but not given: create table testq(id integer primary key, name text ...
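
For context, the kind of check these tests exercise can be sketched as below. This is only a minimal illustration, not the parsing code in this PR: the function name `count_qmark_placeholders` is hypothetical, and it only understands single-quoted literals (double-quoted identifiers and SQL comments are ignored).

```python
def count_qmark_placeholders(sql):
    """Count '?' characters that fall outside single-quoted SQL string literals."""
    count = 0
    in_string = False
    for ch in sql:
        if ch == "'":
            # A doubled quote ('') toggles twice, so the state ends up unchanged.
            in_string = not in_string
        elif ch == "?" and not in_string:
            count += 1
    return count


# A '?' inside a literal is not a placeholder; one outside is.
literal_only = count_qmark_placeholders("insert into testq(name) values ('a?b')")
mixed = count_qmark_placeholders("select * from t where a = ? and b = 'it''s ?'")
```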

Also I enabled the test test_CheckExecuteArgStringWithZeroByte which was an expected failure before but is passing now.

It looks like the zero byte was previously treated somewhere along the line as a string terminator, so rqlite ended up receiving a truncated SQL statement. When the value is a bound parameter, this test passes because all 5 characters (one of them being the zero byte) are stored and returned as expected.
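
A small sketch of why the bound-parameter path is safe for this case: JSON escapes the zero byte instead of embedding it raw. The value `"Hu\x00go"` and the `test` table are assumptions made for illustration, chosen to match the 5-character description and the truncated `'Hu` token in the error below.

```python
import json

value = "Hu\x00go"  # assumed test value: 5 characters, one of them NUL

# Build a parameterized request body; the NUL is escaped as \u0000 in the JSON text.
body = json.dumps([["insert into test(name) values (?)", value]])

# Decoding the body gives back the original 5-character string, NUL included.
decoded = json.loads(body)[0][1]
```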

Failure message before this change:

FAILED src/test/test_dbapi.py::CursorTests::test_CheckExecuteArgStringWithZeroByte - sqlite3.Error: {"error": "unrecognized token: \"'Hu\""}

Finally I enabled test_CheckUnsupportedDict as there are other tests for named params support and this one seems good to have.

@otoolep
Member

otoolep commented Aug 14, 2025

Just so we're clear, can you tell me what you mean by "client side substitution"? Are you saying this library would rewrite, say, ? with actual values? I have not studied this library closely, as it was written by others. If so, this was probably written before rqlite supported proper parameterized queries.

@otoolep
Member

otoolep commented Aug 14, 2025

rqlite API and parameters: https://rqlite.io/docs/api/api/#parameterized-statements

Surely there is no need to manipulate SQL statement strings. It's up to users to get them right, no?

@joshiggins
Author

> Are you saying this library would rewrite, say, ? with actual values?

Yes, exactly. The original method is here:

def _substitute_params(self, operation, parameters):
    '''
    SQLite natively supports only the types TEXT, INTEGER, REAL, BLOB and
    NULL
This function removes string literals from the SQL operation so we
Member

@joshiggins -- this is what I don't understand then. Why modify the SQL string at all? Should it not be supplied as-is in the SQL string's slot in the Params API? The SQLite code will take care of it.

Member

Oh, I see, I had the wrong mental model for this change (I'm not super familiar with this library). This is about breaking apart the SQL entered by the client library so the API call to rqlite can be made correctly. Let me take a look now.

@otoolep
Member

otoolep commented Aug 21, 2025

@joshiggins -- I've got a question for you. This change appears to be checking the parameter count within the SQL against the parameters supplied by the user. Why do that? Why not just push the stuff into the rqlite API as is, and let it return errors?

What I mean is, look at the Python sqlite3 (DB-API 2.0) docs, for example:

https://docs.python.org/3/library/sqlite3.html#sqlite3.Cursor.execute

If I were to code a client library, I would just take the SQL string, shove that in the HTTP API request to rqlite, and then look at the type of the parameters. If it's a list, form the HTTP API request one way; if it's a dict, form the HTTP API request another way.

This library was probably like this already, but I don't see any point in checking that the user has supplied the right number of parameters for the number of '?' placeholders or named params (as a dictionary) in the SQL query. rqlite will check all that and return an error to the user if there is one. Is there a good reason to also do the checking in a client library like this one? It could be brittle. By simply focusing on building the HTTP API request and sending it to rqlite, rqlite will do the checking for you in a bullet-proof manner (since rqlite in turn will use its copy of SQLite to check).
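
The dispatch-on-type idea described here could look roughly like the sketch below. It assumes rqlite's documented request shapes (`[sql, p1, p2, ...]` for positional parameters and `[sql, {name: value}]` for named ones); the function name `build_statement` is hypothetical and is not part of pyrqlite.

```python
import json
from collections.abc import Mapping, Sequence


def build_statement(sql, parameters):
    """Build one statement entry without inspecting or rewriting the SQL text."""
    if isinstance(parameters, Mapping):
        # Named parameters travel as a single JSON object after the SQL.
        return [sql, dict(parameters)]
    if isinstance(parameters, Sequence) and not isinstance(parameters, (str, bytes)):
        # Positional parameters are appended after the SQL, in order.
        return [sql, *parameters]
    raise TypeError("parameters must be a sequence or a mapping")


positional = build_statement("SELECT * FROM foo WHERE id = ?", [1])
named = build_statement("SELECT * FROM foo WHERE name = :name", {"name": "fiona"})

# The request body is a JSON array of such statements; count mismatches are
# left for rqlite (and SQLite behind it) to report.
body = json.dumps([positional, named])
```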

    statements = json.dumps([self._get_operation_with_params(operation, parameters)])
except TypeError as e:
    raise InterfaceError(e)

Collaborator

@zmedico zmedico Sep 23, 2025

This could catch an unexpected TypeError, so I would prefer that _get_operation_with_params internally converted TypeError to InterfaceError if needed.

A TypeError from json.dumps can be handled separately like this:

statements = self._get_operation_with_params(operation, parameters)
try:
    statements = json.dumps([statements])
except TypeError as e:
    raise InterfaceError(e)

try:
    statements.append(self._get_operation_with_params(operation, parameters))
except TypeError as e:
    raise InterfaceError(e)
Collaborator

This could catch an unexpected TypeError, so I would prefer that _get_operation_with_params internally converted TypeError to InterfaceError if needed.

Collaborator

There's a json.dumps(statements) call later in this function, and we want to do the TypeError to InterfaceError conversion for that.

def _adapt_bytes(value):
    # Use byte array for the params API
    if not isinstance(value, bytes):
        value = value.encode('utf-8')
Collaborator

Can't we safely omit this isinstance check because _adapt_from_python will only pass in bytes type here?
