When the model query APIs don’t go far enough, you
can fall back to writing raw SQL. Django gives you two ways of performing raw
SQL queries: you can use Manager.raw()
to perform raw queries and
return model instances, or you can avoid the model layer entirely and
execute custom SQL directly.
The raw()
manager method can be used to perform raw SQL queries that
return model instances:
Manager.
raw
(raw_query, params=None, translations=None)¶This method method takes a raw SQL query, executes it, and returns a
RawQuerySet
instance. This
RawQuerySet
instance can be iterated
over just like an normal QuerySet to provide object instances.
This is best illustrated with an example. Suppose you’ve got the following model:
class Person(models.Model):
first_name = models.CharField(...)
last_name = models.CharField(...)
birth_date = models.DateField(...)
You could then execute custom SQL like so:
>>> for p in Person.objects.raw('SELECT * FROM myapp_person'):
... print p
John Smith
Jane Jones
Of course, this example isn’t very exciting – it’s exactly the same as
running Person.objects.all()
. However, raw()
has a bunch of other
options that make it very powerful.
Model table names
Where’d the name of the Person
table come from in that example?
By default, Django figures out a database table name by joining the
model’s “app label” – the name you used in manage.py startapp
– to
the model’s class name, with an underscore between them. In the example
we’ve assumed that the Person
model lives in an app named myapp
,
so its table would be myapp_person
.
For more details check out the documentation for the
db_table
option, which also lets you manually set the
database table name.
Warning
No checking is done on the SQL statement that is passed in to .raw()
.
Django expects that the statement will return a set of rows from the
database, but does nothing to enforce that. If the query does not
return rows, a (possibly cryptic) error will result.
Warning
If you are performing queries on MySQL, note that MySQL’s silent type coercion
may cause unexpected results when mixing types. If you query on a string
type column, but with an integer value, MySQL will coerce the types of all values
in the table to an integer before performing the comparison. For example, if your
table contains the values 'abc'
, 'def'
and you query for WHERE mycolumn=0
,
both rows will match. To prevent this, perform the correct typecasting
before using the value in a query.
raw()
automatically maps fields in the query to fields on the model.
The order of fields in your query doesn’t matter. In other words, both of the following queries work identically:
>>> Person.objects.raw('SELECT id, first_name, last_name, birth_date FROM myapp_person')
...
>>> Person.objects.raw('SELECT last_name, birth_date, first_name, id FROM myapp_person')
...
Matching is done by name. This means that you can use SQL’s AS
clauses to
map fields in the query to model fields. So if you had some other table that
had Person
data in it, you could easily map it into Person
instances:
>>> Person.objects.raw('''SELECT first AS first_name,
... last AS last_name,
... bd AS birth_date,
... pk as id,
... FROM some_other_table''')
As long as the names match, the model instances will be created correctly.
Alternatively, you can map fields in the query to model fields using the
translations
argument to raw()
. This is a dictionary mapping names of
fields in the query to names of fields on the model. For example, the above
query could also be written:
>>> name_map = {'first': 'first_name', 'last': 'last_name', 'bd': 'birth_date', 'pk': 'id'}
>>> Person.objects.raw('SELECT * FROM some_other_table', translations=name_map)
raw()
supports indexing, so if you need only the first result you can
write:
>>> first_person = Person.objects.raw('SELECT * from myapp_person')[0]
However, the indexing and slicing are not performed at the database level. If
you have a big amount of Person
objects in your database, it is more
efficient to limit the query at the SQL level:
>>> first_person = Person.objects.raw('SELECT * from myapp_person LIMIT 1')[0]
Fields may also be left out:
>>> people = Person.objects.raw('SELECT id, first_name FROM myapp_person')
The Person
objects returned by this query will be deferred model instances
(see defer()
). This means that the
fields that are omitted from the query will be loaded on demand. For example:
>>> for p in Person.objects.raw('SELECT id, first_name FROM myapp_person'):
... print p.first_name, # This will be retrieved by the original query
... print p.last_name # This will be retrieved on demand
...
John Smith
Jane Jones
From outward appearances, this looks like the query has retrieved both the first name and last name. However, this example actually issued 3 queries. Only the first names were retrieved by the raw() query – the last names were both retrieved on demand when they were printed.
There is only one field that you can’t leave out - the primary key
field. Django uses the primary key to identify model instances, so it
must always be included in a raw query. An InvalidQuery
exception
will be raised if you forget to include the primary key.
You can also execute queries containing fields that aren’t defined on the model. For example, we could use PostgreSQL’s age() function to get a list of people with their ages calculated by the database:
>>> people = Person.objects.raw('SELECT *, age(birth_date) AS age FROM myapp_person')
>>> for p in people:
... print "%s is %s." % (p.first_name, p.age)
John is 37.
Jane is 42.
...
raw()
¶If you need to perform parameterized queries, you can use the params
argument to raw()
:
>>> lname = 'Doe'
>>> Person.objects.raw('SELECT * FROM myapp_person WHERE last_name = %s', [lname])
params
is a list of parameters. You’ll use %s
placeholders in the
query string (regardless of your database engine); they’ll be replaced with
parameters from the params
list.
Warning
Do not use string formatting on raw queries!
It’s tempting to write the above query as:
>>> query = 'SELECT * FROM myapp_person WHERE last_name = %s' % lname
>>> Person.objects.raw(query)
Don’t.
Using the params
list completely protects you from SQL injection
attacks, a common exploit where attackers inject arbitrary SQL into
your database. If you use string interpolation, sooner or later you’ll
fall victim to SQL injection. As long as you remember to always use the
params
list you’ll be protected.
Sometimes even Manager.raw()
isn’t quite enough: you might need to
perform queries that don’t map cleanly to models, or directly execute
UPDATE
, INSERT
, or DELETE
queries.
In these cases, you can always access the database directly, routing around the model layer entirely.
The object django.db.connection
represents the
default database connection, and django.db.transaction
represents the
default database transaction. To use the database connection, call
connection.cursor()
to get a cursor object. Then, call
cursor.execute(sql, [params])
to execute the SQL and cursor.fetchone()
or cursor.fetchall()
to return the resulting rows. After performing a data
changing operation, you should then call
transaction.commit_unless_managed()
to ensure your changes are committed
to the database. If your query is purely a data retrieval operation, no commit
is required. For example:
def my_custom_sql():
from django.db import connection, transaction
cursor = connection.cursor()
# Data modifying operation - commit required
cursor.execute("UPDATE bar SET foo = 1 WHERE baz = %s", [self.baz])
transaction.commit_unless_managed()
# Data retrieval operation - no commit required
cursor.execute("SELECT foo FROM bar WHERE baz = %s", [self.baz])
row = cursor.fetchone()
return row
If you are using more than one database you can use
django.db.connections
to obtain the connection (and cursor) for a
specific database. django.db.connections
is a dictionary-like
object that allows you to retrieve a specific connection using its
alias:
from django.db import connections
cursor = connections['my_db_alias'].cursor()
# Your code here...
transaction.commit_unless_managed(using='my_db_alias')
By default, the Python DB API will return results without their field
names, which means you end up with a list
of values, rather than a
dict
. At a small performance cost, you can return results as a
dict
by using something like this:
def dictfetchall(cursor):
"Returns all rows from a cursor as a dict"
desc = cursor.description
return [
dict(zip([col[0] for col in desc], row))
for row in cursor.fetchall()
]
Here is an example of the difference between the two:
>>> cursor.execute("SELECT id, parent_id from test LIMIT 2");
>>> cursor.fetchall()
((54360982L, None), (54360880L, None))
>>> cursor.execute("SELECT id, parent_id from test LIMIT 2");
>>> dictfetchall(cursor)
[{'parent_id': None, 'id': 54360982L}, {'parent_id': None, 'id': 54360880L}]
When you make a raw SQL call, Django will automatically mark the current transaction as dirty. You must then ensure that the transaction containing those calls is closed correctly. See the notes on the requirements of Django’s transaction handling for more details.
Prior to Django 1.3, it was necessary to manually mark a transaction
as dirty using transaction.set_dirty()
when using raw SQL calls.
connection
and cursor
mostly implement the standard Python DB-API
described in PEP 249 (except when it comes to transaction handling). If you’re not familiar with the Python DB-API, note
that the SQL statement in cursor.execute()
uses placeholders, "%s"
,
rather than adding parameters directly within the SQL. If you use this
technique, the underlying database library will automatically add quotes and
escaping to your parameter(s) as necessary. (Also note that Django expects the
"%s"
placeholder, not the "?"
placeholder, which is used by the SQLite
Python bindings. This is for the sake of consistency and sanity.)
Nov 29, 2016