How can I do a batch insert into an Oracle database using Python?


Question

I have some monthly weather data that I want to insert into an Oracle database table but I want to insert the corresponding records in a batch in order to be more efficient. Can anyone advise as to how I'd go about doing this in Python?

For example let's say my table has four fields: a station ID, a date, and two value fields. The records are uniquely identified by the station ID and date fields (composite key). The values I'll have to insert for each station will be kept in a list with X number of full years worth of data, so for example if there are two years of values then the value lists will contain 24 values.

I assume that below is the way I'd do this if I wanted to insert the records one at a time:

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) / 12
for i in range(number_of_years):
    for j in range(12):
        # make a date for the first day of the month
        date_value = datetime.date(start_year + i, j + 1, 1)
        index = (i * 12) + j
        sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, %s, %s, %s)', (station_id, date_value, temps[index], precips[index]))
        cursor.execute(sql_insert)
connection.commit()

Is there a way to do what I'm doing above but in a way that performs a batch insert in order to increase efficiency? BTW my experience is with Java/JDBC/Hibernate so if someone can give an explanation/example which compares to the Java approach then it'd be especially helpful.

EDIT: Perhaps I need to use cursor.executemany() as described here?

Thanks in advance for any suggestions, comments, etc.

1
13
2/15/2013 10:09:58 PM

Accepted Answer

Here's what I've come up with which appears to work well (but please comment if there's a way to improve this):

# build rows for each date and add to a list of rows we'll use to insert as a batch 
rows = [] 
numberOfYears = endYear - startYear + 1
for i in range(numberOfYears):
    for j in range(12):
        # make a date for the first day of the month
        dateValue = datetime.date(startYear + i, j + 1, 1)
        index = (i * 12) + j
        row = (stationId, dateValue, temps[index], precips[index])
        rows.append(row)

# insert all of the rows as a batch and commit
ip = '192.1.2.3' 
port = 1521
SID = 'my_sid'
dsn = cx_Oracle.makedsn(ip, port, SID)
connection = cx_Oracle.connect('username', 'password', dsn)
cursor = cx_Oracle.Cursor(connection)
cursor.prepare('insert into ' + database_table_name + ' (id, record_date, temp, precip) values (:1, :2, :3, :4)')
cursor.executemany(None, rows)
connection.commit()
cursor.close()
connection.close()
17
2/16/2013 1:07:29 AM

Use Cursor.prepare() and Cursor.executemany().

From the cx_Oracle documentation:

Cursor.prepare(statement[, tag])

This can be used before a call to execute() to define the statement that will be executed. When this is done, the prepare phase will not be performed when the call to execute() is made with None or the same string object as the statement. [...]

Cursor.executemany(statement, parameters)

Prepare a statement for execution against a database and then execute it against all parameter mappings or sequences found in the sequence parameters. The statement is managed in the same way as the execute() method manages it.

Thus, using the above two functions, your code becomes:

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) / 12

# list comprehension of dates for the first day of the month
date_values = [datetime.date(start_year + i, j + 1, 1) for i in range(number_of_years) for j in range(12)]

# second argument to executemany() should be of the form:
# [{'1': value_a1, '2': value_a2}, {'1': value_b1, '2': value_b2}]
dict_sequence = [{'1': date_values[i], '2': temps[i], '3': precips[i]} for i in range(1, len(temps))]

sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, :1, :2, :3)', station_id)
cursor.prepare(sql_insert)
cursor.executemany(None, dict_sequence)
connection.commit()

Also see Oracle's Mastering Oracle+Python series of articles.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon