Tuesday, June 2, 2009

Perl DBI vs C libpq for PostgreSQL ingestion

I'm planning to ingest ~1B rows into a PostgreSQL database.

There are some interesting storage requirements here, but in the first instance I've been working on getting ingestion throughput up.

For my trial ingestion of 5M rows, I've found libpq to be roughly twice as fast as Perl DBI.

Some notes:
  • indexing is omitted during ingestion.

  • the libpq code included data type hints; my Perl code, using DBI, didn't.

  • each ingested row included a timestamp and a few (<5) numeric values.

  • running multiple ingestion processes improved throughput, although less than linearly (for <4 CPUs), i.e. 2 CPUs roughly doubled throughput, but the gains tailed off after that.
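
To make the "data type hints" note concrete, here is a minimal sketch in C. It assumes the hints are the parameter type OIDs that libpq accepts when preparing a statement, which the notes above don't confirm. The `Reading` struct, the `format_row` helper and the choice of three float8 values are hypothetical, picked only to match "a timestamp and a few numeric values". The OID values are the ones PostgreSQL assigns to `timestamp` (1114) and `float8` (701). The code is plain C with no libpq dependency, so it only builds the arrays a libpq call would take.

```c
#include <stdio.h>
#include <time.h>

/* Assumption: real code would get Oid from libpq's headers; it is
 * redefined here only so the sketch compiles without libpq. */
typedef unsigned int Oid;

#define TIMESTAMPOID 1114   /* pg_type OID of "timestamp" */
#define FLOAT8OID     701   /* pg_type OID of "float8" */

#define N_PARAMS 4          /* 1 timestamp + 3 numeric values (hypothetical) */
#define BUF_LEN  32

/* Hypothetical row layout: one timestamp plus a few numeric values. */
typedef struct {
    time_t ts;
    double v[3];
} Reading;

/* The type hints: one OID per statement parameter, in order. */
const Oid param_types[N_PARAMS] = {
    TIMESTAMPOID, FLOAT8OID, FLOAT8OID, FLOAT8OID
};

/* Render one row as text-format parameters.
 * bufs holds the strings; values[i] is set to point at bufs[i].
 * Returns 0 on success, -1 if the timestamp cannot be formatted. */
int format_row(const Reading *r, char bufs[N_PARAMS][BUF_LEN],
               const char *values[N_PARAMS])
{
    struct tm *tm = gmtime(&r->ts);   /* timestamp rendered in UTC */
    if (tm == NULL)
        return -1;
    if (strftime(bufs[0], BUF_LEN, "%Y-%m-%d %H:%M:%S", tm) == 0)
        return -1;
    values[0] = bufs[0];

    for (int i = 0; i < 3; i++) {
        /* %.17g keeps enough digits to round-trip a double */
        snprintf(bufs[i + 1], BUF_LEN, "%.17g", r->v[i]);
        values[i + 1] = bufs[i + 1];
    }
    return 0;
}
```

With libpq linked in, `param_types` would be passed as the last argument of `PQprepare(conn, stmtName, query, N_PARAMS, param_types)`, and `values` as the parameter array of `PQexecPrepared(conn, stmtName, N_PARAMS, values, NULL, NULL, 0)`. Passing explicit OIDs means the server does not have to infer each parameter's type. Whether that accounts for any of the speed difference measured here is not something this sketch can show.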
