Learning to love PostgreSQL arrays

I’ll admit it: When arrays were added to PostgreSQL a number of years ago, I thought that this was a really bad idea.  I’m a firm believer in normalization when it comes to database design and storage; and the idea of putting multiple values inside of a single column struck me as particularly foolish.  Besides, my impression was that PostgreSQL arrays were clumsy to work with, and didn’t really add much to my data model.

Of course, it turns out that arrays are extremely useful in PostgreSQL.  I still cringe when people want to use them for general-purpose storage, instead of working to normalize their database design.  But over the last few months, as I’ve been doing all sorts of complex PostgreSQL queries for my PhD dissertation, I’ve found that PostgreSQL arrays are extremely useful when it comes to aggregating and reporting data.

I’ve thus decided to dedicate a number of blog posts to PostgreSQL arrays: How to create them, use them, manipulate them, and decide when to use them.

Let’s start with the very basics; over the next few blog posts, I’ll try to show how arrays can be interesting — and even useful, and fit into more complex queries and needs.

You can create an array of just about any data type in PostgreSQL.  As the documentation says, “Arrays of any built-in or user-defined base type, enum type, or composite type can be created. Arrays of domains are not yet supported.” This means that you can create arrays of just about any data type you want: Integers, text, enums, other arrays (for multidimensional arrays), or even user-defined types.  To date, I have generally created arrays of integers and text, but that might not be representative of your use case.

To create a table with a text array in one column, just add square brackets ([]) after the type:

CREATE TABLE Foo (
id SERIAL NOT NULL,
stuff TEXT[],
PRIMARY KEY(id)
);

When I then ask for the definition of my “Foo” table, I see the following:

[local]/reuven=# \d foo
 Table "public.foo"
┌────────┬─────────┬──────────────────────────────────────────────────┐
│ Column │ Type │ Modifiers │
├────────┼─────────┼──────────────────────────────────────────────────┤
│ id     │ integer │ not null default nextval('foo_id_seq'::regclass) │
│ stuff  │ text[]  │                                                  │
└────────┴─────────┴──────────────────────────────────────────────────┘
Indexes:
 "foo_pkey" PRIMARY KEY, btree (id)

Notice that the type of the “stuff” column is indeed recorded as “text[]”, showing that it’s an array.  If we try to insert a plain-text value into that column, PostgreSQL will complain:

[local]/reuven=# insert into foo (stuff) values ('abc');
ERROR: 22P02: array value must start with "{" or dimension information
LINE 1: insert into foo (stuff) values ('abc');
                                        ^

One of the many things that I love about PostgreSQL is the attention to detail in the error messages.  Not only does it tell us that the table is expecting an array value, but that the array must begin with a { character.  It also shows us, using a ^ character, where the parser had problems.  That’s not always a perfect indicator of where the problem lies, but it’s a great start.

If I want to insert an array value into my table, I can thus use the literal array syntax that PostgreSQL provides, with (as indicated above) curly braces:

[local]/reuven=# insert into foo (stuff) values ('{abc}');
INSERT 0 1
[local]/reuven=# insert into foo (stuff) values ('{abc,def}');
INSERT 0 1
[local]/reuven=# insert into foo (stuff) values ('{abc,def,ghi jkl}');
INSERT 0 1

The above commands insert three rows into our table.  In all three cases, we are inserting array values into our table.  Notice that in all cases, the array is inserted as a string, surrounded by single quote marks.  Thus, ‘{abc}’ becomes a one-element array, and ‘{abc,def}’ becomes a two-element array.

What happens when there is a space character inside of the text?  PostgreSQL automatically quotes the value (with double quotes — be careful!).  What happens if you want a comma or single quote as part of the text?  Then things get even uglier.

A nice solution, and a better way (I believe) to insert arrays in any event, is to use the built-in ARRAY constructor syntax.  Then you don’t have to worry about such things.  For example, I can rewrite all of the above INSERT commands in what I believe to be a much nicer way:

[local]/reuven=# insert into foo (stuff) values (ARRAY['abc']);
INSERT 0 1

[local]/reuven=# insert into foo (stuff) values (ARRAY['abc', 'def']);
INSERT 0 1

[local]/reuven=# insert into foo (stuff) values (ARRAY['abc', 'def', 'ghi jkl']);
INSERT 0 1

[local]/reuven=# select * from foo;
┌────┬─────────────────────┐
│ id │ stuff │
├────┼─────────────────────┤
│ 8  │ {abc}               │
│ 9  │ {abc,def}           │
│ 10 │ {abc,def,"ghi jkl"} │
└────┴─────────────────────┘
(3 rows)

The same data was inserted into the table, but with less hassle than before.

Now, just because we can insert arrays directly into our tables doesn’t necessarily mean that we should do so.  You’ll see, over the course of this series, that I view arrays as a great way to aggregate and analyze existing data, particularly within the context of a view or a CTE.  So please don’t be tempted to start stuffing all of the data you want and need into a single column; normalization is still a good idea, and arrays can be tempting.  However, being familiar with the basics of defining and inserting array data into the database is quite useful, and will serve us well throughout the rest of this series.

 

2 thoughts on “Learning to love PostgreSQL arrays”

Leave a Reply

Your email address will not be published. Required fields are marked *

eleven − two =