Clickhouse CSV Upload Primer

Clickhouse has many different way to upload a file. If you have CSV file with a delimiter, Clickhouse support out of box loading into a table.

For this to work, you need to install clickhouse-client. This is an either linux or windows client. You

Here is the break down of statements

Insert into table — this is the table that you want to load the csv data

The SELECT statement is selection of the columns from the CSV file. The select statement is matched with the table column order. This table column order does not need to be in same order the CSV file column order. Only thing is that the Select statement must choose the name from the CSV file column header names. You can have clickhouse functions on this select statement or static string in it. For example, now() will set a current date and you can have load status. The string should be in single quotes that means if you have using python you need to include the single quote as part of string.

For the “FROM INPUT”, this is the column header in the CSV file. This should match the column header and this also shall match with the select columns.

Following is an example of the inter into table “hackernews” and csv file is comments.tsv

clickhouse-client \
    --host avw5r4qs3y.us-east-2.aws.clickhouse.cloud \
    --secure \
    --port 9440 \
    --password Myp@ssw0rd \
    --query "
    INSERT INTO hackernews
    SELECT
        id,
        type,
        lower(author),
        timestamp,
        comment,
        children,
        extractAll(comment, '\\w+') as tokens
    FROM input('id UInt32, type String, author String, timestamp DateTime, comment String, children Array(UInt32)')
    FORMAT TabSeparatedWithNames
" < comments.tsv

if you are building a command to spin a subprocess, the command will look like below

cmd = cat comments.tsv | sed "s/'/ /g" | click house-client -h hostname --user -- password --query " INSERT INTO hackernews
    SELECT
        id,
        type,
        lower(author),
        timestamp,
        comment,
        children,
        extractAll(comment, '\\w+') as tokens
    FROM input('id UInt32, type String, author String, timestamp DateTime, comment String, children Array(UInt32)')
    FORMAT CSVWithNames "
 

I have added cat and sed. You can use ‘sed’ to remove single quotes in the file as this will cause issues.

References