Clickhouse has many different way to upload a file. If you have CSV file with a delimiter, Clickhouse support out of box loading into a table.
For this to work, you need to install clickhouse-client. This is an either linux or windows client. You
Here is the break down of statements
Insert into table — this is the table that you want to load the csv data
The SELECT statement is selection of the columns from the CSV file. The select statement is matched with the table column order. This table column order does not need to be in same order the CSV file column order. Only thing is that the Select statement must choose the name from the CSV file column header names. You can have clickhouse functions on this select statement or static string in it. For example, now() will set a current date and you can have load status. The string should be in single quotes that means if you have using python you need to include the single quote as part of string.
For the “FROM INPUT”, this is the column header in the CSV file. This should match the column header and this also shall match with the select columns.
Following is an example of the inter into table “hackernews” and csv file is comments.tsv
clickhouse-client \
--host avw5r4qs3y.us-east-2.aws.clickhouse.cloud \
--secure \
--port 9440 \
--password Myp@ssw0rd \
--query "
INSERT INTO hackernews
SELECT
id,
type,
lower(author),
timestamp,
comment,
children,
extractAll(comment, '\\w+') as tokens
FROM input('id UInt32, type String, author String, timestamp DateTime, comment String, children Array(UInt32)')
FORMAT TabSeparatedWithNames
" < comments.tsv
if you are building a command to spin a subprocess, the command will look like below
cmd = cat comments.tsv | sed "s/'/ /g" | click house-client -h hostname --user -- password --query " INSERT INTO hackernews
SELECT
id,
type,
lower(author),
timestamp,
comment,
children,
extractAll(comment, '\\w+') as tokens
FROM input('id UInt32, type String, author String, timestamp DateTime, comment String, children Array(UInt32)')
FORMAT CSVWithNames "
I have added cat and sed. You can use ‘sed’ to remove single quotes in the file as this will cause issues.
References
- Clickhouse input and output, https://clickhouse.com/docs/en/interfaces/formats/#csvwithnames
- Clickhouse insert local file reference, https://clickhouse.com/docs/en/integrations/data-ingestion/insert-local-files/
- Clickhouse select into reference, https://clickhouse.com/docs/en/sql-reference/statements/insert-into/
- Discussion on insert into issues, https://github.com/ClickHouse/ClickHouse/issues/38543
- Clickhouse column manipulation, https://clickhouse.com/docs/en/sql-reference/statements/alter/column/
- Clickhouse functions, https://clickhouse.com/docs/en/sql-reference/functions/
- Clickhouse using multiple index, https://clickhouse.com/docs/en/guides/improving-query-performance/sparse-primary-indexes/sparse-primary-indexes-multiple/#note-about-data-skipping-index