{"id":2375,"date":"2022-12-13T08:46:02","date_gmt":"2022-12-13T13:46:02","guid":{"rendered":"https:\/\/shirishranjit.com\/blog1\/?page_id=2375"},"modified":"2022-12-13T11:42:41","modified_gmt":"2022-12-13T16:42:41","slug":"clickhouse-csv-upload-primer","status":"publish","type":"page","link":"https:\/\/shirishranjit.com\/blog1\/big-data\/clickhouse-cheatsheet\/clickhouse-csv-upload-primer","title":{"rendered":"Clickhouse CSV Upload Primer"},"content":{"rendered":"\n<p>Clickhouse has many different way to upload a file. If you have CSV file with a delimiter, Clickhouse support out of box loading into a table. <\/p>\n\n\n\n<p>For this to work, you need to install clickhouse-client. This is an either linux or windows client. You<\/p>\n\n\n\n<p>Here is the break down of statements<\/p>\n\n\n\n<p>Insert into table &#8212; this is the table that you want to load the csv data<\/p>\n\n\n\n<p>The SELECT statement is selection of the columns from the CSV file. The select statement is matched with the table column order. This table column order does not need to be in same order the CSV file column order. Only thing is that the Select statement must choose the name from the CSV file column header names. You can have clickhouse functions on this select statement or static string in it. For example, now() will set a current date and you can have load status. The string should be in single quotes that means if you have using python you need to include the single quote as part of string.<\/p>\n\n\n\n<p>For the &#8220;FROM INPUT&#8221;, this is  the column header in the CSV file. This should match the column header and this also shall match with the select columns.<\/p>\n\n\n\n<p>Following is an example of the inter into table &#8220;hackernews&#8221; and csv file is comments.tsv<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><strong>clickhouse-client<\/strong> \\\n    --host avw5r4qs3y.us-east-2.aws.clickhouse.cloud \\\n    --secure \\\n    --port 9440 \\\n    --password Myp@ssw0rd \\\n    --query \"\n    INSERT INTO hackernews\n    SELECT\n        id,\n        type,\n        lower(author),\n        timestamp,\n        comment,\n        children,\n        extractAll(comment, '\\\\w+') as tokens\n    <strong>FROM input(<\/strong>'id UInt32, type String, author String, timestamp DateTime, comment String, children Array(UInt32)')\n    <strong>FORMAT<\/strong> TabSeparatedWithNames\n\" &lt; <strong>comments.tsv<\/strong><\/code><\/pre>\n\n\n\n<p>if you are building a command to spin a subprocess, the command will look like below<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cmd = cat comments.tsv | sed \"s\/'\/ \/g\" | click house-client -h hostname --user -- password --query \" INSERT INTO hackernews\n    SELECT\n        id,\n        type,\n        lower(author),\n        timestamp,\n        comment,\n        children,\n        extractAll(comment, '\\\\w+') as tokens\n    <strong>FROM input(<\/strong>'id UInt32, type String, author String, timestamp DateTime, comment String, children Array(UInt32)')\n    <strong>FORMAT<\/strong> CSVWithNames \"\n <\/code><\/pre>\n\n\n\n<p>I have added cat and sed. You can use &#8216;sed&#8217; to remove single quotes in the file as this will cause issues.  <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>Clickhouse input and output, <a href=\"https:\/\/clickhouse.com\/docs\/en\/interfaces\/formats\/#csvwithnames\">https:\/\/clickhouse.com\/docs\/en\/interfaces\/formats\/#csvwithnames<\/a><\/li><li>Clickhouse insert local file reference, <a href=\"https:\/\/clickhouse.com\/docs\/en\/integrations\/data-ingestion\/insert-local-files\/\">https:\/\/clickhouse.com\/docs\/en\/integrations\/data-ingestion\/insert-local-files\/<\/a><\/li><li>Clickhouse select into reference, <a href=\"https:\/\/clickhouse.com\/docs\/en\/sql-reference\/statements\/insert-into\/\">https:\/\/clickhouse.com\/docs\/en\/sql-reference\/statements\/insert-into\/<\/a><\/li><li>Discussion on insert into issues, <a href=\"https:\/\/github.com\/ClickHouse\/ClickHouse\/issues\/38543\">https:\/\/github.com\/ClickHouse\/ClickHouse\/issues\/38543<\/a><\/li><li>Clickhouse column manipulation, <a href=\"https:\/\/clickhouse.com\/docs\/en\/sql-reference\/statements\/alter\/column\/\">https:\/\/clickhouse.com\/docs\/en\/sql-reference\/statements\/alter\/column\/<\/a><\/li><li>Clickhouse functions, <a href=\"https:\/\/clickhouse.com\/docs\/en\/sql-reference\/functions\/\">https:\/\/clickhouse.com\/docs\/en\/sql-reference\/functions\/<\/a><\/li><li>Clickhouse using multiple index, <a href=\"https:\/\/clickhouse.com\/docs\/en\/guides\/improving-query-performance\/sparse-primary-indexes\/sparse-primary-indexes-multiple\/#note-about-data-skipping-index\">https:\/\/clickhouse.com\/docs\/en\/guides\/improving-query-performance\/sparse-primary-indexes\/sparse-primary-indexes-multiple\/#note-about-data-skipping-index<\/a><\/li><li><\/li><\/ul>\n<div class=\"twttr_buttons\"><div class=\"twttr_twitter\">\n\t\t\t\t\t<a href=\"http:\/\/twitter.com\/share?text=Clickhouse+CSV+Upload+Primer\" class=\"twitter-share-button\" data-via=\"\" data-hashtags=\"\"  data-size=\"default\" data-url=\"https:\/\/shirishranjit.com\/blog1\/big-data\/clickhouse-cheatsheet\/clickhouse-csv-upload-primer\"  data-related=\"\" target=\"_blank\">Tweet<\/a>\n\t\t\t\t<\/div><div class=\"twttr_followme\">\n\t\t\t\t\t\t<a href=\"https:\/\/twitter.com\/shiranjit\" class=\"twitter-follow-button\" data-size=\"default\"  data-show-screen-name=\"false\"  target=\"_blank\">Follow me<\/a>\n\t\t\t\t\t<\/div><\/div>","protected":false},"excerpt":{"rendered":"<p>Clickhouse has many different way to upload a file. If you have CSV file with a delimiter, Clickhouse support out of box loading into a table. For this to work, you need to install clickhouse-client. This is an either linux &hellip; <a href=\"https:\/\/shirishranjit.com\/blog1\/big-data\/clickhouse-cheatsheet\/clickhouse-csv-upload-primer\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"parent":2238,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2375","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/2375"}],"collection":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/comments?post=2375"}],"version-history":[{"count":8,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/2375\/revisions"}],"predecessor-version":[{"id":2390,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/2375\/revisions\/2390"}],"up":[{"embeddable":true,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/2238"}],"wp:attachment":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/media?parent=2375"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}