ClickHouse Fundamentals: Part 2
#sql #database #performance #clickhouse

Changing table TTLs
You can modify a table's TTL in ClickHouse with ALTER TABLE ... MODIFY TTL. For example:

ALTER TABLE database.table MODIFY TTL event_date + INTERVAL 30 DAY;

However, ClickHouse will rewrite all table partitions, including partitions unaffected by the TTL change. This can be a very expensive operation, especially on large tables.

To avoid hurting the performance of our database, we can set materialize_ttl_after_modify to 0 and clear out old partitions manually.

This avoids the huge performance hit of rewriting every partition, but it does mean some extra manual work.

For example:

set materialize_ttl_after_modify=0;
ALTER TABLE database.table MODIFY TTL event_date + INTERVAL 30 DAY;
ALTER TABLE database.table DROP PARTITION 202205;
ALTER TABLE database.table DROP PARTITION 202206;
ALTER TABLE database.table DROP PARTITION 202207;
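Before dropping partitions by hand, it helps to see which ones exist and how big they are. A minimal sketch using the system.parts table (the database and table names are the placeholders from the example above):

```sql
-- List active partitions with their row counts and on-disk size,
-- so you can decide which ones the new TTL would have expired.
SELECT
    partition,
    sum(rows) AS rows,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'database' AND table = 'table' AND active
GROUP BY partition
ORDER BY partition;
```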

Using clickhouse-local
Analyzing local files: clickhouse-local is like running a temporary ClickHouse server that lasts only for the session. It is great for exploring local files and quickly playing with data without setting up a proper ClickHouse deployment.

You can analyze files of structured data directly from the local filesystem.

SELECT count() FROM file('final.ndjson');

SELECT count()
FROM file('final.ndjson')

Query id: a0a1f4b5-40cb-4125-b68b-4ed978c41576

┌─count()─┐
│  100000 │
└─────────┘

1 row in set. Elapsed: 0.659 sec. Processed 55.38 thousand rows, 96.97 MB (84.04 thousand rows/s., 147.16 MB/s.)


SELECT countDistinct(public_ip) FROM file('final.ndjson');

SELECT countDistinct(public_ip)
FROM file('final.ndjson')

Query id: 21df7ca5-e3bf-4010-b2a0-bf8b854502d2

┌─uniqExact(public_ip)─┐
│                   71 │
└──────────────────────┘

1 row in set. Elapsed: 0.225 sec. Processed 77.53 thousand rows, 96.45 MB (345.22 thousand rows/s., 429.46 MB/s.)
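clickhouse-local infers the schema from the file itself, and you can inspect what it inferred before writing more queries. A quick sketch against the same file:

```sql
-- Show the column names and types inferred from the NDJSON file
DESCRIBE TABLE file('final.ndjson');
```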

If you want to run several analyses over the data, you can create a table from the local file. The table is destroyed when your clickhouse-local session ends.

CREATE TABLE auxiliar Engine=MergeTree() ORDER BY tuple() AS SELECT * FROM file('final.ndjson');

CREATE TABLE auxiliar
ENGINE = MergeTree
ORDER BY tuple() AS
SELECT *
FROM file('final.ndjson')

Query id: a1732be5-a912-41a5-bf8e-e524db8f12f4

Ok.

0 rows in set. Elapsed: 0.486 sec. Processed 100.00 thousand rows, 161.88 MB (205.73 thousand rows/s., 333.03 MB/s.)


SHOW CREATE TABLE auxiliar;

SHOW CREATE TABLE auxiliar

Query id: dffbcd4b-2c08-4d07-916c-b8e1b668c202


│ CREATE TABLE _local.auxiliar
(
    `timestamp_iso8601` Nullable(DateTime64(9)),
    `host` Nullable(String),
    `public_ip` Nullable(String),
    `request_method` Nullable(String),
    `request_path` Nullable(String),
    `status` Nullable(Int64),
    `body_bytes_sent` Nullable(Int64),
    `request_length` Nullable(Int64),
    `first_byte` Nullable(Float64),
    `request_time` Nullable(Float64),
    `lambda_name` Nullable(String),
    `lambda_region` Nullable(String),
    `path_type` Nullable(String),
    `hit_level` Nullable(String),
    `hit_state` Nullable(String),
    `error_details` Nullable(String),
    `owner_id` Nullable(String),
    `project_id` Nullable(String),
    `target_path` Nullable(String),
    `deployment_plan` Nullable(String),
    `lambda_duration` Nullable(Float64),
    `lambda_billed_duration` Nullable(Int64),
    `lambda_memory_size` Nullable(Int64),
    `http_user_agent` Nullable(String),
    `full_vercel_id` Nullable(String),
    `dc` Nullable(String),
    `public_ip_country` Nullable(String),
    `public_ip_city` Nullable(String),
    `asn_id` Nullable(String),
    `asn_name` Nullable(String)
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS index_granularity = 8192 │


1 row in set. Elapsed: 0.001 sec.

SELECT count(), status - status % 100 AS status_range FROM auxiliar GROUP BY status_range;

SELECT
    count(),
    status - (status % 100) AS status_range
FROM auxiliar
GROUP BY status_range

Query id: 2685e0d4-827a-4306-8598-5d6e589dbd15

┌─count()─┬─status_range─┐
│   74000 │          200 │
│    5000 │          400 │
│   21000 │          300 │
└─────────┴──────────────┘

3 rows in set. Elapsed: 0.015 sec.
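The `status - status % 100` trick rounds each status code down to its HTTP class. An equivalent, arguably more idiomatic, form uses intDiv (a sketch against the same auxiliar table):

```sql
-- intDiv(status, 100) keeps the leading digit of the HTTP class;
-- multiplying back by 100 reproduces the 200 / 300 / 400 buckets.
SELECT
    count(),
    intDiv(status, 100) * 100 AS status_range
FROM auxiliar
GROUP BY status_range
ORDER BY status_range;
```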

Default values for new columns
When you add a new column to a table, ClickHouse adds it with the type's default value:

CREATE TABLE local
ENGINE = MergeTree
ORDER BY number AS
SELECT *
FROM numbers(1000000);

ALTER TABLE local
    ADD COLUMN IF NOT EXISTS `date` DateTime;

OPTIMIZE TABLE local FINAL; -- To speed up the mutation / lazy way to know it has finished

SELECT *
FROM local
LIMIT 10

Query id: b5fedb97-a1c8-475f-a674-0b1658c8e889

┌─number─┬────────────────date─┐
│      0 │ 1970-01-01 01:00:00 │
│      1 │ 1970-01-01 01:00:00 │
│      2 │ 1970-01-01 01:00:00 │
│      3 │ 1970-01-01 01:00:00 │
│      4 │ 1970-01-01 01:00:00 │
│      5 │ 1970-01-01 01:00:00 │
│      6 │ 1970-01-01 01:00:00 │
│      7 │ 1970-01-01 01:00:00 │
│      8 │ 1970-01-01 01:00:00 │
│      9 │ 1970-01-01 01:00:00 │
└────────┴─────────────────────┘
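OPTIMIZE ... FINAL is a heavy-handed way to force the new column onto disk. Recent ClickHouse versions offer a more targeted alternative, MATERIALIZE COLUMN, which runs as a regular mutation you can watch in system.mutations (a sketch; availability depends on your ClickHouse version, and the table is assumed to be in the current database):

```sql
-- Rewrite existing parts so the new column physically exists
ALTER TABLE local MATERIALIZE COLUMN date;

-- Watch the mutation; an empty result means it has finished
SELECT mutation_id, command, is_done
FROM system.mutations
WHERE table = 'local' AND NOT is_done;
```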

To change the value that old rows receive, you need to declare a default in the column definition:

ALTER TABLE local
    ADD COLUMN IF NOT EXISTS `new_date` DateTime DEFAULT now();

OPTIMIZE TABLE local FINAL;

SELECT *
FROM local
LIMIT 10

Query id: b5ff3afd-78f7-4ea3-8d43-adc7fe14f0a0

┌─number─┬────────────────date─┬────────────new_date─┐
│      0 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      1 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      2 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      3 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      4 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      5 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      6 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      7 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      8 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      9 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
└────────┴─────────────────────┴─────────────────────┘

10 rows in set. Elapsed: 0.002 sec.

Note that this means new rows will also get the default value if the column is not specified on insert.

ALTER TABLE local
    MODIFY COLUMN `new_date` DateTime DEFAULT yesterday();

INSERT INTO local(number) VALUES (999999999);

SELECT *
FROM local
WHERE number = 999999999

Query id: 02527ad6-4644-42ff-8755-8869a9df30fa

┌────number─┬────────────────date─┬────────────new_date─┐
│ 999999999 │ 1970-01-01 01:00:00 │ 2022-09-22 00:00:00 │
└───────────┴─────────────────────┴─────────────────────┘
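If at some point you no longer want inserts to pick up the default, the DEFAULT expression can be dropped from the column definition again (a sketch; existing stored values are unaffected):

```sql
-- Remove the DEFAULT expression; the column keeps its data, and
-- future inserts without new_date fall back to the type's zero
-- value (1970-01-01 00:00:00 for DateTime).
ALTER TABLE local MODIFY COLUMN new_date REMOVE DEFAULT;
```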