Changing a table's TTL
You can modify the TTL of a table in ClickHouse using ALTER TABLE ... MODIFY TTL. For example:
ALTER TABLE database.table MODIFY TTL event_date + INTERVAL 30 DAY;
However, ClickHouse will rewrite all of the table's partitions, including the ones not affected by the TTL change. This can be a very expensive operation, especially for big tables.
To avoid impacting the performance of our database, we can set materialize_ttl_after_modify to 0 and remove the old partitions manually.
This avoids the huge performance hit of rewriting every partition, but it does mean some extra manual work.
For example:
SET materialize_ttl_after_modify = 0;
ALTER TABLE database.table MODIFY TTL event_date + INTERVAL 30 DAY;
ALTER TABLE database.table DROP PARTITION 202205;
ALTER TABLE database.table DROP PARTITION 202206;
ALTER TABLE database.table DROP PARTITION 202207;
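To know which partitions are safe to drop, you can check the part metadata in system.parts. A minimal sketch, assuming the table is partitioned by month on event_date (as the partition IDs above suggest), so that max_date is populated for its parts:

SELECT partition
FROM system.parts
WHERE database = 'database' AND table = 'table' AND active
GROUP BY partition
HAVING max(max_date) < today() - INTERVAL 30 DAY; -- partitions fully outside the new TTL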
Using clickhouse-local
Analyzing local files
clickhouse-local is like running a temporary ClickHouse server that only lasts for the session. It is great for exploring local files and quickly experimenting with data, without having to set up a full ClickHouse deployment.
You can analyze files containing structured data directly from the local filesystem.
SELECT count() FROM file('final.ndjson');
SELECT count()
FROM file('final.ndjson')
Query id: a0a1f4b5-40cb-4125-b68b-4ed978c41576
┌─count()─┐
│  100000 │
└─────────┘
1 row in set. Elapsed: 0.659 sec. Processed 55.38 thousand rows, 96.97 MB (84.04 thousand rows/s., 147.16 MB/s.)
SELECT countDistinct(public_ip) FROM file('final.ndjson');
SELECT countDistinct(public_ip)
FROM file('final.ndjson')
Query id: 21df7ca5-e3bf-4010-b2a0-bf8b854502d2
┌─uniqExact(public_ip)─┐
│                   71 │
└──────────────────────┘
1 row in set. Elapsed: 0.225 sec. Processed 77.53 thousand rows, 96.45 MB (345.22 thousand rows/s., 429.46 MB/s.)
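The column types are inferred automatically from the file. If you want to check what schema inference produced before running heavier queries, you can describe the file first (assuming a ClickHouse version recent enough to support schema inference for file(), which the queries above already rely on):

DESCRIBE file('final.ndjson'); -- shows the inferred name and type of each column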
If you want to run multiple analyses over the data, you can create a table from the local file. The table will be destroyed when your clickhouse-local session ends.
CREATE TABLE auxiliar Engine=MergeTree() ORDER BY tuple() AS SELECT * FROM file('final.ndjson');
CREATE TABLE auxiliar
ENGINE = MergeTree
ORDER BY tuple() AS
SELECT *
FROM file('final.ndjson')
Query id: a1732be5-a912-41a5-bf8e-e524db8f12f4
Ok.
0 rows in set. Elapsed: 0.486 sec. Processed 100.00 thousand rows, 161.88 MB (205.73 thousand rows/s., 333.03 MB/s.)
SHOW CREATE TABLE auxiliar;
SHOW CREATE TABLE auxiliar
Query id: dffbcd4b-2c08-4d07-916c-b8e1b668c202
│ CREATE TABLE _local.auxiliar
(
`timestamp_iso8601` Nullable(DateTime64(9)),
`host` Nullable(String),
`public_ip` Nullable(String),
`request_method` Nullable(String),
`request_path` Nullable(String),
`status` Nullable(Int64),
`body_bytes_sent` Nullable(Int64),
`request_length` Nullable(Int64),
`first_byte` Nullable(Float64),
`request_time` Nullable(Float64),
`lambda_name` Nullable(String),
`lambda_region` Nullable(String),
`path_type` Nullable(String),
`hit_level` Nullable(String),
`hit_state` Nullable(String),
`error_details` Nullable(String),
`owner_id` Nullable(String),
`project_id` Nullable(String),
`target_path` Nullable(String),
`deployment_plan` Nullable(String),
`lambda_duration` Nullable(Float64),
`lambda_billed_duration` Nullable(Int64),
`lambda_memory_size` Nullable(Int64),
`http_user_agent` Nullable(String),
`full_vercel_id` Nullable(String),
`dc` Nullable(String),
`public_ip_country` Nullable(String),
`public_ip_city` Nullable(String),
`asn_id` Nullable(String),
`asn_name` Nullable(String)
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS index_granularity = 8192 │
1 row in set. Elapsed: 0.001 sec.
SELECT count(), status - status % 100 AS status_range FROM auxiliar GROUP BY status_range;
SELECT
count(),
status - (status % 100) AS status_range
FROM auxiliar
GROUP BY status_range
Query id: 2685e0d4-827a-4306-8598-5d6e589dbd15
┌─count()─┬─status_range─┐
│   74000 │          200 │
│    5000 │          400 │
│   21000 │          300 │
└─────────┴──────────────┘
3 rows in set. Elapsed: 0.015 sec.
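The status - status % 100 expression buckets each HTTP status code into its hundreds range (200, 300, 400, ...). An equivalent way to write the same grouping uses integer division; this is just an alternative formulation, not the query used above:

SELECT
    count(),
    intDiv(status, 100) * 100 AS status_range
FROM auxiliar
GROUP BY status_range;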
Default values for new columns
When you add a new column to a table, ClickHouse fills it with the default value for its type:
CREATE TABLE local
ENGINE = MergeTree
ORDER BY number AS
SELECT *
FROM numbers(1000000);
ALTER TABLE local
ADD COLUMN IF NOT EXISTS `date` DateTime;
OPTIMIZE TABLE local FINAL; -- To speed up the mutation / lazy way to know it has finished
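If rewriting the whole table with OPTIMIZE ... FINAL is too heavy, a sketch of an alternative: recent ClickHouse versions can materialize just the new column, and the resulting mutation can be tracked in system.mutations (assuming a version that supports MATERIALIZE COLUMN):

ALTER TABLE local MATERIALIZE COLUMN `date`; -- rewrites only this column in existing parts
SELECT mutation_id, is_done
FROM system.mutations
WHERE table = 'local' AND NOT is_done; -- an empty result means the mutation has finished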
SELECT *
FROM local
LIMIT 10
Query id: b5fedb97-a1c8-475f-a674-0b1658c8e889
┌─number─┬────────────────date─┐
│      0 │ 1970-01-01 01:00:00 │
│      1 │ 1970-01-01 01:00:00 │
│      2 │ 1970-01-01 01:00:00 │
│      3 │ 1970-01-01 01:00:00 │
│      4 │ 1970-01-01 01:00:00 │
│      5 │ 1970-01-01 01:00:00 │
│      6 │ 1970-01-01 01:00:00 │
│      7 │ 1970-01-01 01:00:00 │
│      8 │ 1970-01-01 01:00:00 │
│      9 │ 1970-01-01 01:00:00 │
└────────┴─────────────────────┘
The existing rows were filled with 1970-01-01 01:00:00, the default for DateTime (the Unix epoch, rendered in the server's timezone). To change the value that old rows get, you need to declare a default value in the column definition:
ALTER TABLE local
ADD COLUMN IF NOT EXISTS `new_date` DateTime DEFAULT now();
OPTIMIZE TABLE local FINAL;
SELECT *
FROM local
LIMIT 10
Query id: b5ff3afd-78f7-4ea3-8d43-adc7fe14f0a0
┌─number─┬────────────────date─┬────────────new_date─┐
│      0 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      1 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      2 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      3 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      4 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      5 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      6 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      7 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      8 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
│      9 │ 1970-01-01 01:00:00 │ 2022-09-23 13:53:38 │
└────────┴─────────────────────┴─────────────────────┘
10 rows in set. Elapsed: 0.002 sec.
Note that this means new rows will also get the default value if the column is not specified in the insert:
ALTER TABLE local
MODIFY COLUMN `new_date` DateTime DEFAULT yesterday();
INSERT INTO local(number) VALUES (999999999);
SELECT *
FROM local
WHERE number = 999999999
Query id: 02527ad6-4644-42ff-8755-8869a9df30fa
┌────number─┬────────────────date─┬────────────new_date─┐
│ 999999999 │ 1970-01-01 01:00:00 │ 2022-09-22 00:00:00 │
└───────────┴─────────────────────┴─────────────────────┘
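To double-check which default expression is currently attached to each column, you can query system.columns (a quick inspection against the example table above):

SELECT name, type, default_kind, default_expression
FROM system.columns
WHERE database = currentDatabase() AND table = 'local';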