ids =tuple([21, 22, 23, 24, 25])con.execute(f""" CREATE TABLE synth1 AS SELECT * FROM read_csv_auto('/Users/egenn/icloud/Data/synth1.csv') """)con.execute("SELECT * FROM synth1").df()
ID
Age
Group
0
1
49.001420
D
1
2
28.266542
B
2
3
31.025146
A
3
4
25.554986
C
4
5
36.689864
B
...
...
...
...
495
496
36.633847
A
496
497
34.457815
C
497
498
49.096331
A
498
499
31.708578
A
499
500
53.850253
A
500 rows × 3 columns
26.6.2 Read only specific IDs
ids =tuple([21, 22, 23, 24, 25])con.execute(f""" CREATE TABLE synth1f AS SELECT * FROM read_csv_auto('/Users/egenn/icloud/Data/synth1.csv') WHERE ID in {ids}; """)con.execute("SELECT * FROM synth1f").df()
ID
Age
Group
0
21
43.644608
A
1
22
43.836510
A
2
23
51.134057
C
3
24
52.115098
B
4
25
36.516745
C
26.7 Using Queries with the Python API
ids =tuple([21, 22, 23, 24, 25])sq = duckdb.query(f""" SELECT * FROM read_csv_auto('/Users/egenn/icloud/Data/synth1.csv') WHERE ID in {ids}; """)sq
┌───────┬──────────────────┬─────────┐
│ ID │ Age │ Group │
│ int64 │ double │ varchar │
├───────┼──────────────────┼─────────┤
│ 21 │ 43.6446078379911 │ A │
│ 22 │ 43.8365100056002 │ A │
│ 23 │ 51.1340571729247 │ C │
│ 24 │ 52.1150978723465 │ B │
│ 25 │ 36.516745383723 │ C │
└───────┴──────────────────┴─────────┘
You can execute the query at any time:
sq.execute().df()
ID
Age
Group
0
21
43.644608
A
1
22
43.836510
A
2
23
51.134057
C
3
24
52.115098
B
4
25
36.516745
C
26.8 Compose queries
How to specify path, select, and filter programmatically
26.8.1 Select
distinct ="DISTINCT"# set to "" to not filter unique rowssep =","path ="/Users/egenn/icloud/Data/synth1.csv"cols =tuple(["ID", "Age"])cols ='"'+'", "'.join(cols) +'"'ignore_errors =True# cols = "*" # to select all programmaticallysq = duckdb.query(f""" SELECT {distinct + cols} FROM read_csv_auto('{path}', sep = '{sep}', ignore_errors = {ignore_errors}); """)sq.execute().df()
ID
Age
0
2
28.266542
1
31
41.177789
2
70
44.229731
3
92
42.604247
4
115
40.441800
...
...
...
495
445
43.608565
496
450
41.649456
497
454
41.697756
498
475
40.230046
499
484
52.954887
500 rows × 2 columns
26.8.2 Filter
Suppose you have a list of IDs in a list, convert to tuple first
ids =list(range(80, 91))filter_col ="ID"sq = sq.filter(f"{filter_col} in {tuple(ids)}")sq.execute().df()