Channel: Active questions tagged python - Stack Overflow

How to create column based on other DataFrame row filter?

I have a LazyFrame called "hourly_data", which contains an hourly datetime column called "time". I also have a DataFrame called "future_periods", which contains two datetime columns: "start", the start datetime of each future period, and "end", the end datetime of each future period. Importantly, these future periods do not overlap.

I want to add a column called "period" to the hourly_data LazyFrame. Its value should be the integer index of the future_periods row (0 to 9 if there are 10 periods) whose "start" and "end" values enclose the "time" value of that hourly_data row, with both bounds inclusive. If no period contains the time, the value should be null.
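To pin down the intended mapping, here is a plain-Python sketch of the lookup, independent of Polars. It assumes the periods are sorted by start and do not overlap, as stated above. The name `find_period` is a hypothetical helper for illustration, not part of any library, and the sample periods are copied from the example input in this question.

```python
from bisect import bisect_right
from datetime import datetime

# Future periods as (start, end) pairs, sorted by start and non-overlapping.
periods = [
    (datetime(2024, 1, 1, 0), datetime(2024, 1, 31, 23)),
    (datetime(2024, 2, 1, 0), datetime(2024, 2, 28, 23)),
    (datetime(2024, 3, 1, 0), datetime(2024, 3, 31, 23)),
    (datetime(2024, 4, 1, 0), datetime(2024, 5, 31, 23)),
]
starts = [start for start, _ in periods]


def find_period(t):
    """Return the index of the period containing t (inclusive bounds), else None."""
    # Index of the last period whose start is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t <= periods[i][1]:
        return i
    return None
```

Because the periods do not overlap, only the last period that starts at or before a given time can contain it, so a single binary search plus one comparison against that period's end is enough.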

I tried to do the following:

periods = pl.Series(range(future_periods.height))
hourly_data = hourly_data.with_columns(
    (
        pl.when(((future_periods.get_column('start') <= pl.col('time')) & (pl.col('time') <= future_periods.get_column('end'))).any())
        .then(periods.filter(pl.Series((future_periods.get_column('start') <= pl.col('real_time')) & (pl.col('real_time') <= future_periods.get_column('end')))).to_list()[0])
        .otherwise(None)
    ).alias('period')
)

But this gave me the error: TypeError: Series constructor called with unsupported type 'Expr' for the values parameter

What I want to accomplish:

Input:

hourly_data:
┌────────────────────┐
│ time               │
│ ---                │
│ datetime           │
╞════════════════════╡
│ 2024-01-01 00:00:00│
│ 2024-01-01 01:00:00│
│ 2024-01-01 02:00:00│
│         ...        │
│ 2024-03-31 23:00:00│
│ 2024-04-01 00:00:00│
│ 2024-04-01 01:00:00│
│         ...        │
│ 2024-06-01 00:00:00│
└────────────────────┘

future_periods:
┌─────────────────────────┬───────────────────────┐
│ start                   ┆ end                   │
│ ---                     ┆ ---                   │
│ datetime                ┆ datetime              │
╞═════════════════════════╪═══════════════════════╡
│ 2024-01-01 00:00:00     ┆ 2024-01-31 23:00:00   │
│ 2024-02-01 00:00:00     ┆ 2024-02-28 23:00:00   │
│ 2024-03-01 00:00:00     ┆ 2024-03-31 23:00:00   │
│ 2024-04-01 00:00:00     ┆ 2024-05-31 23:00:00   │
└─────────────────────────┴───────────────────────┘

Output:

hourly_data:
┌─────────────────────────┬────────┐
│ time                    ┆ period │
│ ---                     ┆ ---    │
│ datetime                ┆ int    │
╞═════════════════════════╪════════╡
│ 2024-01-01 00:00:00     ┆ 0      │
│ 2024-01-01 01:00:00     ┆ 0      │
│ 2024-01-01 02:00:00     ┆ 0      │
│          ...            ┆ ...    │
│ 2024-03-31 23:00:00     ┆ 2      │
│ 2024-04-01 00:00:00     ┆ 3      │
│ 2024-04-01 01:00:00     ┆ 3      │
│          ...            ┆ ...    │
│ 2024-06-01 00:00:00     ┆ None   │
└─────────────────────────┴────────┘
