Quantcast
Channel: Active questions tagged python - Stack Overflow
Viewing all articles
Browse latest Browse all 13981

merge some columns in a pandas dataframe and duplicate the others

$
0
0

I have a similar problem to how to select all columns from a list in a polars dataframe, but slightly different:

import polars as plimport numpy as npimport stringrng = np.random.default_rng(42)nr = 3letters = list(string.ascii_letters)uppercase = list(string.ascii_uppercase)words, groups = [], []for i in range(nr):    word = ''.join([rng.choice(letters) for _ in range(rng.integers(3, 20))])    words.append(word)    group = rng.choice(uppercase)    groups.append(group)df = pl.DataFrame(    {"a_0": np.linspace(0, 1, nr),"a_1": np.linspace(1, 2, nr),"a_2": np.linspace(2, 3, nr),"b_0": np.random.rand(nr),"b_1": 2 * np.random.rand(nr),"b_2": 3 * np.random.rand(nr),"words": words,"groups": groups,    })print(df)  shape: (3, 8)┌─────┬─────┬─────┬──────────┬──────────┬──────────┬─────────────────┬────────┐│ a_0 ┆ a_1 ┆ a_2 ┆ b_0      ┆ b_1      ┆ b_2      ┆ words           ┆ groups ││ --- ┆ --- ┆ --- ┆ ---      ┆ ---      ┆ ---      ┆ ---             ┆ ---    ││ f64 ┆ f64 ┆ f64 ┆ f64      ┆ f64      ┆ f64      ┆ str             ┆ str    │╞═════╪═════╪═════╪══════════╪══════════╪══════════╪═════════════════╪════════╡│ 0.0 ┆ 1.0 ┆ 2.0 ┆ 0.653892 ┆ 0.234362 ┆ 0.880558 ┆ OIww            ┆ W      ││ 0.5 ┆ 1.5 ┆ 2.5 ┆ 0.408888 ┆ 0.213767 ┆ 1.833025 ┆ KkeB            ┆ Z      ││ 1.0 ┆ 2.0 ┆ 3.0 ┆ 0.423949 ┆ 0.646378 ┆ 0.116173 ┆ NLOAgRxAtjWOHuQ ┆ O      │└─────┴─────┴─────┴──────────┴──────────┴──────────┴─────────────────┴────────┘

I want again to concatenate the columns a_0, a_1,... into a column a, columns b_0, b_1,... into a column b. However, unlike the preceding question, this time a = [a_0; a_1; ...]. I.e., all the elements of a_0 go first, followed by all the elements of a_1, etc. All the columns whose name doesn't end with a _ followed by a digit (in this example, words and groups) must be duplicated enough times to match the length of a. Let nr and nc be the number of rows/columns in df. Then the output dataframe must have m*nr rows (m=3 in this case) and nc-2*(m-1) columns, i.e.

shape: (9, 4)┌─────────────────┬────────┬─────┬──────────┐│ words           ┆ groups ┆ a   ┆ b        ││ ---             ┆ ---    ┆ --- ┆ ---      ││ str             ┆ str    ┆ f64 ┆ f64      │╞═════════════════╪════════╪═════╪══════════╡│ OIww            ┆ W      ┆ 0.0 ┆ 0.653892 ││ KkeB            ┆ Z      ┆ 0.5 ┆ 0.408888 ││ NLOAgRxAtjWOHuQ ┆ O      ┆ 1.0 ┆ 0.423949 ││ OIww            ┆ W      ┆ 1.0 ┆ 0.234362 ││ KkeB            ┆ Z      ┆ 1.5 ┆ 0.213767 ││ NLOAgRxAtjWOHuQ ┆ O      ┆ 2.0 ┆ 0.646378 ││ OIww            ┆ W      ┆ 2.0 ┆ 0.880558 ││ KkeB            ┆ Z      ┆ 2.5 ┆ 1.833025 ││ NLOAgRxAtjWOHuQ ┆ O      ┆ 3.0 ┆ 0.116173 │└─────────────────┴────────┴─────┴──────────┘

How can I do that?


Viewing all articles
Browse latest Browse all 13981

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>