Channel: Active questions tagged python - Stack Overflow

Regex capture group in pandas extract call for single digit followed by single letter

I need to extract the substrings in a pandas Series that match the regex "[3-6]X", i.e. "3X", "4X", "5X", or "6X", from arbitrary strings like "Hello this is an 6X interesting sentence".
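As a sanity check, the pattern can be tried on that sample sentence with the standard `re` module before pandas is involved. This is only a minimal sketch; the variable names are illustrative and not taken from the real code.

```python
import re

# Sample sentence from the question; the variable names are illustrative.
sentence = "Hello this is an 6X interesting sentence"

# [3-6] matches one digit from 3 to 6, and X matches a literal capital X.
pattern = re.compile(r"[3-6]X")

matches = pattern.findall(sentence)
print(matches)  # ['6X']
```

Note that the pattern is case-sensitive, so a lowercase "x" after the digit would not match.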

I have spent hours trying solutions and reading about regex. I am not interested in understanding regex in any detail, and never will be; it's only useful to me in small places, very occasionally. I have also never run into this much difficulty before: I can usually search out a regex for my problem (email-address matching, phone-number matching, etc.), then copy, paste, and modify it with a bit of homework and have a solution in seconds or minutes. I don't know what I'm misunderstanding here, whether it's my regex pattern or how I'm using it with pandas. I realize that Stack Overflow has high standards and that this question may not meet them, but I have spent 3 hours reading and trying solutions, and in 10 years of doing similar work 60 hours a week I have never encountered such a simple problem that posed such great difficulty.

df.some_column.str.extract("([3-6])").fillna(False)

gives the expected result: a match for each of the digits where they occur, and a lot of non-matching rows.

I have tried the following, for example (including chaining astype(str) before the extract, in case that makes a difference), along with many more variations of the regex string that I don't think are helpful to list, since they don't work. I have also tinkered with the findall and contains methods. contains works as expected with "[3-6]X"; however, findall and extract seem to need something different to match the same patterns.

df.some_column.str.extract("([3-6]X)").fillna(False)

df.some_column.str.extract("([3-6]{X})").fillna(False)
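For comparison, here is a minimal sketch of these two calls on a small, made-up Series (the data and the name `s` are assumptions standing in for the real column). On this sample, `([3-6]X)` does return the matched substring, which suggests checking whether the real column contains exactly a capital `X` directly after the digit. In Python's regex syntax `{X}` is not a quantifier, so `[3-6]{X}` looks for literal braces.

```python
import pandas as pd

# Hypothetical data standing in for df.some_column.
s = pd.Series([
    "Hello this is an 6X interesting sentence",
    "no match here",
    "a 3X and also 5X",
])

# expand=False returns a Series; only the first match per row is extracted.
first_match = s.str.extract(r"([3-6]X)", expand=False)

# findall returns a list of every match per row.
all_matches = s.str.findall(r"[3-6]X")

# {X} is read as literal braces, so this finds nothing in the sample.
braces = s.str.extract(r"([3-6]{X})", expand=False)

print(first_match.tolist())
print(all_matches.tolist())
```

Rows without a match come back as NaN from `extract` and as an empty list from `findall`.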

I also threw in a word boundary, which caused the extract to fail, even though every instance of the numbers in my data is space-separated.

df.some_column.str.extract("(\b[3-6])").fillna(False)
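One thing worth noting about the `\b` attempt: in an ordinary (non-raw) Python string literal, `\b` is the backspace character rather than the regex word boundary, so the pattern above asks for a backspace before the digit. A minimal sketch with the standard `re` module, using illustrative names and with the `X` added to the pattern, shows the difference a raw string makes:

```python
import re

sentence = "Hello this is an 6X interesting sentence"

# Non-raw string: "\b" becomes the backspace character (\x08).
plain = "(\b[3-6]X)"
# Raw string: the backslash reaches the regex engine as a word boundary.
raw = r"(\b[3-6]X)"

plain_result = re.findall(plain, sentence)
raw_result = re.findall(raw, sentence)
print(plain_result)  # []
print(raw_result)    # ['6X']
```

The same raw-string prefix can be used on the pattern passed to `str.extract`.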

