Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex match to newline characters on windows10 #16

Open
ChandlerLutz opened this issue Jan 20, 2019 · 2 comments
Open

Regex match to newline characters on windows10 #16

ChandlerLutz opened this issue Jan 20, 2019 · 2 comments

Comments

@ChandlerLutz
Copy link

Hi Anthony,

Thank you for the wonderful book.

I'm working through Ch. 13 on regular expressions on Windows10 using postgreSQL10 in pgAdmin4

I was having trouble getting the regular expressions to work for example in code listing 13-7.

I believe that the issue is related to the way new line characters are handled on windows.

This also may be related to the following issues (I am using a clean version of the imported csv file from the crime data):

#4 and #10

I was able to solve this issue with this SO answer: https://stackoverflow.com/a/20056634. Apparently windows may match newlines to \r\n

Here is my sql code for the crime time and the output, where crime_type_orig is the original from the book and the other crime_type2 and crime_type3 are based on the above SO answer:

select 
	regexp_match(original_text, '\n(?:\w+ \w+|\w+)\n(.*):') as crime_type_orig,
	-- See https://stackoverflow.com/a/20056634
	regexp_match(original_text, '\r\n(?:\w+ \w+|\w+)\r\n(.*):') as crime_type2,
	-- Based on https://stackoverflow.com/a/20056634
	regexp_match(original_text, '(?:\r\n|\r|\n)(?:\w+ \w+|\w+)(?:\r\n|\r|\n)(.*):') as crime_type3
from crime_reports;

Here is the output from pgAdmin

image


@anthonydb
Copy link
Owner

Hi, @ChandlerLutz! Thanks for sharing this and glad you got it working.

Did you experience a similar issue with any of the other regexes, or just the one for the crime_type?

I tried just now with a new install of PostgreSQL 11 and pgAdmin 4.1 on a on a Windows 7 laptop and a fresh download of the data from this repo, and the book's code worked as expected. So, I'm a bit puzzled as to what's occurring on your side. It sounds to me like you did everything as expected, including a clean download of the file.

Please let me know your thoughts, and we'll leave this issue open in case others run into the same.

@ChandlerLutz
Copy link
Author

Hi Anthony,

So, I essentially ran in to the problem every time that a regex pattern contain the newline character \n. It might be some setting on my Windows10 machine (If I figure it out, I'll let you know). In the end I just replaced every \n with the negated capture group (?:\r\n|\r|\n). This worked well and gave me a little more regex practice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants