
Help With This Regular Expression: - Programming - Nairaland


Help With This Regular Expression: by Fayimora(m): 8:00pm On Jul 20, 2011
I'm trying to parse data that looks like this:

PET SHELTER ADDRESS LIST
Master created 2011-07-20
----------------------------------------
Eustace Golightly, 1725 Mansfield Court, Success, AK    72470
Reginald Winterbottom, 190 Porter Road, Peabody, MA 01960
Penelope Smallbone,8462 Byron Way, Woonsocket, RI  02895
Cuthbert Eggleston, 9084 Constable Drive, Tuba City AZ 86045-9542
Barack Obama, 1600 Pennsylvania Avenue, Washington, DC 20500
----------------------------------------
Report Ends
The data is about 4000 lines long, so I couldn't think of a better way than regular expressions. I'm using Perl to do the manipulation, and so far this is my regular expression:

/\A([^,]),\s?([^,]+),\s([^,]+),?\s+([A-Z][A-Z])\s+(\d{5})(-\d{4})?\Z/

I have been tweaking it for an hour now, but the best I have gotten is no result. Any help would be appreciated; something somewhere is not right :(

The output should be like this:

Name: Barack Obama
Address: 1600 Pennsylvania Avenue
City: Washington
State: DC
Zip Code: 20500


That's across all lines.
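
For anyone puzzling over why the pattern above returns nothing: its first group, `([^,])`, has no quantifier, so it accepts exactly one non-comma character before the first comma. A minimal sketch in Python (an assumption for illustration; the poster used Perl, and Python's `\Z` is stricter about a trailing newline than Perl's) shows the effect. The one-letter record is hypothetical and exists only to show what group 1 does accept.

```python
import re

# The pattern exactly as posted. Note that group 1, ([^,]), has no "+".
ORIGINAL = re.compile(
    r"\A([^,]),\s?([^,]+),\s([^,]+),?\s+([A-Z][A-Z])\s+(\d{5})(-\d{4})?\Z"
)

line = "Barack Obama, 1600 Pennsylvania Avenue, Washington, DC 20500"
full_name = ORIGINAL.match(line)

# Hypothetical record with a one-letter name, only to show what group 1 accepts.
one_letter = ORIGINAL.match("B, 1600 Pennsylvania Avenue, Washington, DC 20500")

print(full_name)            # None: "Barack Obama" is longer than one character
print(one_letter.group(1))  # B
```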
Re: Help With This Regular Expression: by worldbest(m): 8:22pm On Jul 20, 2011
You'll get quick answers if you get help from StackOverflow.
Re: Help With This Regular Expression: by Fayimora(m): 8:24pm On Jul 20, 2011
Never mind, already figured it out :D

Yeah, I could get answers from StackOverflow, but it's better I allow people here to give it a try.
Re: Help With This Regular Expression: by logica(m): 9:07pm On Jul 20, 2011
It's easy enough to simply use a StringTokenizer (comma separated) and then split the last token using a space separator to get the state and zip code. But I already see what seems to be an error in the data which will make this fail: a missing comma (marked here in brackets):

Cuthbert Eggleston, 9084 Constable Drive, Tuba City AZ[,] 86045-9542

You could also use any standard CSV file processor.
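
A rough sketch of that tokenizer idea, written in Python with `str.split` standing in for Java's StringTokenizer (an assumption; no code for it appears in the thread). The function name `parse_by_splitting` is hypothetical. It also shows the failure described above: the Tuba City line yields only three comma-separated tokens.

```python
def parse_by_splitting(line):
    """Split on commas, then split the last token into state and zip code.

    Returns a 5-tuple, or None when the line does not have the expected shape.
    """
    tokens = [token.strip() for token in line.split(",")]
    if len(tokens) != 4:
        return None
    name, street, city, last = tokens
    parts = last.split()  # any run of spaces separates state from zip code
    if len(parts) != 2:
        return None
    state, zip_code = parts
    return name, street, city, state, zip_code


good = parse_by_splitting("Eustace Golightly, 1725 Mansfield Court, Success, AK    72470")
bad = parse_by_splitting("Cuthbert Eggleston, 9084 Constable Drive, Tuba City AZ 86045-9542")

print(good)  # ('Eustace Golightly', '1725 Mansfield Court', 'Success', 'AK', '72470')
print(bad)   # None, because the missing comma leaves only three tokens
```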
Re: Help With This Regular Expression: by naijaswag1: 10:35pm On Jul 20, 2011
logica:

It's easy enough to simply use a StringTokenizer (comma separated) and then split the last token using a space separator to get the state and zip code. But I already see what seems to be an error in the data which will make this fail: a missing comma (marked here in brackets):

Cuthbert Eggleston, 9084 Constable Drive, Tuba City AZ[,] 86045-9542

You could also use any standard CSV file processor.

He has got a serious point from my perspective. I don't know regular expressions, so most times I have worked with an array and a StringTokenizer. The line is tokenized on the commas between the name, street, city, and state plus zip code; that's four tokens. Then you split the state and zip code, use for loops and ifs to concatenate the strings, and run a while or for loop to get it done. This is naija_swag for you, I do my things manually.
Re: Help With This Regular Expression: by Fayimora(m): 10:42pm On Jul 20, 2011
Errm, unfortunately you can't do that here. The text file contains up to 4000 lines, and not all of them are addresses. Some lines are headers and some are footers, just like you have seen above. It's going to be a serious waste of time trying to split strings. Also, there is no straightforward format for the file. Fields can be separated by any number of spaces, and some addresses have extra stuff, as you can see, so it's just a waste of time. That's where regular expressions come in. Also, you have to consider speed and efficiency. I was able to parse 3000 addresses in less than 4 secs. How about that :D

The final regular expression is
/\A([^,]+),\s?([^,]+),\s*([^,]+),?\s+([A-Z][A-Z])\s+(\d{5})(-\d{4})?\Z/

I parsed the text file with Perl. If at all you want to see the code, I can post it here :D
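
The Perl script itself never appears in the thread, so the following is only a sketch of how the final pattern could be applied, written in Python as an assumption. `parse_report` and the dictionary keys are illustrative names; the sample text is the report from the first post. Each line is stripped before matching because Python's `\Z` matches only at the very end of the string.

```python
import re

# The final pattern from the post, unchanged.
ADDRESS = re.compile(
    r"\A([^,]+),\s?([^,]+),\s*([^,]+),?\s+([A-Z][A-Z])\s+(\d{5})(-\d{4})?\Z"
)

REPORT = """\
PET SHELTER ADDRESS LIST
Master created 2011-07-20
----------------------------------------
Eustace Golightly, 1725 Mansfield Court, Success, AK    72470
Reginald Winterbottom, 190 Porter Road, Peabody, MA 01960
Penelope Smallbone,8462 Byron Way, Woonsocket, RI  02895
Cuthbert Eggleston, 9084 Constable Drive, Tuba City AZ 86045-9542
Barack Obama, 1600 Pennsylvania Avenue, Washington, DC 20500
----------------------------------------
Report Ends
"""


def parse_report(text):
    """Return one dict per address line; headers and footers simply fail to match."""
    records = []
    for line in text.splitlines():
        match = ADDRESS.match(line.strip())
        if match is None:
            continue
        name, street, city, state, zip5, plus4 = match.groups()
        records.append({
            "Name": name.strip(),
            "Address": street.strip(),
            "City": city.strip(),
            "State": state,
            "Zip Code": zip5 + (plus4 or ""),  # keep the optional -NNNN suffix
        })
    return records


records = parse_report(REPORT)
for record in records:
    for label, value in record.items():
        print(f"{label}: {value}")
    print()
```

Because the comma before the state is optional in the pattern, the Tuba City line is parsed too, with "Tuba City" as the city and "86045-9542" as the zip code.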

naija_swag:
This is naija_swag for you, I do my things manually.
Don't forget that it is advised to use available classes (most especially in Java) rather than coding it yourself. If some class in the API has a method that calculates the square root of a number, use it and don't write yours. The same thing applies here: when you have regular expressions to do the job for you, why pick the more expensive way? :D
Re: Help With This Regular Expression: by logica(m): 10:47pm On Jul 20, 2011
LOL. So you think regular expressions don't themselves result in loops? It's just that it's transparent to you. Headers and footers can easily be skipped by ensuring the tokenizer produces an expected number of tokens (in your case 4). Well, you know the format of the records you are dealing with.
Re: Help With This Regular Expression: by Fayimora(m): 10:52pm On Jul 20, 2011
Really? And what if you have a line of instruction formatted just the way the actual data is formatted? lol
Re: Help With This Regular Expression: by logica(m): 10:54pm On Jul 20, 2011
I am quite surprised you didn't realize this also applies to your regular expression - it (the hypothetical line of instruction) will also be matched. :)
Re: Help With This Regular Expression: by Fayimora(m): 10:58pm On Jul 20, 2011
But all I need to do is add 4 characters and it won't be :D. If I go your way then I'm dead, because it's not going to be that easy manipulating it.
Re: Help With This Regular Expression: by logica(m): 11:12pm On Jul 20, 2011
Let me tell you a bit about reg exp: if you are not an expert in them, don't rely too much on them. They get complex very quickly and you will have a hard time knowing what the problem is. I'll use your reg exp to prove my point:
Your regular expression will likely not match a name like: "William S Gates 3" or "Jackson & Sons" or maybe even "Mary-Louise Parker"  which are valid names. If it does match (I didn't confirm), congratulations. It's just for you to know - there are standard regular expressions for various data classifications from names to email addresses. You might want to borrow from them.
Re: Help With This Regular Expression: by naijaswag1: 11:26pm On Jul 20, 2011
I think I'm buying the regular expression thing if it makes the program run faster. I have a problem I am trying to solve where I have to search 200,000 words for some patterns, but I will try and do it in Java. My initial solution with loops takes an eternity to run.
Re: Help With This Regular Expression: by Fayimora(m): 11:31pm On Jul 20, 2011
Hehehe yeah, I get where you are coming from. Before I got the hang of regex, I always used an alternative. I learnt Perl because I wanted to learn regex, and it paid off. My regex matches your names ;D. I know they get very complex, and that's all the more reason why I like them; they make me think. lol.

Of course I'm not going to be writing a regex to validate an email (though I already wrote one) when I could just get 3 from the internet and modify them to my taste if necessary. But you know there are times when you just have to face reality. Also, I think regex is one place where you come to believe that "practice actually makes perfect".

Thanks for the advice
Re: Help With This Regular Expression: by Fayimora(m): 11:33pm On Jul 20, 2011
naija_swag:

I think I'm buying the regular expression thing if it makes the program run faster. I have a problem I am trying to solve where I have to search 200,000 words for some patterns, but I will try and do it in Java. My initial solution with loops takes an eternity to run.

Cool. Funny enough, in as much as they are fast, they can also be very slow. It just depends on how you use them. Also, the way you construct them matters. One mistake people make is trying to match things that never happen; it's like entering a loop that would end, but only after a loooong while.
Re: Help With This Regular Expression: by logica(m): 11:35pm On Jul 20, 2011
naija_swag:

I think I'm buying the regular expression thing if it makes the program run faster. I have a problem I am trying to solve where I have to search 200,000 words for some patterns, but I will try and do it in Java. My initial solution with loops takes an eternity to run.
RegExp will not necessarily make your program run faster, as they internally loop just as you would, and in some cases can actually be less efficient (depending, of course, on your expertise, e.g. knowing which expression does a greedy match, et cetera). I am not saying you shouldn't master Reg Exp though. It can ease tasks sometimes, when you know exactly what you are doing, but in most cases where you may want to use Reg Exp, it is no more than killing a mosquito with a bazooka.
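
One way to see how construction affects speed is the textbook nested-quantifier case. This example is not from the thread; it is a common illustration, sketched in Python as an assumption, and the helper name `timed` is hypothetical. Both patterns are asked the same question and both fail, but a backtracking engine tries far more ways of splitting the run of "a" characters for the nested one before giving up. The timings are printed, not asserted, because they vary by machine.

```python
import re
import time

# The trailing "b" guarantees that neither pattern can ever match.
subject = "a" * 18 + "b"


def timed(pattern):
    """Return (match result, elapsed seconds) for one re.match call."""
    start = time.perf_counter()
    result = re.match(pattern, subject)
    return result, time.perf_counter() - start


nested, nested_secs = timed(r"(a+)+$")  # nested quantifiers: many ways to split the a's
flat, flat_secs = timed(r"a+$")         # same question, a single quantifier

print(nested, flat)  # None None
print(f"nested: {nested_secs:.4f}s  flat: {flat_secs:.6f}s")
```

Making the subject a few characters longer increases the work for the nested pattern sharply, which is the "loop that ends after a long while" effect described above.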
Re: Help With This Regular Expression: by Fayimora(m): 11:39pm On Jul 20, 2011
OK, I have finished the task. Who wants to try it out? I have 15 lines of Perl code and a text file with about 11k lines, and it does the job in approx 1 sec :D
Re: Help With This Regular Expression: by logica(m): 11:50pm On Jul 20, 2011
Fayimora:

My regex matches your names ;D
Oh yeah, you are matching everything that is not a comma, which is not very accurate (which is probably the only reason you should use Reg Exp - for validation enforcement and data integrity). Your current Reg Exp will even match such names as "235258235235" and "&@$!@*&WWR".

If they matched the previous names, I'm sure they won't match these:

B. A. Abaniwonda - (has the dot special character).

Adeola (Mrs) - (has brackets).

. . . and several other names that have special characters (meta-characters).
Re: Help With This Regular Expression: by Fayimora(m): 11:56pm On Jul 20, 2011
Hahahahha it matches them all. Try again! :P
Re: Help With This Regular Expression: by logica(m): 12:04am On Jul 21, 2011
Fayimora:

Hahahahha it matches them all. Try again! :P
True. It should match them since they are not in the Reg Exp itself. But it still matches the nonsensical names. I am putting you through all the things you will have to consider when you use regular expressions - the various possibilities you may need to check might not be worth it, but in this case, it works for you. :)
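
Both claims can be checked directly. A small sketch, assuming Python instead of Perl, that attaches each disputed name to the address tail of the Obama line from the sample data (`TAIL` and `results` are illustrative names):

```python
import re

# The final pattern from earlier in the thread, unchanged.
ADDRESS = re.compile(
    r"\A([^,]+),\s?([^,]+),\s*([^,]+),?\s+([A-Z][A-Z])\s+(\d{5})(-\d{4})?\Z"
)

names = [
    "William S Gates 3",
    "Jackson & Sons",
    "Mary-Louise Parker",
    "B. A. Abaniwonda",
    "Adeola (Mrs)",
    "235258235235",   # nonsensical, but contains no comma
    "&@$!@*&WWR",     # nonsensical, but contains no comma
]

TAIL = ", 1600 Pennsylvania Avenue, Washington, DC 20500"

# [^,]+ accepts any run of non-comma characters, so every entry matches.
results = {name: ADDRESS.match(name + TAIL) is not None for name in names}

for name, matched in results.items():
    print(f"{name!r}: {matched}")
```

Every entry prints True, which supports both posters: the pattern handles dots and brackets in names, and it performs no validation of what a name looks like.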
Re: Help With This Regular Expression: by Fayimora(m): 12:06am On Jul 21, 2011
Yaaaay! I passed the test :D
Re: Help With This Regular Expression: by logica(m): 12:14am On Jul 21, 2011
Not quite, but barely. But that is a problem of the file format itself. You should stick to standards and use a real CSV format because what you have appears not to be. Otherwise you will have problems with data having commas e.g a name like "Abiodun, Dare Alade" (which are escaped in the CSV format).
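
To illustrate the escaping point, here is a minimal sketch using Python's standard `csv` module (the choice of language and the sample row are assumptions, built from the name above and the Obama address in the sample data). The writer quotes the field that contains a comma, and the reader restores it as a single value, while a naive split on commas breaks it apart.

```python
import csv
import io

fields = ["Abiodun, Dare Alade", "1600 Pennsylvania Avenue", "Washington", "DC 20500"]

# Write one row; the csv module quotes the field containing a comma.
buffer = io.StringIO()
csv.writer(buffer).writerow(fields)
text = buffer.getvalue()
print(text.strip())  # "Abiodun, Dare Alade",1600 Pennsylvania Avenue,Washington,DC 20500

# Read it back: the quoted name comes out as one field again.
row = next(csv.reader(io.StringIO(text)))
print(row)

# A naive split ignores the quotes and produces five pieces instead of four.
naive = text.strip().split(",")
print(len(row), len(naive))  # 4 5
```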
Re: Help With This Regular Expression: by Fayimora(m): 12:16am On Jul 21, 2011
OK, I think it's high time I asked. What exactly is this CSV of a thing? Anytime I open a CSV file it starts opening in Excel, so I just close it right away. What exactly is it? Also, I prefer using XML :D
Re: Help With This Regular Expression: by logica(m): 12:21am On Jul 21, 2011
You always close it? You think it's a virus? Nah. It tries to open in Excel because CSV is a spreadsheet format, and of course on Windows your standard spreadsheet app is Excel.

CSV = Comma Separated Values. It is merely a standard (formatted) text file containing data records separated by commas and funny enough, that is exactly the kind of data you are dealing with here; so any CSV file reader will do the work you are trying to do. (There are literally millions of them online for download and I even reinvented the wheel too and I implemented 2 using 2 different strategies for experiment and control  - Reg Exp and Tokenizer - and guess which was faster. . .).

XML is not generally human readable even if it is much more flexible than CSV. But CSV is human readable, meaning that anybody who knows how to use Excel can edit the file. XML will require much more expertise. In general terms, XML is useful in a module-to-module interface, but CSV is more useful in a human-to-computer interface. Also, XML files will generally be larger than the corresponding CSV files.

Pros and cons. . .
Re: Help With This Regular Expression: by Fayimora(m): 12:39am On Jul 21, 2011
Hahahha naa, I know it ain't no virus. I just don't have time for Excel, and I usually think it's not opening with the right program. Just checked it out now. Yeah, in terms of giving someone a file to read, then maybe your CSV is better, but structurally and from a programming point of view, XML should be better.
Re: Help With This Regular Expression: by candylips(m): 9:44am On Aug 03, 2011
CSV files typically open in Excel because of the Windows file association.

I think having a good understanding of reg-ex is very important for a developer.

There are many instances where you will need to quickly parse files, e.g. logs, for specific patterns. These types of files will certainly not be in CSV format.
Re: Help With This Regular Expression: by iGravity(m): 10:39am On Aug 03, 2011
A good tool to test regexes is Rubular - www.rubular.com
I program in Ruby so I am kinda biased towards the language's tools.
Re: Help With This Regular Expression: by logica(m): 10:51am On Aug 03, 2011
candylips:

There are many instances where you will need to quickly parse files, e.g. logs, for specific patterns.
That's the job of a System Admin or Analyst, but yes it's good to master Regex.
Re: Help With This Regular Expression: by candylips(m): 12:06pm On Aug 03, 2011
I've had the painful job of integrating with a legacy system that keeps its data in a proprietary data format...

The only way I could get the data I wanted was to scrape its files for specific patterns.
Re: Help With This Regular Expression: by Fayimora(m): 5:19pm On Aug 03, 2011
I program in Ruby too; I use Rubular when working with Rails. However, I don't like it, lol. What happens if you are not online or have a bad network? I have 3 different tools (Mac widgets) which I still think are better :D
Re: Help With This Regular Expression: by iGravity(m): 9:36pm On Aug 03, 2011
Fayimora:

I program in Ruby too; I use Rubular when working with Rails. However, I don't like it, lol. What happens if you are not online or have a bad network? I have 3 different tools (Mac widgets) which I still think are better :D

True, Rubular is a webapp. There are desktop apps too that you can get. Which tools do you use? I have RegExpr which is an RIA provisioned for OSX - serves me well
Re: Help With This Regular Expression: by Fayimora(m): 10:07pm On Aug 03, 2011
I have some funny ones, but the main one I make use of (because it works very well) is Regex Widget.
Re: Help With This Regular Expression: by Nobody: 10:51pm On Aug 03, 2011
RegEx is the scariest stuff in my life. Can't say I'm good at it; can't even learn it with a full haircut, because I'll spend the whole day scratching my head.
