Help With This Regular Expression: by Fayimora(m): 8:00pm On Jul 20, 2011 |
I'm trying to pick out data that looks like this: The data is about 4000 lines long, so I couldn't think of a better way than regular expressions. I'm using Perl to do the manipulation, and so far this is my regular expression: /\A([^,]),\s?([^,]+),\s([^,]+),?\s+([A-Z][A-Z])\s+(\d{5})(-\d{4})?\Z/ I have been manipulating it for an hour now, but the best I have gotten is no result. Any help would be appreciated. Something somewhere is not right. The output should be like this: Name: Barack Obama Address: 1600 Pennsylvania Avenue City: Washington State: DC Zip Code: 20500 That's across all lines. |
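One detail worth checking in the pattern above: the first capture group, ([^,]), has no quantifier, so it can only match a single character before the first comma. The sketch below is in Python rather than Perl, and the sample line is an assumption reconstructed from the expected output. It compares the posted pattern with a variant that adds "+" to the first group.

```python
import re

# Assumed sample line, reconstructed from the expected output in the post.
SAMPLE = "Barack Obama, 1600 Pennsylvania Avenue, Washington, DC 20500"

# The pattern as posted: the first group matches exactly ONE non-comma character.
POSTED = re.compile(
    r"\A([^,]),\s?([^,]+),\s([^,]+),?\s+([A-Z][A-Z])\s+(\d{5})(-\d{4})?\Z"
)

# Same pattern with "+" added to the first group (illustrative variant).
WITH_PLUS = re.compile(
    r"\A([^,]+),\s?([^,]+),\s([^,]+),?\s+([A-Z][A-Z])\s+(\d{5})(-\d{4})?\Z"
)

posted_match = POSTED.match(SAMPLE)
fixed_match = WITH_PLUS.match(SAMPLE)

print(posted_match)          # no match: the name is longer than one character
print(fixed_match.groups())  # name, street, city, state, zip, optional +4
```

With the one-character group, any line whose name is longer than a single character fails at the first comma, which would explain getting no result at all.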
Re: Help With This Regular Expression: by worldbest(m): 8:22pm On Jul 20, 2011 |
You'll get quick answers if you get help from StackOverflow. |
Re: Help With This Regular Expression: by Fayimora(m): 8:24pm On Jul 20, 2011 |
Never mind, I already figured it out. Yeah, I could get answers from StackOverflow, but it's better I allow people here to give it a try. |
Re: Help With This Regular Expression: by logica(m): 9:07pm On Jul 20, 2011 |
It's easy enough to simply use a StringTokenizer (comma separated) and then split the last token using a space separator to get the state and zip code. But I already see what seems to be an error in the data which will make this fail, a missing comma: Cuthbert Eggleston, 9084 Constable Drive, Tuba City AZ[,] 86045-9542 You could also use any standard CSV file processor. |
Re: Help With This Regular Expression: by naijaswag1: 10:35pm On Jul 20, 2011 |
logica: He has got a serious point from my perspective. I don't know regular expressions, so most times I have had to work with an array and a StringTokenizer. The name, street, city, and state-plus-zip-code get tokenized; that's four tokens split on commas. Then you split the state and zip code, use for loops and ifs to concatenate the strings, and run a while or for loop to get it done. This is naija_swag for you: I do my things manually. |
Re: Help With This Regular Expression: by Fayimora(m): 10:42pm On Jul 20, 2011 |
Errm, unfortunately you can't do that here. The text file contains stuff up to 400 lines, and not all of them are addresses. Some lines are headers and some are footers, just like you have seen above, so it's going to be a serious waste of time trying to split strings. Also, there is no straightforward format for the file: fields can be separated by any number of spaces, and some addresses have extra stuff on them, as you can see. That's where regular expressions come in. You also have to consider speed and efficiency: I was able to parse 3000 addresses in less than 4 secs. How about that? The final regular expression is /\A([^,]+),\s?([^,]+),\s*([^,]+),?\s+([A-Z][A-Z])\s+(\d{5})(-\d{4})?\Z/ I parsed the text file with Perl; if you want to see the code, I can post it here. naija_swag: Don't forget that it is advised to use available classes (most especially in Java) rather than coding things yourself. If some class in the API has a method that calculates the square root of a number, use it and don't write yours. The same thing applies here: when you have regular expressions to do the job for you, why pick the more expensive way? |
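As a rough illustration of how the final pattern picks address lines out of a mixed file, here is a minimal sketch in Python (the thread's actual code is Perl and was not posted). The header and footer lines, the exact layout of the sample addresses, and the helper name parse_line are all assumptions made for the example.

```python
import re

# Python port of the final pattern from the post (the original is a Perl regex).
ADDRESS = re.compile(
    r"\A([^,]+),\s?([^,]+),\s*([^,]+),?\s+([A-Z][A-Z])\s+(\d{5})(-\d{4})?\Z"
)

FIELDS = ("Name", "Address", "City", "State", "Zip Code")


def parse_line(line):
    """Return a dict of address fields, or None for non-address lines."""
    # strip() removes the trailing newline; Python's \Z only matches at the very end.
    match = ADDRESS.match(line.strip())
    if match is None:
        return None
    name, street, city, state, zip5, plus4 = match.groups()
    # Append the optional "-1234" suffix to the 5-digit zip when present.
    return dict(zip(FIELDS, (name, street, city, state, zip5 + (plus4 or ""))))


# Hypothetical input: header and footer lines mixed with address lines.
SAMPLE_LINES = [
    "CUSTOMER ADDRESS LIST",
    "Barack Obama, 1600 Pennsylvania Avenue, Washington, DC 20500",
    "Cuthbert Eggleston, 9084 Constable Drive, Tuba City AZ 86045-9542",
    "End of report",
]

records = [rec for rec in map(parse_line, SAMPLE_LINES) if rec is not None]

for rec in records:
    for field in FIELDS:
        print(f"{field}: {rec[field]}")
    print()
```

Lines without the expected comma-separated shape simply fail to match and are skipped, and the optional comma before the state lets both sample layouts through.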
Re: Help With This Regular Expression: by logica(m): 10:47pm On Jul 20, 2011 |
LOL. So you think regular expressions don't themselves result in loops? It's just that it's transparent to you. Headers and footers can easily be skipped by ensuring the tokenizer produces an expected number of tokens (in your case 4). Well, you know the format of the records you are dealing with. |
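The tokenizer approach described here can be sketched as follows. This is Python using str.split in place of Java's StringTokenizer, and it assumes a "Name, Address, City, ST ZIP" layout; the function name and the sample lines are hypothetical.

```python
EXPECTED_TOKENS = 4  # name, street, city, "state zip"


def tokenize_line(line):
    """Split on commas; return None unless the line has the expected shape."""
    tokens = [tok.strip() for tok in line.split(",")]
    if len(tokens) != EXPECTED_TOKENS:
        return None  # header, footer, or malformed record
    name, street, city, state_zip = tokens
    # The last token is assumed to look like "DC 20500": state, space, zip.
    parts = state_zip.split()
    if len(parts) != 2:
        return None
    state, zip_code = parts
    return {"Name": name, "Address": street, "City": city,
            "State": state, "Zip Code": zip_code}


good = tokenize_line("Barack Obama, 1600 Pennsylvania Avenue, Washington, DC 20500")
header = tokenize_line("CUSTOMER ADDRESS LIST")
# A record with no comma between city and state yields only three tokens.
short = tokenize_line("Cuthbert Eggleston, 9084 Constable Drive, Tuba City AZ 86045-9542")

print(good)
print(header, short)
```

The token count check skips headers and footers, but it also rejects a record with a missing comma, which is the failure case raised earlier in the thread.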
Re: Help With This Regular Expression: by Fayimora(m): 10:52pm On Jul 20, 2011 |
really? and what if you have a line of instruction formatted just the way the actual data is formatted? lol |
Re: Help With This Regular Expression: by logica(m): 10:54pm On Jul 20, 2011 |
I am quite surprised you didn't realize this also applies to your regular expression - it (the hypothetical line of instruction) will also be matched. |
Re: Help With This Regular Expression: by Fayimora(m): 10:58pm On Jul 20, 2011 |
But all I need to do is add 4 characters and it won't be. If I go your way then I'm dead, because it's not going to be that easy manipulating it. |
Re: Help With This Regular Expression: by logica(m): 11:12pm On Jul 20, 2011 |
Let me tell you a bit about reg exp: if you are not an expert in them, don't rely too much on them. They get complex very quickly and you will have a hard time knowing what the problem is. I'll use your reg exp to prove my point: Your regular expression will likely not match a name like: "William S Gates 3" or "Jackson & Sons" or maybe even "Mary-Louise Parker" which are valid names. If it does match (I didn't confirm), congratulations. It's just for you to know - there are standard regular expressions for various data classifications from names to email addresses. You might want to borrow from them. |
Re: Help With This Regular Expression: by naijaswag1: 11:26pm On Jul 20, 2011 |
I think I'm buying the regular expression thing if it makes the program run faster. I have a problem I'm trying to solve where I have to search 200,000 words for some patterns, but I will try to do it in Java. My initial solution with loops takes an eternity to run. |
Re: Help With This Regular Expression: by Fayimora(m): 11:31pm On Jul 20, 2011 |
Hehehe, yeah, I get where you are coming from. Before I got the hang of regex, I always used an alternative. I learnt Perl because I wanted to learn regex, and it paid off. My regex matches your names. I know they get very complex, and that's all the more reason why I like them: they make me think, lol. Of course I'm not going to write a regex to validate an email (though I already wrote one) when I could just get 3 from the internet and modify them to my taste if necessary. But you know there are times when you just have to face reality. Also, I think regex is one place where you come to believe that "practice actually makes perfect". Thanks for the advice. |
Re: Help With This Regular Expression: by Fayimora(m): 11:33pm On Jul 20, 2011 |
naija_swag: Cool. Funny enough, as fast as they can be, they can also be very slow. It just depends on how you use them and how you construct them. One mistake people make is trying to match things that never happen; it's like entering a loop that would end, but only after a loooong while. |
Re: Help With This Regular Expression: by logica(m): 11:35pm On Jul 20, 2011 |
naija_swag: RegExp will not necessarily make your program run faster, as it internally loops just as you would, and in some cases it can actually be less efficient (depending, of course, on your expertise, e.g. knowing which expression does a greedy match, et cetera). I am not saying you shouldn't master Reg Exp though. It can ease tasks sometimes, when you know exactly what you are doing, but in most cases where you may want to use Reg Exp, it is no more than killing a mosquito with a bazooka. |
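To make the point about greedy matching concrete, here is a small Python sketch. The sample text is made up; it compares a greedy group, a lazy group, and the negated character class used in the thread's pattern.

```python
import re

TEXT = "Tuba City, AZ, 86045"  # made-up sample text

# Greedy: ".+" grabs the whole string, then backs off to the LAST comma.
greedy = re.match(r"(.+),", TEXT).group(1)

# Lazy: ".+?" grows one character at a time and stops at the FIRST comma.
lazy = re.match(r"(.+?),", TEXT).group(1)

# Negated class: "[^,]+" can never cross a comma, so no backing off is needed here.
negated = re.match(r"([^,]+),", TEXT).group(1)

print(repr(greedy))
print(repr(lazy))
print(repr(negated))
```

The greedy form captures "Tuba City, AZ", while the other two capture only "Tuba City". Which quantifier you choose changes both what gets captured and how much backtracking the engine has to do.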
Re: Help With This Regular Expression: by Fayimora(m): 11:39pm On Jul 20, 2011 |
OK, I have finished the task. Who wants to try it out? I have about 15 lines of Perl code and a text file with about 11k lines, and it does the job in approx 1 sec. |
Re: Help With This Regular Expression: by logica(m): 11:50pm On Jul 20, 2011 |
Fayimora:Oh yeah, you are matching everything that is not a comma, which is not very accurate (which is probably the only reason you should use Reg Exp - for validation enforcement and data integrity). Your current Reg Exp will even match such names as "235258235235" and "&@$!@*&WWR". If they matched the previous names, I'm sure they won't match these: B. A. Abaniwonda - (has the dot special character). Adeola (Mrs) - (has brackets). . . . and several other names that have special characters (meta-characters). |
Re: Help With This Regular Expression: by Fayimora(m): 11:56pm On Jul 20, 2011 |
Hahahahha it matches them all, Try again! |
Re: Help With This Regular Expression: by logica(m): 12:04am On Jul 21, 2011 |
Fayimora:True. It should match them since they are not in the Reg Exp itself. But it still matches the nonsensical names. I am putting you through all the things you will have to consider when you use regular expressions - the various possibilities you may need to check might not be worth it, but in this case, it works for you. |
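The names discussed in the last few posts can be checked directly against the name group [^,]+ from the pattern. The Python sketch below does that, and then tries a stricter rule that is purely hypothetical (ASCII only, must start with a letter); it is not one of the standard expressions mentioned earlier and real names are messier than it allows.

```python
import re

NAME_FIELD = re.compile(r"[^,]+")  # the name group from the thread's pattern

names = [
    "William S Gates 3", "Jackson & Sons", "Mary-Louise Parker",
    "B. A. Abaniwonda", "Adeola (Mrs)", "235258235235", "&@$!@*&WWR",
]

# Anything without a comma passes, including the nonsensical entries.
accepted = [n for n in names if NAME_FIELD.fullmatch(n)]

# Hypothetical stricter rule: start with a letter, then letters, digits,
# spaces, and a few punctuation marks ( . & ( ) ' - ).
STRICTER = re.compile(r"[A-Za-z][A-Za-z0-9 .&()'-]*")
strict_accepted = [n for n in names if STRICTER.fullmatch(n)]

print(accepted)
print(strict_accepted)
```

The trade-off is the one described above: the permissive group never rejects a real name, while any stricter rule needs its own list of allowed characters and will have edge cases of its own.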
Re: Help With This Regular Expression: by Fayimora(m): 12:06am On Jul 21, 2011 |
Yaaaay! I passed the test |
Re: Help With This Regular Expression: by logica(m): 12:14am On Jul 21, 2011 |
Not quite, but barely. But that is a problem of the file format itself. You should stick to standards and use a real CSV format because what you have appears not to be. Otherwise you will have problems with data having commas e.g a name like "Abiodun, Dare Alade" (which are escaped in the CSV format). |
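As a short illustration of how real CSV handles a name containing a comma, here is a sketch using Python's standard csv module. The street and city values are placeholders, not data from the thread.

```python
import csv
import io

# Write one record; the writer quotes any field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerow(["Abiodun, Dare Alade", "12 Example Road", "Lagos"])
csv_text = buffer.getvalue()

# A CSV reader understands the quotes and returns three fields.
row = next(csv.reader(io.StringIO(csv_text)))

# A naive split on commas breaks the quoted name into two pieces.
naive = csv_text.strip().split(",")

print(csv_text.strip())
print(row)
print(naive)
```

The reader returns the name as one field, while the plain split produces four pieces, which is the problem a comma-based regex or tokenizer would also run into.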
Re: Help With This Regular Expression: by Fayimora(m): 12:16am On Jul 21, 2011 |
OK, I think it's high time I asked: what exactly is this CSV thing? Anytime I open a CSV file it starts opening Excel, so I just close it right away. What exactly is it? Also, I prefer using XML. |
Re: Help With This Regular Expression: by logica(m): 12:21am On Jul 21, 2011 |
You always close it? You think it's a virus? Nah. It tries to open in Excel because CSV is a spreadsheet format, and of course on Windows your standard spreadsheet app is Excel. CSV = Comma Separated Values. It is merely a standard (formatted) text file containing data records separated by commas and funny enough, that is exactly the kind of data you are dealing with here; so any CSV file reader will do the work you are trying to do. (There are literally millions of them online for download and I even reinvented the wheel too and I implemented 2 using 2 different strategies for experiment and control - Reg Exp and Tokenizer - and guess which was faster. . .). XML is not generally human readable even if it is much more flexible than CSV. But CSV is human readable meaning that any body who knows how to use Excel can edit the file. XML will require much more expertise. In general terms, XML is useful in a module-to-module interface, but CSV is more useful in a human-to-computer interface. Also XML files will generally be larger than the corresponding CSV files. Pros and cons. . . |
Re: Help With This Regular Expression: by Fayimora(m): 12:39am On Jul 21, 2011 |
Hahahha, nah, I know it's not a virus. I just don't have time for Excel, and I usually think it's not opening with the right program. Just checked it out now. Yeah, in terms of giving it to someone to read, maybe your CSV is better, but structurally and from a programming point of view, XML should be better. |
Re: Help With This Regular Expression: by candylips(m): 9:44am On Aug 03, 2011 |
CSV files typically open in Excel because of the Windows file association. I think having a good understanding of regex is very important for a developer. There are many instances where you will need to quickly parse files, e.g. logs, for specific patterns. These types of files will certainly not be in CSV format. |
Re: Help With This Regular Expression: by iGravity(m): 10:39am On Aug 03, 2011 |
A good tool to test regexes is Rubular - www.rubular.com I program in Ruby so I am kinda biased towards the language's tools. |
Re: Help With This Regular Expression: by logica(m): 10:51am On Aug 03, 2011 |
candylips:That's the job of a System Admin or Analyst, but yes it's good to master Regex. |
Re: Help With This Regular Expression: by candylips(m): 12:06pm On Aug 03, 2011 |
I've had the painful job of integrating with a legacy system that keeps its data in a proprietary data format. The only way I could get the data I wanted was to scrape its files for specific patterns. |
Re: Help With This Regular Expression: by Fayimora(m): 5:19pm On Aug 03, 2011 |
I program in Ruby too, and I use Rubular when working with Rails. However, I don't like it, lol. What happens if you are not online or have a bad network? I have 3 different tools (Mac widgets), which I still think are better. |
Re: Help With This Regular Expression: by iGravity(m): 9:36pm On Aug 03, 2011 |
Fayimora: True, Rubular is a webapp. There are desktop apps too that you can get. Which tools do you use? I have RegExpr which is an RIA provisioned for OSX - serves me well |
Re: Help With This Regular Expression: by Fayimora(m): 10:07pm On Aug 03, 2011 |
I have some funny ones, but the main one I make use of (because it works very well) is Regex Widget. |
Re: Help With This Regular Expression: by Nobody: 10:51pm On Aug 03, 2011 |
RegEx is the scariest stuff in my life. I can't say I'm good at it, and I can't even learn it with a full haircut because I'll spend the whole day scratching my head. |