Most efficient way of extracting hyperlinks from a file?

Hi Guys,

I was wondering if some of you could give me suggestions on how to extract the hyperlinks from a file most efficiently. I was thinking of using regular expressions. Are there any better ways of doing it?

Thanks
Regex seems the best option, but it depends on the format you are processing.
It's gonna be HTML. Do you think regex would still be the way to go?
Regex will work fine. With HTML you may also want to look for a dedicated HTML parsing library.
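As a rough sketch of the regex approach: the thread does not name a language, so Python is an assumption here, and `HREF_RE` and `extract_links_regex` are made-up names for illustration. The pattern is deliberately simple and only handles quoted `href` values inside `<a ...>` tags.

```python
import re

# Matches href="..." or href='...' inside an <a ...> opening tag.
# Unquoted attribute values are not handled by this simple pattern.
HREF_RE = re.compile(
    r'<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)

def extract_links_regex(html):
    """Return every quoted href value found in <a> tags, in document order."""
    return [match.group(2) for match in HREF_RE.finditer(html)]

if __name__ == "__main__":
    page = '<p><a href="http://example.com/a">x</a> <A class="c" HREF=\'b.html\'>y</A></p>'
    for link in extract_links_regex(page):
        print(link)
```

It works on the raw text only, so it has no idea whether a match sits in rendered markup or somewhere else in the file.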
All right. Thanks Bazzy!
One thing to note: if the HTML contains a script or a comment, you may get messed-up results if you use a simple regex,
e.g.:
...
<script>
/*
    for some reason there's <a> thing in here
*/
</script>
...
<a href="..." > the &lt;a&gt; in the script will end here: </a>
...
<!--
  this won't be rendered but you'll get it anyway:
  <a>blah blah</a>
-->
...
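For comparison, here is a minimal sketch of the library route that avoids both problems. It assumes Python and its standard `html.parser` module, neither of which is named in the thread; `LinkCollector` and `extract_links` are hypothetical names. The parser treats the contents of `<script>` as raw text and reports comments separately, so only real `<a>` tags reach `handle_starttag`.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag the parser sees as real markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value).
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

if __name__ == "__main__":
    sample = """
    <script>
    /* for some reason there's <a href="in-script.html"> thing in here */
    </script>
    <a href="real.html">text</a>
    <!-- <a href="in-comment.html">blah blah</a> -->
    """
    for link in extract_links(sample):
        print(link)
```

A parser is slower than a single regex pass, but it does not need special cases for scripts and comments.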
Ah ok. That's a good point. Thanks a lot.