Archive for May, 2009

Today I needed to pull some web pages down from the internet and extract specific content from them in PHP. Sounds like a crawler, huh? It wasn’t a real crawler, though: I was just pulling our own content, because it’s not convenient for me to access the database directly.

I’m not very familiar with PHP, but with version 5 on my local dev machine I was able to do this very quickly: use file_get_contents to get the whole page as a string, then use preg_match_all to search for the parts I want.
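Here is a minimal sketch of that approach, not the script I actually wrote. The helper name extract_headings, the <h2> pattern, and the example.com URL are all made up for illustration, and fetching over HTTP assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: return the inner text of every <h2> element in $html.
// The pattern is an assumption; replace it with one matching the parts you want.
function extract_headings($html)
{
    preg_match_all('#<h2[^>]*>(.*?)</h2>#is', $html, $matches);
    // $matches[1] holds the text captured by the first parenthesized group.
    return $matches[1];
}

// Hypothetical usage (needs PHP with file_get_contents and allow_url_fopen on):
// $html = file_get_contents('http://example.com/page.html');
// print_r(extract_headings($html));
```

Keeping the extraction in its own function means it can be tried on a plain string before any network access is involved.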

Unexpected things happened after I uploaded the script to the server. PHP reported that the function file_get_contents was not defined. Then I realized the server was running Red Hat 9, and the PHP on it was version 4.2.2, bundled with RH9. OK. I rewrote the code to use fopen/fread directly. This time it complained that it couldn’t handle the scheme (I don’t remember the exact error message).

I don’t know whether that was because of my configuration or because version 4.2.2 doesn’t support the wrappers. It drove me crazy. I didn’t want to upgrade, because all the packages are old; it would take time and might cause more problems. I couldn’t even find the apxs binary needed to compile PHP from source.

Finally, I found a workaround: first use exec to call wget to download the URL to a file in /tmp, and then use fopen/fread to read this temp file. It really works.
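The workaround can be sketched roughly like this. It is a reconstruction under assumptions, not my original code: the function names are hypothetical, and it assumes wget is on the PATH and that exec is not disabled on the server.

```php
<?php
// Hypothetical helper: read a whole local file with fopen/fread.
function read_file_contents($path)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    $size = filesize($path);
    // fread needs a positive length, so handle empty files separately.
    $contents = $size > 0 ? fread($fp, $size) : '';
    fclose($fp);
    return $contents;
}

// Hypothetical helper: download $url with wget into a temp file under /tmp,
// read it back, and clean up. Returns false if wget reports a failure.
function fetch_with_wget($url)
{
    $tmp = tempnam('/tmp', 'page');
    // escapeshellarg keeps shell metacharacters in the URL from being interpreted.
    $cmd = 'wget -q -O ' . escapeshellarg($tmp) . ' ' . escapeshellarg($url);
    exec($cmd, $output, $status);
    $contents = ($status === 0) ? read_file_contents($tmp) : false;
    unlink($tmp);
    return $contents;
}
```

Using tempnam instead of a fixed file name avoids two requests overwriting each other’s download.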

Another problem was that preg_match_all in PHP 4.2.2 doesn’t accept the final $offset parameter, but I think that’s simple to work around.
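One possible fix, as a sketch only: cut the subject string with substr before matching. The wrapper name preg_match_all_from is made up. Note that this is not exactly equivalent to a real $offset argument, because anchors such as ^ and lookbehind assertions see the shortened string rather than the full one.

```php
<?php
// Hypothetical stand-in for the missing $offset argument: search only the
// part of $subject that starts at byte position $offset.
function preg_match_all_from($pattern, $subject, &$matches, $offset)
{
    return preg_match_all($pattern, substr($subject, $offset), $matches);
}

// Example: skip the first three bytes, then collect every run of digits.
$count = preg_match_all_from('/\d+/', 'a1 b22 c333', $found, 3);
// $found[0] now holds the matches found in 'b22 c333'.
```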

This took me some time, but it made me realize how much the development of software and language tools has eased our daily work.

One month ago, I became interested in Django and set myself the goal of learning Python well.

Yes, I know there are other ways to learn a language, for example learning Python by practicing with Django. But I wanted to be somewhat familiar with Python before coding Django websites, so I decided to implement the algorithms in the famous book “Introduction to Algorithms”. An even greater benefit, I thought, was that I would get more familiar with algorithms.

It was a great plan, but I’m someone without great determination. When I told some friends about it, they said it would be hard. Now it turns out that I really can’t go on with it. At the very least it must be paused, if not abandoned.

I just got a new job. Although I really love it, I’m overwhelmed by the number of new tools and the amount of knowledge I must learn. The good news is that I will learn Python for this job. The bad news is that I’m afraid I can’t learn it by implementing the famous algorithms; I must learn fast by practicing in real production work.

So learning Python is easy, but the road I chose to that goal is hard. Maybe it will still pay off well: reading the book helped me a lot in the interview for this great job.

Will I resume the plan when time is not as expensive as it is now? I hope so.