Handling the response headers from a socket connection using PHP
The headers are always followed by a blank line. So, you need to split the response from the remote server into an array, using blank lines as the separator between each element. Since the headers precede everything else, the first array element contains the headers. The remaining array elements constitute the body of the file. Because the body can itself contain blank lines, it might be spread across several elements, which you join back together by inserting a blank line between each one.
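The split-and-rejoin step described above can be sketched as follows. This is a minimal illustration, not the code used by the class itself: the sample response and the variable names are assumptions made up for the example.

```php
<?php
// Hypothetical raw response, standing in for what was read from the socket.
$response = "HTTP/1.1 200 OK\r\nContent-Type: application/xml\r\n\r\n<root>\r\n\r\n<item/></root>";

// Split on blank lines. \r?\n also tolerates servers that send bare newlines.
$parts = preg_split('/\r?\n\r?\n/', $response);

// The first element contains the headers.
$headers = array_shift($parts);

// Everything else is the body; restore the blank lines removed by the split.
$body = implode("\r\n\r\n", $parts);
```

Rejoining with a blank line matters here because the sample body contains one of its own, so it is split into two elements.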
Although you don’t want to display the headers, they contain useful information. The first header contains the status code, which you need to create an error message if there is a problem retrieving the file. Another useful header is Content-Type. This tells you what type of file is being sent: XML, HTML, plain text, and so on. However, the Content-Type header is present only if the file is successfully retrieved.
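As a rough sketch of how the two headers mentioned above might be read, the following uses preg_match() on a header block. The sample headers and the variable names are assumptions for illustration only.

```php
<?php
// Hypothetical header block, as it might look after being split from the body.
$headers = "HTTP/1.1 200 OK\r\nServer: Apache\r\nContent-Type: application/xml; charset=utf-8";

// The status code is the three-digit number in the first header line.
$status = null;
if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $headers, $m)) {
    $status = (int) $m[1];
}

// Header names are case-insensitive, hence the i flag; m lets ^ match
// at the start of each line. Parameters such as charset are ignored.
$contentType = null;
if (preg_match('/^Content-Type:[ \t]*([^;\r\n]+)/im', $headers, $m)) {
    $contentType = strtolower(trim($m[1]));
}
```

If $contentType is still null afterward, the status code can be used to build the error message.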
If you can find the Content-Type header, you not only know the file was retrieved, but you can also use the information it contains to decide how to eliminate any stray characters surrounding the content. The Content-Type header from the friends of ED site looks like this:
Content-Type: application/xml
This indicates that it is an XML file. A typical HTML page usually sends this header:
Content-Type: text/html
Both XML and HTML files (assuming they are correctly formed) always start with an opening angle bracket (<) and end with a closing angle bracket (>). So, all you need to do is look for the first < and last >, capture them and everything in between, and discard the rest. This is easy to do with the following PCRE:
/<.+>/s
The .+ in this regular expression means “find any character at least once, but as many as possible.” The angle brackets on either side mean that, to register as a match, the result must start with an opening angle bracket and end with a closing one. Normally, the period in a PCRE matches any character except a newline, but adding s after the closing delimiter makes the period match newline characters as well. Because the match is greedy, it runs from the first < to the last >. So, this simple but powerful pattern enables you to extract any XML or HTML file cleanly.
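To see the pattern at work, here is a small sketch using preg_match(). The sample body, including the stray characters before and after the markup, is invented for the example.

```php
<?php
// Hypothetical body with stray characters around the XML.
$body = "1f4\r\n<?xml version=\"1.0\"?>\n<rss>\n<channel/>\n</rss>\r\n0\r\n";

$clean = null;
// Greedy .+ with the s modifier spans newlines, so the match runs
// from the first < to the last >.
if (preg_match('/<.+>/s', $body, $m)) {
    $clean = $m[0];
}
```

After this runs, $clean holds only the markup, from the XML declaration through the closing rss tag.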
Unfortunately, this won’t work for text files, so you need to use the Content-Type header to decide whether to use the PCRE. With a text file, there is no way of knowing whether any rogue characters exist, so the only option is to leave the remaining content untouched, apart from stripping whitespace from the beginning and end.
Figure 1 shows the decision process that needs to be followed after capturing the response from a socket connection. The processing is handed off to a new protected method, removeHeaders(), which is called at the end of the useSocket() method.
Figure 1: The decision process used in processing the raw output from a socket connection
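The decision process can be approximated in code as shown here. This is only a sketch based on the description in the text, not the actual removeHeaders() method: it is written as a standalone function with a hypothetical name, and throwing a RuntimeException on failure is an assumption about how the error might be reported.

```php
<?php
/**
 * Hypothetical standalone stand-in for removeHeaders().
 * Returns the cleaned body, or throws if no Content-Type header is found.
 */
function removeHeadersSketch(string $response): string
{
    // Split on blank lines; the first element contains the headers.
    $parts = preg_split('/\r?\n\r?\n/', $response);
    $headers = array_shift($parts);
    $body = implode("\r\n\r\n", $parts);

    // No Content-Type header: treat as a failure and report the status line.
    if (!preg_match('/^Content-Type:[ \t]*([^;\r\n]+)/im', $headers, $m)) {
        $statusLine = preg_split('/\r?\n/', $headers)[0];
        throw new RuntimeException("Could not retrieve file: $statusLine");
    }
    $type = strtolower(trim($m[1]));

    // XML or HTML: keep everything from the first < to the last >.
    if (strpos($type, 'xml') !== false || strpos($type, 'html') !== false) {
        if (preg_match('/<.+>/s', $body, $match)) {
            return $match[0];
        }
    }

    // Plain text (or no markup found): just strip surrounding whitespace.
    return trim($body);
}
```

Calling removeHeadersSketch() with an XML or HTML response returns only the markup, while a plain text response comes back trimmed but otherwise untouched.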

