The problem: Multiple users, only one of them the intended user, download large files simultaneously, so the intended user gets very poor performance from the uplink-rate-limited server. The unintended downloads arrive as single, complete-path "HTTP GET" requests, often for very large files, when nobody other than the intended user should know the path.

 

The workaround solution: The unintended users (Read on: It is Microsoft) always use a completely lower case file path and name. Just include some upper case characters in the file path and name and the unintended users will get an HTTP return code of 404 (Not Found), at least on a Unix, Linux or FreeBSD type server, where file names are case sensitive. One could also use password-protected web site areas and/or files. However, I am trying to keep things as simple as possible, as I am often trying to get large files to grandparents and similar not very computer savvy users.
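A minimal sketch of why the workaround works, assuming the server matches the requested path exactly, including letter case. The `FILES_ON_SERVER` set and both helper functions are hypothetical and only model the behaviour. The path is the one from Case 3 in the logs further down.

```python
# Hypothetical model of a case-sensitive web server: only exact paths exist.
FILES_ON_SERVER = {"/~doug/mary/GS_2.wmv"}  # published with mixed case


def lookup(path: str) -> int:
    """Return the HTTP status a case-sensitive server would give for path."""
    return 200 if path in FILES_ON_SERVER else 404


def survives_lowercasing(path: str) -> bool:
    """True if an all lower case copy of path would still reach the file."""
    return lookup(path.lower()) == 200


print(lookup("/~doug/mary/GS_2.wmv"))   # request with the exact published path
print(lookup("/~doug/mary/gs_2.wmv"))   # the same request, lower-cased
print(survives_lowercasing("/~doug/mary/GS_2.wmv"))
```

A path that is already all lower case gets no protection from this trick, which is what happened in Case 2.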

 

The concern: Who are these unintended users and how do they even obtain the file name and path in the first place? (Read on: They are Microsoft. One way they get the URL is via the Microsoft Internet Explorer SmartScreen filter. Another way is via SmartScreen technology built into Hotmail.)

 

See also: This excellent wiki page and blog.

 

Background: Many of my family and friends are not very computer savvy. Sometimes, to share big files such as many pictures or video clips, I put the files in a hidden spot on this web site and send the recipients a link to the directory, with instructions to highlight the file, "right click", and then select "save target as" to get the file onto their computer (a Windows-based method).

 

There was an odd entry in the www.smythies.com web page access logs, where someone attempted to access a file but with the wrong letter case in the name. I had created the directory only about an hour before the download and had told only one family member about it, via e-mail. My initial assumption was merely that the family member had told someone else about the file, and that this person was not aware of case sensitive file names on Unix type systems. This was not the case. It is important to note that the complete URL was not included in the e-mail. Rather, the user had to highlight the file in the directory listing at the URL contained in the e-mail, then "right click" and select "save target as", before a complete "HTTP GET" was sent to the server. Yet somehow, the undesired user issued an all lower case version of the same "HTTP GET" command only minutes later. I also had vague recollections of similar issues in the past, so I reviewed all of the logs.
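A log review like this can be partly automated. The following is only a sketch, assuming Apache's combined log format. The regular expression and the function name `lowercase_repeats` are my own illustrative choices, not part of any tool. It reports a 404 request whose all lower case path matches a mixed case path that was earlier served with a 200 to a different client. The two sample lines are the file requests from Case 3 in the log references, with the user agent fields left off.

```python
import re

# Matches the start of an Apache combined-format line, up to the status code.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def lowercase_repeats(lines):
    """Return (client_ip, requested_path, original_path) for each 404 request
    that is a lower-cased copy of a mixed case path served earlier."""
    served = {}  # lower-cased path -> (original path, IP of first client)
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        ip, path, status = m["ip"], m["path"], m["status"]
        if status == "200" and path != path.lower():
            served.setdefault(path.lower(), (path, ip))
        elif status == "404" and path in served and served[path][1] != ip:
            hits.append((ip, path, served[path][0]))
    return hits


sample = [
    '64.180.197.51 - - [04/Apr/2010:12:52:32 -0700] '
    '"GET /~doug/mary/GS_2.wmv HTTP/1.1" 200 36222648',
    '208.50.101.152 - - [04/Apr/2010:13:11:31 -0700] '
    '"GET /~doug/mary/gs_2.wmv HTTP/1.1" 404 309',
]
for hit in lowercase_repeats(sample):
    print(hit)
```

This approach cannot catch a case like Case 2, where the published path was already all lower case and the undesired request therefore succeeded.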

 

Log references (many irrelevant lines deleted; Hotmail IDs changed):

 

Case 1 (SmartScreen filter was enabled):

24.66.228.93 - - [24/Feb/2010:15:50:20 -0800] "GET /~doug/genealogy/R.E.Smythies/book.html HTTP/1.1" 200 4208 "http://xxxxxx.xxxxxx.mail.live.com/mail/InboxLight.aspx?FolderID=00000000-0000-0000-0000-0000000000$

24.66.228.93 - - [24/Feb/2010:15:54:15 -0800] "GET /~doug/genealogy/R.E.Smythies/RES_Chapter_1.pdf HTTP/1.1" 200 704087 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SV1; GTB6.4; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CL$

24.66.228.93 - - [24/Feb/2010:15:55:49 -0800] "GET /~doug/genealogy/R.E.Smythies/RES_Chapter_5.pdf HTTP/1.1" 200 658964 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SV1; GTB6.4; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CL$

24.66.228.93 - - [24/Feb/2010:15:57:11 -0800] "GET /~doug/genealogy/R.E.Smythies/RES_Chapter_10.pdf HTTP/1.1" 200 682348 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SV1; GTB6.4; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET C$

24.66.228.93 - - [24/Feb/2010:15:57:30 -0800] "GET /~doug/genealogy/R.E.Smythies/RES_Chapter_11.pdf HTTP/1.1" 200 823745 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SV1; GTB6.4; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET $

64.124.203.71 - - [24/Feb/2010:16:38:31 -0800] "GET /~doug/genealogy/r.e.smythies/res_chapter_11.pdf HTTP/1.1" 404 337 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

208.50.101.153 - - [24/Feb/2010:16:38:31 -0800] "GET /~doug/genealogy/r.e.smythies/res_chapter_10.pdf HTTP/1.1" 404 337 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

208.50.101.153 - - [24/Feb/2010:16:45:25 -0800] "GET /~doug/genealogy/r.e.smythies/res_chapter_5.pdf HTTP/1.1" 404 336 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

208.50.101.154 - - [24/Feb/2010:17:36:14 -0800] "GET /~doug/genealogy/r.e.smythies/res_chapter_1.pdf HTTP/1.1" 404 336 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

207.46.164.13 - - [24/Feb/2010:17:08:47 -0800] "GET /~doug/genealogy/r.e.smythies/res_chapter_10.pdf HTTP/1.1" 404 337 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

207.46.164.13 - - [24/Feb/2010:17:08:47 -0800] "GET /~doug/genealogy/r.e.smythies/res_chapter_10.pdf HTTP/1.1" 404 337 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

207.46.164.20 - - [24/Feb/2010:17:08:58 -0800] "GET /~doug/genealogy/r.e.smythies/res_chapter_11.pdf HTTP/1.1" 404 337 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

207.46.164.20 - - [24/Feb/2010:17:08:58 -0800] "GET /~doug/genealogy/r.e.smythies/res_chapter_11.pdf HTTP/1.1" 404 337 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

208.50.101.154 - - [24/Feb/2010:17:36:14 -0800] "GET /~doug/genealogy/r.e.smythies/res_chapter_1.pdf HTTP/1.1" 404 336 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

207.46.164.27 - - [24/Feb/2010:18:28:50 -0800] "GET /~doug/genealogy/r.e.smythies/res_chapter_1.pdf HTTP/1.1" 404 336 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

207.46.164.27 - - [24/Feb/2010:18:28:50 -0800] "GET /~doug/genealogy/r.e.smythies/res_chapter_1.pdf HTTP/1.1" 404 336 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

 

Case 2 (see more detailed look at case 2 further below): 

70.71.248.6 - - [28/Mar/2010:15:12:17 -0700] "GET /~doug/harrison/bla.zip HTTP/1.1" 200 93525412 "http://www.smythies.com/~doug/harrison/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9" <<< There were a few pulls. Not sure which caused the below.

64.124.203.76 - - [28/Mar/2010:18:12:41 -0700] "GET /~doug/harrison/bla.zip HTTP/1.1" 103 93525412 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"  <<< an isolated "GET" with no preceding directory listing (Note: file path is already lower case)(Pull was aborted after ~~8 megabytes)

 

Case 3 (SmartScreen filter was disabled):

64.180.197.51 - - [04/Apr/2010:12:52:05 -0700] "GET /~doug/mary/ HTTP/1.1" 200 697 "http://yyyyyyy.yyyyyy.mail.live.com/mail/InboxLight.aspx?FolderID=00000000-0000-0000-0000-000000000001&n$

64.180.197.51 - - [04/Apr/2010:12:52:05 -0700] "GET /icons/blank.gif HTTP/1.1" 200 148 "http://www.smythies.com/~doug/mary/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322)"

64.180.197.51 - - [04/Apr/2010:12:52:06 -0700] "GET /icons/back.gif HTTP/1.1" 200 216 "http://www.smythies.com/~doug/mary/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322)"

64.180.197.51 - - [04/Apr/2010:12:52:06 -0700] "GET /icons/movie.gif HTTP/1.1" 200 243 "http://www.smythies.com/~doug/mary/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322)"

64.180.197.51 - - [04/Apr/2010:12:52:06 -0700] "GET /favicon.ico HTTP/1.1" 200 1150 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322)"

64.180.197.51 - - [04/Apr/2010:12:52:32 -0700] "GET /~doug/mary/GS_2.wmv HTTP/1.1" 200 36222648 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322)"

208.50.101.152 - - [04/Apr/2010:13:11:31 -0700] "GET /~doug/mary/gs_2.wmv HTTP/1.1" 404 309 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

 

Of the IP addresses listed in red above (the sources of the undesired requests), none had visited smythies.com before, on any port, not just port 80 (HTTP).

 

208.48.224.0-208.50.127.255 Global Crossing - Phoenix Az.

64.124.0.0-64.125.255.255 Abovenet Communications, Inc. - White Plains N.Y.

207.46.0.0-207.46.255.255 Microsoft Corp. - Redmond WA
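As a quick way to test an address from the logs against the three ranges above, here is a small sketch using Python's standard `ipaddress` module. The `RANGES` table simply restates the list above. The function name `owner_of` is hypothetical.

```python
import ipaddress

# (first address, last address, owner) exactly as listed above.
RANGES = [
    ("208.48.224.0", "208.50.127.255", "Global Crossing"),
    ("64.124.0.0", "64.125.255.255", "Abovenet Communications, Inc."),
    ("207.46.0.0", "207.46.255.255", "Microsoft Corp."),
]


def owner_of(ip: str):
    """Return the owner of the range containing ip, or None if no range matches."""
    addr = ipaddress.ip_address(ip)
    for first, last, owner in RANGES:
        if ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last):
            return owner
    return None


for ip in ("64.124.203.71", "208.50.101.153", "207.46.164.13", "24.66.228.93"):
    print(ip, owner_of(ip))
```

The last address in the loop is the intended user from Case 1, which falls in none of the ranges.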

 

See some excellent work by Tom R. of Acadia Systems Inc. in the blog referenced below. Portions of all of the above IP address ranges, including the addresses highlighted in red above, are used by Microsoft and its Internet Explorer SmartScreen filter.

 

There is an interesting blog, with much more information on this at http://athena.outer-reaches.com/wp/index.php/archives/613 . I will not repeat all of the content herein, but (with all due credit to the blog contributors) the main points are:

· There is a definite relationship between the state of the Microsoft Internet Explorer SmartScreen filter setting and the undesired "HTTP GET"s. However, it is not exclusive to IE. The SmartScreen technology is also used in Hotmail. For the cases on smythies.com: Case 1 had the SmartScreen filter enabled and also used Hotmail; Case 2 had the SmartScreen filter enabled and did not use Hotmail; Case 3 had the SmartScreen filter turned off, but used Hotmail.

· A site can end up on some "safe" list and will then no longer suffer from these undesired "HTTP GET"s. Whether a site is removed from the safe list after some idle time, and how one's site actually gets onto the safe list, I do not know. In my case, I have not been able to re-create these undesired "HTTP GET"s. I have tried random file names, new folder names, and even new user accounts. This includes tests done by all three of the above noted desired users, tests with the SmartScreen filter enabled and tests via Hotmail e-mail accounts.

 

An informative link about SmartScreen use in Hotmail and its learning and confidence functions: http://postmaster.live.com/FightingJunk.aspx . What I still do not fully understand from Case 3 above is how the complete URL got to 208.50.101.152. The SmartScreen filter was disabled on that person's computer and, even though they used Hotmail as their e-mail account, the complete URL was not contained in my e-mail to them. Only the directory location was included as a link within the e-mail, with instructions to "right click" over the file name and then select "save target as" to get the file. Only then was the complete URL formed by 64.180.197.51 and sent to the smythies.com server.

 

A more detailed look at the second case from above:

 

Note: In this case multiple people had the address of the folder with the file, and the folder had existed for about 3 weeks. The information could well have been forwarded to others.

 

70.71.248.6 - - [28/Mar/2010:15:07:05 -0700] "GET /~doug/harrison/ HTTP/1.1" 200 709 "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:07:05 -0700] "GET /icons/blank.gif HTTP/1.1" 200 148 "http://www.smythies.com/~doug/harrison/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:07:05 -0700] "GET /icons/back.gif HTTP/1.1" 200 216 "http://www.smythies.com/~doug/harrison/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:07:05 -0700] "GET /icons/compressed.gif HTTP/1.1" 200 1038 "http://www.smythies.com/~doug/harrison/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:07:06 -0700] "GET /favicon.ico HTTP/1.1" 200 1150 "http://www.smythies.com/~doug/harrison/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:07:48 -0700] "GET /~doug/harrison/ HTTP/1.1" 200 709 "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:07:56 -0700] "GET /~doug/harrison/ HTTP/1.1" 200 709 "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:11:31 -0700] "GET /~doug/harrison/ HTTP/1.1" 200 709 "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:11:31 -0700] "GET /icons/blank.gif HTTP/1.1" 304 - "http://www.smythies.com/~doug/harrison/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:11:31 -0700] "GET /icons/back.gif HTTP/1.1" 304 - "http://www.smythies.com/~doug/harrison/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:11:31 -0700] "GET /icons/compressed.gif HTTP/1.1" 304 - "http://www.smythies.com/~doug/harrison/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:11:31 -0700] "GET /favicon.ico HTTP/1.1" 200 1150 "http://www.smythies.com/~doug/harrison/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:12:09 -0700] "GET /~doug/harrison/ HTTP/1.1" 200 709 "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"

70.71.248.6 - - [28/Mar/2010:15:12:17 -0700] "GET /~doug/harrison/bla.zip HTTP/1.1" 200 93525412 "http://www.smythies.com/~doug/harrison/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.$

 

In the above referenced blog, I had referred to the above Mac computer running Safari as a potential source that led to the undesired "HTTP GET". I now believe this one was innocent and that it was some of the below traffic that led to the undesired event.

In the below log segment there end up being simultaneous downloads of the big file from two IP addresses. (Note: I believe these simultaneous downloads were caused by the desired user and this is not an example of an undesired user creating lack of performance for the desired user.) At the time of writing these notes the smythies.com server is ADSL uplink limited to about 75 kilobytes per second, so with the uplink shared the entire transfer took about 42 minutes.
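The 42-minute figure is consistent with two complete copies of the file sharing the uplink. A rough check, assuming 1 kilobyte = 1000 bytes, exactly 75 kilobytes per second, no protocol overhead, and ignoring the shorter partial transfers that also appear in the log:

```python
FILE_BYTES = 93525412          # size of bla.zip as reported in the access log
UPLINK_BYTES_PER_SEC = 75_000  # assumption: "about 75 kilobytes per second"


def transfer_minutes(total_bytes: int, rate_bytes_per_sec: int) -> float:
    """Minutes needed to push total_bytes through the uplink at the given rate."""
    return total_bytes / rate_bytes_per_sec / 60


one_copy = transfer_minutes(FILE_BYTES, UPLINK_BYTES_PER_SEC)
two_copies = transfer_minutes(2 * FILE_BYTES, UPLINK_BYTES_PER_SEC)
print(f"one copy alone:        {one_copy:.1f} minutes")
print(f"two copies, shared up: {two_copies:.1f} minutes")
```

Under these assumptions one copy alone takes roughly 21 minutes and two simultaneous copies take roughly 42 minutes in total, which is why every extra download hurts the intended user so much.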

Note what appears to be a time stamp sequence error in red (the 16:14:36 entry for 96.50.106.49, which appears after the 16:53:20 entry). This is an anomaly of the Apache web server: it time stamps the log entry string at the start of the data transfer, but only writes it to the log file at the end of the data transfer. In this example another log entry was written in the meantime.
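Entries like this can be spotted mechanically. Below is a sketch that flags any log line stamped earlier than a line already seen. The function name `out_of_order` is hypothetical, and the date parsing assumes English month abbreviations, as in these logs. The sample lines are three entries from the log segment below, cut off after the byte count.

```python
import re
from datetime import datetime

STAMP_RE = re.compile(r"\[([^\]]+)\]")  # the [dd/Mon/yyyy:hh:mm:ss zone] field


def out_of_order(lines):
    """Return (index, timestamp) for each line stamped earlier than the
    latest timestamp seen on any preceding line."""
    flagged = []
    latest = None
    for index, line in enumerate(lines):
        m = STAMP_RE.search(line)
        if not m:
            continue  # no timestamp field on this line
        when = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if latest is not None and when < latest:
            flagged.append((index, when))
        else:
            latest = when
    return flagged


sample = [
    '96.50.7.185 - - [28/Mar/2010:16:14:56 -0700] "GET /~doug/harrison/bla.zip HTTP/1.1" 206 87303579',
    '207.46.12.165 - - [28/Mar/2010:16:53:20 -0700] "GET /index.html HTTP/1.1" 200 8711',
    '96.50.106.49 - - [28/Mar/2010:16:14:36 -0700] "GET /~doug/harrison/bla.zip HTTP/1.1" 200 93525412',
]
for index, when in out_of_order(sample):
    print(index, when.isoformat())
```

A flagged entry is a long transfer that started before, and finished after, the entries written ahead of it.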

 

96.50.7.185 - - [28/Mar/2010:16:11:34 -0700] "GET /~doug/harrison/ HTTP/1.1" 200 709 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"

96.50.7.185 - - [28/Mar/2010:16:11:34 -0700] "GET /icons/blank.gif HTTP/1.1" 200 148 "http://www.smythies.com/~doug/harrison/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.$

96.50.7.185 - - [28/Mar/2010:16:11:34 -0700] "GET /icons/back.gif HTTP/1.1" 200 216 "http://www.smythies.com/~doug/harrison/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.4$

96.50.7.185 - - [28/Mar/2010:16:11:34 -0700] "GET /icons/compressed.gif HTTP/1.1" 200 1038 "http://www.smythies.com/~doug/harrison/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CL$

96.50.7.185 - - [28/Mar/2010:16:11:34 -0700] "GET /favicon.ico HTTP/1.1" 200 1150 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"

96.50.7.185 - - [28/Mar/2010:16:12:06 -0700] "GET /~doug/harrison/bla.zip HTTP/1.1" 103 93525412 "http://www.smythies.com/~doug/harrison/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .$

96.50.7.185 - - [28/Mar/2010:16:13:24 -0700] "GET /~doug/harrison/bla.zip HTTP/1.1" 103 87990865 "http://www.smythies.com/~doug/harrison/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .$

96.50.7.185 - - [28/Mar/2010:16:13:33 -0700] "GET /~doug/ HTTP/1.1" 200 3676 "http://www.smythies.com/~doug/harrison/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.4506.215$

96.50.7.185 - - [28/Mar/2010:16:13:33 -0700] "GET /cgi-bin/Count.cgi?df=doug.dat|display=Counter|ft=6|md=7|dd=A HTTP/1.1" 200 552 "http://www.smythies.com/~doug/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR $

96.50.7.185 - - [28/Mar/2010:16:14:10 -0700] "GET /index.html HTTP/1.1" 200 8711 "http://www.smythies.com/~doug/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.4506.2152; .N$

96.50.106.49 - - [28/Mar/2010:16:14:29 -0700] "GET /~doug/harrison/ HTTP/1.1" 200 709 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB0.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729) chrom$

96.50.106.49 - - [28/Mar/2010:16:14:29 -0700] "GET /icons/blank.gif HTTP/1.1" 200 148 "http://www.smythies.com/~doug/harrison/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB0.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CL$

96.50.106.49 - - [28/Mar/2010:16:14:29 -0700] "GET /icons/compressed.gif HTTP/1.1" 200 1038 "http://www.smythies.com/~doug/harrison/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB0.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .$

96.50.106.49 - - [28/Mar/2010:16:14:29 -0700] "GET /icons/back.gif HTTP/1.1" 200 216 "http://www.smythies.com/~doug/harrison/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB0.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR$

96.50.106.49 - - [28/Mar/2010:16:14:29 -0700] "GET /favicon.ico HTTP/1.1" 200 1150 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB0.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729)"         

96.50.7.185 - - [28/Mar/2010:16:14:56 -0700] "GET /~doug/harrison/bla.zip HTTP/1.1" 206 87303579 "http://www.smythies.com/~doug/harrison/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .$

207.46.12.165 - - [28/Mar/2010:16:53:20 -0700] "GET /index.html HTTP/1.1" 200 8711 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2;  SLCC1;  .NET CLR 1.1.4322)"

96.50.106.49 - - [28/Mar/2010:16:14:36 -0700] "GET /~doug/harrison/bla.zip HTTP/1.1" 200 93525412 "http://www.smythies.com/~doug/harrison/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB0.0; SLCC1; .NET CLR 2.0.50727; Media Center PC $

64.124.203.76 - - [28/Mar/2010:18:12:41 -0700] "GET /~doug/harrison/bla.zip HTTP/1.1" 103 93525412 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" <<< an isolated "GET" with no preceding directory listing (Note: file path is already lower case)(Pull was aborted after ~~8 megabytes)

 

IP Address 207.46.12.165 reverse lookup gives "msnbot-207-46-12-165.search.msn.com". I have noticed some MSN web crawling robots before without a proper user agent string. Notice also that this one did not check the robots.txt file first, as is the accepted standard for web crawlers. Below is an example that follows proper accepted procedure where first the robots.txt file is checked and the user agent string clearly identifies the crawler. (Note: the robots.txt file doesn't have to be checked for every access, it only has to be checked occasionally. In the above case no msnbot, with or without a proper user agent string, had checked the robots.txt file at all that day.)

 

207.46.13.50 - - [12/Apr/2010:14:16:30 -0700] "GET /robots.txt HTTP/1.1" 200 5814 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 

207.46.13.50 - - [12/Apr/2010:14:17:21 -0700] "GET /bel_haven_2004/index.htm HTTP/1.1" 200 2739 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 

 

In this case I believe that 207.46.12.165 is not acting as an msnbot web crawler, but that it is actually step 1 in the SmartScreen check, step 2 being the undesired "HTTP GET" four hours later, where the full URL had been provided by the SmartScreen filter. Other 207.46.XX.XX msnbot type IP addresses that do not declare their user-agent string properly and that have visited this web site include, but are not limited to:

207.46.204.162 (msnbot-207-46-204-162.search.msn.com)

207.46.204.165 (msnbot-207-46-204-165.search.msn.com) (from above text, repeated in this list)

207.46.204.168 (msnbot-207-46-204-168.search.msn.com)

207.46.204.173 (msnbot-207-46-204-173.search.msn.com)

207.46.204.207 (msnbot-207-46-204-207.search.msn.com)

207.46.204.208 (msnbot-207-46-204-208.search.msn.com)

207.46.204.213 (msnbot-207-46-204-213.search.msn.com)
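Addresses like the ones listed above can be picked out of a log with a short script. This is only a sketch, assuming Apache's combined log format. The regular expression and the function name `undeclared_clients` are my own. It reports clients in 207.46.0.0/16 (the same range as 207.46.0.0-207.46.255.255 noted earlier) whose user agent string does not mention "msnbot" and who never requested robots.txt in the lines given.

```python
import ipaddress
import re

MICROSOFT_NET = ipaddress.ip_network("207.46.0.0/16")

# Captures client IP, requested path and user agent from a combined-format line.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) .*?"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def undeclared_clients(lines):
    """Return sorted IPs in MICROSOFT_NET that sent a request without an
    "msnbot" user agent and never fetched /robots.txt in these lines."""
    fetched_robots = set()
    undeclared = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ip = m["ip"]
        if ipaddress.ip_address(ip) not in MICROSOFT_NET:
            continue
        if m["path"] == "/robots.txt":
            fetched_robots.add(ip)
        if "msnbot" not in m["agent"].lower():
            undeclared.add(ip)
    return sorted(undeclared - fetched_robots)


sample = [
    '207.46.12.165 - - [28/Mar/2010:16:53:20 -0700] "GET /index.html HTTP/1.1" 200 8711 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2;  SLCC1;  .NET CLR 1.1.4322)"',
    '207.46.13.50 - - [12/Apr/2010:14:16:30 -0700] "GET /robots.txt HTTP/1.1" 200 5814 "-" '
    '"msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '207.46.13.50 - - [12/Apr/2010:14:17:21 -0700] "GET /bel_haven_2004/index.htm HTTP/1.1" 200 2739 "-" '
    '"msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]
print(undeclared_clients(sample))
```

Run against a whole day of log lines, this gives a candidate list only. Each address would still need a reverse lookup to confirm that it resolves to a search.msn.com name.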

 

How to check Microsoft Internet Explorer SmartScreen Filter setting:

 

By the way, my suggestion is to always have such crap turned off. Note: Screen shots are for IE version 8. There may be differences for other versions.

 

Open the "Tools" menu tab; Select "Internet Options" from the pull down menu.

 

setting_01_.png

 

Scroll down the list and find the "Enable SmartScreen Filter" check box:

 

setting_02_.png

 

A side note about the SmartScreen Filter:

 

While trying to re-create the undesired "HTTP GET"s, I did some tests with the SmartScreen Filter enabled. It reported that the download was O.K., however, in my opinion, it could not have possibly known that it was O.K.

 

IE_02_.png