Posted by prasannavigneshr on 02-18-2008, 09:50 PM
ColdFusion Tips & Tricks - Blocking site rippers

Blocking site rippers


You spend weeks building a website, pour hours of your life into it, and then discover that someone has site-ripped all of your generated HTML and published it elsewhere.

Now that sucks.

Here's a snippet that helps protect against site rippers while still letting spiders and normal users browse unrestricted. It counts the page views from each IP address during each clock minute and, if that count exceeds a predefined value, blocks access.

Note that the count resets at the start of every clock minute. If you set MaxHits to 30 and a user views 29 pages in one clock minute, the count starts over at the next minute, so they won't be blocked unless they view 31 pages within that next minute. In the worst case, a ripper can fetch up to 60 pages (30 at the end of one minute and 30 at the start of the next) within a few seconds without being blocked.

While I understand that this is not ideal, it's a compromise between speed and functionality. Originally, I created a dynamic query that stored the time stamp of every page view. However, when I tried to clean up the query by selecting only the records from a recent period, I found that Query of Queries cannot properly handle time objects such as those created with the CreateODBCDateTime function. So instead of manually rebuilding the query row by row on every page view, I opted for this method.
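For comparison, the per-timestamp (sliding-window) idea can be sketched without Query of Queries by storing plain numbers instead of date objects. This minimal sketch assumes ColdFusion MX or later (for cffunction): for each IP it keeps an array of millisecond values from GetTickCount(), drops the ones older than the window, and then counts what is left. The function name RecordHitAndCheck and the Application.HitLog variable are hypothetical, and it is not a drop-in replacement for the snippet below.

```cfml
<!---
	Hypothetical sliding-window helper (not part of the original snippet).
	Records one hit for Key in Store and returns true if Key has made more than
	MaxHits requests within the last WindowMs milliseconds, counting this one.
	Store is a struct; structs are passed by reference, so the caller's struct is updated.
--->
<cffunction name="RecordHitAndCheck" returntype="boolean" output="false">
	<cfargument name="Store" type="struct" required="true">
	<cfargument name="Key" type="string" required="true">
	<cfargument name="MaxHits" type="numeric" required="true">
	<cfargument name="WindowMs" type="numeric" required="true">
	<cfset var Hits = ArrayNew(1)>
	<cfset var NowMs = GetTickCount()>
	<cfif StructKeyExists(Arguments.Store, Arguments.Key)>
		<cfset Hits = Arguments.Store[Arguments.Key]>
	</cfif>
	<!--- Tick counts are appended in order, so the oldest sit at the front; drop those outside the window --->
	<cfloop condition="ArrayLen(Hits) GT 0">
		<cfif NowMs - Hits[1] LT Arguments.WindowMs>
			<cfbreak>
		</cfif>
		<cfset ArrayDeleteAt(Hits, 1)>
	</cfloop>
	<!--- Record this hit, blocked or not, so a client that keeps hammering stays blocked --->
	<cfset ArrayAppend(Hits, NowMs)>
	<!--- Write back: Adobe ColdFusion copies arrays on assignment --->
	<cfset Arguments.Store[Arguments.Key] = Hits>
	<cfreturn ArrayLen(Hits) GT Arguments.MaxHits>
</cffunction>

<!--- Example use at the top of a page (Application scope must be enabled):
<cflock timeout="30" throwontimeout="Yes" type="EXCLUSIVE" scope="APPLICATION">
	<cfif NOT StructKeyExists(Application, "HitLog")>
		<cfset Application.HitLog = StructNew()>
	</cfif>
	<cfset Blocked = RecordHitAndCheck(Application.HitLog, CGI.Remote_Addr, 30, 60000)>
</cflock>
<cfif Blocked>
	<h1>Access Denied</h1>
	Your access has been temporarily blocked due to excessive page requests.
	<cfabort>
</cfif>
--->
```

Unlike the per-minute counter, this never lets more than MaxHits requests through in any WindowMs span, at the cost of storing one number per recent request for every active IP. Like the counter, it would still need a periodic sweep to remove IPs that have gone idle.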

Example HTML/CFML code:

Code:
<!---
MaxHits: The maximum number of page views allowed in a given minute

While doing this much work inside an exclusive lock of the application scope is undesirable, it's the
only way to defeat multi-threaded site rippers.
--->
<CFSET MaxHits=30>

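<!--- Only check if not identified as a spider, allow most spiders to browse unrestricted (remove if not wanted) http://www.psychedelix.com/agents.html
      REFindNoCase looks for any of the words inside the User-Agent string (ListContainsNoCase would instead search the User-Agent for the whole comma list as one substring, so it never matched).
      Note: the User-Agent header is sent by the client, so a ripper that claims to be a spider skips this check. --->
<CFIF REFindNoCase("bot|spider|spyder|agent|altavista|crawl|arachno|24x|seek|search|fetch|deadlink|index|diagem|google",CGI.HTTP_User_Agent) EQ 0>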

	<CFSET UserIP="IP_" & Replace(CGI.Remote_Addr,".","_","ALL")>
	<!--- Minute bucket that includes the date, so a count left over from the same HH:mm on an earlier hour or day is never reused --->
	<CFSET CurrentTimeStamp=DateFormat(Now(),"yyyy-mm-dd") & " " & TimeFormat(Now(),"HH:mm")>
	<cflock timeout="30" throwontimeout="Yes" type="EXCLUSIVE" scope="APPLICATION">
		<CFIF NOT StructKeyExists(Application,"IPHistory")>
			<CFSET Application.IPHistory=StructNew()>
			<CFSET Application.IPHistory.Cnt=0>
		</CFIF>
		<!--- Work on a private copy so History can still be read safely after the lock is released --->
		<CFIF NOT StructKeyExists(Application.IPHistory,UserIP)>
			<CFSET History=StructNew()>
			<CFSET History.TimeStamp=CurrentTimeStamp>
			<CFSET History.Cnt=0>
		<CFELSE>
			<CFSET History=Duplicate(Application.IPHistory[UserIP])>
		</CFIF>

		<!--- A new clock minute starts a new count --->
		<CFIF CurrentTimeStamp NEQ History.TimeStamp>
			<CFSET History.TimeStamp=CurrentTimeStamp>
			<CFSET History.Cnt=0>
		</CFIF>

		<CFSET History.Cnt=History.Cnt + 1>
		<CFSET History.LastSeen=Now()>

		<CFSET Application.IPHistory[UserIP]=Duplicate(History)>

		<!--- Keep track of the overall page hits for all users; every 1000 hits, remove IPs that haven't accessed the site in the past 10 minutes --->
		<CFSET Application.IPHistory.Cnt=Application.IPHistory.Cnt + 1>
		<CFIF Application.IPHistory.Cnt GT 1000>
			<CFSET Application.IPHistory.Cnt=0>
			<CFLOOP index="CurrIP" list="#StructKeyList(Application.IPHistory)#">
				<CFIF CurrIP NEQ "CNT">
					<CFIF DateDiff("n",Application.IPHistory[CurrIP].LastSeen,Now()) GT 10>
						<CFSET StructDelete(Application.IPHistory,CurrIP)>
					</CFIF>
				</CFIF>
			</CFLOOP>
		</CFIF>
	</cflock>

	<CFIF History.Cnt GT MaxHits>
		<CFOUTPUT>
		<h1>Access Denied</h1>
		Your access has been temporarily blocked due to excessive page requests.
		</CFOUTPUT>
		<CFABORT>
	</CFIF>

</CFIF>
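The snippet relies on the Application scope, so it only works inside an application defined with cfapplication. One way to run it on every page is to save it in its own file and include it from Application.cfm, which ColdFusion runs before each requested .cfm page in that directory tree. A minimal sketch, assuming the snippet is saved as ripcheck.cfm next to Application.cfm; both "MySite" and "ripcheck.cfm" are placeholder names:

```cfml
<!--- Application.cfm: ColdFusion runs this before every requested .cfm page --->
<!--- "MySite" is a placeholder application name --->
<cfapplication name="MySite">
<!--- Hypothetical file holding the rate-limiting snippet above --->
<cfinclude template="ripcheck.cfm">
```

If the site already has an Application.cfm, only the cfinclude line needs to go after its existing cfapplication tag. A site that uses Application.cfc instead would place the same cfinclude in its onRequestStart method.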
Prasanna Vignesh
MCPD | Web Developer