(949)446-1716 Give us a call Mon-Fri 9am-5pm

Insights

Using Web Bots to hunt for B2B marketing leads

How we got vendor email addresses

Let's use houzz.com (for educational purposes) as our target for this example. The goal is to obtain email addresses from the site's online vendor directory.

The problem is that the email addresses ARE NOT available on Houzz's website.

We'll walk you through how we overcame this problem and got what we needed. 

The Houzz BOT at work....Console reporting back results

analysis of site and strategy used

mainpage
Houzz vendor listings

targets indexed

So we use this page to index all the vendors aka our targets.
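As a minimal sketch of how the listing pages can be enumerated: the script further down requests the base listing URL for the first page and then `/p/{offset}` for later pages, stepping the offset by 15. The function name `listing_page_url` and the `$perPage` parameter are made up for this illustration, and Houzz may have changed its URL scheme since.

```php
<?php
// Base listing URL as used in the script below.
const LISTING_URL = 'http://www.houzz.com/professionals/landscape-architect/orange-county';

// Hypothetical helper: map a 1-based page number to the listing URL.
// Assumes 15 vendors per page, matching the $p = $p + 15 step in the script.
function listing_page_url(int $page, int $perPage = 15): string
{
    if ($page <= 1) {
        // The first page has no /p/ suffix.
        return LISTING_URL;
    }
    return LISTING_URL . '/p/' . (($page - 1) * $perPage);
}

// Print the first three page URLs.
foreach ([1, 2, 3] as $page) {
    echo listing_page_url($page), PHP_EOL;
}
```

Keeping the URL logic in one small function makes it easier to adjust if the site's paging changes.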

detail_page
Vendor Detail Page

Email Workaround

We then use our bot to visit each vendor's page and collect any relevant vendor details. Unfortunately, no email address is listed, but the page does provide the vendor's website, so we'll send our bot there next.

website_email
Vendors Website

Obtain Payload

Our bot is now on the vendor's website. We instruct it to scour the site's pages in search of an email address (aka the payload).
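To give a rough idea of what "scouring a page for email addresses" can look like, here is a minimal sketch. It is not the crawler.php from the repository; `extract_emails` is a hypothetical helper, and the pattern is deliberately simple, so it will miss obfuscated addresses and can match strings that only look like emails.

```php
<?php
// Hypothetical helper (not part of crawler.php): pull anything that
// looks like an email address out of a page's HTML.
function extract_emails(string $html): array
{
    // Simple pattern: local part, "@", domain, dot, TLD of 2+ letters.
    $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';
    preg_match_all($pattern, $html, $matches);

    // Lowercase and de-duplicate so each address is reported once.
    $emails = array_map('strtolower', $matches[0]);
    return array_values(array_unique($emails));
}

$html = '<p>Contact <a href="mailto:Info@Example.com">info@example.com</a> or sales@example.com</p>';
print_r(extract_emails($html));
```

A real crawler would run something like this on every page it visits within the vendor's domain and stop at a page limit.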

Here is the code, which you can also access in the Bitbucket repository.

The MySQL code is commented out, in case you'd rather store the retrieved data in a database. I opted to place the results in a CSV file.

The script writes two files. In step 1 it indexes the Houzz profile URLs in links.txt.

In step 2 we read that file back, visit each target's vendor website URL to search for email addresses, and write the results to archs.csv.

Feedback is written to the terminal using fwrite(STDOUT).


//You can get these files over at my Bitbucket repository: https://bitbucket.org/nicknguyenzrd/houzzbot/
require("crawler.php");
require('CSSQuery.php');

error_reporting(E_ERROR | E_PARSE);
//@ini_set('display_errors', 0);



/* Uncomment below to store data in MYSQL
$servername = "localhost";
$username = "root";
$password = "";
$dbname = "invoice";
// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
	die("Connection failed: " . $conn->connect_error);
}
*/

//Step 1 Gather  Houzz Links

//Open the links file, because that's where we'll dump our data payload
$handle = fopen("links.txt", "r");
$type=1;
//Data Placeholder Array
$data['href']=array();
$data['company']=array();
$data['type']=array();
$data['id']=array();
$id=1;

//Deal with multiple page results with The All Powerful Iterative Loop 
for ($i = 1; $i <= 30; $i++) {
	$doc = new DOMDocument();
	if($i===1) {
		$p=0;
		//The first page has a different URL
		$doc->loadHTML( file_get_contents( "http://www.houzz.com/professionals/landscape-architect/orange-county"));
	}else{
//Every Page after the first page "/p/{page number}"
	$doc->loadHTML( file_get_contents( "http://www.houzz.com/professionals/landscape-architect/orange-county/p/" . $p ) );
	}
//Webpage loaded for us  
	$css = new CSSQuery( $doc );
	$arr = array();
	$arr = $css->query( 'a.pro-title' );
	foreach ( $arr as $a ) {
		//Get URL Link Filter out Javascript
		if ( $a->attributes->getNamedItem( 'href' )->value === "javascript:;" ) {
		} else {
			//Store link and company name
            $data['id'][]=$id;
			$data['href'][]    = $a->attributes->getNamedItem( 'href' )->value;
			$data['company'][] = $a->nodeValue;
			$data['type'][]=1;
			//Open our List of Links Page
			$handle = fopen('links.txt',"a+");
			$somecontent = $a->attributes->getNamedItem( 'href' )->value."\r\n";
			fwrite($handle,$somecontent);
			fwrite(STDOUT, $somecontent);
			fclose($handle);
			$id++;
		}

	}
	$p=$p+15;
	sleep(1);
	unset($doc);
	unset($css);
	//var_dump( $data );
}

//Step 2 Gather company details. Houzz doesn't list email addresses, so we'll have to improvise and go to each vendor's website to acquire the target email contact, if it's listed there.

//Double check that we're dealing with valid URLs, because bad ones can cause real trouble once the bot is running
function get_valid_url( $url ) {
	$regex = "((https?|ftp)\:\/\/)?"; // Scheme
	$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
	$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
	$regex .= "(\:[0-9]{2,5})?"; // Port
	$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
	$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query
	$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
	return preg_match("/^$regex$/", $url);
}

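//Reopen the links file for reading; the handle used in step 1 was closed after each write
$handle = fopen('links.txt', 'r');
if ($handle) {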
	while ( ( $line = fgets( $handle ) ) !== false ) {
		$email="";
		$website="";
		$url="";
		$name="";
		$company="";
		$phone="";
		$link="";
		$tier="";
		$location="";
		$license="";
		$error="";
		$sql="";
		$doc = new DOMDocument();
		//Houzz link to profile (trim the trailing line break before fetching)
		$link=trim($line);
		$doc->loadHTML( file_get_contents( $link ) );
		$css               = new CSSQuery( $doc );
		$data['link']=$link;
		//Company Name
		$nrr               = $css->query( 'a.profile-full-name' );
		$data['company'][] = $nrr[0]->textContent;
		$company=$nrr[0]->textContent;
		fwrite(STDOUT, "Starting: ".$id.":".$nrr[0]->textContent."\r\n");
		//Website and Email Addresses TODO add conditional statement
		$arr               = $css->query( 'a.proWebsiteLink' );
		foreach ( $arr as $a ) {
			$url= $a->attributes->getNamedItem( 'href' )->value;
			if(get_valid_url($url)) {
				$data['website'][] = $url;
				$website=$url;
				fwrite(STDOUT, "Attempting site: ".$url."\r\n");
				$parse = parse_url($url);
				$foo = new crawler($url,$parse['host'],2,true,true);
				$result=$foo->init();

				//Found Email Address
				if(isset($result['emails'][0])){
					$email=$result['emails'][0];
					//Output to the CLI indicating an email address was discovered
					fwrite(STDOUT, "Found Email Address: ".$result['emails'][0]."\r\n");
				}
			}
		}
		//Phone Number
		$crr = $css->query( 'span.pro-contact-text' );
		foreach ( $crr as $c ) {
			if($c->nodeValue!=="Website") {
				$data['phone'][] = $c->nodeValue;
				$phone           = $c->nodeValue;
			}
		}
		//All company details
		$info = $css->query( 'div.info-list-text' );
		$str="";
		foreach ( $info as $i ) {
			$test = $i->nodeValue;
//Person to contact
			if (strpos( $test, "Contact:" )!==FALSE) {
				$name= str_replace( "Contact:",'', $test );
				$name=trim($name);
				$data['contact'][] =$name;
			}
//Address
			if (strpos( $test, "Location:" )!==FALSE) {
				$location = str_replace( "Location:",'', $test );
				$location=trim($location);
				$data['location'][]=$location;
			}
//License Number
			if (strpos( $test, "License Number:" )!==FALSE) {
				$license=str_replace( "License Number:",'', $test );
				$license=trim($license);
				$data['license'][] =$license;
			}
//Tier
			if (strpos( $test, "Typical Job Costs:" )!==FALSE) {
				$tier =str_replace( "Typical Job Costs:",'', $test );
				$tier=trim($tier);
				$data['tier'][]=$tier;
		
			}
		}
		
//Write architect contact information into a CSV file
		$wr= fopen('archs.csv',"a+");
		$str=trim($str);
//Architect Record
		//The contact person is stored in $name ($contact was never defined)
		$details = $id.",".$type.",\"".$company."\",\"".$phone."\",\"".$url."\",\"".$email."\",\"".$link."\",\"".$name."\",\"".$location."\",\"".$license."\",\"".$tier."\" \r\n";
		fwrite($wr,$details);
		//Disable Comment Below to OutPut to CLI 
		//fwrite(STDOUT, $details);
		$id++;
		fclose($wr);

/*  Uncomment below if you'd rather insert the scraped data into a MySQL database

		$sql = "INSERT INTO ip_oppurtunities(`type`,`company`,`phone`,`website`,`email`,`link`,`contact`,`location`,`license`,`tier`)
VALUES (1,'$company','$phone','$website','$email','$link','$name','$location','$license','$tier')";
		if ($conn->query($sql) === TRUE) {
			fwrite(STDOUT, $id.'-'.$company." Added \r\n");
		} else {
			$error=mysqli_error($conn);
			fwrite(STDOUT,  "Error: ".$company."=[".$sql."]".$error."\r\n");
			echo $error;
			die();
		}
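		//Call $conn->close() once after the while loop, not here, or later inserts will fail.
		//$id is already incremented above, so it is not incremented again here.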
*/

	}
	fclose($handle);
}
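The script above builds each CSV line by concatenating strings, which breaks if a company name contains a double quote. As a hedged alternative sketch, PHP's built-in fputcsv() can handle the quoting. The helper name `write_vendor_row` and the sample values are made up for illustration; the field order loosely follows the `$details` line.

```php
<?php
// Hypothetical helper: write one vendor record with fputcsv(), which
// quotes fields containing commas, quotes, or spaces as needed.
function write_vendor_row($handle, array $row): void
{
    // Separator, enclosure and escape are passed explicitly so the
    // behaviour does not depend on version-specific defaults.
    fputcsv($handle, $row, ',', '"', '\\');
}

// Demo against an in-memory stream instead of archs.csv.
$out = fopen('php://memory', 'w+');
write_vendor_row($out, [1, 1, 'Acme "Green" Landscapes, Inc.', '(555) 010-0000', 'http://example.com', 'info@example.com']);
rewind($out);
echo stream_get_contents($out);
fclose($out);
```

Reading the file back with fgetcsv() then returns the original field values, embedded quotes and commas included.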

Proof, hardware is on its way out

https://youtu.be/dk3fel0U2C0

As the video about Foxconn, the largest electronics manufacturer in the world, indicates, software technology such as the cloud is overtaking the need for hardware.

Keyboards and mice will soon follow your previously enjoyed cassette players and pagers to their final destination, and right behind them is the actual device you’re looking at right now.

Amazon, Microsoft and Google are buying the latest in hardware technology, so you don’t have to. Imagine not having to buy each new employee a computer, and instead leasing a virtual desktop with minimal upfront cost and no commitments, priced according to actual usage time. For a startup business, this can greatly reduce your initial costs.

The above scenario is now a tried and true reality that any business can implement right now. In fact, we’ve been doing just that for our clients. Internet-based technology is the future, swallowing up entire industries like a merciless dictator treats a nonconformist.

8 Tips for picking a better domain name

 

Not only do these tips make search engine indexing easier, they also take human psychology into consideration. Human memory is vital: if your end users don’t recall the proper address, they are a lot less likely to reach their intended destination.

  1. Choose a .com extension

  2. Use your brand name

  3. Don’t use exact match domains. Partial match domains are okay, but a brand name is always preferred

  4. Make it memorable

  5. Make sure it’s easy to spell.

  6. Avoid special characters.

  7. Avoid misspellings

  8. Keep it under 14 characters; the shorter the better.

Shared Hosting

Sharing Address with many others – In shared hosting, one address could be shared by thousands of other domains. Shared hosting is like sharing a large building: you have a room and share all the remaining amenities with everyone else.

Increased Hack Vulnerability – If a criminal breaks into the building, he would have access to your personal belongings as well.

Limited control of Digital Assets – You are renting your space from someone else and simply storing your personal belongings there. In a dispute, the landlord could withhold your belongings.

Guilty by Association – The more people that share your IP, the more likely you are to be deemed guilty by association if your neighbors are using their domains for nefarious purposes.

Costly Moving – When you’re ready to move out, you will have to pay for professional movers, and some buildings are not move-out friendly.

Technical Perception – Search engines are able to discern sites on shared accounts from those on dedicated ones.

Ownership – Unfettered control over your digital assets

Dedicated Address – You have your own dedicated address

Security – Minimal public access limits your property’s exposure to criminals

Status – Search engines and future investors will take you more seriously

5 Ways to help ensure your website is secure

These days websites can expect to be attacked by web bots as soon as a search engine indexes them. The more advanced web bots will launch every known attack in order to find a vulnerability to exploit and report back to the attacker, who can then conduct various nefarious acts on your server, any of which could greatly affect your business, domain and/or clients. A very common attack on businesses that does not get enough publicity is ransomware, which can cripple a company’s operations until it pays the attacker. Most of the time these attacks don’t require much effort to carry out, so even small businesses can fall victim. Here are some tips I think every site owner should implement to ensure their website is secure.

  1. Run updates on your server and all your software frequently

Plugins are vitally important for keeping websites ahead of potential security breaches. Ensure the ability to create and utilize custom plugins to reinforce site security, such as those that block particular IP addresses. Equally important, developers must also practice up-to-date coding and development standards and should use modern versions of platforms such as PHP and Apache. WDG developers, for example, use only coding structures and syntax patterns that have been proven secure and effective in order to maintain industry best practices.

  2. Require a strong password policy

Require a strong password policy on your site and throughout your organization, anywhere a password is used. Attackers will often brute-force user passwords using scripts, which is akin to trying every common password for every username until one works. Such an attack is typically automated and takes only a few minutes to conduct.
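As an illustration only, a password policy can be enforced with a small check at sign-up or password change. The rules below (12 characters minimum, mixed character classes) are assumptions picked for the sketch, not a recommendation from any standard, and `password_problems` is a hypothetical name.

```php
<?php
// Hypothetical policy check: returns a list of problems, empty if the
// password satisfies every rule in this example policy.
function password_problems(string $password, int $minLength = 12): array
{
    $problems = [];
    if (strlen($password) < $minLength) {
        $problems[] = "shorter than $minLength characters";
    }
    if (!preg_match('/[a-z]/', $password)) {
        $problems[] = 'no lowercase letter';
    }
    if (!preg_match('/[A-Z]/', $password)) {
        $problems[] = 'no uppercase letter';
    }
    if (!preg_match('/[0-9]/', $password)) {
        $problems[] = 'no digit';
    }
    if (!preg_match('/[^A-Za-z0-9]/', $password)) {
        $problems[] = 'no symbol';
    }
    return $problems;
}

// A common password fails several rules at once.
print_r(password_problems('password'));
```

Pairing a check like this with rate limiting on the login form makes scripted guessing far less practical.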

  3. Conduct a regular back-up of your applications and databases

Having a backup copy of your applications and databases could save your business from damage and prevent the loss of important information, such as customer or sales records, caused by an attacker or virus. Without some sort of backup copy, such damage could take a business website offline indefinitely. For e-commerce websites, this could destroy the entire business.

  4. Obtain an SSL (Secure Sockets Layer) Certificate and employ an Encryption Protocol

The information traveling from your website to your end user can be intercepted by an unknown third party, who can capture sensitive information such as credit card details, usernames, and passwords. SSL, or Secure Sockets Layer, secures the information by encrypting it before it is transmitted.
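Once a certificate is installed, visitors who still arrive over plain HTTP are usually redirected to the encrypted address. A minimal sketch of that check follows, assuming a standard `$_SERVER` layout; `https_redirect_target` is a hypothetical helper, and servers behind a proxy or load balancer may report HTTPS differently.

```php
<?php
// Hypothetical helper: return the HTTPS URL to redirect to, or null
// when the request already arrived over HTTPS.
// $server is expected to look like PHP's $_SERVER array.
function https_redirect_target(array $server): ?string
{
    $isHttps = !empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off';
    if ($isHttps) {
        return null;
    }
    $host = $server['HTTP_HOST'] ?? 'localhost';
    $path = $server['REQUEST_URI'] ?? '/';
    return 'https://' . $host . $path;
}

// Usage in a front controller (commented out so the sketch has no side effects):
// $target = https_redirect_target($_SERVER);
// if ($target !== null) { header('Location: ' . $target, true, 301); exit; }
```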

  5. Only install plugins or code from trusted sources

Attackers will often disguise viruses in actual functioning plugins, thereby allowing an attacker access to your server. These attacks often go unnoticed until it’s too late.

I implement these steps automatically, whenever possible, when I develop for my clients. From a development standpoint, the tips become more costly and difficult to implement as a site grows.

Real world example of unintentional consequences

Some people are best off staying away from technology. Rather than tell you, go see for yourself: Google “inurl:main.cgi linksys” (minus the quotes), click on a few links, and fulfill that inner “peeping tom” in you (don’t worry, your secret’s safe with me).

We now know for sure, encryption won’t ensure privacy.

In the hours since the documents were made available by WikiLeaks, a misconception developed, leading people to believe the CIA “cracked” the encryption used by popular secure messaging software including Signal and WhatsApp.

WikiLeaks asserted that:

“These techniques permit the CIA to bypass the encryption of WhatsApp, Signal, Telegram, Wiebo, Confide and Cloakman by hacking the “smart” phones that they run on and collecting audio and message traffic before encryption is applied.”

This statement by WikiLeaks made most people think that the encryption used by end-to-end encrypted messaging clients such as Signal and WhatsApp has been broken.

No, it hasn’t.

Instead, the CIA has tools to gain access to entire phones, which would of course “bypass” encrypted messaging apps because it defeats virtually all other security systems on the phone, granting total remote access to the agency.

The WikiLeaks documents do not show any particular attack against Signal or WhatsApp. Rather, the agency hijacks the entire phone and listens in before the applications encrypt and transmit information.

It’s like you are sitting in a train next to the target and reading his 2-way text conversation on his phone or laptop while he’s still typing, this doesn’t mean that the security of the app the target is using has any issue.

In that case, it also doesn’t matter if the messages were encrypted in transit if you are already watching everything that happens on the device before any security measure comes into play.

That being said, encryption won’t protect your privacy. Uncle Sam has the ability to see and hear everything on our modern devices.

A visual comparison proving where you host your site matters.

Most website owners and marketers overlook how a web server and its network are configured. The purpose of this article is to give a "not-so" technical person a brief side-by-side comparison of two sites and show how those settings are most likely affecting their search engine performance.

GearedForSpeed.com NativeWest.com

Both sites are basic "store-front" websites. Both are start-up companies. Just like a physical store's address, websites use IP addresses so that people are able to locate them. In shared hosting, multiple domains can share a single IP address.

GearedForSpeed.com IP NativeWest.com IP

GearedForSpeed.com shares its IP with over a thousand other domains, whereas NativeWest.com is the only domain associated with its IP address. If you were a search engine, which website would seem more credible?

​​

GearedForSpeed.com: 1,357 other domains
NativeWest.com: 1 domain

Sharing an IP address with other domains could also mean your domain is penalized for the conduct of one of your neighboring domains.

GearedForSpeed.com: 7 SPAM hosts, blocked for adult content
NativeWest.com: clean history

Here is how each company ranks on search engines when searching for the company name.

GearedForSpeed.com: not listed
NativeWest.com: top position

It’s important to note that NativeWest.com pays less each month for hosting than GearedForSpeed.com does. This boils down to the capabilities of whoever set up your web server and your account type. Cloud hosting has greatly reduced the cost of hosting for IT professionals, so a developer with experience in setting up dedicated networks could save you money and gain you more traffic.

Reach out to me if you would like me to review your site.

Artificial Intelligence threatens most jobs

I learned about machine learning, or “artificial intelligence”, recently. The titans of the internet today (Google, Apple, Microsoft, and Amazon) have been publishing developer APIs for these services. I had a free moment yesterday and took the time to wrap my mind around the APIs, because to make a living as a freelancer it’s our responsibility to remain relevant. I was greeted by the grim reaper of human skill, an all too familiar adversary on the road to sustainable success and wealth in this life.

From my research, I realized that artificial intelligence exists and that computers can generate their own code. This will deprecate the many programming languages in favor of a consolidated object-oriented language, and shortly thereafter typing will not be necessary. I suspect that at that point we’ll have truly “smart” devices that are aware of your intentions as well as of other devices.

I got into programming because I thought it would be a good skill for the long term. Granted, it has treated me well, but the deprecation of my trade means the deprecation of a majority of modern trades. Simply put, soon all work will be replaced by machines.

Hosting like the big boys…on the cheap!

Amazon is out to own the web hosting space with its new Lightsail service. Lightsail offers a cloud Virtual Private Server instance with all the trimmings that would typically require an experienced IT professional to configure. This makes it possible for small businesses to have a scalable private server for as low as $5 a month, and any newbie developer should be able to set it up! If you’re still using GoDaddy, Bluehost, or HostGator and sharing one IP address and server space among hundreds (sometimes thousands) of other websites in a “shared” hosting account (which search engines can see), it would be silly not to migrate.

What I can determine about you from your browser alone

Our browsers send off quite a bit of information about us now. Using the fingerprintJS library I was able to extract this information.

Browser Used:
language:
color_depth:
pixel_ratio:
resolution:
available_resolution:
timezone_offset:
session_storage:
local_storage:
indexed_db:
cpu_class:
navigator_platform:
do_not_track:
regular_plugins:
canvas:
canvas winding:yes~canvas
webgl:
adblock:
has_lied_languages:
has_lied_resolution:
has_lied_os:
has_lied_browser:
touch_support
js_fonts
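fingerprintJS collects the values above in the browser with JavaScript. A smaller part of the picture is visible to the server without any script at all, through the request headers. The sketch below shows that server-side portion only; `server_side_fingerprint` is a hypothetical helper and the choice of headers is an assumption made for illustration.

```php
<?php
// Hypothetical helper: collect the identifying details a browser sends
// with every request. $server is expected to look like $_SERVER.
function server_side_fingerprint(array $server): array
{
    return [
        'user_agent'   => $server['HTTP_USER_AGENT'] ?? null,
        'language'     => $server['HTTP_ACCEPT_LANGUAGE'] ?? null,
        'do_not_track' => $server['HTTP_DNT'] ?? null,
        'ip'           => $server['REMOTE_ADDR'] ?? null,
    ];
}

// Example with made-up request data; in a real page you would pass $_SERVER.
$sample = [
    'HTTP_USER_AGENT'      => 'ExampleBrowser/1.0',
    'HTTP_ACCEPT_LANGUAGE' => 'en-US,en;q=0.9',
    'REMOTE_ADDR'          => '203.0.113.7',
];
print_r(server_side_fingerprint($sample));
```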

 
