The case against shared hosting
Using Web Bots to hunt for B2B marketing leads
How we got vendor email addresses
Let's use houzz.com (for educational purposes) as our target for this example. The goal is to obtain email addresses from the site's online vendor directory.
The problem is that the email addresses ARE NOT available on Houzz's website.
We'll walk you through how we overcame this problem and got what we needed.
Analysis of the site and the strategy used
Houzz vendor listings
Targets indexed
We use this page to index all the vendors, aka our targets.
Vendor Detail Page
Email Workaround
We then use our bot to visit each vendor's page and collect any relevant vendor details. Unfortunately, no email address is listed, but the page does provide the vendor's website, so we'll send our bot there next.
Vendor's Website
Obtain Payload
Our bot is now on the vendor's website. We instruct it to scour the site's pages in search of an email address (aka the payload).
Here is the code, which you can also access in the Bitbucket repository.
The MySQL code is commented out, in case you'd rather store the retrieved data in a database. I opted to place the results in a CSV file.
The script writes two files. Step 1 writes links.txt, an index of the vendor page URLs.
Step 2 reads that file back, visits each vendor, searches the vendor's website for email addresses, and writes the results to archs.csv.
Feedback is printed to the terminal using fwrite(STDOUT).
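The script below builds its CSV rows by concatenating strings, which can break when a company name contains a quote or a comma. As a minimal sketch of an alternative, PHP's built-in fputcsv handles the quoting. The helper name `write_vendor_row` and the sample values are hypothetical and are not part of the original script.

```php
<?php
// Hypothetical helper (not in the original script): appends one record to a CSV file.
// fputcsv() quotes any field that contains a comma, a quote, or a newline.
function write_vendor_row(string $file, array $row): bool {
    $wr = fopen($file, 'a');
    if ($wr === false) {
        return false;
    }
    // Separator, enclosure, and escape character are passed explicitly.
    $ok = fputcsv($wr, $row, ',', '"', '\\') !== false;
    fclose($wr);
    return $ok;
}

// Example record with made-up values; the company name contains quotes and a comma.
$file = sys_get_temp_dir() . '/archs_example.csv';
write_vendor_row($file, [1, 1, 'Acme "Green" Landscapes, Inc.', '(555) 010-0000', 'http://example.com', 'info@example.com']);
```

Reading the file back with fgetcsv returns the company name intact, which the hand-built string would not guarantee.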
//You can get these files over at https://bitbucket.org/nicknguyenzrd/houzzbot/
require("crawler.php");
require('CSSQuery.php');
error_reporting(E_ERROR | E_PARSE);
//@ini_set('display_errors', 0);
/* Uncomment below to store data in MYSQL
$servername = "localhost";
$username = "root";
$password = "";
$dbname = "invoice";
// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
die("Connection failed: " . $conn->connect_error);
}
*/
//Step 1 Gather Houzz Links
//Open the links file, because that's where we'll dump the profile URLs
$handle = fopen("links.txt", "r");
$type=1;
//Data Placeholder Array
$data['href']=array();
$data['company']=array();
$data['type']=array();
$data['id']=array();
$id=1;
//Deal with multiple page results with The All Powerful Iterative Loop
for ($i = 1; $i <= 30; $i++) {
$doc = new DOMDocument();
if($i===1) {
$p=0;
//The first page has a different URL
$doc->loadHTML( file_get_contents( "http://www.houzz.com/professionals/landscape-architect/orange-county"));
}else{
//Every Page after the first page "/p/{page number}"
$doc->loadHTML( file_get_contents( "http://www.houzz.com/professionals/landscape-architect/orange-county/p/" . $p ) );
}
//Webpage loaded for us
$css = new CSSQuery( $doc );
$arr = array();
$arr = $css->query( 'a.pro-title' );
foreach ( $arr as $a ) {
//Get URL Link Filter out Javascript
if ( $a->attributes->getNamedItem( 'href' )->value === "javascript:;" ) {
} else {
//Store link and company name
$data['id'][]=$id;
$data['href'][] = $a->attributes->getNamedItem( 'href' )->value;
$data['company'][] = $a->nodeValue;
$data['type'][]=1;
//Open our List of Links Page
$handle = fopen('links.txt',"a+");
$somecontent = $a->attributes->getNamedItem( 'href' )->value."\r\n";
fwrite($handle,$somecontent);
fwrite(STDOUT, $somecontent);
fclose($handle);
$id++;
}
}
$p=$p+15;
sleep(1);
unset($doc);
unset($css);
//var_dump( $data );
}
//Step 2 Gather company details. Houzz doesn't list email addresses, so we'll have to improvise and go to the vendor's website to acquire the target email contact, if it's listed there.
//Double check that we're dealing with valid URLs, because invalid ones can really break things once the bot is running
function get_valid_url( $url ) {
$regex = "((https?|ftp)\:\/\/)?"; // Scheme
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
return preg_match("/^$regex$/", $url);
}
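//Reopen the links file for reading; the handle used in step 1 was closed after the last write
$handle = fopen("links.txt", "r");
if ($handle) {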
while ( ( $line = fgets( $handle ) ) !== false ) {
$email="";
$website="";
$url="";
$name="";
$company="";
$phone="";
$link="";
$tier="";
$location="";
$license="";
$error="";
$sql="";
$doc = new DOMDocument();
//fgets() keeps the trailing line break, so trim it before fetching
$doc->loadHTML( file_get_contents( trim( $line ) ) );
$css = new CSSQuery( $doc );
//Houzz Link to profile
$data['link']=$line;
$link=trim($line);
//Company Name
$nrr = $css->query( 'a.profile-full-name' );
$data['company'][] = $nrr[0]->textContent;
$company=$nrr[0]->textContent;
fwrite(STDOUT, "Starting: ".$id.":".$nrr[0]->textContent."\r\n");
//Website and Email Addresses TODO add conditional statement
$arr = $css->query( 'a.proWebsiteLink' );
foreach ( $arr as $a ) {
$url= $a->attributes->getNamedItem( 'href' )->value;
if(get_valid_url($url)) {
$data['website'][] = $url;
$website=$url;
fwrite(STDOUT, "Attempting site: ".$url."\r\n");
$parse = parse_url($url);
$foo = new crawler($url,$parse['host'],2,true,true);
$result=$foo->init();
//Found Email Address
if(isset($result['emails'][0])){
$email=$result['emails'][0];
//Output indicating email address discovered CLI
fwrite(STDOUT, "Found Email Address: ".$result['emails'][0]."\r\n");
}
}
}
//Phone Number
$crr = $css->query( 'span.pro-contact-text' );
foreach ( $crr as $c ) {
if($c->nodeValue!=="Website") {
$data['phone'][] = $c->nodeValue;
$phone = $c->nodeValue;
}
}
//All company details
$info = $css->query( 'div.info-list-text' );
$str="";
foreach ( $info as $i ) {
$test = $i->nodeValue;
//Person to contact
if (strpos( $test, "Contact:" )!==FALSE) {
$name= str_replace( "Contact:",'', $test );
$name=trim($name);
$data['contact'][] =$name;
}
//Address
if (strpos( $test, "Location:" )!==FALSE) {
$location = str_replace( "Location:",'', $test );
$location=trim($location);
$data['location'][]=$location;
}
//License Number
if (strpos( $test, "License Number:" )!==FALSE) {
$license=str_replace( "License Number:",'', $test );
$license=trim($license);
$data['license'][] =$license;
}
//Tier
if (strpos( $test, "Typical Job Costs:" )!==FALSE) {
$tier =str_replace( "Typical Job Costs:",'', $test );
$tier=trim($tier);
$data['tier'][]=$tier;
}
}
//Write architect contact information into a CSV file
$wr= fopen('archs.csv',"a+");
$str=trim($str);
//Architect Record
//The contact person is stored in $name above ($contact was never defined)
$details = $id.",".$type.",\"".$company."\",\"".$phone."\",\"".$url."\",\"".$email."\",\"".$link."\",\"".$name."\",\"".$location."\",\"".$license."\",\"".$tier."\" \r\n";
fwrite($wr,$details);
//Uncomment the line below to output to CLI
//fwrite(STDOUT, $details);
$id++;
fclose($wr);
/* Uncomment below if you'd rather insert scraped data into MySQL Database
$sql = "INSERT INTO ip_oppurtunities(`type`,`company`,`phone`,`website`,`email`,`link`,`contact`,`location`,`license`,`tier`)
VALUES (1,'$company','$phone','$website','$email','$link','$name','$location','$license','$tier')";
if ($conn->query($sql) === TRUE) {
fwrite(STDOUT, $id.'-'.$company." Added \r\n");
} else {
$error=mysqli_error($conn);
fwrite(STDOUT, "Error: ".$company."=[".$sql."]".$error."\r\n");
echo $error;
die();
}
//Call $conn->close() once after the loop ends, not here, or every later insert fails
*/
}
fclose($handle);
}
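The crawler class from crawler.php is not shown in this post, so here is a minimal sketch of how a page's HTML could be scanned for email addresses. It is an illustration under assumptions, not the crawler's actual implementation; the function name `extract_emails` is hypothetical and the pattern is deliberately simple.

```php
<?php
// Hypothetical helper, not the crawler.php implementation:
// returns the unique email addresses found in a chunk of HTML.
function extract_emails(string $html): array {
    // Decode entities so an address written with &#64; shows a real "@".
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Simple pattern; it misses exotic but valid addresses.
    preg_match_all('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $text, $matches);
    // Lowercase and de-duplicate while keeping first-seen order.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

// Example with made-up markup.
$found = extract_emails('<a href="mailto:Info@Example.com">Info@Example.com</a> sales&#64;example.org');
```

A pattern this loose also matches things that only look like addresses, such as image names like logo@2x.png, so results are worth filtering before use.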
Voice Controlled Side Project
Check out a side project I've been working on. Using machine learning/artificial intelligence, we control a security system through voice commands.
In this video I demonstrate how I control my home security cameras using voice, as well as how I control my computer's browser to do YouTube and internet searches.
We build a TV Channel
Through the "Makers Network" project we bring our ninja internet skills to traditional TV broadcasting by creating our very own TV channel on the ROKU network of 40 million subscribers.
Rather than paying for advertising, with appealing content there's potential to get paid for marketing your brand.
We also believe that viewers are more open to advertising when watching TV in their homes. Content can also be created to sell products indirectly.
ROKU provides a lot of flexibility and options in terms of advertising on the channel. We can help come up with a proper strategy.
Combating Robocalls with Robo-Bouncer
In this case study we bring our internet skills to traditional telecommunications, in a project we dubbed the RoboBouncer. RoboBouncer is a bot that screens your phone calls by asking the caller to speak their name. It then calls you, plays the recording, and gives you the option to accept or reject the call.
With full programmatic control of call handling, the possibilities the internet brings to traditional calling are endless. We have set up phone IVR systems (company directories), lead transfer systems, and high-volume outbound calling.
To test out the RoboBouncer you can call our phone number (949)446-1716
By popular demand, we made a commercial version that automates the deployment of a RoboBouncer. Click here to create your own RoboBouncer.
For the technically inclined we have posted the code and info at BitBucket
Creating a telephone call screening service using Twilio
The Call Flow
- The caller calls your Twilio number. The agent asks the caller to speak their name and records it, then places the caller in a queue (inbound call).
- Using the Twilio REST API, we initiate an outbound call from your Twilio number to your personal phone number. It plays the recording and presents you with the following options:
- Press 1 (DTMF) or say "Accept" (SpeechResult): connects the call to the caller using the Dial and Queue verbs.
- Press 2 (DTMF) or say "Reject" (SpeechResult): using the REST API, we notify the caller, end the queued call, and hang up our call.
Uses the TwiML API and the Twilio Voice API PHP SDK.
Uses Queue (https://www.twilio.com/docs/voice/api/queue-resource) and Enqueue to place the caller on hold. You could use the Conference verb instead of Enqueue for more options.
Follow this wonderful guide to set up your free Twilio account and get a phone number, your API SID, and your token: https://www.twilio.com/docs/voice/quickstart/php
Don't forget to assign the webhook step1.xml (or, if you use TwiML Bins, the TwiML Bin URL) to your Twilio phone number.
Setting up a web server isn't necessary. Use Twilio's TwiML Bins for the XML files and Functions for the PHP files (convert the logic to JavaScript; the IVR Menu example is a good starting blueprint). I would suggest loading the XML files into TwiML Bins and letting Twilio host them so you don't have to, and, if you're capable, using Twilio Functions for the server-side code.
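The call flow above can be sketched as a handful of TwiML responses. This is an illustration, not the code from the repository: the function names, the callback URLs, and the queue name are hypothetical placeholders. The verbs (Say, Record, Enqueue, Gather, Play, Dial, Queue, Hangup) and the Digits and SpeechResult request parameters are standard Twilio features.

```php
<?php
// Escape a value for use inside XML text or attributes.
function xml_escape(string $value): string {
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Inbound call: ask for the caller's name and record it.
// $recordAction is a hypothetical URL that Twilio requests after the recording.
function twiml_screen_caller(string $recordAction): string {
    return '<Response>'
        . '<Say>Please say your name after the beep.</Say>'
        . '<Record action="' . xml_escape($recordAction) . '" maxLength="5" playBeep="true"/>'
        . '</Response>';
}

// After the recording: park the caller in a queue while we ring the owner.
function twiml_hold_caller(string $queueName): string {
    return '<Response><Enqueue>' . xml_escape($queueName) . '</Enqueue></Response>';
}

// Outbound call to the owner: play the recording, then wait for a key press or speech.
function twiml_prompt_owner(string $recordingUrl, string $decisionAction): string {
    return '<Response>'
        . '<Gather input="dtmf speech" numDigits="1" action="' . xml_escape($decisionAction) . '">'
        . '<Play>' . xml_escape($recordingUrl) . '</Play>'
        . '<Say>Press 1 or say accept to take the call. Press 2 or say reject to decline.</Say>'
        . '</Gather>'
        . '</Response>';
}

// Decision step: Twilio posts Digits (DTMF) and/or SpeechResult (speech).
function twiml_decision(?string $digits, ?string $speech, string $queueName): string {
    $accepted = $digits === '1' || stripos((string) $speech, 'accept') !== false;
    if ($accepted) {
        // Bridge the owner to the caller waiting in the queue.
        return '<Response><Dial><Queue>' . xml_escape($queueName) . '</Queue></Dial></Response>';
    }
    // Rejected: end the owner's leg. Notifying and removing the queued caller
    // is a separate REST API call that is not shown here.
    return '<Response><Say>Call rejected.</Say><Hangup/></Response>';
}
```

Building the XML as plain strings keeps the sketch free of dependencies; the Twilio PHP SDK can generate the same responses if you prefer not to write XML by hand.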
Proof, hardware is on its way out
https://youtu.be/dk3fel0U2C0
As indicated in the video about Foxconn, the largest electronics manufacturer in the world, software technology such as the cloud is overtaking the need for hardware.
Keyboards and mice shall soon follow your previously enjoyed cassette players and pagers to their final destination, and right behind them is the actual device you’re looking at right now.
Amazon, Microsoft and Google are buying the latest in hardware technology, so you don’t have to. Imagine not having to buy each new employee a computer, and instead leasing a virtual desktop with minimal upfront cost and no commitments, priced according to actual usage time. For a start-up business, this can greatly reduce your initial costs.
The above scenario is now a tried and true reality that any business can implement right now. In fact, we’ve been doing just that for our clients. Internet-based technology is the future, swallowing up entire industries like a merciless dictator treats a nonconformist.
Complete Business Start-up
We helped Native West start their business with a wide array of solutions, from IP phone management and a bidding system to not only designing their website but also getting it to the top of search engines.
Website Design
IT Services was responsible for the design and development of Native West's main website.
SEO
Additional Site Links in Search Results
Native West was a difficult term to compete for because it was dominated by Native American related topics. But in short order, with proper site structure, search engines not only gave us the top spot but also displayed us with additional site links.
Phone System
Managed Phone System
A managed phone system allows Native West to customise phone call flow, directing calls to field agents, etc.
Management Platform
Bidding and Contract System
This is the company's backbone: its responsibility is to coordinate and connect the various departments (Sales, Construction, Accounting and Management) through the many stages of landscape construction.
Want more control over your company?
Schedule a free phone consultation
Interactive demonstrations of how we keep users engaged
Is your account info compromised?
We hope you find this functional demonstration useful and informative. We were quite surprised to find that a couple of our own accounts were compromised. To prevent abuse, we can only show you a portion of the password. **We hate SPAM; email addresses provided will not be redistributed.
8 Tips for picking a better domain name
Not only do these tips make search engine indexing easier, they also take human psychology into consideration. Human memory is vital: if your end user doesn’t recall the proper address, they are a lot less likely to reach their intended destination.
- Choose a .com extension.
- Use your brand name.
- Don’t use exact match domains. Partial match domains are okay, but a brand name is always preferred.
- Make it memorable.
- Make sure it’s easy to spell.
- Avoid special characters.
- Avoid misspellings.
- Keep it under 14 characters; the shorter the better.
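A few of these tips can be checked mechanically. The sketch below is a hypothetical checker, with two assumptions of mine: the 14-character limit is applied to the name without its extension, and digits and hyphens are treated as special characters. It expects a bare domain such as example.com, not a subdomain.

```php
<?php
// Hypothetical checker for the tips that can be tested mechanically.
// Returns a list of warnings; an empty list means no rule was tripped.
function check_domain_name(string $domain): array {
    $warnings = [];
    $domain = strtolower(trim($domain));
    $dot = strrpos($domain, '.');
    // Split "name.ext" at the last dot.
    $name = $dot === false ? $domain : substr($domain, 0, $dot);
    $ext = $dot === false ? '' : substr($domain, $dot + 1);

    if ($ext !== 'com') {
        $warnings[] = 'Not a .com extension';
    }
    // Assumption: the limit counts the name only, not the extension.
    if (strlen($name) >= 14) {
        $warnings[] = 'Name is 14 characters or longer';
    }
    // Assumption: anything other than the letters a-z counts as special.
    if (!preg_match('/^[a-z]+$/', $name)) {
        $warnings[] = 'Contains digits, hyphens, or other special characters';
    }
    return $warnings;
}

$clean = check_domain_name('nativewest.com');
$flagged = check_domain_name('best-landscapers-4u.net');
```

Memorability, spelling, and branding still need a human judge; the function only covers extension, length, and character set.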
Shared Hosting
Sharing Address with many others – In shared hosting, one address could be shared by thousands of other domains. Shared hosting is like sharing a large building: you have a room and share all the remaining amenities with everyone else.
Increased Hack Vulnerability – If a criminal breaks into the building, he would have access to your personal belongings as well.
Limited control of Digital Assets – You are renting your space from someone else and simply storing your personal belongings there. In a dispute, the landlord could withhold your personal belongings.
Guilty by Association – The more people that share your IP, the more likely you are to be deemed guilty by association if your neighbors are sex offenders or use their domains for nefarious purposes.
Costly Moving – When you're ready to move out, you will have to pay for professional movers, and some buildings are not move-out friendly.
Technical Perception – Search engines are able to discern sites on shared accounts from sites on dedicated ones.
Ownership
Unfettered Control – You have full control over your digital assets
Dedicated Address – You have your own dedicated address
Security – Minimal public access limits your property's exposure to criminals
Status – Search engines and future investors will take you more seriously
5 Ways to help ensure your website is secure
These days, websites can expect to be attacked by web bots as soon as a search engine indexes their site. The more advanced web bots will launch every known attack in order to find a vulnerability to exploit and report back to the attacker, who can then conduct various nefarious acts on your server, any of which could greatly affect your business, domain, and/or clients. A very common attack on businesses that does not get enough publicity is ransomware, which can cripple a company’s operations until it pays the attacker. Most of the time these attacks don’t require much effort to carry out, so even small businesses can fall victim. Here are some tips I think every site should implement to ensure it is secure.
- Run updates on your server and all your software frequently
Plugins are vitally important for keeping websites ahead of potential security breaches. Ensure the ability to create and utilize custom plugins to reinforce site security, such as those that block particular IP addresses. Equally important, developers must also practice up-to-date coding and development standards and should use modern versions of platforms such as PHP and Apache. WDG developers, for example, use only coding structures and syntax patterns that have been proven secure and effective in order to maintain industry best practices.
- Require a strong password policy
Require a strong password policy on your site and throughout your organization, anywhere a password is used. Attackers will often brute-force your users' passwords using scripts, which is akin to trying every common password for every username until one works. Such an attack takes only a few minutes to conduct and is typically automated.
- Conduct a regular back-up of your applications and databases
Having a backup copy of your applications and databases could save your business from the loss of important information, such as customer or sales records, due to damage done by an attacker or virus. Once damage is identified, the lack of a backup copy could take a business website offline indefinitely. For e-commerce websites, this could destroy the entire business.
- Obtain an SSL (Secure Socket Layer) Certificate and employ an Encryption Protocol
The information traveling between your website and your end user can be intercepted by an unknown third party, who can capture sensitive information such as credit card details, usernames, and passwords. SSL (Secure Socket Layer) secures the information by encrypting it before it is transmitted.
- Only install plugins or code from trusted sources
Attackers will often disguise viruses in actual functioning plugins, thereby allowing an attacker access to your server. These attacks often go unnoticed until it’s too late.
These steps are implemented automatically whenever possible when I develop for my clients. From a development standpoint, the tips become more costly and difficult to implement as a site grows.
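As one concrete illustration of the password tip, here is a minimal sketch of a policy check. The thresholds (12 characters, mixed case, one digit) and the short deny list are assumptions chosen for the example, not a recommendation from a standard. The function names are hypothetical; password_hash and password_verify are PHP built-ins.

```php
<?php
// Hypothetical policy check: returns the reasons a password is rejected.
// An empty list means the password passes this example policy.
function password_problems(
    string $password,
    array $denyList = ['password', '123456', 'qwerty', 'letmein']
): array {
    $problems = [];
    if (strlen($password) < 12) {
        $problems[] = 'shorter than 12 characters';
    }
    if (!preg_match('/[a-z]/', $password) || !preg_match('/[A-Z]/', $password)) {
        $problems[] = 'needs upper and lower case letters';
    }
    if (!preg_match('/\d/', $password)) {
        $problems[] = 'needs a digit';
    }
    // Common passwords are the first ones a brute-force script tries.
    if (in_array(strtolower($password), $denyList, true)) {
        $problems[] = 'is a commonly used password';
    }
    return $problems;
}

// Store only a salted hash, never the password itself.
function hash_for_storage(string $password): string {
    return password_hash($password, PASSWORD_DEFAULT);
}
```

At login, password_verify compares the submitted password against the stored hash. Limiting failed attempts per account or per IP is a separate measure that slows brute-force scripts further.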
Real world example of unintentional consequences
Some people are best off staying away from technology. Rather than tell you, I’ll let you see for yourself: Google “inurl:main.cgi linksys” (minus the quotes), click on a few links, and fulfill that inner “peeping Tom” in you (don’t worry, your secret’s safe with me).
We now know for sure that encryption won’t ensure privacy.
A visual comparison proving where you host your site matters.
Most website owners and marketers overlook how a web server and its network are configured. The purpose of this article is to provide a "not-so" technical person with a brief side-by-side comparison of two sites, and how those settings are most likely affecting their search engine performance.
GearedForSpeed.com | NativeWest.com |
---|---|
Both sites are basic "store-front" websites, and both are start-up companies. Just like a physical store's address, websites use IP addresses so people can locate them. In shared hosting, multiple domains can share a single IP address.
GearedForSpeed.com IP | NativeWest.com IP |
---|---|
GearedForSpeed.com shares its IP with over a thousand other domains, whereas NativeWest.com is the only domain associated with its IP address. If you were a search engine, which website would seem more credible?
GearedForSpeed.com | NativeWest.com |
---|---|
1,357 Other Domains | 1 Domain |
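If you manage several domains, a quick way to see which of them sit on the same address is to resolve each one and group by IP. The sketch below is an illustration with a hypothetical function name; it uses PHP's gethostbyname by default and accepts a substitute resolver so the grouping can be tried offline. It only compares the domains you pass in. Counting every domain hosted on an IP, like the figure in the table above, needs a reverse-IP lookup service, which plain DNS does not provide.

```php
<?php
// Groups domains by the IPv4 address they resolve to.
// $resolve defaults to gethostbyname(); pass another callable to test offline.
function group_domains_by_ip(array $domains, ?callable $resolve = null): array {
    $resolve = $resolve ?? 'gethostbyname';
    $groups = [];
    foreach ($domains as $domain) {
        $ip = $resolve($domain);
        // gethostbyname() returns the input unchanged when the lookup fails.
        if ($ip === $domain || filter_var($ip, FILTER_VALIDATE_IP) === false) {
            continue;
        }
        $groups[$ip][] = $domain;
    }
    return $groups;
}

// Offline example with made-up domains and documentation-range addresses.
$fakeResolver = function (string $domain): string {
    $map = [
        'a.example' => '192.0.2.10',
        'b.example' => '192.0.2.10',
        'c.example' => '198.51.100.7',
    ];
    return $map[$domain] ?? $domain;
};
$groups = group_domains_by_ip(['a.example', 'b.example', 'c.example', 'missing.example'], $fakeResolver);
```

Any IP that maps to more than one of your domains is shared among them; a domain that fails to resolve is left out of the result.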
Sharing an IP address with other domains could also mean your domain is penalized for the conduct of one of its neighboring domains.
GearedForSpeed.com | NativeWest.com |
---|---|
7 SPAM Hosts, Blocked for Adult Content | Clean History |
Here is how each company ranks on search engines when searching for the company name.
GearedForSpeed.com | NativeWest.com |
---|---|
Not Listed | Top Position |
It's important to note that NativeWest.com pays less each month for hosting than GearedForSpeed.com does. This boils down to the capabilities of whoever set up your web server and to your account type. Cloud hosting has greatly reduced the cost of hosting for IT professionals, so a developer with experience in setting up dedicated networks could save you money and gain you more traffic.
Reach out to me, if you would like me to review your site.
Artificial Intelligence threatens most jobs
I learned about machine learning (“artificial intelligence”) recently. The titans of the internet today, Google, Apple, Microsoft, and Amazon, have been publishing developer APIs for these services. I had a free moment yesterday and took the time to wrap my mind around the APIs; to make a living in freelance, it’s our responsibility to remain relevant. Only to be greeted by the grim reaper of human skill, an all too familiar adversary on the road to sustainable success and wealth in this life.
From my research, I realized that artificial intelligence exists and computers can generate their own code. This shall deprecate the many programming languages into a consolidated object-oriented language, and shortly thereafter typing will not be necessary. I suspect that by then we’ll have truly “smart” devices that are aware of your intentions as well as of other devices.
I got into programming because I thought it would be a good skill for the long term. Granted, it has treated me well, but the deprecation of my trade means the deprecation of a majority of modern trades. Simply put, soon all work will be done by machines.
Hosting like the big boys…on the cheap!
Amazon is out to own the web hosting space with its new Lightsail service. Lightsail offers a cloud Virtual Private Server instance with all the trimmings that would typically require an experienced IT professional to configure. This makes it possible for small businesses to have a scalable private server for as low as $5 a month, and any newbie developer should be able to set it up!! If you’re still using GoDaddy, Bluehost, or HostGator and sharing one IP address and server space among hundreds (sometimes thousands) of other websites in a “shared” hosting account (which search engines can see), it would be silly not to migrate.
What I can determine about you from your browser alone
Our browsers send off quite a bit of information about us now. Using the fingerprintJS library, I was able to extract this information.
Browser Used:
language:
color_depth:
pixel_ratio:
resolution:
available_resolution:
timezone_offset:
session_storage:
local_storage:
indexed_db:
cpu_class:
navigator_platform:
do_not_track:
regular_plugins:
canvas:
canvas winding:yes~canvas
webgl:
adblock:
has_lied_languages:
has_lied_resolution:
has_lied_os:
has_lied_browser:
touch_support
js_fonts
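The fields above are collected in the browser by JavaScript. As a server-side complement, here is a minimal PHP sketch of what the HTTP request headers alone reveal before any script runs. It is not part of fingerprintJS; the function name is hypothetical, and the array keys are the standard header names PHP exposes in $_SERVER.

```php
<?php
// Summarizes what a browser reveals in its request headers.
// Pass $_SERVER during a real request; a plain array works for testing.
function summarize_request(array $server): array {
    $languages = [];
    // Accept-Language looks like "en-US,en;q=0.9"; keep the tags, drop the weights.
    foreach (explode(',', $server['HTTP_ACCEPT_LANGUAGE'] ?? '') as $part) {
        $tag = trim(explode(';', $part)[0]);
        if ($tag !== '') {
            $languages[] = $tag;
        }
    }
    return [
        'user_agent'   => $server['HTTP_USER_AGENT'] ?? null,
        'languages'    => $languages,
        'do_not_track' => ($server['HTTP_DNT'] ?? null) === '1',
        'ip'           => $server['REMOTE_ADDR'] ?? null,
    ];
}

// Example with made-up header values.
$summary = summarize_request([
    'HTTP_USER_AGENT'      => 'ExampleBrowser/1.0',
    'HTTP_ACCEPT_LANGUAGE' => 'en-US,en;q=0.9,vi;q=0.8',
    'HTTP_DNT'             => '1',
    'REMOTE_ADDR'          => '203.0.113.5',
]);
```

Details such as screen resolution, canvas rendering, and installed fonts are not sent in headers, which is why they need a client-side library.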