
      How To Install and Configure Zabbix to Securely Monitor Remote Servers on Ubuntu 18.04


      The author selected the Open Source Initiative to receive a donation as part of the Write for DOnations program.

      Introduction

      Zabbix is open-source monitoring software for networks and applications. It offers real-time monitoring of thousands of metrics collected from servers, virtual machines, network devices, and web applications. These metrics can help you determine the current health of your IT infrastructure and detect problems with hardware or software components before customers complain. Useful information is stored in a database so you can analyze data over time and improve the quality of provided services, or plan upgrades of your equipment.

      Zabbix uses several options for collecting metrics, including agentless monitoring of user services and client-server architecture. To collect server metrics, it uses a small agent on the monitored client to gather data and send it to the Zabbix server. Zabbix supports encrypted communication between the server and connected clients, so your data is protected while it travels over insecure networks.

      The Zabbix server stores its data in a relational database powered by MySQL, PostgreSQL, or Oracle. You can also store historical data in NoSQL databases like Elasticsearch and TimescaleDB. Zabbix provides a web interface so you can view data and configure system settings.

      In this tutorial, you will configure two machines. One will be configured as the server, and the other as a client that you’ll monitor. The server will use a MySQL database to record monitoring data and use Apache to serve the web interface.

      Prerequisites

      To follow this tutorial, you will need:

      • Two Ubuntu 18.04 servers set up by following the Initial Server Setup Guide for Ubuntu 18.04, including a non-root user with sudo privileges and a firewall configured with ufw. On one server, you will install Zabbix; this tutorial will refer to this as the Zabbix server. It will monitor your second server; this second server will be referred to as the second Ubuntu server.

      • The server that will run the Zabbix server needs Apache, MySQL, and PHP installed. Follow this guide to configure those on your Zabbix server.

      Additionally, because the Zabbix Server holds valuable information about your infrastructure that unauthorized users should not be able to access, it’s important to keep your server secure by installing a TLS/SSL certificate. This is optional but strongly encouraged. You can follow the Let’s Encrypt on Ubuntu 18.04 guide to obtain the free TLS/SSL certificate.

      Step 1 — Installing the Zabbix Server

      First, you need to install Zabbix on the server where you installed MySQL, Apache, and PHP. Log into this machine as your non-root user:

      • ssh sammy@zabbix_server_ip_address

      Zabbix is available in Ubuntu’s package manager, but it’s outdated, so use the official Zabbix repository to install the latest stable version. Download and install the repository configuration package:

      • wget https://repo.zabbix.com/zabbix/4.2/ubuntu/pool/main/z/zabbix-release/zabbix-release_4.2-1+bionic_all.deb
      • sudo dpkg -i zabbix-release_4.2-1+bionic_all.deb

      You will see the following output:

      Output

      Selecting previously unselected package zabbix-release.
      (Reading database ... 61483 files and directories currently installed.)
      Preparing to unpack zabbix-release_4.2-1+bionic_all.deb ...
      Unpacking zabbix-release (4.2-1+bionic) ...
      Setting up zabbix-release (4.2-1+bionic) ...

      Update the package index so the new repository is included:

      • sudo apt update

      Then install the Zabbix server and web frontend with MySQL database support:

      • sudo apt install zabbix-server-mysql zabbix-frontend-php

      Also, install the Zabbix agent, which will let you collect data about the status of the Zabbix server itself:

      • sudo apt install zabbix-agent

      Before you can use Zabbix, you have to set up a database to hold the data that the Zabbix server will collect from its agents. You can do this in the next step.

      Step 2 — Configuring the MySQL Database for Zabbix

      You need to create a new MySQL database and populate it with some basic information in order to make it suitable for Zabbix. You'll also create a specific user for this database so Zabbix isn't logging into MySQL with the root account.

      Log into MySQL as the root user using the root password that you set up during the MySQL server installation:

      • mysql -uroot -p

      Create the Zabbix database with UTF-8 character support:

      • create database zabbix character set utf8 collate utf8_bin;

      Then create a user that the Zabbix server will use, give it access to the new database, and set the password for the user:

      • grant all privileges on zabbix.* to zabbix@localhost identified by 'your_zabbix_mysql_password';

      Then apply these new permissions:

      • flush privileges;

      That takes care of the user and the database. Exit out of the database console:

      • quit;

      Next you have to import the initial schema and data. The Zabbix installation provided you with a file that sets this up.

      Run the following command to set up the schema and import the data into the zabbix database. Use zcat since the data in the file is compressed.

      • zcat /usr/share/doc/zabbix-server-mysql/create.sql.gz | mysql -uzabbix -p zabbix

      Enter the password for the zabbix MySQL user that you configured when prompted.

      This command will not produce any output if it was successful. If you see the error ERROR 1045 (28000): Access denied for user 'zabbix'@'localhost' (using password: YES), make sure you used the password for the zabbix user and not the root user.
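
      If you'd like to confirm that the schema was imported, you can optionally list a few of the newly created tables. This is an extra check rather than a required step:

      • mysql -uzabbix -p zabbix -e "show tables;" | head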

      In order for the Zabbix server to use this database, you need to set the database password in the Zabbix server configuration file. Open the configuration file in your preferred text editor. This tutorial will use nano:

      • sudo nano /etc/zabbix/zabbix_server.conf

      Look for the following section of the file:

      /etc/zabbix/zabbix_server.conf

      ### Option: DBPassword                           
      #       Database password. Ignored for SQLite.   
      #       Comment this line if no password is used.
      #                                                
      # Mandatory: no                                  
      # Default:                                       
      # DBPassword=
      

      These comments in the file explain how to connect to the database. You need to set the DBPassword value in the file to the password for your database user. Add this line below those comments to configure the database:

      /etc/zabbix/zabbix_server.conf

      ...
      DBPassword=your_zabbix_mysql_password
      

      Save and close zabbix_server.conf by pressing CTRL+X, followed by Y and then ENTER if you're using nano.

      That takes care of the Zabbix server configuration. Next, you will make some modifications to your PHP setup in order for the Zabbix web interface to work properly.

      Step 3 — Configuring PHP for Zabbix

      The Zabbix web interface is written in PHP and requires some special PHP server settings. The Zabbix installation process created an Apache configuration file that contains these settings. It is located in the directory /etc/zabbix and is loaded automatically by Apache. You need to make a small change to this file, so open it up with the following:

      • sudo nano /etc/zabbix/apache.conf

      The file contains PHP settings that meet the necessary requirements for the Zabbix web interface. However, the timezone setting is commented out by default. To make sure that Zabbix uses the correct time, you need to set the appropriate timezone.

      /etc/zabbix/apache.conf

      ...
      <IfModule mod_php7.c>
          php_value max_execution_time 300
          php_value memory_limit 128M
          php_value post_max_size 16M
          php_value upload_max_filesize 2M
          php_value max_input_time 300
          php_value always_populate_raw_post_data -1
          # php_value date.timezone Europe/Riga
      </IfModule>
      

      Uncomment the timezone line, highlighted in the preceding code block, and change it to your timezone. You can use this list of supported time zones to find the right one for you.
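
      For example, assuming your server uses the America/New_York timezone (an illustrative value; substitute your own), the uncommented line would look like this:

      /etc/zabbix/apache.conf

      ...
          php_value date.timezone America/New_York
      ...

      Then save and close the file.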

      Now restart Apache to apply these new settings.

      • sudo systemctl restart apache2

      You can now start the Zabbix server.

      • sudo systemctl start zabbix-server

      Then check whether the Zabbix server is running properly:

      • sudo systemctl status zabbix-server

      You will see the following status:

      Output

      ● zabbix-server.service - Zabbix Server
         Loaded: loaded (/lib/systemd/system/zabbix-server.service; disabled; vendor preset: enabled)
         Active: active (running) since Fri 2019-04-05 08:50:54 UTC; 3s ago
        Process: 16497 ExecStart=/usr/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
      ...

      Finally, enable the server to start at boot time:

      • sudo systemctl enable zabbix-server

      The server is set up and connected to the database. Next, set up the web frontend.

      Note: As mentioned in the Prerequisites section, it is recommended that you enable SSL/TLS on your server. You can follow this tutorial now to obtain a free SSL certificate for Apache on Ubuntu 18.04. After obtaining your SSL/TLS certificates, you can come back and complete this tutorial.

      Step 4 — Configuring Settings for the Zabbix Web Interface

      The web interface lets you see reports and add hosts that you want to monitor, but it needs some initial setup before you can use it. Launch your browser and go to the address http://zabbix_server_name/zabbix/. On the first screen, you will see a welcome message. Click Next step to continue.

      On the next screen, you will see the table that lists all of the prerequisites to run Zabbix.

      Prerequisites

      All of the values in this table must be OK, so verify that they are. Be sure to scroll down and look at all of the prerequisites. Once you've verified that everything is ready to go, click Next step to proceed.

      The next screen asks for database connection information.

      DB Connection

      You told the Zabbix server about your database, but the Zabbix web interface also needs access to the database to manage hosts and read data. Therefore enter the MySQL credentials you configured in Step 2 and click Next step to proceed.

      On the next screen, you can leave the options at their default values.

      Zabbix Server Details

      The Name is optional; it is used in the web interface to distinguish one server from another in case you have several monitoring servers. Click Next step to proceed.

      The next screen will show the pre-installation summary so you can confirm everything is correct.

      Summary

      Click Next step to proceed to the final screen.

      The web interface setup is complete! This process creates the configuration file /usr/share/zabbix/conf/zabbix.conf.php which you could back up and use in the future. Click Finish to proceed to the login screen. The default user is Admin and the password is zabbix.

      Before you log in, set up the Zabbix agent on your second Ubuntu server.

      Step 5 — Installing and Configuring the Zabbix Agent

      Now you need to configure the agent software that will send monitoring data to the Zabbix server.

      Log in to the second Ubuntu server:

      • ssh sammy@second_ubuntu_server_ip_address

      Then, just like on the Zabbix server, run the following commands to install the repository configuration package:

      • wget https://repo.zabbix.com/zabbix/4.2/ubuntu/pool/main/z/zabbix-release/zabbix-release_4.2-1+bionic_all.deb
      • sudo dpkg -i zabbix-release_4.2-1+bionic_all.deb

      Next, update the package index:

      • sudo apt update

      Then install the Zabbix agent:

      • sudo apt install zabbix-agent

      Zabbix supports certificate-based encryption, but setting up a certificate authority is beyond the scope of this tutorial. Instead, you will use pre-shared keys (PSK) to secure the connection between the server and the agent.

      First, generate a PSK:

      • sudo sh -c "openssl rand -hex 32 > /etc/zabbix/zabbix_agentd.psk"

      Show the key so you can copy it somewhere. You will need it to configure the host.

      • cat /etc/zabbix/zabbix_agentd.psk

      The key will look something like this:

      Output

      12eb854dea38ac9ee7d1ded2d74cee6262b0a56710f6946f7913d674ab82cdd4
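
      Optionally, you can tighten the permissions on this key file so that only the zabbix system user (created when you installed the agent package) can read it. This hardening step is not required by the rest of the tutorial:

      • sudo chown zabbix:zabbix /etc/zabbix/zabbix_agentd.psk
      • sudo chmod 640 /etc/zabbix/zabbix_agentd.psk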

      Now edit the Zabbix agent settings to set up its secure connection to the Zabbix server. Open the agent configuration file in your text editor:

      • sudo nano /etc/zabbix/zabbix_agentd.conf

      Each setting within this file is documented via informative comments throughout the file, but you only need to edit some of them.

      First you have to edit the IP address of the Zabbix server. Find the following section:

      /etc/zabbix/zabbix_agentd.conf

      ...
      ### Option: Server
      #       List of comma delimited IP addresses (or hostnames) of Zabbix servers.
      #       Incoming connections will be accepted only from the hosts listed here.
      #       If IPv6 support is enabled then '127.0.0.1', '::127.0.0.1', '::ffff:127.0.0.1' are treated equally.
      #
      # Mandatory: no
      # Default:
      # Server=
      
      Server=127.0.0.1
      ...
      

      Change the default value to the IP of your Zabbix server:

      /etc/zabbix/zabbix_agentd.conf

      ...
      Server=zabbix_server_ip_address
      ...
      

      Next, find the section that configures the secure connection to the Zabbix server and enable pre-shared key support. Find the TLSConnect section, which looks like this:

      /etc/zabbix/zabbix_agentd.conf

      ...
      ### Option: TLSConnect
      #       How the agent should connect to server or proxy. Used for active checks.
      #       Only one value can be specified:
      #               unencrypted - connect without encryption
      #               psk         - connect using TLS and a pre-shared key
      #               cert        - connect using TLS and a certificate
      #
      # Mandatory: yes, if TLS certificate or PSK parameters are defined (even for 'unencrypted' connection)
      # Default:
      # TLSConnect=unencrypted
      ...
      

      Then add this line to configure pre-shared key support:

      /etc/zabbix/zabbix_agentd.conf

      ...
      TLSConnect=psk
      ...
      

      Next, locate the TLSAccept section, which looks like this:

      /etc/zabbix/zabbix_agentd.conf

      ...
      ### Option: TLSAccept
      #       What incoming connections to accept.
      #       Multiple values can be specified, separated by comma:
      #               unencrypted - accept connections without encryption
      #               psk         - accept connections secured with TLS and a pre-shared key
      #               cert        - accept connections secured with TLS and a certificate
      #
      # Mandatory: yes, if TLS certificate or PSK parameters are defined (even for 'unencrypted' connection)
      # Default:
      # TLSAccept=unencrypted
      ...
      

      Configure incoming connections to support pre-shared keys by adding this line:

      /etc/zabbix/zabbix_agentd.conf

      ...
      TLSAccept=psk
      ...
      

      Next, find the TLSPSKIdentity section, which looks like this:

      /etc/zabbix/zabbix_agentd.conf

      ...
      ### Option: TLSPSKIdentity
      #       Unique, case sensitive string used to identify the pre-shared key.
      #
      # Mandatory: no
      # Default:
      # TLSPSKIdentity=
      ...
      

      Choose a unique name to identify your pre-shared key by adding this line:

      /etc/zabbix/zabbix_agentd.conf

      ...
      TLSPSKIdentity=PSK 001
      ...
      

      You'll use this as the PSK ID when you add your host through the Zabbix web interface.

      Then set the option that points to your previously created pre-shared key. Locate the TLSPSKFile option:

      /etc/zabbix/zabbix_agentd.conf

      ...
      ### Option: TLSPSKFile
      #       Full pathname of a file containing the pre-shared key.
      #
      # Mandatory: no
      # Default:
      # TLSPSKFile=
      ...
      

      Add this line to point the Zabbix agent to your PSK file you created:

      /etc/zabbix/zabbix_agentd.conf

      ...
      TLSPSKFile=/etc/zabbix/zabbix_agentd.psk
      ...
      

      Save and close the file. Now you can restart the Zabbix agent and set it to start at boot time:

      • sudo systemctl restart zabbix-agent
      • sudo systemctl enable zabbix-agent

      For good measure, check that the Zabbix agent is running properly:

      • sudo systemctl status zabbix-agent

      You will see the following status, indicating the agent is running:

      Output

      ● zabbix-agent.service - Zabbix Agent
         Loaded: loaded (/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: enabled)
         Active: active (running) since Fri 2019-04-05 09:03:04 UTC; 1s ago
      ...

      The agent will listen on port 10050 for connections from the server. Configure UFW to allow connections to this port:

      • sudo ufw allow 10050/tcp

      You can learn more about UFW in How To Set Up a Firewall with UFW on Ubuntu 18.04.

      Your agent is now ready to send data to the Zabbix server. But in order to use it, you have to link to it from the server's web console. In the next step, you will complete the configuration.

      Step 6 — Adding the New Host to the Zabbix Server

      Installing an agent on a server you want to monitor is only half of the process. Each host you want to monitor needs to be registered on the Zabbix server, which you can do through the web interface.

      Log in to the Zabbix Server web interface by navigating to the address http://zabbix_server_name/zabbix/.

      The Zabbix login screen

      When you have logged in, click on Configuration, and then Hosts in the top navigation bar. Then click the Create host button in the top right corner of the screen. This will open the host configuration page.

      Creating a host

      Adjust the Host name and IP address to reflect the host name and IP address of your second Ubuntu server, then add the host to a group. You can select an existing group, for example Linux servers, or create your own group. The host can be in multiple groups. To do this, enter the name of an existing or new group in the Groups field and select the desired value from the proposed list.

      Once you've added the group, click the Templates tab.

      Adding a template to the host

      Type Template OS Linux in the Search field and then click Add to add this template to the host.

      Next, navigate to the Encryption tab. Select PSK for both Connections to host and Connections from host. Then set PSK identity to PSK 001, which is the value of the TLSPSKIdentity setting of the Zabbix agent you configured previously. Then set PSK value to the key you generated for the Zabbix agent. It's the one stored in the file /etc/zabbix/zabbix_agentd.psk on the agent machine.

      Setting up the encryption

      Finally, click the Add button at the bottom of the form to create the host.

      You will see your new host in the list. Wait for a minute and reload the page to see green labels indicating that everything is working fine and the connection is encrypted.

      Zabbix shows your new host

      If you have additional servers you need to monitor, log in to each host, install the Zabbix agent, generate a PSK, configure the agent, and add the host to the web interface following the same steps you followed to add your first host.

      The Zabbix server is now monitoring your second Ubuntu server. Now, set up email notifications to be notified about problems.

      Step 7 — Configuring Email Notifications

      Zabbix automatically supports several types of notifications: email, Jabber, SMS, etc. You can also use alternative notification methods, such as Telegram or Slack. You can see the full list of integrations here.

      The simplest communication method is email, and this tutorial will configure notifications for this media type.

      Click on Administration, and then Media types in the top navigation bar. You will see the list of all media types. Click on Email.

      Adjust the SMTP options according to the settings provided by your email service. This tutorial uses Gmail's SMTP capabilities to set up email notifications; if you would like more information about setting this up, see How To Use Google's SMTP Server.


      Note: If you use 2-Step Verification with Gmail, you need to generate an App Password for Zabbix. You don't need to remember it; you’ll only have to enter the App Password once during setup. You will find instructions on how to generate this password in the Google Help Center.

      You can also choose the message format, either HTML or plain text. Finally, click the Update button at the bottom of the form to update the email parameters.

      Setting up email

      Now, create a new user. Click on Administration, and then Users in the top navigation bar. You will see the list of users. Then click the Create user button in the top right corner of the screen. This will open the user configuration page.

      Creating a user

      Enter the new username in the Alias field and set up a new password. Next, add the user to the administrator's group. Type Zabbix administrators in the Groups field and select it from the proposed list.

      Once you've added the group, click the Media tab and click on the Add underlined link. You will see a pop-up window.

      Adding an email

      Enter your email address in the Send to field. You can leave the rest of the options at the default values. Click the Add button at the bottom to submit.

      Now navigate to the Permissions tab. Select Zabbix Super Admin from the User type drop-down menu.

      Finally, click the Add button at the bottom of the form to create the user.

      Now you need to enable notifications. Click on the Configuration tab, and then Actions in the top navigation bar. You will see a pre-configured action, which is responsible for sending notifications to all Zabbix administrators. You can review and change the settings by clicking on its name. For the purposes of this tutorial, use the default parameters. To enable the action, click on the red Disabled link in the Status column.

      Now you are ready to receive alerts. In the next step, you will generate one to test your notification setup.

      Step 8 — Generating a Test Alert

      In this step, you will generate a test alert to ensure everything is connected. By default, Zabbix keeps track of the amount of free disk space on your server. It automatically detects all disk mounts and adds the corresponding checks. This discovery is executed every hour, so you need to wait a while for the notification to be triggered.

      Create a temporary file that's large enough to trigger Zabbix's file system usage alert. To do this, log in to your second Ubuntu server if you're not already connected.

      • ssh sammy@second_ubuntu_server_ip_address

      Next, determine how much free space you have on the server. You can use the df command to find out:

      • df -h

      The df command reports the disk space usage of your file system, and the -h flag makes the output human-readable. You'll see output like the following:

      Output

      Filesystem      Size  Used Avail Use% Mounted on
      /dev/vda1        25G  1.2G   23G   5% /

      In this case, the free space is 23GB. Your free space may differ.

      Use the fallocate command, which allows you to pre-allocate or de-allocate space to a file, to create a file that takes up more than 80% of the available disk space. This will be enough to trigger the alert:

      • fallocate -l 20G /tmp/temp.img

      After around an hour, Zabbix will trigger an alert about the amount of free disk space and will run the action you configured, sending the notification message. You can check your inbox for the message from the Zabbix server. You will see a message like:

      Output

      Problem started at 10:37:54 on 2019.04.05
      Problem name: Free disk space is less than 20% on volume /
      Host: Second Ubuntu server
      Severity: Warning
      Original problem ID: 34

      You can also navigate to the Monitoring tab, and then Dashboard to see the notification and its details.

      Main dashboard

      Now that you know the alerts are working, delete the temporary file you created so you can reclaim your disk space:

      • rm -f /tmp/temp.img

      After a minute, Zabbix will send a recovery message and the alert will disappear from the main dashboard.

      Conclusion

      In this tutorial, you learned how to set up a simple and secure monitoring solution which will help you monitor the state of your servers. It can now warn you of problems, and you have the opportunity to analyze the processes occurring in your IT infrastructure.

      To learn more about setting up monitoring infrastructure, check out How To Install Elasticsearch, Logstash, and Kibana (Elastic Stack) on Ubuntu 18.04 and How To Gather Infrastructure Metrics with Metricbeat on Ubuntu 18.04.




      How To Scrape Web Pages and Post Content to Twitter with Python 3


      The author selected The Computer History Museum to receive a donation as part of the Write for DOnations program.

      Introduction

      Twitter bots are a powerful way of managing your social media as well as extracting information from the microblogging network. By leveraging Twitter’s versatile APIs, a bot can do a lot of things: tweet, retweet, “favorite-tweet”, follow people with certain interests, reply automatically, and so on. Even though people can, and do, abuse their bot’s power, leading to a negative experience for other users, research shows that people view Twitter bots as a credible source of information. For example, a bot can keep your followers engaged with content even when you’re not online. Some bots even provide critical and helpful information, like @EarthquakesSF. The applications for bots are limitless. As of 2019, it is estimated that bots account for about 24% of all tweets on Twitter.

      In this tutorial, you’ll build a Twitter bot using this Twitter API library for Python. You’ll use API keys from your Twitter account to authorize your bot and build it to be capable of scraping content from two websites. Furthermore, you’ll program your bot to tweet content from these two websites alternately and at set time intervals. Note that you’ll use Python 3 in this tutorial.

      Prerequisites

      You will need the following to complete this tutorial:

      Note: You’ll be setting up a developer account with Twitter, which involves an application review by Twitter before you can access the API keys you require for this bot. Step 1 walks through the specific details for completing the application.

      Step 1 — Setting Up Your Developer Account and Accessing Your Twitter API Keys

      Before you begin coding your bot, you’ll need the API keys for Twitter to recognize the requests of your bot. In this step, you’ll set up your Twitter Developer Account and access your API keys for your Twitter bot.

      To get your API keys, head over to developer.twitter.com and register your bot application with Twitter by clicking on Apply in the top right section of the page.

      Now click on Apply for a developer account.

      Next, click on Continue to associate your Twitter username with your bot application that you’ll be building in this tutorial.

      Twitter Username Association with Bot

      On the next page, for the purposes of this tutorial, you’ll choose the I am requesting access for my own personal use option since you’ll be building a bot for your own personal education use.

      Twitter API Personal Use

      After choosing your Account Name and Country, move on to the next section. For What use case(s) are you interested in?, pick the Publish and curate Tweets and Student project / Learning to code options. These categories are the best representation of why you’re completing this tutorial.

      Twitter Bot Purpose

      Then provide a description of the bot you’re trying to build. Twitter requires this to protect against bot abuse; they introduced this vetting process in 2018. For this tutorial, you’ll be scraping tech-focused content from The New Stack and The Coursera Blog.

      When deciding what to enter into the description box, model your answer on the following lines for the purposes of this tutorial:

      I’m following a tutorial to build a Twitter bot that will scrape content from websites like thenewstack.io (The New Stack) and blog.coursera.org (Coursera’s Blog) and tweet quotes from them. The scraped content will be aggregated and will be tweeted in a round-robin fashion via Python generator functions.

      Finally, choose no for Will your product, service, or analysis make Twitter content or derived information available to a government entity?

      Twitter Bot Intent

      Next, accept Twitter’s terms and conditions, click on Submit application, and then verify your email address. Twitter will send a verification email to you after your submission of this form.

      Once you verify your email, you’ll get an Application under review page with a feedback form for the application process.

      You will also receive another email from Twitter regarding the review:

      Application Review Email

      The timeline for Twitter’s application review process can vary significantly. Often Twitter will confirm your application within a few minutes, but if the review takes longer, that is not unusual, and you should receive a decision within a day or two. Once you receive confirmation, Twitter has authorized you to generate your keys. You can access these under the Keys and tokens tab after clicking the details button of your app on developer.twitter.com/apps.

      Finally go to the Permissions tab on your app’s page and set the Access Permission option to Read and Write since you want to write tweet content too. Usually, you would use the read-only mode for research purposes like analyzing trends, data-mining, and so on. The final option allows users to integrate chatbots into their existing apps, since chatbots require access to direct messages.

      Twitter App Permissions Page

      You have access to Twitter’s powerful API, which will be a crucial part of your bot application. Now you’ll set up your environment and begin building your bot.

      Step 2 — Building the Essentials

      In this step, you’ll write code to authenticate your bot with Twitter using the API keys, and make the first programmatic tweet via your Twitter handle. This will serve as a good milestone in your path towards the goal of building a Twitter bot that scrapes content from The New Stack and the Coursera Blog and tweets them periodically.

      First, you’ll set up a project folder and a specific programming environment for your project.

      Create your project folder:

      • mkdir bird

      Move into your project folder:

      • cd bird

      Then create a new Python virtual environment for your project:

      • python3 -m venv bird-env

      Then activate your environment using the following command:

      • source bird-env/bin/activate

      This will attach a (bird-env) prefix to the prompt in your terminal window.

      Now move to your text editor and create a file called credentials.py, which will store your Twitter API keys:

      • nano credentials.py

      Add the following content, replacing the highlighted code with your keys from Twitter:

      bird/credentials.py

      
      ACCESS_TOKEN='your-access-token'
      ACCESS_SECRET='your-access-secret'
      CONSUMER_KEY='your-consumer-key'
      CONSUMER_SECRET='your-consumer-secret'
      

      Now, you'll install the main API library for sending requests to Twitter. For this project, you'll require the following libraries: nltk, requests, twitter, lxml, random, and time. random and time are part of Python's standard library, so you don't need to separately install these libraries. To install the remaining libraries, you'll use pip, a package manager for Python.

      Open your terminal, ensure you're in the project folder, and run the following command:

      • pip3 install lxml nltk requests twitter

      Here is what you'll use each of these libraries for:

      • lxml and requests: You will use them for web scraping.
      • twitter: This is the library for making API calls to Twitter's servers.
      • nltk (natural language toolkit): You will use it to split paragraphs of blog posts into sentences.
      • random: You will use this to randomly select parts of an entire scraped blog post.
      • time: You will use it to make your bot sleep periodically after certain actions.

      Once you have installed the libraries, you're all set to begin programming. Now you'll import your credentials into the main script that will run the bot. Alongside credentials.py, create another file in the bird project directory from your text editor and name it bot.py:

      • nano bot.py

      In practice, you would spread the functionality of your bot across multiple files as it grows more and more sophisticated. However, in this tutorial, you'll put all of your code in a single script, bot.py, for demonstration purposes.

      First you'll test your API keys by authorizing your bot. Begin by adding the following snippet to bot.py:

      bird/bot.py

      import random
      import time
      
      from lxml.html import fromstring
      import nltk
      nltk.download('punkt')
      import requests
      from twitter import OAuth, Twitter
      
      import credentials
      

      Here, you import the required libraries; and in a couple of instances you import the necessary functions from the libraries. You will use the fromstring function later in the code to convert the string source of a scraped webpage to a tree structure that makes it easier to extract relevant information from the page. OAuth will help you in constructing an authentication object from your keys, and Twitter will build the main API object for all further communication with Twitter's servers.

      Now extend bot.py with the following lines:

      bird/bot.py

      ...
      tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
      
      oauth = OAuth(
              credentials.ACCESS_TOKEN,
              credentials.ACCESS_SECRET,
              credentials.CONSUMER_KEY,
              credentials.CONSUMER_SECRET
          )
      t = Twitter(auth=oauth)
      

      nltk.download('punkt') downloads a dataset necessary for parsing paragraphs and tokenizing (splitting) them into smaller components. tokenizer is the object you'll use later in the code for splitting paragraphs written in English.

      oauth is the authentication object constructed by feeding the imported OAuth class with your API keys. You authenticate your bot via the line t = Twitter(auth=oauth). ACCESS_TOKEN and ACCESS_SECRET help in recognizing your application. Finally, CONSUMER_KEY and CONSUMER_SECRET help in recognizing the handle via which the application interacts with Twitter. You'll use this t object to communicate your requests to Twitter.

      Now save this file and run it in your terminal using the following command:

      • python3 bot.py

      Your output will look similar to the following, which means your authorization was successful:

      Output

      [nltk_data] Downloading package punkt to /Users/binaryboy/nltk_data...
      [nltk_data]   Package punkt is already up-to-date!

      If you do receive an error, verify your saved API keys with those in your Twitter developer account and try again. Also ensure that the required libraries are installed correctly. If not, use pip3 again to install them.

      Now you can try tweeting something programmatically. Type the same command on the terminal with the -i flag to open the Python interpreter after the execution of your script:

      • python3 -i bot.py

      Next, type the following to send a tweet via your account:

      • t.statuses.update(status="Just setting up my Twttr bot")

      Now open your Twitter timeline in a browser, and you'll see a tweet at the top of your timeline containing the content you posted.

      First Programmatic Tweet
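
      If you'd like an additional check from the same interpreter session, you can optionally ask Twitter which account your keys belong to. This uses the verify_credentials endpoint through the same t object and is a sanity check rather than part of the bot's script:

      • t.account.verify_credentials()['screen_name']

      This should display the Twitter handle associated with your API keys.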

      Close the interpreter by typing quit() or CTRL + D.

      Your bot now has the fundamental capability to tweet. To develop your bot to tweet useful content, you'll incorporate web scraping in the next step.

      Step 3 — Scraping Websites for Your Tweet Content

      To introduce some more interesting content to your timeline, you'll scrape content from the New Stack and the Coursera Blog, and then post this content to Twitter in the form of tweets. Generally, to scrape the appropriate data from your target websites, you have to experiment with their HTML structure. Each tweet coming from the bot you'll build in this tutorial will have a link to a blog post from the chosen websites, along with a random quote from that blog. You'll implement this procedure within a function specific to scraping content from Coursera, so you'll name it scrape_coursera().

      First open bot.py:

      • nano bot.py

      Add the scrape_coursera() function to the end of your file:

      bird/bot.py

      ...
      t = Twitter(auth=oauth)
      
      
      def scrape_coursera():
      

      To scrape information from the blog, you'll first request the relevant webpage from Coursera's servers. For that you will use the get() function from the requests library. get() takes in a URL and fetches the corresponding webpage. So, you'll pass blog.coursera.org as an argument to get(). But you also need to provide a header in your GET request, which will ensure Coursera's servers recognize you as a genuine client. Add the following highlighted lines to your scrape_coursera() function to provide a header:

      bird/bot.py

      def scrape_coursera():
          HEADERS = {
              'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                            ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
              }
      

      This header will contain information pertaining to a defined web browser running on a specific operating system. As long as this information (usually referred to as User-Agent) corresponds to real web browsers and operating systems, it doesn't matter whether the header information aligns with the actual web browser and operating system on your computer. Therefore this header will work fine for all systems.

      Once you have defined the headers, add the following highlighted lines to make a GET request to Coursera by specifying the URL of the blog webpage:

      bird/bot.py

      ...
      def scrape_coursera():
          HEADERS = {
              'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                            ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
              }
          r = requests.get('https://blog.coursera.org', headers=HEADERS)
          tree = fromstring(r.content)
      

      This will fetch the webpage to your machine and save the information from the entire webpage in the variable r. You can access the HTML source code of the webpage using the content attribute of r. Therefore, the value of r.content is the same as what you see when you inspect the webpage in your browser by right clicking on the page and choosing the Inspect Element option.

      Here you've also added the fromstring function. You can pass the webpage's source code to the fromstring function imported from the lxml library to construct the tree structure of the webpage. This tree structure will allow you to conveniently access different parts of the webpage. HTML source code has a particular tree-like structure; every element is enclosed in the <html> tag and nested thereafter.

      Now, open https://blog.coursera.org in a browser and inspect its HTML source using the browser's developer tools. Right click on the page and choose the Inspect Element option. You'll see a window appear at the bottom of the browser, showing part of the page's HTML source code.

      browser-inspect

      Next, right click on the thumbnail of any visible blog post and then inspect it. The HTML source will highlight the relevant HTML lines where that blog thumbnail is defined. You'll notice that all blog posts on this page are defined within a <div> tag with a class of "recent":

      blog-div

      Thus, in your code, you'll use all such blog post div elements via their XPath, which is a convenient way of addressing elements of a web page.

      To do so, extend your function in bot.py as follows:

      bird/bot.py

      ...
      def scrape_coursera():
          HEADERS = {
              'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                            ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
                          }
          r = requests.get('https://blog.coursera.org', headers=HEADERS)
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="recent"]//div[@class="title"]/a/@href')
          print(links)
      
      scrape_coursera()
      

      Here, the XPath (the string passed to tree.xpath()) communicates that you want div elements from the entire web page source, of class "recent". The // corresponds to searching the whole webpage, div tells the function to extract only the div elements, and [@class="recent"] asks it to only extract those div elements that have the values of their class attribute as "recent".

      However, you don't need these elements themselves; you only need the links they point to, so that you can access the individual blog posts and scrape their content. Therefore, you extract all the links by reading the href attributes of the anchor (a) tags nested within those div elements.
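
      If XPath is new to you, here is a tiny, self-contained sketch that is not part of bot.py. The HTML snippet is made up purely for illustration, but it mirrors the structure described above and shows how fromstring() and xpath() work together:

      from lxml.html import fromstring

      # A made-up snippet mimicking the blog listing structure (illustration only)
      html = ('<div class="recent">'
              '<div class="title"><a href="/post-1">Post 1</a></div>'
              '<div class="title"><a href="/post-2">Post 2</a></div>'
              '</div>')
      tree = fromstring(html)
      print(tree.xpath('//div[@class="recent"]//div[@class="title"]/a/@href'))
      # Prints: ['/post-1', '/post-2']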

      To test your program so far, you call the scrape_coursera() function at the end of bot.py.

      Save and exit bot.py.

      Now run bot.py with the following command:

      • python3 bot.py

      In your output, you'll see a list of URLs like the following:

      Output

      ['https://blog.coursera.org/career-stories-from-inside-coursera/', 'https://blog.coursera.org/unlock-the-power-of-data-with-python-university-of-michigan-offers-new-programming-specializations-on-coursera/', ...]

      After you verify the output, you can remove the last two highlighted lines from the bot.py script:

      bird/bot.py

      ...
      def scrape_coursera():
          ...
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="recent"]//div[@class="title"]/a/@href')
          ~~print(links)~~
      
      ~~scrape_coursera()~~
      

      Now extend the function in bot.py with the following highlighted line to extract the content from a blog post:

      bird/bot.py

      ...
      def scrape_coursera():
          ...
          links = tree.xpath('//div[@class="recent"]//div[@class="title"]/a/@href')
          for link in links:
              r = requests.get(link, headers=HEADERS)
              blog_tree = fromstring(r.content)
      

      You iterate over each link, fetch the corresponding blog post, extract a random sentence from the post, and then tweet this sentence as a quote, along with the corresponding URL. Extracting a random sentence involves three parts:

      1. Grabbing all the paragraphs in the blog post as a list.
      2. Selecting a paragraph at random from the list of paragraphs.
      3. Selecting a sentence at random from this paragraph.

      You'll execute these steps for each blog post. For fetching one, you make a GET request for its link.

      Now that you have access to the content of a blog, you will introduce the code that executes these three steps to extract the content you want from it. Add the following extension to your scraping function that executes the three steps:

      bird/bot.py

      ...
      def scrape_coursera():
          ...
          for link in links:
              r = requests.get(link, headers=HEADERS)
              blog_tree = fromstring(r.content)
              paras = blog_tree.xpath('//div[@class="entry-content"]/p')
              paras_text = [para.text_content() for para in paras if para.text_content()]
              para = random.choice(paras_text)
              para_tokenized = tokenizer.tokenize(para)
              for _ in range(10):
                  text = random.choice(para_tokenized)
                  if text and 60 < len(text) < 210:
                      break
      

      If you inspect the blog post by opening the first link, you'll notice that all the paragraphs belong to the div tag having entry-content as its class. Therefore, you extract all paragraphs as a list with paras = blog_tree.xpath('//div[@class="entry-content"]/p').

      Div Enclosing Paragraphs

      The list elements aren't literal paragraphs; they are Element objects. To extract the text out of these objects, you use the text_content() method. This line follows Python's list comprehension design pattern, which defines a collection using a loop that is usually written out in a single line. In bot.py, you extract the text for each paragraph element object and store it in a list if the text is not empty. To randomly choose a paragraph from this list of paragraphs, you incorporate the random module.
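
      If the combination of text_content() and a list comprehension is new to you, here is a small standalone example (not part of bot.py) showing the same pattern of extracting text from lxml elements and dropping the empty ones:

      from lxml.html import fromstring

      # Standalone illustration of text_content() plus the filtering list comprehension
      tree = fromstring('<div><p>First paragraph.</p><p></p><p>Second paragraph.</p></div>')
      paras = tree.xpath('//p')
      paras_text = [p.text_content() for p in paras if p.text_content()]
      print(paras_text)  # Prints: ['First paragraph.', 'Second paragraph.']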

      Finally, you have to select a sentence at random from this paragraph, which is stored in the variable para. For this task, you first break the paragraph into sentences. One approach to accomplish this is using Python's split() method. However, this can be difficult since a sentence can be split at multiple breakpoints. Therefore, to simplify your splitting tasks, you leverage natural language processing through the nltk library. The tokenizer object you defined earlier in the tutorial will be useful for this purpose.

      Now that you have a list of sentences, you call random.choice() to extract a random sentence. You want this sentence to be a quote for a tweet, so it can't exceed 280 characters. However, for aesthetic reasons, you'll select a sentence that is neither too big nor too small. You designate that your tweet sentence should have a length between 60 to 210 characters. The sentence random.choice() picks might not satisfy this criterion. To identify the right sentence, your script will make ten attempts, checking for the criterion each time. Once the randomly picked-up sentence satisfies your criterion, you can break out of the loop.

      Although the probability is quite low, it is possible that none of the sentences meet this size condition within ten attempts. In this case, you'll ignore the corresponding blog post and move on to the next one.

      Now that you have a sentence to quote, you can tweet it with the corresponding link. You can do this by yielding a string that contains the randomly picked-up sentence as well as the corresponding blog link. The code that calls this scrape_coursera() function will then post the yielded string to Twitter via Twitter's API.

      Extend your function as follows:

      bird/bot.py

      ...
      def scrape_coursera():
          ...
          for link in links:
              ...
              para_tokenized = tokenizer.tokenize(para)
              for _ in range(10):
                  text = random.choice(para_tokenized)
                  if text and 60 < len(text) < 210:
                      break
              else:
                  yield None
              yield '"%s" %s' % (text, link)
      

      The script only executes the else statement when the preceding for loop doesn't break. Thus, it only happens when the loop is not able to find a sentence that fits your size condition. In that case, you simply yield None so that the code that calls this function is able to determine that there is nothing to tweet. It will then move on to call the function again and get the content for the next blog link. But if the loop does break it means the function has found an appropriate sentence; the script will not execute the else statement, and the function will yield a string composed of the sentence as well as the blog link, separated by a single whitespace.
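
      If Python's for...else construct is unfamiliar, here is a tiny standalone illustration (not part of bot.py) of the behavior described above: the else branch runs only when the loop completes without hitting break.

      for n in [1, 3, 5]:
          if n % 2 == 0:
              print('found an even number:', n)
              break
      else:
          # Runs only because no element triggered the break above
          print('no even number found')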

      The implementation of the scrape_coursera() function is almost complete. If you want to make a similar function to scrape another website, you will have to repeat some of the code you've written for scraping Coursera's blog. To avoid rewriting and duplicating parts of the code and to ensure your bot's script follows the DRY principle (Don't Repeat Yourself), you'll identify and abstract out parts of the code that you will use again and again for any scraper function written later.

      Regardless of the website the function is scraping, you'll have to randomly pick up a paragraph and then choose a random sentence from this chosen paragraph — you can extract out these functionalities in separate functions. Then you can simply call these functions from your scraper functions and achieve the desired result. You can also define HEADERS outside the scrape_coursera() function so that all of the scraper functions can use it. Therefore, in the code that follows, the HEADERS definition should precede that of the scraper function, so that eventually you're able to use it for other scrapers:

      bird/bot.py

      ...
      HEADERS = {
          'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                        ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
          }
      
      
      def scrape_coursera():
          r = requests.get('https://blog.coursera.org', headers=HEADERS)
          ...
      

      Now you can define the extract_paratext() function, which extracts a random paragraph from a list of paragraph objects. The list of paragraph elements is passed to the function as the paras argument, and the function returns the chosen paragraph's tokenized form, which you'll use later for sentence extraction:

      bird/bot.py

      ...
      HEADERS = {
              'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                            ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
              }
      
      def extract_paratext(paras):
          """Extracts text from <p> elements and returns a clean, tokenized random
          paragraph."""
      
          paras = [para.text_content() for para in paras if para.text_content()]
          para = random.choice(paras)
          return tokenizer.tokenize(para)
      
      
      def scrape_coursera():
          r = requests.get('https://blog.coursera.org', headers=HEADERS)
          ...
      

      Next, you will define a function that extracts a random sentence of suitable length (between 60 and 210 characters) from the tokenized paragraph it receives as its para argument. If such a sentence is not discovered after ten attempts, the function returns None instead. Add the following highlighted code to define the extract_text() function:

      bird/bot.py

      ...
      
      def extract_paratext(paras):
          ...
          return tokenizer.tokenize(para)
      
      
      def extract_text(para):
          """Returns a sufficiently-large random text from a tokenized paragraph,
          if such text exists. Otherwise, returns None."""
      
          for _ in range(10):
              text = random.choice(para)
              if text and 60 < len(text) < 210:
                  return text
      
          return None
      
      
      def scrape_coursera():
          r = requests.get('https://blog.coursera.org', headers=HEADERS)
          ...
      

      Once you have defined these new helper functions, you can redefine the scrape_coursera() function to look as follows:

      bird/bot.py

      ...
      def extract_text(para):
          ...
          return None
      
      
      def scrape_coursera():
          """Scrapes content from the Coursera blog."""
      
          url = 'https://blog.coursera.org'
          r = requests.get(url, headers=HEADERS)
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="recent"]//div[@class="title"]/a/@href')
      
          for link in links:
              r = requests.get(link, headers=HEADERS)
              blog_tree = fromstring(r.content)
              paras = blog_tree.xpath('//div[@class="entry-content"]/p')
              para = extract_paratext(paras)
              text = extract_text(para)
              if not text:
                  continue
      
              yield '"%s" %s' % (text, link)
      

      Save and exit bot.py.

      Here you're using yield instead of return because, for iterating over the links, the scraper function will give you the tweet strings one-by-one in a sequential fashion. This means when you make a first call to the scraper sc defined as sc = scrape_coursera(), you will get the tweet string corresponding to the first link among the list of links that you computed within the scraper function. If you run the following code in the interpreter, you'll get string_1 and string_2 as displayed below, if the links variable within scrape_coursera() holds a list that looks like ["https://thenewstack.io/cloud-native-live-twistlocks-virtual-conference/", "https://blog.coursera.org/unlock-the-power-of-data-with-python-university-of-michigan-offers-new-programming-specializations-on-coursera/", ...].

      Instantiate the scraper and call it sc:

      >>> sc = scrape_coursera()
      

      It is now a generator; it generates or scrapes relevant content from Coursera, one at a time. You can access the scraped content one-by-one by calling next() over sc sequentially:

      >>> string_1 = next(sc)
      >>> string_2 = next(sc)
      

      Now you can print the strings you've defined to display the scraped content:

      >>> print(string_1)
      "Other speakers include Priyanka Sharma, director of cloud native alliances at GitLab and Dan Kohn, executive director of the Cloud Native Computing Foundation." https://thenewstack.io/cloud-native-live-twistlocks-virtual-conference/
      >>>
      >>> print(string_2)
      "You can learn how to use the power of Python for data analysis with a series of courses covering fundamental theory and project-based learning." https://blog.coursera.org/unlock-the-power-of-data-with-python-university-of-michigan-offers-new-programming-specializations-on-coursera/
      >>>
      

      If you use return instead, you will not be able to obtain the strings one-by-one and in a sequence. If you simply replace the yield with return in scrape_coursera(), you'll always get the string corresponding to the first blog post, instead of getting the first one in the first call, second one in the second call, and so on. You can modify the function to simply return a list of all the strings corresponding to all the links, but that is more memory intensive. Also, this kind of program could potentially make a lot of requests to Coursera's servers within a short span of time if you want the entire list quickly. This could result in your bot getting temporarily banned from accessing a website. Therefore, yield is the best fit for a wide variety of scraping jobs, where you only need information scraped one-at-a-time.

      Step 4 — Scraping Additional Content

      In this step, you'll build a scraper for thenewstack.io. The process is similar to what you've completed in the previous step, so this will be a quick overview.

      Open the website in your browser and inspect the page source. You'll find that all blog posts on this page are div elements with the class normalstory-box.

      HTML Source Inspection of The New Stack website

      Now you'll make a new scraper function named scrape_thenewstack() and make a GET request to thenewstack.io from within it. Next, extract the links to the blogs from these elements and then iterate over each link. Add the following code to achieve this:

      bird/bot.py

      ...
      def scrape_coursera():
          ...
          yield '"%s" %s' % (text, link)
      
      
      def scrape_thenewstack():
          """Scrapes news from thenewstack.io"""
      
          r = requests.get('https://thenewstack.io', verify=False)
      
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="normalstory-box"]/header/h2/a/@href')
          for link in links:
      

      You use the verify=False flag because websites can sometimes have expired security certificates and it's OK to access them if no sensitive data is involved, as is the case here. The verify=False flag tells the requests.get method to not verify the certificates and continue fetching data as usual. Otherwise, the method throws an error about expired security certificates.
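
      Note that requests prints an InsecureRequestWarning for every unverified HTTPS call it makes through its underlying urllib3 library. If that output becomes noisy, you can optionally silence it near the top of bot.py using urllib3's standard warning control; this is an optional addition, and the bot behaves the same without it:

      import urllib3

      # Optional: silence the InsecureRequestWarning emitted for verify=False requests.
      # Only do this when skipping certificate verification is a deliberate choice,
      # as it is in this tutorial.
      urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)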

      You can now extract the paragraphs of the blog corresponding to each link, and use the extract_paratext() function you built in the previous step to pull out a random paragraph from the list of available paragraphs. Finally, extract a random sentence from this paragraph using the extract_text() function, and then yield it with the corresponding blog link. Add the following highlighted code to your file to accomplish these tasks:

      bird/bot.py

      ...
      def scrape_thenewstack():
          ...
          links = tree.xpath('//div[@class="normalstory-box"]/header/h2/a/@href')
      
          for link in links:
              r = requests.get(link, verify=False)
              tree = fromstring(r.content)
              paras = tree.xpath('//div[@class="post-content"]/p')
              para = extract_paratext(paras)
              text = extract_text(para)  
              if not text:
                  continue
      
              yield '"%s" %s' % (text, link)
      

You now have an idea of what a scraping process generally encompasses. You can build your own custom scrapers that, for example, scrape the images in blog posts instead of random quotes. For that, you would look for the relevant <img> tags. Once you have the right XPath for the tags that identify the content you want, you can access the information inside them through the names of the corresponding attributes. For example, when scraping images, you can access their links through the src attributes.
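
      As a minimal sketch, an image scraper added alongside the others in bot.py might look like the following. The function name scrape_images, its url parameter, and the generic //img/@src XPath are illustrative, and you would adjust the XPath to the markup of the site you actually inspect; it reuses the requests, fromstring, and HEADERS names already defined in bot.py:

      def scrape_images(url):
          """Illustrative sketch: yields the src attribute of every <img> on a page."""
          r = requests.get(url, headers=HEADERS)
          tree = fromstring(r.content)
          # Each match is the value of an <img> tag's src attribute.
          for src in tree.xpath('//img/@src'):
              yield src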

      At this point, you've built two scraper functions for scraping content from two different websites, and you've also built two helper functions to reuse functionalities that are common across the two scrapers. Now that your bot knows how to tweet and what to tweet, you'll write the code to tweet the scraped content.

      Step 5 — Tweeting the Scraped Content

      In this step, you'll extend the bot to scrape content from the two websites and tweet it via your Twitter account. More precisely, you want it to tweet content from the two websites alternately, and at regular intervals of ten minutes, for an indefinite period of time. Thus, you will use an infinite while loop to implement the desired functionality. You'll do this as part of a main() function, which will implement the core high-level process that you'll want your bot to follow:

      bird/bot.py

      ...
      def scrape_thenewstack():
          ...
          yield '"%s" %s' % (text, link)
      
      
      def main():
          """Encompasses the main loop of the bot."""
          print('---Bot started---\n')
          news_funcs = ['scrape_coursera', 'scrape_thenewstack']
          news_iterators = []  
          for func in news_funcs:
              news_iterators.append(globals()[func]())
          while True:
              for i, iterator in enumerate(news_iterators):
                  try:
                      tweet = next(iterator)
                      t.statuses.update(status=tweet)
                      print(tweet, end='\n\n')
                      time.sleep(600)  
                  except StopIteration:
                      news_iterators[i] = globals()[news_funcs[i]]()
      

You first create a list of the names of the scraping functions you defined earlier, and call it news_funcs. Then you create an empty list that will hold the actual iterators, and name it news_iterators. You populate it by going through each name in the news_funcs list and appending the corresponding iterator to the news_iterators list. Here you're using Python's built-in globals() function, which returns a dictionary mapping names to the actual objects defined in your script. An iterator is what you get when you call a scraper function: for example, if you write coursera_iterator = scrape_coursera(), then coursera_iterator will be an iterator on which you can invoke next() calls. Each next() call returns a string containing a quote and its corresponding link, exactly as defined in the scrape_coursera() function's yield statement, and goes through one iteration of that function's for loop. Thus, you can only make as many next() calls as there are blog links in the scrape_coursera() function; once you exceed that number, a StopIteration exception is raised.
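
      If the globals() lookup feels opaque, this short interpreter sketch, separate from bot.py and using an illustrative greet() function, shows how a function is fetched by its name and how an exhausted generator raises StopIteration:

      >>> def greet():
      ...     yield 'hello'
      ...
      >>> gen = globals()['greet']()   # the same lookup main() uses to build news_iterators
      >>> next(gen)
      'hello'
      >>> next(gen)                    # the generator is exhausted, so StopIteration is raised
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      StopIteration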

Once both iterators are in the news_iterators list, the main while loop starts. Within it, a for loop goes through each iterator and tries to obtain content to tweet. After obtaining the content, your bot tweets it and then sleeps for ten minutes. If an iterator has no more content to offer, a StopIteration exception is raised, upon which you refresh that iterator by re-instantiating it, so it can pick up newer content from the source website. Then you move on to the next iterator, if one is available. Otherwise, if execution reaches the end of the iterators list, you restart from the beginning and tweet the next available content. This makes your bot tweet content alternately from the two scrapers for as long as you want.

      All that remains now is to make a call to the main() function. You do this when the script is called directly by the Python interpreter:

      bird/bot.py

      ...
      def main():
          print('---Bot started---\n')
          news_funcs = ['scrape_coursera', 'scrape_thenewstack']
          ...
      
      if __name__ == "__main__":  
          main()
      

      The following is a completed version of the bot.py script. You can also view the script on this GitHub repository.

      bird/bot.py

      
      """Main bot script - bot.py
      For the DigitalOcean Tutorial.
      """
      
      
      import random
      import time
      
      
      from lxml.html import fromstring
      import nltk  
      nltk.download('punkt')
      import requests  
      
      from twitter import OAuth, Twitter
      
      
      import credentials
      
      tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
      
      oauth = OAuth(
              credentials.ACCESS_TOKEN,
              credentials.ACCESS_SECRET,
              credentials.CONSUMER_KEY,
              credentials.CONSUMER_SECRET
          )
      t = Twitter(auth=oauth)
      
      HEADERS = {
              'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
                             ' AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
              }
      
      
      def extract_paratext(paras):
          """Extracts text from <p> elements and returns a clean, tokenized random
          paragraph."""
      
          paras = [para.text_content() for para in paras if para.text_content()]
          para = random.choice(paras)
          return tokenizer.tokenize(para)
      
      
      def extract_text(para):
          """Returns a sufficiently-large random text from a tokenized paragraph,
          if such text exists. Otherwise, returns None."""
      
          for _ in range(10):
              text = random.choice(para)
              if text and 60 < len(text) < 210:
                  return text
      
          return None
      
      
      def scrape_coursera():
          """Scrapes content from the Coursera blog."""
          url = 'https://blog.coursera.org'
          r = requests.get(url, headers=HEADERS)
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="recent"]//div[@class="title"]/a/@href')
      
          for link in links:
              r = requests.get(link, headers=HEADERS)
              blog_tree = fromstring(r.content)
              paras = blog_tree.xpath('//div[@class="entry-content"]/p')
              para = extract_paratext(paras)  
              text = extract_text(para)  
              if not text:
                  continue
      
              yield '"%s" %s' % (text, link)  
      
      
      def scrape_thenewstack():
          """Scrapes news from thenewstack.io"""
      
          r = requests.get('https://thenewstack.io', verify=False)
      
          tree = fromstring(r.content)
          links = tree.xpath('//div[@class="normalstory-box"]/header/h2/a/@href')
      
          for link in links:
              r = requests.get(link, verify=False)
              tree = fromstring(r.content)
              paras = tree.xpath('//div[@class="post-content"]/p')
              para = extract_paratext(paras)
              text = extract_text(para)  
              if not text:
                  continue
      
              yield '"%s" %s' % (text, link)
      
      
      def main():
          """Encompasses the main loop of the bot."""
          print('---Bot started---\n')
          news_funcs = ['scrape_coursera', 'scrape_thenewstack']
          news_iterators = []  
          for func in news_funcs:
              news_iterators.append(globals()[func]())
          while True:
              for i, iterator in enumerate(news_iterators):
                  try:
                      tweet = next(iterator)
                      t.statuses.update(status=tweet)
                      print(tweet, end='\n')
                      time.sleep(600)
                  except StopIteration:
                      news_iterators[i] = globals()[news_funcs[i]]()
      
      
      if __name__ == "__main__":  
          main()
      
      

      Save and exit bot.py.

The following is a sample execution of bot.py; run the script with python3 bot.py. You will receive output showing the content that your bot has scraped, in a similar format to the following:

      Output

[nltk_data] Downloading package punkt to /Users/binaryboy/nltk_data...
      [nltk_data] Package punkt is already up-to-date!
      ---Bot started---

      "Take the first step toward your career goals by building new skills." https://blog.coursera.org/career-stories-from-inside-coursera/
      "Other speakers include Priyanka Sharma, director of cloud native alliances at GitLab and Dan Kohn, executive director of the Cloud Native Computing Foundation." https://thenewstack.io/cloud-native-live-twistlocks-virtual-conference/
      "You can learn how to use the power of Python for data analysis with a series of courses covering fundamental theory and project-based learning." https://blog.coursera.org/unlock-the-power-of-data-with-python-university-of-michigan-offers-new-programming-specializations-on-coursera/
      "“Real-user monitoring is really about trying to understand the underlying reasons, so you know, ‘who do I actually want to fly with?" https://thenewstack.io/how-raygun-co-founder-and-ceo-spun-gold-out-of-monitoring-agony/

      After a sample run of your bot, you'll see a full timeline of programmatic tweets posted by your bot on your Twitter page. It will look something like the following:

      Programmatic Tweets posted

As you can see, the bot tweets the scraped blog links with a random quote from each blog as a highlight. Your feed now alternates between blog quotes from Coursera and thenewstack.io. You've built a bot that aggregates content from the web and posts it on Twitter. You can broaden its scope by adding scrapers for more websites, and the bot will tweet content from all of them in a round-robin fashion, at the time intervals you choose.

      Conclusion

In this tutorial, you built a basic Twitter bot with Python and scraped content from the web for it to tweet. There are many bot ideas to try, and you can combine the versatile functionality offered by Twitter's API to create something more complex. For a more sophisticated example, check out chirps, a Twitter bot framework that uses advanced concepts like multithreading so the bot can do multiple things simultaneously. There are also some fun bots, like misheardly. There is no limit to the creativity you can apply when building Twitter bots; finding the right API endpoints for your bot's implementation is essential.

Finally, bot etiquette, or "botiquette", is important to keep in mind when building your next bot. For example, if your bot incorporates retweeting, pass all tweets' text through a filter to detect abusive language before retweeting them. You can implement such features using regular expressions and natural language processing. Also, when looking for sources to scrape, use your judgment and avoid ones that spread misinformation. To read more about botiquette, you can visit this blog post by Joe Mayo on the topic.






      Run a Small Business? Here are 8 Ways to Manage Your Stress


      There are plenty of perks to owning your business — like being the boss, for one. You get to see your own plans and dreams come to life. And you can set the company track exactly as you see fit. But the downside to running a small business? All of the stress that comes with it. That’s why learning to manage stress is crucial for small biz owners.

      “Managing stress is important as a business owner because typically, we tend to be sole proprietors or have few employees,” says Amanda Pratt MSW, LCSW, CPLC, The Chronic Illness Therapist, Imagine Life Therapy. “This means that if we burn out, it can ultimately slow business progress or momentum and when we aren’t well, our businesses can’t be well. We also know that if we cope poorly with stress, we tend to have worse physical and mental health outcomes overall, so business owner or not, this is an area that I feel should be a top priority for all of us.”

      Reducing stress should always be at the top of your to-do list to keep you sane — and your company healthy, too. “That’s why it’s important not to feel guilty for stepping back or prioritizing some ‘me’ time,” says Poppy Greenwood, mental health advocate, serial entrepreneur and co-founder of female entrepreneur support platform Meyvnn.

      Luckily, there are plenty of small business stress management techniques that will help take away the tension and anxiety of your work. Give these tactics a try to manage your stress levels.

      8 Ways to Handle Small Biz Stress

      Stressed small business owner.

      1. Recognize What’s Going Well

      “This is one of the first things I will point out to clients — it’s just as important to recognize what’s going well (if not more so) as it is to recognize where things aren’t going so well,” Pratt says.

      “Strategies that work best for us tend to play off our strengths. It’s also good to take inventory of areas of coping where we tend to have more engaged or active responses to stress (versus disengaged responses) and can inform our future attempts at other areas of stress management. We all have habits that come more naturally to us that are healthy, and I believe these are the strategies we should tap into first to address when creating a stress management plan.”

      Plus, when you consider what’s going right with your business, that instantly puts you in a positive mindset, which makes it much easier to combat stress. “Taking stock of things that have gone well helps you put into perspective the change you are affecting and the growth that you have achieved,” Greenwood says. “Feeling that you’re making progress, no matter how small, is one of the best ways to relax. It helps you to recognize you’re on a journey, and that your work towards whatever goal you have is pushing you forwards.

      “It also just makes you feel more organized,” Greenwood says. “Being able to identify where things are working or are not makes you feel like you have control over what is happening, in what can feel like the chaos of running a business.”

      Focusing on the good things about your business also keeps your mind in the present. “When you’re stressed, your brain tells you that you have to stay vigilant,” says Drema Dial, Ph.D., psychologist and life coach. “Your brain goes into hyperdrive with all the things that could be going wrong, will go wrong, might have already gone wrong, and how will you fix it! This is one way our brain uses to keep us locked into familiar routines. This is precisely why it’s imperative to break this cycle, which keeps us chained to unhealthy coping behaviors and keeps your stress level high.”

      2. Identify Your Stressors

      “Identifying your stressors is vital to be able to tackle them,” Greenwood says. “Stress usually comes from a problem you haven’t yet started to solve or are having trouble solving. I think the best way to identify stressors is to take a step back. When you’re an entrepreneur, you’re constantly working and adjusting and testing to grow. Being in that kind of intense mindset all day long can really constrict a wider perspective you need to really pinpoint the areas that are causing you stress and how best to tackle them. Once you’ve identified what is causing you stress, you are much more able to work out how to deal with it. And even just identifying what is causing you stress can help alleviate some of it.”

      Remember that people respond to stress in their own unique way. “Self-awareness is key here because everyone is different,” says Mike McDonnell, international speaker, serial entrepreneur, global brand co-owner and podcaster. Once you know what stresses you out, you can delegate those tasks to others. If that’s not an option, knowing that a particular part of the job triggers anxiety can help you prepare to tackle it and just take a deep breath before going in. Over time, you can work on changing your response to the stressor.

      “We can do this through practicing mindfulness techniques to open our awareness to our body sensations, thoughts, and behaviors,” Pratt says. “We can also self-monitor through journaling or tracking mood states, symptoms and thought habits. And while it’s good to identify stressors, it’s even more important to identify our perceptions and responses to these stressors. Research shows us that it matters less what the stressor is and more how we respond to the stressor.”

      3. Build a Solid Schedule

      “Structure is important because the more we plan, the less we have to actively anticipate what might happen,” Pratt says. “Planning helps us have a greater sense of self-efficacy or confidence in our ability to handle whatever might come up.”

      When you have a regular routine, you know what to expect at work, and that gives you a sense of peace and control, making it easier to keep stress at bay. If you know in advance that you have a difficult item to cross off your to-do list, tackle it first thing in the morning to avoid that sense of dread. Plus, you’ll feel accomplished and ready to conquer whatever else comes your way.

      “Your body also likes a routine — it’s good for your circadian rhythm, which is effectively your internal body clock that can dictate things like when you feel tired or energized and can really impact your ability to focus,” Greenwood says. “For example, I know my energy and concentration dip around 3 p.m. So, in my routine around that time, I usually have a workout scheduled that gives me some time away to re-energize.”

      A common complaint from small business owners is that there are never enough hours in the day. “Usually when we delve into this issue, the problem is not a lack of time but a lack of a schedule,” Dial says. “A schedule allows a person to plan, to anticipate, and helps keep life organized. I recommend that all activities go onto a schedule, even play time!”

      4. Prioritize Your Time

      There’s a reason “self-care” has become such a buzzword — we’ve come to realize just how crucial it is to carve out time for ourselves to keep a healthy mental state. Looking after yourself is key to keeping stress under control.

      “Prioritizing ‘me time’ is really important because it is so easy to get caught up in what you’re doing, you can really forget about yourself and who you are — separate from your business,” Greenwood says. “Taking time for yourself, or using it to go out with friends and family, is often what re-affirms your belief in what you’re doing. It’s really important to not lose yourself within your business, because that, in the worst case scenario, then can lead to your business itself losing its way.”

      As a small business owner, it’s all too easy to fall into the trap of always being on the clock. Just as you schedule time for certain tasks you need to get done, you should schedule free time. “I teach clients to see their downtime as beneficial to creativity and efficiency because they tend to work better after taking a break,” Dial says. “Taking a break allows the brain to take in new information and to generate creativity.”

      5. Learn to Say ‘No’

      “When you’re starting out, you may not have the luxury of opportunities flying at you, so you say yes to everything,” McDonnell says. “But eventually you focus on your mission and ask yourself, ‘Will this help me get there?’ before deciding yes or no.”

      Of course, saying no can be really tough. But it’s important to remember your value and that you have limited time. “Instead of thinking you may offend the other person, it’s an opportunity to show them that when you decide to do something, you really value what you’re doing and you’re doing it on your terms,” Greenwood says.

      Otherwise, taking on more than you can handle is the fastest way to fall into a stress trap. “It’s important to learn that setting boundaries is necessary to safeguard small business owners’ well-being, their time, and to protect their business,” Dial says. “When approached with a request, the small business owner should ask themselves the following: ‘1. Is this something I want to do? 2. Do I have time to do it? 3. What is its importance level, and will it fit it into my schedule?’”

      Saying no is also key to setting boundaries. “When we don’t set boundaries, we end up feeling taken advantage of, burned out, stressed out, and end up as people pleasers, workaholics, isolated, or feeling misunderstood,” Pratt says. “Simply stated: Boundaries are one of the best things you can do for your physical and mental health and wellness.”

6. Delegate or Outsource Tasks

      When you’re used to being the boss, it can be hard to let go and give up control. But as any small business owner knows, you can’t do it all. And if you’re trying to, then you’re probably not doing a good job at every single thing. That’s why learning how to delegate or outsource certain parts of the biz is a foundation for being successful.

      For example, do you struggle with Facebook but love working face-to-face with clients? Hiring a social media manager might free you up to do just that. Figure out how you want to spend your time — and what you’d rather avoid.

      In the end, outsourcing allows you to grow your company. “It’s important early on to recognize where your weaknesses are, so that you can hand over those areas to other people who do them much better,” Greenwood says. “Doing this can also relieve so much stress, not having a task hang over you that you know you need to do but that you struggle with and find time-consuming.”

      7. Choose Your Tools Wisely

      Work tools and software are meant to make your job easier — not harder. But if you’re spending more time learning how to use them than actually using them, it’s not doing you any favors. “It’s important to choose tools wisely, because they are meant to be the things that take away stress and help with tasks instead of adding to the problem,” Greenwood says.

Opting for reliable small business apps, web management tools, and hosting services will always pay off in the end. Imagine if your business' website went down? That's why it's worth using DreamHost hosting and WordPress to have one less thing to worry about.

      “Test out different software until you find the one that takes your stress away so you can benefit fully from it,” McDonnell says.


      8. Unplug During Your Off-time

      “You’re not a robot,” Greenwood says. “You can’t work all the time and expect to maintain the same level of productivity and efficiency. You need to replenish your energy levels, and not just physically but mentally, emotionally, and spiritually. When you’re working on your business, you want to be present and in the moment. That would be difficult if you’re unable to unplug in your off time and feel a conflict between your work life and your personal life.”

      As a small business owner, you probably feel tied to your phone, but you need time away from answering emails and checking in with customers. “Unplugging and doing a digital detox allows parts of your brain to rest,” Dial says. “Reading, watching TV, going for a walk, and talking with others are all great ways to engage a different part of your brain. Make sure you take time for activities you find enjoyable. It’s essential to combat stress by seeking out experiences that will help restore you.”

      It’s especially important to power down your devices and avoid blue light, which can keep you awake, at least an hour before bedtime. Plus, you won’t have to worry about an email keeping you up that night. You’ll sleep better so you can be rested and alert for the next day of tending to your business.

      Breathe In and Out

      It’s no secret that running a small business is one of the most challenging (and stressful) things you’ll ever take on. But it’s also one of the most rewarding! So tell us: how do you manage your stress as a small-biz owner? What keeps you fired up as you “Rise and Grind?” Connect with us on Twitter and let us know your thoughts!




