Category: software

How to back up personal GitHub repositories

Published on: 01.08.2018

I will show how to back up your GitHub repositories with python-github-backup.

Why bother with a backup of GitHub

I can already see the comments asking why anyone would back up GitHub:

  • “It is a waste of time.”
  • “GitHub already has backups internally.” (I hope so)
  • “They will not lose your code” (But maybe I will)
  • “They will not go overnight out of business.”

My response to all of those comments is:
You will not be worse off if you have your own backup.

If for whatever reason (GitHub goes under, all repositories are deleted by accident, alien attack) GitHub is no longer available, I still have my own backup of the code that I have written.

Paid solution

If you are looking for a paid solution, BackHub looks like a good option.
I have no experience with BackHub, nor am I in any way associated with it.

Free solution

After researching the available options, I decided to go with python-github-backup because it had more stars and contributors on GitHub than the other projects.

I took the number of stars and contributors as a sign that python-github-backup is more widely used than the alternatives, so more people are likely to keep supporting it in the future.

To access your personal GitHub data, you need a personal access token.

Once you have one, you can install python-github-backup with pip or pipenv
(I installed it in a separate virtualenv):

pipenv install github-backup
or
pip install github-backup

Run it with:
/full/path/github_backup/venv/bin/github-backup sasa-buklijas -t your_personal_access_token -o /full/path/github_backup --all

This command backs up all your GitHub information to the /full/path/github_backup directory.

It would be tiresome to run this command every day, so I have automated it on my online hosting.
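One way to make the command easier to schedule is a small Python wrapper around it. This is only a sketch: the helper names build_command and run_backup are made up for illustration, the paths, user name, and token are the placeholders from this post, and the only command-line options used are the ones shown above (-t, -o, --all). The log file name last_log.txt is the one I use in my crontab entry.

```python
import subprocess
from pathlib import Path

BACKUP_DIR = Path("/full/path/github_backup")          # placeholder path from this post
GITHUB_BACKUP = BACKUP_DIR / "venv/bin/github-backup"  # executable inside the virtualenv


def build_command(user, token, output_dir=BACKUP_DIR, executable=GITHUB_BACKUP):
    """Return the argument list for the backup command shown above."""
    return [str(executable), user, "-t", token, "-o", str(output_dir), "--all"]


def run_backup(user, token, log_path=BACKUP_DIR / "last_log.txt"):
    """Run the backup and keep only the output of this run in the log file."""
    with open(log_path, "w") as log:  # mode "w" overwrites the previous log
        result = subprocess.run(build_command(user, token), stdout=log)
    return result.returncode


# Example call (needs a real user name and token):
# run_backup("sasa-buklijas", "your_personal_access_token")
```

Passing the arguments as a list avoids shell quoting problems if the token or path ever contains special characters.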

Crontab

My use case:
15 00 * * * /full/path/github_backup/venv/bin/github-backup sasa-buklijas -t your_personal_access_token -o /full/path/github_backup --all > /full/path/github_backup/last_log.txt

The > /full/path/github_backup/last_log.txt redirection keeps the output of the last backup command.
Using >> instead would append the output of every run, but I have found that keeping just the last one is enough.

Conclusion

“You will not be worse off if you have your own backup.”

Do you have backups of your personal GitHub repositories? If so, what do you use for them?

What programming language should you learn?

This is written for people who do not know any programming language and are wondering which one to learn first.

That said, I think the reasoning behind the decisions in this article can also help you choose your next programming language.

If you want to do something, first know why you want to do it

What are you trying to accomplish?

The same goes for learning a programming language.

I have listed a few main reasons why people want to learn a programming language:

  • get a job (on-site or freelance)
  • make some software app/website/web app
  • just to learn to do programming

I want to learn programming to get a job

Programming jobs (and salaries) are location dependent, so research which programming language jobs are available in your area.

If you plan to move or migrate, do the same for that area.

Check the local programming job listings to get a clue.

It is good to visit local programming meetups; if you plan to be a professional software developer, start networking too.

Meetups are also a good way to see who is hiring.

If you plan to freelance, then you are not location dependent.

That, I would argue, makes things even more difficult, because you have no location constraint to narrow the choice.

Either way, do a cost/benefit analysis and pick a language that makes sense given your own constraints.

I want to learn programming to make software

You want to make some software (desktop app, website, web app, mobile app, etc).

You could pay a professional to do it for you, but for some reason (e.g. you are still in high school) you want to do it yourself.

Do research and find out which programming language is best for the software that you plan to build.

I personally optimize for time to market.

If you plan to make a web app, there is no reason for you to learn C++, believe me, there is not.

As of 2018, there are well-established programming languages (tools) for most use cases.

But you also need to be careful, because most software developers will suggest the programming languages that they already know.

So do not ask just one person; ask at least a few dozen.

And always ask them for the reasoning behind their choice.

I want to learn programming just to know how to program

You do not want a job, you have no idea what to make with programming, you just want to learn programming.

Then you can pick any language, although my suggestion is to pick something that has real-life use and is not complicated for beginners.

My humble suggestion is to choose Python, which “… is easy for beginners, practical for professionals, and exciting for hackers …”, to quote Fluent Python.

Conclusion

Know what you are trying to accomplish and pick a programming language for that purpose.

The largest expense for the programmer

I will talk about the largest expense for somebody who develops software (primarily writing code), but from the business owner's perspective (not the employee's).

I got this idea after I started developing my own software products, not while I was writing code for others.

The topic could be rephrased as “The largest expense for the business owner who is also the sole developer”.

But I also think that it holds for software development in general.

Time is not on your side

After thinking about the subject for a long time, I have come to the conclusion (of course I can be wrong) that the largest expense is time.

By time I mean how much time you spend making some software.

One can argue that this is also the only expense (along with some hardware, a room, and electricity).

So the next question is which activity in software development takes (or wastes) most of the time.

Hardware is cheap

Better hardware (SSD, more RAM, faster CPU, etc.) will reduce development time and you should use it.

But hardware is relatively cheap compared with other time expenses.

Let’s say that by using an SSD you get 10 more minutes of work per workday (although I would argue that it is at least double).

Multiplied by 260 workdays per year, that is 2,600 minutes, or about 43 hours.

A 1 TB SSD is $400, so even if you only bill $10 per working hour, the investment pays for itself within one year, which makes it a no-brainer. (The numbers can change, but you get the point.)

Anyway, the moral of the story is that hardware is cheap.
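The SSD arithmetic can be written out as a few lines of Python. The numbers are the ones from this section; only the variable names are mine.

```python
MINUTES_SAVED_PER_DAY = 10   # extra minutes of work per workday thanks to the SSD
WORKDAYS_PER_YEAR = 260
HOURLY_RATE = 10             # dollars billed per working hour
SSD_PRICE = 400              # dollars for a 1 TB SSD

hours_saved_per_year = MINUTES_SAVED_PER_DAY * WORKDAYS_PER_YEAR / 60
value_per_year = hours_saved_per_year * HOURLY_RATE
payback_years = SSD_PRICE / value_per_year

print(f"{hours_saved_per_year:.1f} hours saved per year")       # 43.3
print(f"${value_per_year:.0f} of billable time per year")       # $433
print(f"SSD pays for itself in {payback_years:.2f} years")      # 0.92
```

With 20 minutes saved per day instead of 10, the payback time halves.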

The largest expense is learning how to do something new

Most of the time is spent on learning how to do something new.

New programming languages, new frameworks, new libraries, new tools, new new new …

An endless supply of new things that need to be learned.

Somebody could conclude that learning how to learn fast is the solution.

Learning fast is certainly useful, but it is not the solution, because there are more things to learn than there is time to learn them.

Temporary nature of the knowledge capital

On top of that, software development has the additional problem of the “temporary nature of the knowledge capital”.

Basically, what you learn today will probably not be useful in 5 years.

Maybe it will not even exist anymore.

So far this is nothing new; anybody who has been programming for more than 5 years has practical experience that technologies (languages, frameworks, libraries, tools, etc.) go away, new ones come, and you then need to spend time learning new ways to do the old things.

What is new, or at least what I am trying to argue in this essay, is that from the business standpoint we should look at it as an expense.

Specifically, as the largest expense in software development.

Somebody can say: “Hey douchebag, I like learning new technologies and using them in my job. You are just some old dinosaur who does not like programming and probably was never good at it.”

Well, I can agree that at 35 I am certainly not young anymore.

But my age does not change the fact that learning something is not the same as doing something.

If I want to do something, I must first know how to do it, and to know it, I need to learn it.

So, learning is an expense of doing.

And if you first need to learn everything that you need to do, that is a lot of learning (and a big expense).

Is programming a young man's game?

I would say that part of the problem is that most programmers are young and do not have much life/work experience.

Statistics from the StackOverflow 2018 Developer Survey Results support that claim.

30% have 2 years or less of professional coding experience. That is almost one third.

57.5% have 5 years or less of professional coding experience.

In some fields of work, even after 5 years you are still considered just a beginner.

And only 12.7% have more than 15 years of professional coding experience.

Hobby or business

Also, 81% of professional developers code as a hobby.

I do not think that this is a bad thing.

But at the same time, if you consider something a hobby, then you will not treat it as a business.

In business, there is income and expense.

And by subtracting expense from income, you get profit.

And you should have some if you expect your business to survive.

In a hobby, there is only fun.

That is why it is called a hobby.

From the bookkeeping perspective a hobby is an expense, but everybody considers their hobby fun, not an expense.

These two reasons, probably more the second one, are probably the main reasons why most programmers do not see learning as an expense.

At least, until they burn out and then switch to something else, like project management (the real job title should be: project reporting), or leave software development completely.

According to the previously mentioned statistics, only 6.9% of developers are older than 45.

But it is well known that programming is mostly a young man's game, probably due to the “temporary nature of the knowledge capital”.

How to reduce your largest expense

I think that specialization is a large part of the solution.

Find your niche and stick to it.

Then you can reduce (or even completely eliminate) the time spent on learning how to program something and acquire domain-specific knowledge instead.

When to learn new things

I am not saying that you should never learn new things.

Not at all.

Just do a cost/benefit analysis first.

I will use my own experience as an example.

I do/did a lot of web scraping.

At first I did it with Beautiful Soup and a lot of custom code.

I wrote my own caching, ORM, etc.

In web scraping, the easiest part is writing XPath selectors; there are a lot of other things to handle, and over time I built my own small framework for all of that.

And with each new website that I scraped, I had to add new features or improve old ones.

This took a larger and larger share of my time as I scraped more and more challenging websites.

After some time I decided to try Scrapy.

Scraping the first website with Scrapy took me around one week, with 90% of the time spent on learning Scrapy (learning how to do the old thing with the new framework).

If I had used my old custom framework, I would have finished it in 3 days.

But I would have had to write more code than with Scrapy.

Currently, with Scrapy I can do in one day the scraping that took at least 3-4 days with my old custom framework.

This is an example of when learning a new framework was useful.
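The cost/benefit behind this example can be checked with a small break-even calculation. It rests on my own simplifying assumptions: “around one week” is counted as 5 working days, the old framework is given its best case of 3 days per website, and every website takes the same effort.

```python
SCRAPY_FIRST_SITE_DAYS = 5       # assumption: "around one week" = 5 working days
SCRAPY_DAYS_PER_SITE = 1         # one day per website once Scrapy is learned
OLD_FRAMEWORK_DAYS_PER_SITE = 3  # lower end of the 3-4 days quoted above


def cumulative_days_scrapy(sites):
    """Total days spent after scraping `sites` websites with Scrapy."""
    return SCRAPY_FIRST_SITE_DAYS + SCRAPY_DAYS_PER_SITE * (sites - 1)


def cumulative_days_old(sites):
    """Total days spent after scraping `sites` websites with the old framework."""
    return OLD_FRAMEWORK_DAYS_PER_SITE * sites


def first_site_where_scrapy_wins():
    """Return the number of websites after which Scrapy is strictly cheaper."""
    sites = 1
    while cumulative_days_scrapy(sites) >= cumulative_days_old(sites):
        sites += 1
    return sites


print(first_site_where_scrapy_wins())  # 3
```

Under these assumptions the two approaches are level after the second website, and from the third one on the learning time has paid for itself.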

Conclusion

In the software development business, the largest expense for a software developer is the time spent learning new technologies.

Reduce it by finding your niche and specializing.

Use cost/benefit analysis to decide whether you should learn some new technology.

RS232 sniffing with Python

Published on: 15.04.2018

My motivation for monitoring RS232 with Python.

I had a PC and an LG TV screen communicating via an RS232 cable.

The PC was turning the TV screen on and off via RS232 commands.

But the TV screen was not behaving normally; it was always in standby mode.

So either the PC was not sending RS232 commands or the TV screen was not responding to them.

To diagnose the problem, I decided to spy on their communication.

Half-duplex and full-duplex RS232

There are two types of communication in RS232: half-duplex and full-duplex.

In half-duplex, while one device is talking the other one is listening, just like communication between people should be.

In full-duplex, both devices talk and listen at the same time; people think that they can do this too, but it does not work in practice.

Hardware

The first step was to make a special half-duplex RS232 spy cable, because I did not have software access to the PC.

So the only solution was to sniff the communication between the PC and the TV.

Schematics are available from https://www.lammertbies.nl/comm/cable/RS-232-spy-monitor.html.

The same website also has schematics for a full-duplex RS232 spy cable.

Software

Eltima RS232 Data Logger can be used to redirect all communication data from an RS232-compliant serial port device into a text file.

The program works fine; the only problem I had was that there is no output on screen, only to the text file.

Real-time watching of the communication data was not possible, so I decided to write my own program for viewing RS232 communication data in real time.

Python to the rescue

The best thing about the Python programming language is that there is a package for everything.

For access to the serial port in Python, there is the pySerial package.

My Python program read the communication data and showed it on screen in real time, with the timestamp of each read and the time difference from the previous read.

Python source code for RS232 sniffing:
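A minimal sketch of such a program, assuming pySerial 3.x and Python 3.8 or newer. The port name COM1, the baud rate 9600, and the function names are placeholders that I picked for illustration; adjust them to your cable and device.

```python
import time


def format_record(data, now, previous):
    """Format one read as: timestamp, seconds since the previous read, hex bytes."""
    delta = 0.0 if previous is None else now - previous
    stamp = time.strftime("%H:%M:%S", time.localtime(now))
    return f"{stamp} (+{delta:.3f}s) {data.hex(' ')}"


def sniff(port="COM1", baudrate=9600):
    """Print everything that arrives on the serial port until interrupted."""
    import serial  # pySerial; imported here so format_record works without it

    previous = None
    with serial.Serial(port, baudrate, timeout=0.1) as ser:
        while True:
            # Read whatever is already buffered, or wait up to the timeout for one byte.
            data = ser.read(ser.in_waiting or 1)
            if data:
                now = time.time()
                print(format_record(data, now, previous))
                previous = now


# Start sniffing (stop with Ctrl+C):
# sniff("COM1", 9600)
```

Printing the bytes as hex keeps non-printable control characters visible, which matters when you do not yet know what the devices are sending.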

How to find all emails on a web page?

Published on: 15.03.2018

Conclusion

Use get_emails() from webscraping Python package.

Python strength

The best thing about Python is the huge number of third-party packages.

With a lot of them, you can solve your problems with just a few lines of code.

Let’s say that you want to find all emails in some HTML document, either for an offline or online web page.

This can be done with the webscraping package.

First, install it with:

Code for finding all emails on a single page is:

Line 1 imports download and alg from the webscraping package that you have just installed.

Line 3 creates a download.Download() object and calls it D.

Line 4 saves the web page in which you want to find all emails in the html variable.

Line 6 finds all emails in your html variable and saves them in the emails Python list.

Line 8 shows all emails that have been found on the screen.

This will work for a single web page.
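To show the idea without any third-party package, here is a sketch that uses only a regular expression from the standard library. It is not the webscraping package's API, the pattern is a deliberately simple approximation of an email address, and the function name find_emails is made up for this example.

```python
import re

# Simple approximation of an email address; real-world addresses can be more exotic.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(html):
    """Return the unique email-like strings in html, in order of appearance."""
    found = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        if email not in found:
            found.append(email)
    return found


html = '<a href="mailto:info@example.com">Info@example.com</a> or sales@example.org.'
print(find_emails(html))  # ['info@example.com', 'sales@example.org']
```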

How to find emails on the whole site

If you want to search the whole website for emails, not just one page, you can use the following code.

With the max_depth, max_urls, and max_emails parameters you can define how far the search should go.
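How three such limits can bound a search is easiest to see in a small breadth-first crawler. This is a standard-library sketch, not the webscraping package's implementation: the meaning I give to max_depth, max_urls, and max_emails is my assumption, and fetch is a hypothetical function that you pass in to download a page as text.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
LINK_RE = re.compile(r'href="([^"#]+)"')


def crawl_emails(start_url, fetch, max_depth=2, max_urls=20, max_emails=10):
    """Breadth-first search of one site for emails, stopping at any of the limits."""
    site = urlparse(start_url).netloc
    queue = deque([(start_url, 0)])
    visited, emails = set(), []
    while queue and len(visited) < max_urls and len(emails) < max_emails:
        url, depth = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        for email in EMAIL_RE.findall(html):
            if email not in emails and len(emails) < max_emails:
                emails.append(email)
        if depth < max_depth:  # only follow links while below the depth limit
            for href in LINK_RE.findall(html):
                link = urljoin(url, href)
                if urlparse(link).netloc == site:  # stay on the same site
                    queue.append((link, depth + 1))
    return emails
```

For real pages, fetch could be a thin wrapper around urllib.request.urlopen that decodes the response body to text.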

Happy spamming.

P.S. just joking 🙂

Should you learn C/C++ in 2018?

Published on: 15.02.2018

Conclusion

I would not recommend learning C/C++ as your first programming language just to learn to program; better go with Python.

Also, if you do not know why you want to learn C/C++, better not spend time on it.

My C/C++ background

For most of my professional programming career, around 10 years, I have been writing C/C++ programs.

By professional programming career, I mean that other people have paid me to write code.

I have spent more time in C/C++, although I would say that I know Python better than C/C++.

My knowledge of Python is better because Python has fewer features than C/C++ and you do not need to do memory management.

Why C/C++ is hard

IMHO C/C++ is hard due to its many features and memory management.

By the time you understand memory management, you could have learned 80% of Python.

Just look at this image of code that is concatenating 2 strings in C vs. Python.

Yes, there could be less C code (less error checking, no free, a few fewer const) and the code would still work.

But then I would be showing bad C production code and not demonstrating how real-world C code should be written.

It is not fun to debug a segmentation fault, and that is what you get if there is no proper error checking in C.

The Python code could, and in real life should, also have if __name__ == "__main__":, but I have removed it because, unlike main() in C, it is not required.

The point is that a higher-level programming language (Python in this example) does a lot of low-level things automatically.

So you need fewer lines of code, which means fewer possible bugs, resulting in increased developer productivity.

The tradeoff is more CPU and memory use in exchange for faster time to market.
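For reference, the Python side of such a comparison fits in a few lines. This is a sketch and not the code from the image; the function name concatenate is made up for illustration.

```python
def concatenate(first, second):
    """Return a new string; Python allocates and frees the memory on its own."""
    return first + second


print(concatenate("Hello, ", "world!"))  # Hello, world!
```

A careful C version of the same function has to measure both strings, allocate a buffer of the right size, check that the allocation succeeded, copy both parts, and make sure the caller frees the result.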

C and C++ are two different programming languages

Here I am using the term C/C++.

That is because most code bases (at least the ones I have seen in corporate environments) are some hybrid of C and C++.

My theory is that the older code was written in C and then later they added object-oriented programming (C++).

But C and C++ are not the same programming language; their mental programming models are quite different.

C code is about structures, functions, and pointers.

In C++ you have object-oriented programming and a lot of other features.

In the real world, you need to know C and C++, so that is why I use C/C++.

When you should learn C/C++

The only positive thing that I can see a person gaining from learning C/C++ is manual memory management and pointers.

Basically, a better understanding of how the computer works (at the software level).

If your only reason is to better understand how the computer works (at the software level), better learn the C programming language.

C has fewer features than C++.

It is easier to learn C++ if you know C, because then you should already know manual memory management and pointers.

But for anything other than a few niches (systems, embedded, banking, game engines, etc.), I do not see much practical use for C/C++ in 2018.

Probably you ain’t gonna use C/C++

I am not saying C/C++ is not in use anywhere in 2018.

I personally get interview requests for C/C++ positions almost every month.

Usually, they are either for embedded software (C) or banking industries (C++).

I am just arguing that beginners (people who do not know any programming language) should not start with C/C++.

The only exception is if you are planning to get a job (or start your business) in a C/C++ environment.

If it is not for a job or business, I do not see why you should learn C/C++, except as a hobby.

But even then, get a better hobby 🙂.

P.S. If you, dear reader, think that I am missing some point, please add it in the comments.

Always start with a simple solution (ConfigParser vs. JSON for Python configuration file)

Published on: 01.01.2018

Conclusion

Sometimes perfect is the enemy of the good.

Why use a configuration file

I had a Python program that needed to access a device via its IP address and run some diagnostics and commands on it.

At that time, I was accessing only one device (only one IP address).

But I could see that in the future (in a few months to one year) I would need to run the same set of commands on more devices.

One solution is to add the IP address as a parameter to the CLI program.

In my use case the number of IP addresses that need to be accessed will never be bigger than 34.

And writing 34 IP addresses as CLI parameters, around 373 characters, is not a nice solution.

When you need to read it to check whether all IP addresses are included, it is not easy to read.

Python code as configuration was not possible

I was distributing my Python code as an EXE, so using Python code as configuration was not possible.

Although, I think that Python code as configuration is a good solution if you are executing the source code and only a developer (not an average user who does not know what Notepad is) will edit it.

Sometimes perfect is the enemy of good

I sometimes tend to complicate things unnecessarily.

That is because I think about all possible edge cases and all possible future uses.

Of these two, all possible future uses are the real problem.

Edge cases can happen, but is there a positive cost/benefit in solving them, and how?

This needs to be determined on a case-by-case basis.

But “all possible future uses” is an attempt to anticipate the future.

That is impossible.

In my experience, whenever I added code for future use cases, it was usually a waste of time.

Even when I am the only user, so I can argue to myself that I know I will need it.

Usually, I do not need it.

So, I have decided to eliminate waste when developing software, starting from this project.

For example, here I needed a configuration file with a list of IP addresses, which I would iterate over in a for loop.

This was the minimum that I needed for my problem.

But immediately I was thinking that it would be nice to have:

  • checking whether each IP address is in a valid format
  • if I have 5 addresses I need to write all 5, but I could add special syntax for ranges, like 192.168.1.1 --- 192.168.1.5, and this can get really complicated if you want to cover all edge cases
  • I only need IPv4, but I could also add IPv6
  • if there is an error in the configuration file, there could be a useful error message with a suggested solution
  • another configuration file could define which IP addresses are valid, and then I could validate the requested IPs against it

And the list can go on.

But I said to myself, NO.

You will just write code that reads a list of items from the configuration file into a Python list.

Nothing more.

If and when you have a real need, you will add additional features.

When I think about this, maybe this is an example of premature design?

Premature design is deciding too early what a program should do.

Why JSON is better than ConfigParser for list datatype

After investigating possible solutions, I decided to try ConfigParser.

The ConfigParser configuration file was:

Code was:

I found a few problems with ConfigParser:

  1. It did not have built-in functionality to read data as a Python list
    • so even for this simple example I had to write custom code (this is why I have get_as_list() function)
  2. There were no rules for the format, e.g. "192.168.1.36" and 192.168.1.36 were both valid.
    • this was not a big problem, but I just did not like it
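A minimal sketch of what such a helper can look like, written for Python 3's configparser module. The section name devices, the option name ip_addresses, the comma as separator, and the body of get_as_list() are my assumptions for illustration, not the original code.

```python
import configparser

# Hypothetical configuration content; normally this would live in a .ini file.
CONFIG_TEXT = """
[devices]
ip_addresses = 192.168.1.36, "192.168.1.37", 192.168.1.38
"""


def get_as_list(parser, section, option):
    """Split a comma-separated option into a list, dropping optional quotes."""
    raw = parser.get(section, option)
    return [item.strip().strip('"') for item in raw.split(",") if item.strip()]


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
ip_addresses = get_as_list(parser, "devices", "ip_addresses")

for ip in ip_addresses:
    print(ip)
```

The second address is quoted on purpose: both spellings are accepted, so the helper has to normalize them itself.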

Let’s try JSON (as a configuration file in Python)

JSON configuration file was:

Code was:

There is much less code, and line count matters.

One thing that I did not like about JSON is "ValueError: No JSON object could be decoded"; this is the error message that you get if your JSON is invalid.

I was hoping to get some more details, e.g. which token on which line.

But you cannot have it all, and ConfigParser was no better.

The decision was to use JSON for the Python configuration file, because the code was less complicated (fewer lines of code).
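A minimal sketch of the JSON variant, written for Python 3. The key name ip_addresses and both helper names are assumptions for illustration. On Python 3.5 and newer, invalid JSON raises json.JSONDecodeError (a subclass of ValueError), which does carry the line and column of the problem.

```python
import json

# Hypothetical configuration content; normally this would live in a .json file.
CONFIG_TEXT = '{"ip_addresses": ["192.168.1.36", "192.168.1.37", "192.168.1.38"]}'


def load_ip_addresses(text):
    """Return the list stored under the ip_addresses key."""
    return json.loads(text)["ip_addresses"]


def describe_json_error(text):
    """Return a readable description of the first syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"{err.msg} (line {err.lineno}, column {err.colno})"
    return None


for ip in load_ip_addresses(CONFIG_TEXT):
    print(ip)

# A trailing comma is a typical mistake in hand-edited JSON.
print(describe_json_error('{"ip_addresses": ["192.168.1.36",]}'))
```

The list comes back as a real Python list, so no custom parsing helper is needed.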

I did not try YAML.

Its syntax, at least to me, is less readable than JSON.

Please leave questions and remarks in the comments.

Automatic backup of git repositories to Dropbox with Python

Published on: 01.12.2017

Intro

I will show how to upload files to Dropbox from Python code.

Why do I need this?

Currently, I am only using WebFaction for all my web services and also as my private git server.

I wanted to make an automatic backup of my git repositories to Dropbox.

Dropbox App

In order to upload files to Dropbox, you need an access token.

And to get the access token, you need to register your app on the DBX Platform.

All of this must be done on the Dropbox website.

The first step is to go to https://www.dropbox.com/developers/apps/ and press the “Create App” button.

Step 1

Just click the “Create app” button

Step 2

New app on DBX Platform

We will use the Dropbox API.

We will choose “App folder” because we only upload one backup to Dropbox and do not need full access to all our files.

Name your app and click the “Create App” button.

Step 3

Use default settings

We will use the default settings; this is where we get the access token, so click the “Generate access token” button.

Step 4

Access token generated

Now that you have your access token, copy it; you will need it in your code.

Step 5

Dropbox app

Now we have our “my_git_backup” Dropbox app.

pip install

It is always recommended to use virtual environments with Python.

At least I use them always.

Code

I am using fabric to make my life (code) easier.

I use fabric every time I call a CLI command from Python.

I will explain NUMBER_OF_BACKUP_TO_KEEP later; I use it at the end of the program.

All code that follows is inside the with lcd(remote_directory): Python context manager.

The context manager is used so that all code that follows is executed inside the remote_directory directory.

The name of the backup file will be YYYYMMDD_HHMM_git_backup.zip, where the upper-case letters stand for the date and time when the program was executed.

E.g. 20171121_1856_git_backup.zip, so that we know when the backup file was made.
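A file name in this format can be produced with strftime. This is a small sketch; the function name backup_filename is made up for illustration.

```python
from datetime import datetime


def backup_filename(now=None):
    """Return a name like 20171121_1856_git_backup.zip for the given moment."""
    now = now or datetime.now()
    return now.strftime("%Y%m%d_%H%M") + "_git_backup.zip"


print(backup_filename(datetime(2017, 11, 21, 18, 56)))  # 20171121_1856_git_backup.zip
```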

To make the actual backup, the zip CLI command is used; we only back up files that match *.git (in my case only git repositories).

I also have LAME_PASSWORD for basic protection.

This is why I used fabric: just by calling the local() function you can execute CLI commands.

The first line opens the connection to your Dropbox application; you need to add your own access token as an argument.

The next two lines do the upload: they open the file, read it, and upload the bytes to Dropbox.

The Dropbox documentation mentions that this only works for files up to 150 MB in size.
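The upload step can be sketched as a small function around the Dropbox SDK's files_upload() call. The function name upload_backup, the size check, and reading 150 MB as 150 * 1024 * 1024 bytes are my additions for illustration; the dbx object is expected to be a dropbox.Dropbox instance created with your access token.

```python
# Assumption: the 150 MB limit is interpreted here as 150 * 1024 * 1024 bytes.
MAX_SINGLE_UPLOAD = 150 * 1024 * 1024


def upload_backup(dbx, local_path, remote_name):
    """Upload one local file to the root of the Dropbox app folder."""
    with open(local_path, "rb") as backup:
        data = backup.read()
    if len(data) > MAX_SINGLE_UPLOAD:
        raise ValueError("file is too large for a single files_upload call")
    # Dropbox paths start with a slash, relative to the app folder.
    return dbx.files_upload(data, "/" + remote_name)


# Usage with the official SDK (pip install dropbox):
#   import dropbox
#   dbx = dropbox.Dropbox("your_access_token")
#   upload_backup(dbx, "20171121_1856_git_backup.zip", "20171121_1856_git_backup.zip")
```

Passing dbx in as an argument keeps the function easy to try out with a stand-in object before using a real token.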

With the last line, the program deletes the local backup.

The first for loop collects all files from your backup folder into a list.

The second for loop deletes all files except the last few.

How many files to keep (otherwise we would need to delete old backup files manually) is defined in NUMBER_OF_BACKUP_TO_KEEP at the beginning of the code.

I keep it at 10; I do not need more than that.

Because we have the date and time in our filename, we can use the Python sort function to sort files by when the backup was made.
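The selection of files to delete can be sketched as a pure function. The name backups_to_delete is made up for illustration; it relies only on the fact that the YYYYMMDD_HHMM prefix makes alphabetical order equal to chronological order.

```python
NUMBER_OF_BACKUP_TO_KEEP = 10


def backups_to_delete(filenames, keep=NUMBER_OF_BACKUP_TO_KEEP):
    """Return the oldest backups, leaving the newest `keep` files untouched."""
    ordered = sorted(filenames)  # oldest first, thanks to the timestamp prefix
    return ordered[:-keep] if keep > 0 else ordered


names = [f"201711{day:02d}_0235_git_backup.zip" for day in range(1, 13)]
print(backups_to_delete(names))
# ['20171101_0235_git_backup.zip', '20171102_0235_git_backup.zip']
```

Each returned name can then be deleted from the Dropbox app folder.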

The program can be run with
fab -f fabfile git_backup_to_dropbox

First comes fab because we used fabric, then fabfile because fabfile.py is the file with our source code, and git_backup_to_dropbox is the name of the function that we are executing from fabfile.py.

How I run this automatically

I personally run this command from crontab once per day.
35 02 * * * /home/user_name/code/venv/bin/fab -f /home/user_name/code/fabfile git_backup_to_dropbox

Conclusion

This can be used to automatically back up any folder as a zip file to Dropbox.

If you have any questions, please write them in the comments.

Controlling NEC display from Python with nec-pd-sdk

Update on: 07.03.2013
Updated version available on Twilio blog.

Published on: 01.11.2017

Conclusion

pip install nec-pd-sdk

My Story

I was responsible for the maintenance of one spectacular 17-meter-tall audio/video system on a cruise ship.

The system had 34 NEC X551UN screens among other components.

Waterfall from top

Behind each screen, there is an SDI-to-DVI converter.

If the picture on a screen was black, usually there was some problem with the SDI-to-DVI converter; mostly its power supply was broken.

Or the NEC screen itself was broken, but I never saw that in practice.

Special NEC screen

But there was also one special screen that went black from time to time.

After a restart it was mostly fine, and the SDI-to-DVI converter was fine.

After one month of troubleshooting, I came to the conclusion that the problem was with the NEC screen.

It just got stuck every few days (sometimes every second day, sometimes it was fine for a week), and a simple restart (sometimes of 5 seconds and sometimes of 5 minutes) would solve the issue until the next time.

I also knew that when there was a black picture on this screen, the screen diagnostic was “No signal.”

NEC screen no signal

I came to the conclusion that the following code could solve the problem:

This code would then run on a schedule, for example every hour.
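The idea can be sketched in Python as follows. The three callables (read_diagnostic, power_off, power_on) are hypothetical stand-ins for whatever talks to the screen, not functions of any NEC tool or SDK, and the default pause of 5 seconds is only one of the restart lengths that worked for me.

```python
import time

NO_SIGNAL = "No signal"


def restart_if_no_signal(read_diagnostic, power_off, power_on,
                         pause_seconds=5, sleep=time.sleep):
    """Power-cycle the screen only when its diagnostic reports no signal."""
    if NO_SIGNAL.lower() not in read_diagnostic().lower():
        return False  # picture is fine, do nothing
    power_off()
    sleep(pause_seconds)  # the pause between switching off and on again
    power_on()
    return True
```

The sleep function is a parameter only so that the logic can be tried out without actually waiting.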

Existing NEC software

NEC has two software applications for managing its products.

The first is PD Comms Tool, with which you can remotely get and set all values on a screen.

It also has a scripting language.

I have used it to set the scheduler on all 34 screens and to change the time.

It is much faster than doing it manually for each screen.

The second one is NaViSet Administrator 2, which is much more powerful than PD Comms Tool.

It can be used for monitoring all your NEC screens and also some additional equipment (like projectors and Windows PCs).

It also has a visual scripting language where you can set and get multiple parameters according to some condition.

And then you can also set a specific schedule for each script.

I could have used this tool for my problem, but there was one catch: it did not have a sleep/pause command.

Design

I knew that the existing NEC software communicates with the screen via TCP/IP.

Full protocol documentation is at http://www.necdisplay.com/documents/UserManuals/External_Control_P.V.X-series.pdf, but I was not eager to write custom TCP/IP packets.

I wanted something more readable and simple.

I googled “NEC python” and found nec-pd-sdk, which is a Python SDK for NEC screens.

There is no textual documentation, but there are a few examples.

The most useful for me was test_routines_example.py, which shows how to get every parameter.

I found the command for turning the screen ON and OFF in the source code.

Code for turning NEC screen ON and OFF

Here is the code:

Last Words

While I was investigating NaViSet Administrator 2 for my use case, I contacted NEC support to ask whether I could use it for this purpose.

They told me no and suggested using TCP/IP External_Control.

So even NEC support does not know that NEC has a Python SDK.

Which is sad, considering that the NEC Python SDK is useful software.

Please leave questions and remarks in the comments.

Make a standalone executable from Python code with PyInstaller

Published on: 01.09.2017

I wanted to create a single file from my Python code that a person could run on a Windows machine.

After some investigation I found PyInstaller, and 1 hour later I had my EXE file from Python code.

The process for generating EXE files from Python code with PyInstaller was quite easy, at least in my experience.

I used it on Windows 7 64-bit and had no problems.

My program was a one-file script with 300 lines and dependencies on docopt and pyautogui.

Steps for generating an EXE file with PyInstaller

This will install PyInstaller:
pip install pyinstaller

This will generate script.exe in the dist directory:
pyinstaller --onefile script.py

After this you have your EXE program.

How PyInstaller works

Here I have used the --onefile option for PyInstaller, which makes a one-file EXE program.

If you just use pyinstaller script.py, without the --onefile option, then in the dist folder you will get a script folder with the EXE file and all additional files that your EXE file needs to work.

If you use the --onefile option, your one-file EXE program needs to uncompress all files every time it starts.

Uncompression is described in detail in the official documentation; the temporary directory for uncompression on Windows is %TEMP%.
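One practical consequence of the uncompression step: data files bundled with the program end up in that temporary folder, not next to the EXE. PyInstaller sets sys.frozen in the bundled program and stores the path of the unpacked folder in sys._MEIPASS, so a small helper can find bundled files in both cases. The helper names and the fallback to the current directory are my choices for illustration.

```python
import os
import sys


def is_frozen():
    """Return True when running from a PyInstaller bundle."""
    return getattr(sys, "frozen", False)


def resource_path(relative_path):
    """Return the path of a bundled data file, frozen or not."""
    # Inside a bundle, _MEIPASS points at the folder with the unpacked files;
    # when running the plain script, fall back to the current directory.
    base = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base, relative_path)


print(is_frozen(), resource_path("settings.json"))
```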

Some other solutions:
http://nsis.sourceforge.net/Main_Page
http://nuitka.net/
https://pypi.python.org/pypi/pynsist
https://cx-freeze.readthedocs.io/en/latest/
http://www.py2exe.org/

Comparison of some other solutions.