Python Testing with pytest, book review

Published on: 15.05.2018

Number of pages: 220
Written by: Brian Okken
Published by: The Pragmatic Bookshelf

Conclusion
The only book about pytest.

Review
If you want to learn how to use pytest for testing Python programs, this book will be useful to you.

Before reading this book, I did not know much about pytest, except that it was related to unit testing in Python.

So I was very surprised to learn that it is much more than just better unit testing for Python (fixtures, plugins, configuration, etc.).

The largest expense for the programmer

I will talk about the largest expense for somebody who develops software (primarily by writing code), but from the business owner's perspective, not the employee's.

I got this idea after starting to develop my own software products, not at the time when I was writing code for others.

The topic could be rephrased as “The largest expense for the business owner who is also the sole developer”.

But I think it also applies to software development in general.

Time is not on your side

After prolonged thinking about the subject, I have come to the conclusion (of course, I could be wrong) that the largest expense is time.

By time, I mean how much time you spend making some software.

One could argue that this is practically the only expense (apart from some hardware, office space, and electricity).

So the next question is: on which activity in software development is most of the time spent (or wasted)?

Hardware is cheap

Better hardware (SSD, more RAM, faster CPU, etc.) will reduce development time, and you should use it.

But hardware is cheap compared to the cost of time.

Let’s say that by using an SSD you gain 10 more minutes of work per workday (although I would argue that it is at least double that).

Multiplied by 260 workdays per year, that is 2,600 minutes, or about 43 hours.

A 1 TB SSD costs $400, so even if you bill only $10 per working hour, those 43 hours are worth about $430, the ROI is one year, and this is a no-brainer investment (the numbers can change, but you get the point).
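The arithmetic above can be checked with a few lines of Python. This is only a sketch using the post’s own estimates (10 minutes per day, 260 workdays, $10 per hour, $400 for the SSD); plug in your own numbers.

```python
# Rough ROI estimate for a hardware upgrade, using the estimates from the text.
MINUTES_SAVED_PER_DAY = 10
WORKDAYS_PER_YEAR = 260
HOURLY_RATE_USD = 10
SSD_PRICE_USD = 400


def hours_saved_per_year(minutes_per_day: float, workdays: int) -> float:
    """Total working hours gained per year."""
    return minutes_per_day * workdays / 60


def payback_workdays(price: float, minutes_per_day: float, hourly_rate: float) -> float:
    """Workdays until the value of the saved time equals the hardware price."""
    value_per_day = minutes_per_day * hourly_rate / 60
    return price / value_per_day


hours = hours_saved_per_year(MINUTES_SAVED_PER_DAY, WORKDAYS_PER_YEAR)
print(f"Hours saved per year: {hours:.1f}")                # 43.3
print(f"Value per year: ${hours * HOURLY_RATE_USD:.0f}")    # $433
days = payback_workdays(SSD_PRICE_USD, MINUTES_SAVED_PER_DAY, HOURLY_RATE_USD)
print(f"Payback: {days:.0f} workdays")                      # 240
```

Since 240 workdays is less than the 260 workdays in a year, the SSD pays for itself within the first year.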

Anyway, the moral of the story is that hardware is cheap.

The largest expense is learning how to do something new

Most of the time is spent on learning how to do something new.

New programming languages, new frameworks, new libraries, new tools, new, new, new…

An endless supply of new things that need to be learned.

Somebody could conclude that learning how to learn fast is the solution.

Learning fast is certainly useful, but it is not the solution, because there are more things to learn than there is time to learn them.

Temporary nature of the knowledge capital

And in software development, you also have the additional problem of the “temporary nature of the knowledge capital”.

Basically, what you learn today will probably not be useful in 5 years.

Maybe it will not even exist anymore.

So far this is nothing new: anybody who has been programming for more than 5 years has seen technologies (languages, frameworks, libraries, tools, etc.) go away and new ones come, so that now you need to spend time learning new ways to do the old things.

What is new, or at least what I am trying to argue in this essay, is that from the business standpoint, we should look at it as an expense.

More precisely, as the largest expense in software development.

Somebody can say: “Hey douchebag, I like learning new technologies and using them in my job. You are just some old dinosaur who does not like programming and probably was never good at it.”

Well, I agree that at 35 I am certainly not young anymore.

But my age does not change the fact that learning something is not the same as doing something.

If I want to do something, first I must know how to do it, and to know it, I need to learn it.

So learning is an expense of doing.

And if everything that you need to do must first be learned, that is a lot of learning (and a big expense).

Is programming a young man’s game?

I would say that part of the problem is that most programmers are young and do not have much life/work experience.

Statistics from the Stack Overflow 2018 Developer Survey Results support that claim.

30% have at most 2 years of professional coding experience. That is almost one third.

57.5% have at most 5 years of professional coding experience.

In some fields, even after 5 years you are still considered just a beginner.

And only 12.7% have more than 15 years of professional coding experience.

Hobby or business

Also, 81% of professional developers code as a hobby.

I do not think that this is a bad thing.

But at the same time, if you consider something a hobby, then you will not treat it as a business.

In business, there is income and expense.

And by subtracting expense from income, you get profit.

And you had better have some if you expect your business to survive.

In a hobby, there is only fun.

That is why it is called a hobby.

From the bookkeeping perspective, a hobby is an expense, but everybody considers their hobby fun, not an expense.

These two reasons (probably more the second one) are likely the main reasons why most programmers do not see learning as an expense.

At least until they burn out and then switch to something else, like project management (the real job title should be: project reporting), or leave software development completely.

According to the previously mentioned survey, only 6.9% of developers are older than 45.

It is well known that programming is mostly a young man’s game, probably due to the “temporary nature of the knowledge capital”.

How to reduce your largest expense

I think that specialization is a large part of the solution.

Find your niche and stick to it.

Then you can reduce (or even completely eliminate) the time spent learning how to program something, and acquire domain-specific knowledge instead.

When to learn new things

I am not saying that you should never learn new things.

Not at all.

Just do a cost/benefit analysis first.

I will use my own experience as an example.

I do/did a lot of web scraping.

At first, I did it with Beautiful Soup and a lot of custom code.

I wrote my own caching, ORM, etc.

In web scraping, writing XPath selectors is the easiest part; there are a lot of other things, and over time I built my own small framework for all of that.

And with each new website that I scraped, I had to add new features or improve old ones.

And this was taking a larger and larger percentage of my time as I was scraping more and more challenging websites.

After some time, I decided to try Scrapy.

To scrape the first website with Scrapy, I needed around one week, and 90% of that time was spent learning Scrapy (learning how to do the old thing with the new framework).

If I had used my old custom framework, I would have finished it in 3 days.

But I would have had to write more code than with Scrapy.

Now, with Scrapy, I can do in one day the scraping that took at least 3 to 4 days with my old custom framework.

This is an example of when learning a new framework was worth it.
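The same cost/benefit reasoning can be put into numbers. This is a minimal sketch that assumes “one week” means 5 workdays and takes the lower bound of “3 to 4 days” for the old framework; the function name and its parameters are illustrative, not from any library.

```python
import math
from typing import Optional


def projects_to_break_even(first_new: float, first_old: float,
                           later_new: float, later_old: float) -> Optional[int]:
    """How many later projects are needed to recoup the time spent learning.

    first_new / first_old: days for the first project with the new / old tool.
    later_new / later_old: days per later project once the new tool is learned.
    Returns None if the new tool never pays off.
    """
    learning_overhead = first_new - first_old   # extra days lost on the first project
    saving_per_project = later_old - later_new  # days saved on each later project
    if learning_overhead <= 0:
        return 0
    if saving_per_project <= 0:
        return None
    return math.ceil(learning_overhead / saving_per_project)


# Scrapy story: 5 days instead of 3 for the first site, then 1 day instead of 3.
print(projects_to_break_even(first_new=5, first_old=3, later_new=1, later_old=3))  # 1
```

Under these assumptions, the 2 extra days spent on the first Scrapy project are recovered by the very next project.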

Conclusion

In the software development business, the largest expense for a software developer is time spent learning new technologies.

Reduce it by finding your niche and specializing.

Use cost/benefit analysis to decide whether you should learn a new technology.

RS232 sniffing with Python

Published on: 15.04.2018

My motivation for monitoring RS232 with Python.

I had a PC and an LG TV screen communicating via an RS232 cable.

The PC was turning the TV screen on and off via RS232 commands.

But the TV screen was not behaving normally; it was always in standby mode.

So either the PC was not sending RS232 commands, or the TV screen was not responding to them.

In order to diagnose the problem, I decided to spy on their communication.

Half-duplex and full-duplex RS232

There are two types of RS232 communication: half-duplex and full-duplex.

In half-duplex, while one device is talking, the other one is listening, just as communication between people should be.

In full-duplex, both devices are talking and listening at the same time; people think that they can do this too, but it does not work in practice.

Hardware

The first step was to make a special half-duplex RS232 spy cable, because I did not have software access to the PC.

So the only solution was to sniff the communication between the PC and the TV.

Schematics are available at https://www.lammertbies.nl/comm/cable/RS-232-spy-monitor.html.

The same website also has schematics for a full-duplex RS232 spy cable.

Software

To redirect all communication data from an RS232-compliant serial port device into a text file, Eltima RS232 Data Logger can be used.

The program works fine; the only problem I had was that there is no output on screen, only to the text file.

Watching the communication data in real time was not possible, so I decided to write my own program for viewing RS232 communication data in real time.

Python to the rescue

The best thing about the Python programming language is that there is a package for almost everything.

For serial port access in Python, there is the pySerial package.

My Python program read the communication data and showed it on screen in real time, with a timestamp for each read and the time difference since the previous read.

Python source code for RS232 sniffing:
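The original source code is not included in this copy of the post. Below is a minimal sketch of such a sniffer built on pySerial, not the original program. The port name (COM1) and the 9600 baud rate are assumptions that must match your own PC-to-TV link, and the output format (time of day, delay since the previous read, hex bytes, printable ASCII) is only illustrative.

```python
import time
from datetime import datetime
from typing import Optional

PORT = "COM1"      # assumption: port the spy cable is connected to (e.g. /dev/ttyUSB0 on Linux)
BAUDRATE = 9600    # assumption: must match the PC <-> TV link settings


def format_chunk(data: bytes, now: float, previous: Optional[float]) -> str:
    """One log line: time of day, delay since the previous read, hex and ASCII."""
    stamp = datetime.fromtimestamp(now).strftime("%H:%M:%S.%f")[:-3]  # milliseconds
    delta = "" if previous is None else f" (+{now - previous:.3f} s)"
    hex_part = " ".join(f"{byte:02x}" for byte in data)
    ascii_part = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in data)
    return f"{stamp}{delta}  {hex_part}  |{ascii_part}|"


def sniff(port_name: str = PORT, baudrate: int = BAUDRATE) -> None:
    """Print everything that arrives on the serial port until Ctrl+C."""
    import serial  # pySerial; imported here so format_chunk() works without it

    previous = None
    with serial.Serial(port_name, baudrate, timeout=1) as port:
        while True:
            # Read whatever is already buffered, or wait (up to the timeout) for one byte.
            data = port.read(port.in_waiting or 1)
            if not data:
                continue  # timeout, nothing was received
            now = time.time()
            print(format_chunk(data, now, previous), flush=True)
            previous = now


if __name__ == "__main__":
    try:
        sniff()
    except KeyboardInterrupt:
        pass  # Ctrl+C stops the sniffer
    except (ImportError, OSError) as err:  # pySerial missing or port cannot be opened
        print(f"Cannot start sniffer: {err}")
```

Run it on the computer that the spy cable is connected to, and stop it with Ctrl+C. pySerial is installed with pip install pyserial.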

Fluent Python, book review

Published on: 01.04.2018

Number of pages: 792
Written by: Luciano Ramalho
Published by: O’Reilly Media

Conclusion
The book is good for advanced Python; it covers a lot of topics, and in detail.

Review
This book is for developers who want to take their Python skills to an advanced level.

It is not for beginners; for beginners in Python, I recommend Automate the Boring Stuff with Python.

The book Effective Python also covers advanced topics, but this one covers them in more detail.

Both books are good. I would recommend this one because it describes topics in more detail, but that is also why it has many more pages.

How to find all emails on a web page?

Published on: 15.03.2018

Conclusion

Use get_emails() from the webscraping Python package.

Python strength

The best thing about Python is the huge number of third-party packages.

With many of them, you can solve your problems with just a few lines of code.

Let’s say that you want to find all emails in some HTML document, either from an offline or an online web page.

This can be done with the webscraping package.

First, install it with pip install webscraping.

The code for finding all emails on a single page is:

Line 1 imports download and alg from the webscraping package that you have just installed.

Line 3 creates a download.Download() object called D.

Line 4 downloads the web page in which you want to find emails and saves it in the html variable.

Line 6 finds all emails in the html variable and saves them in the emails Python list.

Line 8 prints all the emails that have been found.

This will work for a single web page.
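If you would rather not depend on a third-party package, a minimal standard-library sketch of the same idea fetches the page with urllib.request and finds emails with a regular expression. The regex is a simplification (it will miss some unusual but valid addresses), HTML entities such as &#64; are decoded first, and fetch() and extract_emails() are hypothetical helper names, not part of webscraping.

```python
import html
import re
import urllib.request
from typing import List

# Simplified pattern: local part, "@", domain with at least one dot and a 2+ letter TLD.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}")


def fetch(url: str) -> str:
    """Download a page and decode it using the charset from the HTTP headers."""
    with urllib.request.urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_emails(page: str) -> List[str]:
    """Return unique emails in order of first appearance."""
    found: List[str] = []
    for email in EMAIL_RE.findall(html.unescape(page)):  # decode &#64; etc. first
        if email not in found:
            found.append(email)
    return found
```

Usage: print(extract_emails(fetch("https://example.com/"))) prints the list of addresses found on that page.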

How to find emails on the whole site

If you want to search the whole website for emails, not just one page, you can use the following code.

With the max_depth, max_urls, and max_emails parameters, you can define how far your search should go.
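For the whole-site case, here is a similar standard-library sketch of a breadth-first crawler whose max_depth, max_urls, and max_emails limits mimic the parameters mentioned above. It is not the webscraping implementation; crawl_emails() and LinkParser are hypothetical names, it only follows links on the same host as the start URL, and the page-downloading function is passed in (for example a small wrapper around urllib.request.urlopen) so the crawler can be tried without network access.

```python
import html
import re
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}")


class LinkParser(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def crawl_emails(start_url: str, fetch: Callable[[str], str], max_depth: int = 1,
                 max_urls: Optional[int] = None,
                 max_emails: Optional[int] = None) -> List[str]:
    """Breadth-first crawl of one host, collecting unique emails."""
    host = urlparse(start_url).netloc
    queue = deque([(start_url, 0)])
    seen = {start_url}
    emails: List[str] = []
    visited = 0
    while queue and (max_urls is None or visited < max_urls):
        url, depth = queue.popleft()
        page = fetch(url)
        visited += 1
        for email in EMAIL_RE.findall(html.unescape(page)):
            if email not in emails:
                emails.append(email)
                if max_emails is not None and len(emails) >= max_emails:
                    return emails
        if depth < max_depth:
            parser = LinkParser()
            parser.feed(page)
            for href in parser.links:
                link, _fragment = urldefrag(urljoin(url, href))  # absolute URL, no #anchor
                if urlparse(link).netloc == host and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return emails
```

mailto: links have no host, so they are never followed, but the addresses inside them are still found by the regex.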

Happy spamming.

P.S. just joking 🙂

Hackers and Painters, book review

Published on: 01.03.2018

Number of pages: 272
Written by: Paul Graham
Published by: O’Reilly Media

Conclusion
One of the classic books for hackers (software developers).

If you are in the software business at all, I recommend that you read it.

Review
The book is composed of 15 essays:

Why Nerds Are Unpopular
Hackers and Painters
What You Can’t Say
Good Bad Attitude
The Other Road Ahead
How to Make Wealth
Mind the Gap
A Plan for Spam
Taste for Makers
Programming Languages Explained
The Hundred-Year Language
Beating the Averages
Revenge of the Nerds
The Dream Language
Design and Research

Most of them are available on Paul Graham’s blog.

My personal favorites are Mind the Gap and How to Make Wealth.

Should you learn C/C++ in 2018?

Published on: 15.02.2018

Conclusion

I would not recommend learning C/C++ as your first programming language just to learn how to program; go with Python instead.

Also, if you do not know why you want to learn C/C++, you are better off not spending time on it.

My C/C++ background

I have spent most of my professional programming career, around 10 years, writing C/C++ programs.

By professional programming career, I mean that other people have paid me to write code.

Although I have spent more time with C/C++, I would say that I know Python better.

My knowledge of Python is better because Python has fewer features than C/C++ and does not require manual memory management.

Why C/C++ is hard

IMHO, C/C++ is hard due to its many features and manual memory management.

In the time it takes to understand memory management, you can learn 80% of Python.

Just look at this image of code that concatenates 2 strings in C vs. Python.

Yes, the C code could be shorter (no error checking, no free(), a few fewer const qualifiers) and it would still work.

But then I would be showing bad production C code, not demonstrating how real-world C code should be written.

It is not fun to debug a segmentation fault, and that is what you get without proper error checking in C.

The Python code could (and in real life should) also have if __name__ == "__main__":, but I removed it because, unlike in C, it is not necessary.

The point is that a higher-level programming language (Python in this example) does a lot of low-level things automatically.

So you need fewer lines of code, which means fewer possible bugs, resulting in increased developer productivity.

The tradeoff is more CPU and memory usage for faster time to market.

C and C++ are two different programming languages

Here I am using the term C/C++.

That is because most code bases (at least the ones that I have seen in corporate environments) are some hybrid of C and C++.

My theory is that the older code was written in C, and later object-oriented programming (C++) was added.

But C and C++ are not the same programming language; their mental programming models are quite different.

C code is about structure, functions, and pointers.

In C++, you have object-oriented programming and a lot of other features.

In the real world, you need to know both C and C++, which is why I use C/C++.

When you should learn C/C++

The only benefit I can see from learning C/C++ is learning manual memory management and pointers.

Basically, a better understanding of how the computer works (at the software level).

If your only reason is to better understand how the computer works at the software level, learn the C programming language instead.

C has fewer features than C++.

It is easier to learn C++ if you know C, because then you already know manual memory management and pointers.

But outside a few niches (systems, embedded, banking, game engines, etc.), I do not see much practical use for C/C++ in 2018.

Probably you ain’t gonna use C/C++

I am not saying C/C++ is not in use anywhere in 2018.

I personally get interview requests for C/C++ positions almost every month.

Usually, they are either for embedded software (C) or the banking industry (C++).

I am just arguing that beginners (people who do not know any programming language) should not start with C/C++.

The only exception is if you are planning to get a job (or start a business) in a C/C++ environment.

If it is not for a job or business, I do not see why you should learn C/C++, except as a hobby.

But even then, get a better hobby 🙂.

P.S. If you, dear reader, think that I am missing some point, please add it in the comments.

Samsung ML-1520 on OS X 10.12 (Mac OS Sierra)

Published on: 01.02.2018

I do not like doing upgrades

Especially operating system (OS) upgrades.

The reason I do not like them is that, usually, after an update something does not work as before, or does not work at all.

Software features change, drivers are no longer available, software does not work with the new OS, sometimes the new OS is slower than the old one, etc.

The only times I upgrade are when I have to (some feature works only on the new OS) or when I have new hardware (then I need to install an OS anyway, so I can try the latest version).

I regularly upgrade only my iPhone and iPad, because I use them for fun rather than for work (if some software stops working, I can live without it).

I bought a new SSD

For my 7-year-old computer, I decided to replace the HDD with an SSD, because the HDD was the bottleneck.

The OS upgrade would go from OS X 10.9 to OS X 10.12.

At that time, 10.13 was available as a beta.

But I will not use the beta if I do not have to.

When I did iPhone development with Xcode, I usually had to use the newest OS X to get access to the latest SDK.

Always test before upgrade

Luckily, I have a Mac mini, so I decided to do a test installation on it first, to make sure that all the software I use works fine before I upgrade my main computer.

The biggest problem I had was with my Samsung ML-1520 laser printer.

I remember that even on OS X 10.9 it did not work out of the box, but with http://guigo.us/mac/splix/ I managed to get it working.

On OS X 10.12, SpliX from http://guigo.us/mac/splix/ is still needed, but on its own it will not get the job done.

After 2 hours, I managed to get it working on OS X 10.12 (Mac OS Sierra), but it was not easy.

652 WLXKJ USB network server

I have the Samsung ML-1520 connected to a 652 WLXKJ USB network server so that I can share it as a network printer.

For the last 7 years, the 652 WLXKJ has been working with no problems.

There is no official support for it (I do not even know who made this thing), and you have to figure things out on your own (for example, that you need to plug a USB stick into it if you want to use it as a network print server).

652 WLXKJ USB network server working with a 15-year-old USB stick

Samsung ML-1520 on OS X 10.12 (Mac OS Sierra)

The only way I have found to make the Samsung ML-1520 laser printer work with OS X 10.12 (Mac OS Sierra) is to add a PPD (PostScript Printer Description) file manually, but SpliX from http://guigo.us/mac/splix/ must also be installed.

Luckily, the PPD file is available on the SpliX website: download the source code and unzip it; ml1520.ppd is inside splix-2.0.0/ppd.

There are also ml1520fr.ppd (French) and ml1520pt.ppd (Brazilian Portuguese).

After this, printing PDF and DOC documents worked (I have tested only PDF and DOC documents).

Steps for installing Samsung ML-1520 on OS X 10.12 (Mac OS Sierra)

Install Splix-2.0.0.mpkg

Splix-2.0.0.mpkg can be downloaded from http://guigo.us/mac/splix/ as Splix-2.0.0.zip; unzip it and install it.

You will get the error message “The installation failed”, but this is fine; you need to install it anyway. I know it is strange.

The error message “The installation failed” after installing Splix-2.0.0.mpkg, but this is fine

This is an important step: if you skip it, then later, when you add the Samsung ML-1520, you will get the error message “The software for the printer was installed incorrectly. Please reinstall the software from the manufacturer”, and the Samsung ML-1520 will not work.

Error if Splix-2.0.0.mpkg is not installed

Add ml1520pt.ppd Postscript Printer Description for Samsung ML-1520

all information from Samsung ML-1520

Add the Samsung ML-1520 in System Preferences... under Printers & Scanners.
Most importantly, in Use: select Other... and then open the file ml1520pt.ppd (remember, you need to download it from the SpliX website).

The Address:, Protocol:, and Queue: settings are specific to my setup, because I am using the printer with the 652 WLXKJ USB network server.

When everything is done, the Samsung ML-1520 works fine.

Samsung ML-1520 working fine

This is why I do not like upgrades.

Hopefully, this will be helpful to somebody.

Always start with a simple solution (ConfigParser vs. JSON for a Python configuration file)

Published on: 01.01.2018

Conclusion

Sometimes perfect is the enemy of the good.

Why use a configuration file

I had a Python program that needed to access a device via its IP address and run some diagnostics and commands on it.

At that time, I was accessing only one device (only one IP address).

But I could see that in the future (in a few months to a year), I would need to run the same set of commands on more devices.

One solution is to pass the IP addresses as parameters to the CLI program.

In my use case, the number of IP addresses that need to be accessed will never be larger than 34.

And writing 34 IP addresses as CLI parameters, which is around 373 characters, is not a nice solution.

When you need to check whether all IP addresses are included, it is not easy to read.

Python code as configuration was not possible

I was distributing my Python code as an EXE, so using Python code as configuration was not possible.

Still, I think that Python code as configuration is a good solution if you are running the source code and only developers (not average users who do not know what Notepad is) will edit it.
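For the case where the code runs from source and only developers edit the configuration, here is a minimal sketch of Python code as configuration: the configuration is an ordinary .py file that is executed with runpy.run_path() from the standard library, and its global variables are read back. The file name config.py and the IP_ADDRESSES variable are hypothetical. Because the file is executed as code, it must come from a trusted source.

```python
import runpy
from typing import List

# Example contents of a hypothetical config.py, edited by developers only:
#
#   IP_ADDRESSES = [
#       "192.168.1.35",
#       "192.168.1.36",
#   ]


def load_ip_addresses(config_path: str) -> List[str]:
    """Execute a Python configuration file and return its IP_ADDRESSES list."""
    settings = runpy.run_path(config_path)  # runs the file, returns its globals as a dict
    return list(settings["IP_ADDRESSES"])
```

The list can be written with normal Python syntax (comments, trailing commas), and no parsing code is needed at all.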

Sometimes perfect is enemy of good

Sometimes I tend to complicate things unnecessarily, because I think about all possible edge cases and all possible future uses.

Of these two, all possible future uses are the real problem.

Edge cases can happen, but is there a positive cost/benefit to solving them, and how should they be solved?

This needs to be determined on a case-by-case basis.

But “all possible future uses” means trying to anticipate the future.

That is impossible.

In my experience, whenever I added code for future use cases, it was usually a waste of time.

Even when I was the only user, so I could argue to myself that I knew I would need it.

Usually, I did not need it.

So, I have decided to eliminate waste when developing software, starting from this project.

For example, here I needed a configuration file with a list of IP addresses that I would iterate over in a for loop.

This was the smallest requirement that I needed for my problem.

But immediately I was thinking that it would be nice to have:

  • checking whether each IP address is in a valid format
  • if I have 5 addresses, I need to write all 5, but I could add a special syntax for ranges, like 192.168.1.1 --- 192.168.1.5, and this can get really complicated if you want to cover all edge cases
  • I only need IPv4, but I could also add IPv6
  • if there is an error in the configuration file, there could be a useful error message and a suggested solution
  • another configuration file could be added where I would define which IP addresses are valid, and then I could validate the requested IPs against the valid ones

And the list can go on.

But I said to myself, NO.

You will just write code that reads a list of items from the configuration file into a Python list.

Nothing more.

If and when you have a real need, you will add more features.

Thinking about it, maybe this is an example of premature design?

Premature design is deciding too early what a program should do.

Why JSON is better than ConfigParser for the list data type

After investigating possible solutions, I decided to try ConfigParser.

The ConfigParser configuration file was:

The code was:
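The original configuration file and code are not reproduced in this copy of the post. Here is a minimal sketch of the ConfigParser approach, assuming a hypothetical [devices] section with an ip_addresses option; get_as_list() below is only a guess at what such a helper might look like. The config text is inlined with read_string() so the example is self-contained; with a real file you would call config.read("config.ini").

```python
import configparser
from typing import List

# Hypothetical config.ini contents. Quotes are optional, which is exactly
# the "no rules in format" problem described below.
CONFIG_TEXT = """
[devices]
ip_addresses =
    192.168.1.35
    "192.168.1.36"
"""


def get_as_list(config: configparser.ConfigParser, section: str, option: str) -> List[str]:
    """Turn a multi-line (or comma-separated) option into a list of strings."""
    raw = config.get(section, option)
    items = []
    for line in raw.replace(",", "\n").splitlines():
        item = line.strip().strip('"')  # accept both quoted and unquoted values
        if item:
            items.append(item)
    return items


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)  # with a real file: config.read("config.ini")
for ip in get_as_list(config, "devices", "ip_addresses"):
    print(ip)
```

Even in this small sketch, the list handling has to be written by hand, because ConfigParser only returns strings.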

I found a few problems with ConfigParser:

  1. It did not have built-in functionality to read data as a Python list
    • so even for this simple example, I had to write custom code (this is why I have the get_as_list() function)
  2. There were no rules for the format, e.g. "192.168.1.36" and 192.168.1.36 were both valid.
    • this was not a big problem, but I just did not like it

Let’s try JSON (as a configuration file format in Python)

The JSON configuration file was:

The code was:
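Again, the original file and code are not shown here. A minimal sketch of the JSON version, assuming a hypothetical "ip_addresses" key; the text is inlined with json.loads() to keep the example self-contained, while a real program would read the file with json.load().

```python
import json

# Hypothetical config.json contents (the key name is an assumption).
CONFIG_TEXT = """
{
    "ip_addresses": [
        "192.168.1.35",
        "192.168.1.36"
    ]
}
"""

# With a real file:
#   with open("config.json") as f:
#       config = json.load(f)
config = json.loads(CONFIG_TEXT)

# JSON arrays become Python lists directly, so no custom parsing is needed.
for ip in config["ip_addresses"]:
    print(ip)
```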

There is much less code, and line count matters.

One thing that I did not like about JSON is "ValueError: No JSON object could be decoded", which is the error message that you get if your JSON is invalid.

I was hoping to get some more details, e.g. which token on which line.

But you cannot have it all, and ConfigParser was no better.
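The message quoted above comes from Python 2’s json module. Since Python 3.5, json raises json.JSONDecodeError, a subclass of ValueError that carries msg, lineno, and colno attributes, so you can report where the problem is. A small sketch; json_error_details() is a hypothetical helper name.

```python
import json
from typing import Optional

# A config.json with a missing comma after the first address.
BROKEN_CONFIG = """{
    "ip_addresses": [
        "192.168.1.35"
        "192.168.1.36"
    ]
}"""


def json_error_details(text: str) -> Optional[str]:
    """Return a human-readable parse error, or None if the JSON is valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:  # subclass of ValueError
        return f"{err.msg} at line {err.lineno}, column {err.colno}"
    return None


print(json_error_details(BROKEN_CONFIG))
```

For the broken file above, the error points at line 4, column 9, where the parser expected a comma.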

I decided to use JSON as the configuration file format for Python, because the code was less complicated (fewer lines of code).

I did not try YAML.

Its syntax, at least to me, is less readable than JSON.

Please leave questions and remarks in the comments.

Automatic backup of git repositories to Dropbox with Python

Published on: 01.12.2017

Intro

I will show how to upload files to Dropbox from Python code.

Why do I need this?

Currently, I am only using WebFaction for all my web services and also as my private git server.

I wanted to make an automatic backup of my git repositories to Dropbox.

Dropbox App

In order to upload files to Dropbox, you need an access token.

And to get the access token, you need to register your app on the DBX Platform.

All of this must be done on the Dropbox website.

The first step is to go to https://www.dropbox.com/developers/apps/ and click the “Create App” button.

Step 1

Just click “Create app” button

Step 2

New app on DBX Platform

We will use Dropbox API.

We will choose “App folder” because we will only upload backups to Dropbox; we do not need full access to all our files.

Name your app and click “Create App” button.

Step 3

Use default settings

We will use the default settings. Here we will get the access token, so click the “Generate access token” button.

Step 4

Access token generated

Now you have your access token; you will need it in your code, so copy it.

Step 5

Dropbox app

Now we have our “my_git_backup” Dropbox app.

pip install

It is always recommended to use virtual environments with Python.

At least I use them always.

Code

I am using fabric to make my life (and code) easier.

I use fabric every time I call CLI commands from Python.

I will explain NUMBER_OF_BACKUP_TO_KEEP later; it is used at the end of the program.

All the code that follows is inside the with lcd(remote_directory): context manager.

The context manager is used so that all the commands that follow are executed inside the remote_directory directory.

The name of the backup file will be YYYYMMDD_HHMM_git_backup.zip, where the uppercase letters are the date and time when the program was executed.

For example, 20171121_1856_git_backup.zip, so that we know when the backup was made.
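The naming scheme above maps directly onto datetime.strftime(). A minimal sketch; backup_filename() is a hypothetical helper name, not from the original fabfile.

```python
from datetime import datetime


def backup_filename(now: datetime) -> str:
    """YYYYMMDD_HHMM_git_backup.zip, e.g. 20171121_1856_git_backup.zip."""
    return now.strftime("%Y%m%d_%H%M") + "_git_backup.zip"


print(backup_filename(datetime.now()))
```

Zero-padded fields (%m, %d, %H, %M) matter here: they keep the names the same length, so sorting them as plain strings also sorts them by time.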

To make the actual backup, the zip CLI command is used; we back up only the files ending in .git (in my case, only git repositories).

I also use LAME_PASSWORD for basic protection.

This is why I used fabric: just by calling the local() function, you can execute CLI commands.
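As a rough equivalent without fabric, the standard library’s subprocess module can run the same kind of zip command; this is a substitute for fabric’s lcd() and local(), not the original code. The exact zip flags are an assumption: -r recurses into each bare repository directory, and -P sets the password (the zip manual itself warns that -P is insecure, which fits the “lame password” idea). build_zip_command() and make_backup() are hypothetical names.

```python
import glob
import os
import subprocess
from typing import List


def build_zip_command(archive: str, password: str, repositories: List[str]) -> List[str]:
    """zip -r (recurse into each repository) -P (password) archive repositories..."""
    return ["zip", "-r", "-P", password, archive] + sorted(repositories)


def make_backup(directory: str, archive: str, password: str) -> None:
    """Zip every *.git entry in `directory`, running zip inside that directory."""
    repositories = [os.path.basename(p) for p in glob.glob(os.path.join(directory, "*.git"))]
    # cwd= plays the role of fabric's lcd(); check=True raises if zip fails.
    subprocess.run(build_zip_command(archive, password, repositories), cwd=directory, check=True)
```

Passing the arguments as a list (instead of one shell string) avoids quoting problems with the password or file names.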

The first line opens the connection to your Dropbox application; you need to pass your own access token as an argument.

The next two lines do the upload: they open the file, read it, and upload the bytes to Dropbox.

The Dropbox documentation mentions that this works only for files up to 150 MB in size.
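For backups that may grow beyond that limit, the Dropbox Python SDK provides upload sessions (files_upload_session_start(), files_upload_session_append_v2(), files_upload_session_finish()). A minimal sketch that uses files_upload() for small files and a session for large ones; the 8 MB chunk size and the function names are assumptions, and the dropbox import is deferred so the helper can be loaded without the SDK installed.

```python
import os

# files_upload() should not be used for files larger than 150 MB (per the Dropbox docs).
SINGLE_UPLOAD_LIMIT = 150 * 1024 * 1024
CHUNK_SIZE = 8 * 1024 * 1024  # assumption: 8 MB per upload-session request


def use_upload_session(size: int) -> bool:
    """True when a file is too large for a single files_upload() call."""
    return size >= SINGLE_UPLOAD_LIMIT


def upload_to_dropbox(access_token: str, local_path: str, dropbox_path: str) -> None:
    """Upload local_path to dropbox_path, e.g. "/20171121_1856_git_backup.zip"."""
    import dropbox  # Dropbox Python SDK (pip install dropbox)

    dbx = dropbox.Dropbox(access_token)
    size = os.path.getsize(local_path)
    with open(local_path, "rb") as f:
        if not use_upload_session(size):
            dbx.files_upload(f.read(), dropbox_path)
            return
        # Large file: start a session, append chunks, then commit the last chunk.
        session = dbx.files_upload_session_start(f.read(CHUNK_SIZE))
        cursor = dropbox.files.UploadSessionCursor(session_id=session.session_id, offset=f.tell())
        commit = dropbox.files.CommitInfo(path=dropbox_path)
        while size - f.tell() > CHUNK_SIZE:
            dbx.files_upload_session_append_v2(f.read(CHUNK_SIZE), cursor)
            cursor.offset = f.tell()
        dbx.files_upload_session_finish(f.read(CHUNK_SIZE), cursor, commit)
```

With an App folder app, paths such as "/name.zip" are relative to the app’s own folder.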

With the last line, the program deletes the local backup.

The first for loop collects all the files from your backup folder into a list.

The second for loop deletes all files except the last few.

How many files to keep (otherwise we would need to delete old backup files manually) is defined in NUMBER_OF_BACKUP_TO_KEEP at the beginning of the code.

I keep it at 10; I do not need more than that.

Because we have the date and time in our filenames, we can use Python’s sort function to sort the files by when the backup was made.
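The retention rule can be sketched as a pure function that decides which files to delete; the actual deletion call (local or on Dropbox) is left out. backups_to_delete() is a hypothetical name, and only NUMBER_OF_BACKUP_TO_KEEP comes from the original code.

```python
from typing import Iterable, List

NUMBER_OF_BACKUP_TO_KEEP = 10


def backups_to_delete(filenames: Iterable[str], keep: int = NUMBER_OF_BACKUP_TO_KEEP) -> List[str]:
    """Return the oldest backups beyond the newest `keep` ones.

    YYYYMMDD_HHMM names sort chronologically as plain strings, so sorted() is enough.
    Files that are not backups are ignored.
    """
    backups = sorted(name for name in filenames if name.endswith("_git_backup.zip"))
    if keep <= 0:
        return backups
    return backups[:-keep]  # everything except the newest `keep` files
```

The keep <= 0 check matters because backups[:-0] would be an empty list, not the whole list.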

The program can be run with
fab -f fabfile git_backup_to_dropbox

fab is the fabric command-line tool, fabfile is there because fabfile.py is the file with our source code, and git_backup_to_dropbox is the name of the function that we are executing from fabfile.py.

How I run this automatically

I run this command from crontab once a day, at 02:35:
35 02 * * * /home/user_name/code/venv/bin/fab -f /home/user_name/code/fabfile git_backup_to_dropbox

Conclusion

This can be used to automatically back up any folder to Dropbox as a zip file.

If you have any questions, please write them in the comments.