Information Visualization: Interpretations and Stories around them.


Nine shared this great presentation from Gurman titled:

When Statistics become stories

It was part of her talk given at DesignUp 2019. In one slide she talked about the irregular spikes we see in the age distribution around multiples of 10.

I am thinking of creating an exercise around this for the AI-ML workshop to be conducted later this month at NID Gandhinagar for New Media Design Students.

At this stage of the workshop, we would have covered basic programming concepts and Jupyter notebooks.

Section One - Introducing Pandas

I got the French population and age distribution data from here, and we have cleaned it into the following structure:

Out[115]: 
   year   males  females   total  age
0  2018  364155   347749  711904    0
1  2017  370453   355472  725925    1
2  2016  378518   363162  741680    2
3  2015  387906   372402  760308    3
4  2014  399232   387042  786274    4

We would start by loading this data and introduce the concepts of:

  1. Reading the data (in this case from a CSV file, using read_csv).
  2. Exploring the structure of the data (a DataFrame) and accessing it by rows and columns.
  3. Trying basic operations on the data to answer some questions, for example: for which ages is the male population larger than the female population, and vice versa?
  4. Using ? to get the documentation of a method or attribute.
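
The steps above could look roughly like this in a notebook cell. This is a minimal sketch, assuming pandas is installed; the five rows are the ones shown in the table above, inlined as a string so the cell runs without the CSV file (in the workshop the argument to read_csv would be the path to the cleaned file).

```python
import io

import pandas as pd

# The five rows shown above, inlined so the sketch runs without a file.
SAMPLE_CSV = """year,males,females,total,age
2018,364155,347749,711904,0
2017,370453,355472,725925,1
2016,378518,363162,741680,2
2015,387906,372402,760308,3
2014,399232,387042,786274,4
"""

# 1. Reading the data.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# 2. Exploring the structure: column names, then (rows, columns).
print(df.columns.tolist())
print(df.shape)

# Accessing one column and one row.
print(df["total"])
print(df.iloc[0])

# 3. Boolean masks: ages where males outnumber females, and the reverse.
more_males = df[df["males"] > df["females"]]
more_females = df[df["females"] > df["males"]]
print(more_males["age"].tolist())
print(more_females["age"].tolist())
```

In IPython or Jupyter, typing pd.read_csv? in a cell shows the documentation for that function, which covers the fourth point.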

Section Two - Plotting the data

After having played around with the data and different methods, we would shift to plotting it and see whether the plots can answer the questions we explored in the previous section.

I am thinking of introducing them to pie charts, bar graphs and line plots. The age distribution of a country is generally represented as a population pyramid, so here we would try to plot that pyramid for the French population.
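
One way to draw such a pyramid is with two sets of horizontal bars that share the age axis. The sketch below assumes matplotlib is installed and uses only the first five age rows from the cleaned French data shown earlier; the colours, labels and title are my own choices, not something fixed by the dataset.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt

# First five age groups from the cleaned French data shown earlier.
ages = [0, 1, 2, 3, 4]
males = [364155, 370453, 378518, 387906, 399232]
females = [347749, 355472, 363162, 372402, 387042]

fig, ax = plt.subplots()
# Males are drawn to the left of zero by negating their counts.
male_bars = ax.barh(ages, [-m for m in males], color="tab:blue", label="Males")
female_bars = ax.barh(ages, females, color="tab:orange", label="Females")
ax.set_xlabel("Population")
ax.set_ylabel("Age")
ax.set_title("Population pyramid (sample rows)")
ax.legend()

# Write the figure to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

With this approach the x-axis ticks on the male side show negative numbers; relabelling them with absolute values would be a natural follow-up exercise.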

Section Three - Exercise for students

A similar age distribution for the UK population is available here. We would apply what we learned in the two sections above and ask the students to plot a population pyramid for the UK.

Section Four - Census and age distribution of the Indian population

Akash Gutha has a repository and an IPython notebook that:

  1. Fetches the relevant data (an Excel sheet) from the Indian Census site.
  2. Cleans up the data, assigns names to the columns, and draws related plots.

We would build on those steps to:

  1. Cover how the Census releases data, along with the accompanying guide that helps people make sense of it.
  2. Plot a population pyramid for India.
  3. Observe the differences between the population distributions of India and the UK/France.
  4. Have an open discussion about the spikes at certain ages.
  5. Share the screenshots from Gurman's presentation that explain the spikes.

At this point we conclude the session on handling data and information visualization. Possibly we will follow it with more hands-on exercises for the students.

Setting up an environment for a workshop based on Python.


I distinctly remember, while working at FOSSEE back in 2009-10, that when we conducted hands-on workshops in the labs of various institutes, we would factor in significant time to reach early and set up all the dependencies on the lab computers. Back then we used Enthought's binaries for Windows systems to install everything. If we were lucky we would also find Linux machines in the lab, and that helped a lot, as we were really comfortable installing the requirements from a CLI.

Recently we scheduled an AI/ML workshop for New Media Design students at NID Gandhinagar. While preparing for it I was looking for resources. I knew about Project Jupyter and IPython notebooks but my understanding of them was very limited.

I found that JupyterHub is a brilliant project for setting up the complete environment and sharing resources with all the students. Their offering, the-littlest-jupyterhub, which targets 1-100 users hosted on a single server, is perfect. However, it does need sudo and root privileges to segregate user environments. If we get access to a server on the NID campus, I will try to set it up.

Otherwise, I also came across Colab from Google, which comes with all the dependencies and libraries installed, ready to be used and shared with the students. It looks really promising. I will try to put together some notebooks and exercises around the concepts we will be covering and see how both these solutions fare.

But compared to the manual setup we used to do back then, this looks like a cakewalk.

Communications


I recently got into a tense conversation with a friend. We were talking about education and I was briefing him about some popular steps a particular government was taking. During that conversation, I think my friend was trying to make the case that the things I was mentioning weren't directly related to improving the quality of education or for the students and teachers. He was right. But at that time, I didn't realize that and got defensive in a way that derailed the whole conversation.

Lately, I have noticed that many times I don't completely understand what's being said and I end up interrupting the conversation. Things escalate from there. It is uncomfortable, tense, exhausting, tiresome and, worst of all, the topic of conversation gets sidelined. Furthermore, when I am trying to express myself, I often use the wrong word. I think my communication skills need a lot more work and practice, and I have to be more mindful about them.

This is one reason I like these writing-club sessions. Writing is a good exercise: it clears out the noise and makes you more focused. I have been slacking lately on these sessions but I will try to improve on that front too.

SystemD Dependency Tree


At Senic, we have shifted to systemd for managing the many independent applications we have running on the Hub. Earlier we were using supervisord, and we moved away from it for a bunch of reasons (limiting dependencies, a system-supported solution, etc.). systemd provides many strong features, things like:

uses socket and D-Bus activation for starting services, offers on-demand starting of daemons, keeps track of processes using Linux control groups, maintains mount and automount points, and implements an elaborate transactional dependency-based service control logic

We have put together different service files that start the applications as the Hub boots. Some of these services have hard dependencies on others, meaning that if the parent service is not running, the child service won't start or run. For example, if an application makes network requests, in some scenarios it helps to make its service depend on the NetworkManager service, which manages network interfaces (or on another native service that handles network connections).

This dependency tree has both benefits and issues. For us, some of the (parent) services initialize DBus objects, and child services connect or subscribe to these objects, which enables DBus communication between separate applications. Now if the parent service dies (SIGTERM), the child service can't continue and needs to stop. Here the systemd dependency tree takes care of this for us: it stops all dependent services if the parent stops.

But in the situation where the parent service restarts, my understanding of systemd fails me. systemd correctly stops all the child services, but it doesn't restart them once the parent service starts again. I am not sure which dependency construct (Before, After, etc.) would make sure that once the parent service restarts, all the child services also restart.

All the services have a Restart clause to make sure that the service restarts. But a restart only happens in certain scenarios. If a service is stopped using the command systemctl stop service-name.service, systemd won't start the service again. And I think this is how the child services get stopped when the parent service restarts, and hence they don't restart. Maybe.

Working in someone else's kitchen


Yesterday I was pairing remotely with one of my colleagues. He hosted a tmate session for me on his system. His editor of choice is Vim and I use Emacs. We were discussing ideas for functions and what they would do, and taking turns writing the code. I know a little bit of Vim, but my muscle memory is not tuned for Vim as much as it is for Emacs. So it took me a while; I asked some silly questions about how he was doing certain things, and it was nice to see how comfortably he used the interface.

This morning, as I was preparing breakfast and looking for the tools in the kitchen, it reminded me of yesterday's pairing session. In the kitchen it is food; at work it is code. The tools are just placed in different locations and there are other ways of preparing things.

Both of these exercises bring you out of your comfort zone. The keybindings for saving, editing and navigating are different in the editor. In the kitchen, the spices are in a different box, the box itself is kept in a different place, and they grate the ginger instead of crushing it. It makes you more alert and self-aware.

SoFee 2.0


For the past few months I have not published anything publicly, but in drafts I was writing a small story. It took long to put everything together, and I now have a rough draft of the story in place. But it is still not finished finished. I think it is the same with personal software projects and experiments: never ending. There is always something to improve, fix, refactor/rewrite.

On that idea of ongoing projects, I will be picking up SoFee again, with the same features I was aiming for in the first version. I want to make them modular, so that they can work with each other as needed.

I was also thinking of using Clojure this time. On that, Punchagan correctly reminded me how fixing the language up front could be counterproductive to the project. In my first attempt at SoFee, I had decided to use python3, and many of the web-page parsing libraries were still on python2. I spent a long time porting an abandoned archiving library, Warc, to python3, and in the end that feature was not even shipped. So despite the temptation, as Punchagan suggested, it is best to look for the best library available irrespective of language and put together a minimal BUT complete module which can:

  1. Archive a link locally.
  2. Revisit those archives without an internet connection.
  3. Index the archives, make them searchable.
  4. Possibly a command line utility which can be extended with REST endpoints.

After that, I will pick up remaining features and try to build this, block by block.

One block at a time.

YouTube: a (healthy?) supplement for study material


In the past, I have visited NID to help students with their Diploma projects. At times some students would show up with buggy code whose source was a YouTube video. It always baffled me. Students were trying to shortcut the learning: they were writing code by pausing a video at a certain time and hoping that the code would work for them just as it did in the video. There are so so so many issues with YouTube as a reference for learning material. For example:

  1. With programming, it can be particularly hard to cite and use an exact time-frame as a reference. The video content and voice are not indexed. One can't find the way back to a video without the right keywords, which are often limited.
  2. It is hard to reproduce the environment the presenter is using in the video. Often the exact versions, OS details or other needed setup are missing from the 'details' section.
  3. Videos can get blocked for random copyright violations or even get taken down.

Having said that, I recently enrolled in an LLB course at a local government college. The college is pretty relaxed about the schedule, curriculum, assignments, and time commitments. Regular classes happen but students are not "required" to attend them. There were very few basic requirements or assignments. I myself attended 4 classes in all, and one of them was the day before the exam was scheduled. There was some confusion around the optional subject, and the class was meant to give a rough overview of the syllabus. The downside of this approach was having to self-study most things and aim for basic passing marks (half-assing a professional degree). I was mostly relying on solved question papers from the last few years to get an understanding of the subjects. Some teachers also helped out in the process. But I think the most crucial part of the preparation was YouTube videos made by some random teachers, students, and professionals. They had recorded many small videos covering the basics of certain Acts, where they applied, relevant cases and exceptions.

Listing down some references which were helpful(Shoutouts):

While these videos were really helpful, I still think that for serious learning they are not as good as books, bare-acts and manuals. Written material like that can be referred back to and revisited quickly.

Last minute...


Last night I had a train to catch for Delhi. The train's departure time was ~10PM, and that late it's hard to get autos to the station in Bikaner, so I reached early, almost an hour before. Generally I keep the SMS from IRCTC (the Indian online railway booking system) as a handy reference for seat details and as a ticket. But the system can be unreliable at times, and I didn't get the SMS when I booked my tickets. I remembered the coach and seat number, so I stood in front of my coach and kept goofing around, chatting with friends. The coach was still locked from inside; ground staff generally open the coaches around half an hour before the scheduled departure. Around 9:45PM they opened it and I boarded the train to put my luggage away. The whole coach was empty. As I took my seat and got comfortable, I thought, let's get the ticket SMS, I will need it to show to the Ticket Checker.

I logged into IRCTC, went to my "Booked tickets History" section, opened the ticket, found "GET SMS", and clicked. I looked at my mobile in anticipation of a notification and soon there was a "Tingggggggggg". I checked the message summary on the phone and noticed DEE-BKN instead of the expected BKN-DEE. I was like, "Hain?". I confirmed it in the SMS and on the IRCTC platform, then looked around at the empty bogie, realising, shit, I had booked the wrong ticket. Time to get a ticket in "Current booking"; I was already logged in. I tried to book a ticket for the train, but the system wasn't allowing online booking. I would now have to rush to the ticket counter and get a ticket from there. I picked up my luggage and rushed to the counter on the platform, and the person sitting there said, you can't get a reservation ticket from here, you can get it only from the counter on the first platform, and hurry, even that counter closes at 10PM. I rushed to the first platform, to one of its counters, where the person was playing PubG on his mobile. I said, I want a ticket for Delhi, and he replied that no tickets were available. I confirmed again, in all the classes? He replied that seats were available only in 2nd AC. I knew he was lying but I had no choice. I said, please, can you give me one? He handed me a form to fill in. I started filling it in and also tried to cajole him into starting the process to speed things up, but he didn't budge. Eventually I gave him the form and the money in exact change, he gave me the ticket, and I walked out at 9:57 PM with a confirmed ticket for Delhi. Phew.

Unittest objects available on DBus: part 2


This is a follow-up to my previous post. Basically I am learning things about the mock library and DBus as we remove technical debt, address our pain points and make the product stable.

Partial mock DBus to unittest Object exported over DBus

Last time I wrote about the pain of testing an object available on DBus. I got it reviewed by my teammates. As I paired with one of them, he agreed that with such a setup these tests are not unit tests but more like system or integration tests.

First thing, let me share how I am exporting Object on dbus:

#!/usr/bin/env python3

import dbus
import dbus.service
import json

INTERFACE_NAME = "org.example.ControlInterface"


class ControlObject(dbus.service.Object):
    def __init__(self, bus_name, conf):
        super().__init__(bus_name, "/org/example/ControlObject")
        self._conf = conf
        print("Started service")

    @dbus.service.method(INTERFACE_NAME,
                         in_signature='s', out_signature='b')
    def Update(self, conf):
        try:
            self._conf = json.loads(conf)
        except json.JSONDecodeError:
            print('Could not parse json')
            raise
        # out_signature='b' promises a boolean reply
        return True

    # component_id was undefined in the original listing; it is assumed
    # here to be a second string argument, hence in_signature='ss'
    @dbus.service.method(INTERFACE_NAME,
                         in_signature='ss',
                         out_signature='s')
    def Get(self, key, component_id):
        if key == '':
            # Python dicts accept '' as a key, so reject it explicitly
            raise KeyError(key)
        # KeyError/TypeError from a bad key propagate to the caller
        components = self._conf[key]
        # StopIteration propagates if no component has this id
        component = next(c for c in components if c['id'] == component_id)
        return json.dumps(component)

This Object takes a bus_name argument for initializing:

try:
    bus_name = dbus.service.BusName("org.example.ControlInterface",
				    bus=dbus.SystemBus(),
				    do_not_queue=True)
except dbus.exceptions.NameExistsException:
    logger.info("BusName is already used by some different service")
else:
    ControlObject(bus_name, {})

This way of setting things up tightly coupled my Object to the DBus setup: I have to pass this bus_name as an argument. As I was getting it reviewed by another colleague, he mentioned that I should be able to patch dbus and possibly get around the way I was setting up the system with DBus, exporting the object to it and then running the tests.

I had used partial mocking with mock's with construct before, so I put together the following test using it:

from unittest import mock
from unittest import TestCase
import json
from service import ControlObject


class TestService(TestCase):
    def setUp(self):
        with mock.patch('service.dbus.SystemBus') as MockBus:
            self.obj = ControlObject(MockBus, {})

    def tearDown(self):
        del self.obj

    def test_object_blank_state(self):
        self.assertFalse(self.obj._conf)

    def test_object_update_random_string(self):
        exception_string = 'Expecting value'
        with self.assertRaises(json.JSONDecodeError) as context:
            self.assertFalse(self.obj.Update(''))
        self.assertIn(exception_string, context.exception.msg)
        self.assertFalse(self.obj._conf)

        with self.assertRaises(json.JSONDecodeError) as context:
            self.assertFalse(self.obj.Update('random string'))
        self.assertIn(exception_string, context.exception.msg)
        self.assertFalse(self.obj._conf)

    def test_object_update(self):
        conf = {'id': 'id',
                'name': 'name',
                'station': 'station'}
        self.obj.Update(json.dumps(conf))

        self.assertTrue(self.obj._conf)
        for key in conf:
            self.assertIn(key, self.obj._conf)
            self.assertEqual(conf[key], self.obj._conf[key])

This test worked directly, and there was no need to set up a dbus-session on the docker image, run a process where I export the object, and call methods over DBus. Instead I can now directly access all attributes and methods of the Object and write better tests.

$ python -m unittest -v test_service.py
test_object_blank_state (test_service.TestService) ... ok
test_object_update (test_service.TestService) ... ok
test_object_update_random_string (test_service.TestService) ... Could not parse json
Could not parse json
ok

----------------------------------------------------------------------
Ran 3 tests in 0.002s

OK

Another way (better?) to export Object over DBus

While writing this post and exploring examples from dbus-python, I found another way to write a class that can be exported over DBus:

class ControlObject(dbus.service.Object):
    def __init__(self, conf, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._conf = conf
        print("Started service")

Now we no longer need to pass a claimed BusName as an argument to this class. Instead we can make this Object available on DBus by:

system_bus = dbus.SystemBus()
bus_name = dbus.service.BusName('org.example.ControlObject',
                                system_bus)
ControlObject({}, system_bus, '/org/example/ControlObject')

And when we don't want to use DBus and just want an instance of this Object, we can create one directly by calling ControlObject({}). With this way of initializing, we don't need to partially mock DBus and can write unittests directly.
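
To show why the pass-through constructor makes testing easier, here is a minimal sketch that runs without dbus-python installed. FakeExportable is a hypothetical stand-in that only mimics the fact that the bus arguments of the base class are optional; it is not part of dbus-python, and the real class would inherit from dbus.service.Object and keep its method decorators.

```python
import json
import unittest


class FakeExportable:
    """Hypothetical stand-in for the DBus base class: bus arguments are optional."""

    def __init__(self, conn=None, object_path=None):
        self.conn = conn
        self.object_path = object_path


class ControlObject(FakeExportable):
    def __init__(self, conf, *args, **kwargs):
        # Bus-related arguments, if any, are passed straight through.
        super().__init__(*args, **kwargs)
        self._conf = conf

    def Update(self, conf):
        self._conf = json.loads(conf)
        return True


class TestControlObject(unittest.TestCase):
    def test_update_without_bus(self):
        # No bus, no bus name, no mocking: just a plain object.
        obj = ControlObject({})
        self.assertIsNone(obj.conn)
        self.assertTrue(obj.Update(json.dumps({"name": "hub"})))
        self.assertEqual(obj._conf, {"name": "hub"})


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestControlObject)
result = unittest.TextTestRunner().run(suite)
```

The test never touches a bus, which is the property the second constructor style is meant to give us.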

Unittest object/interface available on DBus


There is a better way to do this.

This is in continuation of one of my old posts about writing unittests and mocking. In this post we will cover three points:

Write unittest for a DBus Object

Ideally an object exposed over DBus is a regular object which can be created, accessed and tested like any other normal class in Python. We had created our object/interface based on examples in dbus-python and from this blog post series. There, DBus is inherently coupled with the class, making it impossible to create an independent object. For example, I am using special decorators to expose methods over DBus. Because of this limitation, while writing unittests for this class we ran into a unique situation where the DBus service has to be running while we test.

Furthermore, DBus objects need an ever-running event loop over which they are made available. dbus-python uses the GLib main loop, so we need to figure out a way to start this event loop when we run the tests, make our object available over it, and then run the unittests against it. While looking for an answer, StackOverflow came to the rescue: I came across this thread, and one of the participants, whose answer/comment contributed to the final solution, says:

This is easy, not hard. And you MUST do it. Don't ever skimp on unit test just because someone tells you to, and it is really depressing that people give this kind of advice. Yes you should test your stuff without dbus, and yes you should test it with dbus.

The solution is to start an independent process, in which the DBus object is initialized and connected to an event loop running in that process, before running the unittests. Here is sample code, similar to the solution suggested on StackOverflow:

import os
import signal
import subprocess
import sys
import time
import unittest

import dbus
import dbus.service
from dbus.mainloop.glib import DBusGMainLoop
from gi.repository import GLib

# DemoService is the dbus.service.Object subclass under test (defined elsewhere)


class TestServices(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # we start eventloop and make our class available on it
        cls.p = subprocess.Popen(['python3', '-m', 'test_services', 'server'])
        # This was needed to wait for service to become available
        time.sleep(2)
        assert cls.p.stdout is None
        assert cls.p.stderr is None

    @classmethod
    def tearDownClass(cls):
        # This is needed to clean up event loop we started in
        # setUpClass
        os.kill(cls.p.pid, signal.SIGTERM)
        cls.p.wait()

    def setUp(self):
        bus = dbus.SessionBus()
        # keep the proxy on self so the test methods can reach it
        self.handler = bus.get_object("example.org",
                                      "/example/org/DemoService")

    def test_add_component_random_strings(self):
        success, message = self.handler.demo_method('random string')
        self.assertFalse(success)


if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == "server":
        # server mode: export the object and serve it until terminated
        DBusGMainLoop(set_as_default=True)
        loop = GLib.MainLoop()
        bus_name = dbus.service.BusName("example.org",
                                        bus=dbus.SessionBus(),
                                        do_not_queue=True)
        DemoService(bus_name)
        try:
            loop.run()
        except KeyboardInterrupt:
            pass
        loop.quit()
    else:
        unittest.main()
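
The fixed time.sleep(2) in setUpClass is a guess at how long the service takes to come up. One possible alternative, sketched here with only the standard library, is to poll a readiness check until it passes or a timeout expires. wait_until is a hypothetical helper, not part of unittest or dbus-python, and the readiness check below is a stand-in; in the test above the predicate might be something like lambda: bus.name_has_owner("example.org"), which I have not tried in this setup.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Call predicate repeatedly until it returns True or timeout seconds pass.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in readiness check for illustration: becomes true on the third call.
calls = {"count": 0}


def fake_service_ready():
    calls["count"] += 1
    return calls["count"] >= 3


ready = wait_until(fake_service_ready, timeout=2.0, interval=0.01)
print(ready, calls["count"])
```

Polling returns as soon as the service is ready and fails loudly on timeout, instead of always paying the full two seconds.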

Getting dbus running on Travis-CI

We have a Travis-CI and docker setup for running tests. When I tried to run the tests with docker, they failed with:

======================================================================
ERROR: test_add_component_random_strings (test_services.TestServices)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/app/test_services.py", line 135, in test_add_component_random_strings
    bus = dbus.SessionBus()
  File "/usr/lib/python3/dist-packages/dbus/_dbus.py", line 211, in __new__
    mainloop=mainloop)
  File "/usr/lib/python3/dist-packages/dbus/_dbus.py", line 100, in __new__
    bus = BusConnection.__new__(subclass, bus_type, mainloop=mainloop)
  File "/usr/lib/python3/dist-packages/dbus/bus.py", line 122, in __new__
    bus = cls._new_for_bus(address_or_type, mainloop=mainloop)
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NotSupported: Unable to autolaunch a dbus-daemon without a $DISPLAY for X11

We are exposing our object over SessionBus, so we have to start a DBus session bus on the docker image to run our unittests. I looked for other Python-based repositories on GitHub that use DBus and pass Travis-CI tests. I came across this really well-written project, pass_secret_service, and in its Makefile we found our solution:

bash -c 'dbus-run-session -- python3 -m unittest -v'

dbus-run-session starts a SessionBus on the docker image and that's exactly what we wanted. Apart from this solution, this project has an even better and cleaner way to unittest: a decorator which takes care of exporting the object over DBus. So far, I haven't been able to get that approach working for me. The project uses pydbus instead of dbus-python; ideally I should be able to shift to it, but I will have to try that.

Mock object which are accessible to DBus object

Generally, when we need to mock a behaviour, we can use the patch decorator from the mock library and set the relevant behaviour (the return_value or side_effect attribute). But given the peculiarity of the above setup, the DBus object runs in a different process from the tests. So mocking around the unittest won't work, because the DBus object won't have access to those mocked objects. To get around this, we need to mock things in the server process, just before we start the MainLoop and create the DemoService object:

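# mock.patch needs a dotted target, so a bare 'configparser' or 'requests'
# raises TypeError. The `services.` prefix below assumes DemoService lives
# in a module named `services` that imports both; adjust to the real module.
# lights_dict is the canned response, defined elsewhere.
with mock.patch('hue.Bridge') as MockBridge, \
        mock.patch('services.configparser') as mock_config, \
        mock.patch('services.requests') as mock_requests:
    MockBridge.return_value.get_light.return_value = lights_dict
    DBusGMainLoop(set_as_default=True)
    loop = GLib.MainLoop()
    bus_name = dbus.service.BusName("example.org",
                                    bus=dbus.SessionBus(),
                                    do_not_queue=True)
    DemoService(bus_name)
    try:
        loop.run()
    except KeyboardInterrupt:
        pass
    loop.quit()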
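
The point about process boundaries can be checked with a small standard-library experiment, unrelated to DBus. In this sketch json.dumps is patched in the parent process only; the child interpreter started with subprocess imports its own, unpatched copy. The choice of json.dumps as the patched function is arbitrary.

```python
import json
import subprocess
import sys
from unittest import mock

# Code the child process runs: it prints the result of the real json.dumps.
CHILD_CODE = "import json; print(json.dumps(1))"

with mock.patch("json.dumps", return_value="patched"):
    # Inside this process the patch is active.
    parent_result = json.dumps(1)
    # A freshly started interpreter does not see the parent's patch.
    child = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True, text=True, check=True,
    )
    child_result = child.stdout.strip()

print(parent_result)  # patched
print(child_result)   # 1
```

This is why the patches have to be applied inside the server process itself, as in the listing above, rather than around the test methods.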