2016-01-03

The Road To Rethink

In the Beginning

Learning a new language has its ups and downs, but one thing always holds true: once you understand it, you feel accomplished. The languages I am referring to are formal languages, languages like Python. This is the language I chose to learn when I started my journey through computer science. A journey like this has no real end, as there is always something new to learn, whether it is new features in your language of choice or an entirely new language altogether. That realization came to me while I was working on a project whose focus was analyzing data gathered from Twitter.

To analyze data you have to store it somewhere, and this is where databases come into play. There are many types of databases, and the most common one to start with in Python is SQLite. Python's standard library ships with the sqlite3 module, which makes SQLite readily available to all Python programmers. SQLite uses the SQL query language to perform what are known as the CRUD operations: Create, Read, Update, and Delete. They are essential when working with any database. Once you start to research other databases, you will notice that many of them use a query language that resembles or borrows from SQL. Even when it is not SQL per se, the ideas are fairly similar, because most databases need to support the same basic operations. How they store data is different, however, and that is why I moved on to a new database.
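To make the four CRUD operations concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name tweets, its columns, and the sample values are made up for illustration and are not taken from my project; the database lives in memory so nothing is written to disk.

```python
import sqlite3


def crud_demo():
    """Run one Create, Read, Update, and Delete against an in-memory database."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing saved to disk
    try:
        # Hypothetical table layout, for illustration only
        conn.execute(
            "CREATE TABLE tweets (id INTEGER PRIMARY KEY, user TEXT, text TEXT)"
        )

        # Create: add one row
        conn.execute(
            "INSERT INTO tweets (user, text) VALUES (?, ?)",
            ("example_user", "hello world"),
        )

        # Read: fetch every row back
        created = conn.execute("SELECT user, text FROM tweets").fetchall()

        # Update: change the text of that row
        conn.execute(
            "UPDATE tweets SET text = ? WHERE user = ?",
            ("hello again", "example_user"),
        )
        updated = conn.execute(
            "SELECT text FROM tweets WHERE user = ?", ("example_user",)
        ).fetchone()[0]

        # Delete: remove the row and count what is left
        conn.execute("DELETE FROM tweets WHERE user = ?", ("example_user",))
        remaining = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]

        return created, updated, remaining
    finally:
        conn.close()
```

Calling crud_demo() returns the rows read after the insert, the text after the update, and the row count after the delete, which is an easy way to check each step did what it claims.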

Why RethinkDB?

That database was RethinkDB. It is open source and is what is referred to as a NoSQL database; more specifically, it is a document store, meaning it stores data as JSON (JavaScript Object Notation) documents. Storing the data as JSON makes it easier to retrieve: in Python, a document comes back as a dictionary, so you can use ordinary dictionary syntax to get at the information. When it came time to incorporate this into my program I had a bit of a hard time, as RethinkDB was different to set up and connect to. I wound up having to write a function (EX.1) that connects to the database and makes sure the database and table exist, and if they do not, creates them.

EX.1

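# Import style of the driver at the time of writing; newer driver releases
# may instead need: from rethinkdb import RethinkDB; r = RethinkDB()
import rethinkdb as r
from rethinkdb.errors import RqlRuntimeError


def database_connect():
    '''Creates the database and the table if they don't already exist'''
    db_name = 'test'  # Enter name of your database
    table_name = 'chat_test_1'  # Enter name of your table
    conn = r.connect('localhost', 28015)
    try:
        try:
            # Fresh setup: create the database, then the table inside it
            r.db_create(db_name).run(conn)
            r.db(db_name).table_create(table_name).run(conn)
            print("Database setup completed!")
        except RqlRuntimeError:
            # db_create failed because the database exists; still try the table
            r.db(db_name).table_create(table_name).run(conn)
            print("Database already exists!")
    except RqlRuntimeError:
        # table_create failed too, so both were already in place
        print("Database and table exist!")
    finally:
        conn.close()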

However, as far as inserting information was concerned, I had to change very little code, and I was also able to save all of the data that Twitter lets you see when grabbing tweets. I was able to do this because Twitter sends the information in JSON format. RethinkDB, as I’ve already mentioned, stores its data as JSON documents, and since the data already arrived as JSON, all that was needed was a simple insert command. When storing information in the SQLite database, on the other hand, I had to create columns in tables and break all the information up so that each piece went to the correct column. In short, it was much more tedious to store Twitter data in SQLite than in RethinkDB. RethinkDB uses a query language its creators call ReQL. ReQL queries are built from chained method calls in Python rather than SQL strings, but the operations map closely onto the SQL ones I already knew. So changing from SQLite to RethinkDB was even easier, knowing that the concepts carried over. That was a big plus in my eyes.

One thing RethinkDB does that I have not yet included in my program is real-time updates. For example, if you are following a specific hashtag or a specific user, you could use what the creators call a changefeed (RethinkDB, 2015). A changefeed pushes each change made to a table to your application as it happens, so in this case you would see the new tweet as soon as it is stored. This is very useful for those who want to perform real-time analytics on their data sets.
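As a rough sketch of the insert and the changefeed described above, something like the following could work. The function names store_tweet and follow_new_tweets are my own invention for this example, and the database and table names are the ones from EX.1. The driver handle r and an open connection are passed in as arguments, since how you import the driver depends on which version you have installed.

```python
def store_tweet(r, conn, tweet, db_name="test", table_name="chat_test_1"):
    """Insert one tweet (a dict parsed from Twitter's JSON) as a single document.

    `r` is the RethinkDB driver handle and `conn` an open connection;
    both are supplied by the caller. Returns how many documents were inserted.
    """
    result = r.db(db_name).table(table_name).insert(tweet).run(conn)
    return result["inserted"]


def follow_new_tweets(r, conn, db_name="test", table_name="chat_test_1"):
    """Yield each newly inserted document as the table's changefeed reports it."""
    feed = r.db(db_name).table(table_name).changes().run(conn)
    for change in feed:
        # Each change carries 'old_val' and 'new_val'.
        # A brand-new document has no old value, so updates and deletes are skipped.
        if change.get("old_val") is None and change.get("new_val") is not None:
            yield change["new_val"]
```

With a server running, you would call store_tweet(r, conn, tweet) once per incoming tweet, and loop over follow_new_tweets(r, conn) in a separate process or thread with its own connection, because the changefeed loop blocks while it waits for new changes.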

Conclusion

So, all in all, you can see why going with RethinkDB was the better choice for this project. SQLite is good to get you started if you have never worked with databases before, but picking up RethinkDB is not that difficult. It also gives you more room for growth, because you can extend your application so that it performs real-time analysis, or updates your site, whenever a change is detected in your database. As for my project, I know I have room to grow, and I fully intend to use everything RethinkDB has to offer.

Works Cited

RethinkDB. (2015). Changefeeds in RethinkDB. Retrieved from https://rethinkdb.com/docs/changefeeds/python/