Posts

REST - Representational State Transfer

Year 1996: Berners-Lee writes that the "Web's major goal was to be a shared information space through which people and machines could communicate."
Year 2000: Hypermedia was chosen as the user interface because of its simplicity and generality. Hypermedia, an extension of the term hypertext, is a non-linear medium of information that includes graphics, audio, video, plain text, and hyperlinks. A non-linear medium is any medium that can be navigated through random access.
The Internet grew rapidly, and the architecture deployed as a result had significant limitations in its support for extensibility, shared caching, and intermediaries, which made it difficult to develop ad-hoc solutions to the growing problems.
The challenge was to introduce a new set of functionality to an architecture that was already widely deployed, and to ensure that its introduction did not adversely impact, or even destroy, the architectural properties t…

Software Architecture

This post summarizes some chapters of the book 'Software Systems Architecture' by Eoin Woods and Nick Rozanski.
Software Architecture Definition: the software elements that you need to specify and/or design in order to meet a particular set of requirements, plus the hardware required to run those software elements.
Key parts of the definition
- Structure: the system's elements (pieces that can be constructed) and their relationships
  - Static structure: software classes, relational entities, network, hardware, etc.
  - Dynamic structure: system response to an external stimulus, such as information flow, parallel/serial execution of tasks, and effects on data (create, update, delete)
- Properties: fundamental properties of a system
  - Externally visible properties: functional behavior
  - Quality properties: scalability, performance, security, etc.
- Principles of its design and evolution
  - Fundamental beliefs, approach, or intent that guide the architecture
  - Conventions that make the system easily understood and allow extensions i…

Some simple questions that may need some thinking

In this blog post, I have put down some questions that may seem random but matter whenever you develop software. The answers depend on the context and require a lot of experience and knowledge to make a good judgment.

Where should I store media images for my web application?
- Inside the web application
- On a web server
- In a database
- In the cloud

Where should I write application log messages?
- In a local file
- Syslog
- RDBMS
- NoSQL database

Where should a client store an authentication token?
- Cookies
- Local Storage

What kind of authentication mechanism should I use for my web application?
- Stateless session tokens
- Session IDs

What should be the format of log file messages?
- Free-text string format
- Key-value pairs string format
- JSON

What should I use for notifying a service of some action?
- A message queue
- A table in a shared database

Can I use a single load balancer to handle the load of hundreds of application servers behind it?

For my new project which database…
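To make the log-format question concrete, here is a minimal Python sketch that renders one event in each of the three formats listed above. The event fields (`ts`, `level`, `user`, `ip`) and the function names are hypothetical, chosen only for illustration; they do not come from the post.

```python
import json

# A hypothetical log event; the field names are assumptions for illustration.
event = {"ts": "2024-01-01T00:00:00Z", "level": "INFO", "user": "alice", "ip": "10.0.0.1"}

def format_free_text(e):
    # Easy for humans to read, but parsing it later needs a custom pattern.
    return f"{e['ts']} {e['level']} user {e['user']} logged in from {e['ip']}"

def format_key_value(e):
    # Still readable, and can be split back into fields on spaces and '='
    # as long as the values themselves contain neither.
    return " ".join(f"{key}={value}" for key, value in e.items())

def format_json(e):
    # Machine-friendly: any JSON parser can recover the original fields.
    return json.dumps(e, sort_keys=True)

if __name__ == "__main__":
    for formatter in (format_free_text, format_key_value, format_json):
        print(formatter(event))
```

The trade-off is roughly readability against ease of parsing: free text reads best in a terminal, while JSON is the simplest to feed into a log aggregation tool.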

Big Data, Streaming Data - ETL Analytics Pipeline


Apache Hadoop Ecosystem

- Hadoop HDFS - 2007 - A distributed file system for reliably storing huge amounts of unstructured, semi-structured, and structured data in the form of files.
- Hadoop MapReduce - 2007 - A distributed framework for the parallel processing of large datasets on the HDFS filesystem. It runs on a Hadoop cluster but also supports other data stores such as Cassandra and HBase.
- Cassandra - 2008 - A key-value pair NoSQL database, with column family data representation and asynchronous masterless replication.
- HBase - 2008 - A key-value pair NoSQL database, with column family data representation and master-slave replication. It uses HDFS as its underlying storage.
- Zookeeper - 2008 - A distributed coordination service for distributed applications. It is based on a Paxos-like algorithm called Zab.
- Pig - 2009 - A scripting interface over MapReduce for developers who prefer scripting to native Java MapReduce programming.
- Hive - 2009 - Hive is a SQL interface over MapReduce for de…
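The MapReduce model mentioned in the list above can be illustrated with a word count. The following is a single-process Python sketch of the map, shuffle, and reduce steps, not Hadoop's actual API; the function names are hypothetical, and a real cluster would run the map and reduce phases in parallel across machines.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework would
    # do between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread over many workers, which is what makes the model suit large datasets.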

Big Data After The Internet

Until 1995, most people did not know about the Internet. It was hard to use until the Netscape browser arrived and its famous IPO happened. The arrival of Netscape meant anyone could create material and anyone with a connection could view it.
The Internet's popularity led to a mushrooming of websites like AOL, MSN, Yahoo, CNN, Napster, and many more. They provided free information-sharing services such as email, chat, photo sharing, video sharing, blogging, news, weather, music, and games. These sites were generating, collecting, and sharing an enormous amount of data for people all over the globe. There were, of course, new-generation e-commerce companies like Amazon and eBay that also contributed to the overall information available, but sharing information was not at the core of their strategy.
Why is this phenomenon of information sharing noteworthy? There are two good reasons:
The data on the Internet was freely available to everyone on the Internet. It was no l…

Big Data Before The Internet

The term ‘Big Data’ was used for the first time in a scientific journal published by NASA, back in 1997: “Visualisation provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data. When data sets do not fit in main memory (in core), or when they do not fit even on local disk, the most common solution is to acquire more resources.” [1][2] This was ‘Big Data’ in 1997, which is different from the ‘Big Data’ of current times, but the fundamental problem of our ability to scale remains the same. Conceptually, ‘Big Data’ is data that is beyond the storage and processing power of current systems. It is a moving target. The purpose of this post is to go over this particular aspect of data engineering, focusing on the big architectural improvements in data management systems.
The genesis of the Internet dates back as early as the 1960s - 1970s in t…