Semi-Structured Data
Semi-structured data lies between structured and unstructured data. Structured data is data stored in a traditional database system or an Excel sheet, organized in columns and rows. Unstructured data is any data or piece of information that has no such row-and-column model, so it does not fit naturally into the tables of a database/RDBMS. Emails, Facebook comments, and newspaper articles are examples of unstructured data.
Semi-structured data does not follow a strict data model, yet it is neither raw data nor typed data as in a traditional database system. To represent information as semi-structured data, a defined format has to be followed. We can use JSON (JavaScript Object Notation) or XML for this, and the same formats can be used to transport the data over the wire. A parser for the chosen format is required at the data consumer's end to retrieve the desired data from JSON or XML.
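As a minimal sketch of the consumer-side parsing described above, the following Python snippet reads the same record from a JSON string and from an XML string using the standard-library json and xml.etree.ElementTree parsers. The payloads and their field names (CompanyName, Location) are illustrative assumptions, not output from any real service.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads a data consumer might receive over the wire.
json_payload = '{"CompanyName": "ABCD", "Location": "Bangalore"}'
xml_payload = (
    "<Company>"
    "<CompanyName>ABCD</CompanyName>"
    "<Location>Bangalore</Location>"
    "</Company>"
)

# JSON parser: the text becomes a Python dict.
company_from_json = json.loads(json_payload)

# XML parser: the text becomes an element tree; read each child element by tag.
root = ET.fromstring(xml_payload)
company_from_xml = {child.tag: child.text for child in root}

print(company_from_json["Location"])
print(company_from_xml["Location"])
```

In both cases the consumer works with an ordinary dictionary only after a format-specific parser has processed the text.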
JSON is lightweight and efficient compared to XML and is easily human-readable, but a traditional relational database system was not designed to store or query it natively. NoSQL databases such as HBase, MongoDB, and Cassandra, and storage layers such as the Hadoop Distributed File System (HDFS), can be leveraged to store, query, and analyze it. In a typical client-server web application, the JSON format is widely used for bi-directional data interchange.
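To make the "lightweight" comparison concrete, here is a rough sketch that serializes one small record as compact JSON and as element-per-field XML, then compares the lengths. The record is an illustrative assumption, and the exact difference depends on the data and on how the XML is designed (attributes versus elements), so the numbers are indicative only.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used only for the size comparison.
record = {"CompanyName": "ABCD", "EmployNo": "150", "Location": "Bangalore"}

# Compact JSON encoding of the record.
json_text = json.dumps(record, separators=(",", ":"))

# XML encoding of the same record, one child element per field.
root = ET.Element("Company")
for key, value in record.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

print(len(json_text), len(xml_text))
```

The XML version repeats every field name in an opening and a closing tag, which is where most of the extra characters come from in this example.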
Here is a sample of unstructured data: "The two companies named ABCD and EFGH are located in Bangalore and Chennai respectively. ABCD is a pharmaceutical company and has 150 employees. It is a medical drugs supplier and is associated with HDFC Bank for all business transactions. Company EFGH manufactures PVC pipes, has 300 employees, and does its financial transactions with State Bank Of India." The above information can be transformed into semi-structured data using the JSON format. It is also possible to persist it in a NoSQL database and to transmit it over the wire as a REST service request/response.
[
  {
    "CompanyName": "ABCD",
    "Description": "pharmaceutical company",
    "Type": "Medical drugs supplier",
    "EmployNo": "150",
    "BusineesBank": "HDFC Bank",
    "Location": "Bangalore"
  },
  {
    "CompanyName": "EFGH",
    "Description": "Manufacturing company",
    "Type": "PVC Pipes",
    "EmployNo": "300",
    "BusineesBank": "State Bank Of India",
    "Location": "Chennai"
  }
]
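Once the information is in JSON form, a program can query it directly. The sketch below, using only Python's standard json module, loads the two company records shown above and answers two simple questions. The variable names are illustrative, and the key names are copied as they appear in the sample.

```python
import json

# The sample company records from above, as a JSON document.
companies_json = """
[
  {"CompanyName": "ABCD", "Description": "pharmaceutical company",
   "Type": "Medical drugs supplier", "EmployNo": "150",
   "BusineesBank": "HDFC Bank", "Location": "Bangalore"},
  {"CompanyName": "EFGH", "Description": "Manufacturing company",
   "Type": "PVC Pipes", "EmployNo": "300",
   "BusineesBank": "State Bank Of India", "Location": "Chennai"}
]
"""

companies = json.loads(companies_json)

# Which companies are located in Chennai?
in_chennai = [c["CompanyName"] for c in companies if c["Location"] == "Chennai"]

# EmployNo is stored as a string in the sample, so convert before summing.
total_employees = sum(int(c["EmployNo"]) for c in companies)

print(in_chennai)
print(total_employees)
```

Neither question could be answered this easily from the original free-text paragraph, which is the practical benefit of the semi-structured form.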
The Facebook Graph API provides semi-structured data in JSON format when we query a specific node using the GET method of its REST service.
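A hedged sketch of such a GET request in Python, using only the standard library, might look like the following. The helper names build_node_url and fetch_node are hypothetical, the access token is a placeholder, and the versioned URL path that the Graph API also accepts is omitted for brevity. Only the URL is built here; actually calling fetch_node requires network access and a valid token.

```python
import json
import urllib.parse
import urllib.request

GRAPH_BASE_URL = "https://graph.facebook.com"


def build_node_url(node_id, fields, access_token):
    """Build the GET URL for a Graph API node (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"fields": ",".join(fields), "access_token": access_token}
    )
    return f"{GRAPH_BASE_URL}/{node_id}?{query}"


def fetch_node(node_id, fields, access_token):
    """Send the GET request and parse the JSON body into a dict.

    Not called in this sketch: it needs network access and a real token.
    """
    url = build_node_url(node_id, fields, access_token)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# "me" refers to the user who owns the token; the token below is a placeholder.
url = build_node_url("me", ["id", "name"], "YOUR_ACCESS_TOKEN")
print(url)
```

The response body is itself a JSON object, so the consumer handles it with the same parser used for any other JSON document.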