DBMS-NoSQL Database-5.3-NoSQL Database- Structured Data vs Semi-Structured Data-SPPU

Main Memory Databases


A main memory database system (MMDBMS) is a DBMS that primarily relies on main memory for data storage. In contrast, conventional database management systems typically employ disk-based persistent storage.

Advantages


The main advantage of an MMDBMS over conventional DBMS technology is superior performance, because disk I/O cost is no longer a significant performance factor. With I/O eliminated as the main optimization focus, the architecture of main memory database systems typically aims at optimizing CPU cost and CPU cache usage. This leads to different data layout strategies (avoiding complex tuple representations) as well as different indexing structures (e.g., B-trees with lower fan-out, whose nodes span one or a few CPU cache lines).
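The following sketch puts rough numbers on the cache-line point by computing the maximum fan-out of a B-tree node of a given size. It is a simplified model that rests on several assumptions: 64-byte cache lines, 8-byte keys, 8-byte child pointers, and no node header. Real systems lay out their nodes differently.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86-64 cache line size
KEY_BYTES = 8          # assumption: 64-bit integer keys
POINTER_BYTES = 8      # assumption: 64-bit child pointers


def max_fanout(node_bytes: int, key_bytes: int = KEY_BYTES, ptr_bytes: int = POINTER_BYTES) -> int:
    """Largest fan-out f such that f child pointers and f - 1 keys fit in one node.

    Solves f * ptr_bytes + (f - 1) * key_bytes <= node_bytes for integer f.
    Node headers and padding are ignored in this simplified model.
    """
    return (node_bytes + key_bytes) // (ptr_bytes + key_bytes)


if __name__ == "__main__":
    for lines in (1, 4):
        size = lines * CACHE_LINE_BYTES
        print(f"{lines} cache line(s) = {size} B -> fan-out {max_fanout(size)}")
    print(f"4 KiB disk page -> fan-out {max_fanout(4096)}")
```

Under these assumptions, a one-cache-line node has a fan-out of 4 and a four-line node has 16, compared with 256 for a 4 KiB disk page. The tree becomes deeper, but each node visited costs only one or a few cache-line fetches instead of a page-sized read.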

Although they are built on top of volatile storage, most MMDB products offer ACID properties via the following mechanisms: (i) Transaction Logging, which records changes to the database in a journal file and facilitates automatic recovery of an in-memory...
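Below is a minimal sketch of the transaction-logging idea for a simple key-value store. The `LoggedStore` class is hypothetical and works as follows:

- It keeps its data in a Python dict.
- It appends each change to a journal file, flushed to disk with `os.fsync`, before applying the change in memory.
- After a restart, it rebuilds its state by replaying the journal.

Each `put` or `delete` is treated as its own committed transaction. Real MMDB products add multi-operation transactions, checkpointing and log truncation.

```python
import json
import os
import tempfile


class LoggedStore:
    """Hypothetical in-memory key-value store protected by a write-ahead journal."""

    def __init__(self, journal_path: str) -> None:
        self.journal_path = journal_path
        self.data: dict[str, object] = {}
        self._recover()

    def _recover(self) -> None:
        # Rebuild the in-memory state by replaying every journaled change in order.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as journal:
            for line in journal:
                entry = json.loads(line)
                if entry["op"] == "put":
                    self.data[entry["key"]] = entry["value"]
                else:  # "delete"
                    self.data.pop(entry["key"], None)

    def _log(self, entry: dict) -> None:
        # Append the change and force it to disk before touching memory,
        # so a logged change survives a crash of the process.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(entry) + "\n")
            journal.flush()
            os.fsync(journal.fileno())

    def put(self, key: str, value: object) -> None:
        self._log({"op": "put", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key: str) -> None:
        self._log({"op": "delete", "key": key})
        self.data.pop(key, None)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "store.journal")
    store = LoggedStore(path)
    store.put("alice", 100)
    store.put("bob", 50)
    store.delete("bob")
    # Simulate a restart: drop the in-memory object and recover from the journal.
    del store
    recovered = LoggedStore(path)
    print(recovered.data)  # {'alice': 100}
```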


ACID Properties of a Database


•Atomicity − This property states that a transaction must be treated as an atomic unit; that is, either all of its operations are executed or none are. There must be no state in the database where a transaction is left partially completed. The database state should be defined either before the transaction executes or after it completes, aborts, or fails.

•Consistency − The database must remain in a consistent state after any transaction. No transaction should have an adverse effect on the data residing in the database: if the database was consistent before a transaction executed, it must remain consistent after the transaction as well.

•Durability − The database should be durable enough to retain all its latest updates even if the system fails or restarts. If a transaction updates a chunk of data and commits, the database will hold the modified data. If a transaction commits but the system fails before the data can be written to disk, the data will be updated once the system comes back up (for example, by replaying the transaction log).

•Isolation − In a database system where more than one transaction is executed concurrently, the property of isolation states that every transaction is carried out as if it were the only transaction in the system. No transaction should see, or be affected by, the intermediate uncommitted changes of another transaction.
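Atomicity and consistency can be observed with Python's built-in `sqlite3` module. In this sketch, the `accounts` table, the `make_bank`, `transfer` and `balances` helpers, and the amounts are all hypothetical. The `with conn:` block commits if both updates succeed and rolls back if either one raises.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint encodes a consistency rule: balances may never go negative.
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction; True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            # Credit first, so a failing debit shows the credit being undone too.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed and the whole transaction was rolled back


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    conn = make_bank()
    print(transfer(conn, 1, 2, 30), balances(conn))   # True {1: 70, 2: 80}
    print(transfer(conn, 1, 2, 500), balances(conn))  # False {1: 70, 2: 80}
```

Here is what happens in the second transfer:

- The debit would drive account 1 negative, so the CHECK constraint fails.
- The credit already applied to account 2 is undone as well.

The transaction therefore takes effect completely or not at all.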



Structured Data


What is Structured Data?

Structured data is information that is formatted and stored according to a well-defined data model. The raw data is mapped into predesigned fields that can then be easily extracted and read using SQL.

SQL relational databases, consisting of tables with rows and columns, are the perfect example of structured data.

Structured data is more interdependent and less flexible: every record must fit the predefined schema, so adding a new kind of field means changing the schema first.
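The following sketch uses Python's `sqlite3` to show how a predefined schema shapes structured data. The `students` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema fixes the fields up front: every row has exactly these columns.
conn.execute(
    "CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT)"
)
conn.execute("INSERT INTO students (roll_no, name, dept) VALUES (1, 'Asha', 'Comp')")

# Because the fields are predefined, plain SQL can extract them directly.
print(conn.execute("SELECT name, dept FROM students WHERE roll_no = 1").fetchone())

# A field that is not in the schema is rejected ...
try:
    conn.execute(
        "INSERT INTO students (roll_no, name, email) VALUES (2, 'Ravi', 'ravi@example.com')"
    )
except sqlite3.OperationalError as err:
    print("rejected:", err)

# ... until the table definition itself is changed.
conn.execute("ALTER TABLE students ADD COLUMN email TEXT")
conn.execute(
    "INSERT INTO students (roll_no, name, email) VALUES (2, 'Ravi', 'ravi@example.com')"
)
```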



Semi-Structured Data


Semi-structured data is data that does not conform to a formal data model but still has some structure. It lacks a fixed or rigid schema. It does not reside in a relational database, but it has organizational properties that make it easier to analyze. With some processing, it can be stored in a relational database.

Characteristics of semi-structured data:

•Data does not conform to a data model but has some structure.

•Data cannot be stored in the form of rows and columns as in relational databases.

•Semi-structured data contains tags and elements (metadata) that are used to group the data and describe how it is stored.

•Similar entities are grouped together and organized in a hierarchy
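The characteristics above can be seen in a few JSON records. The records and field names are hypothetical:

- Each value is labelled by its key, which acts as the tag.
- Records of the same type carry different attributes.
- An order nests its items one level down.

The code then groups similar entities using the type tag that each record carries.

```python
import json

# Hypothetical records: every value is labelled by its key (the "tag"),
# but the records do not share one fixed set of fields.
raw = """
[
  {"type": "customer", "id": 1, "name": "Asha", "email": "asha@example.com"},
  {"type": "customer", "id": 2, "name": "Ravi", "phones": ["98200", "98201"]},
  {"type": "order", "id": 10, "customer_id": 1,
   "items": [{"sku": "pen", "qty": 3}, {"sku": "book", "qty": 1}]}
]
"""
records = json.loads(raw)

# Group similar entities together using the metadata carried inside each record.
groups: dict[str, list[dict]] = {}
for rec in records:
    groups.setdefault(rec["type"], []).append(rec)

for kind, members in groups.items():
    for rec in members:
        print(kind, rec["id"], sorted(rec.keys()))
```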


Evolution of Semi-Structured Data


The digitization of almost everything we interact with, along with the growing number of transactions, has resulted in a massive amount of data. The tremendous increase in the speed at which digital information is generated has led global data volumes to double in very short time intervals. According to Gartner, around 80% of the data held by organizations is unstructured or semi-structured, comprising data from emails, social media feeds and customer calls.

This is in addition to information logged by user devices. It has become increasingly difficult to make proper sense of this unstructured data.


Characteristics


•Data has some structure which, however, does not conform to the structure of a data model.

•Similar entities are grouped together, and these groups are organized into a hierarchy.

•It cannot be stored as the rows and columns of a table, the way data in a relational database is.

•Semi-structured data carries metadata (elements and tags) that helps group it and describe how it is stored.

•Entities in the same group may or may not have the same properties and attributes; in practice, their attributes often differ.

•Semi-structured data is hard to manage or automate because its metadata is insufficient, and hence it cannot simply be put into a table with rows and columns.

•Programming against such data is difficult because it lacks a sufficiently defined structure.
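To see why such data does not fit naturally into rows and columns, the sketch below forces three hypothetical records into a table. It takes the union of all their keys as the column set and fills every missing field with `None`.

```python
# Hypothetical semi-structured records describing the same kind of entity.
records = [
    {"id": 1, "name": "Asha", "email": "asha@example.com"},
    {"id": 2, "name": "Ravi", "phones": ["98200", "98201"]},
    {"id": 3, "name": "Meera", "address": {"city": "Pune", "pin": "411001"}},
]

# Forcing them into rows and columns needs the union of all keys as the column set.
columns = sorted({key for rec in records for key in rec})
rows = [[rec.get(col) for col in columns] for rec in records]

# Count the cells that had no value in the original record.
empty = sum(cell is None for row in rows for cell in row)
print(columns)
print(f"{empty} of {len(columns) * len(rows)} cells are empty")
for row in rows:
    print(row)
```

The result has two problems:

- Six of the fifteen cells end up empty.
- The `phones` list and the `address` map are still not single atomic values, so a relational design would need extra tables to hold them.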


Examples/Sources


•Sources of semi-structured Data:

• E-mails

• XML and other markup languages

• Binary executables

• TCP/IP packets

• Zipped files

• Integration of data from different sources

• Web pages


Advantages


•The data is not constrained by a fixed schema

•Flexible, i.e., the schema can be easily changed.

•Data is portable

•It is possible to view structured data as semi-structured data

•It supports users who cannot express their needs in SQL

•It can deal easily with the heterogeneity of sources
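The following sketch illustrates two of the advantages above: viewing structured data as semi-structured data, and changing the schema freely. The `products` table and its contents are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be read by column name
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [(1, "Pen", 10.0), (2, "Book", 250.0)])

# Viewing structured data as semi-structured: each row becomes a self-describing document.
docs = [dict(row) for row in conn.execute("SELECT * FROM products ORDER BY id")]

# No schema change is needed to give one document an extra, nested field.
docs[1]["authors"] = ["A. Writer"]

print(json.dumps(docs, indent=2))
```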


Disadvantages


•The lack of a fixed, rigid schema makes the data difficult to store.

•Interpreting the relationships between data items is difficult because there is no separation between the schema and the data.

•Queries are less efficient as compared to structured data.



Nested Data Types


Nested data types are structured data types for some common data patterns. Nested data types support structs, arrays, and maps.

A struct is similar to a relational table: it groups object properties together.

As a general definition, nested data exists whenever multiple records are sampled from a single record. The data then consists of two levels, with header information populated at Level 1 and details or itemized information populated at Level 2.
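Python has rough analogues of these nested types: a dataclass for a struct, a `list` for an array and a `dict` for a map. The order-and-line-items example below is hypothetical. `Order` holds the Level 1 header, and each `LineItem` is a Level 2 detail. The `flatten` helper shows how a flat table would repeat the header on every detail row.

```python
from dataclasses import dataclass, field


@dataclass
class Address:       # struct: groups related properties together
    city: str
    pin: str


@dataclass
class LineItem:      # Level 2: detail / itemized information
    sku: str
    qty: int
    unit_price: float


@dataclass
class Order:         # Level 1: header information
    order_id: int
    customer: str
    ship_to: Address                                     # nested struct
    items: list[LineItem] = field(default_factory=list)  # array of structs
    tags: dict[str, str] = field(default_factory=dict)   # map

    def total(self) -> float:
        return sum(item.qty * item.unit_price for item in self.items)


def flatten(order: Order) -> list[tuple]:
    """Repeat the Level 1 header on every Level 2 row, as a flat table would."""
    return [(order.order_id, order.customer, i.sku, i.qty) for i in order.items]


order = Order(
    order_id=101,
    customer="Asha",
    ship_to=Address(city="Pune", pin="411001"),
    items=[LineItem("pen", 3, 10.0), LineItem("book", 1, 250.0)],
    tags={"channel": "web"},
)
print(order.total())
print(flatten(order))
```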


Nested Data Types – XML


•XML stands for eXtensible Markup Language

•XML is a markup language much like HTML

•XML was designed to store and transport data

•XML was designed to be self-descriptive

•XML is a W3C (World Wide Web Consortium) Recommendation


Difference between HTML and XML


XML and HTML were designed with different goals:

•XML was designed to carry data - with focus on what data is

•HTML was designed to display data - with focus on how data looks

•XML tags are not predefined like HTML tags are


Advantages of XML


•It simplifies data sharing

•It simplifies data transport

•It simplifies platform changes

•It simplifies data availability

Many computer systems contain data in incompatible formats. Exchanging data between incompatible systems (or upgraded systems) is a time-consuming task for web developers. Large amounts of data must be converted, and incompatible data is often lost.

XML stores data in plain text format. This provides a software- and hardware-independent way of storing, transporting, and sharing data.

XML also makes it easier to expand or upgrade to new operating systems, new applications, or new browsers, without losing data.

With XML, data can be available to all kinds of "reading machines" like people, computers, voice machines, news feeds, etc.


Examples of XML data
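Here is a small, hypothetical XML document, read with Python's standard `xml.etree.ElementTree` module. The tag names (`bookstore`, `book`, `title`, `price`) are invented by whoever designs the data rather than predefined as in HTML. The `category` and `lang` attributes add metadata to the elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the tag names are chosen by the document's designer.
XML_TEXT = """
<bookstore>
  <book category="database">
    <title lang="en">NoSQL Basics</title>
    <price>450.00</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <price>320.50</price>
  </book>
</bookstore>
"""


def parse_books(xml_text: str) -> list[tuple[str, str, float]]:
    """Return (category, title, price) for every <book> element."""
    root = ET.fromstring(xml_text.strip())
    books = []
    for book in root.findall("book"):    # direct <book> children of <bookstore>
        category = book.get("category")  # attribute value
        title = book.findtext("title")   # text content of the child element
        price = float(book.findtext("price"))
        books.append((category, title, price))
    return books


for category, title, price in parse_books(XML_TEXT):
    print(f"{category:10} {title:15} {price:8.2f}")
```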


Nested Data Types – JSON


▪JSON stands for JavaScript Object Notation

▪JSON is a lightweight format for storing and transporting data

▪JSON is often used when data is sent from a server to a web page

▪JSON is "self-describing" and easy to understand


JSON Syntax


•Data is in name/value pairs

•Data is separated by commas

•Curly braces hold objects

•Square brackets hold arrays


JSON syntax is based on the JavaScript syntax for creating objects (object literals). Because of this similarity, a JavaScript program can easily convert JSON data into native JavaScript objects.
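The same conversion is available outside JavaScript. The sketch below uses Python's standard `json` module with hypothetical data. A JSON object becomes a `dict` and an array becomes a `list`. The literals `false` and `null` become `False` and `None`.

```python
import json

# Hypothetical JSON text following the syntax rules above.
text = '{"name": "John", "age": 30, "married": false, "car": null, "langs": ["en", "hi"]}'

person = json.loads(text)  # object -> dict, array -> list, false -> False, null -> None
print(type(person).__name__, type(person["langs"]).__name__, person["married"], person["car"])
print(person["langs"][0], person["age"] + 1)

# The reverse direction: native objects back to a JSON string.
print(json.dumps(person))
```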


JSON Data


JSON Objects

JSON objects are written inside curly braces.

Just like in JavaScript, objects can contain multiple name/value pairs:

{"name":"John", "age":30, "car":null}

JSON Arrays

JSON arrays are written inside square brackets.

Just like in JavaScript, an array can contain objects:

{"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}
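The sketch below reads the `employees` array with Python's `json` module. The array becomes a list of dictionaries, which can be processed like any other Python data.

```python
import json

doc = json.loads("""
{"employees": [
    {"firstName": "John", "lastName": "Doe"},
    {"firstName": "Anna", "lastName": "Smith"},
    {"firstName": "Peter", "lastName": "Jones"}
]}
""")

# The JSON array becomes a Python list, and each object in it a dict.
for employee in doc["employees"]:
    print(employee["lastName"], employee["firstName"])

full_names = [f'{e["firstName"]} {e["lastName"]}' for e in doc["employees"]]
print(full_names)
```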


JSON Data Examples


Advantages of JSON


•Less Verbose: JSON has a more compact style than XML, and it is often more readable. The lightweight approach of JSON can make significant improvements in RESTful APIs working with complex systems.

•Faster: Parsing XML can take a long time, partly because DOM manipulation libraries need more memory to handle large XML files. JSON uses less data overall, which reduces cost and increases parsing speed.

•Readable: The JSON structure is straightforward and readable, which makes it easier to map to domain objects in whatever programming language you are working with.

•Structure Matches the Data: JSON is built from key/value maps (objects) and arrays rather than XML's tree of elements. In some situations key/value pairs can limit what you can do, but you get a predictable and easy-to-understand data model.

•Objects Align in Code: JSON objects map directly onto objects in code, which is beneficial when quickly creating domain objects in dynamic languages.
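A rough way to check the "Less Verbose" point is to serialize the same hypothetical records both ways with Python's standard library and compare their lengths. This comparison says nothing about parsing speed. The size gap also depends on the data and formatting chosen; here, neither output is indented.

```python
import json
import xml.etree.ElementTree as ET

employees = [
    {"firstName": "John", "lastName": "Doe"},
    {"firstName": "Anna", "lastName": "Smith"},
    {"firstName": "Peter", "lastName": "Jones"},
]


def to_json(rows: list[dict]) -> str:
    # Compact separators: no spaces after "," or ":".
    return json.dumps({"employees": rows}, separators=(",", ":"))


def to_xml(rows: list[dict]) -> str:
    # Every value needs both an opening and a closing tag.
    root = ET.Element("employees")
    for row in rows:
        emp = ET.SubElement(root, "employee")
        for key, value in row.items():
            ET.SubElement(emp, key).text = value
    return ET.tostring(root, encoding="unicode")


json_text, xml_text = to_json(employees), to_xml(employees)
print(len(json_text), json_text)
print(len(xml_text), xml_text)
```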


Semantic Databases


• Semantics is the study of meaning

• It focuses on the relationship between:

• Signifiers: words, phrases, signs and symbols

• Denotation: what they stand for


• A semantic database is typically used in conjunction with a semantic data model

• By exposing the semantics of the data, machines can use the information in more interesting ways than just storing or displaying it

◦ Semantics are useful for understanding the structure of a "thing"; however, we need ontologies to relate things to other things. Therefore, one would expect a semantic database to be relational: it should be able to relate structured data into ontologies.




In a semantic database (going back to the very early definition of semantics), the schema:

◦ describes denotations

◦ describes relationships between denotations

The job of the database, then, is to associate signifiers (values) with those denotations. Therefore:

◦ Structure resolves to concrete properties to which instance values can be associated
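This minimal sketch illustrates the division of labour described above. It assumes data is stored as subject, predicate and object triples. That is a common representation for semantic data, as in RDF, but the text does not prescribe it. The concepts, relationships and class names are hypothetical. The schema lists the denotations and the relationships allowed between them, and the store only accepts values it can attach to those denotations.

```python
from dataclasses import dataclass

# Schema (hypothetical): the denotations, i.e. the concepts the data can refer to ...
CONCEPTS = {"Person", "City"}
# ... and the relationships between them: property -> (domain concept, range).
# A range that is a concept name links two entities; a Python type means a plain value.
RELATIONSHIPS = {
    "name": ("Person", str),
    "livesIn": ("Person", "City"),
    "cityName": ("City", str),
}


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: object


class SemanticStore:
    """Toy triple store that only accepts facts the schema can interpret."""

    def __init__(self) -> None:
        self.types: dict[str, str] = {}  # entity id -> concept it denotes
        self.triples: list[Triple] = []

    def declare(self, entity: str, concept: str) -> None:
        if concept not in CONCEPTS:
            raise ValueError(f"unknown concept {concept!r}")
        self.types[entity] = concept

    def add(self, subject: str, predicate: str, obj: object) -> None:
        domain, rng = RELATIONSHIPS[predicate]  # KeyError for undefined relationships
        if self.types.get(subject) != domain:
            raise ValueError(f"{subject!r} is not a {domain}")
        if isinstance(rng, str):  # relationship between two denotations
            if self.types.get(obj) != rng:
                raise ValueError(f"{obj!r} is not a {rng}")
        elif not isinstance(obj, rng):  # signifier (plain value) of the wrong kind
            raise TypeError(f"{predicate} expects a {rng.__name__} value")
        self.triples.append(Triple(subject, predicate, obj))

    def query(self, subject=None, predicate=None) -> list:
        return [t.obj for t in self.triples
                if (subject is None or t.subject == subject)
                and (predicate is None or t.predicate == predicate)]


store = SemanticStore()
store.declare("p1", "Person")
store.declare("c1", "City")
store.add("p1", "name", "Asha")
store.add("c1", "cityName", "Pune")
store.add("p1", "livesIn", "c1")
home = store.query("p1", "livesIn")[0]
print(store.query(home, "cityName"))  # ['Pune']
```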


Semantic Data Model


A semantic data model (SDM) is a high-level, semantics-based database description and structuring formalism (database model) for databases. ... It is a conceptual data model in which semantic information is included. This means that the model describes the meaning of its instances.


A method of organizing data that reflects the basic meaning of data items and the relationships among them. This organization makes it easier to develop application programs and to maintain the consistency of data when it is updated.
