Essay from Oʻrinboyeva Ziynatjon

Big data and database concepts are among the most widely discussed topics of the current era. Five main features (often called the five Vs) distinguish big data from a traditional database. First, volume: the amount of data is far larger than in a conventional database; what was once considered large at around 100 gigabytes is now measured in terabytes and petabytes. Second, value: each piece of collected data must carry some useful value for later analysis. Third, veracity: ensuring that data in large collections is accurate, reliable, and of high quality is one of the most important issues.

Fourth, variety: the data in a big data store does not consist of only one type. A conventional database uses only relational tables, while big data includes text, audio, video, and sensor data. Fifth, and most importantly, velocity: data is generated at high speed and must often be analyzed in real time. So, big data is any constantly changing set of data collected from large-scale sources, typically measured in terabytes, petabytes, and exabytes.
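The variety point can be illustrated with a tiny sketch: unlike rows in one relational table, records in a big data store need not share a single schema. Here is a minimal plain-Python illustration (the field names and values are invented for the example):

```python
import json

# Heterogeneous records: each "document" has its own shape,
# something a single relational table cannot easily hold.
records = [
    {"type": "text",   "body": "server started"},
    {"type": "sensor", "temperature_c": 21.5, "unit": "celsius"},
    {"type": "video",  "duration_s": 120, "codec": "h264"},
]

# Serialize and group by type, roughly as a document store might.
by_type = {}
for rec in records:
    by_type.setdefault(rec["type"], []).append(json.dumps(rec))

print(sorted(by_type))  # the distinct data types present
```

A relational schema would force all three records into one fixed set of columns; the document style above simply stores each record as it arrives.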

How did this term come about? The term "big data" was popularized in 2008 by Nature editor Clifford Lynch in a special issue on the rapid growth of data. Although the term spread in 2008, data growth was being measured well before that: by one widely cited estimate, about 5 exabytes of data had been created by 2003, and IDC forecasts that the global volume of data will reach roughly 175 zettabytes by 2025 (1 exabyte = 1 billion gigabytes). The rapid increase in social media users, the use of artificial intelligence in the economy and banking sectors, and the digitization of every industry are all driving this growth.
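The unit arithmetic above can be checked directly. Using decimal (SI) units, a gigabyte is 10^9 bytes and an exabyte is 10^18 bytes:

```python
# Decimal (SI) storage units, in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE  = 10**18

# 1 exabyte = 1 billion gigabytes, as stated above.
print(EXABYTE // GIGABYTE)  # 1000000000
```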

The most important thing is that large amounts of data are not just collected but processed, and processed with the features above in mind: any information must be reliable, and the collected data must retain its value for later use. For big data processing, Apache Spark and NoSQL systems are widely used. Apache Spark processes large amounts of data very quickly, allowing real-time analysis, because it keeps data in RAM rather than on disk. Spark also ships with MLlib, the machine learning library we need the most. NoSQL is a modern class of databases created to store very large amounts of data in a variety of formats (video, images, audio, sensor readings), unlike the SQL-based relational databases described above.
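Spark's style of in-memory processing can be sketched in plain Python. This is only an illustration of the map-and-aggregate pattern Spark exposes, not real PySpark code; in Spark the dataset would be partitioned across a cluster and the partitions cached in RAM between operations:

```python
from collections import Counter

# A tiny in-memory "dataset" standing in for a distributed collection.
lines = [
    "big data needs fast processing",
    "spark keeps data in memory",
    "data in memory is fast",
]

# Map step: split each line into words.
words = (word for line in lines for word in line.split())

# Aggregate step: count occurrences, entirely in RAM.
counts = Counter(words)

print(counts["data"])  # 3
```

Because every intermediate result stays in memory, repeated queries over the same data avoid the disk reads that slow down older batch systems.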

Because a relational database consists mainly of tables, SQL is convenient there, while NoSQL can also hold data in different formats. The term big data is often used together with machine learning. As the name suggests, machine learning (ML) is when a machine learns from data; in big data pipelines, ML systems can ingest data in real time and filter out records that are no longer needed. Several ML algorithms are used in big data analysis, each performing a specific task. Regression algorithms, for example, are mainly used to forecast market prices and demand.
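As a sketch of the regression idea mentioned above, here is a minimal ordinary-least-squares fit in plain Python; the demand figures are invented for the example:

```python
# Fit y = a*x + b by ordinary least squares, then extrapolate one step.
xs = [1, 2, 3, 4]              # e.g. month number (hypothetical data)
ys = [10.0, 12.0, 14.0, 16.0]  # e.g. observed demand (hypothetical data)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: covariance of (x, y) divided by variance of x.
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

forecast = a * 5 + b           # predicted demand for month 5
print(forecast)                # 18.0
```

Real forecasting systems fit the same kind of model over millions of rows with libraries such as Spark MLlib, but the underlying calculation is the one shown here.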

Artificial neural networks analyze complex data. In conclusion, using ML on large data offers several advantages: it analyzes large data quickly, provides transparency, supports decision-making without human intervention, is constantly updated, and increases efficiency in banking, economics, e-commerce, and medicine. A key disadvantage is that each algorithm requires large volumes of data and powerful servers. So, before using any of these systems, we need to study them thoroughly so that we can handle large amounts of data correctly and make the right decisions.

Oʻrinboyeva Ziynatjon, Uzbekistan 
