Big Data: How the open-source software will open the doors for innovation in libraries

Authors

  • Zainab Abdullahid Salman

Keywords:

big data, university libraries, Hadoop platform, data processing, open source systems

Abstract

Big Data, defined in this paper as the gathering, storage, and analysis of information on a scale typically untenable for traditional, mass-market data-processing software, has previously been one of the biggest obstacles facing tech companies, startups, and analytic researchers. The ability to process such large data loads has been a significant barrier to entry for many young companies and not-for-profit research organizations, but recent open-source software, such as Hadoop, has removed those barriers. Hadoop, a programming framework that allows for large-scale data storage and processing, is free and available to all developers. This software allows independent developers,

The research aims to determine how to deal with a data set whose size exceeds the ability of well-known database programs to capture, store, manage, and analyze it. Such data requires innovative and effective forms of processing that differ from ordinary data processing, so that its users can gain better insight and make better decisions. The research sample is the university theses and dissertations available digitally in PDF and Word format in the central library of Al-Mustansiriya University. These amount to 107,345 theses and dissertations, representing 2.49 terabytes, compared to 25,661 e-books stored in this library, representing 5,852 megabytes in full text; the total volume of archived data thus reached 3.08 terabytes. Despite the diversity of databases across university libraries, the dominant search feature is search by subject, author, or title. This search method is used in most types of library databases and is evaluated against several criteria, including time, accuracy, and the volume of sources retrieved at one time. University theses and dissertations face strong competition from scientific research papers, to which researchers are currently turning. Accessing the full content of these theses and dissertations is complicated, and most databases do not offer them in full text because appropriate techniques for handling and absorbing this amount of data are lacking. The result is a decline in demand for university theses relative to the increasing demand for scientific research papers, driven by the difficulty of accessing thesis contents in full text and by the inadequacy of traditional search strategies to keep pace with the needs of beneficiaries, especially given the increasing availability of books in digital form, despite some limitations on access to the full digital content of those books.
The researcher recommends using techniques that respond to search strategies, especially for big data and advanced search, by adopting the Hadoop program to cover intellectual outputs in the future. The researcher also recommends investing in Hadoop in the field of big data and taking the central library at Al-Mustansiriya University as a model for dealing with big data and for showing how Hadoop can contribute to organizing it.

Published

2023-02-24