Big Data Integration

The big data era is upon us: data are being generated, analyzed, and used at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Since the value of data explodes when it can be linked and fused with other data, addressing the big data integration (BDI) challenge is critical to realizing the promise of big data.

178 pages, published in 2015
The big data era is the inevitable consequence of datafication: our ability to transform each event and every interaction in the world into digital data, and our concomitant desire to analyze and extract value from this data. Big data comes with a lot of promise, enabling us to make valuable, data-driven decisions to alter all aspects of society. Big data is being generated and used today in a variety of domains, including data-driven science, telecommunications, social media, large-scale e-commerce, medical records and e-health, and so on. Since the value of data explodes when it can be linked and fused with other data, addressing the big data integration (BDI) challenge is critical to realizing the promise of big data in these and other domains.

As one prominent example, recent efforts in mining the web and extracting entities, relationships, and ontologies to build general purpose knowledge bases such as Freebase [Bollacker et al. 2008], the Google knowledge graph [Dong et al. 2014a], ProBase [Wu et al. 2012], and Yago [Weikum and Theobald 2010] show promise of using integrated big data to improve applications such as web search and web-scale data analysis. As a second important example, the flood of geo-referenced data available in recent years, such as geo-tagged web objects (e.g., photos, videos, tweets), online check-ins (e.g., Foursquare), WiFi logs, GPS traces of vehicles (e.g., taxi cabs), and roadside sensor networks, has given momentum to using such integrated big data to characterize large-scale human mobility [Becker et al. 2013] and to influence areas like public health, traffic engineering, and urban planning.

In this chapter, we first describe the problem of data integration and the components of traditional data integration in Section 1.1. We then discuss the specific challenges that arise in BDI in Section 1.2, where we first identify the dimensions along which BDI differs from traditional data integration, then present a number of recent case studies that empirically study the nature of data sources in BDI. BDI also offers opportunities that do not exist in traditional data integration, and we highlight some of these opportunities in Section 1.3. Finally, we present an outline of the rest of the book in Section 1.4.
Z. Meral Özsoyoğlu, Case Western Reserve University
