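基于数据湖架构的大数据平台:品高云与Gartner联合报告正式上线

Big Data Platform Based on Data Lake Architecture: Joint Report by Bingo Cloud and Gartner Officially Released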

  • 数据湖的数据资源支持按主题、组织、专题等维度编目数据,保障数据的可检索性

    Data resources in the data lake can be catalogued by subject, organization, and special topic, which keeps the data searchable

  • 可通过数据及时性、数据完整性、数据一致性、数据准确性等多个维度监控和分析数据湖的数据质量,并能够实现数据质量监控、分析、检查、报告的闭环管理,此外,还支持数据消费者对数据资源的质量进行评价评论,持续提升数据湖的数据质量

    The data quality of the data lake can be monitored and analyzed along several dimensions, including timeliness, completeness, consistency, and accuracy, with closed-loop management covering quality monitoring, analysis, inspection, and reporting. In addition, data consumers can rate and comment on the quality of data resources, which continuously improves the data quality of the data lake

  • 能够实现从数据集成、数据存储、数据处理、数据消费的全过程性能指标的监控分析,实时监控分析各个环节的处理情况,帮助管理人员第一时间掌握数据湖的整体运行状况,对于数据湖的运营、可持续发展具有指导意义

    Performance indicators can be monitored and analyzed across the whole process of data integration, storage, processing, and consumption. Each stage is monitored and analyzed in real time, so managers learn the overall running status of the data lake as soon as it changes, which provides guidance for the operation and sustainable development of the data lake
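
The quality dimensions listed above can be made concrete with a small scoring routine. The following is only a minimal sketch in Python, not part of the Bingo Data Lake product: the function name `quality_report`, the record layout, and the `updated_at` field are all hypothetical, and only two of the four dimensions (completeness and timeliness) are covered.

```python
from datetime import datetime, timedelta


def quality_report(records, required_fields, max_age, now):
    """Score a batch of records on completeness and timeliness.

    Hypothetical helper for illustration only. Each score is the
    fraction of records (0.0 to 1.0) that pass the check.
    """
    total = len(records)
    if total == 0:
        # An empty batch has nothing that fails either check.
        return {"completeness": 1.0, "timeliness": 1.0}

    # Completeness: every required field is present and non-empty.
    complete = sum(
        1
        for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )

    # Timeliness: the record was updated within the allowed age.
    fresh = sum(1 for r in records if now - r["updated_at"] <= max_age)

    return {"completeness": complete / total, "timeliness": fresh / total}


now = datetime(2024, 1, 10, 12, 0, 0)
sample = [
    {"id": 1, "name": "a", "updated_at": now - timedelta(hours=1)},
    {"id": 2, "name": "", "updated_at": now - timedelta(hours=2)},
    {"id": 3, "name": "c", "updated_at": now - timedelta(days=3)},
    {"id": 4, "name": "d", "updated_at": now - timedelta(days=5)},
]
report = quality_report(sample, ["id", "name"], timedelta(days=1), now)
print(report)
```

Scores of this kind could feed the monitoring, analysis, inspection, and reporting loop described above; how the actual platform computes them is not stated in the source.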


数据分析与消费

Data Analysis and Consumption

当大量数据被采集到数据湖中,经过开发处理,再将处理后的可用数据存入回数据湖,为各类大数据分析应用提供数据支撑。

Once large volumes of data have been collected into the data lake and processed, the resulting usable data is written back to the lake, where it supports a range of big data analytics applications.

品高数据湖方案中提供大数据分析平台,通过自助分析、数据可视化等多种方式让用户进行数据消费,自由发掘数据的潜能和价值。平台中内置仪表盘、数据源管理、数据报表、数据报告以及与地理位置信息结合的数据运算和展示等多种分析组件,同时还可以支持第三方的数据分析工具、以及用户自己开发的分析工具等。

The Bingo Data Lake solution provides a big data analytics platform that lets users consume data through self-service analysis, data visualization, and other means, so they can freely explore the potential and value of their data. Built-in analysis components include dashboards, data source management, data tables, data reports, and data computation and display combined with geographic location information. Third-party data analysis tools and tools developed by users are also supported.

  • 提供内置的自助查询工具,可直接通过图形化界面建立数据分析,用户可通过配置数据模型、过滤条件、结果字段等查询条件,即可获得相应的数据分析结果报表

    A built-in self-service query tool lets users build a data analysis directly in a graphical interface. By configuring query conditions such as the data model, filter conditions, and result fields, users obtain the corresponding analysis result report

  • 提供多样化的数据分析呈现图表,如地图工具、数据报表、数据脑图、数据报告等,依据数据可视化的科学方法以合理的方式为用户呈现分析结果,极大提升分析结论的可读性

    A variety of charts for presenting analysis results is provided, such as maps, data tables, data mind maps, and data reports. Results are presented in a sound way that follows the principles of data visualization, which makes the conclusions much easier to read

  • 支持数据分析过程的协作共享,从源数据到得出分析结果的过程中,可分别由不同的用户分工协作,其中可能包含数据管理员、分析人员、一线业务人员等等,让各类用户均能够参与到数据分析的过程中来,并以社交化的方式分享数据分析报告

    Collaboration and sharing are supported throughout data analysis. On the way from source data to analysis results, different users can divide up the work, including data administrators, analysts, and front-line business staff. This lets all kinds of users take part in the analysis process and share data analysis reports in a social manner
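
The self-service query described in the first item above boils down to three inputs: a data model, filter conditions, and result fields. The sketch below shows one way such a configuration could be evaluated in Python. It is an illustration under stated assumptions, not the platform's query engine: the function `run_query`, the `(field, operator, value)` filter tuples, and the sample rows are hypothetical.

```python
import operator

# Comparison operators a filter condition may use (assumed set).
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def run_query(rows, filters, result_fields):
    """Return the requested fields of every row that passes all filters.

    rows          -- list of dicts standing in for the data model
    filters       -- list of (field, operator, value) tuples
    result_fields -- names of the fields to keep in the result
    """
    result = []
    for row in rows:
        if all(OPS[op](row[field], value) for field, op, value in filters):
            result.append({name: row[name] for name in result_fields})
    return result


orders = [
    {"order_id": 1, "region": "south", "amount": 250},
    {"order_id": 2, "region": "north", "amount": 400},
    {"order_id": 3, "region": "south", "amount": 80},
]
report_rows = run_query(
    orders,
    filters=[("region", "==", "south"), ("amount", ">", 100)],
    result_fields=["order_id", "amount"],
)
print(report_rows)
```

A graphical query builder would typically produce a configuration like the `filters` and `result_fields` arguments here and hand it to the engine, so that users never write the query by hand.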


应用场景

Application Scenarios

基于上文中介绍的品高数据方案的功能特性和创新点,以下列举三个适合于应用数据湖方案的应用场景。

Based on the features and innovations of the Bingo Data Lake solution described above, three scenarios that suit a data lake solution are listed below.


场景1:跨组织边界的数据共享

Scenario 1: Data Sharing Across Organizational Boundaries

随着大数据的深入发展,各企业、政府纷纷建设了大数据平台,对于提升企业生产效率、销售模式以及政府治理水平等起到了有效的推动,数据应用不再局限于自身拥有的数据,要求通过多方数据共享后的汇聚分析实现更大力度的数据创新,进而促进企业或政府组织的治理质量提升。

As big data matures, enterprises and governments have built their own big data platforms, which have effectively improved enterprise production efficiency and sales models as well as government governance. Data applications are no longer confined to an organization's own data. Aggregated analysis after data sharing among multiple parties is needed to achieve greater data innovation, which in turn improves the governance of enterprises and government organizations.

传统解决方案存在的问题

Problems with the Traditional Solutions

  • 难实现异构技术融合

    Difficulties in Achieving Heterogeneous Technology Convergence

组织机构产生的数据复杂多样,数据汇聚难度大。Hadoop技术仅能够解决单个部门的数据存储和处理,但无法解决跨组织边界的技术融合和共享权限问题。跨组织边界的大数据技术路线不一,技术融合难度大。

The data that organizations generate is complex and diverse, which makes it hard to aggregate. Hadoop can handle data storage and processing for a single department, but it cannot solve technology integration and sharing permissions across organizational boundaries. Organizations also follow different big data technology roadmaps, which makes technology integration difficult.

  • 数据共享模式存在不足

    Defects of Data Sharing Modes

跨组织边界的数据共享开放常见模式有数据查询接口、FTP 文件交换、大数据交易所等。

Common modes of sharing and opening data across organizational boundaries include data query interfaces, FTP file exchange, and big data exchanges.

  • FTP 文件交换存在安全性弱、交换性能差、数据主权难界定、需拷贝数据等问题

    FTP file exchange suffers from weak security and poor exchange performance, makes data sovereignty hard to define, and requires data to be copied

  • 大数据交易所缺乏数据汇聚基础,难以满足大量数据的关联碰撞

    Big data exchanges lack a foundation for data aggregation and struggle to support the correlation and cross-matching of large volumes of data

  • 缺乏对运营体系的支持

    Lack of Support for Operation Systems

大数据平台往往重技术、轻运营、轻质量,导致大数据平台无法可持续发展,有必要从数据评价、数据质量和数据开放指数建立全面的数据运营体系,保障数据共享的可持续发展。

Big data platforms often emphasize technology over operations and quality, which prevents them from developing sustainably. A comprehensive data operations framework built on data evaluation, data quality, and a data openness index is needed to keep data sharing sustainable.

应对与解决

Response and Solution

针对以上问题和需求,品高数据湖方案通过深度融合云计算和大数据技术,以数据存储为基础,通过在本文所述的数据集成、数据开发、数据管理、数据消费四个方面的创新能力,解决组织部门之间、跨组织、跨行业的数据共享和开放,帮助组织构建可持续、健康的数据生态链,通过数据关联进一步挖掘数据价值,推动数据创新。

To address the problems and needs above, the Bingo Data Lake solution deeply integrates cloud computing and big data technology on a foundation of data storage. Through the innovative capabilities in data integration, data development, data management, and data consumption described in this document, it enables data sharing and opening between departments, across organizations, and across industries. It helps organizations build a healthy and sustainable data ecosystem, and it uncovers further data value through data association to drive data innovation.


场景2:促进基于数据的产学研的合作

Scenario 2: Promoting Data-based Industry-University-Research Cooperation

行业生产数据与科研之间的矛盾

The Gap between Industry Production Data and Research

政府机构、大型企业拥有大量生产数据,但技术储备和算法模型较弱,而高校、科研机构有技术、有算法模型,苦于没数据。

Government agencies and large enterprises hold large amounts of production data but are comparatively weak in technical expertise and algorithm models, while universities and research institutions have the technology and the algorithm models but lack data.

利用数据湖建立生产和科研的桥梁

Building a Bridge between Production and Research with a Data Lake

基于上述问题,可通过数据湖将行业生产数据脱敏后存储到数据湖,开放给科研机构、高校进行研究性探索,同时,研究成果可反馈应用于企业,从而有效促进基于数据的产学研合作。

To address this gap, industry production data can be desensitized, stored in the data lake, and opened to research institutions and universities for exploratory research. The research results can in turn be fed back to and applied by enterprises, which effectively promotes data-based cooperation among industry, universities, and research institutions.
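
Desensitization before storage can take many forms. The sketch below shows two common techniques in Python, partial masking and keyed pseudonymization, purely as an illustration. The source does not describe how the Bingo Data Lake solution desensitizes data, so the function names, the record fields (`name`, `phone`), and the key handling are all assumptions.

```python
import hashlib
import hmac


def mask_phone(phone):
    """Keep the first 3 and last 4 characters and mask the rest."""
    if len(phone) <= 7:
        # Too short to keep any part without revealing most of it.
        return "*" * len(phone)
    return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]


def pseudonymize(value, key):
    """Replace a value with a stable keyed hash (HMAC-SHA256, truncated).

    The same input and key always give the same token, so records can
    still be joined on the token without exposing the original value.
    """
    digest = hmac.new(key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def desensitize(record, key):
    """Return a copy of the record with hypothetical sensitive fields hidden."""
    safe = dict(record)
    safe["name"] = pseudonymize(record["name"], key)
    safe["phone"] = mask_phone(record["phone"])
    return safe


raw = {"name": "Zhang San", "phone": "13812345678", "output_tons": 42}
safe = desensitize(raw, key="demo-secret")
print(safe)
```

Non-sensitive measurements such as `output_tons` pass through unchanged, which is what keeps the data useful for research. In practice the key would be held by the data owner and never shared with the research side.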


场景3:联邦数据湖

Scenario 3: Federated Data Lake

跨组织的数据集中存在安全和信任问题

Security and Trust Issues in Cross-organizational Data Centralization

在数据湖的建设过程中,会常常遇到跨企业间、不同政府部门间的跨组织数据湖建设。如果通过统一的数据湖来集中管理所有数据,数据的采集将会变得比较困难,包括组织间的数据互信、数据主权、数据安全等一系列问题。

Data lake projects often span multiple enterprises or different government departments. If all data is managed centrally in a single unified data lake, data collection becomes difficult, and a series of issues arises around mutual trust between organizations, data sovereignty, and data security.

利用联邦数据湖构建开放的数据生态

Building an Open Data Ecosystem with Federated Data Lakes

应对上述情况,品高数据湖方案提供去中心化的联邦数据湖,平台基于联邦数据湖实现跨部门、跨组织的数据共享,并通过数据开放平台,将数据相关的目录、工具、服务、模型开放出来,各组织和数据模型相关软件开发商均可在上面进行数据协作,帮助企业、政府构建可持续发展的数据生态链。

To address this situation, the Bingo Data Lake solution offers a decentralized federated data lake. The platform uses the federated data lake to share data across departments and organizations, and a data open platform exposes the related catalogs, tools, services, and models. Organizations and software vendors that work with data models can all collaborate on it, which helps enterprises and governments build a sustainable data ecosystem.


