Design and Implementation of ZTE Object Storage System

2012-05-22 01:24:08 Huabin Ruan, Xiaomeng Huang, and Yang Zhou

ZTE Communications, 2012, Issue 4

Huabin Ruan, Xiaomeng Huang, and Yang Zhou

(1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;

2. Center of Earth System Science, Tsinghua University, Beijing 100084, China;

3. Communication Services R&D Institute, ZTE Corporation, Nanjing 210012, China)

Abstract This paper introduces the basic concepts and features of an object storage system. It also introduces related standards and specifications and the implementations of several existing systems. ZTE's Object Storage System (ZTE OSS) was developed jointly by Tsinghua University and ZTE Corporation to manage large amounts of data. ZTE OSS has a scalable architecture, uses some open source components, and relies on an efficient key-value database. ZTE OSS is easy to scale and highly reliable. Experiments show that ZTE OSS performs well with mass data and heavy

Keywords cloud storage; object storage

1 Introduction

Cloud storage has become better known over the past few years, and industry has taken a great interest in it. A handful of commercial cloud products have come to the fore, including S3 [1] by Amazon, Windows Azure [2] by Microsoft, and Atmos [3] by EMC. Recently, domestic Chinese companies such as China Mobile and China Telecom have developed corresponding cloud standards and prototypes. According to a report by IDC, the worldwide market for cloud storage systems was worth 1.5 billion dollars in 2009 and will climb to 7 billion dollars by 2014. In this paper, we focus on object storage, specifically large-scale distributed object storage, which is one of the most important techniques for cloud storage systems.

First, we address two questions: what is an object, and how does an object differ from the common files that a local file system uses to store pictures and documents? To a first approximation, an object in an object storage system is like the files we use every day. An object is similar to a file in that it stores documents, pictures, and videos. However, an object also differs from a file in several ways.

An object usually contains more information than a file does. Compared with a file, an object is self-contained and usually carries more metadata. An object can be understood as a better encapsulation of a file. Moreover, it is intelligent because the object system itself can determine the physical storage location of objects and the number of copies. The storage architecture of objects is also more elastic, so intelligent management can be implemented in the storage layer, and different QoS can be provided to various objects. Finally, objects are peers of equal standing. A traditional file system is organized as a tree, but in an object storage system, all objects are placed in a flat namespace. This kind of organization is very flexible and allows different structures, including a tree, to be built on top of it. The container concept exists in some object storage systems, such as Amazon S3, and a container can be seen as a special file. We can also treat a container as a special object; objects can then be classified as data objects and container objects.

Now that we have explained what an object is, we consider the advantages of an object storage system. People may doubt the need to encapsulate files into objects because we already have mature file systems. An object storage system has capacity that exceeds that of a single hard disk, disk array, or even more professional storage devices, and this capacity will increase markedly. S3 from Amazon emerged in early 2006 and was storing 20 billion objects within the following two years. This number roughly doubled each year, with more than 50 billion objects stored by 2009 and more than 100 billion objects stored by 2010. If we suppose every object is about 100 kB, without considering redundancy, the total capacity required for 100 billion objects is 10 PB. More importantly, S3 is not limited in service range and is available worldwide. To achieve this, guaranteed bandwidth, careful organization, and proper allocation are essential.

In 2007, MIT student Drew Houston set up a company to develop a product for personal data backup and synchronization. The product was based on Amazon S3 and is the ancestor of Dropbox [4], which attracted millions of users within a year with only 10 staff. Because of the features of S3, Dropbox developers do not have to worry about building basic infrastructure and can focus on products and services. In contrast, the forerunner Kingsoft Kuaipan [5] has to maintain its own storage devices and construct a content delivery network (CDN), so Kuaipan has no advantage in terms of R&D costs.

Object storage systems relieve developers from having to construct fundamental infrastructure, but there are also benefits to enterprise customers. Siemens developed a new software distribution and upgrade platform to replace its former system. The IT department at Siemens no longer has to worry about maintaining three data center networks and sets of equipment, and operating costs are cut sharply. NBC and GE also set up their IT services on the storage provided by Nirvanix to reduce resource waste and cost.

2 Standards for Object Storage Systems

2.1 Amazon S3

Since 2006, Amazon has provided developers with S3, which has become the de facto standard for object storage systems. Amazon S3 is based on the idea that high-quality internet storage should be easy to obtain, so that developers no longer need to worry about security, capacity, or how to store their data. S3 frees developers from establishing and maintaining storage solutions, and a large-scale investment in storage is not required. Amazon S3 has simple and reliable functionality that allows cheap and secure storage of any amount of data, and data is accessible forever. With the help of Amazon S3, developers can concentrate on how to use data instead of how to store it.

The S3 data model comprises objects, buckets, and keys. An object, comprising object data and metadata, is the basic entity stored in Amazon S3. The bucket is the container for objects in Amazon S3, and every object is held in a bucket. The key is the unique identifier of an object within a bucket; one object in a bucket can only have one key.

For manipulating objects in S3, there are functions such as creating a bucket, writing an object, deleting an object, and listing keys. S3 is a simple storage system that provides object operation semantics to users. Users operate S3 by putting objects into a bucket and accessing objects from a bucket. A simple web service interface offers data access over the network anytime, anywhere. Amazon uses the same highly scalable, reliable, fast, and cheap data-storage infrastructure to run its own global websites, and any developer is authorized to use it. S3 aims to gain benefits from scale and pass them on to developers.
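The object, bucket, and key model described above can be illustrated with a small in-memory sketch. This is not Amazon's API: the class and method names (`ObjectStore`, `put_object`, and so on) are hypothetical and chosen only to mirror the four operations named in the text.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object: the data plus its metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory store: bucket name -> {key -> StoredObject}."""

    def __init__(self):
        self._buckets = {}

    def create_bucket(self, bucket):
        self._buckets.setdefault(bucket, {})

    def put_object(self, bucket, key, data, metadata=None):
        # A key identifies exactly one object within its bucket,
        # so writing to an existing key replaces the old object.
        self._buckets[bucket][key] = StoredObject(data, dict(metadata or {}))

    def get_object(self, bucket, key):
        return self._buckets[bucket][key]

    def delete_object(self, bucket, key):
        del self._buckets[bucket][key]

    def list_keys(self, bucket):
        return sorted(self._buckets[bucket])


store = ObjectStore()
store.create_bucket("photos")
store.put_object("photos", "2012/cat.jpg", b"...", {"content-type": "image/jpeg"})
store.put_object("photos", "2012/dog.jpg", b"...")
print(store.list_keys("photos"))
```

Note that the namespace inside a bucket is flat: the slash in `2012/cat.jpg` is just part of the key, which is how a tree-like layout can be imitated on top of a flat store.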

2.2 SNIA CDMI

The storage industry standards organization SNIA released a document called "Cloud Data Management Interface (CDMI)" [6]. CDMI mainly concerns the client platform and the data center server, and it defines a standard interface for data exchange between these platforms. HTTP is used to carry the representational state transfer (REST) commands. The data center responds to client web service requests and provides the service.

SNIA CDMI was the first cloud storage specification and defines object types such as data object, container object, domain object, queue object, and capability object. Containers can be nested and can contain objects.

SNIA CDMI defines operations for discovering capabilities, creating a container, creating an object in a container, listing the objects in a container, reading an object, and deleting an object. SNIA is implementing a prototype system according to the SNIA CDMI v1.0 specification. This prototype is based on Java and will soon be released to the public. China Mobile has also defined a related enterprise specification to prepare for entry into the cloud storage field and to provide object-oriented storage.

3 ZTE Object Storage System

3.1 System Architecture

We collaborated with Tsinghua University to implement an object-oriented system [7]. Fig. 1 shows the system architecture, which contains an object interface layer, an object service layer, and an object storage layer.

The object interface layer provides a REST interface and an API interface. The REST interface is used for accessing the object system over HTTP. The API interface is used to access a web service requested by clients. ZTE's Object Storage System (ZTE OSS) uses Apache Axis2/C, a web services/SOAP/WSDL engine, to handle HTTP requests [8]. Apache Axis2/C provides a complete object model and a modular architecture that makes it easy to add functionality and support new web-service-related specifications and recommendations. Axis2/C also allows the creation and use of REST-based web services.

The object service layer mainly provides object management and control, container management and control, and system management and security. To abstract storage resources, this layer provides a series of storage-adaptation interfaces called the adaptation interface layer. With this method, physical storage devices can be substituted without modifying the storage interface code in the service layer, and the system can be scaled. The object storage layer provides persistent storage services for objects and containers.

Because of this interface design, the object storage layer can adopt various kinds of physical storage, for example, a local file system, a network file system, a distributed parallel file system, or a storage area network.

▲Figure 1. ZTE OSS architecture.

3.2 Metadata Storage

Metadata storage is one of the core modules of ZTE OSS. ZTE OSS has an efficient method for managing metadata. This method requires the underlying key-value storage system to support sorting of items by key. The same method is used in the metadata storage management of PVFS2, a well-known parallel file system. High-performance key-value storage systems with key sorting, such as Berkeley DB and Hadoop HBase, are relatively common. With key-value storage and sorting, the frequent operations of an object storage system, such as creating an object, deleting an object, reading an object's data, and listing a container, can be done efficiently.

In ZTE OSS, the object metadata is saved in a key-value table called Meta Table, and the structure of the container is saved in a key-value table called Entry Table. In Meta Table, the object's ID is the key, and the object's attributes are the values. The attributes of an object include object creation time, last modified time, parent information, layout information, and access control list. In Entry Table, there is a special ID for each container, which is used to associate a container with its sub-objects and sub-containers. We call this special ID the container handle ID. The Entry Table holds three kinds of items related to the container handle ID, referred to below as items (1), (2), and (3).

With (1), we allocate a unique container handle ID for a specific container.

The symbol $ in (1) is the item's unique identifier. Item (2) in Entry Table saves the total number of sub-objects and sub-containers in a container so that we can easily obtain the number of items in a specific container. This is useful for the administrator. The symbol @ in (2) is used to locate this information. Item (3) associates the sub-objects and sub-containers with a specific container. The key of this item is the container handle ID concatenated with the sub-object or sub-container name, and the value is the ID of the sub-object or sub-container, which can be used to locate the object's detailed information in Meta Table.

Tables 1 and 2 give a metadata storage instance for the container structure shown in Fig. 2, where a is a container, b and c are sub-containers, and d is a data object in container b. We assume 1, 2, 3, and 4 are the IDs for a, b, c, and d, respectively, and 5, 6, and 7 are the container handle IDs for containers a, b, and c, respectively.

With the above storage mechanism, we can perform frequent object operations efficiently. Taking Fig. 2 as an example, if we want to create a new object e (with ID 10) in container c, we only need to insert (4) in Entry Table and (5) in Meta Table and update (6) in Entry Table.

▼Table 1. Entry Table for Fig. 2

▼Table 2. Meta Table for Fig. 2

▲Figure 2. Container structure instance.

If we want to list a container, for example container a, whose container handle ID is 5, we only need to locate the first item with prefix 5 and then read items in sequence until the items no longer have this prefix. This works because our underlying key-value storage system is sorted by key.

Other operations, such as deleting and reading objects, can also be implemented efficiently, with no more than three operations in total on Entry Table and Meta Table.
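The Entry Table and Meta Table mechanism described above can be sketched as follows. The sketch uses a tiny in-memory sorted table in place of Berkeley DB or HBase, and the concrete key encodings (`$<container ID>`, `@<handle ID>`, and `<handle ID>/<name>`) are assumptions made for illustration; the source only states that $ and @ mark items (1) and (2) and that item (3) joins the handle ID with the child name.

```python
import bisect


class SortedKV:
    """Minimal key-sorted key-value table (stand-in for Berkeley DB or HBase)."""

    def __init__(self):
        self._keys = []   # kept in sorted order
        self._vals = {}

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key):
        return self._vals[key]

    def scan_prefix(self, prefix):
        # Seek to the first key >= prefix, then read in order until
        # the prefix no longer matches.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


entry_table, meta_table = SortedKV(), SortedKV()


def make_container(container_id, handle_id):
    entry_table.put(f"${container_id}", handle_id)  # item (1): handle ID of the container
    entry_table.put(f"@{handle_id}", 0)             # item (2): number of children


def add_child(parent_handle, name, child_id, attrs):
    # Three table operations, as in the text: two inserts and one update.
    entry_table.put(f"{parent_handle}/{name}", child_id)        # insert item (3)
    meta_table.put(str(child_id), attrs)                        # insert Meta Table row
    count_key = f"@{parent_handle}"
    entry_table.put(count_key, entry_table.get(count_key) + 1)  # update item (2)


def list_container(handle_id):
    prefix = f"{handle_id}/"
    return [key[len(prefix):] for key, _ in entry_table.scan_prefix(prefix)]


# Fig. 2: a (ID 1, handle 5) holds b (ID 2, handle 6) and c (ID 3, handle 7); d (ID 4) is in b.
meta_table.put("1", {"type": "container"})
make_container(1, 5)
add_child(5, "b", 2, {"type": "container"})
make_container(2, 6)
add_child(5, "c", 3, {"type": "container"})
make_container(3, 7)
add_child(6, "d", 4, {"type": "data"})

add_child(7, "e", 10, {"type": "data"})  # create new object e (ID 10) in container c
print(list_container(5), list_container(7), entry_table.get("@7"))
```

The `/` separator after the handle ID is a design choice of this sketch: without it, a scan for prefix 5 would also pick up a hypothetical handle such as 50.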

3.3 Data Storage

ZTE OSS allows object data to be stored on different devices through the same internal I/O interface. In this way, application developers can store their application data simply, using the same object interface provided by ZTE OSS with different layout information in the request message. In ZTE OSS, layout is the data storage mechanism: it is one of the object attributes and indicates where object data is to be stored. As well as the layout in the request message, there is a storage adapter layer, which is a library that saves data on different storage devices according to the layout value in the object operation request message. The object's public write interface in the adapter layer takes three arguments:

data is the object data to be written, data_size is the length to be written, and layout is the value specifying the device to be written to.
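One way to picture the adapter layer is a single public write routine that dispatches to a private I/O routine per layout. The sketch below is illustrative only: the function name `object_write`, the extra `object_id` argument, the string layout values `"key-value"` and `"DFS"`, and the use of a temporary local directory as a stand-in for a distributed file system are all assumptions, not the ZTE OSS interface.

```python
import os
import tempfile

META_TABLE = {}                # stand-in for the Meta Table
DFS_ROOT = tempfile.mkdtemp()  # stand-in for a distributed file system mount


def _write_key_value(object_id, data, data_size):
    # Small objects are kept next to their attributes in the Meta Table.
    META_TABLE.setdefault(object_id, {})["object_data"] = data[:data_size]


def _write_dfs(object_id, data, data_size):
    # Large objects go to a file; here a local temp directory plays the DFS.
    with open(os.path.join(DFS_ROOT, object_id), "wb") as f:
        f.write(data[:data_size])


# Layout value -> private I/O routine. Supporting a new device means
# adding one entry here; the public interface below does not change.
_WRITERS = {"key-value": _write_key_value, "DFS": _write_dfs}


def object_write(object_id, data, data_size, layout):
    """Public interface: the same call for every device, routed by layout."""
    try:
        writer = _WRITERS[layout]
    except KeyError:
        raise ValueError(f"unsupported layout: {layout!r}")
    writer(object_id, data, data_size)
    return data_size


object_write("cfg-1", b"mode=fast", 9, "key-value")
object_write("video-1", b"\x00" * 1024, 1024, "DFS")
```

Because callers only ever see `object_write`, swapping or adding a storage device stays invisible to them, which is the property the adapter layer is meant to provide.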

ZTE OSS supports four storage devices, which are currently being adapted. These devices and their layout values are shown in Table 3, and the set can be extended easily when new storage devices are added: we only need to add a new layout value and implement a new private I/O interface for the new device. This is transparent to the user because the object public interface does not change. For example, when the layout is key-value, object data is stored together with the object attributes in the object_data attribute field; that is, object data is stored directly in Meta Table. This suits small data such as configuration data. When the layout is DFS, the object data is saved in a distributed file system, which is suitable for storing large data such as streaming media. Other layout values and the corresponding storage devices are shown in Table 3.

▼Table 3. Layout for object data

3.4 Security Checking

ZTE OSS has a security mechanism that ensures data is transmitted safely. It encrypts the key content in the object request message. The reason we encrypt only the key content and not the whole request message is performance. The time taken for encryption depends on the length of the message, and checking the key content is enough to determine whether the message has been modified during network transmission. Therefore, to reduce the effect of security checking on the performance of ZTE OSS, we encrypt only the key content in the request message. The algorithm we use for encrypting the key content is SHA-1. The key content in the request message includes

• HTTP method. The method used in the HTTP request message: PUT, GET, POST, DELETE, or HEAD.

• Object or container information. This is the ID or name of the object or container.

• User access key. This value can be the user name or user ID in ZTE OSS.

• Request time. This is the timestamp at which the current request was generated.

For example, suppose user James issues a request to create a container called MyContainer on 03-08-2012 at 12:00:00. According to HTTP semantics, creating a container is like creating a new resource, so we use PUT as the HTTP method. The string to be encrypted is then composed of the PUT method, the container name MyContainer, the access key James, and the request time.
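A minimal sketch of this check for the MyContainer request is given below. SHA-1 is a hash function, so "encrypting" is read here as computing a digest of the key content. The field order, the newline separator, the timestamp format, and the function names are assumptions of this sketch, and whether ZTE OSS also mixes in a shared secret is not stated in the text.

```python
import hashlib


def string_to_sign(http_method, resource, access_key, request_time):
    # Join the four key-content fields; order and separator are assumed.
    return "\n".join([http_method, resource, access_key, request_time])


def digest(key_content):
    # SHA-1 digest of the key content, as a 40-character hex string.
    return hashlib.sha1(key_content.encode("utf-8")).hexdigest()


def verify(key_content, received_digest):
    # The server recomputes the digest and compares it with the one received.
    return digest(key_content) == received_digest


content = string_to_sign("PUT", "MyContainer", "James", "03-08-2012 12:00:00")
signature = digest(content)
print(signature)

# If any key field is changed in transit, verification fails.
tampered = string_to_sign("DELETE", "MyContainer", "James", "03-08-2012 12:00:00")
print(verify(content, signature), verify(tampered, signature))
```

Only these four short fields are hashed, so the cost of the check does not grow with the size of the object data carried in the request.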

3.5 Functions and Interfaces

The interface provides an API in the C language and a REST interface over HTTP messages. Object-related functions include object management and control. Object management involves creating an object, deleting an object, copying an object, moving an object, setting an object's metadata, reading an object, and reading the metadata of an object.

Container-related functions include container management and control. Container management includes creating a container, deleting a container, listing all objects in a container, reading the metadata of a container, setting the metadata of a container, and so on.

Security is ensured by validating the key information in the request. The key information in the request should be encrypted with MD5.

4 Testing and Performance

ZTE OSS parallel testing and capacity testing are based on an eight-node Linux cluster (Fig. 3). Parallel testing measures the number of requests completed per unit time, in requests per second. Capacity testing measures the maximum number of objects that can be stored in ZTE OSS with fixed hardware resources. Every node in the Linux cluster has the same software and hardware configuration (Table 4). The object storage system was deployed on four nodes of the cluster, and the other four nodes were configured as clients that sent requests. The clients Client_1 to Client_12 sent requests to the nodes OSS_1 to OSS_4. Every client sent the common operations to the OSS: creating an object, creating a container, deleting an object, and listing all objects in a container.

The testing showed that the degree of parallelism (DOP) of the common ZTE OSS operations (creating and deleting objects and containers, and listing all objects in a container) is close to that of Axis2/C parsing alone. With the same software and hardware configuration, the DOP for Axis2/C parsing is up to 1800 requests per second, and the DOP of common ZTE OSS operations (except listing containers) is greater than 1700 requests per second. The DOP of the list-container operation depends on the number of objects in the container. When the number of objects in a container is less than or equal to 10, the DOP is 1724 requests per second, equivalent to that of Axis2/C parsing. If the number of objects in a container is 100, the DOP is 1307 requests per second. If the number of objects in a container is 1000, the DOP is 281 requests per second. Creating the name list requires parsing each object name and merging it into the list, so the time spent on a list-container operation depends on the number of objects in the container. The test results show that the DOP does not scale linearly with the number of objects in a container. The results of parallel testing are shown in Table 5.

▼Table 4. Software and hardware configuration of the nodes in the Linux cluster

▲Figure 3. ZTE OSS testing networks and nodes.

▼Table 5. ZTE OSS parallel test results

Capacity testing shows that the capacity of ZTE OSS depends on the idle hardware resources (Table 6). A system with more idle hardware resources can accommodate more objects. For example, with 1 GB of free memory and 100 GB of free disk in the system, 23.5 million objects can be stored in ZTE OSS. According to the test results, the performance of ZTE OSS in parallel testing and capacity testing is good [9].

▼Table 6. Capacity test results
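The non-linear behaviour of the list-container operation can be made concrete with a little arithmetic on the three reported throughput figures. The short script below converts requests per second into average time per request and then into extra time per additional object. It treats the reciprocal of throughput as an average cost across the whole test system, which is a simplifying assumption because the clients issue requests concurrently.

```python
# Reported list-container throughput: objects per container -> requests per second.
reported = {10: 1724, 100: 1307, 1000: 281}

sizes = sorted(reported)
ms_per_request = {n: 1000.0 / reported[n] for n in sizes}

for n in sizes:
    print(f"{n:>5} objects: {ms_per_request[n]:.3f} ms per request")

# Extra time per additional object between consecutive container sizes.
extra_us = {}
for small, large in zip(sizes, sizes[1:]):
    delta_ms = ms_per_request[large] - ms_per_request[small]
    extra_us[(small, large)] = delta_ms * 1000 / (large - small)
    print(f"{small} -> {large}: {extra_us[(small, large)]:.2f} us per extra object")
```

Under this reading, a request costs roughly 0.58 ms with 10 objects, 0.77 ms with 100, and 3.56 ms with 1000. The fixed per-request overhead dominates for small containers, so a tenfold increase in objects does not produce a tenfold drop in throughput.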

5 Conclusion

Popular object storage systems are imperfect in areas such as data access, consistency, data encryption, data ownership, and data isolation. When evaluating cloud storage or object-based storage, most companies are concerned chiefly with safety and reliability. As a mature commercial object storage system, Amazon S3 has been serving many companies and organizations for years. However, companies using Amazon S3 have suffered losses because of breakdowns in the system. Developers need to evaluate the benefits and risks and make reasonable trade-offs.
