A Framework for Storage and Retrieval of Distributed Vector Graphics Objects


ISSN: 2005-4238 IJAST 681

¹K. Nithya, ²Amartya Das Sharma, ³B. Vijayakumar

Computer Science Department, Birla Institute of Technology and Science, Pilani, Dubai Campus, UAE.

Abstract

Vector graphical objects play an important role in many image processing applications such as web, mobile authoring, printing and GIS mapping. These objects are stored in various formats such as Scalable Vector Graphics, PostScript, Portable Document Format and Adobe Illustrator Artwork, at one or more sites on the Internet or a local area network. Hence, it is essential to have a framework for storing and retrieving these objects in an organized manner. The present work proposes a framework for efficient storage and retrieval of vector graphical objects in a distributed environment, using content and/or metadata. It comprises a control site and many other sites from which the users can initiate queries for execution. The control site comprises a distributed query execution manager and a query results aggregator. The users can submit queries on any vector graphic object, based on its content and/or metadata. The global query analyzer at the control site parses the input query and sets it up for execution by the distributed query execution manager. The allocation schema provides information on the distribution of data. The sub-queries run in parallel at different sites. The aggregator combines the output SVG objects and transfers them to the users at the query initiation sites. This framework provides both flexibility and scalability in the management of distributed vector graphical objects.

Keywords: content-based image retrieval, distributed database, metadata, NO-SQL, vector graphics, XML parser.

List of Abbreviations: BLOBs: Binary Large OBjects, CDR: CorelDRAW, DWT: Drawing Template, DXF: Drawing eXchange Format, GIF: Graphics Interchange Format, GIS: Geographic Information System, GOMDB: Graphical Object MongoDB, GUI: Graphical User Interface, JPEG: Joint Photographic Experts Group, MAD: Meta-Array Data, NoSQL: Not Only SQL, PGML: Precision Graphics Markup Language, PNG: Portable Network Graphics, RDBMS: Relational Database Management System, RTF: Rich Text Format, SVD: Singular Value Decomposition, SVG: Scalable Vector Graphics, VML: Vector Markup Language, W3C: World Wide Web Consortium, XML: Extensible Markup Language.

1. Introduction

Information retrieval of graphical objects in a distributed environment remains an active area of research. These objects are usually stored in formats such as SVG, PDF, PS, CDR and DXF. Vector graphical objects offer compact file size, resolution independence and scalability to any size without loss of quality. Vector images contain geometric objects such as points, lines, poly-lines, polygons, circles and splines. Building on these basic shapes, they also support paths, cubes and curves as well as text [1]. Business organizations use these images as recognizable, unique trademarks, which help to identify and distinguish their products and services from those of other sources. SVG script is easily readable and can provide faster search based on the mathematical description of graphical objects. Hence it is well suited to metadata and content-based retrieval applications.

The algorithms for search and retrieval of graphical objects make use of various image formats, based on browsing and navigation methods [18]. However, locating the desired images in a large, distributed image repository still poses challenges. This is the main motivation for the current work: to build a framework for efficient storage and retrieval of distributed vector graphical objects.

The SVG specification is an open standard developed by the W3C in 1999. It drew on the designs of the standards PGML (developed from Adobe's PostScript) and VML (developed from Microsoft's RTF) [17]. SVG is an XML-based vector image format, stored as text. SVG files do not contain pixel data; they are rich in spatial data, which can be extracted and processed smoothly [3]. The format is built on readable, searchable XML text files and is a markup language with its own set of tags and elements. For data visualization, SVG files scale well on any type of output device, such as mobile phones, tablet PCs and desktop systems. SVG scripts can be modified easily with different tools to accommodate changes to the data. The metadata is usually embedded within these files [18].

The present work deploys a MongoDB database for storing and retrieving the graphical objects. MongoDB is an open-source, document-oriented NoSQL database. The framework makes use of the metadata and content pertaining to the vector graphic objects. The data structure of MongoDB can be designed as an independent document, and it does not restrict the data model [14]. J. Yoo et al. [22] discussed MongoDB, a document-oriented NoSQL database, in terms of two phases: schema migration and data migration. It supports improvement in query processing. It is highly capable of storing semi-structured and unstructured data [15], and it can handle complex queries and BLOBs.

The data sets considered in this work include different categories of SVG images such as flags, graphs, electrical diagrams, logos and trademarks. The rest of this paper is organized as follows: Section 2 gives an overview of related work on retrieval of vector graphical objects. Section 3 presents a model description covering metadata in SVGs, data specification, data sharding and data distribution. The SVG documents are stored over one or more shards, and the shards are assigned to one or more sites in a distributed environment. In Section 4, the algorithm DIST_SVG_MGT is formulated. It carries out actions such as database creation, text extraction and query processing. The later sections describe the experimental setup, implementation and performance analysis.

2. Background

As digital media grows, good retrieval and management techniques become more important. The traditional search method is limited to keyword-based queries over multimedia data annotated with textual information. However, multimedia databases need content-based search to retrieve the relevant information quickly. A fast similarity retrieval method for vector images uses pre-calculated results for similarity matching. These are obtained in advance by matching database images with previously selected images called representative queries [4]. The user-specified input string is matched with the fingerprints already stored in the multimedia database. The fingerprints can be calculated using hash functions or similarity measures such as Euclidean distance, correlation coefficient and cosine similarity [2]. To protect video content from unauthorized distribution, binary images are used as watermarks embedded into the chrominance blue channel [6]. The channel uses the dual tree complex wavelet transform. SVD decomposes the watermark coefficient matrix into three independent matrices, U, V and S, in the audio and video streams.

The structure of a vector image can be represented as a tree in which each node is assigned to an object region of the vector image [7]. Every link represents the inclusion relationship between two object regions. The trees are generated by separating the object regions from their background. This method of vector image segmentation is used in a content-based vector image retrieval system. R. Fujita et al. [8] proposed a system that retrieves vector images from a database using a sketch given by the user. This system finds similar vector images by matching feature points extracted from the sketch and from each vector image in the database. T. Rayashi et al. [9] discussed a method for extracting WFPs (Weighted Feature Points) from images and proposed a system for sketch-based 2D vector image retrieval. P. Sousa et al. [10] suggested a mechanism to describe the spatial arrangement of drawing elements. For validation, they developed a prototype for the retrieval of clip art drawings in SVG format.

A collaborative image annotation scheme provides a flexible data model to support metadata and multimedia annotations in 2-D medical images [11]. This method implements an SVG-based data model, which can support textual, spatial and collaborative queries on annotations. The architecture of collaborative annotations is also discussed for storing, querying and exchanging annotations. A software architectural model [5] is designed for a distributed multimedia database system. It deals with the components and their inter-relationships for carrying out tasks such as allocation, storage, indexing and retrieval of multimedia information, which also includes vector graphical objects. A descriptor for retrieving clip arts from a database using combined information is discussed in [13]. The combined information is extracted from both raster and vector images. S. Jeschke et al. [23] presented an automatic diffusion curve coloring algorithm to calculate the optimal colors of vector graphics. The parameters regulated the Gabor noise to simulate the textual information. The algorithm estimated the noise color, scale and direction to develop a tool for editing the curve parameters of vector graphics.

A MAD search approach for retrieving information over array and semi-structured distributed databases supports different services to access resource descriptions [12]. It builds a catalog based on meta information, which provides access over different protocols and uniformly examines the existing data with the support of an XML-based web service. K. Abe et al. [24] extracted the feature vectors of an object's contour shape. They organized the contour points on the contour of each local object, extracted shape-based features using the centroid distance, and measured the angle at the points for similarity retrieval of trademark images represented as vector graphics.

The existing literature on vector graphical objects has mainly focused on techniques such as indexing, segmentation and similarity measures. In the present work, we propose a framework for efficiently storing and retrieving distributed vector graphics objects. This model uniquely describes the semantic data of these objects based on different tags. The tags are extracted automatically using an XML parser. Grouping nodes are stacked in a hierarchical fashion, which supports the description of queries. The experimental test of this work primarily focuses on the SVG format using MongoDB. The classification of SVG collections is performed on the range of key values of attributes in the documents, which reside in MongoDB shards. The data model representation proposed in this work is equally applicable to other vector graphics formats.


3. Description of SVG Metadata

Descriptive elements are components that give supplementary information about their parents. The SVG 1.1 standard proposed by the W3C defines the following three descriptive elements:

• <desc> – This tag contains semantic information about the data that follows it. It is never rendered on output. It is optional for creators.

• <title> – This tag contains an explanatory title for the parent tag in which it resides. The standard recommends this tag for creators.

• <metadata> – The metadata in an SVG description should be placed within the metadata element. This tag should contain elements from other XML namespaces.

The present work considers the tags <title>, <desc> and <text> as the sources for extracting the semantic data about the images in the repository. This data is extracted through an XML parser.
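As a rough illustration of this extraction step, the sketch below reads the three tags with Python's standard `xml.etree.ElementTree` module. The paper's own implementation is in C#, so the language, the function name `extract_semantic_data` and the sample SVG are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET

# SVG elements live in this XML namespace; ElementTree needs it spelled out.
SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical sample file, modelled on the example document in Section 3.2.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">
  <title>Hole flow of conductor</title>
  <desc>electrical diagrams</desc>
  <g>
    <text x="5" y="20">hole flow</text>
    <g><text x="5" y="40">conventional current flow</text></g>
  </g>
</svg>"""


def extract_semantic_data(svg_source: str) -> dict:
    """Return the <title>, <desc> and concatenated <text> content of an SVG."""
    root = ET.fromstring(svg_source)
    title = root.findtext(f"{SVG_NS}title", default="")
    desc = root.findtext(f"{SVG_NS}desc", default="")
    # iter() visits <text> elements at every nesting depth, in document order.
    texts = ["".join(node.itertext()).strip() for node in root.iter(f"{SVG_NS}text")]
    return {
        "title": title.strip(),
        "desc": desc.strip(),
        "text": " ".join(t for t in texts if t),
    }


if __name__ == "__main__":
    print(extract_semantic_data(SAMPLE_SVG))
```

The returned dictionary has the same three fields that the later sections store per document.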

3.1 Metadata and Categorization

SVGs can be scaled up or down easily. Depending on its usage, an SVG can be classified into one or more categories. The information about these categories is specified in the XML script. This enables rapid indexing, searching and retrieval. Editing SVGs requires the creators to insert the appropriate (opening and closing) tags with data wherever required. The tags are as follows:

• The parent node for the metadata begins with the tag <svg>.

• The <title> tag is inserted at the beginning of the appropriate block.

• The <title> tag is followed by a <desc> tag in any block.

The SVG images are stored at one or more shards. The shards are assigned to one or more sites in a distributed environment. Inserting the metadata into every SVG image by hand takes time that grows with the total number of images. The current work automates this insertion process and makes it more efficient.

3.2 Data Specification

This section describes the representation and storage details of the SVG image database. The test images fall under the following classification: Deck of Cards, Printing Arts, Electrical Diagrams, Logic Gates, Geometrical Shapes, Icons, Posters, Flags and Graphs. The image database is named GOMDB (Graphical Object MongoDB), as shown in Figure 1. It exists in one or more shards, which are assigned to one or more nodes in a distributed environment. The description and setup of GOMDB shards are given in the succeeding sections. The compact unit of GOMDB is filled with SVG files. It is defined as SVGtrialCollection; its definition is as follows:

SVGtrialCollection = n(SVGtrialDocument) (1)

SVGtrialDocument = <key: value> (2)

The SVGtrialDocument in expression (2) is an ordered key-value pair in which the key ∈ String and equals the name of the value. Figure 12 shows an example SVGtrialDocument: <id: 574…2cea5>, <title: “Hole flow of conductor”>, <desc: “electrical diagrams”>, <text: “hole flow conventional current flow”>. The fundamental unit of data stored in GOMDB is the SVGtrialDocument; a set of such documents forms the SVGtrialCollection of expression (1). Each SVGtrialDocument is divided into two parts: static data and dynamic data. Static data is the invariable identity of the document, such as the object ID, which is a unique value for each document in the collection.
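A minimal sketch of this document model is given below, assuming a plain Python dictionary as the key-value container. The `_id` key follows MongoDB's naming for the object ID; the random hexadecimal identifier merely stands in for a server-generated ObjectId, and the helper names `make_document` and `split_static_dynamic` are hypothetical.

```python
import uuid

# Static part: the invariable identity of a document.
STATIC_FIELDS = ("_id",)
# Dynamic part: field values describing the SVG image.
DYNAMIC_FIELDS = ("title", "desc", "text")


def make_document(title: str, desc: str, text: str) -> dict:
    """Build one SVGtrialDocument as a set of <key: value> pairs."""
    return {
        "_id": uuid.uuid4().hex,  # placeholder for a MongoDB ObjectId (assumption)
        "title": title,
        "desc": desc,
        "text": text,
    }


def split_static_dynamic(document: dict) -> tuple:
    """Separate the static identity from the dynamic field values."""
    static = {k: v for k, v in document.items() if k in STATIC_FIELDS}
    dynamic = {k: v for k, v in document.items() if k in DYNAMIC_FIELDS}
    return static, dynamic


if __name__ == "__main__":
    doc = make_document(
        "Hole flow of conductor",
        "electrical diagrams",
        "hole flow conventional current flow",
    )
    # An SVGtrialCollection is then simply a list of such documents.
    collection = [doc]
    print(split_static_dynamic(collection[0]))
```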

Figure 1: Data model representation of GOMDB documents

Dynamic data comprises the values of the fields. Each value denotes the state of a field such as title, desc or text. The framework handles all the given SVG classifications in the SVGtrialDocuments. The classification is organized into multiple collections, each containing different data fields pertaining to the SVG images. In RDBMS terminology, each collection corresponds to a table and each document to a record. MongoDB supports semi-structured and unstructured data without a fixed schema.

The data model representation of GOMDB documents is shown in Figure 1. The data model shows the structure of normalization and aggregation. Although MongoDB does not directly support the join operation, sub-documents are linked by reference when documents are embedded. The embedded documents include one-to-one and one-to-many relationships, which vary across the different classifications of SVG files. The collection of flags shows a one-to-one relationship and the collection of graphs a one-to-many relationship. For better use of the graphical objects, the precise metadata and/or content are extracted from the original SVGtrialCollections.

3.3 Data Sharding and Query Plan

Sharding is a technique for partitioning and distributing data across multiple servers. It enables the nodes to store and manage a greater load among several sites [16, 19]. The distribution of data over the shards is governed by the shard key. In the current work, the data distribution mechanism of GOMDB divides the SVGtrialCollections into multiple partitions and distributes them among S sites, as shown in Figure 3. The partition and distribution processes are based on the range of key values of attributes in the SVGtrialCollections. The ranges are chosen according to the classification of the SVG collections residing in GOMDB. This is achieved by distributing the files over Ps shards. All shards are logically partitioned by the shard key and configured at various sites. Based on the size of the SVG categories, each site handles the prime shards, which hold the original data of the SVGtrialCollections. The partition and distribution processes provide more space to store a huge volume of SVG objects. The work flow for distributed query processing of SVG objects is shown in Figure 2.
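The range-based partitioning described above can be illustrated in a few lines. This is only a sketch under stated assumptions: the shard key is taken to be the lower-cased category in the `desc` field, and the boundary values and site names are invented for the example; the paper does not list its actual ranges.

```python
from bisect import bisect_right

# Hypothetical range boundaries on the shard key. With two boundaries there
# are three shards: keys below "f", keys in ["f", "l"), and keys from "l" up.
BOUNDARIES = ["f", "l"]
SHARD_SITES = ["site-A", "site-B", "site-C"]


def shard_for(key: str) -> int:
    """Return the index of the shard whose key range contains `key`."""
    return bisect_right(BOUNDARIES, key.lower())


def distribute(documents: list) -> dict:
    """Assign every document to the site that owns its key range."""
    shards = {site: [] for site in SHARD_SITES}
    for doc in documents:
        site = SHARD_SITES[shard_for(doc["desc"])]
        shards[site].append(doc)
    return shards


if __name__ == "__main__":
    docs = [
        {"desc": "electrical diagrams"},
        {"desc": "flags"},
        {"desc": "graphs"},
        {"desc": "logos"},
    ]
    for site, held in distribute(docs).items():
        print(site, [d["desc"] for d in held])
```

Because the ranges follow the category names, all documents of one SVG category land on the same shard, which is the property the text relies on when a query targets a specific classification.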

The sequence of the work flow for distributed query processing of SVG objects is as follows:

• The user authentication step verifies the user login and password for validity.

• For any valid user, the input query is read from standard input.

• The Mongo server configuration is verified for its active/inactive state. If active, the connection is established and the query is passed to the global query analyzer/parser.

• The distributed query execution manager takes the parsed query and chooses the SVG object. It executes the algorithm DIST_SVG_MGT concurrently at each site with the support of the application driver and query router. It processes the input query submitted by the user from the query initiation site. The workflow for the distributed query execution manager at the control site is shown in Figure 3.

• The metadata or content description pertaining to the input query is accepted and extracted from the specific shard where the data is located.

• The extracted data is sent to the MongoDB aggregator, which aggregates the results. Finally, the results are retrieved and returned to the user.

Figure 2: Work flow for distributed query processing of SVG Objects
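The parallel sub-query and aggregation steps of this work flow can be sketched as follows. The example is a simplification under stated assumptions: each site is modelled as an in-memory list rather than a MongoDB shard, the sample documents and site names are invented, and matching is a case-insensitive substring test, which the paper does not specify.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site data; a real deployment would query MongoDB shards.
SITES = {
    "site-A": [{"title": "Hole flow of conductor", "desc": "electrical diagrams"}],
    "site-B": [
        {"title": "Flag of Andorra", "desc": "flags"},
        {"title": "Current flag sketch", "desc": "flags"},
    ],
}


def run_subquery(site: str, keyword: str) -> list:
    """Sub-query executed at one site: match on title or desc."""
    kw = keyword.lower()
    return [
        dict(doc, site=site)
        for doc in SITES[site]
        if kw in doc["title"].lower() or kw in doc["desc"].lower()
    ]


def execute_distributed(keyword: str, limit=None) -> list:
    """Run the sub-queries in parallel and aggregate their partial results."""
    site_names = sorted(SITES)
    with ThreadPoolExecutor(max_workers=len(site_names)) as pool:
        # map() returns the partial results in the order of site_names.
        partials = list(pool.map(lambda s: run_subquery(s, keyword), site_names))
    merged = [doc for part in partials for doc in part]
    return merged if limit is None else merged[:limit]


if __name__ == "__main__":
    for hit in execute_distributed("flag"):
        print(hit["site"], hit["title"])
```

The `limit` argument mirrors the "number of results to be returned" parameter that the user supplies in Section 6.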


Figure 3: Distributed Query Execution Manager

4. Algorithm: DIST_SVG_MGT

The algorithm DIST_SVG_MGT works in two phases, namely INITIALIZER and QUERY_PROCESSOR.

INITIALIZER: This phase populates the SVG image database when a repository with images is given to it at any given site. This involves inserting metadata, extracting metadata and feature data from the files. The extracted data is then inserted into MongoDB. The logical flow of the INITIALIZER algorithm is shown in Figure 4.

Figure 4: Logical flow of the INITIALIZER at each site

QUERY_PROCESSOR: This phase accepts a query, given either as an input SVG object or as a textual descriptor of an SVG object, and facilitates its execution. The logical flow of the QUERY_PROCESSOR algorithm at any site is shown in Figure 5.

Figure 5: Logical flow of the QUERY_PROCESSOR at each site

[Flowchart box labels for the INITIALIZER (Figure 4): Detect SVG files within a sub-directory; Extract filenames and path for each file; Insert the extracted data into the SVG's <title> and <desc> tags and save files; Scan over every file in the system and extract the metadata; Extract all the textual description from all files; Make a MongoDB database and insert all the data from the repository.]


4.1 INITIALIZER (Phase 1)

The detailed algorithm for INITIALIZER is given below.

Input: A directory name Dtry
       A MongoDB collection object coll
Output: The input collection object coll populated with metadata and feature data

begin
    for each subdirectory Stry in Dtry do
        INITIALIZER(Stry, coll);
    for each .svg file vecf in Dtry do
        load the file into memory;
        if innertext in <title> = null then
            innertext in <title> = filename of vecf;
        if innertext in <desc> = null then
            innertext in <desc> = Dtry;
        save the file back to disk;
    for each .svg file vecf in Dtry do
        load the file into memory;
        title = innertext in <title>;
        desc = innertext in <desc>;
        text = TextExtractor(vecf);
        insert title, desc, text in coll;
        close file;
    return
end

Figure 6: INITIALIZER algorithm at each site
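For readers who prefer runnable code, the pseudocode of Figure 6 can be approximated as below. This is a sketch, not the authors' C# implementation: a Python list stands in for the MongoDB collection, `os.walk` replaces the explicit recursion over subdirectories, the two passes over the files are merged into one, and the directory's base name (rather than its full path) is used as the fallback description. The helper name `ensure_child` is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

SVG_URI = "http://www.w3.org/2000/svg"
NS = "{" + SVG_URI + "}"
ET.register_namespace("", SVG_URI)  # keep SVG as the default namespace on save


def ensure_child(root, tag, fallback, position):
    """Create <tag> if it is absent and fill it with `fallback` if it is empty."""
    node = root.find(NS + tag)
    if node is None:
        node = ET.Element(NS + tag)
        root.insert(position, node)
    if not (node.text or "").strip():
        node.text = fallback
    return node.text.strip()


def initializer(directory, coll):
    """Populate `coll` (a list standing in for a MongoDB collection)."""
    for current, _subdirs, files in os.walk(directory):
        for name in sorted(files):
            if not name.lower().endswith(".svg"):
                continue
            path = os.path.join(current, name)
            tree = ET.parse(path)
            root = tree.getroot()
            # Metadata insertion: file name -> <title>, directory -> <desc>.
            title = ensure_child(root, "title", os.path.splitext(name)[0], 0)
            desc = ensure_child(root, "desc", os.path.basename(current), 1)
            tree.write(path, encoding="utf-8", xml_declaration=True)
            # Feature extraction: all <text> content at any nesting depth.
            text = " ".join(
                "".join(n.itertext()).strip() for n in root.iter(NS + "text")
            )
            coll.append({
                "file_name": name,
                "file_path": current,
                "title": title,
                "desc": desc,
                "text": text,
            })
    return coll
```

Calling `initializer("/path/to/repository", [])` on a directory tree of SVG files returns one dictionary per file; the path shown here is a placeholder.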

The actions of the INITIALIZER algorithm in Figure 6 are explained as follows:

• The INITIALIZER connects to the MongoDB server and creates a new collection for the files.

• It accepts the name of the directory where the SVG files are stored.

• It loops over every subdirectory and every SVG image residing in it, opening each file (with the XML package) and inserting the file name and directory name into the <title> and <desc> tags, respectively, where these are empty. If there are n files in the directory, this process takes O(n) time.

• The INITIALIZER then inserts data into the database. It extracts the following from every file:

  • The titles

  • The descriptions (i.e. the metadata)

  • The text

4.1.1 Text Extractor

The Text_Extractor algorithm uses an iterative loop to find every <text> tag within an SVG file. It terminates once the tags have been exhausted. The <g> nodes are stacked in a hierarchical fashion and can be accessed together using XPath for a particular level of nesting. This process is visualized in Figure 7.

Figure 7: Logical flow of the Text_Extractor at each site

The detailed steps of the Text_Extractor algorithm are given below:

Input: An SVG file vecf
Output: A string txt of all the text in the file

begin
    CREATE the initial <xml> tag as the root;
    retString = null;
    while true do
        txtlist = list of all nodes in vecf with XPath = root + <text>;
        if txtlist = null then
            break;
        for every node n in txtlist do
            APPEND innertext of n to retString;
        // next, extract the <text> nodes nested within <g> tags
        root = root + <g>;
    return retString;
end

Figure 8: Text_Extractor algorithm at each site

The steps of the Text_Extractor algorithm in Figure 8 are described as follows:

• A grouping node <g> groups together a set of nodes that share common properties. Care is taken in grouping the nodes, since this helps to extract the content efficiently. The grouping nodes are stacked in a hierarchical fashion and accessed by XPath, which supports the description of queries.

• Assume an arbitrary number of <g> nesting levels (say g1, g2, …, gn), with the root <svg> node as g1, and an arbitrary number of <text> nodes at each level (t1, t2, …, tn), as used in expression (3). The algorithm then works as follows:

  • Switch to g1 in constant time O(c1). This is constant time since switching to g1 requires one operation, namely appending a <g> tag to the XPath string.

  • Iterate over all t1 nodes and extract the text with time complexity O(t1), since it is a linear loop.

  • Switch to g2 in constant time O(c2).

  • Iterate over all t2 nodes and extract the text with time complexity O(t2).

  • Continue until some gn, which exhausts all the tags in the file.

• Hence the overall expression for the time complexity can be written as

  $$\sum_{i=1}^{n} \bigl( O(c_i) + O(t_i) \bigr) = O(c_1) + O(t_1) + O(c_2) + O(t_2) + \dots + O(c_n) + O(t_n) \qquad (3)$$

• The significant term in expression (3) can also be written as O(t), where t = t1 + t2 + … + tn is the total number of <text> tags in the document.

• In the Initializer step, instead of calling the method recursively through the subdirectories, the algorithm pushes the subdirectories onto a stack and iteratively pops directories from the stack until it is empty. This traversal requires O(m) time, where m is the number of subdirectories.

• Within every directory, the SVG files are accessed and the Text_Extractor method is called once per file. Assuming that there are m subdirectories and n files in each subdirectory, the worst-case time complexity of the text extractor algorithm is O(mnt).
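The level-by-level traversal analysed above can be sketched in Python with the limited XPath support of `xml.etree.ElementTree`. Two points are assumptions of this sketch rather than statements from the paper: the XPath starts at the `<svg>` root, and the loop stops when no `<g>` node exists at the next level, a slight variation on the `txtlist = null` test of Figure 8 that keeps going when a level contains groups but no text.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2000/svg}"


def text_extractor(svg_source: str) -> str:
    """Collect <text> content one <g> nesting level at a time."""
    root = ET.fromstring(svg_source)
    prefix = "."          # XPath of the current nesting level
    collected = []
    while True:
        # O(t_i): visit every <text> node that sits exactly at this level.
        for node in root.findall(f"{prefix}/{NS}text"):
            content = "".join(node.itertext()).strip()
            if content:
                collected.append(content)
        # O(c_i): descend one level by appending a <g> step to the XPath.
        prefix = f"{prefix}/{NS}g"
        if not root.findall(prefix):
            break         # no deeper <g> groups remain
    return " ".join(collected)


if __name__ == "__main__":
    svg = (
        '<svg xmlns="http://www.w3.org/2000/svg">'
        "<g><g><text>deep</text></g><text>mid</text></g>"
        "<text>top</text></svg>"
    )
    print(text_extractor(svg))
```

Note that the text is returned in level order (outermost first), not in document order, because each pass handles one depth of `<g>` nesting.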

4.2 Query_Processor (Phase 2)

The detailed algorithm for QUERY_PROCESSOR phase is given below.

Algorithm: Text_Query
Input: A string txt
Output: none

begin
    Query(txt, txt, txt);
    return
end

Figure 9: Text_Query algorithm at each site

Algorithm: MatchList_Query
Input: Strings title, desc, text
Output: A list of matches mtchlist

begin
    load all working MongoDB collections into collgp;
    for every collection coll in collgp do
        mtchlist.append(coll.find(“title”, title));
        mtchlist.append(coll.find(“desc”, desc));
        mtchlist.append(coll.find(“text”, text));
    return mtchlist;
end

Figure 10: MatchList_Query algorithm at each site

Algorithm: Image_Query
Input: An SVG image img
Output: none

begin
    load img into memory;
    title = innertext of <title> tag in img;
    desc = innertext of <desc> tag in img;
    txt = TextExtractor(img);
    Query(title, desc, txt);
    return
end

Figure 11: Image_Query algorithm at each site

The actions of the QUERY_PROCESSOR algorithm are given below:

• It accepts a text or an image query.

• If it is an image query, it is converted to a text query as follows:

  • The image is opened with the XML parser.

  • The contents of the <title>, <desc> and <text> tags are extracted.

  • These are converted into string arrays. The program then sends requests with the textual queries to the MongoDB driver linked to the database.

• The program then retrieves the relevant images and updates the GUI.
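The three query routines of Figures 9 to 11 can be approximated together as shown below. The sketch assumes in-memory collections instead of a MongoDB driver, case-insensitive substring matching (the paper does not state the matching rule), and de-duplication of hits by `_id`; it returns the matches instead of updating a GUI. The sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2000/svg}"

# Hypothetical stand-in for the working GOMDB collections.
COLLECTIONS = {
    "electrical": [{
        "_id": 1,
        "title": "Hole flow of conductor",
        "desc": "electrical diagrams",
        "text": "hole flow conventional current flow",
    }],
    "flags": [{"_id": 2, "title": "Flag of Andorra", "desc": "flags", "text": ""}],
}


def find(coll, field, needle):
    """Return documents whose `field` contains `needle` (case-insensitive)."""
    needle = needle.strip().lower()
    if not needle:
        return []
    return [doc for doc in coll if needle in doc[field].lower()]


def matchlist_query(title, desc, text):
    """Query every collection on title, desc and text; drop duplicate hits."""
    matches, seen = [], set()
    for coll in COLLECTIONS.values():
        for field, needle in (("title", title), ("desc", desc), ("text", text)):
            for doc in find(coll, field, needle):
                if doc["_id"] not in seen:
                    seen.add(doc["_id"])
                    matches.append(doc)
    return matches


def text_query(txt):
    """A text query searches all three fields for the same string."""
    return matchlist_query(txt, txt, txt)


def image_query(svg_source):
    """Convert an SVG image into a text query, then run it."""
    root = ET.fromstring(svg_source)
    title = root.findtext(NS + "title", default="")
    desc = root.findtext(NS + "desc", default="")
    text = " ".join("".join(n.itertext()).strip() for n in root.iter(NS + "text"))
    return matchlist_query(title, desc, text)


if __name__ == "__main__":
    print([d["title"] for d in text_query("current")])
```

The loop over `COLLECTIONS` is the part that gives the O(X) factor discussed in the time-complexity analysis, with X the number of collections.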

4.2.1 Query_Processor: Time Complexity

• The Text_Query method in Figure 9 only calls the Query method; hence it takes constant time, say O(c1), to complete.

• The Image_Query method in Figure 11 opens the image, extracts its contents and calls the Query method. The <title> and <desc> extraction takes constant time (say O(c2)) and the Text_Extractor takes O(t) time, t being the number of <text> tags in the image. Hence, in total, the method takes O(c2) + O(t) time.

• The Query method queries all available collections in GOMDB with the given data. The algorithm loops over the collections; hence the method completes in O(X) time, X being the number of available collections.

• Hence, for text queries the algorithm takes O(c1X) time, and for image queries it takes O(X(c2 + t)) time.

5. Experimental Setup

The development environment for the current work is Visual Studio 2015. The source code is written in C#. The XML package parses the SVG files. The SVG image database is created with the name GOMDB. This database organizes the storage and retrieval of graphical objects using the MongoDB system. The query results are generated by C# methods and the MongoDB drivers.

6. Implementation

This section describes the database collection and test scenario for graphical objects under MongoDB at the control site. The sources of test data are the SVG files from Wikimedia Commons [20]. They are extracted using the batch downloader tool Imker [21]. In the test scenario, nearly 24,000 image files, of total size 1.42 GB, have been downloaded. Figure 12 illustrates the data collection for GOMDB with a small section of three documents, although the actual collection contains 24,000 documents. The documents have the following fields:

• Object ID – Auto-generated unique ID

• File name – The name of the file

• File path – The directory where the file is located

• Title – The contents of the <title> tag in the file (the document title)

• Desc – The contents of the <desc> tag in the file (the document description)

• Text – All the text in the file (extracted as a feature)

Figure 12: A typical GOMDB database collection at a single site

6.1 Search by Content

This section discusses the search strategy based on the textual content data present inside the SVG objects. The user specifies the following input parameters from the query initiation site: repository name(s), textual content data and the number of results to be returned. At this stage, a drop-down menu displays a list of SVG object descriptors, from one or more sites, that are relevant to the textual content data. The user can select the desired SVG object descriptor from the list and preview the corresponding image. Figure 13 shows the results for the matching textual content data "current".

Figure 13: Results for Content based Search

6.2 Search by Metadata

This section discusses the search strategy based on the metadata of the SVG objects. The user specifies the following input parameters from the query initiation site: repository name(s), metadata and the number of results to be returned. At this stage, a drop-down menu displays a list of SVG object descriptors, from one or more sites, that are relevant to the metadata. The user can select the desired SVG object descriptor from the list and preview the corresponding SVG image. Figure 14 shows the results for the matching metadata "andorra".

Figure 14: Results for Metadata based Search

7. Experimental Results

The experimental analysis of the Initializer algorithm has been carried out for GOMDB database write operations at a local site as well as a remote site. The number of SVG images processed during successive runs is varied between 371 and 5281 images, corresponding to total sizes of 16.4 MB and 434 MB respectively. For the remote site, the average data transfer rate in the test environment was 5.36 MB/sec. Figure 15 shows the performance graph for the Initializer algorithm at the local site and the remote site. In both cases, the database write time increases with the total size of the images.

Figure 15: Performance Graph for Initializer Phase

The Retrieval Time for the user query depends on the number and size of the matched SVG images from the database and the data transfer rate. It is computed as follows:

Retrieval Time at the Local Site = P / Q

Retrieval Time from a Remote Site = R (1/S + 1 / T)


where,

P: Total Size of all the matched SVG images from the database at the local site.

Q: Data Transfer Rate for the local Server’s Hard Drive.

R: Total Size of all the matched SVG images from the database at the remote site.

S: Data Transfer Rate for the Remote Server’s Hard Drive.

T: Data Transfer Rate for network communication between Remote Site and the Query Initiation Site.
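As a small worked example of these two formulas, the sketch below evaluates them for illustrative values. The 434 MB size and the 5.36 MB/sec network rate are borrowed from the Initializer experiment purely as sample magnitudes; the 100 MB/sec disk rate is an assumption, since the paper does not report Q or S.

```python
def retrieval_time_local(p_mb: float, q_mb_per_s: float) -> float:
    """Retrieval time at the local site: P / Q."""
    if q_mb_per_s <= 0:
        raise ValueError("transfer rate must be positive")
    return p_mb / q_mb_per_s


def retrieval_time_remote(r_mb: float, s_mb_per_s: float, t_mb_per_s: float) -> float:
    """Retrieval time from a remote site: R * (1/S + 1/T)."""
    if s_mb_per_s <= 0 or t_mb_per_s <= 0:
        raise ValueError("transfer rates must be positive")
    return r_mb * (1.0 / s_mb_per_s + 1.0 / t_mb_per_s)


if __name__ == "__main__":
    size_mb = 434.0      # sample total size of matched images (illustrative)
    disk_rate = 100.0    # assumed hard-drive rate in MB/sec (not from the paper)
    network_rate = 5.36  # MB/sec, the remote-site rate quoted in Section 7
    print(f"local : {retrieval_time_local(size_mb, disk_rate):.2f} s")
    print(f"remote: {retrieval_time_remote(size_mb, disk_rate, network_rate):.2f} s")
```

With these values the network term R/T dominates the remote retrieval time, which is consistent with the remote write times in Figure 15 growing faster than the local ones.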

8. Conclusions and Future Work

This paper presented a novel framework for storage and retrieval of vector graphical objects in a distributed environment. It includes components for user authentication, global query formulation, global query parsing, distributed query execution management and query result aggregation. It runs the core algorithm DIST_SVG_MGT to insert and extract the metadata and/or content of SVG images using an XML parser. Grouping nodes are stacked in a hierarchical fashion to describe the queries. The user has the flexibility to specify metadata, content or both in framing queries over scalable vector graphics objects. The data model representation of the SVG classification has been implemented using MongoDB. The classification of SVG collections is performed on the range of key values of attributes in the documents, which reside in MongoDB shards. The present work can be extended by including additional functions such as ranking of vector images on similarity measures. The framework discussed in the present work provides both flexibility and scalability in the management of distributed vector graphical objects.

Acknowledgments

The authors would like to acknowledge BITS PILANI DUBAI CAMPUS, Dubai, UAE for providing the experimental facilities.

References

1. H. Sever, A. Şenol, and E. Elbaşı, “Block Size Analysis for Discrete Wavelet Watermarking and Em- bedding a Vector Image as a Watermark,” International Arab Journal of Information Technology., 16(6), 2019, pp. 1036-1043.

2. B. Jonathan, “Image Detection, Watermarking Vs Fingerprinting,” https://www.plagiarismtoday.com/2009/12/02/image-detection-watermarking-vs-fingerprinting. 2009.

3. Pdengler, and Clilley, “Secret origin of SVG,” https://www.w3.org/Graphics/SVG/WG/wiki/Secret_Origin_of_SVG. 2008.

4. T. Hayashi, and A. Sato, “Fast similarity retrieval of vector images using representative queries,” In 2013 IEEE International Symposium on Multimedia. Anaheim., 2013, pp. 498-499.

5. K. Nithya, and B. Vijayakumar, “Architectural Design for Allocation, Storage and Retrieval of Distributed Multimedia Data,” In Proc. of the Int. Multi conf. of Engineers and Scientists., 2015, pp. 184-188.

6. T. Geetamma and T. Prabhakar, “Evaluating SVD and DTCWT based Hybrid Video Watermarking using Chrominance Embedding,” International Journal of Advanced Science and Technology., vol. 28(15), 2019, pp. 628-640.

7. T. Hayashi, R. Onai, and K. Abe, “Vector image segmentation for content-based vector image retrieval,” In Proc. of the Seventh IEEE Int. Conf. on Computer and Information Technology., 2007, pp. 695-700.

8. R. Fujita, and T. Hayashi, “Vector image retrieval based on approximation of bezier curves with line segments,” In Proc. of the IEEE Pacific Rim Conf. on Communications, Computers and Signal Processing., 2011, pp. 431-436.

9. T. Hayashi, T. Kiyono, K. Abe, and R. Onai, “Retrieval of 2D vector images by matching weighted feature points,” In Proc. of the Fifteenth IEEE Int. Conf. on Image Processing., 2008, pp. 961-964.

10. P. Sousa, and M.J. Fonseca, “Sketch-based retrieval of drawings using spatial proximity,” Journal of Visual Languages & Computing., 21(2), 2010, pp. 69-80.

11. F. Wang, C. Rabsch, and P. Liu, “Native web browser enabled SVG-based collaborative multimedia annotation for medical images,” In Proc. of the IEEE Twenty Fourth Int. Conf. on Data Engineering., 2008, pp. 1219-1228.

12. P. Liakos, P. Koltsida, G. Kakaletris, P. Baumann, Y. Ioannidis, and A. Delis, “A distributed infrastructure for earth-science big data retrieval,” International Journal of Cooperative Information Systems., doi: 10.1142/S0218843015500021, 24(02), 2015, p. 1550002.

13. P. Martins, R. Jesus, M.J. Fonseca, and N. Correia, “Clip art retrieval combining raster and vector methods,” In Proc. of the Eleventh IEEE Int. Workshop on Content-Based Multimedia Indexing., 2013, pp. 35-40.

14. Y.S. Kang, I.H. Park, J. Rhee, and Y.H. Lee, “MongoDB-based repository design for IoT-generated RFID/sensor big data,” IEEE Sensors Journal., 16(2), 2016, pp. 485-497.

15. W. Jiang, L. Zhang, W. Qiang, H. Jin, and Y. Peng, “MyStore: A High Available Distributed Storage System for Unstructured Data,” In Proc. of the IEEE Fourteenth Int. Conf. on High Performance Computing and Communication & 2012 IEEE Ninth Int. Conf. on Embedded Software and Systems., 2012, pp. 233-240.

16. A. Verma, X. Llora, S. Venkataraman, D.E. Goldberg, and R.H. Campbell, “Scaling eCGA model building via data-intensive computing,” In Proc. of the IEEE Congress on Evolutionary Computation., 2010, pp. 1-8.

17. S. Probets, J. Mong, D. Evans, and D. Brailsford, “Vector graphics from PostScript and Flash to SVG,” In Proc. of the 2001 ACM Symposium on Document engineering., 2001, pp. 135-143.

18. J. Jiang, C.J. Yang, Y.C. Ren, and M. Jiang, “The research and application of WFS based GML,” In Proc. of the Twenty First ISPRS Congress., 2008, pp. 1769-1772.

19. E. Dede, M. Govindaraju, D. Gunter, R.S. Canon, and L. Ramakrishnan, “Performance evaluation of a mongodb and hadoop platform for scientific data analysis,” In Proc. of the Fourth ACM workshop on Scientific cloud computing., 2013, pp. 13-20.

20. Perhelion, “Category: Scalable Vector Graphics,” https://commons.wikimedia.org/wiki/Category:Scalable_Vector_Graphics. 2005.

21. MarcoFalke, “ImkerV15.08.1,” https://github.com/MarcoFalke/wiki-java-tool/releases/tag. 2015.

22. J. Yoo, K.H. Lee, and Y.H. Jeon, “Migration from RDBMS to NoSQL Using Column-Level Denormalization and Atomic Aggregates,” Journal of Information Science & Engineering., 2018, 34(1).

23. S. Jeschke, D. Cline, and P. Wonka, “Estimating color and texture parameters for vector graphics,” In Computer Graphics Forum., vol. 30(2), 2011, pp. 523-532.

24. K. Abe, H. Morita, and T. Hayashi, “Similarity retrieval of trademark images by vector graphics based on shape characteristics of components,” In: ICCAE., 2018, pp. 82–86.