• No results found

A Framework forhigh Performance Image AnalysisPipelines overcloud Resources

N/A
N/A
Protected

Academic year: 2021

Share "A Framework forhigh Performance Image AnalysisPipelines overcloud Resources"

Copied!
20
0
0

Loading.... (view fulltext now)

Full text

(1)

A Framework for High Performance Image Analysis Over Cloud Resources

A Framework for High Performance Image

Analysis Pipelines over Cloud Resources

Raúl Ramos-Pollán, Ángel Cruz-Roa, Fabio González-Osorio

Bioingenium Research Group

(2)

A Framework for High Performance Image Analysis Over Cloud Resources

Motivation

Bioingenium research group

www.bioingenium.unal.edu.co

Large scale image processing pipelines

Initially focused on medical imaging

Support for machine learning processes

Seldom availability of computing resources

Limitation on collections size, applicability,

algorithms design

(3)

A Framework for High Performance Image Analysis Over Cloud Resources

Strategy

Build on experiences with Hadoop / Grid, etc.

Profit from whatever resources available

Decouple algorithm design from deployment

Adopt Big Data principles and technologies

Streamline software development process

Unify coherent algorithms repository

(4)

A Frameworkfor High Performance Image Analysis Over Cloud Resources

Image Processing Pipelines

2 Image Processing Framework

PATCH EXTRACTION INPUT IMAGES FEATURES EXTRACTION FEATURES CLUSTERING LATENT SEMANTICS AUTOMATIC ANNOTATION

(5)

A Frameworkfor High Performance Image Analysis Over Cloud Resources

Image Processing Framework

2 Image Processing Framework

NOSQL DATABASE CLIENT at EXPERIMENTER’S DESKTOP SCHEDULE GENERATOR EXPERIMENTER SCHEDULE PIPELINE DEFINITION STAGE 1 STAGE 2 STAGE 3 STAGE 4 INPUT OUTPUT DATA WORKERS (AMAZON, DESKTOPs) WORKER ADDED LATER WORKER WORKER

(6)

A Framework for High Performance Image Analysis Over Cloud Resources

Pipeline definition example

2 Image Processing Framework

####################################### # FIRST STAGE: Patch Sampling

####################################### stage.01.task: ROIsFeatureExtractionTask stage.01.numberOfPartitions: 10 stage.01.roiAlgorithm: RandomPatchExtractor stage.01.feAlgorithm: GrayHistogram stage.01.RandomPatchExtractor.blockSize: 18 stage.01.RandomPatchExtractor.Size: 256 stage.01.input.table: corel5k.imgs stage.01.output.table: corel5k.sample ########################################### # SECOND STAGE: CodeBook Construction

########################################### stage.02.task: KMeans stage.02.KMeans.kValue: 10 stage.02.KMeans.maxNumberOfIterations: 5 stage.02.KMeans.minChangePercentage: 0.1 stage.02.KMeans.numberOfPartitions: 10 stage.02.input.table: corel5k.sample stage.02.output.table: corel5k.codebook #################################### # THIRD STAGE: Bag of Features Histograms #################################### stage.03.task: BagOfFeaturesExtractionTask stage.03.numberOfPartitions: 10 stage.03.maxImageSize: 256 stage.03.minImageSize: 20 stage.03.indexCodeBook: true stage.03.featureExtractor: GrayHistogram stage.03.patchExtractor: RegularGridPatchExtractor stage.03.spatialLayoutExtractor: IdentityROIExtractor stage.03.codeBookID: STAGE-OUTPUT 2 stage.03.RegularGridPatchExtractor.blockSize: 18 stage.03.RegularGridPatchExtractor.stepSize: 9 stage.03.input.table: corel5k.imgs stage.03.output.table: corel5k.histograms SCHEDULE OF JOBS

(7)

A Frameworkfor High Performance Image Analysis Over Cloud Resources

Framework architecture

2 Image Processing Framework

ITERATIVE DATA PARTITION

CORE

schedule generator data management jobs wrapper W O R K E R CMD LINE TOOL STORAGE API DYNAMO DB IMPL KMEANS BAG of FEATURES JWS LAUNCHER CMD LINE LAUNCHER TASKS API HBASE IMPL FILE SYSTEM IMPL AMZON VM LAUNCHER EXPERIMENTER UNDERLYING NoSQL STORAGE TASK PATTERN API SCHEDULES INPUT/OUTPUT DATASETS WEB CONSOLE WORKER API CLIENT API

(8)

A Framework for High Performance Image Analysis Over Cloud Resources

Algorithms patterns

2 Image Processing Framework

BEFORE PROCESSING ALL PARTITIONS AFTER PROCESSING ALL PARTITIONS S A M E P R O C E S S START PARTITION FINALIZE PARTITION PROCESS DATA ITEM S A M E P R O C E S S START PARTITION FINALIZE PARTITION PROCESS DATA ITEM

PARTITION 1 PARTITION 2 S A M E P R O C E S S START PARTITION FINALIZE PARTITION PROCESS DATA ITEM PARTITION N INPUT DATA PARTITION OUTPUT DATA PARTITION PROCESS DATA ITEM BEFORE PROCESSING ALL PARTITIONS AFTER PROCESSING ALL PARTITIONS P A R T IT IO N 1 P A R T IT IO N 2 P A R T IT IO N N FINALIZE ITERATION START ITERATION

DATA PARTITION

ITERATIVE ALGORITHMS

ONLINE ALGORITHMS

(9)

A Framework for High Performance Image Analysis Over Cloud Resources

Goals for Experiments on the Cloud

After using our framework in an opportunistic

setting

Understand how to deploy our framework on

Amazon Cloud Services

Validate transparency for the experimenter

Understand cost patterns

(10)

A Frameworkfor High Performance Image Analysis Over Cloud Resources

Cloud deployment

3 Experimentation EXPERIMENTER Dynamo DB S3 LOCAL FRAMEWORK PIPELINES SCHEDULES DATASETS WORKER WORKER WORKER WEB BROWSER AMAZON API AMAZON LOCAL NETWORK WORKER WORKER MANAGE WORKERS (LAUNCH, SHUTDOWN) AMAZON CLOUD

(11)

A Framework for High Performance Image Analysis Over Cloud Resources

Cloud usage

3 Experimentation

# prepare files with AWS credentials and desired pipeline

> load.imgs images /home/me/clef/images

306,530 files loaded into table images

> pipeline.load bof.pipeline

pipeline successfully loaded. pipeline number is 4

> pipeline.prepare 4

pipeline generated 56 jobs

> aws.launchvms ami-30495 15

launching 15 instances of ami-30495

> pipeline.info 4

…..

> aws.termvms ami-30495 5

(12)

A Frameworkfor High Performance Image Analysis Over Cloud Resources

Experiments setup

3 Experimentation

simple experiments before committing budget for Amazon

500KB images

Amazon EC2 t1.micro instances (613MB 1 virtual core)

fe-simple simmulated image feature extraction 0-2 secs

fe-complex simmulated image feature extraction 5-8 secs

(13)

A Frameworkfor High Performance Image Analysis Over Cloud Resources 3 Experimentation

FE SIMPLE

FE COMPLEX

Experiments result 100 imgs

(14)

A Frameworkfor High Performance Image Analysis Over Cloud Resources 3 Experimentation

FE SIMPLE

FE COMPLEX

Experiments results 1000 imgs

(15)

A Frameworkfor High Performance Image Analysis Over Cloud Resources

1000 imgs

3 Experimentation

100 imgs

(16)

A Frameworkfor High Performance Image Analysis Over Cloud Resources 3 Experimentation

Additional experiments

Images from ImageCLEF2012 medical challenge.

http://www.imageclef.org/2012/medical

306,530 medical images

from journal papers in biomedical sciences

x-rays, MRI, PET, diagrams, ultrasound, microscopy, gammagraphy, …

mostly in JPEG format at medium resolution (avg width 512 pixels).

(17)

A Frameworkfor High Performance Image Analysis Over Cloud Resources 3 Experimentation

Additional experiments

CEDD EXTRACTION OF 3K ImageCLEF Medical

3 STAGE PIPELINE ON 30K

ImageCLEF Medical

1 Patch extraction

2 Kmeans – codebook

(18)

A Framework for High Performance Image Analysis Over Cloud Resources 3 Experimentation

Additional experiments

Submission to ImageCLEF2012 medical

All 300K images processed and annotated

36 workers over 6 scattered servers

One run (parameters configuration)

116 mins elapsed time

40 hours compute time

Ad-hoc image retrieval task:

1st out of 53 in textual classification

3rd out of 36 in visual classification

(19)

A Framework for High Performance Image Analysis Over Cloud Resources

Conclusions

Tool for supporting the full image processing lifecycle

First scalability behavior on Amazon Cloud

Decoupled algorithm design from deployment

Adapted to our reality

grasp whatever computer

resources available

Streamlined internal software process

Unified software repository

AGILE EXPERIMENTATION LIFECYCLE

LITTLE A PRIORI KNOWLEDGE ON COMPUTING

(20)

A Framework for High Performance Image Analysis Over Cloud Resources

Future work

Extend support to additional machine learning processes

Larger scalability experiments (# workers)

Optimize Amazon usage (DynamoDB)

Better understand Amazon costs patterns

Experience with NoSQL scalability (HBase)

Better understand limitations of each deployment model

(opportunistic, cluster, cloud)

Move to >1M image datasets

References

Related documents

Solar energy: Thermal solar systems: Test methods (2013) • Test methods for a.o Energy performance of Solar Collectors • Worldwide. • Liquid

Methods: The Wrap-ups were timely (within 48 hours of a death), consistent (conducted after each pediatric intensive care unit (PICU) death), multidisciplinary (all care providers

Within the last years, new global climate change simulations have been produced and published in the IPCC AR4 process. They have also been used for regional

Assisting agencies making initial response on fires outside their jurisdiction will ensure, through VPSCC and/or PCREDC, that the jurisdictional agency is promptly notified

Test Connections & Test Modes Three-Stack Arrestor Surge Arrestors H-V Test Cable L-V Lead GND Lead I&W Meter Test Set Step. Up Transformer A B Guard

The study revealed that basic science students taught using activity-based teaching strategy performed significantly higher than their counterparts who were only taught

W:1000mm x D:470mm without shelves SO-DST40G SO-DST65K SO-DST78WH 100% Extension 35 KG Suspension Filing Colours Available. Anti-Tilt Locking Drawer Capacity

Effective January 1, 2021, Physical Medicine services (Physical, Occupational, and Speech Therapy) will require Prior Authorization from NIA for all services provided to all