© 2012 IBM Corporation
IBM® AbilityLab™ Media Captioner and Editor©
Ali Sobhi – Human Ability & Accessibility Center – IBM Research
[email protected]
Feb 29, 2012 IBM Media Captioner & Editor 2 /
IBM AbilityLab Media Captioner & Editor (MCE)
Speaker: Ali Sobhi ([email protected])
Authors: Ali Sobhi, Reiko Nagatsuma
Session ID: DHH-014
Wednesday February 29, 2012
10:40 AM – 11:40 AM
IBM Media Library Metrics
Total media count = 78,546
Media w/o transcript = 93%
A great portion of all media content on Media Library does not have captions or transcriptions
[Pie chart: IBM Media Library, Media & Transcripts, June 2010. Legend: MP3, MP3+transcript, Video. Slice values as extracted: 23.27%, 4.70%, 69.79%, 2.25%.]
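To put the two headline metrics together, the short Python sketch below estimates how many items lack a transcript. It uses only the two figures quoted on the slide; because the 93% share is itself rounded, the resulting counts are approximate, and the variable names are illustrative.

```python
# Figures quoted on the slide.
total_media = 78_546
share_without_transcript = 0.93  # "Media w/o transcript = 93%" (rounded on the slide)

# Approximate item counts; the rounded share makes these estimates, not exact values.
without_transcript = round(total_media * share_without_transcript)
with_transcript = total_media - without_transcript

print(f"without transcript: ~{without_transcript:,}")  # ~73,048
print(f"with transcript:    ~{with_transcript:,}")     # ~5,498
```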
Matrix of Media and Technology Components
Columns: Media Source | Digital Recording | Media Transformation | Transcription | Edit | Translation | Edit | Media Generation / Synchronization
Broadcast - TV
• Live
• Recorded
Broadcast - Radio
Web
• Webcast
• Podcast
• Audio
• Video
Lecture
Presentation
• Speaker notes
• Charts
VoIP
• Teleconference
• One-to-one
Internet
• IM
• A / V Chat
Media Captioner & Editor Technology Effort
Corporate requirement to caption videos published on www.ibm.com and w3
Provide Captioning, Transcription and Editing Capability for Ever-Increasing Multimedia Content on IBM w3 Intranet
A Collaborative Effort between IBM Research Groups
Human Ability & Accessibility Center
China Research Lab
Tokyo Research Lab
Using state-of-the-art
Internal Announcement of MCE
MCE Users
HA & AC Deploys Video Captioning and Editing System (Nov 2008)
• Embraced by W3 team to caption executive videos
• Alternative to current expensive commercial captioning service
• Result is a more cost effective and faster solution
What W3 Team is Saying
• “Love this tool ... was able to transcribe/caption a 4-minute video in under an hour. This could really change how the w3 team does a lot of things …”
• “I'm really, really happy with the results. Really happy. Thanks for a great tool.”
• “I'm getting more and more people up and running on DigiCapE ...”
The Project
• World-wide collaboration between HA & AC, CRL, TRL and WRL
• Speaker-independent Automatic Speech Recognition – WRL
• Caption Editing Client (CES) – TRL
• Closed-Caption Media (BCC Platform) – CRL
IBM AbilityLab Media Captioner & Editor
Complete solution to caption, edit and create Web-ready digital media
Helping people who are deaf or hard of hearing
Utilizing state-of-the-art advanced, speaker-independent IBM Automatic Speech Recognition
Web-based GUI for complete job management
Web-ready media with closed-captioning enabled
Intuitive and easy-to-use caption editor with synchronized audio and editing
MCE Jobs and Captioned Media Stats – July 2010
[Chart: DigiCapE Jobs & Captioned Media Hours, To-Date: July 2010. Axes: Months (2008-11 to 2010-07), Jobs, Media (hrs).]
MCE Jobs and Captioned Media Stats – July 2011
[Chart: DigiCapE Monthly Jobs & Media (Hours), To-Date: July 2011. Axes: Months (2008-11 to 2011-07), Jobs, Media (hrs).]
Total Number of Jobs = 3680
Total Media Hours = 1005
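As a rough reading of these two totals, the Python sketch below derives the average amount of media per job. It assumes both totals cover the same set of jobs, which the slide does not state explicitly; the variable names are illustrative.

```python
# Totals quoted on the slide (to date, July 2011).
total_jobs = 3680
total_media_hours = 1005

# Average media length per job, assuming both totals cover the same jobs.
avg_hours_per_job = total_media_hours / total_jobs
avg_minutes_per_job = avg_hours_per_job * 60

print(f"average media per job: {avg_minutes_per_job:.1f} minutes")  # 16.4 minutes
```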
System Overview and Architecture
DigiCapE
Server
CES Components:
• Controlling GUI
• Editing – Master / Client
BCC Components:
• Open-Caption / Closed-Caption
• Media generation
• Sync
HTML SMIL …. ST
Client
Server
Services
Browser
RCP
VB GUI
Transcription Text
Yesterday
Today
Tomorrow ???
Objectives:
• Video captioning based on speech recognition with editing capability
Status:
• Internal Use
• Available on SCIC for pilots – 2012
ASR Components:
• Web Service – SOA
• ASR
• Timing and Files
Usage Scenario
Who will use this?
• PwD – deaf / hard of hearing
• Captioning of media archives
• Search engines
• Data mining
• Examples: IBM, NCSU, NASA, National Archives, broadcasting companies
How to use this?
• Log in to the Web GUI
• Upload appropriate media format to server
• Notification is e-mailed: “Caption Ready”
• If still logged in, status gets updated on Web GUI
• Edit the captions, as needed, synchronized with audio track
• Submit for final merge – Open caption / Closed caption
• Show off your handiwork and get bonuses
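The steps above can be read as a small job lifecycle. The Python sketch below is a minimal illustration under stated assumptions: the state names, the `CaptionJob` class, and the file name are hypothetical and do not come from MCE itself, whose real states and notifications may differ. The one detail taken from the slide is that editing is optional ("as needed"), so a job may go straight from captions-ready to the final merge.

```python
from dataclasses import dataclass
from enum import Enum, auto


class JobState(Enum):
    """Hypothetical job states mirroring the usage steps on the slide."""
    UPLOADED = auto()        # media uploaded to the server
    CAPTION_READY = auto()   # automatic captions produced, user notified
    EDITED = auto()          # user corrected the captions
    MERGED = auto()          # final open- or closed-caption media generated


# Allowed transitions. Editing is "as needed", so CAPTION_READY may skip to MERGED.
ALLOWED = {
    JobState.UPLOADED: {JobState.CAPTION_READY},
    JobState.CAPTION_READY: {JobState.EDITED, JobState.MERGED},
    JobState.EDITED: {JobState.EDITED, JobState.MERGED},
    JobState.MERGED: set(),
}


@dataclass
class CaptionJob:
    media_name: str
    state: JobState = JobState.UPLOADED

    def move_to(self, target: JobState) -> None:
        """Advance the job, rejecting transitions the workflow does not allow."""
        if target not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {target.name}")
        self.state = target


job = CaptionJob("town_hall.mp4")      # hypothetical file name
job.move_to(JobState.CAPTION_READY)    # "Caption Ready" notification
job.move_to(JobState.EDITED)           # captions corrected in the editor
job.move_to(JobState.MERGED)           # submitted for final merge
print(job.state.name)                  # MERGED
```

Keeping the transitions in one table makes it easy to see, and to change, which steps are mandatory and which are optional.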
Possible Business Model
HTML SMIL …. ST
Client
Server
Services
Transcription Text
Client:
• Direct sale
• Updates
• Upgrades
• Training
SOA Services:
• Services-based
• Registration fee
• Usage fee
• Can be upgraded w/o any service interruption
Enterprise Sale
Complete sale and service inside
Client & Web Services Model
Enterprise Solution Model