Auto Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low Resource Machine Translation

Academic year: 2020

Figure

Figure 1: Illustration of Algorithm 1. The shaded area, here with value ηλ = 2, represents how much the ℓ∞ proximal step will remove from a sorted vector.
Figure 2: Architecture of the Transformer (Vaswani et al., 2017). We apply the auto-sizing method to the feed-forward (blue rectangles) and multi-head attention (orange rectangles) in all n layers of the encoder and decoder.
Table 1: Number of parallel sentences in training bitexts. The French-English and Arabic-English data is from the 2017 IWSLT campaign (Mauro et al., 2012). The much smaller Hausa-English and Tigrinya-English data is from the LORELEI project.
Table 2: Comparison of BLEU scores, model size, and training time on Tigrinya-English, Hausa-English, and French-English.