Foreign-literature translation for the design of an Android-based Nongguanjia (farm-management) communication system

2022-09-07 14:49:22

J Supercomput (2014) 70:649–659

DOI 10.1007/s11227-014-1119-8

Android™ development and performance analysis

Alejandro Acosta · Francisco Almeida

Dept. Estadística, I.O. y Computación, ETS Ingeniería Informática,

La Laguna University, Santa Cruz de Tenerife, Spain

e-mail: aacostad@ull.edu.es

Keywords Renderscript · Android · SoC · Performance

Abstract The advent of emergent systems on chip and MPSoCs opens a new era for small mobile devices (smartphones, tablets, …) in terms of computing capabilities and the applications to be addressed. Currently, these devices have multicore processors and GPUs that provide high computational power. The efficient use of such devices, including their parallel power, is still a challenge for general-purpose programmers. In recent years Android has become the dominant platform for small mobile devices, and it has a large community of developers. For application development, Android provides two development kits: the Software Development Kit (SDK) and the Native Development Kit (NDK). To exploit the high computational capabilities of current devices, Android also provides Renderscript, an API that allows the execution of parallel applications and is designed for applications that require high computing power. The development model chosen has an important impact on the performance of the applications. In this paper, we address the evaluation of performance on Android platforms. A set of benchmark applications has been implemented to evaluate the performance of the different development models, and sequential and parallel versions for the different development kits are considered in the computational experience. This benchmark and the computational experience achieved are of great help to programmers in understanding the sources of overhead and the bottlenecks in the developed code.


Published online: 21 February 2014

© Springer Science+Business Media New York 2014

F. Almeida
e-mail: falmeida@ull.edu.es

A. Acosta (✉) · F. Almeida


1 Introduction

Systems on chip (SoCs [1]) have been the enabling technology behind the evolution of many of today's ubiquitous technologies, such as the Internet, mobile wireless technology, and high-definition television. The information technology age, in turn, has fuelled a global communications revolution. With the rise of communications with mobile devices, more computing power has been put into such systems, and technologies available in desktop computers are now implemented in embedded and mobile devices. New processors with multicore architectures and GPUs have been developed for this market. Examples are the Nvidia Tegra [2], with two or five ARM cores depending on the model and a low-power GPU, and the OMAP™ 5 [3] platform from Texas Instruments, which goes in the same direction.

On the other hand, software frameworks have been developed to support building software for such devices. The main actors in this software market have their own platforms: Android [4] from Google, iOS [5] from Apple and Windows Phone [6] from Microsoft are contenders in the smartphone market. Other companies, such as Samsung [7] and Nokia [8], have been developing proprietary frameworks for low-profile devices. Coding applications for such devices is now easier, but creating efficient and maintainable programs to run on them [9] is still an unsolved problem.

Conceptually, the architectural model can be viewed as a traditional heterogeneous CPU/GPU with a unified memory architecture. In this model, memory is shared between the CPU and the GPU and acts as a high-bandwidth communication channel. In non-unified memory architectures, it was common for only a subset of the actual memory to be addressable by the GPU. Technologies such as algorithmic memory [10], GPUDirect and unified virtual addressing (UVA) from Nvidia [11], and HSA from AMD [12] are moving traditional memory architectures toward a unified memory system for CPUs and GPUs. Memory performance continues to be outpaced by the ever-increasing demands of faster processors, multiprocessor cores and parallel architectures.

Android is an open-source platform with a very high level of market penetration and a large community of developers [13]. Applications are usually developed in Java, using the development tools provided by the platform. The current Java compiler provides performance similar to that of compilers for native languages such as C or C++. Even so, Android includes development tools to implement code sections of Android applications in native languages. Android also provides the Renderscript language to achieve high-performance computing on the devices.

Native C code is executed in Android through the Java Native Interface (JNI) provided by Java. Several studies have examined the performance of applications using JNI [14–16] and the performance gains obtained when using Renderscript [17–19]. However, none of these studies considers the different optimization parameters available in the programming models.
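To make the JNI route concrete, the following is a minimal sketch of the Java side of a JNI binding with a pure-Java fallback. The library name `imagekernels`, the class `InvertBridge` and the invert operation are hypothetical illustrations, not code from the paper. The C side is only indicated in a comment. JNI array accessors such as `GetIntArrayElements` are allowed to return a copy of the array, which is one possible source of JNI call overhead.

```java
// Hypothetical JNI binding sketch: the library name "imagekernels" and all
// class/method names are illustrative assumptions, not the paper's code.
public final class InvertBridge {
    private InvertBridge() {}

    // Resolved once, when the class is initialized.
    private static final boolean NATIVE_LOADED = tryLoad();

    private static boolean tryLoad() {
        try {
            // Looks for libimagekernels.so on Linux/Android.
            System.loadLibrary("imagekernels");
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // no native build available: use the Java path
        }
    }

    // Would be implemented in C as:
    // JNIEXPORT void JNICALL Java_InvertBridge_invertNative(JNIEnv *env, jclass cls, jintArray argb)
    private static native void invertNative(int[] argb);

    /** Inverts the RGB channels of packed ARGB pixels in place, keeping alpha. */
    public static void invert(int[] argb) {
        if (NATIVE_LOADED) {
            invertNative(argb);
        } else {
            for (int i = 0; i < argb.length; i++) {
                argb[i] ^= 0x00FFFFFF; // flip R, G and B; alpha untouched
            }
        }
    }

    public static boolean isNative() {
        return NATIVE_LOADED;
    }
}
```

Keeping a Java fallback lets the same class run on devices or builds where the native library is missing. The cost is that the native and Java paths must be kept behaviorally identical.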

In this paper, we present a comparative performance analysis of the different programming models in Android. We have implemented a set of test problems with different inherent features. The main contribution of the paper is that the experimental analysis provides an overview of the behavior of the programming models, so that this experience can be reused when solving similar problems. Future developers can refer to these results to choose the programming model that gives the best performance for the problem to be implemented. Several implementations were generated for each problem, and all of them were tested on an ASUS Transformer Prime TF201 device.

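The paper is structured as follows. In Section 2 we introduce the development models in Android, the different options for exploiting the devices, and some of the difficulties associated with these development models. In Section 3 we compare the performance of the different programming models in Android using different optimization parameters. The set of test problems is based on the Renderscript image-processing benchmarks (converting an image to grayscale, 3×3 and 5×5 convolution, and levels), plus a general convolution that we implemented ourselves. Four different versions, corresponding to the different programming models in Android, have been implemented: a Java version, a native C version, and two Renderscript implementations (sequential and parallel). The results show how the optimization parameters affect the performance of the different programming models.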
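As a point of reference for the kind of work the Java version of the simplest benchmark performs, the following is a minimal sequential grayscale conversion over packed ARGB pixels. The class name and the BT.601 luma weights (299/587/114 per mille) are assumptions. The paper's benchmark code is not reproduced here.

```java
// Minimal sequential grayscale sketch (hypothetical class name).
// Assumption: BT.601 luma weights in integer per-mille form; the paper does
// not state which weights its grayscale benchmark uses.
public final class Grayscale {
    private Grayscale() {}

    /** Converts every packed ARGB pixel to gray in place, preserving alpha. */
    public static void toGray(int[] argb) {
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int r = (p >>> 16) & 0xFF;
            int g = (p >>> 8) & 0xFF;
            int b = p & 0xFF;
            // Weights sum to 1000, so the result stays in 0..255.
            int y = (299 * r + 587 * g + 114 * b) / 1000;
            argb[i] = (p & 0xFF000000) | (y << 16) | (y << 8) | y;
        }
    }
}
```

Integer weights that sum to 1000 keep pure white at exactly 255. Float weights followed by a truncating cast might not guarantee this.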

2 Development models in Android

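Android is a Linux-based operating system designed primarily for mobile devices such as smartphones and tablets. It is also used in embedded devices such as smart TVs and media streamers. It is designed as a software stack that includes an operating system, middleware and key applications.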

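Android applications are written in Java. The Android Software Development Kit (SDK) provides the API libraries and developer tools needed to build, test and debug applications. The central part of Fig. 1 shows the compilation and execution model of a Java Android application. The Java compilation model converts the .java files into Dalvik-compatible .dex (Dalvik Executable) files. The application runs in the Dalvik virtual machine (Dalvik VM), which manages the system resources allocated to the application through the Linux kernel.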

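In addition to Java applications, Android provides a package of development tools and libraries for developing native applications: the Native Development Kit (NDK) [20]. The NDK makes it possible to implement parts of an application running in the Dalvik VM in native-code languages such as C and C++. This native code is executed through the Java Native Interface (JNI) provided by Java. The right part of Fig. 1 shows the compilation and execution model of an application where part of the code has been written using the NDK.

The native .c code is compiled with the GNU C Compiler (GCC). By default, the compiler targets the ARM architecture; in this case the code is optimized for ARM-based CPUs that support the ARMv5TE instruction set [21]. Most devices support the ARMv7-A instruction set [21]. The v7 version extends the ARMv5 instruction set, adding support for the Thumb-2 instruction set [22] and for VFP hardware FPU instructions [22].

According to [20], using native code does not automatically improve performance, but it always increases application complexity. Its use is recommended for CPU-intensive operations that do not allocate much memory, such as signal processing or physics simulation. Native code is useful for porting existing native code to Android, not for speeding up parts of an Android application. Some devices support OpenCL execution on the GPU, and OpenCL code is implemented in the context of the Native Development Kit (NDK).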

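To exploit the high computational power of current devices, Android provides Renderscript [23], a native-level API for high-performance computing together with a C-based programming language (C99 standard). Renderscript allows parallel applications to run on several types of processors, such as the CPU, GPU or DSP, and automatically distributes the workload among the available processing cores of the device. The left part of Fig. 1 shows the compilation and execution model used by Renderscript. Renderscript code (.rs files) is compiled with an LLVM compiler based on Clang [24]. In addition, a set of Java classes that wrap the Renderscript code is generated. Again, according to [23], using Renderscript code does not automatically improve performance. It is useful for applications that do image processing, mathematical modeling, or any operation that requires a large amount of mathematical computation.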
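In Renderscript's model, a kernel is written for a single element, and the runtime distributes the elements across the available cores. This has a rough JVM analogue in parallel streams. The sketch below is that analogy only. It does not use the Renderscript API, and `ElementwiseKernel.forEach` is a hypothetical name. The same kernel runs sequentially or in parallel by toggling one flag, which mirrors the sequential and parallel Renderscript versions compared in Section 3.

```java
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

// JVM analogy (not Renderscript): a per-element kernel whose execution is
// spread over the available cores by the runtime. Names are hypothetical.
public final class ElementwiseKernel {
    private ElementwiseKernel() {}

    /** Applies kernel to every element of in and writes the result to out. */
    public static void forEach(int[] in, int[] out, IntUnaryOperator kernel, boolean parallel) {
        if (in.length != out.length) {
            throw new IllegalArgumentException("in and out must have the same length");
        }
        IntStream indices = IntStream.range(0, in.length);
        if (parallel) {
            // The common ForkJoinPool splits the index range across worker threads.
            indices = indices.parallel();
        }
        // Each index is written by exactly one task, so no locking is needed.
        indices.forEach(i -> out[i] = kernel.applyAsInt(in[i]));
    }
}
```

Because every output element depends only on its own input element, the sequential and parallel runs must produce identical results. Only the distribution of the work changes.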

Fig. 1 Compilation and execution model of Android applications

3 Numerical results

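To compare the performance of the different programming models in Android, we consider five different applications. Four of them are based on the Renderscript image-processing benchmarks [25] (converting an image to grayscale, levels, and 3×3 and 5×5 convolution). The fifth is an additional convolution developed by ourselves, for which we vary the kernel size among 3×3, 5×5, 7×7 and 9×9. These problems are representative of those commonly solved on Android. Most Android devices are used to acquire information, and the camera is one of their most important components. For this reason, image-processing algorithms are very important in this context. The test problems used here can also serve as building blocks for more complex applications such as augmented reality. Another important feature of the cases studied is that each problem has a different granularity, which lets us analyze the behavior of each programming model at different levels of granularity. In all cases, we implemented four versions of the code: a Java version, a native C implementation, and two Renderscript implementations (sequential and parallel). We executed these codes on an SoC device running Android, the ASUS Transformer Prime TF201 (ASUS TF201). This device has an NVIDIA Tegra 3 quad-core ARM v7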

Table 1 Execution times (ms) of the implemented problems using the default configuration

Java

Native C

Renderscript

Sequential

Parallel

Grayscale

Levels

286

254

999

144

338

100

133

650

Convolve

3×3

1,975

4,775

3,503

526

195

405

5×5

12,287

1,365

General convolve

3×3

2,337

5,779

4,492

12,096

23,195

37,248

505

936

195

323

473

686

5×5

7×7

9×9

10,768

17,497

1,542

2,350

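Cortex-A9 processor (1400 MHz; 1.5 GHz in single-core mode), 1 GB of RAM and an NVIDIA ULP GeForce GPU. The Android version is 4.1, and NDK r9 was used. In all cases, the Java version is used as the reference for computing the speedup. For all problems we use an image of size 1600×1067.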
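For the general convolution problem, the work per pixel grows with the kernel size k. Below is a minimal sequential sketch of a k×k convolution on a single-channel float image stored row-major. It assumes an odd k and clamp-to-edge borders, and it does not flip the kernel. Strictly, this makes it a cross-correlation, which is identical to convolution for symmetric kernels such as a box blur. The class name is hypothetical, and this is not the authors' implementation.

```java
// Minimal sequential k x k convolution sketch (hypothetical class name).
// Assumptions: odd k, clamp-to-edge borders, kernel not flipped, no normalization.
public final class Convolve {
    private Convolve() {}

    public static float[] apply(float[] src, int width, int height, float[] kernel, int k) {
        if (k % 2 == 0 || kernel.length != k * k || src.length != width * height) {
            throw new IllegalArgumentException("odd k, k*k kernel and width*height image required");
        }
        int r = k / 2;
        float[] dst = new float[src.length];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                float acc = 0f;
                // k*k multiply-adds per output pixel.
                for (int ky = -r; ky <= r; ky++) {
                    int sy = Math.min(height - 1, Math.max(0, y + ky)); // clamp row
                    for (int kx = -r; kx <= r; kx++) {
                        int sx = Math.min(width - 1, Math.max(0, x + kx)); // clamp column
                        acc += kernel[(ky + r) * k + (kx + r)] * src[sy * width + sx];
                    }
                }
                dst[y * width + x] = acc;
            }
        }
        return dst;
    }
}
```

The innermost two loops execute k² times per pixel. Changing k from 3 to 9 therefore multiplies the arithmetic by nine without changing the amount of data touched.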

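Table 1 shows the execution times, in milliseconds, for all the problems. The native implementations are compiled for the default ARM architecture; in this case, the code is optimized for ARM-based CPUs that support the ARMv5TE instruction set. By default, Renderscript performs floating-point operations with full precision, following the specification defined by the IEEE 754-2008 standard [26]. The Java implementation of each problem gives an idea of its granularity. We can see that the grayscale problem has the finest granularity. For the convolution problems, the granularity increases with the size of the convolution kernel. As expected, the Renderscript implementations obtain the best results for all problems.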
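The granularity trend can be made concrete with a simple operation count for the 1600×1067 test image. This is our own estimate, not a figure from the paper, and it assumes one multiply-add (MAC) per kernel tap per pixel while ignoring border handling:

$$
N = 1600 \times 1067 = 1\,707\,200 \ \text{pixels}, \qquad \mathrm{MAC}(k) = N\,k^{2}
$$

$$
\mathrm{MAC}(3) = 15\,364\,800,\quad \mathrm{MAC}(5) = 42\,680\,000,\quad \mathrm{MAC}(7) = 83\,652\,800,\quad \mathrm{MAC}(9) = 138\,283\,200
$$

Going from a 3×3 to a 9×9 kernel multiplies the work per pixel by 81/9 = 9 while the data size stays the same. This is why the larger kernels are the coarser-grained problems.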

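Table 2 shows the Java heap memory used by the Dalvik virtual machine (Dalvik VM) for each implementation. The Base column shows the memory used when the application is open and the image is loaded but the algorithm is not running. In all cases, the base memory usage is the same. The Execution column reports the base memory plus the memory used while the algorithm runs, and here we obtain two different memory usages. The Java and native versions use more memory because the Java object representing the image pixels is converted into an array. These conversions give the best performance in the Java implementation. The Renderscript versions do not convert the image Java object and require no additional memory. Note that, for all Renderscript versions, the memory used by the Renderscript context is not included in the Java heap memory. This memory must be added to the values shown in the table.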
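A rough estimate of what one such conversion costs for the 1600×1067 test image is shown below. It is our own estimate and assumes the pixels are copied into a Java int[] holding one 32-bit ARGB value (4 bytes) per pixel:

$$
1600 \times 1067 \times 4\ \text{B} = 6\,828\,800\ \text{B} \approx 6.5\ \text{MiB}
$$

Each extra copy of the image held on the Java heap adds roughly this amount.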

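Table 3 shows the CPU activity while each problem is executed. If the application is open but the algorithm is not running, all CPU cores are in a low-power state. In this state, the CPU frequency is reduced and some cores are taken offline to save energy. When the algorithm runs, as expected, the CPU activity depends on the type of execution. In sequential executions only one core is active, and the rest remain in the low-power state. In parallel executions, all cores are active.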
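The link between active cores and run time is usually summarized with two ratios. The first is the speedup with respect to the Java reference used in Table 1. The second is the parallel efficiency on p cores, where p = 4 for the quad-core Tegra 3 in the TF201. The helper below is a hypothetical sketch of both ratios; it is not code from the paper.

```java
// Hypothetical helper for the two usual ratios derived from timings in ms.
public final class Speedup {
    private Speedup() {}

    /** S = T_reference / T_candidate, e.g. Java time over Renderscript time. */
    public static double speedup(double referenceMs, double candidateMs) {
        if (referenceMs <= 0 || candidateMs <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        return referenceMs / candidateMs;
    }

    /** E = (T_seq / T_par) / p; 1.0 means perfect scaling on p cores. */
    public static double efficiency(double sequentialMs, double parallelMs, int cores) {
        if (cores <= 0) {
            throw new IllegalArgumentException("cores must be positive");
        }
        return speedup(sequentialMs, parallelMs) / cores;
    }
}
```

For example, a parallel run of 250 ms against an 800 ms sequential run on four cores gives an efficiency of 0.8. These are illustrative numbers, not measurements from the paper.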
