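What is fake data?
Fake data, as the name suggests, is made-up data. It is generated to resemble production data when real production data cannot be used, and it is mostly used in development and testing.
Use cases for fake data
Which development or testing scenarios call for fake data?
- When you need to build a UI prototype but the API is not finished, so there is no data to display in the front end. In this case you can use mock data to simulate the API, which unblocks the UI work, lets UI and API development proceed in parallel, and may surface some problems earlier.
- When you need to fill a database with a large amount of data. You can automatically populate it with fake data that resembles production data to meet development and testing needs.
- When you need a large amount of production-like data for stress testing.
- When unit tests need dummy data.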
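As a rough illustration of the database-filling case, here is a minimal sketch that uses only the Python standard library. The name pools, the `users` table and the helper names `make_user` and `fill_users` are hypothetical and exist only for this example. A real project would more likely use one of the libraries introduced later in this article.

```python
import random
import sqlite3

# Hypothetical name pools, used only for this illustration.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Eve"]
LAST_NAMES = ["Smith", "Jones", "Lee", "Garcia", "Chen"]


def make_user(rng):
    """Build one fake (name, age, email) row from the pools above."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    age = rng.randint(18, 90)
    email = f"{first}.{last}{rng.randint(1, 999)}@example.com".lower()
    return (f"{first} {last}", age, email)


def fill_users(conn, count, seed=None):
    """Create the users table if needed and insert `count` fake rows."""
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (name TEXT, age INTEGER, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        (make_user(rng) for _ in range(count)),
    )
    conn.commit()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
fill_users(conn, 1000, seed=1)
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(row_count)
```

Passing a seed is optional. It only makes the generated rows repeatable between runs.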
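Principles of fake data
Apart from deliberately destructive test data, the test data we need should resemble the production environment and real life rather than a fixed set of values. Fake data that is close to production data does a better job of exposing latent problems in production, and it makes the product look like it has real value and meaning in use.
Ways to generate fake data
In the project I am currently working on, there are all kinds of forms to fill in, and these forms collect different user data. If I fill in different data each time I test, won't that be closer to how the product is really used, and perhaps uncover some latent problems? Production data is off limits for security and privacy reasons, so how can we generate large amounts of fake data? There are two main options:
- Online services, such as mock, fakename, randomuser and randomapi
- Libraries for various programming languages
This article introduces four Python modules for generating fake data: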
- lipsum - is a simple Lorem Ipsum generator library which can be used in your Python applications
- radar - Random date generation
- mimesis - is a fast and easy to use library for Python programming language, which helps generate mock data for a variety of purposes in a variety of languages
- Faker - is a Python package that generates fake data for you
Installing Python 3
Before we start, upgrade Python first. As the official site puts it:
Python 2.x is legacy, Python 3.x is the present and future of the language
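Besides, many popular libraries such as numpy will stop maintaining Python 2 and continue development on Python 3 only. So what reason is left to stick with Python 2?
To see how long Python 2 has until retirement, see here.
My current Python environment is still the Python 2.7.10 that ships with macOS, so I need to install Python 3 through Homebrew. For detailed tutorials, see here and here.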
┌─[diyu@CNdiyu] - [~] - [Wed Jan 10, 16:14]
└─[$] <> python3 --version
Python 3.6.4
Done!
Lorem Ipsum: random placeholder text
lipsum is a generator of random sentences and text fragments. The text it generates is readable lorem ipsum text.
The code is very simple:
import lipsum
print("generate 10 words")
print(lipsum.generate_words(10))
print("*" * 50)
print("generate 3 sentences")
for x in lipsum.generate_sentences(3).split('.'):
    print(x.strip())
print("*" * 50)
print("generate 3 paras")
for x in lipsum.generate_paragraphs(3).split('\n'):
    print(x)
Output:
generate 10 words
Quae cum dixissem, magis ut illum provocarem quam ut ips!
**************************************************
generate 3 sentences
Hunc vos beatum; ratio quidemvestra sic cogit
At ego quem huic anteponam non audeo dicere;dicet pro me ipsa virtus necdubitabit isti vestro beato M
Regulumanteponere, quem quidem, cum sua voluntate, nulla vi coactuspraeter fidem, quamdederat hosti, ex patria Karthaginemrevertisset, tum ipsum, cum vigiliis et fame cruciaretur, clamatvirtus beatioremfuisse quam potantem in rosa Thorium
**************************************************
generate 3 paras
Atque haec quidem de rerum nominibus. de ipsis rebus autem saepenumero, Brute, vereor ne reprehendar, cum haec ad te
scribam, qui cum in philosophia, tum in optimo genere philosophiae tantum processeris. quod si facerem quasi te
erudiens, iure reprehenderer. sed ab eo plurimum absum neque, ut ea cognoscas, quae tibi notissima sunt, ad te mitto,
sed quia facillime in nomine tuo adquiesco, et quia te habeo aequissimum eorum studiorum, quae mihi communia tecum sunt,
existimatorem et iudicem. attendes igitur, ut soles, diligenter eamque controversiam diiudicabis, quae mihi fuit cum
avunculo tuo, divino ac singulari viro. nam in Tusculano cum essem vellemque e bibliotheca pueri Luculli quibusdam
libris uti, veni in eius villam, ut eos ipse, ut solebam, depromerem. quo cum venissem, M. Catonem, quem ibi esse
nescieram, vidi in bibliotheca sedentem multis circumfusum Stoicorum libris. erat enim, ut scis, in eo aviditas legendi,
nec satiari poterat, quippe qui ne reprehensionem quidem vulgi inanem reformidans in ipsa curia soleret legere saepe,
dum senatus cogeretur, nihil operae rei publicae detrahens. quo magis tum in summo otio maximaque copia quasi helluari
libris, si hoc verbo in tam clara re utendum est, videbatur. quod cum accidisset ut alter alterum necopinato
videremus, surrexit statim. deinde prima illa, quae in congressu solemus: Quid tu, inquit, huc? a villa enim, credo, et:
Si ibi te esse scissem, ad te ipse venissem.
Heri, inquam, ludis commissis ex urbe profectus veni ad vesperum. causa autem fuit huc veniendi ut quosdam hinc libros
promerem. et quidem, Cato, hanc totam copiam iam Lucullo nostro notam esse oportebit; nam his libris eum malo quam
reliquo ornatu villae delectari. est enim mihi magnae curae - quamquam hoc quidem proprium tuum munus est - ut ita
erudiatur, ut et patri et Caepioni nostro et tibi tam propinquo respondeat. laboro autem non sine causa; nam et avi eius
memoria moveor - nec enim ignoras, quanti fecerim Caepionem, qui, ut opinio mea fert, in principibus iam esset, si
viveret - et Lucullus mihi versatur ante oculos, vir cum virtutibus omnibus excellens, tum mecum et amicitia et omni
voluntate sententiaque coniunctus.
Praeclare, inquit, facis, cum et eorum memoriam tenes, quorum uterque tibi testamento liberos suos commendavit, et
puerum diligis. quod autem meum munus dicis non equidem recuso, sed te adiungo socium. addo etiam illud, multa iam mihi
dare signa puerum et pudoris et ingenii, sed aetatem vides.
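Websites such as loripsum.net also offer an online generation service.
radar: random date generation
radar makes it very convenient to generate dates and times.
The code is also very simple: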
import radar
import datetime

# Random date
print(radar.random_date())

# Random date and time
print(radar.random_datetime())

# Random time
print(radar.random_time())

# Random date within a given range
print(radar.random_date(
    start=datetime.datetime(year=1985, month=1, day=1),
    stop=datetime.datetime(year=1989, month=12, day=30)))

# Random date and time within a given range
print(radar.random_datetime(
    start=datetime.datetime(year=1985, month=1, day=1),
    stop=datetime.datetime(year=1989, month=12, day=30)))

# Random time within a given range
print(radar.random_time(
    start="2018-01-10T09:00:10",
    stop="2018-01-10T18:00:00"))

# By default radar parses dates with python-dateutil, which is a heavy library.
# You can opt for the lightweight radar.utils.parse instead (5 times faster).
print(radar.random_datetime(
    start="2018-01-10T09:00:10",
    stop="2018-01-10T18:00:00",
    parse=radar.utils.parse))

# radar.utils.parse usage
start = radar.utils.parse('2018-01-01')
stop = radar.utils.parse('2018-01-05')
print(radar.random_datetime(start=start, stop=stop))
Output:
2011-07-01
1997-10-25 16:59:15
04:45:21
1985-01-21 18:16:57
1988-06-27 02:49:24
12:49:16
2018-01-10 16:26:11
2018-01-02
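To make the idea of a random date within a range more concrete, here is a minimal standard-library sketch. It is not how radar is implemented. The helper name `random_datetime` and its whole-second resolution are assumptions made for this illustration.

```python
import datetime
import random


def random_datetime(start, stop, rng=random):
    """Return a random datetime between start and stop (both inclusive),
    at whole-second resolution."""
    if stop < start:
        raise ValueError("stop must not be earlier than start")
    span_seconds = int((stop - start).total_seconds())
    offset = rng.randint(0, span_seconds)
    return start + datetime.timedelta(seconds=offset)


start = datetime.datetime(year=1985, month=1, day=1)
stop = datetime.datetime(year=1989, month=12, day=30)

value = random_datetime(start, stop)
print(value)         # random date and time within the range
print(value.date())  # date part only
print(value.time())  # time part only
```

The approach is to pick a random offset in seconds and add it to the start of the range, which keeps the result inside the range by construction.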
Mimesis: generating mock data
mimesis provides all kinds of data. The data covers more than a dozen real-world scenarios, such as dummy data about transport (truck model, car etc.), personal data (name, surname, age, email etc.) and payment data (credit_card, credit_card_network etc.).
To use Mimesis, first decide on a locale. Mimesis supports as many as 33 languages. The example below shows German and Chinese data.
from mimesis import Personal

# German locale, matching the description above and the sample output
person_de = Personal('de')
print(person_de.full_name())
print(person_de.age())
print(person_de.favorite_movie())
print("*" * 20)

# Chinese locale
person_zh = Personal('zh')
print(person_zh.full_name())
print(person_zh.age())
print(person_zh.favorite_movie())
Output:
Karoline Schneider
33
21
********************
香茗 米
22
星際迷航3:超越星辰
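The rest of this article uses English data.
As mentioned earlier, Mimesis offers several data providers that generate different categories of data. Below are some commonly used providers and their basic usage. For more providers, see the official website.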
from mimesis import Personal, Address, Business, Payment, Text, Food
from mimesis.enums import Gender

person = Personal('en')
# full_name() accepts a gender
print(person.full_name(Gender.MALE))
print(person.level_of_english())
print(person.nationality())
print(person.work_experience())
print(person.political_views())
print(person.worldview())

# Custom username patterns
# Available patterns: ('U-d', 'U.d', 'UU-d', 'UU.d', 'UU_d', 'U_d', 'Ud', 'l-d', 'l.d', 'l_d', 'ld', 'default')
templates = ['l-d', 'U-d']
for item in templates:
    print(person.username(template=item))

address1 = Address('en')
print(address1.coordinates())
print(address1.city())

business1 = Business('en')
print(business1.company())
print(business1.company_type())

payment1 = Payment('en')
print(payment1.paypal())
print(payment1.credit_card_expiration_date())

# mimesis can also generate text
text1 = Text('en')
print(text1.alphabet())
print(text1.answer())
print(text1.quote())
print(text1.title())
print(text1.word())
print(text1.words())
print(text1.sentence())

food1 = Food('en')
print(food1.drink())
print(food1.fruit())
print(food1.spices())
The Generic(*args, **kwargs) class provides a unified interface through which every provider can be reached.
from mimesis import Generic
g = Generic('en')
print(g.food.fruit())
print(g.address.postal_code())
If you want to use your own data and customize things, you can define your own class and expose the data through its attributes and methods.
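from mimesis import Generic
from mimesis.providers.base import BaseProvider

g1 = Generic('en')

class oneProvider(BaseProvider):
    name = "dante"

    class Meta:
        # Attribute name under which the provider is exposed on Generic
        name = "oneprovider"

    def get_age(self):
        return "31"

g1.add_provider(oneProvider)
print(g1.oneprovider.get_age())
print(g1.oneprovider.name)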
If you would rather not generate values one by one and instead want to produce records in bulk from a schema, you can use the Field and Schema objects.
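from mimesis.enums import Gender
from mimesis.schema import Field, Schema

field = Field('en')
body = (
    lambda: {
        # The first argument to field() is the name of the provider method to call
        "name": field('full_name', gender=Gender.FEMALE),
        "age": field('age'),
        "email": field('email'),
        "occupation": field('occupation')
    }
)
schema = Schema(schema=body)
print(schema.create(iterations=1))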
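The schema idea itself is simple enough to sketch with the standard library alone: a schema is a callable that returns one record, and batch creation calls it repeatedly. The code below is an illustration under that assumption and is not mimesis code. The `create` helper, the `user_schema` function and the value pools are hypothetical names introduced here.

```python
import random

rng = random.Random()

# Hypothetical value pools, used only for this illustration.
NAMES = ["Ada Stone", "Maria Lopez", "Li Wei", "Nora Berg"]
OCCUPATIONS = ["Engineer", "Teacher", "Nurse", "Designer"]


def user_schema():
    """Describe one record; every call yields fresh random values."""
    name = rng.choice(NAMES)
    return {
        "name": name,
        "age": rng.randint(18, 66),
        "email": name.lower().replace(" ", ".") + "@example.com",
        "occupation": rng.choice(OCCUPATIONS),
    }


def create(schema, iterations=1):
    """Call the schema `iterations` times and collect the records."""
    return [schema() for _ in range(iterations)]


records = create(user_schema, iterations=3)
for record in records:
    print(record)
```

Because the schema is a callable rather than a fixed dictionary, each iteration produces different values, which is also why the mimesis example above wraps its dictionary in a lambda.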
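Faker: generating fake data
Faker is used in much the same way as Mimesis.
Faker supports multiple languages. The example below prints data for the default en_US locale and for Chinese (zh_CN).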
from faker import Faker
fake = Faker()
print(fake.name())
print(fake.address())
print(fake.city())
print("*" * 20)
fake = Faker('zh_CN')
print(fake.name())
print(fake.address())
print(fake.city())
Output:
eggy Wood
17796 Johnson Fork Apt. 744
Donaldhaven, DC 41460-2738
Cannonland
********************
向鵬
廣西壯族自治區梅縣大興夏路w座 617055
杭州市
Faker offers several data providers that generate different categories of data. Below are some commonly used providers and their basic usage.
from faker import Faker
fake = Faker()
# lorem ipsum
print(fake.word())
print(fake.text())
print(fake.paragraphs(5))
print(fake.company())
print(fake.credit_card_full())
print(fake.address())
print(fake.phone_number())
print(fake.date() + ' ' + fake.time())
print(fake.profile())
faker.providers lets you create custom providers.
from faker import Faker
from faker.providers import BaseProvider

fake = Faker()

class oneProvider(BaseProvider):
    def hello(self):
        return "I am one provider"

# Once registered, the provider's methods are available on the Faker instance
fake.add_provider(oneProvider)
print(fake.hello())
Faker also ships a handy command-line tool. For usage and the available arguments, see the official documentation.
└─[0] faker -h
usage: faker [-h] [--version] [-o output] [-l LOCALE] [-r REPEAT] [-s SEP]
[-i [INCLUDE [INCLUDE ...]]]
[fake] [fake argument [fake argument ...]]
faker version 0.8.8
positional arguments:
fake name of the fake to generate output for (e.g. profile)
fake argument optional arguments to pass to the fake (e.g. the
profile fake takes an optional list of comma separated
field names as the first argument)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-o output redirect output to a file
-l LOCALE, --lang LOCALE
specify the language for a localized provider (e.g.
de_DE)
-r REPEAT, --repeat REPEAT
generate the specified number of outputs
-s SEP, --sep SEP use the specified separator after each output
-i [INCLUDE [INCLUDE ...]], --include [INCLUDE [INCLUDE ...]]
list of additional custom providers to user, given as
the import path of the module containing your Provider
class (not the provider class itself)
supported locales:
ar_AA, ar_EG, ar_JO, ar_PS, ar_SA, bg_BG, bs_BA, cs_CZ, de_AT, de_DE, dk_DK, el_GR, en, en_AU, en_CA, en_GB, en_TH, en_US, es, es_ES, es_MX, et_EE, fa_IR, fi_FI, fr_CH, fr_FR, he_IL, hi_IN, hr_HR, hu_HU, id_ID, it_IT, ja_JP, ka_GE, ko_KR, la, lt_LT, lv_LV, ne_NP, nl_BE, nl_NL, no_NO, pl_PL, pt_BR, pt_PT, ru_RU, sk_SK, sl_SI, sv_SE, th_TH, tr_TR, tw_GH, uk_UA, zh_CN, zh_TW
faker can take a locale as an argument, to return localized data. If no
localized provider is found, the factory falls back to the default en_US
locale.
examples:
$ faker address
968 Bahringer Garden Apt. 722
Kristinaland, NJ 09890
$ faker -l de_DE address
Samira-Niemeier-Allee 56
94812 Biedenkopf
$ faker profile ssn,birthdate
{'ssn': u'628-10-1085', 'birthdate': '2008-03-29'}
$ faker -r=3 -s=";" name
Willam Kertzmann;
Josiah Maggio;
Gayla Schmitt;
Calling seed() lets you reproduce earlier random data, so every run of the code produces the same data. The snippet below yields the same result each time instead of fresh random data.
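from faker import Faker

fake = Faker()
# Instance-level seed(), as in the Faker 0.8.x release used in this article.
# Newer Faker releases expect Faker.seed(9527) or fake.seed_instance(9527) instead.
fake.seed(9527)
print(fake.name())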
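The effect of seeding can be demonstrated without Faker installed. The sketch below uses the standard library random module. The `fake_names` helper and the `NAMES` pool are hypothetical and only stand in for a fake data generator.

```python
import random

# Hypothetical name pool, used only for this illustration.
NAMES = ["Peggy Wood", "Josiah Maggio", "Gayla Schmitt", "Samira Niemeier"]


def fake_names(seed, count=3):
    """Return `count` pseudo-random names drawn with a dedicated, seeded RNG."""
    rng = random.Random(seed)
    return [rng.choice(NAMES) for _ in range(count)]


first_run = fake_names(9527)
second_run = fake_names(9527)

print(first_run)
print(first_run == second_run)  # True: the same seed gives the same sequence
```

This is useful in unit tests, where a failing case must be reproducible. Using a dedicated random.Random instance, rather than seeding the global generator, keeps the seeding from affecting other code.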